annotate MoinMoin/parser/text_docbook.py @ 5474:a2128aa8b830

merge moin/1.7
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Mon, 01 Feb 2010 01:45:22 +0100
parents 44d1cd70e74c
children ff39884957af
rev   line source
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
1 # -*- coding: iso-8859-1 -*-
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
2 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
3 MoinMoin - DocBook-XML Parser
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
4
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
5 This code was tested with 4Suite 1.0a4 and 1.0b1
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
6
1918
bb2e053067fb fixing copyright headers: remove umlauts (encoding troubles), make epydoc compatible, reformat
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
7 @copyright: 2005 Henry Ho <henryho167 AT hotmail DOT com>,
bb2e053067fb fixing copyright headers: remove umlauts (encoding troubles), make epydoc compatible, reformat
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
8 2005 MoinMoin:AlexanderSchremmer
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
9 @license: GNU GPL, see COPYING for details.
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
10
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
11 DOCBOOK Parser:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
12
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
13 Features:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
14 - image support through Attachment
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
15 - internal Wikilinks if a word is a strict wikiname
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
16 - image alt is perserved
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
17 - works with compiled xslt stylesheet for optimized performance
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
18
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
19 Configuration:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
20 - make sure you have installed the DocBook XSLT files
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
21 - set the path to the html directory of the DocBook XSLT files in your
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
22 wiki or farm configuration:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
23 docbook_html_dir = r"/usr/share/xml/docbook/stylesheet/nwalsh/html/"
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
24 Note that this directory needs to be writable because a cache file will
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
25 be created there.
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
26
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
27 >How can I use Ft API for DTD validation?
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
28 If you have PyXMl installed, you can use ValidatingReader rather than
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
29 NonvalidatingReader. See:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
30 http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
31 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
32
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
33 import os.path
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
34 import cPickle
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
35 import re
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
36
1791
6dd2e29acffe Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 657
diff changeset
37 from MoinMoin import Page
657
016a8a3ef354 wikiutil.MimeType class, renamed parsers to mimetype like module names
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 0
diff changeset
38 from MoinMoin.parser.text_xslt import Parser as XsltParser
016a8a3ef354 wikiutil.MimeType class, renamed parsers to mimetype like module names
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 0
diff changeset
39 from MoinMoin.parser.text_moin_wiki import Parser as WikiParser
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
40
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
41 Dependencies = []
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
42
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
43 class Parser(XsltParser):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
44 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
45 Send XML file formatted via XSLT.
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
46 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
47
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
48 caching = 1
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
49 Dependencies = Dependencies
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
50
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
51 def __init__(self, raw, request, **kw):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
52 XsltParser.__init__(self, raw, request)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
53
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
54 # relative path to docbook.xsl and compiled_xsl
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
55 docbook_html_directory = request.cfg.docbook_html_dir
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
56 self.db_xsl = os.path.join(docbook_html_directory, 'docbook.xsl')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
57 self.db_compiled_xsl = os.path.join(docbook_html_directory, 'db_compiled.dat')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
58
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1918
diff changeset
59 self.wikiParser = WikiParser(raw=self.raw, request=self.request, pretty_url=1)
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
60 self.key = 'docbook'
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
61
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
62 def format(self, formatter):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
63 self.wikiParser.formatter = formatter
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
64 XsltParser.format(self, formatter)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
65
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
66 def append_stylesheet(self):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
67 """"
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
68 virtual function, for docbook parser
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
69 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
70 abs_db_xsl = os.path.abspath(self.db_xsl)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
71 abs_db_compiled_xsl = os.path.abspath(self.db_compiled_xsl)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
72
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
73 # same as path.exists, but also test if it is a file
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
74 if not os.path.isfile(abs_db_compiled_xsl):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
75 _compile_xsl(abs_db_xsl, abs_db_compiled_xsl)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
76
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
77 assert os.path.isfile(abs_db_compiled_xsl)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
78
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
79 self.processor.appendStylesheetInstance(cPickle.load(file(abs_db_compiled_xsl, 'rb')))
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
80
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
81 def parse_result(self, result):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
82 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
83 additional parsing to the resulting XSLT'ed result (resultString) before saving
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
84
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
85 will do:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
86 BASIC CLEAN UP : remove unnecessary HTML tags
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
87 RESOLVE IMG SRC : fix src to find attachment
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
88 RESOLVE WikiNames: if a word is a valid wikiname & a valid wikipage,
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
89 replace word with hyperlink
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
90 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
91
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
92 # BASIC CLEAN UP
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
93 # remove from beginning until end of body tag
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
94 found = re.search('<body.*?>', result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
95 if found:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
96 result = result[found.end():]
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
97
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
98 # remove everything after & including </body>
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
99 found = result.rfind('</body>')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
100 if found != -1:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
101 result = result[:found]
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
102
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
103 # RESOLVE IMG SRC
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
104 found = re.finditer('<img.*?>', result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
105 if found:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
106 splitResult = _splitResult(found, result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
107 for index in range(len(splitResult)):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
108 if splitResult[index].startswith('<img'):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
109 found = re.search('src="(?P<source>.*?)"', splitResult[index])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
110 imageSrc = found.group('source')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
111 imageAlt = None # save alt
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
112 found = re.search('alt="(?P<alt>.*?)"', splitResult[index])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
113 if found:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
114 imageAlt = found.group('alt')
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1918
diff changeset
115 splitResult[index] = self.wikiParser.attachment(('attachment:' + imageSrc, ""))
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
116 if imageAlt: # restore alt
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
117 splitResult[index] = re.sub('alt=".*?"', 'alt="%s"' % imageAlt, splitResult[index])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
118
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
119 result = ''.join(splitResult)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
120
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
121
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
122 # RESOLVE WikiNames
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
123 # if a word is a valid wikiname & a valid wikipage,
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
124 # replace word with hyperlink
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
125
3558
44d1cd70e74c fix usage of WikiParser.word_rule (use re.VERBOSE) - fixes xapian indexing of WikiWords (index "WikiWords", "Wiki" and "Words"), fixes detection of WikiWords for the docbook parser
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2286
diff changeset
126 found = re.finditer(self.wikiParser.word_rule, result, re.UNICODE|re.VERBOSE)
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
127 if found:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
128 splitResult = _splitResult(found, result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
129
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
130 for index in range(len(splitResult)):
3558
44d1cd70e74c fix usage of WikiParser.word_rule (use re.VERBOSE) - fixes xapian indexing of WikiWords (index "WikiWords", "Wiki" and "Words"), fixes detection of WikiWords for the docbook parser
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2286
diff changeset
131 if (re.match(self.wikiParser.word_rule, splitResult[index], re.UNICODE|re.VERBOSE)
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
132 and Page.Page(self.request, splitResult[index]).exists()):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
133 splitResult[index] = self.wikiParser._word_repl(splitResult[index])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
134 result = ''.join(splitResult)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
135
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
136 # remove stuff that fail HTML 4.01 Strict verification
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
137
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
138 # remove unsupported attributes
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
139 result = re.sub(' target=".*?"| type=".*?"', '', result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
140 result = re.sub('<hr .*?>', '<hr>', result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
141
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
142 # remove <p>...</p> inside <a>...</a> or <caption>...</caption>
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
143 found = re.finditer('<a href=".*?</a>|<caption>.*?</caption>', result) # XXX re.DOTALL)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
144 if found:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
145 splitResult = _splitResult(found, result)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
146 for index in range(len(splitResult)):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
147 if (splitResult[index].startswith('<a href="')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
148 or splitResult[index].startswith('<caption>')):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
149 splitResult[index] = splitResult[index].replace('<p>', '').replace('</p>', '')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
150 result = ''.join(splitResult)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
151
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
152 return result
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
153
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
154
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
155
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
156 def _compile_xsl(XSLT_FILE, XSLT_COMPILED_FILE):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
157 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
158 compiling docbook stylesheet
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
159
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
160 reference: http://155.210.85.193:8010/ccia/nodes/2005-03-18/compileXslt?xslt=/akara/akara.xslt
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
161 """
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
162 from Ft.Xml.Xslt.Processor import Processor
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
163 from Ft.Xml.Xslt import Stylesheet
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
164 from Ft.Xml import InputSource
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
165 from Ft.Lib import Uri
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
166
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
167 # New docbook processor
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1918
diff changeset
168 db_processor = Processor()
0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
169
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
170 # Docbook Stylesheet
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
171 my_sheet_uri = Uri.OsPathToUri(XSLT_FILE, 1)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
172 sty_isrc = InputSource.DefaultFactory.fromUri(my_sheet_uri)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
173
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
174 # Append Stylesheet
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
175 db_processor.appendStylesheet(sty_isrc)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
176
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
177 # Pickled stylesheet will be self.abs_db_compiled_xsl file
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
178 db_root = db_processor.stylesheet.root
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
179 fw = file(XSLT_COMPILED_FILE, 'wb')
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
180 cPickle.dump(db_root, fw) # , protocol=2)
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
181 fw.close()
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
182
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
183
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
184 def _splitResult(iterator, result):
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
185 startpos = 0
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
186 splitResult = []
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
187
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
188 for f in iterator:
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
189 start, end = f.span()
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
190 splitResult.append(result[startpos:start])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
191 splitResult.append(result[start:end])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
192 startpos = end
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
193 splitResult.append(result[startpos:])
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
194
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
195 return splitResult
77665d8e2254 tag of nonpublic@localhost--archive/moin--enterprise--1.5--base-0
Thomas Waldmann <tw-public@gmx.de>
parents:
diff changeset
196