annotate MoinMoin/parser/_creole.py @ 5829:a744f573fe30

advancedsearch: fix problem with non-ascii chars breaking up words for OR and NOT terms
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Sat, 04 Feb 2012 16:52:24 +0100
parents 6c0e03fcc067
children
rev   line source
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
1 # -*- coding: iso-8859-1 -*-
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
2 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
3 Creole wiki markup parser
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
4
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
5 See http://wikicreole.org/ for latest specs.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
6
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
7 Notes:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
8 * No markup allowed in headings.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
9 Creole 1.0 does not require us to support this.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
10 * No markup allowed in table headings.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
11 Creole 1.0 does not require us to support this.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
12 * No (non-bracketed) generic url recognition: this is "mission impossible"
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
13 except if you want to risk lots of false positives. Only known protocols
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
14 are recognized.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
15 * We do not allow ":" before "//" italic markup to avoid urls with
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
16 unrecognized schemes (like wtf://server/path) triggering italic rendering
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
17 for the rest of the paragraph.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
18
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
19 @copyright: 2007 MoinMoin:RadomirDopieralski (creole 0.5 implementation),
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
20 2007 MoinMoin:ThomasWaldmann (updates)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
21 @license: GNU GPL, see COPYING for details.
4718
634ed3db8df4 creole parser: Add second license: BSD, fix spaces in multiline elements.
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 3211
diff changeset
22 @license: BSD, see COPYING for details.
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
23 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
24
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
25 import re
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
26 import sys
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
27
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
28 __version__ = '1.1'
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
29
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
30
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
31 class Rules:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
32 """Hold all the rules for generating regular expressions."""
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
33
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
34 # For the inline elements:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
35 proto = r'http|https|ftp|nntp|news|mailto|telnet|file|irc'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
36 link = r'''(?P<link>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
37 \[\[
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
38 (?P<link_target>.+?) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
39 ([|] \s* (?P<link_text>.+?) \s*)?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
40 ]]
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
41 )'''
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
42 image = r'''(?P<image>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
43 {{
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
44 (?P<image_target>.+?) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
45 ([|] \s* (?P<image_text>.+?) \s*)?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
46 }}
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
47 )'''
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
48 macro = r'''(?P<macro>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
49 <<
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
50 (?P<macro_name> \w+)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
51 (\( (?P<macro_args> .*?) \))? \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
52 ([|] \s* (?P<macro_text> .+?) \s* )?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
53 >>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
54 )'''
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
55 code = r'(?P<code> {{{ (?P<code_text>.*?) }}} )'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
56 emph = r'(?P<emph> (?<!:)// )' # there must be no : in front of the //
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
57 # avoids italic rendering in urls with
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
58 # unknown protocols
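A minimal sketch of what the negative lookbehind in the emph rule buys. The pattern is copied from the rule above; the helper name `emph_positions` and the sample strings are made up for illustration.

```python
import re

# Pattern copied from Rules.emph; re.X makes the spaces insignificant.
emph_re = re.compile(r'(?P<emph> (?<!:)// )', re.X)


def emph_positions(text):
    """Return the offsets at which italic markup would be recognized."""
    return [m.start() for m in emph_re.finditer(text)]


# Ordinary italic markup is found twice (opening and closing).
plain_italic = emph_positions('this is //italic// text')

# The "//" of an unrecognized scheme follows ":" and is therefore skipped.
unknown_scheme = emph_positions('see wtf://server/path for details')
```

Without the lookbehind, the second string would open an italic span that is never closed.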
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
59 strong = r'(?P<strong> \*\* )'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
60 linebreak = r'(?P<break> \\\\ )'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
61 escape = r'(?P<escape> ~ (?P<escaped_char>\S) )'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
62 char = r'(?P<char> . )'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
63
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
64 # For the block elements:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
65 separator = r'(?P<separator> ^ \s* ---- \s* $ )' # horizontal line
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
66 line = r'(?P<line> ^ \s* $ )' # empty line that separates paragraphs
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
67 head = r'''(?P<head>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
68 ^ \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
69 (?P<head_head>=+) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
70 (?P<head_text> .*? ) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
71 (?P<head_tail>=*) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
72 $
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
73 )'''
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
74 text = r'(?P<text> .+ )'
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
75 list = r'''(?P<list>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
76 ^ [ \t]* ([*][^*\#]|[\#][^\#*]).* $
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
77 ( \n[ \t]* [*\#]+.* $ )*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
78 )''' # Matches the whole list, separate items are parsed later. The
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
79 # list *must* start with a single bullet.
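The requirement that a list starts with a single bullet can be checked in isolation. The sketch below copies the list rule and compiles it with the same flags the block regexp uses; `find_list` and the sample texts are hypothetical names chosen for this example.

```python
import re

# Rule copied from Rules.list.
list_rule = r'''(?P<list>
    ^ [ \t]* ([*][^*\#]|[\#][^\#*]).* $
    ( \n[ \t]* [*\#]+.* $ )*
)'''
list_re = re.compile(list_rule, re.X | re.U | re.M)


def find_list(text):
    """Return the text of the first list block, or None if there is none."""
    match = list_re.search(text)
    return match.group('list') if match else None


# Nested items are swallowed into the same block; the blank line ends it.
nested = find_list('* one\n** two\n* three\n\nafter')

# A line that opens with "**" is not a list, so it stays available
# for the strong (bold) inline rule.
bold_line = find_list('**bold** at line start')
```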
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
80 item = r'''(?P<item>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
81 ^ \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
82 (?P<item_head> [\#*]+) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
83 (?P<item_text> .*?)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
84 $
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
85 )''' # Matches single list items
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
86 pre = r'''(?P<pre>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
87 ^{{{ \s* $
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
88 (\n)?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
89 (?P<pre_text>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
90 ([\#]!(?P<pre_kind>\w*?)(\s+.*)?$)?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
91 (.|\n)+?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
92 )
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
93 (\n)?
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
94 ^}}} \s*$
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
95 )'''
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
96 pre_escape = r' ^(?P<indent>\s*) ~ (?P<rest> \}\}\} \s*) $'
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
97 table = r'''(?P<table>
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
98 ^ \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
99 [|].*? \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
100 [|]? \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
101 $
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
102 )'''
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
103
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
104 # For splitting table cells:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
105 cell = r'''
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
106 \| \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
107 (
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
108 (?P<head> [=][^|]+ ) |
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
109 (?P<cell> ( %s | [^|])+ )
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
110 ) \s*
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
111 ''' % '|'.join([link, macro, image, code])
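A simplified sketch of splitting one table row into cells. It deliberately leaves out the `%s` alternatives (link, macro, image, code) that the real rule interpolates, so a "|" inside such markup would wrongly split a cell here; `split_row` is a hypothetical helper, not part of the module.

```python
import re

# Simplified variant of Rules.cell without the inline-markup alternatives.
simple_cell = r'''
    \| \s*
    (
        (?P<head> [=][^|]+ ) |
        (?P<cell> [^|]+ )
    ) \s*
'''
cell_re = re.compile(simple_cell, re.X | re.U)


def split_row(row):
    """Return a list of (kind, text) pairs for one table row."""
    cells = []
    for match in cell_re.finditer(row):
        if match.group('head') is not None:
            # Drop the leading "=" that marks a heading cell.
            cells.append(('head', match.group('head').lstrip('=').strip()))
        else:
            cells.append(('cell', match.group('cell').strip()))
    return cells


heading_row = split_row('|=Name|=Age|')
data_row = split_row('| a | b |')
```

The trailing "|" produces no cell because both alternatives require at least one character after it.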
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
112
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
113 def __init__(self, bloglike_lines=False, url_protocols=None,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
114 wiki_words=False):
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
115 c = re.compile
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
116 # For pre escaping, in creole 1.0 done with ~:
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
117 self.pre_escape_re = c(self.pre_escape, re.M | re.X)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
118 # for link descriptions
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
119 self.link_re = c('|'.join([self.image, self.linebreak,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
120 self.char]), re.X | re.U)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
121 # for list items
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
122 self.item_re = c(self.item, re.X | re.U | re.M)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
123 # for table cells
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
124 self.cell_re = c(self.cell, re.X | re.U)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
125
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
126 # For block elements:
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
127 if bloglike_lines:
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
128 self.text = r'(?P<text> .+ ) (?P<break> (?<!\\)$\n(?!\s*$) )?'
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
129 self.block_re = c('|'.join([self.line, self.head, self.separator,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
130 self.pre, self.list, self.table,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
131 self.text]), re.X | re.U | re.M)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
132
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
133 # For inline elements:
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
134 if url_protocols is not None:
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
135 self.proto = '|'.join(re.escape(p) for p in url_protocols)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
136 self.url = r'''(?P<url>
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
137 (^ | (?<=\s | [.,:;!?()/=]))
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
138 (?P<escaped_url>~)?
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
139 (?P<url_target> (?P<url_proto> %s ):\S+? )
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
140 ($ | (?=\s | [,.:;!?()] (\s | $))))''' % self.proto
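The url rule can be exercised on its own. This sketch rebuilds it from the default protocol list and shows how trailing punctuation is left out of the target and how a leading "~" is captured as an escape; the sample strings are invented for the example.

```python
import re

# Default protocol list and url rule, copied from the Rules class.
proto = r'http|https|ftp|nntp|news|mailto|telnet|file|irc'
url = r'''(?P<url>
    (^ | (?<=\s | [.,:;!?()/=]))
    (?P<escaped_url>~)?
    (?P<url_target> (?P<url_proto> %s ):\S+? )
    ($ | (?=\s | [,.:;!?()] (\s | $))))''' % proto
url_re = re.compile(url, re.X | re.U)

# The comma followed by a space ends the url and is not part of it.
plain = url_re.search('see http://example.com/path, ok')

# A tilde in front marks the url as escaped (to be rendered as text).
escaped = url_re.search('~http://example.com')
```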
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
141 inline_elements = [self.link, self.url, self.macro,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
142 self.code, self.image, self.strong,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
143 self.emph, self.linebreak,
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
144 self.escape, self.char]
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
145 if wiki_words:
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
146 import unicodedata
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
147 up_case = u''.join(unichr(i) for i in xrange(sys.maxunicode)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
148 if unicodedata.category(unichr(i))=='Lu')
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
149 self.wiki = ur'''(?P<wiki>[%s]\w+[%s]\w+)''' % (up_case, up_case)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
150 inline_elements.insert(3, self.wiki)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
151 self.inline_re = c('|'.join(inline_elements), re.X | re.U)
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
152
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
153 class Parser:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
154 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
155 Parse the raw text and create a document object
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
156 that can be converted into output using Emitter.
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
157
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
158 A separate instance should be created for parsing a new document.
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
159 The first parameter is the raw text to be parsed. An optional second
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
160 argument is the Rules object to use. You can customize the parsing
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
161 rules to enable optional features or extend the parser.
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
162 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
163
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
164 def __init__(self, raw, rules=None):
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
165 self.rules = rules or Rules()
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
166 self.raw = raw
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
167 self.root = DocNode('document', None)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
168 self.cur = self.root # The most recent document node
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
169 self.text = None # The node to add inline characters to
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
170
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
171 def _upto(self, node, kinds):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
172 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
173 Look up the tree to the first occurrence
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
174 of one of the listed kinds of nodes or root.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
175 Start at the given node.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
176 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
177 while node.parent is not None and node.kind not in kinds:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
178 node = node.parent
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
179 return node
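A small sketch of the upward walk performed by `_upto`. DocNode is not defined in this part of the file, so the class below is a hypothetical stand-in with only the attributes the walk needs, and `upto` is a free-function copy of the method.

```python
class DocNode:
    """Hypothetical stand-in for the real DocNode: kind, parent, children."""

    def __init__(self, kind='', parent=None, content=None):
        self.kind = kind
        self.parent = parent
        self.content = content
        self.children = []
        if parent is not None:
            parent.children.append(self)


def upto(node, kinds):
    """Climb from node until a node of one of the kinds, or the root."""
    while node.parent is not None and node.kind not in kinds:
        node = node.parent
    return node


root = DocNode('document')
paragraph = DocNode('paragraph', root)
strong = DocNode('strong', paragraph)
text = DocNode('text', strong, u'hello')

found = upto(text, ('paragraph',))   # stops at the enclosing paragraph
fallback = upto(text, ('table',))    # no table above, so the root is returned
```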
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
180
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
181 # The _*_repl methods are called for matches in regexps. Sometimes the
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
182 # same method needs several names, because of group names in regexps.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
183
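The dispatch code itself is not part of this excerpt, so the following is only an assumed scheme that would explain why one handler needs several names: the first named group that matched selects the method `_<group>_repl`. With nested groups, any of the inner names may be the one picked, hence the aliases. `TinyParser` and its two rules are illustrative only.

```python
import re


class TinyParser:
    """Illustrative dispatcher; not the module's real Parser."""

    # 'escaped_char' is nested inside 'escape', as in Rules.escape.
    inline_re = re.compile(
        r'(?P<escape> ~ (?P<escaped_char>\S) ) | (?P<char> . )', re.X)

    def __init__(self):
        self.out = []

    def _escape_repl(self, groups):
        self.out.append(groups['escaped_char'])
    # Alias, so dispatch works whichever of the two group names is seen first.
    _escaped_char_repl = _escape_repl

    def _char_repl(self, groups):
        self.out.append(groups['char'])

    def _replace(self, match):
        """Call the handler named after the first group that matched."""
        groups = match.groupdict()
        for name, text in groups.items():
            if text is not None:
                getattr(self, '_%s_repl' % name)(groups)
                return

    def parse(self, text):
        for match in self.inline_re.finditer(text):
            self._replace(match)
        return u''.join(self.out)


result = TinyParser().parse('a~*b')
```

Under this scheme a missing alias only fails when its group happens to be visited first, which makes such a bug easy to overlook.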
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
184 def _url_repl(self, groups):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
185 """Handle raw urls in text."""
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
186
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
187 if not groups.get('escaped_url'):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
188 # this url is NOT escaped
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
189 target = groups.get('url_target', '')
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
190 node = DocNode('link', self.cur)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
191 node.content = target
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
192 DocNode('text', node, node.content)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
193 self.text = None
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
194 else:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
195 # this url is escaped, we render it as text
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
196 if self.text is None:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
197 self.text = DocNode('text', self.cur, u'')
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
198 self.text.content += groups.get('url_target')
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
199 _url_target_repl = _url_repl
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
200 _url_proto_repl = _url_repl
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
201 _escaped_url = _url_repl
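The naming convention described in the comment above can be tried outside the parser. The sketch below is not MoinMoin code: `MiniParser`, its regular expression and its group names are invented for the illustration, and it iterates with `finditer` rather than `sub`. It only shows why one handler sometimes needs several names: the dispatcher looks up a method named after whichever group it finds first with a value.

```python
import re


class MiniParser(object):
    """Toy dispatcher; the class, regex and group names are hypothetical."""

    # Two alternatives; the URL alternative contains two named groups.
    token_re = re.compile(
        r"(?P<url>(?P<url_proto>https?)://\S+)|(?P<word>\w+)")

    def __init__(self):
        self.seen = []

    def _url_repl(self, groups):
        self.seen.append(('url', groups['url']))
    # Either group of the URL alternative may be looked up first,
    # so the same handler is registered under both names.
    _url_proto_repl = _url_repl

    def _word_repl(self, groups):
        self.seen.append(('word', groups['word']))

    def _replace(self, match):
        """Call the handler named after the first group that matched."""
        groups = match.groupdict()
        for name, text in groups.items():
            if text is not None:
                getattr(self, '_%s_repl' % name)(groups)
                return

    def parse(self, text):
        for match in self.token_re.finditer(text):
            self._replace(match)
        return self.seen
```

A handler that is missing for a group name would raise `AttributeError` in `getattr`, which is why every named group that can match needs an entry.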

    def _link_repl(self, groups):
        """Handle all kinds of links."""

        target = groups.get('link_target', '')
        text = (groups.get('link_text', '') or '').strip()
        parent = self.cur
        self.cur = DocNode('link', self.cur)
        self.cur.content = target
        self.text = None
        re.sub(self.rules.link_re, self._replace, text)
        self.cur = parent
        self.text = None
    _link_target_repl = _link_repl
    _link_text_repl = _link_repl

    def _wiki_repl(self, groups):
        """Handle WikiWord links, if enabled."""

        text = groups.get('wiki', '')
        node = DocNode('link', self.cur)
        node.content = text
        DocNode('text', node, node.content)
        self.text = None

    def _macro_repl(self, groups):
        """Handle macros using the placeholder syntax."""

        name = groups.get('macro_name', '')
        text = (groups.get('macro_text', '') or '').strip()
        node = DocNode('macro', self.cur, name)
        node.args = groups.get('macro_args', '') or ''
        DocNode('text', node, text or name)
        self.text = None
    _macro_name_repl = _macro_repl
    _macro_args_repl = _macro_repl
    _macro_text_repl = _macro_repl

    def _image_repl(self, groups):
        """Handle images and attachments included in the page."""

        target = groups.get('image_target', '').strip()
        text = (groups.get('image_text', '') or '').strip()
        node = DocNode("image", self.cur, target)
        DocNode('text', node, text or node.content)
        self.text = None
    _image_target_repl = _image_repl
    _image_text_repl = _image_repl

    def _separator_repl(self, groups):
        self.cur = self._upto(self.cur, ('document', 'section', 'blockquote'))
        DocNode('separator', self.cur)

    def _item_repl(self, groups):
        bullet = groups.get('item_head', u'')
        text = groups.get('item_text', u'')
        if bullet[-1] == '#':
            kind = 'number_list'
        else:
            kind = 'bullet_list'
        level = len(bullet)
        lst = self.cur
        # Find a list of the same kind and level up the tree
        while (lst and
               not (lst.kind in ('number_list', 'bullet_list') and
                    lst.level == level) and
               lst.kind not in ('document', 'section', 'blockquote')):
            lst = lst.parent
        if lst and lst.kind == kind:
            self.cur = lst
        else:
            # Create a new level of list
            self.cur = self._upto(self.cur,
                ('list_item', 'document', 'section', 'blockquote'))
            self.cur = DocNode(kind, self.cur)
            self.cur.level = level
        self.cur = DocNode('list_item', self.cur)
        self.parse_inline(text)
        self.text = None
    _item_text_repl = _item_repl
    _item_head_repl = _item_repl
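The tree walk in `_item_repl` is easier to follow on a small example. The sketch below restates the same steps with stand-ins: `Node` is a minimal replacement for `DocNode`, and `upto` is an assumed equivalent of `self._upto` (walk up until a node of one of the given kinds, or the root). Neither is the parser's actual class or helper, and inline parsing of the item text is left out.

```python
class Node(object):
    """Minimal stand-in for DocNode (hypothetical, for illustration only)."""

    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent
        self.level = None
        self.children = []
        if parent is not None:
            parent.children.append(self)


BLOCKS = ('document', 'section', 'blockquote')


def upto(node, kinds):
    """Walk up until a node of one of the given kinds, or the root."""
    while node.parent is not None and node.kind not in kinds:
        node = node.parent
    return node


def add_item(cur, bullet):
    """Attach a list item for the given bullet string; return the new item."""
    kind = 'number_list' if bullet[-1] == '#' else 'bullet_list'
    level = len(bullet)
    lst = cur
    # Look up the tree for a list of the same level, stopping at a block.
    while (lst is not None and
           not (lst.kind in ('number_list', 'bullet_list') and
                lst.level == level) and
           lst.kind not in BLOCKS):
        lst = lst.parent
    if lst is not None and lst.kind == kind:
        parent = lst
    else:
        # No matching list: open a new one under the nearest item or block.
        parent = Node(kind, upto(cur, ('list_item',) + BLOCKS))
        parent.level = level
    return Node('list_item', parent)


doc = Node('document')
cur = doc
for bullet in ('*', '**', '**', '*'):
    cur = add_item(cur, bullet)

top = doc.children[0]
nested = top.children[0].children[0]
```

With the four bullets above, `top` holds two items and the first of them contains the nested level-2 list with its own two items.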

    def _list_repl(self, groups):
        text = groups.get('list', u'')
        self.rules.item_re.sub(self._replace, text)

    def _head_repl(self, groups):
        self.cur = self._upto(self.cur, ('document', 'section', 'blockquote'))
        node = DocNode('header', self.cur, groups.get('head_text', '').strip())
        node.level = len(groups.get('head_head', ' '))
    _head_head_repl = _head_repl
    _head_text_repl = _head_repl

    def _text_repl(self, groups):
        text = groups.get('text', '')
        if self.cur.kind in ('table', 'table_row', 'bullet_list',
                             'number_list'):
            self.cur = self._upto(self.cur,
                ('document', 'section', 'blockquote'))
        if self.cur.kind in ('document', 'section', 'blockquote'):
            self.cur = DocNode('paragraph', self.cur)
        else:
            text = u' ' + text
        self.parse_inline(text)
        if groups.get('break') and self.cur.kind in ('paragraph',
                'emphasis', 'strong', 'code'):
            DocNode('break', self.cur, '')
        self.text = None
    # NOTE: this name is rebound by the _break_repl method defined
    # further down in the class.
    _break_repl = _text_repl

    def _table_repl(self, groups):
        row = groups.get('table', '|').strip()
        self.cur = self._upto(self.cur, (
            'table', 'document', 'section', 'blockquote'))
        if self.cur.kind != 'table':
            self.cur = DocNode('table', self.cur)
        tb = self.cur
        tr = DocNode('table_row', tb)

        text = ''
        for m in self.rules.cell_re.finditer(row):
            cell = m.group('cell')
            if cell:
                self.cur = DocNode('table_cell', tr)
                self.text = None
                self.parse_inline(cell)
            else:
                cell = m.group('head')
                self.cur = DocNode('table_head', tr)
                self.text = DocNode('text', self.cur, u'')
                self.text.content = cell.strip('=')
        self.cur = tb
        self.text = None
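How a row is cut into cells depends on `rules.cell_re`, which is defined elsewhere in the file. The sketch below uses a simplified stand-in for that rule (an assumption, not the real expression) to show the two branches of the loop: a match in the `cell` group becomes a normal cell, anything else is read from the `head` group and has its `=` markers stripped. The extra whitespace stripping is added here only to make the output easy to read.

```python
import re

# Simplified stand-in for rules.cell_re (an assumption, not the real rule):
# every cell starts with '|'; a leading '=' marks a heading cell.
cell_re = re.compile(r"\|\s*((?P<head>=[^|]+)|(?P<cell>[^|]+))\s*")


def split_row(row):
    """Return (kind, text) pairs for one table row."""
    cells = []
    for m in cell_re.finditer(row.strip()):
        cell = m.group('cell')
        if cell:
            cells.append(('table_cell', cell.strip()))
        else:
            head = m.group('head')
            cells.append(('table_head', head.strip('=').strip()))
    return cells
```

The trailing `|` of a row produces no match, because both groups require at least one character after it.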

    def _pre_repl(self, groups):
        self.cur = self._upto(self.cur, ('document', 'section', 'blockquote'))
        kind = groups.get('pre_kind', None)
        text = groups.get('pre_text', u'')

        def remove_tilde(m):
            return m.group('indent') + m.group('rest')

        text = self.rules.pre_escape_re.sub(remove_tilde, text)
        node = DocNode('preformatted', self.cur, text)
        node.sect = kind or ''
        self.text = None
    _pre_text_repl = _pre_repl
    _pre_head_repl = _pre_repl
    _pre_kind_repl = _pre_repl
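The `remove_tilde` helper only makes sense together with `rules.pre_escape_re`, which is not shown in this part of the file. As a rough sketch, assuming the rule matches a line consisting of a tilde followed by a closing `}}}` marker, the substitution could behave as below. The pattern is a guess written for this example; only the group names `indent` and `rest` are taken from the code above.

```python
import re

# Assumed shape of rules.pre_escape_re: an escaped closing marker on a
# line of its own, optionally indented.
pre_escape_re = re.compile(r"^(?P<indent>\s*)~(?P<rest>\}\}\}\s*)$", re.M)


def unescape_pre(text):
    """Drop the tilde that escapes a closing marker inside a block."""
    def remove_tilde(m):
        return m.group('indent') + m.group('rest')
    return pre_escape_re.sub(remove_tilde, text)
```

Tildes elsewhere in the block are left alone, since the pattern is anchored to whole lines.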

    def _line_repl(self, groups):
        self.cur = self._upto(self.cur, ('document', 'section', 'blockquote'))

    def _code_repl(self, groups):
        DocNode('code', self.cur, groups.get('code_text', u'').strip())
        self.text = None
    _code_text_repl = _code_repl
    _code_head_repl = _code_repl

    def _emph_repl(self, groups):
        """Open an emphasis node, or close it if we are already in one."""
        if self.cur.kind != 'emphasis':
            self.cur = DocNode('emphasis', self.cur)
        else:
            self.cur = self._upto(self.cur, ('emphasis', )).parent
        self.text = None

    def _strong_repl(self, groups):
        """Open a strong node, or close it if we are already in one."""
        if self.cur.kind != 'strong':
            self.cur = DocNode('strong', self.cur)
        else:
            self.cur = self._upto(self.cur, ('strong', )).parent
        self.text = None
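Both handlers above treat their marker as a toggle: the first occurrence opens a node and the next one closes it. A minimal sketch of that idea follows, again with a hypothetical `Node` class in place of `DocNode` and a hand-made token list in place of the real inline rules. Because the current node already has the right kind in the closing branch, walking up to it is reduced here to taking its parent.

```python
class Node(object):
    """Minimal stand-in for DocNode (hypothetical, for illustration only)."""

    def __init__(self, kind, parent=None, content=None):
        self.kind = kind
        self.parent = parent
        self.content = content
        self.children = []
        if parent is not None:
            parent.children.append(self)


def toggle(cur, kind):
    """Open a node of the given kind, or close it if cur already is one."""
    if cur.kind != kind:
        return Node(kind, cur)
    return cur.parent


para = Node('paragraph')
cur = para
for token in ('plain ', '//', 'slanted', '//', ' plain'):
    if token == '//':
        cur = toggle(cur, 'emphasis')
    else:
        Node('text', cur, token)

kinds = [child.kind for child in para.children]
```

After the loop the paragraph holds a text node, an emphasis node and another text node, and `cur` is back at the paragraph.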

    def _break_repl(self, groups):
        DocNode('break', self.cur, None)
        self.text = None

    def _escape_repl(self, groups):
        if self.text is None:
            self.text = DocNode('text', self.cur, u'')
        self.text.content += groups.get('escaped_char', u'')

    def _char_repl(self, groups):
        if self.text is None:
            self.text = DocNode('text', self.cur, u'')
        self.text.content += groups.get('char', u'')
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
386
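The `_escape_repl` and `_char_repl` handlers above share one pattern: a text node is created lazily on the first character and then extended, while structural handlers such as `_break_repl` reset `self.text` to `None` so that the next character starts a fresh node. A minimal sketch of that pattern follows. The `Node` and `Builder` names, the method names, and the `'paragraph'` kind are hypothetical stand-ins for illustration, not part of the module.

```python
class Node(object):
    """Hypothetical stand-in for the parser's document node."""

    def __init__(self, kind, parent=None, content=None):
        self.kind = kind
        self.parent = parent
        self.content = content
        self.children = []
        if parent is not None:
            parent.children.append(self)


class Builder(object):
    """Hypothetical builder that mimics the lazy text-node handling."""

    def __init__(self):
        self.root = Node('paragraph')
        self.cur = self.root
        self.text = None  # no open text node yet

    def char(self, ch):
        # Open a text node only when a character actually arrives.
        if self.text is None:
            self.text = Node('text', self.cur, u'')
        self.text.content += ch

    def line_break(self):
        # A structural node closes the current run of text.
        Node('break', self.cur, None)
        self.text = None


builder = Builder()
for ch in u'ab':
    builder.char(ch)
builder.line_break()
builder.char(u'c')
summary = [(n.kind, n.content) for n in builder.root.children]
```

Adjacent characters end up in a single `text` node, and the break splits the run in two, so the renderer sees a few long text nodes rather than one node per character.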
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
387 def _replace(self, match):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
388 """Invoke the appropriate _*_repl method. Called by re.sub for every match; dispatches on the first named group found that is not None."""
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
389
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
390 groups = match.groupdict()
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
391 for name, text in groups.iteritems():
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
392 if text is not None:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
393 replace = getattr(self, '_%s_repl' % name)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
394 replace(groups)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
395 return
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
396
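The dispatch in `_replace` can be illustrated in isolation: `re.sub` calls the callback once per match, the callback looks for a named group that matched, and `getattr` resolves the handler from the group name. The sketch below assumes a made-up two-rule grammar and a hypothetical `TinyInline` class; the real rules live in `self.rules.inline_re` and are not reproduced here. It also uses `items()` and returns an empty string explicitly so that it runs on current Python versions.

```python
import re


class TinyInline(object):
    """Hypothetical two-rule inline scanner, for illustration only."""

    # Assumed grammar: '**' toggles strong, anything else is a plain character.
    inline_re = re.compile(r'(?P<strong>\*\*)|(?P<char>.)', re.S)

    def __init__(self):
        self.events = []

    def _strong_repl(self, groups):
        self.events.append(('strong', groups['strong']))

    def _char_repl(self, groups):
        self.events.append(('char', groups['char']))

    def _replace(self, match):
        groups = match.groupdict()
        for name, text in groups.items():
            if text is not None:
                # The group name selects the handler: 'char' -> _char_repl.
                getattr(self, '_%s_repl' % name)(groups)
                break
        return ''  # the substituted string is discarded; only side effects matter

    def parse_inline(self, raw):
        self.inline_re.sub(self._replace, raw)
        return self.events


events = TinyInline().parse_inline('a**b')
```

Using `re.sub` purely for its side effects means the grammar and the handlers stay in sync by naming convention alone: adding a named group to the pattern requires a `_<name>_repl` method and nothing else.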
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
397 def parse_inline(self, raw):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
398 """Recognize inline elements inside blocks."""
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
399
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
400 re.sub(self.rules.inline_re, self._replace, raw)
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
401
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
402 def parse_block(self, raw):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
403 """Recognize block elements."""
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
404
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
405 re.sub(self.rules.block_re, self._replace, raw)
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
406
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
407 def parse(self):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
408 """Parse the text given as self.raw and return DOM tree."""
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
409
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
410 self.parse_block(self.raw)
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
411 return self.root
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
412
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
413 #################### Helper classes
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
414
5440
6c0e03fcc067 update the wikicreole parser to version 1.1
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 4718
diff changeset
415 ### The document model
2889
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
416
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
417 class DocNode:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
418 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
419 A node in the document.
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
420 """
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
421
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
422 def __init__(self, kind='', parent=None, content=None):
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
423 self.children = []
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
424 self.parent = parent
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
425 self.kind = kind
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
426 self.content = content
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
427 if self.parent is not None:
87881ad9ab3b Creole parser: split into true parser and moin glue code
Radomir Dopieralski <moindev@sheep.art.pl>
parents:
diff changeset
428 self.parent.children.append(self)
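Because the constructor appends the new node to its parent's `children`, building a tree needs no separate "add child" call: constructing a node with a parent is enough. The sketch below restates the class as shown above (indentation restored, written as a new-style class) and builds a small tree. The `'document'` and `'paragraph'` kinds and the `outline` helper are assumptions made for the example.

```python
class DocNode(object):
    """A node in the document (restated from the listing above)."""

    def __init__(self, kind='', parent=None, content=None):
        self.children = []
        self.parent = parent
        self.kind = kind
        self.content = content
        if self.parent is not None:
            self.parent.children.append(self)


def outline(node, depth=0):
    """Hypothetical helper: return the tree as a list of indented kind names."""
    lines = ['  ' * depth + node.kind]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


root = DocNode('document')
para = DocNode('paragraph', root)
strong = DocNode('strong', para)
DocNode('text', strong, u'bold')  # attached to 'strong' as a side effect
DocNode('break', para, None)      # same call shape as in _break_repl

tree = outline(root)
```

This is why handlers such as `_break_repl` can create a node and discard the return value: the parent already holds the reference.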
4718
634ed3db8df4 creole parser: Add second license: BSD, fix spaces in multiline elements.
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 3211
diff changeset
429
634ed3db8df4 creole parser: Add second license: BSD, fix spaces in multiline elements.
Radomir Dopieralski <moindev@sheep.art.pl>
parents: 3211
diff changeset
430