annotate MoinMoin/search/queryparser.py @ 2952:1bb689d96ade

search: fix breaking queryparser tests
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Fri, 30 Nov 2007 14:16:41 +0100
parents 01f05e74aa9c
children c8a3731ce61c
rev   line source
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1 # -*- coding: iso-8859-1 -*-
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
2 """
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
3 MoinMoin - search query parser
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
4
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
5 @copyright: 2005 MoinMoin:FlorianFesti,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
6 2005 MoinMoin:NirSoffer,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
7 2005 MoinMoin:AlexanderSchremmer,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
8 2006 MoinMoin:ThomasWaldmann,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
9 2006 MoinMoin:FranzPletz
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
10 @license: GNU GPL, see COPYING for details
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
11 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
12
1180
c1c0d7f6f2b4 remove string module usage, some other cosmetic changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 926
diff changeset
13 import re
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
14 from MoinMoin import config, wikiutil
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
15 from MoinMoin.search.results import Match, TitleMatch, TextMatch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
16
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
17 try:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
18 from MoinMoin.search import Xapian
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
19 from MoinMoin.search.Xapian import Query, UnicodeQuery
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
20 except ImportError:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
21 pass
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
22
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
23 #############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
24 ### query objects
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
25 #############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
26
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
27 class BaseExpression:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
28 """ Base class for all search terms """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
29
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
30 def __init__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
31 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
32
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
33 def __str__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
34 return unicode(self).encode(config.charset, 'replace')
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
35
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
36 def negate(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
37 """ Negate the result of this term """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
38 self.negated = 1
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
39
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
40 def pageFilter(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
41 """ Return a page filtering function
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
42
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
43 This function is used to filter the page list before we search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
44 it. Return a function that gets a page name and returns a bool.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
45
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
46 The default expression does not have any filter function and
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
47 returns None. Subclasses may define custom filter functions.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
48 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
49 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
50
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
51 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
52 """ Search a page
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
53
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
54 Returns a list of Match objects or None if term didn't find
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
55 anything (vice versa if negate() was called). Terms containing
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
56 other terms must call this method to aggregate the results.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
57 This base class returns [Match()] if negated, else None.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
58 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
59 if self.negated:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
60 return [Match()]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
61 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
62 return None
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
63
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
64 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
65 """ Return estimated time to calculate this term
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
66
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
67 Number is relative to other terms and has no real unit.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
68 It allows the fast searches to be done first.
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
69 """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
70 return 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
71
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
72 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
73 """ Return a regular expression of what the term searches for
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
74
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
75 Used to display the needle in the page.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
76 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
77 return ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
78
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
79 def _build_re(self, pattern, use_re=False, case=False, stemmed=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
80 """ Make a regular expression out of a text pattern """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
81 flags = case and re.U or (re.I | re.U)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
82 if use_re:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
83 try:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
84 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
85 except re.error:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
86 pattern = re.escape(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
87 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
88 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
89 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
90 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
91 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
92 pattern = re.escape(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
93 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
94 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
95
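The fallback in _build_re can be tried in isolation. The following is a minimal sketch, not the MoinMoin code: build_search_re is a hypothetical free function, written for Python 3, that returns the pattern and compiled regex instead of storing them on self. It assumes the same rule as above: a pattern that fails to compile as a regex is escaped and searched for literally.

```python
import re

def build_search_re(pattern, use_re=False, case=False):
    """Hypothetical stand-alone version of the _build_re logic."""
    # Case-sensitive searches use only re.U; all others add re.I.
    flags = re.U if case else (re.I | re.U)
    if use_re:
        try:
            return pattern, re.compile(pattern, flags)
        except re.error:
            # Invalid regex: fall through and treat the input as literal text.
            pass
    pattern = re.escape(pattern)
    return pattern, re.compile(pattern, flags)

# "foo(" is not a valid regex, so it is matched literally.
literal_pattern, literal_re = build_search_re("foo(", use_re=True)
```

With use_re left at False the input is always escaped, so a query such as "f.o" only matches the literal text "f.o".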
1447
2e4491ea9572 fix when searching for ()
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1377
diff changeset
96 def xapian_wanted(self):
2e4491ea9572 fix when searching for ()
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1377
diff changeset
97 return False
2e4491ea9572 fix when searching for ()
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1377
diff changeset
98
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
99
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
100 class AndExpression(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
101 """ A term connecting several sub terms with a logical AND """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
102
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
103 operator = ' '
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
104
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
105 def __init__(self, *terms):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
106 self._subterms = list(terms)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
107 self._costs = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
108 for t in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
109 self._costs += t.costs()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
110 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
111
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
112 def append(self, expression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
113 """ Append another term """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
114 self._subterms.append(expression)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
115 self._costs += expression.costs()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
116
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
117 def subterms(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
118 return self._subterms
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
119
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
120 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
121 return self._costs
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
122
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
123 def __unicode__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
124 result = ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
125 for t in self._subterms:
2952
1bb689d96ade search: fix breaking queryparser tests
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2286
diff changeset
126 result += self.operator + unicode(t)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
127 return u'[' + result[len(self.operator):] + u']'
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
128
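The string built in __unicode__ is the subterms joined by the operator and wrapped in brackets. As a sketch only (render_and is a hypothetical helper, written for Python 3 where str replaces unicode), the same result can be produced with join, which avoids slicing off the leading operator:

```python
def render_and(subterms, operator=' '):
    """Hypothetical helper: bracketed subterms separated by the operator."""
    # join() places the operator only between items, so nothing has to
    # be trimmed from the front of the result.
    return '[' + operator.join(str(t) for t in subterms) + ']'

rendered = render_and(['title:Help', 'search'])
```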
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
129 def pageFilter(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
130 """ Return a page filtering function
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
131
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
132 This function is used to filter the page list before we search it.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
133
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
134 Return a function that gets a page name and returns a bool, or None.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
135 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
136 # Sort terms by cost, then get all title searches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
137 self.sortByCost()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
138 terms = [term for term in self._subterms if isinstance(term, TitleSearch)]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
139 if terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
140 # Create and return a filter function
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
141 def filter(name):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
142 """ Return True if the filter of every term accepts name """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
143 for term in terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
144 filter = term.pageFilter()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
145 if not filter(name):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
146 return False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
147 return True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
148 return filter
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
149
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
150 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
151
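The filter returned by pageFilter is a closure that accepts a page name only if every title term accepts it. A minimal sketch of that composition, assuming plain predicate functions in place of TitleSearch terms (make_and_filter and both predicates are hypothetical, Python 3):

```python
def make_and_filter(filters):
    """Hypothetical helper combining page-name predicates with AND."""
    if not filters:
        # Nothing to filter on, mirroring the None return above.
        return None

    def name_filter(name):
        # all() stops at the first predicate that rejects the name.
        return all(f(name) for f in filters)

    return name_filter

def starts_with_help(name):
    return name.startswith("Help")

def not_template(name):
    return not name.endswith("Template")

accept = make_and_filter([starts_with_help, not_template])
```

Here accept("HelpOnSearching") is True, while "HelpTemplate" and "FrontPage" are both rejected.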
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
152 def sortByCost(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
153 tmp = [(term.costs(), term) for term in self._subterms]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
154 tmp.sort()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
155 self._subterms = [item[1] for item in tmp]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
156
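sortByCost uses the decorate-sort-undecorate idiom: pair each term with its cost, sort the pairs, then drop the cost. The sketch below shows the same ordering with a key function. It is an illustration for Python 3, not the MoinMoin code, and Term is a hypothetical stand-in class. A key function is used because, in Python 3, sorting (cost, term) tuples would compare the term objects themselves whenever two costs are equal, which raises TypeError for objects that define no ordering.

```python
class Term:
    """Hypothetical stand-in for a search term with a fixed cost."""

    def __init__(self, name, cost):
        self.name = name
        self._cost = cost

    def costs(self):
        return self._cost

def sort_by_cost(terms):
    # sorted() is stable, so terms with equal cost keep their order.
    return sorted(terms, key=lambda term: term.costs())

terms = [Term("text", 10), Term("title", 1), Term("regex", 10)]
ordered = [t.name for t in sort_by_cost(terms)]
```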
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
157 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
158 """ Search for each term, cheap searches first """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
159 self.sortByCost()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
160 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
161 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
162 result = term.search(page)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
163 if not result:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
164 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
165 matches.extend(result)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
166 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
167
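The search method above stops at the first subterm that finds nothing, so ordering the cheap terms first saves the expensive searches on pages that cannot match. A minimal sketch of that short-circuit, assuming each term is reduced to a callable returning a list of matches or None (and_search and word are hypothetical, Python 3):

```python
def and_search(searchers, page):
    """Hypothetical helper: AND together the results of several searchers."""
    matches = []
    for search in searchers:
        result = search(page)
        if not result:
            # One failing term fails the whole AND; later terms never run.
            return None
        matches.extend(result)
    return matches

def word(needle):
    # Hypothetical term: reports the needle itself as its single match.
    return lambda page: [needle] if needle in page else None

page = "wiki search engine"
found = and_search([word("wiki"), word("engine")], page)
missing = and_search([word("wiki"), word("xapian")], page)
```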
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
168 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
169 result = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
170 for s in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
171 highlight_re = s.highlight_re()
1988
dbf2b7426385 fixed some pylint warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1502
diff changeset
172 if highlight_re:
dbf2b7426385 fixed some pylint warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1502
diff changeset
173 result.append(highlight_re)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
174
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
175 return '|'.join(result)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
176
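highlight_re joins the non-empty patterns of the subterms with '|', so the combined expression matches whatever any single subterm matches. A short sketch of that behaviour (combine_highlights and the sample patterns are hypothetical, Python 3):

```python
import re

def combine_highlights(patterns):
    # Skip empty patterns, as the loop above does, then join as alternatives.
    return '|'.join(p for p in patterns if p)

combined = combine_highlights(['wiki', '', 'sea+rch'])
hits = re.findall(combined, 'a wiki with seaarch', re.I | re.U)
```

Skipping empty patterns matters: an empty alternative matches the empty string at every position.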
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
177 def xapian_wanted(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
178 wanted = True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
179 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
180 wanted = wanted and term.xapian_wanted()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
181 return wanted
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
182
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
183 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
184 for term in self._subterms:
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
185 if term.xapian_need_postproc():
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
186 return True
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
187 return False
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
188
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
189 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
190 # split the subterms into non-negated and negated terms
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
191 terms = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
192 not_terms = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
193 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
194 if not term.negated:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
195 terms.append(term.xapian_term(request, allterms))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
196 else:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
197 not_terms.append(term.xapian_term(request, allterms))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
198
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
199 # prepare query for not negated terms
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
200 if len(terms) == 1:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
201 t1 = Query(terms[0])
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
202 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
203 t1 = Query(Query.OP_AND, terms)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
204
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
205 # negated terms?
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
206 if not not_terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
207 # no, just return query for not negated terms
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
208 return t1
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
209
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
210 # yes, link not negated and negated terms' query with an AND_NOT query
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
211 if len(not_terms) == 1:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
212 t2 = Query(not_terms[0])
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
213 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
214 t2 = Query(Query.OP_OR, not_terms)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
215
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
216 return Query(Query.OP_AND_NOT, t1, t2)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
217
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
218
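The way AndExpression.xapian_term combines its subterms can be sketched without Xapian. This is a minimal illustration only: queries are modelled as nested tuples rather than real Query objects, and the names combine, OP_AND, OP_OR and OP_AND_NOT are hypothetical stand-ins, not part of MoinMoin or Xapian.

```python
# Hypothetical stand-ins for the Xapian operators; a "query" here is a nested tuple.
OP_AND, OP_OR, OP_AND_NOT = "AND", "OR", "AND_NOT"


def combine(subterms):
    """Combine (term, negated) pairs the way the listing above does.

    Positive terms are AND'ed, negated terms are OR'ed, and the two
    groups are linked with AND_NOT when any negated term is present.
    """
    terms = [t for t, negated in subterms if not negated]
    not_terms = [t for t, negated in subterms if negated]

    # A single positive term needs no operator wrapper.
    t1 = terms[0] if len(terms) == 1 else (OP_AND, tuple(terms))
    if not not_terms:
        return t1

    # Excluding every negated term is the same as excluding their OR.
    t2 = not_terms[0] if len(not_terms) == 1 else (OP_OR, tuple(not_terms))
    return (OP_AND_NOT, t1, t2)


if __name__ == "__main__":
    print(combine([("moin", False), ("wiki", False), ("draft", True)]))
```

The negated group uses OR because a page should be dropped as soon as it contains any one of the excluded terms.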
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
219 class OrExpression(AndExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
220 """ A term connecting several sub terms with a logical OR """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
221
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
222 operator = ' or '
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
223
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
224 def search(self, page):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
225 """ Search page with terms
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
226
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
227 @param page: the page instance
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
228 """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
229
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
230 # XXX Do we have any reason to sort here? we are not breaking out
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
231 # of the search in any case.
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
232 #self.sortByCost()
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
233 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
234 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
235 result = term.search(page)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
236 if result:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
237 matches.extend(result)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
238 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
239
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
240 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
241 # XXX: negated terms managed by _moinSearch?
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
242 return Query(Query.OP_OR, [term.xapian_term(request, allterms) for term in self._subterms])
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
243
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
244
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
245 class TextSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
246 """ A term that does a normal text search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
247
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
248 Both page content and the page title are searched, using an
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
249 additional TitleSearch term.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
250 """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
251
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
252 def __init__(self, pattern, use_re=False, case=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
253 """ Init a text search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
254
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
255 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
256 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
257 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
258 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
259 self._pattern = unicode(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
260 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
261 self.use_re = use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
262 self.case = case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
263 self._build_re(self._pattern, use_re=use_re, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
264 self.titlesearch = TitleSearch(self._pattern, use_re=use_re, case=case)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
265
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
266 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
267 return 10000
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
268
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
269 def __unicode__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
270 neg = self.negated and '-' or ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
271 return u'%s"%s"' % (neg, unicode(self._pattern))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
272
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
273 def highlight_re(self):
1469
af6c7a0fcd27 Backed out changeset 6dfca61f2672122ce12804ae42b90278df5c8473
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1461
diff changeset
274 return u"(%s)" % self._pattern
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
275
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
276 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
277 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
278
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
279 # Search in page name
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
280 if self.titlesearch:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
281 results = self.titlesearch.search(page)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
282 if results:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
283 matches.extend(results)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
284
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
285 # Search in page body
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
286 body = page.get_raw_body()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
287 for match in self.search_re.finditer(body):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
288 if page.request.cfg.xapian_stemming:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
289 # somewhere in regular word
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
290 if body[match.start()] not in config.chars_upper and \
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
291 body[match.start()-1] in config.chars_lower:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
292 continue
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
293
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
294 post = 0
1502
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
295 # XXX only do this for stemmed words. how?
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
296 #for c in body[match.end():]:
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
297 # if c in config.chars_lower:
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
298 # post += 1
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
299 # else:
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
300 # break
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
301
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
302 matches.append(TextMatch(start=match.start(),
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
303 end=match.end()+post))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
304 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
305 matches.append(TextMatch(re_match=match))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
306
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
307 # Decide what to do with the results.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
308 if ((self.negated and matches) or
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
309 (not self.negated and not matches)):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
310 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
311 elif matches:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
312 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
313 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
314 return []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
315
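The three possible return values at the end of TextSearch.search are easy to misread, so here is the decision isolated as a small function. The name decide is hypothetical, and how callers treat None versus an empty list is not shown in this listing.

```python
def decide(negated, matches):
    """Mirror the final branch of TextSearch.search.

    Returns None when the term fails for this page, the match list when a
    positive term matched, and an empty list when a negated term found
    nothing (the term holds, but there is nothing to report).
    """
    if (negated and matches) or (not negated and not matches):
        return None
    elif matches:
        return matches
    else:
        return []


if __name__ == "__main__":
    for negated in (False, True):
        for matches in ([], ["hit"]):
            print(negated, matches, "->", decide(negated, matches))
```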
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
316 def xapian_wanted(self):
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
317 # XXX: Add option for term-based matching
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
318 return not self.use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
319
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
320 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
321 return self.case
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
322
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
323 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
324 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
325 # basic regex matching per term
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
326 terms = [term for term in allterms() if self.search_re.match(term)]
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
327 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
328 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
329 queries = [Query(Query.OP_OR, terms)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
330 else:
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
331 analyzer = Xapian.WikiAnalyzer(request=request, language=request.cfg.language_default)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
332 terms = self._pattern.split()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
333
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
334 # all parsed wikiwords, AND'ed
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
335 queries = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
336 stemmed = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
337 for t in terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
338 if request.cfg.xapian_stemming:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
339 # stemmed OR not stemmed
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
340 tmp = []
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
341 for w, s, pos in analyzer.tokenize(t, flat_stemming=False):
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
342 tmp.append(UnicodeQuery(Query.OP_OR, (w, s)))
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
343 stemmed.append(s)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
344 t = tmp
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
345 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
346 # just not stemmed
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
347 t = [UnicodeQuery(w) for w, pos in analyzer.tokenize(t)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
348 queries.append(Query(Query.OP_AND, t))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
349
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
350 if not self.case and stemmed:
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
351 new_pat = ' '.join(stemmed)
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
352 self._pattern = new_pat
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
353 self._build_re(new_pat, use_re=False, case=self.case, stemmed=True)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
354
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
355 # titlesearch OR parsed wikiwords
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
356 return Query(Query.OP_OR,
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
357 (self.titlesearch.xapian_term(request, allterms),
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
358 Query(Query.OP_AND, queries)))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
359
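The regex branch of TextSearch.xapian_term filters the index's term list with search_re.match, which only matches at the start of each term. A minimal sketch of that filtering step, assuming a plain list of terms and a case-insensitive pattern (the helper name regex_terms and the IGNORECASE flag are assumptions, since _build_re is not shown here):

```python
import re


def regex_terms(all_terms, pattern):
    """Return the index terms whose beginning matches the pattern.

    re.match anchors at the start of the string, so "wiki" selects
    "wikiword" but not "mywiki".
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [term for term in all_terms if rx.match(term)]


if __name__ == "__main__":
    print(regex_terms(["wiki", "wikiword", "mywiki", "moin"], "wiki"))
```

In the listing, the selected terms are then OR'ed into a single query, and an empty selection yields an empty Query().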
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
360 class TitleSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
361 """ Term searches for pattern in page title only """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
362
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
363 def __init__(self, pattern, use_re=False, case=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
364 """ Init a title search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
365
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
366 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
367 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
368 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
369 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
370 self._pattern = unicode(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
371 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
372 self.use_re = use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
373 self.case = case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
374 self._build_re(self._pattern, use_re=use_re, case=case)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
375
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
376 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
377 return 100
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
378
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
379 def __unicode__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
380 neg = self.negated and '-' or ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
381 return u'%s!"%s"' % (neg, unicode(self._pattern))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
382
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
383 def highlight_re(self):
1469
af6c7a0fcd27 Backed out changeset 6dfca61f2672122ce12804ae42b90278df5c8473
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1461
diff changeset
384 return u"(%s)" % self._pattern
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
385
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
386 def pageFilter(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
387 """ Page filter function for single title search """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
388 def filter(name):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
389 match = self.search_re.search(name)
2235
fcb18262e270 Simplify some code in the query parser.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 2219
diff changeset
390 return bool(self.negated) ^ bool(match)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
391 return filter
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
392
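The XOR in `pageFilter` above can be tried in isolation. The following is a minimal sketch, not the MoinMoin code: `make_page_filter` is a hypothetical helper, and the pattern is compiled here as an escaped, case-insensitive literal purely for illustration.

```python
import re


def make_page_filter(pattern, negated=False):
    """Return a predicate over page names (hypothetical helper)."""
    # Assumption: plain-text, case-insensitive pattern for this sketch.
    search_re = re.compile(re.escape(pattern), re.I | re.U)

    def page_filter(name):
        match = search_re.search(name)
        # XOR flips the outcome when the term is negated.
        return bool(negated) ^ bool(match)

    return page_filter


names = ["FrontPage", "HelpContents"]
kept = [n for n in names if make_page_filter("front")(n)]
dropped = [n for n in names if make_page_filter("front", negated=True)(n)]
```

With these inputs, `kept` holds only the page whose name matches, and `dropped` holds only the page whose name does not.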
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
393 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
394 # Get matches in page name
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
395 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
396 for match in self.search_re.finditer(page.page_name):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
397 if page.request.cfg.xapian_stemming:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
398 # somewhere in regular word
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
399 if not self.case and \
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
400 page.page_name[match.start()] not in config.chars_upper and \
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
401 page.page_name[match.start()-1] in config.chars_lower:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
402 continue
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
403
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
404 post = 0
1502
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
405 # XXX only do this for stemmed words. how?
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
406 #for c in page.page_name[match.end():]:
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
407 # if c in config.chars_lower:
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
408 # post += 1
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
409 # else:
f5835c19203a disable highlight code for stemmed words for now, does not behave well for short words
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
410 # break
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
411
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
412 matches.append(TitleMatch(start=match.start(),
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
413 end=match.end()+post))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
414 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
415 matches.append(TitleMatch(re_match=match))
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
416
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
417 if ((self.negated and matches) or
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
418 (not self.negated and not matches)):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
419 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
420 elif matches:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
421 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
422 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
423 return []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
424
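The end of `search` above yields one of three results. As read from the code (an interpretation, not documented behaviour): `None` rejects the page, a non-empty list carries the hits, and an empty list accepts the page under a negated term with nothing to highlight. A minimal sketch with a hypothetical `decide` function:

```python
def decide(matches, negated):
    """Mirror the three-way result at the end of search() (sketch only)."""
    if (negated and matches) or (not negated and not matches):
        return None      # page does not satisfy the term
    elif matches:
        return matches   # page satisfies the term, with hits to highlight
    else:
        return []        # negated term satisfied, nothing to highlight


hit = decide(["title-match"], negated=False)
miss = decide([], negated=False)
excluded = decide(["title-match"], negated=True)
accepted_by_negation = decide([], negated=True)
```

Callers therefore need to test for `None` explicitly, because an empty list is falsy as well but means the opposite.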
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
425 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
426 return True # only easy regexps possible
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
427
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
428 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
429 return self.case
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
430
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
431 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
432 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
433 # basic regex matching per term
1206
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
434 terms = []
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
435 found = False
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
436 for term in allterms():
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
437 if term[:4] == 'XFT:':
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
438 found = True
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
439 if self.search_re.findall(term[4:]):
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
440 terms.append(Query(term, 100))
1206
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
441 elif found:
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
442 break
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
443 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
444 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
445 queries = [Query(Query.OP_OR, terms)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
446 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
447 analyzer = Xapian.WikiAnalyzer(request=request,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
448 language=request.cfg.language_default)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
449 terms = self._pattern.split()
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
450 terms = [[w for w, pos in analyzer.raw_tokenize(t)] for t in terms]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
451
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
452 # all parsed wikiwords, ANDed
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
453 queries = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
454 stemmed = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
455 for t in terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
456 if request.cfg.xapian_stemming:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
457 # stemmed OR not stemmed
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
458 tmp = []
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
459 for w, s, pos in analyzer.tokenize(t, flat_stemming=False):
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
460 tmp.append(Query(Query.OP_OR,
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
461 [UnicodeQuery('%s%s' %
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
462 (Xapian.Index.prefixMap['title'], j),
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
463 100)
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
464 for j in (w, s)]))
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
465 stemmed.append(s)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
466 t = tmp
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
467 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
468 # just not stemmed
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
469 t = [UnicodeQuery(
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
470 '%s%s' % (Xapian.Index.prefixMap['title'], w),
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
471 100)
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
472 for w, pos in analyzer.tokenize(t)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
473
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
474 queries.append(Query(Query.OP_AND, t))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
475
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
476 if not self.case and stemmed:
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
477 new_pat = ' '.join(stemmed)
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
478 self._pattern = new_pat
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
479 self._build_re(new_pat, use_re=False, case=self.case, stemmed=True)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
480
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
481 return Query(Query.OP_AND, queries)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
482
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
483
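The regex branch of `TitleSearch.xapian_term` above scans every index term, keeps those with the `XFT:` title prefix that match, and stops at the first unprefixed term after the run. The sketch below reproduces that scan over a plain Python list. It assumes the term iterator yields terms in sorted order, which is what makes the early `break` safe. `matching_title_terms` and the sample terms are made up for illustration, and the real code wraps each hit in a `Query` object instead of returning strings.

```python
import re


def matching_title_terms(all_terms, pattern, prefix="XFT:"):
    """Collect prefixed terms whose remainder matches pattern (sketch)."""
    search_re = re.compile(pattern, re.I | re.U)
    matched = []
    found = False
    for term in all_terms:
        if term.startswith(prefix):
            found = True
            if search_re.findall(term[len(prefix):]):
                matched.append(term)
        elif found:
            # Assumption: terms are sorted, so the prefixed run has ended.
            break
    return matched


# Hypothetical index contents, sorted the way a term iterator would yield them.
index_terms = sorted([
    "XFT:frontpage",
    "XFT:helpcontents",
    "XFT:helponediting",
    "XLINKTO:FrontPage",
    "apple",
])
title_hits = matching_title_terms(index_terms, r"^help")
```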
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
484 class LinkSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
485 """ Search the term in the pagelinks """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
486
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
487 def __init__(self, pattern, use_re=False, case=True):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
488 """ Init a link search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
489
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
490 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
491 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
492 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
493 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
494 # used for search in links
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
495 self._pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
496 # used for search in text
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
497 self._textpattern = '(' + self._pattern.replace('/', '|') + ')'
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
498 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
499 self.use_re = use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
500 self.case = case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
501 self.textsearch = TextSearch(self._textpattern, use_re=1, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
502 self._build_re(unicode(pattern), use_re=use_re, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
503
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
504 def _build_re(self, pattern, use_re=False, case=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
505 """ Make a regular expression out of a text pattern """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
506 flags = case and re.U or (re.I | re.U)
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
507 if use_re:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
508 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
509 self.static = False
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
510 else:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
511 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
512 self.static = True
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
513
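`_build_re` above picks its flags with the pre-2.5 `x and a or b` idiom. That idiom is only safe because `re.U` is truthy; a conditional expression states the same choice directly. A minimal sketch comparing the two forms (`build_flags` and `old_style_flags` are hypothetical names):

```python
import re


def build_flags(case):
    """Case-sensitive searches get re.U only; others add re.I (sketch)."""
    return re.U if case else (re.I | re.U)


def old_style_flags(case):
    # Same idiom as the listing; correct only because re.U is truthy.
    return case and re.U or (re.I | re.U)


insensitive = re.compile("front", build_flags(False))
sensitive = re.compile("front", build_flags(True))
found_insensitive = insensitive.search("FrontPage") is not None
found_sensitive = sensitive.search("FrontPage") is not None
```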
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
514 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
515 return 5000 # cheaper than a TextSearch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
516
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
517 def __unicode__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
518 neg = self.negated and '-' or ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
519 return u'%s!"%s"' % (neg, unicode(self._pattern))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
520
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
521 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
522 return u"(%s)" % self._textpattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
523
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
524 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
525 # Get matches in page links
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
526 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
527 Found = True
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
528
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
529 for link in page.getPageLinks(page.request):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
530 if ((self.static and self.pattern == link) or
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
531 (not self.static and self.search_re.match(link))):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
532 break
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
533 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
534 Found = False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
535
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
536 if Found:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
537 # Search in page text
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
538 results = self.textsearch.search(page)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
539 if results:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
540 matches.extend(results)
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
541 else: # This happens e.g. for pages that use navigation macros
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
542 matches.append(TextMatch(0, 0))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
543
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
544 # Decide what to do with the results.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
545 if ((self.negated and matches) or
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
546 (not self.negated and not matches)):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
547 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
548 elif matches:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
549 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
550 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
551 return []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
552
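`LinkSearch.search` above relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without hitting `break`, so `Found` stays `True` only if some link matched. A minimal sketch of that control flow, using a hypothetical `links_contain` helper and exact string comparison in place of the static/regex distinction:

```python
def links_contain(links, target):
    """Return True if target is among links, using for/else (sketch)."""
    found = True
    for link in links:
        if link == target:
            break
    else:
        # Reached only when the loop ended without break,
        # including the case of an empty link list.
        found = False
    return found


has_link = links_contain(["FrontPage", "HelpContents"], "HelpContents")
no_link = links_contain(["FrontPage"], "HelpContents")
empty = links_contain([], "HelpContents")
```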
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
553 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
554 return True # only easy regexps possible
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
555
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
556 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
557 return self.case
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
558
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
559 def xapian_term(self, request, allterms):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
560 prefix = Xapian.Index.prefixMap['linkto']
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
561 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
562 # basic regex matching per term
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
563 terms = []
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
564 found = None
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
565 n = len(prefix)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
566 for term in allterms():
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
567 if prefix == term[:n]:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
568 found = True
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
569 if self.search_re.match(term[n+1:]):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
570 terms.append(term)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
571 elif found:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
572 continue
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
573
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
574 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
575 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
576 return Query(Query.OP_OR, terms)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
577 else:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
578 return UnicodeQuery('%s:%s' % (prefix, self.pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
579
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
580
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
581 class LanguageSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
582 """ Search the pages written in a language """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
583
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
584 def __init__(self, pattern, use_re=False, case=True):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
585 """ Init a language search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
586
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
587 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
588 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
589 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
590 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
591 # iso language code, always lowercase
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
592 self._pattern = pattern.lower()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
593 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
594 self.use_re = use_re
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
595 self.case = False # not case-sensitive!
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
596 self.xapian_called = False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
597 self._build_re(self._pattern, use_re=use_re, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
598
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
599 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
600 return 5000 # cheaper than a TextSearch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
601
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
602 def __unicode__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
603 neg = self.negated and '-' or ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
604 return u'%s!"%s"' % (neg, unicode(self._pattern))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
605
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
606 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
607 return ""
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
608
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
609 def search(self, page):
1493
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
610 match = False
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
611 body = page.getPageHeader()
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
612
1493
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
613 if re.findall('#language %s' % self.pattern, body):
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
614 match = True
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
615
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
616 # Decide what to do with the results.
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
617 if self.negated and match:
1493
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
618 return None
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
619 elif match or (self.negated and not match):
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
620 return [Match()]
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
621 else:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
622 return []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
623
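The negation handling in `search` above can be read as a small truth table. The sketch below is only an illustration: `decide` and the `"match"` placeholder are hypothetical names standing in for the method body and for `Match()`, and are not part of MoinMoin.

```python
def decide(match, negated):
    """Mirror the three-way branch at the end of LanguageSearch.search.

    Hypothetical helper: the string "match" stands in for a Match() object.
    """
    if negated and match:
        return None          # negated term matched: the page is rejected
    elif match or (negated and not match):
        return ["match"]     # plain hit, or negated term that did not match
    else:
        return []            # plain term that did not match


for match in (True, False):
    for negated in (True, False):
        print(match, negated, decide(match, negated))
```

Note that the two "no result" cases are distinct: `None` signals that a negated term matched, while `[]` means a plain term simply found nothing.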
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
624 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
625 return True # only easy regexps possible
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
626
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
627 def xapian_need_postproc(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
628 return False # case-sensitivity would make no sense
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
629
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
630 def xapian_term(self, request, allterms):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
631 self.xapian_called = True
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
632 prefix = Xapian.Index.prefixMap['lang']
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
633 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
634 # basic regex matching per term
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
635 terms = []
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
636 found = None
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
637 n = len(prefix)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
638 for term in allterms():
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
639 if prefix == term[:n]:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
640 found = True
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
641 if self.search_re.match(term[n:]):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
642 terms.append(term)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
643 elif found:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
644 continue
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
645
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
646 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
647 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
648 return Query(Query.OP_OR, terms)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
649 else:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
650 pattern = self.pattern
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
651 return UnicodeQuery('%s%s' % (prefix, pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
652
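The regex branch of `xapian_term` above walks every term in the index, keeps those that carry the field prefix, and matches the regex against the remainder. A minimal standalone sketch of that idea follows. The function name, the sample terms, and the `"L"` prefix are assumptions made for illustration; the real prefix comes from `Xapian.Index.prefixMap`, and `skip` corresponds to the `n` or `n+1` offset used in the listing.

```python
import re


def filter_prefixed_terms(all_terms, prefix, pattern, skip=None):
    """Return terms that start with `prefix` and whose remainder matches `pattern`.

    `skip` is how many leading characters to drop before matching; it
    defaults to len(prefix), like the `term[n:]` slice in the listing.
    """
    search_re = re.compile(pattern, re.IGNORECASE | re.UNICODE)
    if skip is None:
        skip = len(prefix)
    matched = []
    for term in all_terms:
        if term.startswith(prefix) and search_re.match(term[skip:]):
            matched.append(term)
    return matched


# Hypothetical index terms: three language terms and one mimetype term.
terms = ["Len", "Lde", "Lfr", "Ttext/plain"]
print(filter_prefixed_terms(terms, "L", r"e|d"))
```

The collected terms would then be combined with an OR query, as `Query(Query.OP_OR, terms)` does in the listing.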
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
653
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
654 class CategorySearch(TextSearch):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
655 """ Search the pages belonging to a category """
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
656
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
657 def __init__(self, *args, **kwargs):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
658 TextSearch.__init__(self, *args, **kwargs)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
659 self.titlesearch = None
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
660
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
661 def _build_re(self, pattern, **kwargs):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
662 kwargs['use_re'] = True
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
663 TextSearch._build_re(self,
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
664 r'(----(-*)(\r)?\n)(.*)Category%s\b' % pattern, **kwargs)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
665
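The pattern built in `_build_re` above looks for a horizontal rule (`----`) followed, on the next line, by a `Category<name>` word. The sketch below shows how that expression behaves on a sample page body. The helper name, the sample text, and the flags (`re.I | re.U`) are assumptions; the flags actually used are chosen by `TextSearch._build_re`, which is not shown in this part of the listing.

```python
import re


def category_re(name):
    """Compile the category-footer pattern from the listing for one category name."""
    return re.compile(r'(----(-*)(\r)?\n)(.*)Category%s\b' % name, re.I | re.U)


# Hypothetical page body with a category footer after the rule.
body = "Some text\n----\nCategoryHomepage CategoryFoo\n"

print(bool(category_re("Foo").search(body)))
print(bool(category_re("Home").search(body)))
```

Because of the trailing `\b`, `Home` does not match `CategoryHomepage`. Since `.` does not cross newlines without `re.DOTALL`, this sketch only finds categories on the line directly after the rule.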
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
666 def costs(self):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
667 return 5000 # cheaper than a TextSearch
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
668
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
669 def __unicode__(self):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
670 neg = self.negated and '-' or ''
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
671 return u'%s!"%s"' % (neg, unicode(self._pattern))
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
672
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
673 def highlight_re(self):
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
674 return u'(Category%s)' % self._pattern
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
675
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
676 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
677 return True # only easy regexps possible
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
678
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
679 def xapian_need_postproc(self):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
680 return self.case
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
681
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
682 def xapian_term(self, request, allterms):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
683 self.xapian_called = True
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
684 prefix = Xapian.Index.prefixMap['category']
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
685 if self.use_re:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
686 # basic regex matching per term
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
687 terms = []
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
688 found = None
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
689 n = len(prefix)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
690 for term in allterms():
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
691 if prefix == term[:n]:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
692 found = True
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
693 if self.search_re.match(term[n+1:]):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
694 terms.append(term)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
695 elif found:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
696 continue
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
697
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
698 if not terms:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
699 return Query()
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
700 return Query(Query.OP_OR, terms)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
701 else:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
702 pattern = self._pattern.lower()
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
703 return UnicodeQuery('%s:%s' % (prefix, pattern))
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
704
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
705
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
706 class MimetypeSearch(BaseExpression):
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
707 """ Search for files belonging to a specific mimetype """
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
708
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
709 def __init__(self, pattern, use_re=False, case=True):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
710 """ Init a mimetype search
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
711
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
712 @param pattern: pattern to search for, ascii string or unicode
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
713 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
714 @param case: do case sensitive search, bool
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
715 """
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
716 self._pattern = pattern.lower()
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
717 self.negated = 0
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
718 self.use_re = use_re
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
719 self.case = False # not case-sensitive!
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
720 self.xapian_called = False
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
721 self._build_re(self._pattern, use_re=use_re, case=case)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
722
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
723 def costs(self):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
724 return 5000 # cheaper than a TextSearch
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
725
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
726 def __unicode__(self):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
727 neg = self.negated and '-' or ''
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
728 return u'%s!"%s"' % (neg, unicode(self._pattern))
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
729
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
730 def highlight_re(self):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
731 return ""
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
732
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
733 def search(self, page):
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
734 return None
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
735
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
736 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
737 return True # only easy regexps possible
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
738
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
739 def xapian_need_postproc(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
740 return False # case-sensitivity would make no sense
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
741
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
742 def xapian_term(self, request, allterms):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
743 self.xapian_called = True
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
744 prefix = Xapian.Index.prefixMap['mimetype']
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
745 if self.use_re:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
746 # basic regex matching per term
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
747 terms = []
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
748 found = None
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
749 n = len(prefix)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
750 for term in allterms():
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
751 if prefix == term[:n]:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
752 found = True
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
753 if self.search_re.match(term[n:]):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
754 terms.append(term)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
755 elif found:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
756 continue
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
757
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
758 if not terms:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
759 return Query()
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
760 return Query(Query.OP_OR, terms)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
761 else:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
762 pattern = self._pattern
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
763 return UnicodeQuery('%s%s' % (prefix, pattern))
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
764
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
765
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
766 class DomainSearch(BaseExpression):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
767 """ Search for pages belonging to a specific domain """
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
768
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
769 def __init__(self, pattern, use_re=False, case=True):
1377
bb37beca7545 fixed system pages search, added underlay search, started with mtime filtering
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1376
diff changeset
770 """ Init a domain search
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
771
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
772 @param pattern: pattern to search for, ascii string or unicode
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
773 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
774 @param case: do case sensitive search, bool
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
775 """
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
776 self._pattern = pattern.lower()
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
777 self.negated = 0
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
778 self.use_re = use_re
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
779 self.case = False # not case-sensitive!
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
780 self.xapian_called = False
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
781 self._build_re(self._pattern, use_re=use_re, case=case)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
782
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
783 def costs(self):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
784 return 5000 # cheaper than a TextSearch
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
785
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
786 def __unicode__(self):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
787 neg = self.negated and '-' or ''
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
788 return u'%s!"%s"' % (neg, unicode(self._pattern))
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
789
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
790 def highlight_re(self):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
791 return ""
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
792
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
793 def search(self, page):
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
794 checks = {'underlay': page.isUnderlayPage,
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
795 'standard': page.isStandardPage,
2219
dc5cd1a4b397 xapian search: fix crash when excluding system items
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1988
diff changeset
796 'system': lambda page=page: wikiutil.isSystemPage(page.request, page.page_name),
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
797 }
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
798
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
799 try:
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
800 match = checks[self.pattern]()
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
801 except KeyError:
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
802 match = False
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
803
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
804 # Decide what to do with the results.
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
805 if self.negated and match:
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
806 return None
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
807 elif match or (self.negated and not match):
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
808 return [Match()]
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
809 else:
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
810 return []
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
811
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
812 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
813 return True # only easy regexps possible
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
814
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
815 def xapian_need_postproc(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
816 return False # case-sensitivity would make no sense
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
817
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
818 def xapian_term(self, request, allterms):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
819 self.xapian_called = True
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
820 prefix = Xapian.Index.prefixMap['domain']
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
821 if self.use_re:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
822 # basic regex matching per term
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
823 terms = []
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
824 found = None
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
825 n = len(prefix)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
826 for term in allterms():
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
827 if prefix == term[:n]:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
828 found = True
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
829 if self.search_re.match(term[n+1:]):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
830 terms.append(term)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
831 elif found:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
832 continue
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
833
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
834 if not terms:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
835 return Query()
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
836 return Query(Query.OP_OR, terms)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
837 else:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
838 pattern = self._pattern
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
839 return UnicodeQuery('%s:%s' % (prefix, pattern))
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
840
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
841
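The lookup and negation branches of `search` above can be tried in isolation. The following is a minimal sketch, not the MoinMoin code: `FakePage`, `isSystemPage` on that class, and `domain_match` are hypothetical names made up for illustration, and the string 'match' stands in for a `Match()` object. It assumes every entry in the table is callable without arguments.

```python
class FakePage(object):
    """Hypothetical stand-in for a wiki page; not the MoinMoin Page class."""

    def __init__(self, underlay=False, system=False):
        self._underlay = underlay
        self._system = system

    def isUnderlayPage(self):
        return self._underlay

    def isStandardPage(self):
        return not self._underlay

    def isSystemPage(self):
        return self._system


def domain_match(page, domain, negated=False):
    """Return None, ['match'] or [], following the branches of search()."""
    # Every entry is callable with no arguments. The lambda captures the
    # page through a default argument so it can be invoked like the bound
    # methods next to it.
    checks = {
        'underlay': page.isUnderlayPage,
        'standard': page.isStandardPage,
        'system': lambda page=page: page.isSystemPage(),
    }
    try:
        match = checks[domain]()
    except KeyError:
        match = False  # unknown domain names never match

    if negated and match:
        return None          # negated term matched: reject the page
    elif match or (negated and not match):
        return ['match']     # placeholder for [Match()]
    else:
        return []


print(domain_match(FakePage(underlay=True), 'underlay'))
print(domain_match(FakePage(underlay=True), 'underlay', negated=True))
print(domain_match(FakePage(), 'bogus'))
```

Note that an unknown domain combined with negation yields a match, because the lookup falls back to `match = False` before the negation test.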
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
842 ##############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
843 ### Parse Query
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
844 ##############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
845
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
846 class QueryParser:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
847 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
848 Converts a string into a tree of Query objects
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
849 using recursive top-down parsing
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
850 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
851
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
852 def __init__(self, **kw):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
853 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
854 @keyword titlesearch: treat all terms as title searches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
855 @keyword case: do case-sensitive search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
856 @keyword regex: treat all terms as regular expressions
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
857 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
858 self.titlesearch = kw.get('titlesearch', 0)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
859 self.case = kw.get('case', 0)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
860 self.regex = kw.get('regex', 0)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
861
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
862 def parse_query(self, query):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
863 """ transform a string into a tree of Query objects """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
864 if isinstance(query, str):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
865 query = query.decode(config.charset)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
866 self._query = query
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
867 result = self._or_expression()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
868 if result is None:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
869 result = BaseExpression()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
870 return result
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
871
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
872 def _or_expression(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
873 result = self._and_expression()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
874 if self._query:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
875 result = OrExpression(result)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
876 while self._query:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
877 q = self._and_expression()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
878 if q:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
879 result.append(q)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
880 return result
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
881
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
882 def _and_expression(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
883 result = None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
884 while not result and self._query:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
885 result = self._single_term()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
886 term = self._single_term()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
887 if term:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
888 result = AndExpression(result, term)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
889 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
890 return result
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
891 term = self._single_term()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
892 while term:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
893 result.append(term)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
894 term = self._single_term()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
895 return result
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
896
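The two methods above split the grammar into an OR level built from AND levels, with parentheses handled at the term level. The sketch below shows that layering on a pre-split token list. It is a simplified illustration, not a port: `parse` and its helpers are hypothetical, tuples stand in for `OrExpression` and `AndExpression`, and, unlike the methods above, the OR loop stops at a closing parenthesis.

```python
def parse(tokens):
    """Parse a token list into nested ('or', ...) / ('and', ...) tuples."""
    pos = [0]  # mutable cursor shared by the nested helpers

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def advance():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def or_expression():
        # one or more AND groups separated by the keyword 'or'
        parts = [and_expression()]
        while peek() == 'or':
            advance()
            parts.append(and_expression())
        return parts[0] if len(parts) == 1 else ('or',) + tuple(parts)

    def and_expression():
        # adjacent terms are implicitly ANDed together
        parts = []
        while peek() not in (None, 'or', ')'):
            parts.append(single_term())
        return parts[0] if len(parts) == 1 else ('and',) + tuple(parts)

    def single_term():
        tok = advance()
        if tok == '(':
            inner = or_expression()
            if peek() == ')':
                advance()  # consume the closing parenthesis
            return inner
        return tok

    return or_expression()


print(parse(['a', 'b', 'or', 'c']))
print(parse(['(', 'a', 'or', 'b', ')', 'c']))
```

Keeping one function per precedence level is what lets `or` bind more loosely than adjacency without any explicit precedence table.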
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
897 def _single_term(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
898 regex = (r'(?P<NEG>-?)\s*(' + # leading '-'
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
899 r'(?P<OPS>\(|\)|(or\b(?!$)))|' + # or, (, )
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
900 r'(?P<MOD>(\w+:)*)' +
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
901 r'(?P<TERM>("[^"]+")|' +
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
902 r"('[^']+')|([^\s\)]+)))") # search word itself
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
903 self._query = self._query.strip()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
904 match = re.match(regex, self._query, re.U)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
905 if not match:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
906 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
907 self._query = self._query[match.end():]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
908 ops = match.group("OPS")
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
909 if ops == '(':
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
910 result = self._or_expression()
1988
dbf2b7426385 fixed some pylint warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1502
diff changeset
911 if match.group("NEG"):
dbf2b7426385 fixed some pylint warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1502
diff changeset
912 result.negate()
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
913 return result
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
914 elif ops == ')':
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
915 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
916 elif ops == 'or':
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
917 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
918 modifiers = match.group('MOD').split(":")[:-1]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
919 text = match.group('TERM')
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
920 if self.isQuoted(text):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
921 text = text[1:-1]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
922
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
923 title_search = self.titlesearch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
924 regex = self.regex
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
925 case = self.case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
926 linkto = False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
927 lang = False
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
928 category = False
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
929 mimetype = False
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
930 domain = False
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
931
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
932 for m in modifiers:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
933 if "title".startswith(m):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
934 title_search = True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
935 elif "regex".startswith(m):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
936 regex = True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
937 elif "case".startswith(m):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
938 case = True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
939 elif "linkto".startswith(m):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
940 linkto = True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
941 elif "language".startswith(m):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
942 lang = True
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
943 elif "category".startswith(m):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
944 category = True
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
945 elif "mimetype".startswith(m):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
946 mimetype = True
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
947 elif "domain".startswith(m):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
948 domain = True
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
949
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
950 # if the term is the classic category regexp, turn it into a category search so xapian can handle it
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
951 if not category:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
952 cat_re = re.compile(r'----\(-\*\)\(\\r\)\?\\n\)\(\.\*\)Category(.*)\\b', re.U)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
953 cat_match = cat_re.search(text)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
954 if cat_match:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
955 text = cat_match.groups()[0]
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
956 category = True
1203
10512e7ca243 we should allow regex with category search prefix
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1200
diff changeset
957 regex = False
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
958
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
959 if category:
1203
10512e7ca243 we should allow regex with category search prefix
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1200
diff changeset
960 obj = CategorySearch(text, use_re=regex, case=case)
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
961 elif mimetype:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
962 obj = MimetypeSearch(text, use_re=regex, case=False)
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
963 elif lang:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
964 obj = LanguageSearch(text, use_re=regex, case=False)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
965 elif linkto:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
966 obj = LinkSearch(text, use_re=regex, case=case)
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
967 elif domain:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
968 obj = DomainSearch(text, use_re=regex, case=False)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
969 elif title_search:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
970 obj = TitleSearch(text, use_re=regex, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
971 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
972 obj = TextSearch(text, use_re=regex, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
973
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
974 if match.group("NEG"):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
975 obj.negate()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
976 return obj
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
977
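To see what `_single_term` extracts from a query, the same regular expression can be applied in a loop outside the parser. This is a sketch under assumptions: `tokenize`, `resolve_modifier` and the `MODIFIERS` tuple are hypothetical helpers written for illustration. The tuple lists the names in the order the `elif` chain tests them, so an abbreviation resolves to the first name it is a prefix of.

```python
import re

# Pattern copied from _single_term above.
TERM_RE = re.compile(
    r'(?P<NEG>-?)\s*(' +                 # leading '-'
    r'(?P<OPS>\(|\)|(or\b(?!$)))|' +     # or, (, )
    r'(?P<MOD>(\w+:)*)' +
    r'(?P<TERM>("[^"]+")|' +
    r"('[^']+')|([^\s\)]+)))",           # search word itself
    re.U)

# Same order as the elif chain, so 'l' resolves to 'linkto', not 'language'.
MODIFIERS = ("title", "regex", "case", "linkto", "language",
             "category", "mimetype", "domain")


def resolve_modifier(prefix):
    """Return the first modifier name that `prefix` abbreviates, or None."""
    for name in MODIFIERS:
        if name.startswith(prefix):
            return name
    return None


def tokenize(query):
    """Split a query into (negated, operator, modifiers, term) tuples."""
    tokens = []
    query = query.strip()
    while query:
        match = TERM_RE.match(query)
        if not match:
            break
        query = query[match.end():].strip()
        negated = bool(match.group('NEG'))
        ops = match.group('OPS')
        if ops:
            tokens.append((negated, ops, [], None))
            continue
        # 'title:regex:' splits into ['title', 'regex', '']; drop the tail
        prefixes = match.group('MOD').split(':')[:-1]
        modifiers = [resolve_modifier(p) for p in prefixes]
        tokens.append((negated, None, modifiers, match.group('TERM')))
    return tokens


for token in tokenize('-title:"Front Page" or l:HelpContents'):
    print(token)
```

The `(?!$)` lookahead means a trailing `or` with nothing after it is not treated as an operator; it falls through to the TERM alternative and is searched for as a word.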
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
978 def isQuoted(self, text):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
979 # A pair of quotes with nothing between them ('' or "") is not considered quoted
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
980 if len(text) < 3:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
981 return False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
982 return (text.startswith('"') and text.endswith('"') or
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
983 text.startswith("'") and text.endswith("'"))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
984
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
985
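The quoting test above is small enough to exercise on its own. The sketch below restates it as a free function with explicit parentheses around each `and` pair; `is_quoted` and `unquote` are hypothetical names, and `unquote` mirrors the `text[1:-1]` step in `_single_term`.

```python
def is_quoted(text):
    """True if text is wrapped in matching quotes around at least one char."""
    if len(text) < 3:
        return False
    return ((text.startswith('"') and text.endswith('"')) or
            (text.startswith("'") and text.endswith("'")))


def unquote(text):
    """Strip one layer of matching quotes, leaving other text untouched."""
    return text[1:-1] if is_quoted(text) else text


for sample in ['"Front Page"', "'wiki'", '""', '"open', 'plain']:
    print(repr(sample), is_quoted(sample), repr(unquote(sample)))
```

Mismatched quote characters, such as a double quote at the start and a single quote at the end, are left as they are.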