annotate MoinMoin/search/queryparser.py @ 4468:fad2936d33a7 1.8.1

added dummy mig script for 1.8.1, bumped version number
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Wed, 24 Dec 2008 16:32:46 +0100
parents a53e20c3ebab
children af09c1b3a153
rev   line source
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1 # -*- coding: iso-8859-1 -*-
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
2 """
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
3 MoinMoin - search query parser
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
4
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
5 @copyright: 2005 MoinMoin:FlorianFesti,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
6 2005 MoinMoin:NirSoffer,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
7 2005 MoinMoin:AlexanderSchremmer,
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
8 2006-2008 MoinMoin:ThomasWaldmann,
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
9 2006 MoinMoin:FranzPletz
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
10 @license: GNU GPL, see COPYING for details
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
11 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
12
1180
c1c0d7f6f2b4 remove string module usage, some other cosmetic changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 926
diff changeset
13 import re
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
14
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
15 from MoinMoin import log
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
16 logging = log.getLogger(__name__)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
17
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
18 from MoinMoin import config, wikiutil
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
19 from MoinMoin.search.results import Match, TitleMatch, TextMatch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
20
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
21 try:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
22 from MoinMoin.search import Xapian
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
23 from MoinMoin.search.Xapian import Query, UnicodeQuery
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
24 except ImportError:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
25 pass
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
26
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
27 #############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
28 ### query objects
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
29 #############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
30
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
31 class BaseExpression:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
32 """ Base class for all search terms """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
33
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
34 _tag = ""
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
35
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
36 def __init__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
37 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
38
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
39 def __str__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
40 return unicode(self).encode(config.charset, 'replace')
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
41
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
42 def negate(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
43 """ Negate the result of this term """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
44 self.negated = 1
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
45
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
46 def pageFilter(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
47 """ Return a page filtering function
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
48
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
49 This function is used to filter the page list before we search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
50 it. Return a function that gets a page name and returns a bool.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
51
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
52 The default expression does not have any filter function and
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
53 returns None. Subclasses may define custom filter functions.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
54 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
55 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
56
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
57 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
58 """ Search a page
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
59
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
60 Returns a list of Match objects or None if term didn't find
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
61 anything (vice versa if negate() was called). Terms containing
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
62 other terms must call this method to aggregate the results.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
63 This base class returns [Match()] if negated, None otherwise.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
64 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
65 if self.negated:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
66 return [Match()]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
67 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
68 return None
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
69
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
70 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
71 """ Return estimated time to calculate this term
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
72
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
73 Number is relative to other terms and has no real unit.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
74 It allows the fast searches to be run first.
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
75 """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
76 return 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
77
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
78 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
79 """ Return a regular expression of what the term searches for
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
80
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
81 Used to display the needle in the page.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
82 """
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
83 return u''
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
84
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
85 def _build_re(self, pattern, use_re=False, case=False, stemmed=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
86 """ Make a regular expression out of a text pattern """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
87 flags = case and re.U or (re.I | re.U)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
88 if use_re:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
89 try:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
90 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
91 except re.error:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
92 pattern = re.escape(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
93 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
94 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
95 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
96 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
97 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
98 pattern = re.escape(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
99 self.search_re = re.compile(pattern, flags)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
100 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
101
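The fallback logic in _build_re above can be tried outside of MoinMoin. The following is only a minimal sketch in Python 3 syntax (the listing itself is Python 2); the function name build_search_re and its tuple return value are assumptions made for illustration and are not part of the module, which stores the results on self instead.

```python
import re


def build_search_re(pattern, use_re=False, case=False):
    """Return (pattern, compiled_re) for a search term.

    Hypothetical standalone variant of the _build_re idea: if the
    caller asks for a regex but the pattern does not compile, the
    pattern is escaped and matched as literal text instead.
    """
    # Case-insensitive unless case=True; always Unicode-aware.
    flags = re.U if case else (re.I | re.U)
    if use_re:
        try:
            return pattern, re.compile(pattern, flags)
        except re.error:
            pass  # invalid regex: fall through to the literal match
    pattern = re.escape(pattern)
    return pattern, re.compile(pattern, flags)


# "foo(" is not a valid regex, so it is searched for literally.
pattern, search_re = build_search_re("foo(", use_re=True)
print(bool(search_re.search("call FOO( now")))
```

Falling back to a literal match means a malformed user query still produces a search rather than an error.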
1447
2e4491ea9572 fix when searching for ()
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1377
diff changeset
102 def xapian_wanted(self):
2e4491ea9572 fix when searching for ()
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1377
diff changeset
103 return False
2e4491ea9572 fix when searching for ()
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1377
diff changeset
104
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
105 def __unicode__(self):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
106 neg = self.negated and '-' or ''
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
107 return u'%s%s"%s"' % (neg, self._tag, unicode(self._pattern))
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
108
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
109
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
110 class AndExpression(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
111 """ A term connecting several sub terms with a logical AND """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
112
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
113 operator = ' '
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
114
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
115 def __init__(self, *terms):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
116 self._subterms = list(terms)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
117 self._costs = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
118 for t in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
119 self._costs += t.costs()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
120 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
121
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
122 def append(self, expression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
123 """ Append another term """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
124 self._subterms.append(expression)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
125 self._costs += expression.costs()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
126
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
127 def subterms(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
128 return self._subterms
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
129
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
130 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
131 return self._costs
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
132
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
133 def __unicode__(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
134 result = ''
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
135 for t in self._subterms:
2952
1bb689d96ade search: fix breaking queryparser tests
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2286
diff changeset
136 result += self.operator + unicode(t)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
137 return u'[' + result[len(self.operator):] + u']'
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
138
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
139 def pageFilter(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
140 """ Return a page filtering function
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
141
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
142 This function is used to filter the page list before we search it.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
143
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
144 Return a function that gets a page name and returns a bool, or None.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
145 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
146 # Sort terms by cost, then get all title searches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
147 self.sortByCost()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
148 terms = [term for term in self._subterms if isinstance(term, TitleSearch)]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
149 if terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
150 # Create and return a filter function
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
151 def filter(name):
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
152 """ Return False if any term rejects name, True if any accepts it, else None """
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
153 result = None
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
154 for term in terms:
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
155 _filter = term.pageFilter()
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
156 t = _filter(name)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
157 if t is False:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
158 result = False
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
159 break
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
160 elif t is True:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
161 result = True
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
162 logging.debug("pageFilter AND returns %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
163 return result
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
164 return filter
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
165
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
166 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
167
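The inner filter function of AndExpression.pageFilter combines three-valued results: a single False rejects the page, at least one True accepts it, and otherwise the outcome stays None (undecided). A minimal sketch of that combination rule, in Python 3 syntax, follows; the helper name combine_and is hypothetical and does not exist in the module.

```python
def combine_and(results):
    """Combine tri-state filter results (True, False or None) with AND.

    Hypothetical helper mirroring the loop in the nested filter():
    False wins immediately, True is remembered, None changes nothing.
    """
    combined = None
    for result in results:
        if result is False:
            return False  # one rejecting filter rejects the page
        if result is True:
            combined = True
    return combined


print(combine_and([True, None]))         # accepted by one filter
print(combine_and([None, None]))         # no filter had an opinion
print(combine_and([True, False, True]))  # rejected
```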
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
168 def sortByCost(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
169 tmp = [(term.costs(), term) for term in self._subterms]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
170 tmp.sort()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
171 self._subterms = [item[1] for item in tmp]
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
172
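sortByCost above sorts (cost, term) tuples, so terms with equal cost end up being compared with each other. Python 2 allows that, but Python 3 raises TypeError for objects that define no ordering. The sketch below shows one possible Python 3 equivalent using a sort key; the Term class and the sort_by_cost function are stand-ins invented for this example, not MoinMoin code.

```python
class Term:
    """Hypothetical stand-in for a search term with a fixed cost."""

    def __init__(self, name, cost):
        self.name = name
        self._cost = cost

    def costs(self):
        return self._cost


def sort_by_cost(terms):
    """Return the terms ordered cheapest first.

    sorted() is stable, so terms with equal cost keep their
    original relative order and are never compared directly.
    """
    return sorted(terms, key=lambda term: term.costs())


terms = [Term("text", 10000), Term("title", 100), Term("link", 5000)]
print([term.name for term in sort_by_cost(terms)])
```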
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
173 def search(self, page):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
174 """ Search for each term, cheap searches first """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
175 self.sortByCost()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
176 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
177 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
178 result = term.search(page)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
179 if not result:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
180 return None
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
181 matches.extend(result)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
182 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
183
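AndExpression.search stops at the first sub term that finds nothing, which is why running the cheap terms first pays off: an expensive term is never evaluated once a cheap one has failed. A rough sketch of that short-circuit, in Python 3 syntax, is given here; StubTerm and and_search are hypothetical names used only to make the example self-contained.

```python
class StubTerm:
    """Hypothetical term that returns canned matches and counts calls."""

    def __init__(self, cost, matches):
        self._cost = cost
        self._matches = matches
        self.calls = 0

    def costs(self):
        return self._cost

    def search(self, page):
        self.calls += 1
        return self._matches


def and_search(terms, page):
    """Return all matches if every term matches, otherwise None."""
    matches = []
    for term in sorted(terms, key=lambda t: t.costs()):
        result = term.search(page)
        if not result:
            return None  # one failing term fails the whole AND
        matches.extend(result)
    return matches


cheap_miss = StubTerm(1, None)
expensive = StubTerm(1000, ["m1", "m2"])
print(and_search([expensive, cheap_miss], "SomePage"))
print(expensive.calls)  # the expensive term was never searched
```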
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
184 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
185 result = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
186 for s in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
187 highlight_re = s.highlight_re()
1988
dbf2b7426385 fixed some pylint warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1502
diff changeset
188 if highlight_re:
dbf2b7426385 fixed some pylint warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1502
diff changeset
189 result.append(highlight_re)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
190
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
191 return u'|'.join(result)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
192
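The way highlight_re() above combines its sub-terms can be sketched in isolation. This is a minimal illustration in Python 3, not MoinMoin code: combine_highlight_patterns and the sample patterns are hypothetical names chosen for the example.

```python
import re


def combine_highlight_patterns(patterns):
    """Join the non-empty sub-patterns into one regex alternation."""
    return "|".join(p for p in patterns if p)


# Hypothetical sub-term patterns; the empty string stands for a term
# that has nothing to highlight and is therefore skipped.
combined = combine_highlight_patterns(["(wiki)", "", "(search)"])

# Compile case-insensitively, as a highlighter typically would.
highlighter = re.compile(combined, re.IGNORECASE | re.UNICODE)
found = [m.group(0) for m in highlighter.finditer("Wiki search and WIKI pages")]
```

Skipping empty patterns matters: an empty alternative in the joined expression would match the empty string at every position.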
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
193 def xapian_wanted(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
194 wanted = True
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
195 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
196 wanted = wanted and term.xapian_wanted()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
197 return wanted
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
198
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
199 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
200 for term in self._subterms:
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
201 if term.xapian_need_postproc():
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
202 return True
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
203 return False
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
204
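The two aggregation methods above reduce to an "all" and an "any" over the sub-terms. A minimal Python 3 sketch, assuming a hypothetical FakeTerm class that only mimics the two flags used by TextSearch further down:

```python
class FakeTerm:
    """Hypothetical stand-in for a search term (not a MoinMoin class)."""

    def __init__(self, use_re=False, case=False):
        self.use_re = use_re
        self.case = case

    def xapian_wanted(self):
        # Regex terms are not handed to the index.
        return not self.use_re

    def xapian_need_postproc(self):
        # Case-sensitive terms need a second pass over the results.
        return self.case


def index_wanted(subterms):
    """True only if every sub-term can use the index."""
    return all(t.xapian_wanted() for t in subterms)


def need_postproc(subterms):
    """True as soon as one sub-term needs post-processing."""
    return any(t.xapian_need_postproc() for t in subterms)


plain = [FakeTerm(), FakeTerm(case=True)]
mixed = [FakeTerm(), FakeTerm(use_re=True)]
```

With these inputs, plain is index-friendly but needs post-processing, while mixed is rejected for the index because of its regex term.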
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
205 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
206 # split subterms into non-negated and negated terms
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
207 terms = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
208 not_terms = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
209 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
210 if not term.negated:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
211 terms.append(term.xapian_term(request, allterms))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
212 else:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
213 not_terms.append(term.xapian_term(request, allterms))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
214
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
215 # prepare query for non-negated terms
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
216 if not terms:
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
217 t1 = None
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
218 elif len(terms) == 1:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
219 t1 = Query(terms[0])
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
220 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
221 t1 = Query(Query.OP_AND, terms)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
222
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
223 # prepare query for negated terms
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
224 if not not_terms:
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
225 t2 = None
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
226 elif len(not_terms) == 1:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
227 t2 = Query(not_terms[0])
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
228 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
229 t2 = Query(Query.OP_OR, not_terms)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
230
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
231 if t1 and not t2:
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
232 return t1
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
233 elif t2 and not t1:
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
234 return Query(Query.OP_AND_NOT, Query(""), t2) # Query("") == MatchAll
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
235 else:
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
236 # yes, link the non-negated and negated terms' queries with an AND_NOT query
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
237 return Query(Query.OP_AND_NOT, t1, t2)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
238
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
239
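The branching in AndExpression.xapian_term() above can be followed without a Xapian installation. The sketch below is Python 3 and replaces Query objects with plain tuples; MATCH_ALL and combine_subqueries are hypothetical names, and the tuple encoding is an assumption made only for illustration.

```python
# Stand-in for Query(""), which the source comments describe as MatchAll.
MATCH_ALL = ("MATCH_ALL",)


def combine_subqueries(terms, not_terms):
    """Mirror the AND / OR / AND_NOT composition using tuples."""
    # Non-negated terms are AND'ed together.
    if not terms:
        positive = None
    elif len(terms) == 1:
        positive = terms[0]
    else:
        positive = ("AND",) + tuple(terms)

    # Negated terms are OR'ed together: any one of them excludes a page.
    if not not_terms:
        negative = None
    elif len(not_terms) == 1:
        negative = not_terms[0]
    else:
        negative = ("OR",) + tuple(not_terms)

    if positive and not negative:
        return positive
    if negative and not positive:
        # Only negative terms: subtract them from "everything".
        return ("AND_NOT", MATCH_ALL, negative)
    return ("AND_NOT", positive, negative)


only_positive = combine_subqueries(["a"], [])
both = combine_subqueries(["a", "b"], ["c"])
only_negative = combine_subqueries([], ["x", "y"])
```

The negative-only case is the one addressed by changeset 3678: AND_NOT needs a left operand, so a match-all query is supplied.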
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
240 class OrExpression(AndExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
241 """ A term connecting several sub terms with a logical OR """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
242
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
243 operator = ' or '
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
244
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
245 def pageFilter(self):
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
246 """ Return a page filtering function
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
247
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
248 This function is used to filter the page list before we search it.
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
249
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
250 Return a function that gets a page name and returns a bool, or return None.
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
251 """
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
252 # Sort terms by cost, then get all title searches
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
253 self.sortByCost()
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
254 terms = [term for term in self._subterms if isinstance(term, TitleSearch)]
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
255 if terms:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
256 # Create and return a filter function
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
257 def filter(name):
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
258 """ A function that returns True if any term filters name """
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
259 result = None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
260 for term in terms:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
261 _filter = term.pageFilter()
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
262 t = _filter(name)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
263 if t is True:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
264 return True
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
265 elif t is False:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
266 result = False
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
267 logging.debug("pageFilter OR returns %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
268 return result
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
269 return filter
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
270
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
271 return None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
272
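The filter returned by OrExpression.pageFilter() works with three states: True, False, and None for "no opinion". A minimal Python 3 sketch of that logic, assuming hypothetical filter callables in place of TitleSearch terms:

```python
def make_or_filter(filters):
    """Build a tri-state OR filter from callables returning True, False or None."""

    def page_filter(name):
        result = None
        for f in filters:
            verdict = f(name)
            if verdict is True:
                # One accepting filter is enough for an OR.
                return True
            elif verdict is False:
                # Remember the rejection, but keep looking for an accept.
                result = False
        return result

    return page_filter


def undecided(name):
    """Hypothetical filter that never makes a decision."""
    return None


def title_has_help(name):
    """Hypothetical filter that accepts page names containing 'Help'."""
    return "Help" in name


or_filter = make_or_filter([undecided, title_has_help])
no_opinion = make_or_filter([undecided])
```

The identity checks against True and False (rather than plain truthiness) are what keep None distinct from a rejection.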
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
273 def search(self, page):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
274 """ Search page with terms
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
275
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
276 @param page: the page instance
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
277 """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
278
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
279 # XXX Do we have any reason to sort here? We are not breaking out
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
280 # of the search in any case.
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
281 #self.sortByCost()
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
282 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
283 for term in self._subterms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
284 result = term.search(page)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
285 if result:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
286 matches.extend(result)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
287 return matches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
288
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
289 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
290 # XXX: negated terms managed by _moinSearch?
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
291 return Query(Query.OP_OR, [term.xapian_term(request, allterms) for term in self._subterms])
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
292
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
293
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
294 class TextSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
295 """ A term that does a normal text search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
296
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
297 Both page content and the page title are searched, using an
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
298 additional TitleSearch term.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
299 """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
300
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
301 def __init__(self, pattern, use_re=False, case=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
302 """ Init a text search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
303
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
304 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
305 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
306 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
307 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
308 self._pattern = unicode(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
309 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
310 self.use_re = use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
311 self.case = case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
312 self._build_re(self._pattern, use_re=use_re, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
313 self.titlesearch = TitleSearch(self._pattern, use_re=use_re, case=case)
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
314 self._tag = ''
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
315 if use_re:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
316 self._tag += 're:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
317 if case:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
318 self._tag += 'case:'
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
319
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
320 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
321 return 10000
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
322
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
323 def highlight_re(self):
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
324 return u"(%s)" % self.pattern
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
325
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
326 def search(self, page):
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
327 logging.debug("TextSearch searching page %r for (negated = %r) %r" % (page.page_name, self.negated, self._pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
328 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
329
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
330 # Search in page name
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
331 if self.titlesearch:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
332 results = self.titlesearch.search(page)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
333 if results:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
334 matches.extend(results)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
335
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
336 # Search in page body
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
337 body = page.get_raw_body()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
338 for match in self.search_re.finditer(body):
3570
eb5c6f776bce xapian_stemming: removed some strange code (fixes search title:lowercaseword with xapian_stemming enabled)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3506
diff changeset
339 matches.append(TextMatch(re_match=match))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
340
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
341 # Decide what to do with the results.
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
342 if self.negated:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
343 if matches:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
344 result = None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
345 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
346 result = [Match()] # represents "matched" (but as it was a negative match, we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
347 else: # not negated
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
348 if matches:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
349 result = matches
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
350 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
351 result = None
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
352 logging.debug("TextSearch returning %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
353 return result
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
354
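The result handling at the end of TextSearch.search() above inverts the outcome for negated terms. The Python 3 sketch below reproduces that decision on plain strings; search_text, NEGATIVE_MATCH and the tuple format are hypothetical, and the title lookup is simplified to the same regex scan used for the body.

```python
import re

# Placeholder standing in for the empty Match() of the listing.
NEGATIVE_MATCH = object()


def search_text(pattern, title, body, negated=False, case=False):
    """Return a list of hits, or None when the term does not match."""
    flags = re.U if case else (re.I | re.U)
    search_re = re.compile(re.escape(pattern), flags)

    matches = [("title", m.start(), m.end()) for m in search_re.finditer(title)]
    matches += [("body", m.start(), m.end()) for m in search_re.finditer(body)]

    if negated:
        # A negated term "matches" only when nothing was found; there is
        # nothing to highlight, so a placeholder is returned instead.
        return None if matches else [NEGATIVE_MATCH]
    return matches or None


hits = search_text("moin", "MoinMoin", "uses moin")
case_hits = search_text("moin", "MoinMoin", "uses moin", case=True)
negated_hit = search_text("xapian", "MoinMoin", "uses moin", negated=True)
negated_miss = search_text("moin", "MoinMoin", "uses moin", negated=True)
```

Returning None rather than an empty list for "no match" lets callers such as the AND expression stop early with a simple truth test.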
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
355 def xapian_wanted(self):
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
356 # XXX: Add option for term-based matching
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
357 return not self.use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
358
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
359 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
360 return self.case
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
361
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
362 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
363 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
364 # basic regex matching per term
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
365 terms = [term for term in allterms() if self.search_re.match(term)]
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
366 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
367 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
368 queries = [Query(Query.OP_OR, terms)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
369 else:
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
370 analyzer = Xapian.WikiAnalyzer(request=request, language=request.cfg.language_default)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
371 terms = self._pattern.split()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
372
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
373 # all parsed wikiwords, AND'ed
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
374 queries = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
375 stemmed = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
376 for t in terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
377 if request.cfg.xapian_stemming:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
378 # stemmed OR not stemmed
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
379 tmp = []
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
380 for w, s, pos in analyzer.tokenize(t, flat_stemming=False):
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
381 tmp.append(UnicodeQuery(Query.OP_OR, (w, s)))
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
382 stemmed.append(s)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
383 t = tmp
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
384 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
385 # just not stemmed
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
386 t = [UnicodeQuery(w) for w, pos in analyzer.tokenize(t)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
387 queries.append(Query(Query.OP_AND, t))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
388
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
389 if not self.case and stemmed:
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
390 new_pat = ' '.join(stemmed)
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
391 self._pattern = new_pat
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
392 self._build_re(new_pat, use_re=False, case=self.case, stemmed=True)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
393
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
394 # titlesearch OR parsed wikiwords
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
395 return Query(Query.OP_OR,
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
396 (self.titlesearch.xapian_term(request, allterms),
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
397 Query(Query.OP_AND, queries)))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
398
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
399
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
400 class TitleSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
401 """ Search for the pattern in page titles only """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
402
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
403 def __init__(self, pattern, use_re=False, case=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
404 """ Init a title search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
405
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
406 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
407 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
408 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
409 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
410 self._pattern = unicode(pattern)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
411 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
412 self.use_re = use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
413 self.case = case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
414 self._build_re(self._pattern, use_re=use_re, case=case)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
415
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
416 self._tag = 'title:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
417 if use_re:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
418 self._tag += 're:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
419 if case:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
420 self._tag += 'case:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
421
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
422 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
423 return 100
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
424
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
425 def highlight_re(self):
3014
dc970f2d146c avoid empty highlight query arg, do not use highlight for titlesearch terms (port from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2955
diff changeset
426 return u'' # do not highlight text with stuff from titlesearch,
dc970f2d146c avoid empty highlight query arg, do not use highlight for titlesearch terms (port from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2955
diff changeset
427 # was: return u"(%s)" % self._pattern
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
428
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
429 def pageFilter(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
430 """ Page filter function for single title search """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
431 def filter(name):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
432 match = self.search_re.search(name)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
433 result = bool(self.negated) ^ bool(match)
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
434 logging.debug("pageFilter title returns %r (%r)" % (result, self.pattern))
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
435 return result
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
436 return filter
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
437
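The title filter above combines the negation flag and the regex match with XOR, so a negated search accepts exactly the names that do not match. A minimal stand-alone sketch of that logic follows; `make_title_filter` is a hypothetical helper, and building the regex with `re.escape` is an assumption about plain-text patterns, not the module's actual `_build_re`.

```python
import re


def make_title_filter(pattern, negated=False, case=False):
    """Return a predicate over page names (hypothetical helper, not MoinMoin API)."""
    # Assumption: a plain-text pattern is escaped and searched case-insensitively
    # unless case=True.
    flags = re.U if case else (re.I | re.U)
    search_re = re.compile(re.escape(pattern), flags)

    def title_filter(name):
        match = search_re.search(name)
        # XOR: a negated search accepts exactly the names that do NOT match.
        return bool(negated) ^ bool(match)

    return title_filter
```

For example, `make_title_filter("help", negated=True)` rejects `HelpContents` and accepts `FrontPage`.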
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
438 def search(self, page):
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
439 """ Get matches in page name """
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
440 logging.debug("TitleSearch searching page %r for (negated = %r) %r" % (page.page_name, self.negated, self._pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
441 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
442 for match in self.search_re.finditer(page.page_name):
3570
eb5c6f776bce xapian_stemming: removed some strange code (fixes search title:lowercaseword with xapian_stemming enabled)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3506
diff changeset
443 matches.append(TitleMatch(re_match=match))
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
444
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
445 if self.negated:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
446 if matches:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
447 result = None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
448 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
449 result = [Match()] # represents "matched" (but as it was a negative match, we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
450 else: # not negated
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
451 if matches:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
452 result = matches
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
453 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
454 result = None
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
455 logging.debug("TitleSearch returning %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
456 return result
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
457
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
458 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
459 return True # only easy regexps possible
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
460
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
461 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
462 return self.case
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
463
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
464 def xapian_term(self, request, allterms):
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
465 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
466 # basic regex matching per term
1206
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
467 terms = []
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
468 found = False
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
469 for term in allterms():
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
470 if term[:4] == 'XFT:':
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
471 found = True
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
472 if self.search_re.findall(term[4:]):
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
473 terms.append(Query(term, 100))
1206
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
474 elif found:
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1203
diff changeset
475 break
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
476 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
477 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
478 queries = [Query(Query.OP_OR, terms)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
479 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
480 analyzer = Xapian.WikiAnalyzer(request=request,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
481 language=request.cfg.language_default)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
482 terms = self._pattern.split()
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
483 terms = [[w for w, pos in analyzer.raw_tokenize(t)] for t in terms]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
484
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
485 # all parsed wikiwords, ANDed
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
486 queries = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
487 stemmed = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
488 for t in terms:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
489 if request.cfg.xapian_stemming:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
490 # stemmed OR not stemmed
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
491 tmp = []
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
492 for w, s, pos in analyzer.tokenize(t, flat_stemming=False):
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
493 tmp.append(Query(Query.OP_OR,
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
494 [UnicodeQuery('%s%s' %
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
495 (Xapian.Index.prefixMap['title'], j),
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
496 100)
925
4508fc92fcb1 index exact positions of terms (postings)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
497 for j in (w, s)]))
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
498 stemmed.append(s)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
499 t = tmp
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
500 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
501 # just not stemmed
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
502 t = [UnicodeQuery(
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
503 '%s%s' % (Xapian.Index.prefixMap['title'], w),
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
504 100)
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1236
diff changeset
505 for w, pos in analyzer.tokenize(t)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
506
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
507 queries.append(Query(Query.OP_AND, t))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
508
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
509 if not self.case and stemmed:
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
510 new_pat = ' '.join(stemmed)
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
511 self._pattern = new_pat
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
512 self._build_re(new_pat, use_re=False, case=self.case, stemmed=True)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
513
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
514 return Query(Query.OP_AND, queries)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
515
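The regex branch of `xapian_term` above scans all index terms, keeps those under the `XFT:` prefix whose remainder matches, and stops at the first non-prefixed term once prefixed terms have been seen. A sketch of that scan, assuming the term iterator yields terms in sorted order so that prefixed terms form one contiguous run; `matching_prefixed_terms` is a hypothetical name and it returns plain strings rather than `Query` objects.

```python
import re


def matching_prefixed_terms(all_terms, search_re, prefix="XFT:"):
    """Collect terms under `prefix` whose remainder matches `search_re`.

    Assumption: `all_terms` is sorted, so every prefixed term sits in one
    contiguous run and the scan can stop as soon as that run ends.
    """
    matched = []
    seen_prefix = False
    for term in all_terms:
        if term.startswith(prefix):
            seen_prefix = True
            if search_re.search(term[len(prefix):]):
                matched.append(term)
        elif seen_prefix:
            break  # past the contiguous run of prefixed terms
    return matched
```

The early `break` is only safe under the sorted-order assumption; with an unsorted iterator, later prefixed terms would be missed.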
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
516
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
517 class LinkSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
518 """ Search for the term in the page links """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
519
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
520 def __init__(self, pattern, use_re=False, case=True):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
521 """ Init a link search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
522
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
523 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
524 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
525 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
526 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
527 # used for search in links
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
528 self._pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
529 # used for search in text
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
530 self._textpattern = '(' + self._pattern.replace('/', '|') + ')'
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
531 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
532 self.use_re = use_re
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
533 self.case = case
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
534 self.textsearch = TextSearch(self._textpattern, use_re=1, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
535 self._build_re(unicode(pattern), use_re=use_re, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
536
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
537 self._tag = 'linkto:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
538 if use_re:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
539 self._tag += 're:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
540 if case:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
541 self._tag += 'case:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
542
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
543 def _build_re(self, pattern, use_re=False, case=False):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
544 """ Make a regular expression out of a text pattern """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
545 flags = case and re.U or (re.I | re.U)
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
546 if use_re:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
547 self.search_re = re.compile(pattern, flags)
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
548 self.pattern = pattern
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
549 self.static = False
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
550 else:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
551 self.pattern = pattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
552 self.static = True
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
553
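The expression `case and re.U or (re.I | re.U)` in `_build_re` above is the old and/or form of a conditional. It works here only because `re.U` is truthy. A sketch of the same choice written as a conditional expression; `search_flags` is a hypothetical name used for illustration.

```python
import re


def search_flags(case):
    """Return regex flags: Unicode always, IGNORECASE unless case-sensitive."""
    return re.U if case else (re.I | re.U)
```

With `case=False` the compiled pattern `moin` finds a match in `MoinMoin`; with `case=True` it does not.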
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
554 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
555 return 5000 # cheaper than a TextSearch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
556
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
557 def highlight_re(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
558 return u"(%s)" % self._textpattern
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
559
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
560 def search(self, page):
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
561 # Get matches in page links
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
562 logging.debug("LinkSearch searching page %r for (negated = %r) %r" % (page.page_name, self.negated, self._pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
563 matches = []
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
564 Found = True
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
565
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
566 for link in page.getPageLinks(page.request):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
567 if ((self.static and self.pattern == link) or
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
568 (not self.static and self.search_re.match(link))):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
569 break
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
570 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
571 Found = False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
572
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
573 if Found:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
574 # Search in page text
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
575 results = self.textsearch.search(page)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
576 if results:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
577 matches.extend(results)
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
578 else: # This happens e.g. for pages that use navigation macros
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
579 matches.append(TextMatch(0, 0))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
580
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
581 # Decide what to do with the results.
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
582 if self.negated:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
583 if matches:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
584 result = None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
585 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
586 result = [Match()] # represents "matched" (but as it was a negative match, we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
587 else: # not negated
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
588 if matches:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
589 result = matches
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
590 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
591 result = None
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
592 logging.debug("LinkSearch returning %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
593 return result
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
594
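Illustrative sketch (not part of the annotated file): the control flow of LinkSearch.search above combines a for/else loop with a negation step. The function below reproduces that decision logic on plain lists. The name link_search and the placeholder strings standing in for Match/TextMatch objects are assumptions made for this example only.

```python
import re


def link_search(links, pattern, use_re=False, negated=False):
    """Return a non-empty list if the page counts as a hit, else None.

    The strings in the returned lists are placeholders for the Match
    objects used in the real code (assumption for illustration).
    """
    search_re = re.compile(pattern) if use_re else None
    for link in links:
        if (search_re.match(link) if use_re else link == pattern):
            found = True
            break
    else:
        # Reached only when the loop finished without hitting `break`,
        # i.e. no link matched (this also covers an empty link list).
        found = False

    if negated:
        # A negated term "matches" exactly when nothing was found.
        return None if found else ["negative-match"]
    return ["match"] if found else None
```

With negated=True the outcome is inverted: a page without the link yields a hit that has nothing to highlight, and a page with the link yields None.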
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
595 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
596 return True # only easy regexps possible
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
597
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
598 def xapian_need_postproc(self):
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
599 return self.case
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
600
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
601 def xapian_term(self, request, allterms):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
602 prefix = Xapian.Index.prefixMap['linkto']
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
603 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
604 # basic regex matching per term
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
605 terms = []
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
606 found = None
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
607 n = len(prefix)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
608 for term in allterms():
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
609 if prefix == term[:n]:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
610 found = True
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
611 if self.search_re.match(term[n+1:]):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
612 terms.append(term)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
613 elif found:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
614 continue
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
615
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
616 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
617 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
618 return Query(Query.OP_OR, terms)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
619 else:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
620 return UnicodeQuery('%s:%s' % (prefix, self.pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
621
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
622
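Illustrative sketch (not part of the annotated file): in regex mode, xapian_term above walks all index terms, keeps those that start with the field prefix, and matches the regex against the remainder of the term. The helper below shows that filtering step on a plain list, without Xapian. The function name matching_terms, the sep_len parameter, and the prefix strings "XLINKTO" and "XLANG" are assumptions for this example; the real prefixes come from Xapian.Index.prefixMap.

```python
import re


def matching_terms(allterms, prefix, pattern, sep_len=1):
    """Keep the prefixed terms whose remainder matches `pattern`.

    sep_len is the number of separator characters between prefix and
    value: 1 mirrors term[n+1:] (prefix, ':' and value), 0 mirrors
    term[n:] as used by LanguageSearch.
    """
    search_re = re.compile(pattern)
    n = len(prefix)
    return [term for term in allterms
            if term[:n] == prefix and search_re.match(term[n + sep_len:])]


# Hypothetical index terms, for illustration only.
terms = ["XLINKTO:FrontPage", "XLINKTO:HelpContents",
         "XLANGen", "XLINKTO:HelpOnEditing"]
help_links = matching_terms(terms, "XLINKTO", r"Help")
```

In the annotated code the resulting list is OR-ed together with Query(Query.OP_OR, terms), and an empty Query() is returned when no term matches.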
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
623 class LanguageSearch(BaseExpression):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
624 """ Search the pages written in a language """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
625
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
626 def __init__(self, pattern, use_re=False, case=True):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
627 """ Init a language search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
628
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
629 @param pattern: pattern to search for, ascii string or unicode
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
630 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
631 @param case: do case sensitive search, bool
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
632 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
633 # iso language code, always lowercase
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
634 self._pattern = pattern.lower()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
635 self.negated = 0
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
636 self.use_re = use_re
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
637 self.case = False # not case-sensitive!
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
638 self.xapian_called = False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
639 self._build_re(self._pattern, use_re=use_re, case=case)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
640
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
641 self._tag = 'language:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
642 if use_re:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
643 self._tag += 're:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
644 if case:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
645 self._tag += 'case:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
646
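Illustrative sketch (not part of the annotated file): the _tag attribute built at the end of __init__ above is the field name followed by optional modifier markers. The standalone helper build_tag below is a hypothetical name that only mirrors that string construction.

```python
def build_tag(name, use_re=False, case=False):
    """Compose a query tag such as 'language:re:case:'."""
    tag = name + ':'
    if use_re:
        tag += 're:'
    if case:
        tag += 'case:'
    return tag
```

The modifiers always appear in the same order, 're:' before 'case:', because they are appended in that order.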
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
647 def costs(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
648 return 5000 # cheaper than a TextSearch
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
649
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
650 def highlight_re(self):
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
651 return u""
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
652
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
653 def search(self, page):
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
654 logging.debug("LanguageSearch searching page %r for (negated = %r) %r" % (page.page_name, self.negated, self._pattern))
1493
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
655 match = False
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
656 body = page.getPageHeader()
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
657
3113
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
658 comma = re.compile(',')
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
659 iterator = comma.finditer(self.pattern)
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
660 temp = 0
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
661 for m_obj in iterator:
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
662 if re.findall('#language %s' % self.pattern[temp:m_obj.end()-2], body):
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
663 match = True
3489709aa99f AdvancedSearch: make multiple categories/languages/mimetype selections possible, thanks to James Sun (crptone@gmail.com, ported from 1.6)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3014
diff changeset
664 temp = m_obj.end()
1493
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
665
91f8863f0c13 language search for moinSearch implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1492
diff changeset
666 # Decide what to do with the results.
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
667 if self.negated:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
668 if match:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
669 result = None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
670 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
671 result = [Match()] # represents "matched" (but as it was a negative match, we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
672 else: # not negated
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
673 if match:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
674 result = [Match()] # represents "matched" (but we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
675 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
676 result = None
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
677 logging.debug("LanguageSearch returning %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
678 return result
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
679
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
680 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
681 return True # only easy regexps possible
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
682
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
683 def xapian_need_postproc(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
684 return False # case-sensitivity would make no sense
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
685
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
686 def xapian_term(self, request, allterms):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
687 self.xapian_called = True
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
688 prefix = Xapian.Index.prefixMap['lang']
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
689 if self.use_re:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
690 # basic regex matching per term
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
691 terms = []
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
692 found = None
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
693 n = len(prefix)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
694 for term in allterms():
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
695 if prefix == term[:n]:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
696 found = True
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
697 if self.search_re.match(term[n:]):
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
698 terms.append(term)
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
699 elif found:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
700 continue
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
701
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
702 if not terms:
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
703 return Query()
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
704 return Query(Query.OP_OR, terms)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
705 else:
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
706 pattern = self.pattern
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
707 return UnicodeQuery('%s%s' % (prefix, pattern))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
708
1236
d2d160c344b7 improved highlighting code to work better with stemming and special searches, extended SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
709
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
710 class CategorySearch(TextSearch):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
711 """ Search the pages belonging to a category """
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
712
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
713 def __init__(self, *args, **kwargs):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
714 TextSearch.__init__(self, *args, **kwargs)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
715 self.titlesearch = None
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
716
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
717 self._tag = 'category:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
718
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
719 def _build_re(self, pattern, **kwargs):
3506
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
720 """ match categories like this:
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
721 ... some page text ...
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
722 ----
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
723 ## optionally some comments, e.g. about possible categories:
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
724 ## CategoryFoo
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
725 CategoryTheRealAndOnly
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
726
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
727 Note: there might be multiple comment lines, but all real categories
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
728 must be on a single line either directly below the ---- or
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
729 directly below some comment lines.
6b6851d0a3c8 search: match categories even if there are comment lines
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3461
diff changeset
730 """
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
731 kwargs['use_re'] = True
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
732 TextSearch._build_re(self,
3813
a3cf0aa7bf97 category search: ignore trailing whitespace after ----
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3794
diff changeset
733 r'(?m)(^-----*\s*\r?\n)(^##.*\r?\n)*^(?!##)(.*)\b%s\b' % pattern, **kwargs)
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
734
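Illustrative sketch (not part of the annotated file): the regular expression passed to TextSearch._build_re above can be tried on its own. The snippet compiles the same expression for a given category name and checks it against a small sample page. The helper name has_category and the sample text are assumptions for this example; the pattern is inserted unescaped, as in the original.

```python
import re


def has_category(text, name):
    """True if `name` is on the category line below a ---- rule,
    optionally preceded by ## comment lines."""
    regex = re.compile(
        r'(?m)(^-----*\s*\r?\n)(^##.*\r?\n)*^(?!##)(.*)\b%s\b' % name)
    return regex.search(text) is not None


page = (
    "... some page text ...\n"
    "----\n"
    "## optionally some comments, e.g. about possible categories:\n"
    "## CategoryFoo\n"
    "CategoryTheRealAndOnly\n"
)
real = has_category(page, "CategoryTheRealAndOnly")
commented = has_category(page, "CategoryFoo")
```

The negative lookahead (?!##) is what keeps a category named only inside a ## comment line from counting, and the \b anchors prevent CategoryFoo from matching inside a longer name such as CategoryFooBar.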
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
735 def costs(self):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
736 return 5000 # cheaper than a TextSearch
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
737
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
738 def highlight_re(self):
3398
f212fe993c14 fix category highlight re
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3397
diff changeset
739 return u'(\\b%s\\b)' % self._pattern
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
740
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
741 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
742 return True # only easy regexps possible
1200
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
743
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
744 def xapian_need_postproc(self):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
745 return self.case
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
746
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
747 def xapian_term(self, request, allterms):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
748 self.xapian_called = True
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
749 prefix = Xapian.Index.prefixMap['category']
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
750 if self.use_re:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
751 # basic regex matching per term
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
752 terms = []
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
753 found = None
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
754 n = len(prefix)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
755 for term in allterms():
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
756 if prefix == term[:n]:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
757 found = True
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
758 if self.search_re.match(term[n+1:]):
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
759 terms.append(term)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
760 elif found:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
761 continue
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
762
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
763 if not terms:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
764 return Query()
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
765 return Query(Query.OP_OR, terms)
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
766 else:
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
767 pattern = self._pattern.lower()
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
768 return UnicodeQuery('%s:%s' % (prefix, pattern))
b953b5ff4877 CategorySearch is live
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1199
diff changeset
769
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
770
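Note on the regex branch of xapian_term above: it walks every term in the index, keeps the terms that start with the field prefix, and matches the compiled pattern against the rest of the term. The matching terms are then combined with OR. The following is a minimal standalone sketch of that scan, with the Xapian objects replaced by plain lists. The helper name select_terms, the skip parameter, and the sample prefix and terms are illustrative assumptions, not MoinMoin or Xapian API.

```python
import re


def select_terms(allterms, prefix, search_re, skip=1):
    """Return the terms that carry `prefix` and whose value matches `search_re`.

    `skip` is the number of separator characters between prefix and value:
    1 for a 'PREFIX:value' layout, 0 for a 'PREFIXvalue' layout.
    """
    n = len(prefix)
    selected = []
    for term in allterms:
        if term[:n] == prefix:
            # re.match only anchors at the start of the value
            if search_re.match(term[n + skip:]):
                selected.append(term)
    return selected


# Sample data (hypothetical index terms, for illustration only)
terms = ["XCAT:categoryhelp", "XCAT:categoryhomepage", "XDOMAIN:system"]
hits = select_terms(terms, "XCAT", re.compile(r"categoryh", re.U))
```

Because match() is used rather than search(), a pattern has to match from the first character of the term value; a pattern such as "home" would select nothing from the sample terms.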
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
771 class MimetypeSearch(BaseExpression):
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
772 """ Search for files belonging to a specific mimetype """
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
773
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
774 def __init__(self, pattern, use_re=False, case=True):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
775 """ Init a mimetype search
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
776
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
777 @param pattern: pattern to search for, ascii string or unicode
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
778 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
779 @param case: do case sensitive search, bool
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
780 """
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
781 self._pattern = pattern.lower()
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
782 self.negated = 0
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
783 self.use_re = use_re
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
784 self.case = False # not case-sensitive!
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
785 self.xapian_called = False
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
786 self._build_re(self._pattern, use_re=use_re, case=case)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
787
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
788 self._tag = 'mimetype:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
789 if use_re:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
790 self._tag += 're:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
791 if case:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
792 self._tag += 'case:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
793
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
794 def costs(self):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
795 return 5000 # cheaper than a TextSearch
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
796
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
797 def highlight_re(self):
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
798 return u""
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
799
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
800 def search(self, page):
3633
25822f14e4ea search: support mimetype: search for builtin (slow) engine, too
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3570
diff changeset
801 page_mimetype = u'text/%s' % page.pi['format']
25822f14e4ea search: support mimetype: search for builtin (slow) engine, too
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3570
diff changeset
802 matches = self.search_re.search(page_mimetype)
25822f14e4ea search: support mimetype: search for builtin (slow) engine, too
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3570
diff changeset
803 if matches and not self.negated or not matches and self.negated:
25822f14e4ea search: support mimetype: search for builtin (slow) engine, too
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3570
diff changeset
804 return [Match()]
25822f14e4ea search: support mimetype: search for builtin (slow) engine, too
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3570
diff changeset
805 else:
25822f14e4ea search: support mimetype: search for builtin (slow) engine, too
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3570
diff changeset
806 return None
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
807
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
808 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
809 return True # only easy regexps possible
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
810
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
811 def xapian_need_postproc(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
812 return False # case-sensitivity would make no sense
1374
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
813
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
814 def xapian_term(self, request, allterms):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
815 self.xapian_called = True
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
816 prefix = Xapian.Index.prefixMap['mimetype']
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
817 if self.use_re:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
818 # basic regex matching per term
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
819 terms = []
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
820 found = None
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
821 n = len(prefix)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
822 for term in allterms():
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
823 if prefix == term[:n]:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
824 found = True
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
825 if self.search_re.match(term[n:]):
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
826 terms.append(term)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
827 elif found:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
828 continue
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
829
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
830 if not terms:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
831 return Query()
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
832 return Query(Query.OP_OR, terms)
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
833 else:
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
834 pattern = self._pattern
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
835 return UnicodeQuery('%s%s' % (prefix, pattern))
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
836
aa33bb2b40d6 mimetype-search is live, i18n fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1244
diff changeset
837
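Note on MimetypeSearch.search above: the condition "matches and not self.negated or not matches and self.negated" reports a page exactly when the regex result and the negation flag disagree, which is an exclusive or. The sketch below expresses that rule directly. The helper name page_matches and the sample pattern are assumptions made for illustration and are not part of MoinMoin.

```python
import re


def page_matches(search_re, mimetype, negated):
    """Return True when the page should be reported, honouring negation."""
    matches = search_re.search(mimetype)
    # report the page only if the regex result and the negation flag disagree
    return bool(matches) != bool(negated)


# Case-insensitive plain-text pattern, similar in spirit to a non-regex search
text_re = re.compile(re.escape(u"text/wiki"), re.U | re.I)
```

A negated term therefore reports every page whose mimetype does not match, which is why the method returns an empty Match() object with nothing to highlight in that case.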
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
838 class DomainSearch(BaseExpression):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
839 """ Search for pages belonging to a specific domain """
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
840
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
841 def __init__(self, pattern, use_re=False, case=True):
1377
bb37beca7545 fixed system pages search, added underlay search, started with mtime filtering
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1376
diff changeset
842 """ Init a domain search
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
843
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
844 @param pattern: pattern to search for, ascii string or unicode
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
845 @param use_re: treat pattern as re or plain text, bool
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2235
diff changeset
846 @param case: do case sensitive search, bool
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
847 """
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
848 self._pattern = pattern.lower()
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
849 self.negated = 0
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
850 self.use_re = use_re
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
851 self.case = False # not case-sensitive!
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
852 self.xapian_called = False
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
853 self._build_re(self._pattern, use_re=use_re, case=case)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
854
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
855 self._tag = 'domain:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
856 if use_re:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
857 self._tag += 're:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
858 if case:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
859 self._tag += 'case:'
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
860
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
861 def costs(self):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
862 return 5000 # cheaper than a TextSearch
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
863
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
864 def highlight_re(self):
3285
bfd3f600650b fix highlighting for search results and WantedPages
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3162
diff changeset
865 return u""
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
866
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
867 def search(self, page):
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
868 logging.debug("DomainSearch searching page %r for (negated = %r) %r" % (page.page_name, self.negated, self._pattern))
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
869 checks = {'underlay': page.isUnderlayPage,
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
870 'standard': page.isStandardPage,
3114
338ad6aee6da AdvancedSearch: fix domain:system search crashing
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3113
diff changeset
871 'system': lambda page=page: wikiutil.isSystemPage(page.request, page.page_name),
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
872 }
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
873
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
874 try:
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
875 match = checks[self.pattern]()
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
876 except KeyError:
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
877 match = False
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
878
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1493
diff changeset
879 # Decide what to do with the results.
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
880 if self.negated:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
881 if match:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
882 result = None
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
883 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
884 result = [Match()] # represents "matched" (but as it was a negative match, we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
885 else: # not negated
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
886 if match:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
887 result = [Match()] # represents "matched" (but we have nothing to show)
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
888 else:
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
889 result = None
3162
153681321f8c logging: use module-level logger for MoinMoin.search.*
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3114
diff changeset
890 logging.debug("DomainSearch returning %r" % result)
2955
c8a3731ce61c search: add tests, add debug logging, fix bugs (negative search terms failing, and others)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2952
diff changeset
891 return result
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
892
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
893 def xapian_wanted(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
894 return True # only easy regexps possible
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
895
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
896 def xapian_need_postproc(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1495
diff changeset
897 return False # case-sensitivity would make no sense
1376
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
898
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
899 def xapian_term(self, request, allterms):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
900 self.xapian_called = True
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
901 prefix = Xapian.Index.prefixMap['domain']
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
902 if self.use_re:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
903 # basic regex matching per term
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
904 terms = []
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
905 found = None
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
906 n = len(prefix)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
907 for term in allterms():
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
908 if prefix == term[:n]:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
909 found = True
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
910 if self.search_re.match(term[n+1:]):
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
911 terms.append(term)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
912 elif found:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
913 continue
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
914
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
915 if not terms:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
916 return Query()
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
917 return Query(Query.OP_OR, terms)
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
918 else:
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
919 pattern = self._pattern
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
920 return UnicodeQuery('%s:%s' % (prefix, pattern))
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
921
649b60a4064a domain-secific search (for system pages)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1374
diff changeset
922
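Note on DomainSearch.search above: the domain name is looked up in a dict of callables, an unknown name is treated as "no match" through the KeyError handler, and negation is applied afterwards. The sketch below reproduces that flow outside the wiki. FakePage and domain_match are hypothetical stand-ins written for this illustration; the 'system' check is left out because it needs a real request object.

```python
class FakePage(object):
    """Stand-in for a wiki page; provides only the two predicates used here."""

    def __init__(self, underlay, standard):
        self._underlay = underlay
        self._standard = standard

    def isUnderlayPage(self):
        return self._underlay

    def isStandardPage(self):
        return self._standard


def domain_match(page, domain, negated=False):
    """Return True when `page` should be reported for a domain search."""
    checks = {
        'underlay': page.isUnderlayPage,
        'standard': page.isStandardPage,
    }
    try:
        match = checks[domain]()
    except KeyError:
        match = False  # unknown domain names never match
    # negation flips the outcome, as in the if/else ladder of the original
    return bool(match) != bool(negated)
```

One consequence worth noting: a negated search for an unknown domain name reports every page, since "no match" is inverted into a match.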
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
923 ##############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
924 ### Parse Query
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
925 ##############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
926
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
927 class QueryParser:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
928 """
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
929 Converts a String into a tree of Query objects.
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
930 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
931
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
932 def __init__(self, **kw):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
933 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
934 @keyword titlesearch: treat all terms as title searches
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
935 @keyword case: do case sensitive search
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
936 @keyword regex: treat all terms as regular expressions
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
937 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
938 self.titlesearch = kw.get('titlesearch', 0)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
939 self.case = kw.get('case', 0)
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
940 self.regex = kw.get('regex', 0)
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
941 self._M = wikiutil.ParserPrefix('-')
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
942
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
943 def _analyse_items(self, items):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
944 terms = AndExpression()
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
945 M = self._M
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
946 while items:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
947 item = items[0]
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
948 items = items[1:]
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
949
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
950 if isinstance(item, unicode):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
951 if item.lower() == 'or':
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
952 sub = terms.subterms()
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
953 if len(sub) >= 1:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
954 last = sub[-1]
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
955 if last.__class__ == OrExpression:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
956 orexpr = last
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
957 else:
3691
a5e33ae8f469 fix xapian search for '-a or b' query (see also 3669:0cbd8b41aebd), tests broken
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3678
diff changeset
958 # Note: do NOT reduce "terms" when it has a single subterm only!
a5e33ae8f469 fix xapian search for '-a or b' query (see also 3669:0cbd8b41aebd), tests broken
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3678
diff changeset
959 # Doing that would break "-someterm" searches as we rely on AndExpression
a5e33ae8f469 fix xapian search for '-a or b' query (see also 3669:0cbd8b41aebd), tests broken
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3678
diff changeset
960 # doing a "MatchAll AND_NOT someterm" for that case!
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
961 orexpr = OrExpression(terms)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
962 terms = AndExpression(orexpr)
3794
c62beb39c64c queryparser: reject invalid queries consisting only of "OR"
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3691
diff changeset
963 else:
c62beb39c64c queryparser: reject invalid queries consisting only of "OR"
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3691
diff changeset
964 raise ValueError('Nothing to OR')
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
965 remaining = self._analyse_items(items)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
966 if remaining.__class__ == OrExpression:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
967 for sub in remaining.subterms():
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
968 orexpr.append(sub)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
969 else:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
970 orexpr.append(remaining)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
971 break
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
972 elif item.lower() == 'and':
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
973 pass
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
974 else:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
975 # odd workaround; we should instead ignore this term
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
976 # and reject expressions that contain nothing after
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
977 # being parsed rather than rejecting an empty string
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
978 # before parsing...
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
979 if not item:
3408
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
980 raise ValueError("Term too short")
3460
90893e33a312 QueryParser: use global configuration with text strings (fixes titlesearch)
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3408
diff changeset
981 regex = self.regex
90893e33a312 QueryParser: use global configuration with text strings (fixes titlesearch)
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3408
diff changeset
982 case = self.case
90893e33a312 QueryParser: use global configuration with text strings (fixes titlesearch)
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3408
diff changeset
983 if self.titlesearch:
90893e33a312 QueryParser: use global configuration with text strings (fixes titlesearch)
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3408
diff changeset
984 terms.append(TitleSearch(item, use_re=regex, case=case))
90893e33a312 QueryParser: use global configuration with text strings (fixes titlesearch)
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3408
diff changeset
985 else:
90893e33a312 QueryParser: use global configuration with text strings (fixes titlesearch)
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3408
diff changeset
986 terms.append(TextSearch(item, use_re=regex, case=case))
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
987 elif isinstance(item, tuple):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
988 negate = item[0] == M
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
989 title_search = self.titlesearch
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
990 regex = self.regex
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
991 case = self.case
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
992 linkto = False
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
993 lang = False
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
994 category = False
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
995 mimetype = False
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
996 domain = False
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
997 while len(item) > 1:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
998 m = item[0]
3827
a53e20c3ebab search: make query parser reject more invalid input
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3825
diff changeset
999 if m is None:
a53e20c3ebab search: make query parser reject more invalid input
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3825
diff changeset
1000 raise ValueError("Invalid search prefix")
a53e20c3ebab search: make query parser reject more invalid input
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3825
diff changeset
1001 elif m == M:
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1002 negate = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1003 elif "title".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1004 title_search = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1005 elif "regex".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1006 regex = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1007 elif "case".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1008 case = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1009 elif "linkto".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1010 linkto = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1011 elif "language".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1012 lang = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1013 elif "category".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1014 category = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1015 elif "mimetype".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1016 mimetype = True
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1017 elif "domain".startswith(m):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1018 domain = True
3827
a53e20c3ebab search: make query parser reject more invalid input
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3825
diff changeset
1019 else:
a53e20c3ebab search: make query parser reject more invalid input
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3825
diff changeset
1020 raise ValueError("Invalid search prefix")
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1021 item = item[1:]
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1022
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1023 text = item[0]
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1024 if category:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1025 obj = CategorySearch(text, use_re=regex, case=case)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1026 elif mimetype:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1027 obj = MimetypeSearch(text, use_re=regex, case=False)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1028 elif lang:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1029 obj = LanguageSearch(text, use_re=regex, case=False)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1030 elif linkto:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1031 obj = LinkSearch(text, use_re=regex, case=case)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1032 elif domain:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1033 obj = DomainSearch(text, use_re=regex, case=False)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1034 elif title_search:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1035 obj = TitleSearch(text, use_re=regex, case=case)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1036 else:
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1037 obj = TextSearch(text, use_re=regex, case=case)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1038 obj.negated = negate
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1039 terms.append(obj)
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1040 elif isinstance(item, list):
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1041 # strip off the opening parenthesis
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1042 terms.append(self._analyse_items(item[1:]))
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
1043
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
1044 # Note: do NOT reduce "terms" when it has a single subterm only!
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
1045 # Doing that would break "-someterm" searches as we rely on AndExpression
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
1046 # doing a "MatchAll AND_NOT someterm" for that case!
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1047 return terms
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1048
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1049 def parse_query(self, query):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1050 """ transform a string into a tree of Query objects """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1051 if isinstance(query, str):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1052 query = query.decode(config.charset)
3408
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1053 try:
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1054 items = wikiutil.parse_quoted_separated_ext(query,
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1055 name_value_separator=':',
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1056 prefixes='-',
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1057 multikey=True,
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1058 brackets=('()', ),
7fd796b2e573 queryparser: catch exceptions, raise exception on '""'
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3406
diff changeset
1059 quotes='\'"')
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
1060 logging.debug("parse_quoted_separated items: %r" % items)
3825
df0218925ef8 search query parser: if query parsing raises a BracketError, at least tell what the problem is (and not just raise empty ValueError)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3813
diff changeset
1061 except wikiutil.BracketError, err:
df0218925ef8 search query parser: if query parsing raises a BracketError, at least tell what the problem is (and not just raise empty ValueError)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3813
diff changeset
1062 raise ValueError(str(err))
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1063 query = self._analyse_items(items)
3678
0cbd8b41aebd Xapian search: fix searching for negative terms only
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3633
diff changeset
1064 logging.debug("analyse_items query: %r" % query)
3406
ccdd2bbf76f7 reimplement and thereby fix QueryParser
Johannes Berg <johannes AT sipsolutions DOT net>
parents: 3398
diff changeset
1065 return query
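
The prefix handling in _analyse_items (source lines 997-1037) can be hard to follow from the annotated listing, so here is a minimal standalone sketch of its two elif chains. The names resolve_prefix, pick_search, PREFIX_ORDER and SEARCH_PRECEDENCE are hypothetical and are not part of MoinMoin. The sketch returns class names as strings rather than building search objects, ignores the negation marker, and assumes that self.titlesearch is False.

```python
# Illustrative sketch only: these helpers are hypothetical, not MoinMoin API.

# Same order as the first elif chain: each keyword is tested with
# keyword.startswith(m), and the first keyword that matches wins.
PREFIX_ORDER = ("title", "regex", "case", "linkto",
                "language", "category", "mimetype", "domain")

# Same order as the second elif chain that picks the search class.
# "regex" and "case" only set options, so they do not appear here.
SEARCH_PRECEDENCE = (
    ("category", "CategorySearch"),
    ("mimetype", "MimetypeSearch"),
    ("language", "LanguageSearch"),
    ("linkto", "LinkSearch"),
    ("domain", "DomainSearch"),
    ("title", "TitleSearch"),
)


def resolve_prefix(m):
    """Map a (possibly abbreviated) prefix to its full keyword."""
    if m is None:
        raise ValueError("Invalid search prefix")
    for name in PREFIX_ORDER:
        if name.startswith(m):
            return name
    raise ValueError("Invalid search prefix")


def pick_search(prefixes):
    """Return the name of the search class chosen for a list of prefixes."""
    flags = set(resolve_prefix(m) for m in prefixes)
    for flag, class_name in SEARCH_PRECEDENCE:
        if flag in flags:
            return class_name
    # No type-selecting prefix was given: fall back to full-text search.
    return "TextSearch"
```

Because the keywords are tested in a fixed order, an ambiguous abbreviation resolves to the earliest match: resolve_prefix("c") gives "case" rather than "category", and resolve_prefix("l") gives "linkto" rather than "language". In this sketch, pick_search(["cat", "t"]) gives "CategorySearch", because the category check comes before the title check.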