annotate MoinMoin/search/builtin.py @ 2338:b902f2397c68

rename server and request Implementations by adding a prefix server_, request_
author Reimar Bauer <rb.proj AT googlemail DOT com>
date Thu, 05 Jul 2007 21:23:29 +0200
parents 01f05e74aa9c
children c8a3731ce61c
rev   line source
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
1 # -*- coding: iso-8859-1 -*-
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
2 """
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
3 MoinMoin - search engine internals
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
4
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
5 @copyright: 2005 MoinMoin:FlorianFesti,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
6 2005 MoinMoin:NirSoffer,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
7 2005 MoinMoin:AlexanderSchremmer,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
8 2006 MoinMoin:ThomasWaldmann,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
9 2006 MoinMoin:FranzPletz
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
10 @license: GNU GPL, see COPYING for details
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
11 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
12
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
13 import time, os, errno, codecs, logging
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
14
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
15 from MoinMoin import wikiutil, config
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
16 from MoinMoin.Page import Page
1791
6dd2e29acffe Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1524
diff changeset
17 from MoinMoin.util import lock
920
a2498260eca5 do result processing in results.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
18 from MoinMoin.search.results import getSearchResults
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1186
diff changeset
19 from MoinMoin.search.queryparser import Match, TextMatch, TitleMatch
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
20
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
21 ##############################################################################
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
22 # Search Engine Abstraction
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
23 ##############################################################################
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
24
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
25 class UpdateQueue:
1979
79189058f117 search: add comment about possible refactoring
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1920
diff changeset
26 """ Represents a locked page queue on the disk
79189058f117 search: add comment about possible refactoring
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1920
diff changeset
27
79189058f117 search: add comment about possible refactoring
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1920
diff changeset
28 XXX: check whether we can just use the caching module
79189058f117 search: add comment about possible refactoring
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1920
diff changeset
29 """
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
30
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
31 def __init__(self, f, lock_dir):
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
32 """
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
33 @param f: file to write to
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
34 @param lock_dir: directory to save the lock files
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
35 """
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
36 self.file = f
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
37 self.writeLock = lock.WriteLock(lock_dir, timeout=10.0)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
38 self.readLock = lock.ReadLock(lock_dir, timeout=10.0)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
39
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
40 def exists(self):
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
41 """ Checks if the queue exists on the filesystem """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
42 return os.path.exists(self.file)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
43
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
44 def append(self, pagename):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
45 """ Append a page to queue
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
46
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
47 @param pagename: string to save
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
48 """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
49 if not self.writeLock.acquire(60.0):
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
50 logging.warning("can't add %r to xapian update queue: can't lock queue", pagename)
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
51 return
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
52 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
53 f = codecs.open(self.file, 'a', config.charset)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
54 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
55 f.write(pagename + "\n")
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
56 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
57 f.close()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
58 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
59 self.writeLock.release()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
60
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
61 def pages(self):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
62 """ Return list of pages in the queue """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
63 if self.readLock.acquire(1.0):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
64 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
65 return self._decode(self._read())
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
66 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
67 self.readLock.release()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
68 return []
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
69
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
70 def remove(self, pages):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
71 """ Remove pages from the queue
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
72
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
73 When the queue is empty, the queue file is removed, so exists()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
74 can tell if there is something waiting in the queue.
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
75
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
76 @param pages: list of pagenames to remove
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
77 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
78 if self.writeLock.acquire(30.0):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
79 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
80 queue = self._decode(self._read())
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
81 for page in pages:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
82 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
83 queue.remove(page)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
84 except ValueError:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
85 pass
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
86 if queue:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
87 self._write(queue)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
88 else:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
89 self._removeFile()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
90 return True
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
91 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
92 self.writeLock.release()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
93 return False
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
94
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
95 # Private -------------------------------------------------------
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
96
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
97 def _decode(self, data):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
98 """ Decode queue data
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
99
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
100 @param data: the data to decode
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
101 """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
102 pages = data.splitlines()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
103 return self._filterDuplicates(pages)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
104
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
105 def _filterDuplicates(self, pages):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
106 """ Filter duplicates in page list, keeping the order
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
107
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
108 @param pages: list of pages to filter
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
109 """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
110 unique = []
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
111 seen = {}
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
112 for name in pages:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
113 if name not in seen:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
114 unique.append(name)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
115 seen[name] = 1
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
116 return unique
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
117
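The order-preserving duplicate filter above can be sketched in current Python as follows. This is only an illustration written for Python 3, not the MoinMoin code; the function name `filter_duplicates` is an assumption made for the example.

```python
def filter_duplicates(pages):
    """Return pages with duplicates dropped, keeping first-seen order."""
    seen = set()      # names already emitted
    unique = []       # result, in original order
    for name in pages:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


if __name__ == "__main__":
    print(filter_duplicates(["FrontPage", "HelpContents", "FrontPage"]))
```

A set is used for the membership check instead of a dict with dummy values; the behavior is otherwise the same as the loop in `_filterDuplicates`.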
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
118 def _read(self):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
119 """ Read and return queue data
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
120
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
121 This does not do anything with the data so we can release the
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
122 lock as soon as possible, enabling others to update the queue.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
123 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
124 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
125 f = codecs.open(self.file, 'r', config.charset)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
126 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
127 return f.read()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
128 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
129 f.close()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
130 except (OSError, IOError), err:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
131 if err.errno != errno.ENOENT:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
132 raise
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
133 return ''
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
134
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
135 def _write(self, pages):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
136 """ Write pages to queue file
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
137
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
138 Requires queue write locking.
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
139
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
140 @param pages: list of pages to write
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
141 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
142 # XXX use tmpfile/move for atomic replace on real operating systems
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
143 data = '\n'.join(pages) + '\n'
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
144 f = codecs.open(self.file, 'w', config.charset)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
145 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
146 f.write(data)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
147 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
148 f.close()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
149
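The XXX note in `_write` suggests a temp-file-then-move approach for atomic replacement. A minimal sketch of that idea in Python 3 might look like the following; the function name `write_queue_atomically`, the `.queue-` prefix, and the UTF-8 default are assumptions for illustration, and the caller is still expected to hold the queue's write lock.

```python
import os
import tempfile


def write_queue_atomically(path, pages, encoding="utf-8"):
    """Write one page name per line, replacing `path` in a single step."""
    data = "\n".join(pages) + "\n"
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".queue-")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="\n") as f:
            f.write(data)
        # os.replace overwrites an existing target.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach a reader sees either the old file or the new one, never a partially written queue.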
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
150 def _removeFile(self):
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
151 """ Remove queue file
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
152
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
153 Requires queue write locking.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
154 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
155 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
156 os.remove(self.file)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
157 except OSError, err:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
158 if err.errno != errno.ENOENT:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
159 raise
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
160
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
161
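To make the queue semantics of `UpdateQueue` easier to follow (append a name, list unique names, delete the file once the queue is empty), here is a simplified Python 3 sketch. It deliberately omits the read/write locking that the real class relies on, and the class name `SimpleFileQueue` and its UTF-8 default are hypothetical.

```python
import os


class SimpleFileQueue:
    """Illustrative, lock-free stand-in for a file-backed page queue."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def exists(self):
        # The file is present only while something is queued.
        return os.path.exists(self.path)

    def append(self, pagename):
        with open(self.path, "a", encoding=self.encoding) as f:
            f.write(pagename + "\n")

    def pages(self):
        try:
            with open(self.path, "r", encoding=self.encoding) as f:
                lines = f.read().splitlines()
        except FileNotFoundError:
            return []
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(lines))

    def remove(self, pages):
        to_drop = set(pages)
        remaining = [p for p in self.pages() if p not in to_drop]
        if remaining:
            with open(self.path, "w", encoding=self.encoding) as f:
                f.write("\n".join(remaining) + "\n")
        else:
            # Empty queue: delete the file so exists() reports False.
            try:
                os.remove(self.path)
            except FileNotFoundError:
                pass
```

Without locks this sketch is not safe for concurrent writers; it only shows the file lifecycle that `exists()`, `append()`, `pages()` and `remove()` implement together.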
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
162 class BaseIndex:
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
163 """ Represents a search engine index """
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
164
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
165 class LockedException(Exception):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
166 pass
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
167
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
168 def __init__(self, request):
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
169 """
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
170 @param request: current request
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
171 """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
172 self.request = request
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
173 main_dir = self._main_dir()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
174 self.dir = os.path.join(main_dir, 'index')
1430
fd7166892bd3 check for non-existing dir before calling os.makedirs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1428
diff changeset
175 if not os.path.exists(self.dir):
fd7166892bd3 check for non-existing dir before calling os.makedirs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1428
diff changeset
176 os.makedirs(self.dir)
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
177 self.sig_file = os.path.join(main_dir, 'complete')
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
178 lock_dir = os.path.join(main_dir, 'index-lock')
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
179 self.lock = lock.WriteLock(lock_dir, timeout=3600.0, readlocktimeout=60.0)
946
72aeb2ba133d support complete rebuild of the index
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 945
diff changeset
180 #self.read_lock = lock.ReadLock(lock_dir, timeout=3600.0)
1478
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
181 self.update_queue = UpdateQueue(os.path.join(main_dir, 'update-queue'),
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
182 os.path.join(main_dir, 'update-queue-lock'))
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
183 self.remove_queue = UpdateQueue(os.path.join(main_dir, 'remove-queue'),
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
184 os.path.join(main_dir, 'remove-queue-lock'))
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
185
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
186 # Disabled until we have a sane way to build the index with a
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
187 # queue in small steps.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
188 ## if not self.exists():
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
189 ## self.indexPagesInNewThread(request)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
190
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
191 def _main_dir(self):
1211
d028d37e7105 raise NotImplemented instance
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
192 raise NotImplementedError('...')
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
193
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
194 def exists(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
195 """ Check if index exists """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
196 return os.path.exists(self.sig_file)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
197
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
198 def mtime(self):
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
199 """ Modification time of the index """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
200 return os.path.getmtime(self.dir)
1205
73f576c4bca3 fix multiconfig merge and more informative SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1197
diff changeset
201
73f576c4bca3 fix multiconfig merge and more informative SystemInfo macro
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1197
diff changeset
202 def touch(self):
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
203 """ Touch the index """
1206
a9ffe6479012 nicer regexp support for TitleSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1205
diff changeset
204 os.utime(self.dir, None)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
205
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
206 def _search(self, query):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
207 """ Actually perform the search (read-lock acquired)
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
208
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
209 @param query: the search query objects tree
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
210 """
1211
d028d37e7105 raise NotImplemented instance
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
211 raise NotImplementedError('...')
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
212
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
213 def search(self, query, **kw):
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
214 """ Search for items in the index
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
215
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
216 @param query: the search query objects to pass to the index
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
217 """
946
72aeb2ba133d support complete rebuild of the index
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 945
diff changeset
218 #if not self.read_lock.acquire(1.0):
72aeb2ba133d support complete rebuild of the index
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 945
diff changeset
219 # raise self.LockedException
72aeb2ba133d support complete rebuild of the index
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 945
diff changeset
220 #try:
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
221 hits = self._search(query, **kw)
946
72aeb2ba133d support complete rebuild of the index
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 945
diff changeset
222 #finally:
72aeb2ba133d support complete rebuild of the index
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 945
diff changeset
223 # self.read_lock.release()
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
224 return hits
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
225
1480
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
226 def update_page(self, pagename, now=1):
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
227 """ Update a single page in the index
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
228
1473
b5864c9492fb ensure new attachments trigger an index update, doc update for MoinMoin.search.Xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1467
diff changeset
229 @param pagename: the name of the page to update
1480
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
230 @keyword now: do all updates now (default: 1)
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
231 """
1478
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
232 self.update_queue.append(pagename)
1480
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
233 if now:
1920
b06ef2a53efa 'make pylint', fixed lots of minor stuff found by pylint (and there is still lots left to do)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1832
diff changeset
234 self._do_queued_updates_InNewThread()
1478
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
235
1480
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
236 def remove_item(self, pagename, attachment=None, now=1):
1478
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
237 """ Removes a page and all its revisions or a single attachment
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
238
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
239 @param pagename: name of the page to be removed
1480
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
240 @keyword attachment: optional, only remove this attachment of the page
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
241 @keyword now: do all updates now (default: 1)
1478
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
242 """
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
243 self.remove_queue.append('%s//%s' % (pagename, attachment or ''))
1480
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
244 if now:
c222d149e93f renaming and deleting pages works for all revisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1478
diff changeset
245 self._do_queued_updates_InNewThread()
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
246
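The remove queue stores each item as a single string built with '%s//%s'. The following is a minimal sketch of that entry format and of one way to split it again. Both helper names are hypothetical and not part of MoinMoin, and the decoder assumes that attachment names never contain '//'.

```python
# Hypothetical helpers for illustration; they are not part of MoinMoin.
def encode_remove_entry(pagename, attachment=None):
    """Build a remove-queue entry the same way remove_item() does."""
    return '%s//%s' % (pagename, attachment or '')


def decode_remove_entry(entry):
    """Split an entry back into (pagename, attachment or None).

    Assumption: attachment names never contain '//', so the last '//'
    in the entry is the separator.
    """
    pagename, attachment = entry.rsplit('//', 1)
    return pagename, attachment or None


print(encode_remove_entry('FrontPage'))
print(decode_remove_entry('HelpOnSearching//notes.txt'))
```

An empty attachment part stands for "remove the whole page", which is why the decoder maps it back to None.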
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
247 def indexPages(self, files=None, mode='update'):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
248 """ Index all pages (and files, if given)
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
249
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
250 Can be called only from a script. To index pages during a user
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
251 request, use indexPagesInNewThread.
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
252 @keyword files: iterator or list of files to index additionally
1466
500e043cf7cd code documentation update
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1465
diff changeset
253 @keyword mode: set the mode of indexing the pages, either 'update', 'add' or 'rebuild'
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
254 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
255 if not self.lock.acquire(1.0):
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
256 logging.warning("can't index: can't acquire lock")
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
257 return
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
258 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
259 self._unsign()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
260 start = time.time()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
261 request = self._indexingRequest(self.request)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
262 self._index_pages(request, files, mode)
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
263 logging.info("indexing completed successfully in %0.2f seconds." %
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
264 (time.time() - start))
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
265 self._sign()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
266 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
267 self.lock.release()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
268
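indexPages() calls _unsign() before indexing and _sign() only after a successful run, while exists() tests for the signature file. The sketch below illustrates that ordering with a toy class. The bodies of _sign() and _unsign() are assumptions (they are not shown in this part of the file), and SignedIndex and rebuild are hypothetical names.

```python
import os
import tempfile


class SignedIndex(object):
    """Toy index whose 'complete' state is tracked by a signature file."""

    def __init__(self, directory):
        self.dir = directory
        self.sig_file = os.path.join(directory, 'complete')

    def exists(self):
        return os.path.exists(self.sig_file)

    def _unsign(self):
        # Assumed behaviour: remove the marker, ignoring a missing file.
        try:
            os.remove(self.sig_file)
        except OSError:
            pass

    def _sign(self):
        # Assumed behaviour: create an empty marker file.
        with open(self.sig_file, 'w') as f:
            f.write('')

    def rebuild(self, work):
        self._unsign()
        work()  # if this raises, the index stays unsigned
        self._sign()


def failing_work():
    raise RuntimeError('indexing failed')


index = SignedIndex(tempfile.mkdtemp())
index.rebuild(lambda: None)
ok_after_success = index.exists()

try:
    index.rebuild(failing_work)
except RuntimeError:
    pass
ok_after_failure = index.exists()
```

With this ordering, a run that fails part-way leaves the index reported as missing rather than as complete.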
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
269 def indexPagesInNewThread(self, files=None, mode='update'):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
270 """ Index all pages in a new thread
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
271
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
272 Should be called from a user request. From a script, use indexPages.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
273 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
274 # Prevent rebuilding the index just after it was finished
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
275 if self.exists():
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
276 return
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
277
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
278 from threading import Thread
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
279 indexThread = Thread(target=self._index_pages, args=(self._indexingRequest(self.request), files, mode))
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
280 indexThread.setDaemon(True)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
281
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
282 # Join the index thread after the current request finishes, to prevent
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
283 # Apache CGI from killing the process.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
284 def joinDecorator(finish):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
285 def func():
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
286 finish()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
287 indexThread.join()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
288 return func
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
289
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
290 self.request.finish = joinDecorator(self.request.finish)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
291 indexThread.start()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
292
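The joinDecorator above replaces request.finish with a wrapper that first finishes the request and then waits for the index thread. A standalone sketch of the same idea follows. FakeRequest and join_after_finish are hypothetical names used only for illustration, and the 'joined' marker is added just to make the ordering visible.

```python
import threading


class FakeRequest(object):
    """Hypothetical stand-in for the request object."""

    def __init__(self):
        self.events = []

    def finish(self):
        self.events.append('finish')


def join_after_finish(request, thread):
    """Wrap request.finish so that it joins `thread` after finishing."""
    original_finish = request.finish

    def finish_and_join():
        original_finish()
        thread.join()
        request.events.append('joined')

    request.finish = finish_and_join


request = FakeRequest()
worker = threading.Thread(target=lambda: request.events.append('indexed'))
worker.daemon = True
join_after_finish(request, worker)
worker.start()
request.finish()
```

The response is completed before the join, so the client does not wait for indexing, while the process stays alive until the worker thread is done.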
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
293 def _index_pages(self, request, files=None, mode='update'):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
294 """ Index all pages (and all given files)
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
295
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
296 This should be called from indexPages or indexPagesInNewThread only!
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
297
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
298 This may take some time, depending on the size of the wiki and speed
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
299 of the machine.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
300
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
301 When called in a new thread, lock is acquired before the call,
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
302 and this method must release it when it finishes or fails.
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
303
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
304 @param request: current request
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
305 @keyword files: iterator or list of files to index additionally
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
306 @keyword mode: set the mode of indexing the pages, either 'update',
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
307 'add' or 'rebuild'
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
308 """
1211
d028d37e7105 raise NotImplemented instance
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
309 raise NotImplementedError('...')
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
310
1478
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
311 def _remove_item(self, writer, page, attachment=None):
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
312 """ Remove a page and all its revisions from the index or just
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
313 an attachment of that page
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
314
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
315 @param page: the page to remove
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
316 @keyword attachment: optionally, just remove this attachment
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
317 """
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
318 raise NotImplementedError('...')
53e9c1db5ace support for page/attachment removal and renaming (preliminary commit to show activity, needs more testing)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1473
diff changeset
319
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
320 def _do_queued_updates_InNewThread(self):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
321 """ Do queued index updates in a new thread
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
322
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
323 Should be called from a user request. From a script, use indexPages.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
324 """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
325 if not self.lock.acquire(1.0):
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
326 logging.warning("can't index: can't acquire lock")
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
327 return
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
328 try:
1186
3172214eac96 Fixed some bugs, whitespaces at EOL, better i18n for SystemInfo
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 980
diff changeset
329 def lockedDecorator(f):
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
330 def func(*args, **kwargs):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
331 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
332 return f(*args, **kwargs)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
333 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
334 self.lock.release()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
335 return func
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
336
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
337 from threading import Thread
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
338 indexThread = Thread(
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
339 target=lockedDecorator(self._do_queued_updates),
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
340 args=(self._indexingRequest(self.request), ))
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
341 indexThread.setDaemon(True)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
342
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
343 # Join the index thread after the current request finishes, to prevent
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
344 # Apache CGI from killing the process.
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
345 def joinDecorator(finish):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
346 def func():
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
347 finish()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
348 indexThread.join()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
349 return func
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
350
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
351 self.request.finish = joinDecorator(self.request.finish)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
352 indexThread.start()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
353 except:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
354 self.lock.release()
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
355 raise
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
356
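_do_queued_updates_InNewThread() acquires the lock in the calling thread and hands the responsibility for releasing it to the worker thread via lockedDecorator. If the thread cannot be started, the caller releases the lock itself. The sketch below shows that hand-off in isolation. It uses threading.Lock as a stand-in for the index lock (an assumption) and a non-blocking acquire instead of the 1.0 second timeout used above; all names are hypothetical.

```python
import threading

lock = threading.Lock()  # stand-in for the index lock (assumption)
results = []


def released_when_done(f):
    """Return a wrapper that releases `lock` when f finishes or fails."""
    def func(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        finally:
            lock.release()
    return func


def do_updates(amount):
    results.append(amount)


def start_updates():
    # Non-blocking acquire; the original waits up to 1.0 second.
    if not lock.acquire(False):
        return None
    try:
        worker = threading.Thread(target=released_when_done(do_updates),
                                  args=(5, ))
        worker.daemon = True
        worker.start()
    except Exception:
        # The worker never ran, so the caller must release the lock.
        lock.release()
        raise
    return worker


worker = start_updates()
worker.join()
```

A plain threading.Lock may be released by a thread other than the one that acquired it, which is what makes this hand-off possible in the sketch.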
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
357 def _do_queued_updates(self, request, amount=5):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
358 """ Perform updates in the queues (read-lock acquired)
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
359
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
360 @param request: the current request
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
361 @keyword amount: how many updates to perform at once (default: 5)
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
362 """
1211
d028d37e7105 raise NotImplemented instance
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
363 raise NotImplementedError('...')
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
364
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
365 def optimize(self):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
366 """ Optimize the index if possible """
1211
d028d37e7105 raise NotImplemented instance
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1206
diff changeset
367 raise NotImplementedError('...')
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
368
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
369 def contentfilter(self, filename):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
370 """ Get a filter for content of filename and return unicode content.
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
371
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
372 @param filename: name of the file
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
373 """
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
374 request = self.request
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
375 mt = wikiutil.MimeType(filename=filename)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
376 for modulename in mt.module_name():
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
377 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
378 execute = wikiutil.importPlugin(request.cfg, 'filter', modulename)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
379 break
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
380 except wikiutil.PluginMissingError:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
381 pass
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
382 else:
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
383 logging.info("Cannot load filter for mimetype " + modulename)
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
384 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
385 data = execute(self, filename)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
386 #if debug:
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
387 # logging.info("Filter %s returned %d characters for file %s" % (modulename, len(data), filename))
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
388 except (OSError, IOError), err:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
389 data = ''
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
390 logging.warning("Filter %s threw error '%s' for file %s" % (modulename, str(err), filename))
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
391 return mt.mime_type(), data
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
392
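The plugin lookup in contentfilter relies on Python's for/else: the else branch runs only when the loop finishes without hitting break, that is, when no candidate module could be imported. A minimal sketch of that pattern follows, written in Python 3 syntax. The names PluginMissingError, FILTERS, import_plugin and find_filter are hypothetical stand-ins, not MoinMoin's wikiutil API.

```python
class PluginMissingError(Exception):
    """Hypothetical stand-in for wikiutil.PluginMissingError."""


# Hypothetical registry: module name -> filter callable.
FILTERS = {
    "text": lambda filename: "plain text from %s" % filename,
}


def import_plugin(modulename):
    """Return the filter registered under modulename or raise PluginMissingError."""
    try:
        return FILTERS[modulename]
    except KeyError:
        raise PluginMissingError(modulename)


def find_filter(candidates):
    """Return the first filter that loads, or None if none of them does."""
    for modulename in candidates:
        try:
            execute = import_plugin(modulename)
            break  # found a usable filter, skip the else branch
        except PluginMissingError:
            pass  # try the next, more generic candidate
    else:
        # Reached only when the loop ended without 'break'.
        return None
    return execute
```

Returning None in the else branch makes the "nothing found" case explicit for the caller. The listing logs a message there and then continues to the call of execute.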
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
393 def _indexingRequest(self, request):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
394 """ Return a new request that can be used for index building.
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
395
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
396 This request uses a security policy that lets the current user
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
397 read any page. Without this policy some pages will not render,
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
398 which will create a broken pagelinks index.
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
399
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
400 @param request: current request
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
401 """
2338
b902f2397c68 rename server and request Implementations by adding a prefix server_, request_
Reimar Bauer <rb.proj AT googlemail DOT com>
parents: 2286
diff changeset
402 from MoinMoin.request.request_cli import Request
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
403 from MoinMoin.security import Permissions
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
404 request = Request(request.url)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
405 class SecurityPolicy(Permissions):
1791
6dd2e29acffe Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1524
diff changeset
406 def read(self, *args, **kw):
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
407 return True
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
408 request.user.may = SecurityPolicy(request.user)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
409 return request
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
410
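_indexingRequest swaps the user's permission object for a subclass whose read() always answers True, so the indexer can render every page. The sketch below shows the same override technique in isolation, in Python 3 syntax. Permissions and User here are simplified, hypothetical stand-ins and do not reproduce MoinMoin.security.

```python
class Permissions:
    """Simplified, hypothetical stand-in for a per-user permission checker."""

    def __init__(self, user):
        self.user = user

    def read(self, pagename):
        return pagename in self.user.readable

    def write(self, pagename):
        return pagename in self.user.writable


class User:
    """Hypothetical user object holding explicit page permissions."""

    def __init__(self, readable=(), writable=()):
        self.readable = set(readable)
        self.writable = set(writable)
        self.may = Permissions(self)


def make_indexing_user(user):
    """Replace user.may with a policy that allows reading any page."""

    class SecurityPolicy(Permissions):
        def read(self, *args, **kw):
            return True  # the indexer must see every page

    user.may = SecurityPolicy(user)
    return user
```

Only read() is overridden, so every other permission keeps its normal behaviour. In this sketch a user who could not write a page before still cannot write it afterwards.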
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
411 def _unsign(self):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
412 """ Remove sig file - assume write lock acquired """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
413 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
414 os.remove(self.sig_file)
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
415 except OSError, err:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
416 if err.errno != errno.ENOENT:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
417 raise
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
418
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
419 def _sign(self):
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
420 """ Add sig file - assume write lock acquired """
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
421 f = file(self.sig_file, 'w')
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
422 try:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
423 f.write('')
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
424 finally:
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
425 f.close()
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
426
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
427
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
428 ##############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
429 ### Searching
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
430 ##############################################################################
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
431
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
432 class Search:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
433 """ A search run """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
434
1441
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
435 def __init__(self, request, query, sort='weight', mtime=None,
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
436 historysearch=0):
1499
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
437 """
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
438 @param request: current request
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
439 @param query: search query objects tree
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
440 @keyword sort: the sorting of the results (default: 'weight')
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
441 @keyword mtime: only show items newer than this timestamp (default: None)
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
442 @keyword historysearch: whether to show old revisions of a page (default: 0)
ffa0d1f81059 final polishing round adding docstrings, comments and fixing small issues
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1497
diff changeset
443 """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
444 self.request = request
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
445 self.query = query
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
446 self.sort = sort
1433
6b0ea72d7665 mtime search works, added MoinMoin.support.parsedatetime, small fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1241
diff changeset
447 self.mtime = mtime
1441
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
448 self.historysearch = historysearch
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
449 self.filtered = False
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
450 self.fs_rootpage = "FS" # XXX FS hardcoded
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
451
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
452 def run(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
453 """ Perform search and return results object """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
454 start = time.time()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
455 if self.request.cfg.xapian_search:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
456 hits = self._xapianSearch()
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
457 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
458 hits = self._moinSearch()
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
459
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
460 # important - filter deleted pages or pages the user may not read!
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
461 if not self.filtered:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
462 hits = self._filter(hits)
920
a2498260eca5 do result processing in results.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 919
diff changeset
463
1455
d20d04b46ed4 correct pagelinks for older revisions and seach results pages links
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1453
diff changeset
464 # when xapian was used, we can estimate the number of matches
1497
ed3845759431 update comments/docstrings
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1496
diff changeset
465 # Note: hits can't be estimated by xapian with historysearch enabled
1484
499517b99c5f fix small typo
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1481
diff changeset
466 if not self.request.cfg.xapian_index_history and \
1481
d92f2261387e disable xapian hitcount estimation if index has multiple pagerevisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1480
diff changeset
467 self.request.cfg.xapian_search:
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
468 self.sort = None
1241
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
469 mset = self._xapianMset
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
470 estimated_hits = (
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
471 (mset.get_matches_estimated() == mset.get_matches_upper_bound()
1481
d92f2261387e disable xapian hitcount estimation if index has multiple pagerevisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1480
diff changeset
472 and
d92f2261387e disable xapian hitcount estimation if index has multiple pagerevisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1480
diff changeset
473 mset.get_matches_estimated() == mset.get_matches_lower_bound())
d92f2261387e disable xapian hitcount estimation if index has multiple pagerevisions
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1480
diff changeset
474 and '' or 'about',
1241
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
475 mset.get_matches_estimated())
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
476 else:
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
477 estimated_hits = None
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
478
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
479 return getSearchResults(self.request, self.query, hits, start,
1241
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
480 self.sort, estimated_hits)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
481
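The estimated_hits tuple in run() pairs a qualifier string with xapian's estimate. The qualifier is built with the pre-2.5 "cond and a or b" idiom, which falls through to b whenever a is falsy. Because a is the empty string here, the expression as written appears to yield 'about' in both cases. The sketch below, in Python 3 syntax, contrasts the idiom with a conditional expression. FakeMset is a hypothetical stand-in that only mimics the three MSet getters used in the listing.

```python
def qualifier_and_or(exact):
    """Old idiom: '' is falsy, so the 'or' branch always wins."""
    return exact and '' or 'about'


def qualifier_conditional(exact):
    """Conditional expression: returns '' for an exact count."""
    return '' if exact else 'about'


class FakeMset:
    """Hypothetical stand-in exposing the three getters used by run()."""

    def __init__(self, lower, estimated, upper):
        self._lower, self._estimated, self._upper = lower, estimated, upper

    def get_matches_lower_bound(self):
        return self._lower

    def get_matches_estimated(self):
        return self._estimated

    def get_matches_upper_bound(self):
        return self._upper


def estimated_hits(mset):
    """Return (qualifier, count); the count is exact when all bounds agree."""
    est = mset.get_matches_estimated()
    exact = (est == mset.get_matches_lower_bound()
             and est == mset.get_matches_upper_bound())
    return (qualifier_conditional(exact), est)
```

On interpreters without conditional expressions, an explicit if/else statement gives the same result as qualifier_conditional.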
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
482 # ----------------------------------------------------------------
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
483 # Private!
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
484
980
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
485 def _xapianIndex(request):
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
486 """ Get the xapian index if possible
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
487
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
488 @param request: current request
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
489 """
980
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
490 try:
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
491 from MoinMoin.search.Xapian import Index
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
492 index = Index(request)
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
493 except ImportError:
1506
21e85d9d355e small cleanups, advanced ui can preserve search terms from other ui
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
494 return None
980
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
495
1506
21e85d9d355e small cleanups, advanced ui can preserve search terms from other ui
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1499
diff changeset
496 if index.exists():
980
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
497 return index
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
498
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
499 _xapianIndex = staticmethod(_xapianIndex)
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
500
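_xapianIndex treats the xapian backend as optional: an ImportError or a missing on-disk index both result in None, which lets the caller fall back to the builtin search. A minimal sketch of that "optional dependency" pattern in Python 3 syntax follows. The factory argument and the DummyIndex class are hypothetical, introduced only so the example runs without xapian installed.

```python
def get_index(factory):
    """Return a usable index, or None if the backend is unavailable.

    factory is a callable that builds the index object and may raise
    ImportError when the optional backend is not installed.
    """
    try:
        index = factory()
    except ImportError:
        return None
    if index.exists():
        return index
    return None  # backend present, but no index has been built yet


class DummyIndex:
    """Hypothetical index object with the exists() check used above."""

    def __init__(self, built):
        self.built = built

    def exists(self):
        return self.built


def missing_backend():
    """Simulate an optional backend that is not installed."""
    raise ImportError("optional backend not installed")
```

A caller can then write "index = get_index(...)" followed by "if index:" and treat every failure mode the same way.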
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
501 def _xapianSearch(self):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
502 """ Search using Xapian
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
503
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
504 Get a list of pages using fast xapian search and
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
505 return the moin search results for those pages if needed.
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
506 """
1234
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
507 clock = self.request.clock
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
508 pages = None
980
f472ddeba121 SystemInfo macro extended with the state of the index, ensure fallback to moinSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 946
diff changeset
509 index = self._xapianIndex(self.request)
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
510
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
511 if index and self.query.xapian_wanted():
1234
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
512 clock.start('_xapianSearch')
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
513 try:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
514 from MoinMoin.support import xapwrap
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
515
1234
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
516 clock.start('_xapianQuery')
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
517 query = self.query.xapian_term(self.request, index.allterms)
2222
90484c86443a xapian: handling of deprecated get_description
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1979
diff changeset
518 try:
90484c86443a xapian: handling of deprecated get_description
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1979
diff changeset
519 description = query.get_description() # deprecated since xapian 1.0, removal in 1.1
90484c86443a xapian: handling of deprecated get_description
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1979
diff changeset
520 except AttributeError:
90484c86443a xapian: handling of deprecated get_description
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1979
diff changeset
521 description = str(query)
90484c86443a xapian: handling of deprecated get_description
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1979
diff changeset
522 logging.info("xapianSearch: query = %r" % description)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
523 query = xapwrap.index.QObjQuery(query)
1465
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
524 enq, mset, hits = index.search(query, sort=self.sort,
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
525 historysearch=self.historysearch)
1234
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
526 clock.stop('_xapianQuery')
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
527
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
528 #logging.info("xapianSearch: finds: %r" % hits)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
529 def dict_decode(d):
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
530 """ decode dict values to unicode """
1793
2a4caa295346 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1792
diff changeset
531 for key in d:
2a4caa295346 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1792
diff changeset
532 d[key] = d[key].decode(config.charset)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
533 return d
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
534 pages = [dict_decode(hit['values']) for hit in hits]
1792
c907c2942372 Eclipse PyDev Check: fixed lots of its errors and warnings
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1791
diff changeset
535 logging.info("xapianSearch: finds pages: %r" % pages)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
536
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
537 self._xapianEnquire = enq
1241
cba856bc0c05 estimate numer of hits correctly
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1237
diff changeset
538 self._xapianMset = mset
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
539 self._xapianIndex = index
921
45e286183872 abstraction work on search engine index & cleanups
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 920
diff changeset
540 except BaseIndex.LockedException:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
541 pass
924
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 921
diff changeset
542 #except AttributeError:
22f6f589162a term-based regexp search
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 921
diff changeset
543 # pages = []
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
544
1225
d7d17b4285f5 ensure timings are correct for _xapianSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1211
diff changeset
545 try:
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
546 # xapian handled the full query
1225
d7d17b4285f5 ensure timings are correct for _xapianSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1211
diff changeset
547 if not self.query.xapian_need_postproc():
1234
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
548 clock.start('_xapianProcess')
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
549 try:
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
550 return self._getHits(hits, self._xapianMatch)
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
551 finally:
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
552 clock.stop('_xapianProcess')
1225
d7d17b4285f5 ensure timings are correct for _xapianSearch
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1211
diff changeset
553 finally:
1234
e38e27967a97 added more timers to xapian code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1225
diff changeset
554 clock.stop('_xapianSearch')
1237
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
555 else:
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
556 # we didn't use xapian in this request
0a947454dec7 use xapian for sorting search results
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1234
diff changeset
557 self.request.cfg.xapian_search = 0
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
558
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
559 # some postprocessing by _moinSearch is required
1197
fc2d00e2bb6b case-sensitive searches work again by using moinSearch for post processing
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1195
diff changeset
560 return self._moinSearch(pages)
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
561
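_xapianSearch wraps its work in nested try/finally blocks so that each timer is stopped even when the method returns early from inside the inner block. The sketch below isolates that control flow in Python 3 syntax. The Clock class is a hypothetical stand-in for request.clock, and the returned strings are placeholders.

```python
class Clock:
    """Hypothetical timer registry that records start/stop calls."""

    def __init__(self):
        self.running = set()
        self.log = []

    def start(self, name):
        self.running.add(name)
        self.log.append(("start", name))

    def stop(self, name):
        self.running.discard(name)
        self.log.append(("stop", name))


def search(clock, need_postproc):
    """Return early when no postprocessing is needed; always stop timers."""
    clock.start("search")
    try:
        if not need_postproc:
            clock.start("process")
            try:
                return "hits from index"  # early return inside try
            finally:
                clock.stop("process")  # runs before the return completes
    finally:
        clock.stop("search")  # runs on both paths
    return "hits after postprocessing"
```

The finally clauses run innermost first, so the timers are stopped in the reverse order of their start.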
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
562 def _xapianMatchDecider(self, term, pos):
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
563 """ Return the correct Match object for a Xapian match
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
564
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
565 @param term: the term as string
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
566 @param pos: starting position of the match
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
567 """
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
568 if term[0] == 'S': # TitleMatch
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
569 return TitleMatch(start=pos, end=pos+len(term)-1)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
570 else: # TextMatch (incl. headers)
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
571 return TextMatch(start=pos, end=pos+len(term))
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
572
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
573 def _xapianMatch(self, uid, page=None):
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
574 """ Get all relevant Xapian matches per document id
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
575
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
576 @param uid: the id of the document in the xapian index
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
577 """
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
578 positions = {}
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
579 term = self._xapianEnquire.get_matching_terms_begin(uid)
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
580 while term != self._xapianEnquire.get_matching_terms_end(uid):
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
581 term_name = term.get_term()
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
582 for pos in self._xapianIndex.termpositions(uid, term_name):
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
583 if pos not in positions or \
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
584 len(positions[pos]) < len(term_name):
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
585 positions[pos] = term_name
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
586 term.next()
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1186
diff changeset
587 matches = [self._xapianMatchDecider(term, pos) for pos, term
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
588 in positions.iteritems()]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
589
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1186
diff changeset
590 if not matches:
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
591 return [Match()] # dummy for metadata, we got a match!
1195
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1186
diff changeset
592
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1186
diff changeset
593 return matches
d40445ea30af bugfixes for regexp & link search and file indexing (i.e. due to xapwrap modifications)
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1186
diff changeset
594
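The position bookkeeping in _xapianMatch and the offset arithmetic in _xapianMatchDecider can be tried out without a Xapian index. The following is a minimal sketch, not the MoinMoin implementation: `longest_term_per_position` and `match_span` are hypothetical helper names, the `(term, positions)` pairs stand in for the Xapian term iterator, and the reading of the `-1` for title terms (compensating for the one-character 'S' prefix) is an assumption.

```python
def longest_term_per_position(term_positions):
    """Map each position to the longest term seen at that position.

    term_positions: iterable of (term, positions) pairs. This is a
    stand-in for walking the matching terms of a document and asking
    the index for each term's position list.
    """
    positions = {}
    for term, term_pos in term_positions:
        for pos in term_pos:
            # Keep the longer term when two terms share a position.
            if pos not in positions or len(positions[pos]) < len(term):
                positions[pos] = term
    return positions


def match_span(term, pos):
    """Return (kind, start, end) with the offsets used in the source."""
    if term[0] == 'S':
        # Title term. Assumption: the -1 drops the 'S' prefix character.
        return ('title', pos, pos + len(term) - 1)
    return ('text', pos, pos + len(term))


if __name__ == '__main__':
    found = longest_term_per_position([
        ('wiki', [3, 10]),
        ('wikis', [3]),
        ('Swiki', [0]),
    ])
    for pos in sorted(found):
        print(match_span(found[pos], pos))
```

When no term positions are available, the source returns a single dummy Match so that the document still counts as a hit.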
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
595 def _moinSearch(self, pages=None):
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
596 """ Search pages using moin's built-in full text search
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
597
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
598 Return list of tuples (page, match). The list may contain
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
599 deleted pages or pages the user may not read.
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
600
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
601 @keyword pages: optional list of pages to search in
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
602 """
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
603 self.request.clock.start('_moinSearch')
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
604 if pages is None:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
605 # if we are not called from _xapianSearch, we make a full pagelist,
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
606 # but don't search attachments (thus attachment name = '')
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
607 pages = [{'pagename': p, 'attachment': '', 'wikiname': 'Self', } for p in self._getPageList()]
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
608 hits = self._getHits(pages, self._moinMatch)
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
609 self.request.clock.stop('_moinSearch')
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
610 return hits
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
611
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
612 def _moinMatch(self, page, uid=None):
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
613 """ Get all matches from regular moinSearch
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
614
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
615 @param page: the current page instance
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
616 """
1522
98b68b87a6f1 fix mimetype selection and showing attachment matches
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1506
diff changeset
617 if page:
98b68b87a6f1 fix mimetype selection and showing attachment matches
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1506
diff changeset
618 return self.query.search(page)
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
619
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
620 def _getHits(self, pages, matchSearchFunction):
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
621 """ Get the hit tuples in pages through matchSearchFunction """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
622 hits = []
1465
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
623 revisionCache = {}
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
624 fs_rootpage = self.fs_rootpage
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
625 for hit in pages:
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
626 if 'values' in hit:
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
627 valuedict = hit['values']
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
628 uid = hit['uid']
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
629 else:
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
630 valuedict = hit
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
631 uid = None
926
134b5ee99046 basic fetching of matches for terms with xapian
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 924
diff changeset
632
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
633 wikiname = valuedict['wikiname']
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
634 pagename = valuedict['pagename']
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
635 attachment = valuedict['attachment']
1441
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
636
1453
5b60d84fddb7 check if there's the revision
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1451
diff changeset
637 if 'revision' in valuedict and valuedict['revision']:
1441
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
638 revision = int(valuedict['revision'])
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
639 else:
1455
d20d04b46ed4 correct pagelinks for older revisions and seach results pages links
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1453
diff changeset
640 revision = 0
1441
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
641
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
642 if wikiname in (self.request.cfg.interwikiname, 'Self'): # THIS wiki
1441
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
643 page = Page(self.request, pagename, rev=revision)
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
644 if not self.historysearch and revision and \
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
645 page.getRevList()[0] != revision:
05482b439f89 optional history indexing and search is working
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1433
diff changeset
646 continue
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
647 if attachment:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
648 if pagename == fs_rootpage: # not really an attachment
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
649 page = Page(self.request, "%s/%s" % (fs_rootpage, attachment))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
650 hits.append((wikiname, page, None, None))
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
651 else:
1522
98b68b87a6f1 fix mimetype selection and showing attachment matches
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1506
diff changeset
652 matches = matchSearchFunction(page=None, uid=uid)
98b68b87a6f1 fix mimetype selection and showing attachment matches
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1506
diff changeset
653 hits.append((wikiname, page, attachment, matches))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
654 else:
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
655 matches = matchSearchFunction(page=page, uid=uid)
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
656 if matches:
1465
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
657 if not self.historysearch and \
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
658 pagename in revisionCache and \
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
659 revisionCache[pagename][0] < revision:
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
660 hits.remove(revisionCache[pagename][1])
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
661 del revisionCache[pagename]
945
248789a3f155 improving positions fetched from xapian, TitleMatch support, bugfixes for the current code
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 926
diff changeset
662 hits.append((wikiname, page, attachment, matches))
1465
f61493ff5e10 added support for xapwrap for collapsing and sorting by relevance/key, ensure to just show the newest revision in _getHits if historysearch is disabled
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1455
diff changeset
663 revisionCache[pagename] = (revision, hits[-1])
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
664 else: # other wiki
1455
d20d04b46ed4 correct pagelinks for older revisions and seach results pages links
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1453
diff changeset
665 hits.append((wikiname, pagename, attachment, None, revision))
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
666 return hits
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
667
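The revisionCache handling in _getHits (the path taken when history search is disabled) can be isolated as below. This is a simplified sketch under stated assumptions: `newest_hits` is a hypothetical name, and hits are reduced to `(pagename, revision)` pairs instead of the `(wikiname, page, attachment, matches)` tuples used in the source.

```python
def newest_hits(candidates):
    """Drop an earlier hit when a newer revision of the same page arrives.

    candidates: iterable of (pagename, revision) pairs in arrival order.
    """
    hits = []
    cache = {}  # pagename -> (revision, hit object stored in hits)
    for pagename, revision in candidates:
        hit = (pagename, revision)
        if pagename in cache and cache[pagename][0] < revision:
            # A newer revision supersedes the cached one.
            hits.remove(cache[pagename][1])
            del cache[pagename]
        hits.append(hit)
        cache[pagename] = (revision, hit)
    return hits


if __name__ == '__main__':
    print(newest_hits([('A', 1), ('B', 2), ('A', 3)]))
```

As in the source, the check only removes an older hit that arrived first. If the newer revision arrives first, both entries stay in the list, so the result depends on the order of the input.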
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
668 def _getPageList(self):
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
669 """ Get list of pages to search in
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
670
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
671 If the query has a page filter, use it to filter pages before
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
672 searching. If not, get an unfiltered page list. The filtering
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
673 will happen later on the hits, which is faster with current
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
674 slow storage.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
675 """
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1484
diff changeset
676 filter_ = self.query.pageFilter()
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1484
diff changeset
677 if filter_:
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
678 # There is no need to filter the results again.
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
679 self.filtered = True
1494
cdcdab722a96 domain-specific search is implemented
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1484
diff changeset
680 return self.request.rootpage.getPageList(filter=filter_)
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
681 else:
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
682 return self.request.rootpage.getPageList(user='', exists=0)
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
683
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
684 def _filter(self, hits):
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
685 """ Filter out deleted or ACL-protected pages
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 2222
diff changeset
686
1467
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
687 @param hits: list of hits
26c8ab85dc86 completed code documentation for MoinMoin.search.builtin
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1466
diff changeset
688 """
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
689 userMayRead = self.request.user.may.read
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
690 fs_rootpage = self.fs_rootpage + "/"
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
691 thiswiki = (self.request.cfg.interwikiname, 'Self')
1433
6b0ea72d7665 mtime search works, added MoinMoin.support.parsedatetime, small fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1241
diff changeset
692 filtered = [(wikiname, page, attachment, match)
6b0ea72d7665 mtime search works, added MoinMoin.support.parsedatetime, small fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1241
diff changeset
693 for wikiname, page, attachment, match in hits
6b0ea72d7665 mtime search works, added MoinMoin.support.parsedatetime, small fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1241
diff changeset
694 if (wikiname not in thiswiki or
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
695 page.exists() and userMayRead(page.page_name) or
1433
6b0ea72d7665 mtime search works, added MoinMoin.support.parsedatetime, small fixes
Franz Pletz <fpletz AT franz-pletz DOT org>
parents: 1241
diff changeset
696 page.page_name.startswith(fs_rootpage)) and
1496
70e94a679c47 cleanup whitespace, add/fix comments
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1494
diff changeset
697 (not self.mtime or self.mtime <= page.mtime_usecs()/1000000)]
919
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
698 return filtered
5469c8b911a4 Splitting out MoinMoin/search.py to MoinMoin/search/*.py
Franz Pletz <fpletz AT franz-pletz DOT org>
parents:
diff changeset
699
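The condition in _filter mixes `and` and `or` without inner parentheses. Because `and` binds more tightly than `or` in Python, it groups as shown in the sketch below. The function `keep_hit` and its boolean parameters are hypothetical stand-ins for the page and user lookups in the source, not MoinMoin API.

```python
def keep_hit(is_local, exists, may_read, under_fs_root,
             mtime_filter, page_mtime):
    """Decide whether a hit survives filtering.

    is_local:      hit belongs to this wiki
    exists:        page exists (is not deleted)
    may_read:      current user may read the page
    under_fs_root: page name starts with the filesystem root page prefix
    mtime_filter:  minimum modification time in seconds, or None
    page_mtime:    page modification time in seconds
    """
    # Grouping implied by precedence: A or (B and C) or D
    visible = (not is_local) or (exists and may_read) or under_fs_root
    # No mtime filter means every page is recent enough.
    recent_enough = (not mtime_filter) or (mtime_filter <= page_mtime)
    return visible and recent_enough


if __name__ == '__main__':
    # Local, existing page the user may not read: filtered out.
    print(keep_hit(True, True, False, False, None, 0))
```

Hits from other wikis pass the visibility part unconditionally, since no local existence or ACL check can be made for them.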