annotate MoinMoin/storage/middleware/indexing.py @ 1918:f0b346ac6338

added return_rev param to store_revision() to avoid returning a revision object without need
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Sun, 20 Jan 2013 23:36:37 +0100
parents 4a054063fbb0
children 2bd864ed640e 5281fec759b1
rev   line source
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1 # Copyright: 2011 MoinMoin:RonnyPfannschmidt
2 # Copyright: 2011 MoinMoin:ThomasWaldmann
3 # Copyright: 2011 MoinMoin:MichaelMayorov
4 # License: GNU GPL v2 (or any later version), see LICENSE.txt for details.
5
6 """
7 MoinMoin - indexing middleware
8
9 The backends and stores moin uses are rather simple: it is mostly just an
10 unsorted / unordered bunch of revisions (meta and data) with iteration.
11
12 The indexer middleware adds the needed power: after all metadata and data
13 are indexed, we can do all sorts of operations on the indexer level:
14 * searching
15 * lookup by name, uuid, ...
16 * selecting
17 * listing
18
19 Using Whoosh (a fast pure-Python indexing and search library), we build,
20 maintain and use 2 indexes:
21
22 * "all revisions" index (big, needed for history search)
23 * "latest revisions" index (smaller, just the current revisions)
24
25 When creating or destroying revisions, indexes are automatically updated.
26
27 There is also code to do a full index rebuild in case it gets damaged, lost
28 or needs rebuilding for other reasons. There is also index update code to
29 do a quick "intelligent" update of a "mostly ok" index, that just adds,
30 updates, deletes stuff that is different in backend compared to current index.
31
32 Indexing is the only layer that can easily deal with **names** (it can
33 easily translate names to UUIDs and vice versa) and with **items** (it
34 knows current revision, it can easily list and order historical revisions),
35 using the index.
36
37 The layers below are using UUIDs to identify revisions meta and data:
38
39 * revid (metaid) - a UUID identifying a specific revision (revision metadata)
40 * dataid - a UUID identifying some specific revision data (optional), it is
41   just stored into revision metadata.
42 * itemid - a UUID identifying an item (== a set of revisions), it is just
43   stored into revision metadata. itemid is only easily usable on indexing
44   level.
45
46 Many methods provided by the indexing middleware will be fast, because they
47 will not access the layers below (like the backend), but just the index files;
48 usually it is even just the small and thus quick latest-revs index.
49 """
50
51
52 from __future__ import absolute_import, division
53
54 import os
55 import shutil
56 import itertools
57 import time
58 import datetime
59 from StringIO import StringIO
60
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
61 from flask import request
62 from flask import g as flaskg
63 from flask import current_app as app
64
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
65 from whoosh.fields import Schema, TEXT, ID, IDLIST, NUMERIC, DATETIME, KEYWORD, BOOLEAN
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
66 from whoosh.index import EmptyIndexError
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
67 from whoosh.writing import AsyncWriter
1009
11a415638e12 fixes #68 search has now the PseudoField username:bla which translates to userid:id_of_bla
Bruno Gola <brunogola@gmail.com>
parents: 975
diff changeset
68 from whoosh.qparser import QueryParser, MultifieldParser, RegexPlugin, \
69                            PseudoFieldPlugin
70 from whoosh.qparser import WordNode
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
71 from whoosh.query import Every, Term
72 from whoosh.sorting import FieldFacet
73
1267
9188844ae351 logging fixes, use logging not print
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1266
diff changeset
74 from MoinMoin import log
75 logging = log.getLogger(__name__)
76
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
77 from MoinMoin.config import WIKINAME, NAME, NAME_EXACT, MTIME, CONTENTTYPE, TAGS, \
1357
5b88a6ae67a5 Add summary meta key to items. Summary is shown on +show and +search, but not
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1337
diff changeset
78                             LANGUAGE, USERID, ADDRESS, HOSTNAME, SIZE, ACTION, COMMENT, SUMMARY, \
1334
7775b101e182 implement EXTERNALLINKS metadata (modified/extended patch of Reimar Bauer)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1274
diff changeset
79                             CONTENT, EXTERNALLINKS, ITEMLINKS, ITEMTRANSCLUSIONS, ACL, EMAIL, OPENID, \
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
80                             ITEMID, REVID, CURRENT, PARENTID, \
1438
f89fffd81540 Add ptime meta key to blog post items, logo and supertags meta keys to blog home items.
Pavel Sviderski <pavel@psviderski.name>
parents: 1357
diff changeset
81                             PTIME, \
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
82                             LATEST_REVS, ALL_REVS, \
83                             CONTENTTYPE_USER
84 from MoinMoin.constants import keys
1647
8652dc6f6443 Add ITEMTYPE to Whoosh index.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1449
diff changeset
85 from MoinMoin.constants.keys import ITEMTYPE
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
86
1009
11a415638e12 fixes #68 search has now the PseudoField username:bla which translates to userid:id_of_bla
Bruno Gola <brunogola@gmail.com>
parents: 975
diff changeset
87 from MoinMoin import user
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
88 from MoinMoin.search.analyzers import item_name_analyzer, MimeTokenizer, AclTokenizer
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
89 from MoinMoin.themes import utctimestamp
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
90 from MoinMoin.util.crypto import make_uuid
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
91 from MoinMoin.storage.middleware.validation import ContentMetaSchema, UserMetaSchema
1274
8c275efc6e8c refactored and cleaned up user module code and tests
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1267
diff changeset
92 from MoinMoin.storage.error import NoSuchItemError, ItemAlreadyExistsError
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
93
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
94
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
95 WHOOSH_FILESTORAGE = 'FileStorage'
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
96 INDEXES = [LATEST_REVS, ALL_REVS, ]
97
98
99 def backend_to_index(meta, content, schema, wikiname):
100     """
101     Convert backend metadata/data to a whoosh document.
102
103     :param meta: revision meta from moin backend
104     :param content: revision data converted to indexable content
105     :param schema: whoosh schema
106     :param wikiname: interwikiname of this wiki
107     :returns: document to put into whoosh index
108     """
109     doc = dict([(str(key), value)
110                 for key, value in meta.items()
111                 if key in schema])
1440
b4d9048ce48a Change the uppercase variable to lowercase
Pavel Sviderski <pavel@psviderski.name>
parents: 1438
diff changeset
112     for key in [MTIME, PTIME]:
113         if key in doc:
1438
f89fffd81540 Add ptime meta key to blog post items, logo and supertags meta keys to blog home items.
Pavel Sviderski <pavel@psviderski.name>
parents: 1357
diff changeset
114             # we have UNIX UTC timestamp (int), whoosh wants datetime
1440
b4d9048ce48a Change the uppercase variable to lowercase
Pavel Sviderski <pavel@psviderski.name>
parents: 1438
diff changeset
115             doc[key] = datetime.datetime.utcfromtimestamp(doc[key])
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
116     doc[NAME_EXACT] = doc[NAME]
117     doc[WIKINAME] = wikiname
118     doc[CONTENT] = content
119     return doc
120
121
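The document-building step in backend_to_index() above can be tried in isolation. The following is a minimal sketch, not the MoinMoin implementation: the key names are hypothetical stand-ins for the constants imported from MoinMoin.config, a plain set of field names stands in for the whoosh schema, and the helper name sketch_backend_to_index is made up for illustration. It uses epoch arithmetic instead of utcfromtimestamp() so that it also runs on current Python versions.

```python
import datetime

# Hypothetical stand-ins for the key constants from MoinMoin.config;
# the real values may differ.
NAME, NAME_EXACT, WIKINAME, CONTENT = "name", "name_exact", "wikiname", "content"
MTIME, PTIME = "mtime", "ptime"

EPOCH = datetime.datetime(1970, 1, 1)


def sketch_backend_to_index(meta, content, schema_fields, wikiname):
    """Build an index document from revision metadata (illustrative only)."""
    # keep only the metadata keys that the (assumed) schema knows about
    doc = dict((str(key), value)
               for key, value in meta.items()
               if key in schema_fields)
    for key in (MTIME, PTIME):
        if key in doc:
            # integer UNIX timestamp (UTC) -> naive UTC datetime
            doc[key] = EPOCH + datetime.timedelta(seconds=doc[key])
    doc[NAME_EXACT] = doc[NAME]
    doc[WIKINAME] = wikiname
    doc[CONTENT] = content
    return doc


meta = {"name": u"Home", "mtime": 86400, "unindexed_key": 1}
fields = set(["name", "name_exact", "wikiname", "content", "mtime", "ptime"])
doc = sketch_backend_to_index(meta, u"hello world", fields, u"MyWiki")
```

Metadata keys missing from the schema are dropped silently, and a missing name raises KeyError, which matches the behaviour of the listing above.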
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
122 from MoinMoin.util.mime import Type, type_moin_document
123 from MoinMoin.util.tree import moin_page
124 from MoinMoin.converter import default_registry
125 from MoinMoin.util.iri import Iri
126
127 def convert_to_indexable(meta, data, is_new=False):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
128     """
129     Convert revision data to indexable content.
130
131     :param meta: revision metadata (gets updated as a side effect)
132     :param data: revision data (file-like)
133                  please make sure that the content file is
134                  ready to read all indexable content from it. if you have just
135                  written that content or already read from it, you need to call
136                  rev.seek(0) before calling convert_to_indexable(rev).
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
137     :param is_new: if this is for a new revision and we shall modify
138                    metadata as a side effect
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
139     :returns: indexable content, text/plain, unicode object
140     """
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
141     class PseudoRev(object):
142         def __init__(self, meta, data):
143             self.meta = meta
144             self.data = data
894
93ae039cb363 fixed indexable and diff views, use revids not revnos for diffs and urls
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 893
diff changeset
145             self.revid = meta.get(REVID)
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
146             class PseudoItem(object):
147                 def __init__(self, name):
148                     self.name = name
149             self.item = PseudoItem(meta.get(NAME))
894
93ae039cb363 fixed indexable and diff views, use revids not revnos for diffs and urls
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 893
diff changeset
150         def read(self, *args, **kw):
151             return self.data.read(*args, **kw)
152         def seek(self, *args, **kw):
153             return self.data.seek(*args, **kw)
154         def tell(self, *args, **kw):
155             return self.data.tell(*args, **kw)
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
156
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
157 rev = PseudoRev(meta, data)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
158 try:
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
159 # TODO use different converter mode?
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
160 # Maybe we want some special mode for the input converters so they emit
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
161 # different output than for normal rendering), esp. for the non-markup
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
162 # content types (images, etc.).
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
163 input_contenttype = meta[CONTENTTYPE]
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
164 output_contenttype = 'text/plain'
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
165 type_input_contenttype = Type(input_contenttype)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
166 type_output_contenttype = Type(output_contenttype)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
167 reg = default_registry
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
168 # first try a direct conversion (this could be useful for extraction
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
169 # of (meta)data from binary types, like from images or audio):
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
170 conv = reg.get(type_input_contenttype, type_output_contenttype)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
171 if conv:
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
172 doc = conv(rev, input_contenttype)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
173 return doc
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
174 # otherwise try via DOM as intermediate format (this is useful if
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
175 # input type is markup, to get rid of the markup):
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
176 input_conv = reg.get(type_input_contenttype, type_moin_document)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
177 refs_conv = reg.get(type_moin_document, type_moin_document, items='refs')
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
178 output_conv = reg.get(type_moin_document, type_output_contenttype)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
179 if input_conv and output_conv:
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
180 doc = input_conv(rev, input_contenttype)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
181 # We do not convert smileys, includes, macros, links, because
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
182 # it does not improve search results or even makes results worse.
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
183 # We do run the referenced converter, though, to extract links and
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
184 # transclusions.
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
185 if is_new:
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
186 # we can only modify new, uncommitted revisions, not stored revs
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
187 i = Iri(scheme='wiki', authority='', path='/' + meta[NAME])
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
188 doc.set(moin_page.page_href, unicode(i))
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
189 refs_conv(doc)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
190 # side effect: we update some metadata:
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
191 meta[ITEMLINKS] = refs_conv.get_links()
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
192 meta[ITEMTRANSCLUSIONS] = refs_conv.get_transclusions()
1334
7775b101e182 implement EXTERNALLINKS metadata (modified/extended patch of Reimar Bauer)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1274
diff changeset
193 meta[EXTERNALLINKS] = refs_conv.get_external_links()
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
194 doc = output_conv(doc)
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
195 return doc
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
196 # no way
969
5bf6d7a2ffcf Convert all %r, %s, %x, %d, %i, %o to new Format String Syntax
Vedran Miletić <rivanvx@gmail.com>
parents: 954
diff changeset
197 raise TypeError("No converter for {0} --> {1}".format(input_contenttype, output_contenttype))
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
198 except Exception as e: # catch all exceptions, we don't want to break an indexing run
969
5bf6d7a2ffcf Convert all %r, %s, %x, %d, %i, %o to new Format String Syntax
Vedran Miletić <rivanvx@gmail.com>
parents: 954
diff changeset
199 logging.exception("Exception happened in conversion of item {0!r} rev {1} contenttype {2}:".format(meta[NAME], meta.get(REVID, 'new'), meta.get(CONTENTTYPE, '')))
5bf6d7a2ffcf Convert all %r, %s, %x, %d, %i, %o to new Format String Syntax
Vedran Miletić <rivanvx@gmail.com>
parents: 954
diff changeset
200 doc = u'ERROR [{0!s}]'.format(e)
893
3cbdb8f14e98 content indexing: do a real convert_to_indexable implementation
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 892
diff changeset
201 return doc
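The lookup order used above (direct converter first, then a detour through an intermediate document format, then an error string instead of a raised exception) can be illustrated in isolation. The following is a minimal sketch, not the MoinMoin converter registry: `find_conversion`, `safe_convert`, the dict-based `registry`, and the content type names are all hypothetical stand-ins chosen for illustration.

```python
def find_conversion(registry, input_type, output_type, intermediate="dom"):
    """Return a callable that converts input_type to output_type.

    `registry` is a plain dict mapping (source, target) pairs to callables.
    It is a hypothetical stand-in for a real converter registry.
    """
    # 1. Prefer a direct converter if one is registered.
    direct = registry.get((input_type, output_type))
    if direct:
        return direct
    # 2. Otherwise chain two converters through the intermediate format.
    to_mid = registry.get((input_type, intermediate))
    from_mid = registry.get((intermediate, output_type))
    if to_mid and from_mid:
        return lambda data: from_mid(to_mid(data))
    # 3. No route at all.
    raise TypeError("No converter for {0} --> {1}".format(input_type, output_type))


def safe_convert(registry, data, input_type, output_type):
    """Never raise; return an error marker string so a batch run can go on."""
    try:
        return find_conversion(registry, input_type, output_type)(data)
    except Exception as e:
        return u"ERROR [{0!s}]".format(e)


# Toy registry: markup -> list of words -> plain text without emphasis marks.
registry = {
    ("text/x-markup", "dom"): lambda text: text.split(),
    ("dom", "text/plain"): lambda words: " ".join(w.strip("*") for w in words),
}
```

With this toy registry, markup input takes the two-step route, while an unregistered type such as `image/png` yields an `ERROR [...]` string rather than an exception, which mirrors the "do not break an indexing run" intent of the listing.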
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
202
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
203
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
204 class IndexingMiddleware(object):
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
205 def __init__(self, index_storage, backend, wiki_name=None, acl_rights_contents=[], **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
206 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
207 Store params, create schemas.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
208 """
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
209 self.index_storage = index_storage
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
210 self.backend = backend
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
211 self.wikiname = wiki_name
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
212 self.ix = {} # open indexes
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
213 self.schemas = {} # existing schemas
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
214
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
215 common_fields = {
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
216 # wikiname so we can have a shared index in a wiki farm, always check this!
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
217 WIKINAME: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
218 # tokenized NAME from metadata - use this for manual searching from UI
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
219 NAME: TEXT(stored=True, multitoken_query="and", analyzer=item_name_analyzer(), field_boost=2.0),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
220 # unmodified NAME from metadata - use this for precise lookup by the code.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
221 # also needed for wildcard search, so the original string as well as the query
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
222 # (with the wildcard) is not cut into pieces.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
223 NAME_EXACT: ID(field_boost=3.0),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
224 # revision id (aka meta id)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
225 REVID: ID(unique=True, stored=True),
892
79f80ce3cf78 implement parents (key PARENTID), add it to index schema
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 886
diff changeset
226 # parent revision id
79f80ce3cf78 implement parents (key PARENTID), add it to index schema
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 886
diff changeset
227 PARENTID: ID(stored=True),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
228 # MTIME from revision metadata (converted to UTC datetime)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
229 MTIME: DATETIME(stored=True),
1777
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
230 # publish time from metadata (converted to UTC datetime)
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
231 PTIME: DATETIME(stored=True),
1647
8652dc6f6443 Add ITEMTYPE to Whoosh index.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1449
diff changeset
232 # ITEMTYPE from metadata, always matched exactly hence ID
8652dc6f6443 Add ITEMTYPE to Whoosh index.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1449
diff changeset
233 ITEMTYPE: ID(stored=True),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
234 # tokenized CONTENTTYPE from metadata
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
235 CONTENTTYPE: TEXT(stored=True, multitoken_query="and", analyzer=MimeTokenizer()),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
236 # unmodified list of TAGS from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
237 TAGS: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
238 LANGUAGE: ID(stored=True),
883
70bd8a85eee5 indexing: checked some stuff from comments, cosmetic changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 870
diff changeset
239 # USERID from metadata
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
240 USERID: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
241 # ADDRESS from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
242 ADDRESS: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
243 # HOSTNAME from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
244 HOSTNAME: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
245 # SIZE from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
246 SIZE: NUMERIC(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
247 # ACTION from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
248 ACTION: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
249 # tokenized COMMENT from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
250 COMMENT: TEXT(stored=True),
1357
5b88a6ae67a5 Add summary meta key to items. Summary is shown on +show and +search, but not
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1337
diff changeset
251 # SUMMARY from metadata
5b88a6ae67a5 Add summary meta key to items. Summary is shown on +show and +search, but not
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1337
diff changeset
252 SUMMARY: TEXT(stored=True),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
253 # data (content), converted to text/plain and tokenized
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
254 CONTENT: TEXT(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
255 }
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
256
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
257 latest_revs_fields = {
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
258 # ITEMID from metadata - as there is only latest rev of same item here, it is unique
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
259 ITEMID: ID(unique=True, stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
260 # unmodified list of ITEMLINKS from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
261 ITEMLINKS: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
262 # unmodified list of ITEMTRANSCLUSIONS from metadata
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
263 ITEMTRANSCLUSIONS: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
264 # tokenized ACL from metadata
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
265 ACL: TEXT(analyzer=AclTokenizer(acl_rights_contents), multitoken_query="and", stored=True),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
266 }
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
267 latest_revs_fields.update(**common_fields)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
268
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
269 userprofile_fields = {
1369
e34b627881e4 EMAIL and OPENID fields are not unique as we store empty values there, fixes #185
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1357
diff changeset
270 # Note: email / openid (if given) should be unique, but we might
e34b627881e4 EMAIL and OPENID fields are not unique as we store empty values there, fixes #185
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1357
diff changeset
271 # have lots of empty values if it is not given and thus it is NOT
e34b627881e4 EMAIL and OPENID fields are not unique as we store empty values there, fixes #185
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1357
diff changeset
272 # unique overall! Wrongly declaring it unique would lead to whoosh
e34b627881e4 EMAIL and OPENID fields are not unique as we store empty values there, fixes #185
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1357
diff changeset
273 # killing other users from index when update_document() is called!
e34b627881e4 EMAIL and OPENID fields are not unique as we store empty values there, fixes #185
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1357
diff changeset
274 EMAIL: ID(stored=True),
e34b627881e4 EMAIL and OPENID fields are not unique as we store empty values there, fixes #185
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1357
diff changeset
275 OPENID: ID(stored=True),
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
276 }
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
277 latest_revs_fields.update(**userprofile_fields)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
278
1678
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
279 # XXX This is a highly ad hoc way to support indexing of ticket items.
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
280 ticket_fields = {
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
281 'effort': NUMERIC(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
282 'difficulty': NUMERIC(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
283 'severity': NUMERIC(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
284 'priority': NUMERIC(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
285 'status': ID(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
286 'assigned_to': ID(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
287 'superseded_by': ID(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
288 'depends_on': ID(stored=True),
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
289 }
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
290 latest_revs_fields.update(**ticket_fields)
51e1e8b0b7f2 Implement Ticket item as 'ticket' itemtype.
Cheer Xiao <xiaqqaix@gmail.com>
parents: 1647
diff changeset
291
1449
b472a506db2e Fix supertags widget. Clicking on supertags modify the filter for the blog entries main listing.
Pavel Sviderski <pavel@psviderski.name>
parents: 1445
diff changeset
292 blog_entry_fields = {
1438
f89fffd81540 Add ptime meta key to blog post items, logo and supertags meta keys to blog home items.
Pavel Sviderski <pavel@psviderski.name>
parents: 1357
diff changeset
293 }
1449
b472a506db2e Fix supertags widget. Clicking on supertags modify the filter for the blog entries main listing.
Pavel Sviderski <pavel@psviderski.name>
parents: 1445
diff changeset
294 latest_revs_fields.update(**blog_entry_fields)
1438
f89fffd81540 Add ptime meta key to blog post items, logo and supertags meta keys to blog home items.
Pavel Sviderski <pavel@psviderski.name>
parents: 1357
diff changeset
295
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
296 all_revs_fields = {
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
297 ITEMID: ID(stored=True),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
298 }
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
299 all_revs_fields.update(**common_fields)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
300
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
301 latest_revisions_schema = Schema(**latest_revs_fields)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
302 all_revisions_schema = Schema(**all_revs_fields)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
303
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
304 # Define dynamic fields
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
305 dynamic_fields = [("*_id", ID(stored=True)),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
306 ("*_text", TEXT(stored=True)),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
307 ("*_keyword", KEYWORD(stored=True)),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
308 ("*_numeric", NUMERIC(stored=True)),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
309 ("*_datetime", DATETIME(stored=True)),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
310 ("*_boolean", BOOLEAN(stored=True)),
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
311 ]
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
312
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
313 # Adding dynamic fields to schemas
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
314 for glob, field_type in dynamic_fields:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
315 latest_revisions_schema.add(glob, field_type, glob=True)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
316 all_revisions_schema.add(glob, field_type, glob=True)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
317
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
318 # schemas are needed by query parser and for index creation
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
319 self.schemas[ALL_REVS] = all_revisions_schema
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
320 self.schemas[LATEST_REVS] = latest_revisions_schema
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
321
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
322 # what fields could whoosh result documents have (no matter whether all revs index
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
323 # or latest revs index):
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
324 self.common_fields = set(latest_revs_fields.keys()) & set(all_revs_fields.keys())
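The way the two schemas are assembled from field dicts, and how the shared field set falls out of a key intersection, can be shown with plain dictionaries. This is a simplified sketch: the string values stand in for the Whoosh field type objects, and the field names are a small illustrative subset, not the full schema.

```python
# Fields that every revision document carries (illustrative subset).
common_fields = {"name": "TEXT", "revid": "ID", "mtime": "DATETIME"}

# The latest-revisions index adds fields that only make sense per item.
latest_revs_fields = {"itemid": "ID(unique)", "acl": "TEXT"}
latest_revs_fields.update(common_fields)

# The all-revisions index also has an item id, but it is not unique there.
all_revs_fields = {"itemid": "ID"}
all_revs_fields.update(common_fields)

# Names present in both indexes, regardless of how each index types them.
shared = set(latest_revs_fields) & set(all_revs_fields)
```

Note that the intersection is taken over key names only, so `itemid` ends up in `shared` even though it is not part of the `common_fields` dict and is declared differently in each index. The same holds for `self.common_fields` in the listing, which is therefore a superset of the local `common_fields` keys.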
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
325
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
326 def get_storage_params(self, tmp=False):
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
327 kind, params, kw = self.index_storage
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
328 params, kw = list(params), dict(kw) # better make a (mutable) copy
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
329 if kind == WHOOSH_FILESTORAGE:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
330 # index_storage = 'FileStorage', (index_dir, ), {}
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
331 if tmp:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
332 params[0] += '.temp'
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
333 from whoosh.filedb.filestore import FileStorage
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
334 cls = FileStorage
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
335 else:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
336 raise ValueError("index_storage = {0!r} is not supported!".format(kind))
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
337 return kind, cls, params, kw
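The "make a (mutable) copy" step matters because the configured storage tuple is reused for both the normal and the temporary index location. A minimal sketch of that pattern follows; `storage_params`, `config`, and the example path are hypothetical and only mirror the shape `(kind, params, kw)` shown in the listing's comment.

```python
def storage_params(index_storage, tmp=False):
    """Return (kind, params, kw) with params/kw copied before modification."""
    kind, params, kw = index_storage
    # Copy first: params may be an immutable tuple, and even if it were a
    # list, modifying it in place would change the shared configuration.
    params, kw = list(params), dict(kw)
    if tmp:
        params[0] += ".temp"
    return kind, params, kw


# Example configuration in the (kind, params, kw) shape.
config = ("FileStorage", ("/srv/wiki/index",), {})
```

Calling `storage_params(config, tmp=True)` gives a params list whose first entry ends in `.temp`, while `config` itself stays untouched, so a later call with `tmp=False` still sees the original directory.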
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
338
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
339 def get_storage(self, tmp=False, create=False):
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
340 """
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
341 Get the whoosh storage (whoosh supports different kinds of storage,
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
342 e.g. to filesystem or to GAE).
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
343 Currently we only support the FileStorage.
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
344 """
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
345 kind, cls, params, kw = self.get_storage_params(tmp)
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
346 if kind == WHOOSH_FILESTORAGE:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
347 if create:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
348 index_dir = params[0]
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
349 try:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
350 os.mkdir(index_dir)
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
351 except:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
352 # ignore exception, we'll get another exception below
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
353 # in case there are problems with the index_dir
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
354 pass
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
355 return cls(*params, **kw)
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
356
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
357 def open(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
358 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
359 Open all indexes.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
360 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
361 storage = self.get_storage()
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
362 for name in INDEXES:
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
363 self.ix[name] = storage.open_index(name)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
364
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
365 def close(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
366 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
367 Close all indexes.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
368 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
369 for name in self.ix:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
370 self.ix[name].close()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
371 self.ix = {}
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
372
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
373 def create(self, tmp=False):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
374 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
375 Create all indexes (empty).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
376 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
377 storage = self.get_storage(tmp, create=True)
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
378 for name in INDEXES:
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
379 storage.create_index(self.schemas[name], indexname=name)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
380
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
381 def destroy(self, tmp=False):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
382 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
383 Destroy all indexes.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
384 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
385 # XXX this is whoosh backend specific and currently only works for FileStorage.
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
386 kind, cls, params, kw = self.get_storage_params(tmp)
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
387 if kind == WHOOSH_FILESTORAGE:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
388 index_dir = params[0]
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
389 if os.path.exists(index_dir):
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
390 shutil.rmtree(index_dir)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
391
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
392 def move_index(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
393 """
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
394 Move freshly built indexes from tmp storage to normal storage
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
395 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
396 # XXX this is whoosh backend specific and currently only works for FileStorage.
1780
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
397 kind, cls, params, kw = self.get_storage_params(False)
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
398 if kind == WHOOSH_FILESTORAGE:
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
399 _, _, params_tmp, _ = self.get_storage_params(True)
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
400 self.destroy()
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
401 index_dir, index_dir_tmp = params[0], params_tmp[0]
4a054063fbb0 make whoosh storage type and params configurable, updated docs
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1779
diff changeset
402 os.rename(index_dir_tmp, index_dir)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
403
908
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
404 def index_revision(self, meta, content, async=True):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
405 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
406 Index a single revision, add it to all-revs and latest-revs index.
908
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
407
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
408 :param meta: metadata dict
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
409 :param content: preprocessed (filtered) indexable content
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
410 :param async: if True, use the AsyncWriter, otherwise use normal writer
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
411 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
412 doc = backend_to_index(meta, content, self.schemas[ALL_REVS], self.wikiname)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
413 if async:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
414 writer = AsyncWriter(self.ix[ALL_REVS])
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
415 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
416 writer = self.ix[ALL_REVS].writer()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
417 with writer as writer:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
418 writer.update_document(**doc) # update, because store_revision() may give us an existing revid
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
419 doc = backend_to_index(meta, content, self.schemas[LATEST_REVS], self.wikiname)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
420 if async:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
421 writer = AsyncWriter(self.ix[LATEST_REVS])
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
422 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
423 writer = self.ix[LATEST_REVS].writer()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
424 with writer as writer:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
425 writer.update_document(**doc)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
426
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
427 def remove_revision(self, revid, async=True):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
428 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
429 Remove a single revision from indexes.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
430 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
431 if async:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
432 writer = AsyncWriter(self.ix[ALL_REVS])
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
433 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
434 writer = self.ix[ALL_REVS].writer()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
435 with writer as writer:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
436 writer.delete_by_term(REVID, revid)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
437 if async:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
438 writer = AsyncWriter(self.ix[LATEST_REVS])
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
439 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
440 writer = self.ix[LATEST_REVS].writer()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
441 with writer as writer:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
442 # find out itemid related to the revid we want to remove:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
443 with self.ix[LATEST_REVS].searcher() as searcher:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
444 docnum_remove = searcher.document_number(revid=revid)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
445 if docnum_remove is not None:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
446 itemid = searcher.stored_fields(docnum_remove)[ITEMID]
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
447 if docnum_remove is not None:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
448 # we are removing a revid that is in latest revs index
1337
3b1de2b38587 search/indexing: enable the query parser RegexPlugin, remove a workaround for a fixed whoosh issue
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1336
diff changeset
449 latest_names_revids = self._find_latest_names_revids(self.ix[ALL_REVS], Term(ITEMID, itemid))
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
450 if latest_names_revids:
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
451 # we have a latest revision, just update the document in the index:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
452 assert len(latest_names_revids) == 1 # this item must have only one latest revision
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
453 latest_name_revid = latest_names_revids[0]
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
454 # we must fetch from backend because schema for LATEST_REVS is different than for ALL_REVS
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
455 # (and we can't be sure we have all fields stored, either)
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
456 meta, _ = self.backend.retrieve(*latest_name_revid)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
457 # we only use meta (not data), because we do not want to transform data->content again (this
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
458 # is potentially expensive) as we already have the transformed content stored in ALL_REVS index:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
459 with self.ix[ALL_REVS].searcher() as searcher:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
460 doc = searcher.document(revid=latest_name_revid[1])
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
461 content = doc[CONTENT]
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
462 doc = backend_to_index(meta, content, self.schemas[LATEST_REVS], self.wikiname)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
463 writer.update_document(**doc)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
464 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
465 # there is no revision left in this item that could be the new "latest rev", just kill the rev
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
466 writer.delete_document(docnum_remove)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
467
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
468 def _modify_index(self, index, schema, wikiname, revids, mode='add', procs=1, limitmb=256):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
469 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
470 modify index contents - add, update, delete the indexed documents for all given revids
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
471
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
472 Note: mode == 'add' is faster but you need to make sure to not create duplicate
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
473 documents in the index.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
474 """
1336
b8652b85a5fa simplify creation of whoosh's index writer for whoosh >2.3.2
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1334
diff changeset
475 with index.writer(procs=procs, limitmb=limitmb) as writer:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
476 for mountpoint, revid in revids:
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
477 if mode in ['add', 'update', ]:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
478 meta, data = self.backend.retrieve(mountpoint, revid)
908
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
479 content = convert_to_indexable(meta, data, is_new=False)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
480 doc = backend_to_index(meta, content, schema, wikiname)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
481 if mode == 'update':
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
482 writer.update_document(**doc)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
483 elif mode == 'add':
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
484 writer.add_document(**doc)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
485 elif mode == 'delete':
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
486 writer.delete_by_term(REVID, revid)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
487 else:
969
5bf6d7a2ffcf Convert all %r, %s, %x, %d, %i, %o to new Format String Syntax
Vedran Miletić <rivanvx@gmail.com>
parents: 954
diff changeset
488 raise ValueError("mode must be 'update', 'add' or 'delete', not '{0}'".format(mode))
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
489
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
490 def _find_latest_names_revids(self, index, query=None):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
491 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
492 find the latest revids using the all-revs index
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
493
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
494 :param index: an up-to-date and open ALL_REVS index
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
495 :param query: query to search only specific revisions (optional, default: all items/revisions)
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
496 :returns: a list of tuples (name, latest revid)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
497 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
498 if query is None:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
499 query = Every()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
500 with index.searcher() as searcher:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
501 result = searcher.search(query, groupedby=ITEMID, sortedby=FieldFacet(MTIME, reverse=True))
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
502 by_item = result.groups(ITEMID)
952
308ba8de39c4 simplify code for finding latest revisions, whoosh >= 2.3.0 now has group value lists in right order
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 951
diff changeset
503 # values in v list are in same relative order as in results, so latest MTIME is first:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
504 latest_names_revids = [(searcher.stored_fields(v[0])[NAME],
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
505 searcher.stored_fields(v[0])[REVID])
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
506 for v in by_item.values()]
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
507 return latest_names_revids
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
508
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
509 def rebuild(self, tmp=False, procs=1, limitmb=256):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
510 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
511 Add all items/revisions from the backends of this wiki to the index
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
512 (which is expected to have no items/revisions from this wiki yet).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
513
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
514 Note: index might be shared by multiple wikis, so it is:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
515 create, rebuild wiki1, rebuild wiki2, ...
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
516 create (tmp), rebuild wiki1, rebuild wiki2, ..., move
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
517 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
518 storage = self.get_storage(tmp)
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
519 index = storage.open_index(ALL_REVS)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
520 try:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
521 # build an index of all we have (so we know what we have)
883
70bd8a85eee5 indexing: checked some stuff from comments, cosmetic changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 870
diff changeset
522 all_revids = self.backend # the backend is an iterator over all revids
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
523 self._modify_index(index, self.schemas[ALL_REVS], self.wikiname, all_revids, 'add', procs, limitmb)
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
524 latest_names_revids = self._find_latest_names_revids(index)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
525 finally:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
526 index.close()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
527 # now build the index of the latest revisions:
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
528 index = storage.open_index(LATEST_REVS)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
529 try:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
530 self._modify_index(index, self.schemas[LATEST_REVS], self.wikiname, latest_names_revids, 'add', procs, limitmb)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
531 finally:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
532 index.close()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
533
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
534 def update(self, tmp=False):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
535 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
536 Make sure index reflects current backend state, add missing stuff, remove outdated stuff.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
537
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
538 This is intended to be used:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
539 * after a full rebuild that was done at tmp location
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
540 * after wiki is made read-only or taken offline
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
541 * after the index was moved to the normal index location
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
542
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
543 Reason: new revisions created after the rebuild started might be missing from the new index.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
544
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
545 :returns: index changed (bool)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
546 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
547 storage = self.get_storage(tmp)
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
548 index_all = storage.open_index(ALL_REVS)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
549 try:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
550 # NOTE: self.backend iterator gives (mountpoint, revid) tuples, which is NOT
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
551 # the same as (name, revid), thus we do the set operations just on the revids.
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
552 # first update ALL_REVS index:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
553 revids_mountpoints = dict((revid, mountpoint) for mountpoint, revid in self.backend)
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
554 backend_revids = set(revids_mountpoints)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
555 with index_all.searcher() as searcher:
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
556 ix_revids_names = dict((doc[REVID], doc[NAME]) for doc in searcher.all_stored_fields())
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
557 revids_mountpoints.update(ix_revids_names) # this is needed for stuff that was deleted from storage
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
558 ix_revids = set(ix_revids_names)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
559 add_revids = backend_revids - ix_revids
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
560 del_revids = ix_revids - backend_revids
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
561 changed = bool(add_revids or del_revids)  # docstring promises a bool, not a set
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
562 add_revids = [(revids_mountpoints[revid], revid) for revid in add_revids]
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
563 del_revids = [(revids_mountpoints[revid], revid) for revid in del_revids]
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
564 self._modify_index(index_all, self.schemas[ALL_REVS], self.wikiname, add_revids, 'add')
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
565 self._modify_index(index_all, self.schemas[ALL_REVS], self.wikiname, del_revids, 'delete')
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
566
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
567 backend_latest_names_revids = set(self._find_latest_names_revids(index_all))
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
568 finally:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
569 index_all.close()
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
570 index_latest = storage.open_index(LATEST_REVS)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
571 try:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
572 # now update LATEST_REVS index:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
573 with index_latest.searcher() as searcher:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
574 ix_revids = set(doc[REVID] for doc in searcher.all_stored_fields())
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
575 backend_latest_revids = set(revid for name, revid in backend_latest_names_revids)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
576 upd_revids = backend_latest_revids - ix_revids
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
577 upd_revids = [(revids_mountpoints[revid], revid) for revid in upd_revids]
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
578 self._modify_index(index_latest, self.schemas[LATEST_REVS], self.wikiname, upd_revids, 'update')
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
579 self._modify_index(index_latest, self.schemas[LATEST_REVS], self.wikiname, del_revids, 'delete')
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
580 finally:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
581 index_latest.close()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
582 return changed
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
583
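The ALL_REVS part of update() above is a set reconciliation between the revids the backend has and the revids the index has. A minimal standalone sketch of that step, assuming plain Python containers in place of the backend iterator and the whoosh searcher (the function name `reconcile` and the dict keys `"revid"` and `"name"` are illustrative, not part of MoinMoin):

```python
def reconcile(backend_pairs, index_docs):
    """Compute which revisions to add to / delete from an index.

    backend_pairs: iterable of (mountpoint, revid) tuples (stand-in for self.backend)
    index_docs: iterable of dicts with "revid" and "name" keys (stand-in for stored fields)
    """
    revids_mountpoints = dict((revid, mountpoint) for mountpoint, revid in backend_pairs)
    backend_revids = set(revids_mountpoints)
    ix_revids_names = dict((doc["revid"], doc["name"]) for doc in index_docs)
    # revisions that exist only in the index still need an entry to look up
    revids_mountpoints.update(ix_revids_names)
    ix_revids = set(ix_revids_names)
    add_revids = backend_revids - ix_revids  # in the backend, missing from the index
    del_revids = ix_revids - backend_revids  # in the index, gone from the backend
    changed = bool(add_revids or del_revids)
    to_add = sorted((revids_mountpoints[revid], revid) for revid in add_revids)
    to_del = sorted((revids_mountpoints[revid], revid) for revid in del_revids)
    return to_add, to_del, changed


backend = [("", "r1"), ("", "r2"), ("users", "r3")]
index = [{"revid": "r2", "name": "Home"}, {"revid": "r9", "name": "Gone"}]
to_add, to_del, changed = reconcile(backend, index)
```

The set operations run on revids only, as the NOTE in the method explains: the backend yields mountpoints while the index stores names, so the tuples themselves are not comparable.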
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
584 def optimize_backend(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
585 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
586 Optimize backend / collect garbage to save space:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
587
904
46263060782e remove references to "trash" (is an implementation detail and might change)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 894
diff changeset
588 * deleted items: destroy them? use a deleted_max_age?
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
589 * user profiles: only keep latest revision?
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
590 * normal wiki items: keep by max_revisions_count / max_age
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
591 * deduplicate data (determine dataids with same hash, fix references to point to one of them)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
592 * remove unreferenced dataids (destroyed revisions, deduplicated stuff)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
593 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
594 # TODO
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
595
917
772c4c8db164 fix optimize_index to accept a tmp param
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 916
diff changeset
596 def optimize_index(self, tmp=False):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
597 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
598 Optimize whoosh index.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
599 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
600 storage = self.get_storage(tmp)
917
772c4c8db164 fix optimize_index to accept a tmp param
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 916
diff changeset
601 for name in INDEXES:
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
602 ix = storage.open_index(name)
945
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
603 try:
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
604 ix.optimize()
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
605 finally:
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
606 ix.close()
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
607
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
608 def dump(self, tmp=False, idx_name=LATEST_REVS):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
609 """
870
04cdc3823cac add index-dump script command
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 869
diff changeset
610 Yield a list of key/value tuples for each document in the index idx_name: NAME first, CONTENT last, other fields sorted.
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
611 """
1779
082581e8688c indexing: move creation of the whoosh storage object to get_storage method
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1778
diff changeset
612 storage = self.get_storage(tmp)
1778
d56cd193cca0 whoosh: do not use convenience functions open_dir and create_in
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1777
diff changeset
613 ix = storage.open_index(idx_name)
945
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
614 try:
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
615 with ix.searcher() as searcher:
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
616 for doc in searcher.all_stored_fields():
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
617 name = doc.pop(NAME, u"")
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
618 content = doc.pop(CONTENT, u"")
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
619 yield [(NAME, name), ] + sorted(doc.items()) + [(CONTENT, content), ]
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
620 finally:
2ba60188efde indexing: add index close at some places, use try/finally
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 929
diff changeset
621 ix.close()
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
622
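The ordering that dump() above produces for each document can be shown in isolation. This is only a sketch: the helper name `ordered_fields` and the literal keys `"name"` and `"content"` are assumptions standing in for the NAME and CONTENT constants.

```python
def ordered_fields(doc, name_key="name", content_key="content"):
    """Return (key, value) tuples: name first, content last, the rest sorted by key."""
    doc = dict(doc)  # work on a copy so the caller's dict is not modified
    name = doc.pop(name_key, u"")
    content = doc.pop(content_key, u"")
    return [(name_key, name)] + sorted(doc.items()) + [(content_key, content)]


fields = ordered_fields({"content": "text", "revid": "r1", "name": "Home", "itemid": "i1"})
```

Putting the name first and the (possibly long) content last keeps a textual dump easy to scan and to diff.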
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
623 def query_parser(self, default_fields, idx_name=LATEST_REVS):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
624 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
625 Build a query parser for a list of default fields.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
626 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
627 schema = self.schemas[idx_name]
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
628 if len(default_fields) > 1:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
629 qp = MultifieldParser(default_fields, schema=schema)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
630 elif len(default_fields) == 1:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
631 qp = QueryParser(default_fields[0], schema=schema)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
632 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
633 raise ValueError("default_fields list must contain at least one field name")
1337
3b1de2b38587 search/indexing: enable the query parser RegexPlugin, remove a workaround for a fixed whoosh issue
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1336
diff changeset
634 qp.add_plugin(RegexPlugin())
1740
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
635 def userid_pseudo_field_factory(fieldname):
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
636 """Generate a translator function that searches for the userid
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
637 in the given fieldname when provided with the username
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
638 """
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
639 def userid_pseudo_field(node):
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
640 username = node.text
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
641 users = user.search_users(**{NAME_EXACT: username})
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
642 if users:
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
643 userid = users[0].meta[ITEMID]
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
644 node = WordNode(userid)
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
645 node.set_fieldname(fieldname)
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
646 return node
1009
11a415638e12 fixes #68 search has now the PseudoField username:bla which translates to userid:id_of_bla
Bruno Gola <brunogola@gmail.com>
parents: 975
diff changeset
647 return node
1740
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
648 return userid_pseudo_field
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
649 qp.add_plugin(PseudoFieldPlugin(dict(
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
650 # username:JoeDoe searches for revisions modified by JoeDoe
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
651 username=userid_pseudo_field_factory(keys.USERID),
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
652 # assigned:JoeDoe searches for tickets assigned to JoeDoe
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
653 assigned=userid_pseudo_field_factory('assigned_to'), # XXX should be keys.ASSIGNED_TO
87bed5e28b0b extend parser for username search related to assigned_to ticket field
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1678
diff changeset
654 )))
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
655 return qp
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
656
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
657 def search(self, q, idx_name=LATEST_REVS, **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
658 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
659 Search with query q, yield Revisions.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
660 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
661 with self.ix[idx_name].searcher() as searcher:
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
662 # Note: callers must consume everything we yield, so the for loop
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
663 # ends and the "with" is left to close the index files.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
664 for hit in searcher.search(q, **kw):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
665 doc = hit.fields()
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
666 latest_doc = doc if idx_name == LATEST_REVS else None
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
667 item = Item(self, latest_doc=latest_doc, itemid=doc[ITEMID])
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
668 yield item.get_revision(doc[REVID], doc=doc)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
669
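The note in search() about callers consuming everything that is yielded follows from how a generator interacts with a "with" block: the block is only left once the generator runs to its end (or is closed). A small sketch, assuming a fake searcher context manager that merely records open/close events (all names here are illustrative, none are whoosh APIs):

```python
from contextlib import contextmanager

events = []


@contextmanager
def fake_searcher():
    # stand-in for index.searcher(); records when it is opened and closed
    events.append("open")
    try:
        yield ["doc1", "doc2"]
    finally:
        events.append("close")


def search():
    with fake_searcher() as hits:
        for hit in hits:
            yield hit


gen = search()
first = next(gen)                # the generator is suspended inside the "with" block
events_after_first = list(events)
rest = list(gen)                 # exhausting the generator leaves the "with" block
events_after_all = list(events)
```

After the first `next()` only the open event has been recorded; the close event appears once the generator is exhausted. A caller that stops early should therefore either finish iterating or call `gen.close()` explicitly.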
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
670 def search_page(self, q, idx_name=LATEST_REVS, pagenum=1, pagelen=10, **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
671 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
672 Same as search, but with paging support.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
673 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
674 with self.ix[idx_name].searcher() as searcher:
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
675 # Note: callers must consume everything we yield, so the for loop
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
676 # ends and the "with" is left to close the index files.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
677 for hit in searcher.search_page(q, pagenum, pagelen=pagelen, **kw):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
678 doc = hit.fields()
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
679 latest_doc = doc if idx_name == LATEST_REVS else None
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
680 item = Item(self, latest_doc=latest_doc, itemid=doc[ITEMID])
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
681 yield item.get_revision(doc[REVID], doc=doc)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
682
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
683 def documents(self, idx_name=LATEST_REVS, **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
684 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
685 Yield Revisions matching the kw args.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
686 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
687 for doc in self._documents(idx_name, **kw):
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
688 latest_doc = doc if idx_name == LATEST_REVS else None
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
689 item = Item(self, latest_doc=latest_doc, itemid=doc[ITEMID])
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
690 yield item.get_revision(doc[REVID], doc=doc)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
691
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
692 def _documents(self, idx_name=LATEST_REVS, **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
693 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
694 Yield documents matching the kw args (internal use only).
948
77ef4841b4f4 yielding all documents for no kw args is done by whoosh now
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 945
diff changeset
695
77ef4841b4f4 yielding all documents for no kw args is done by whoosh now
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 945
diff changeset
696 If no kw args are given, this yields all documents.
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
697 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
698 with self.ix[idx_name].searcher() as searcher:
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
699 # Note: callers must consume everything we yield, so the for loop
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
700 # ends and the "with" is left to close the index files.
948
77ef4841b4f4 yielding all documents for no kw args is done by whoosh now
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 945
diff changeset
701 for doc in searcher.documents(**kw):
77ef4841b4f4 yielding all documents for no kw args is done by whoosh now
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 945
diff changeset
702 yield doc
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
703
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
704 def document(self, idx_name=LATEST_REVS, **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
705 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
706 Return a Revision matching the kw args.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
707 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
708 doc = self._document(idx_name, **kw)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
709 if doc:
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
710 latest_doc = doc if idx_name == LATEST_REVS else None
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
711 item = Item(self, latest_doc=latest_doc, itemid=doc[ITEMID])
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
712 return item.get_revision(doc[REVID], doc=doc)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
713
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
714 def _document(self, idx_name=LATEST_REVS, **kw):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
715 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
716 Return a document matching the kw args (internal use only).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
717 """
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
718 with self.ix[idx_name].searcher() as searcher:
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
719 return searcher.document(**kw)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
720
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
721 def has_item(self, name):
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
722 item = self[name]
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
723 return bool(item)
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
724
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
725 def __getitem__(self, name):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
726 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
727 Return item with <name> (may be a new or existing item).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
728 """
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
729 return Item(self, name_exact=name)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
730
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
731 def get_item(self, **query):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
732 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
733 Return item identified by the query (may be a new or existing item).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
734
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
735 :kwargs **query: e.g. name_exact=u"Foo" or itemid="..." or ...
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
736 (must be a unique fieldname=value for the latest-revs index)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
737 """
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
738 return Item(self, **query)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
739
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
740 def create_item(self, **query):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
741 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
742 Return item identified by the query (must be a new item).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
743
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
744 :kwargs **query: e.g. name_exact=u"Foo" or itemid="..." or ...
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
745 (must be a unique fieldname=value for the latest-revs index)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
746 """
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
747 return Item.create(self, **query)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
748
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
749 def existing_item(self, **query):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
750 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
751 Return item identified by query (must be an existing item).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
752
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
753 :kwargs **query: e.g. name_exact=u"Foo" or itemid="..." or ...
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
754 (must be a unique fieldname=value for the latest-revs index)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
755 """
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
756 return Item.existing(self, **query)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
757
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
758
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
759 class Item(object):
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
760 def __init__(self, indexer, latest_doc=None, **query):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
761 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
762 :param indexer: indexer middleware instance
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
763 :param latest_doc: if caller already has a latest-revs index whoosh document
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
764 it can be given here, to avoid fetching the same doc again
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
765 from the index
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
766 :kwargs **query: any unique fieldname=value for the latest-revs index, e.g.:
862
e1a68b3721cc use the right tokenizers/analyzers in the schema, use name_exact for 1:1 matching
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 851
diff changeset
767 name_exact="foo" or itemid="....." to fetch the item's current
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
768 doc from the index (if not given via latest_doc).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
769 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
770 self.indexer = indexer
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
771 self.backend = self.indexer.backend
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
772 if latest_doc is None:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
773 # we need to call the method without acl check to avoid endless recursion:
953
326f7a365007 search/indexing: remove all_revs=False params (it is the default)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 952
diff changeset
774 latest_doc = self.indexer._document(**query) or {}
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
775 self._current = latest_doc
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
776
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
777 def _get_itemid(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
778 return self._current.get(ITEMID)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
779 def _set_itemid(self, value):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
780 self._current[ITEMID] = value
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
781 itemid = property(_get_itemid, _set_itemid)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
782
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
783 @property
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
784 def acl(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
785 return self._current.get(ACL)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
786
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
787 @property
1777
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
788 def ptime(self):
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
789 dt = self._current.get(PTIME)
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
790 if dt is not None:
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
791 return utctimestamp(dt)
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
792
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
793 @property
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
794 def mtime(self):
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
795 dt = self._current.get(MTIME)
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
796 if dt is not None:
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
797 return utctimestamp(dt)
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
798
1db99f8f5144 add 'pubread' capability/permission (needs index rebuild!)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1740
diff changeset
799 @property
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
800 def name(self):
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
801 return self._current.get(NAME, 'DoesNotExist')
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
802
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
803 @classmethod
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
804 def create(cls, indexer, **query):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
805 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
806 Create a new item and return it; raise ItemAlreadyExistsError if it already exists.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
807 """
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
808 item = cls(indexer, **query)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
809 if not item:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
810 return item
1274
8c275efc6e8c refactored and cleaned up user module code and tests
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1267
diff changeset
811 raise ItemAlreadyExistsError(repr(query))
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
812
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
813 @classmethod
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
814 def existing(cls, indexer, **query):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
815 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
816 Get an existing item and return it; raise NoSuchItemError if it does not exist.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
817 """
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
818 item = cls(indexer, **query)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
819 if item:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
820 return item
1274
8c275efc6e8c refactored and cleaned up user module code and tests
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1267
diff changeset
821 raise NoSuchItemError(repr(query))
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
822
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
823 def __nonzero__(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
824 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
825 Item exists (== has at least one revision)?
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
826 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
827 return self.itemid is not None
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
828
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
829 def iter_revs(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
830 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
831 Iterate over Revisions belonging to this item.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
832 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
833 if self:
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
834 for rev in self.indexer.documents(idx_name=ALL_REVS, itemid=self.itemid):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
835 yield rev
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
836
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
837 def __getitem__(self, revid):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
838 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
839 Get Revision with revision id <revid>.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
840 """
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
841 return Revision(self, revid)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
842
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
843 def get_revision(self, revid, doc=None):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
844 """
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
845 Similar to item[revid], but you can optionally give an already existing
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
846 whoosh result document for the given revid to avoid backend accesses for some use cases.
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
847 """
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
848 return Revision(self, revid, doc)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
849
908
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
850 def preprocess(self, meta, data):
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
851 """
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
852 Preprocess a revision before it gets stored and put into the index.
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
853 """
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
854 content = convert_to_indexable(meta, data, is_new=True)
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
855 return meta, data, content
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
856
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
857 def store_revision(self, meta, data, overwrite=False,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
858 trusted=False, # True for loading a serialized representation or other trusted sources
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
859 name=None, # TODO name we decoded from URL path
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
860 action=u'SAVE',
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
861 remote_addr=None,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
862 userid=None,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
863 wikiname=None,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
864 contenttype_current=None,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
865 contenttype_guessed=None,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
866 acl_parent=None,
1918
f0b346ac6338 added return_rev param to store_revision() to avoid returning a revision object without need
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1780
diff changeset
867 return_rev=False,
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
868 ):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
869 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
870 Store a revision into the backend, write metadata and data to it.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
871
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
872 Usually this will be a new revision, either of an existing item or
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
873 a new item. With overwrite mode, we can also store over existing
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
874 revisions.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
875
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
876 :type meta: dict
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
877 :type data: open file (file must be closed by caller)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
878 :param overwrite: if True, allow overwriting of existing revs.
1918
f0b346ac6338 added return_rev param to store_revision() to avoid returning a revision object without need
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1780
diff changeset
879 :param return_rev: if True, return a Revision instance of the just created revision
f0b346ac6338 added return_rev param to store_revision() to avoid returning a revision object without need
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1780
diff changeset
880 :returns: a Revision instance or None
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
881 """
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
882 if remote_addr is None:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
883 try:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
884 # if we get here outside a request, this won't work:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
885 remote_addr = unicode(request.remote_addr)
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
886 except Exception:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
887 pass
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
888 if userid is None:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
889 try:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
890 # if we get here outside a request, this won't work:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
891 userid = flaskg.user.valid and flaskg.user.itemid or None
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
892 except Exception:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
893 pass
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
894 if wikiname is None:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
895 wikiname = app.cfg.interwikiname
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
896 state = {'trusted': trusted,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
897 keys.NAME: name,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
898 keys.ACTION: action,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
899 keys.ADDRESS: remote_addr,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
900 keys.USERID: userid,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
901 keys.WIKINAME: wikiname,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
902 keys.ITEMID: self.itemid, # real itemid or None
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
903 'contenttype_current': contenttype_current,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
904 'contenttype_guessed': contenttype_guessed,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
905 'acl_parent': acl_parent,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
906 }
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
907 ct = meta.get(keys.CONTENTTYPE)
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
908 if ct == CONTENTTYPE_USER:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
909 Schema = UserMetaSchema
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
910 else:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
911 Schema = ContentMetaSchema
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
912 m = Schema(meta)
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
913 valid = m.validate(state)
1267
9188844ae351 logging fixes, use logging not print
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1266
diff changeset
914 # TODO: currently we only log validation results. Eventually we should
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
915 # reject invalid metadata in a user-friendly way.
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
916 if not valid:
1267
9188844ae351 logging fixes, use logging not print
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1266
diff changeset
917 logging.warning("metadata validation failed, see below")
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
918 for e in m.children:
1267
9188844ae351 logging fixes, use logging not print
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1266
diff changeset
919 logging.warning("{0}, {1}".format(e.valid, e))
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
920
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
921 # we do not have anything in m that is not defined in the schema,
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
922 # e.g. user-defined meta keys or stuff we do not validate. thus, we
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
923 # just update the meta dict with the validated stuff:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
924 meta.update(dict(m.value.items()))
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
925 # we do not want None / empty values:
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
926 meta = dict([(k, v) for k, v in meta.items() if v not in [None, []]])
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
927
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
928 if self.itemid is None:
1266
1c66f39aa44f implement validation for revision metadata storage using flatland
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1009
diff changeset
929 self.itemid = meta[ITEMID]
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
930 backend = self.backend
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
931 if not overwrite:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
932 revid = meta.get(REVID)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
933 if revid is not None and revid in backend:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
934 raise ValueError('need overwrite=True to overwrite existing revisions')
908
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
935 meta, data, content = self.preprocess(meta, data)
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
936 data.seek(0) # rewind file
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
937 revid = backend.store(meta, data)
908
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
938 meta[REVID] = revid
a31e0d2f542d reorder processing when storing a revision, fix ITEMLINKS/ITEMTRANSCLUSIONS
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 904
diff changeset
939 self.indexer.index_revision(meta, content)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
940 if not overwrite:
953
326f7a365007 search/indexing: remove all_revs=False params (it is the default)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 952
diff changeset
941 self._current = self.indexer._document(revid=revid)
1918
f0b346ac6338 added return_rev param to store_revision() to avoid returning a revision object without need
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1780
diff changeset
942 if return_rev:
f0b346ac6338 added return_rev param to store_revision() to avoid returning a revision object without need
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1780
diff changeset
943 return Revision(self, revid)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
944
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
945 def store_all_revisions(self, meta, data):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
946 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
947 Store meta and data over all existing revisions of this item, overwriting each one.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
948 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
949 for rev in self.iter_revs():
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
950 meta[REVID] = rev.revid
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
951 self.store_revision(meta, data, overwrite=True)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
952
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
953 def destroy_revision(self, revid):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
954 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
955 Destroy revision <revid>.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
956 """
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
957 rev = Revision(self, revid)
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
958 self.backend.remove(rev.name, revid)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
959 self.indexer.remove_revision(revid)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
960
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
961 def destroy_all_revisions(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
962 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
963 Destroy all revisions of this item.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
964 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
965 for rev in self.iter_revs():
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
966 self.destroy_revision(rev.revid)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
967
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
968
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
969 class Revision(object):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
970 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
971 An existing revision (exists in the backend).
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
972 """
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
973 def __init__(self, item, revid, doc=None):
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
974 is_current = revid == CURRENT
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
975 if doc is None:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
976 if is_current:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
977 doc = item._current
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
978 else:
954
f79d471667b9 search/indexing: replace all_revs (boolean) by idx_name (index name str) - easier code, more flexible
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 953
diff changeset
979 doc = item.indexer._document(idx_name=ALL_REVS, revid=revid)
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
980 if doc is None:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
981 raise KeyError
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
982 if is_current:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
983 revid = doc.get(REVID)
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
984 if revid is None:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
985 raise KeyError
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
986 self.item = item
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
987 self.revid = revid
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
988 self.backend = item.backend
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
989 self._doc = doc
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
990 self.meta = Meta(self, self._doc)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
991 self._data = None
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
992 # Note: this does not immediately raise a KeyError for non-existing revs any more
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
993 # If you access data or meta, it will, though.
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
994
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
995 @property
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
996 def name(self):
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
997 return self.meta.get(NAME, 'DoesNotExist')
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
998
869
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
999 def _load(self):
975
a4d4c74721bc fixed the routing middleware
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 969
diff changeset
1000 meta, data = self.backend.retrieve(self._doc[NAME], self.revid) # raises KeyError if rev does not exist
869
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
1001 self.meta = Meta(self, self._doc, meta)
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
1002 self._data = data
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
1003 return meta, data
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
1004
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1005 @property
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1006 def data(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1007 if self._data is None:
869
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
1008 self._load()
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1009 return self._data
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1010
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1011 def close(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1012 if self._data is not None:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1013 self._data.close()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1014
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1015 def __enter__(self):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1016 return self
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1017
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1018 def __exit__(self, exc_type, exc_value, exc_tb):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1019 self.close()
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1020
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1021 def __cmp__(self, other):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1022 return cmp(self.meta, other.meta)
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1023
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1024
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1025 from collections import Mapping
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1026
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1027 class Meta(Mapping):
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1028 def __init__(self, revision, doc, meta=None):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1029 self.revision = revision
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1030 self._doc = doc or {}
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1031 self._meta = meta or {}
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1032 self._common_fields = revision.item.indexer.common_fields
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1033
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1034 def __contains__(self, key):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1035 try:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1036 self[key]
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1037 except KeyError:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1038 return False
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1039 else:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1040 return True
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1041
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1042 def __iter__(self):
869
1a7aa271851d refactor indexing revision lazy retrieval
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 862
diff changeset
1043 self._meta, _ = self.revision._load()
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1044 return iter(self._meta)
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1045
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1046 def __getitem__(self, key):
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1047 if self._meta:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1048 # we have real metadata (e.g. from storage)
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1049 return self._meta[key]
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1050 elif self._doc and key in self._common_fields:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1051 # we have a result document from whoosh, which has quite a lot
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1052 # of the usually wanted metadata, avoid storage access, use this.
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1053 value = self._doc[key]
1438
f89fffd81540 Add ptime meta key to blog post items, logo and supertags meta keys to blog home items.
Pavel Sviderski <pavel@psviderski.name>
parents: 1357
diff changeset
1054 if key in [MTIME, PTIME]:
916
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1055 # whoosh has a datetime object, but we want a UNIX timestamp
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1056 value = utctimestamp(value)
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1057 return value
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1058 else:
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1059 self._meta, _ = self.revision._load()
ac600982e40a fine tune / optimize Revision object creation, MetaData access
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 908
diff changeset
1060 return self._meta[key]
848
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1061
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1062 def __cmp__(self, other):
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1063 if self[REVID] == other[REVID]:
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1064 return 0
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1065 return cmp(self[MTIME], other[MTIME])
ebd15cde707e add new storage-ng storage/indexing/... subsystem
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents:
diff changeset
1066
851
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1067 def __len__(self):
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1068 return 0 # XXX
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1069
cd8afd44521e storage fixes / extensions, made ACLs work, updated MoinMoin.config
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 848
diff changeset
1070 def __repr__(self):
969
5bf6d7a2ffcf Convert all %r, %s, %x, %d, %i, %o to new Format String Syntax
Vedran Miletić <rivanvx@gmail.com>
parents: 954
diff changeset
1071 return "Meta _doc: {0!r} _meta: {1!r}".format(self._doc, self._meta)
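
The lookup order used by Meta.__getitem__ above (real metadata if already loaded, then the index document for common fields, then a load from storage) can be illustrated with a small standalone sketch. This is not the MoinMoin implementation: LazyMeta, the loader callable and the sample field names are hypothetical, the MTIME/PTIME timestamp conversion is left out, and the sketch targets Python 3 (collections.abc.Mapping), whereas the listing above is Python 2 code. Unlike the original, __len__ here loads the metadata instead of returning 0.

```python
from collections.abc import Mapping  # Python 3 home of Mapping


class LazyMeta(Mapping):
    """Hypothetical stand-in for Meta: index document first, storage on demand."""

    def __init__(self, doc, common_fields, loader):
        self._doc = doc or {}                 # cheap fields from the index document
        self._meta = {}                       # full metadata, filled lazily
        self._common_fields = common_fields   # keys the index document can answer
        self._loader = loader                 # callable returning the full metadata dict

    def _load(self):
        # Hit storage only once; later calls reuse the cached dict.
        if not self._meta:
            self._meta = dict(self._loader())
        return self._meta

    def __getitem__(self, key):
        if self._meta:
            # real metadata is already loaded, prefer it
            return self._meta[key]
        if self._doc and key in self._common_fields:
            # the index document has this field, so avoid storage access
            return self._doc[key]
        return self._load()[key]

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


loads = []  # records every simulated storage access


def load_from_storage():
    loads.append(1)
    return {"name": "Home", "revid": "r1", "comment": "initial"}


meta = LazyMeta({"name": "Home", "revid": "r1"}, {"name", "revid"}, load_from_storage)
name = meta["name"]                 # answered from the index document
loads_after_name = len(loads)
comment = meta["comment"]           # not a common field, triggers the load
loads_after_comment = len(loads)
```

Reading a common field leaves the storage untouched, while the first access to any other key loads the full metadata once and caches it for later lookups.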