annotate MoinMoin/filter/application_vnd_sun_xml.py @ 5240:b2e96d4cac1c

OpenOffice/OpenDocument filters: catch UnicodeDecodeErrors (happens with password protected files)
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Wed, 28 Oct 2009 22:35:33 +0100
parents ae8bf4c7d5a1
children
rev   line source
470
957d482215ac lupy: some filters, update mode
Thomas Waldmann <tw@waldmann-edv.de>
parents:
diff changeset
1 # -*- coding: iso-8859-1 -*-
2 """
3 MoinMoin - OpenOffice.org *.sx? Filter
4
5 Depends on: nothing (only python with zlib)
6
5240
b2e96d4cac1c OpenOffice/OpenDocument filters: catch UnicodeDecodeErrors (happens with password protected files)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 4965
diff changeset
7 @copyright: 2006-2009 MoinMoin:ThomasWaldmann
470
957d482215ac lupy: some filters, update mode
Thomas Waldmann <tw@waldmann-edv.de>
parents:
diff changeset
8 @license: GNU GPL, see COPYING for details.
9 """
10
11 import re, zipfile
12
3611
0b220c040be4 replace request.log by direct logging calls
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1918
diff changeset
13 from MoinMoin import log
14 logging = log.getLogger(__name__)
15
470
957d482215ac lupy: some filters, update mode
Thomas Waldmann <tw@waldmann-edv.de>
parents:
diff changeset
16 rx_stripxml = re.compile("<[^>]*?>", re.DOTALL|re.MULTILINE)
17
18 def execute(indexobj, filename):
19     try:
20         zf = zipfile.ZipFile(filename, "r")
21         data = zf.read("content.xml")
22         zf.close()
23         data = " ".join(rx_stripxml.sub(" ", data).split())
4965
ae8bf4c7d5a1 catch another exception when a bad zip file is encountered while indexing (ignore it)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 3611
diff changeset
24     except (zipfile.BadZipfile, RuntimeError), err:
25         logging.error("%s [%s]" % (str(err), filename))
470
957d482215ac lupy: some filters, update mode
Thomas Waldmann <tw@waldmann-edv.de>
parents:
diff changeset
26         data = ""
5240
b2e96d4cac1c OpenOffice/OpenDocument filters: catch UnicodeDecodeErrors (happens with password protected files)
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 4965
diff changeset
27     try:
28         data = data.decode('utf-8')
29     except UnicodeDecodeError:
30         # protected with password? no valid OpenOffice file?
31         data = u''
32     return data
470
957d482215ac lupy: some filters, update mode
Thomas Waldmann <tw@waldmann-edv.de>
parents:
diff changeset
33
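
The filter above is Python 2 code (note the `except ..., err:` syntax and the `str.decode` call). As a rough illustration only, the same idea could be sketched for Python 3 as follows. The names `extract_text`, `make_sample` and `RX_STRIPXML` are hypothetical and not part of MoinMoin. The sketch also departs from the original in three ways, all of them assumptions: it decodes before stripping tags, it folds the two `try` blocks into one, and it additionally treats a missing `content.xml` (`KeyError`) as "no text".

```python
import io
import logging
import re
import zipfile

# Same tag-stripping pattern as rx_stripxml in the filter, applied to str here.
RX_STRIPXML = re.compile(r"<[^>]*?>", re.DOTALL | re.MULTILINE)


def extract_text(source):
    """Return the whitespace-normalised text of content.xml, or "" on failure.

    `source` may be a filename or a binary file-like object (hypothetical helper).
    """
    try:
        with zipfile.ZipFile(source, "r") as zf:
            raw = zf.read("content.xml")
        # Encrypted (password protected) content is not valid UTF-8.
        text = raw.decode("utf-8")
    except (zipfile.BadZipFile, RuntimeError, KeyError, UnicodeDecodeError) as err:
        # KeyError (member missing) is an addition; the original does not catch it.
        logging.error("%s [%s]", err, source)
        return ""
    # Replace every tag with a space, then collapse runs of whitespace.
    return " ".join(RX_STRIPXML.sub(" ", text).split())


def make_sample(xml):
    """Build an in-memory zip holding only content.xml (for demonstration)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", xml)
    buf.seek(0)
    return buf


sample = make_sample("<office:body><text:p>Hello</text:p>\n<text:p>wiki  world</text:p></office:body>")
print(extract_text(sample))  # prints: Hello wiki world
```

As in the original, any failure yields an empty string, so one unreadable attachment cannot abort an indexing run.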