annotate MoinMoin/util/bdiff.py @ 6025:05a6897ee496

"userdir" caching scope: use it for userid lookup caches, reduce code duplication

For the same reasons (see previous changesets), we need to use the "userdir" scope for the name2id / openid2id lookup caches, too.
author Thomas Waldmann <tw AT waldmann-edv DOT de>
date Wed, 12 Feb 2014 12:07:22 +0100
parents 01f05e74aa9c
children
rev   line source
964
dede3773735c Replaced the module doc string.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 963
diff changeset
1 """
dede3773735c Replaced the module doc string.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 963
diff changeset
2 MoinMoin - Binary patching and diffing
dede3773735c Replaced the module doc string.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 963
diff changeset
3
1918
bb2e053067fb fixing copyright headers: remove umlauts (encoding troubles), make epydoc compatible, reformat
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1867
diff changeset
4 @copyright: 2005 Matt Mackall <mpm@selenic.com>,
bb2e053067fb fixing copyright headers: remove umlauts (encoding troubles), make epydoc compatible, reformat
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1867
diff changeset
5 2006 MoinMoin:AlexanderSchremmer
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1918
diff changeset
6
964
dede3773735c Replaced the module doc string.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 963
diff changeset
7 Algorithm taken from mercurial's mdiff.py
2286
01f05e74aa9c Big PEP8 and whitespace cleanup
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1918
diff changeset
8
964
dede3773735c Replaced the module doc string.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 963
diff changeset
9 @license: GNU GPL, see COPYING for details.
dede3773735c Replaced the module doc string.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 963
diff changeset
10 """
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
11
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
12 import zlib, difflib, struct
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
13
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
14 BDIFF_PATT = ">lll"
970
f29c09448353 Minor modifications.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 964
diff changeset
15 BDIFF_PATT_SIZE = struct.calcsize(BDIFF_PATT)
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
16
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
17 def compress(text):
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
18     return zlib.compress(text) # here we could tune the compression level
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
19
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
20 def decompress(bin):
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
21     return zlib.decompress(bin)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
22
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
23 def diff(a, b):
962
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
24     """ Generates a binary diff of the passed strings.
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
25         Note that you can pass arrays of strings as well.
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
26         This might give you better results for text files. """
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
27     if not a:
962
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
28         s = "".join(b)
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
29         return s and (struct.pack(BDIFF_PATT, 0, 0, len(s)) + s)
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
30
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
31     bin = []
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
32     la = lb = 0
1004
7828d27399d4 whitespace-only cleanup and minor style changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 970
diff changeset
33
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
34     p = [0]
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
35     for i in a: p.append(p[-1] + len(i))
1004
7828d27399d4 whitespace-only cleanup and minor style changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 970
diff changeset
36
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
37     for am, bm, size in difflib.SequenceMatcher(None, a, b).get_matching_blocks():
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
38         s = "".join(b[lb:bm])
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
39         if am > la or s:
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
40             bin.append(struct.pack(BDIFF_PATT, p[la], p[am], len(s)) + s)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
41         la = am + size
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
42         lb = bm + size
1004
7828d27399d4 whitespace-only cleanup and minor style changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 970
diff changeset
43
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
44     return "".join(bin)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
45
962
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
46 def textdiff(a, b):
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
47     """ A diff function optimised for text files. Works with binary files as well. """
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
48     return diff(a.splitlines(1), b.splitlines(1))
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
49
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
50 def patchtext(bin):
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
51     """ Returns the new hunks that are contained in a binary diff."""
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
52     pos = 0
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
53     t = []
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
54     while pos < len(bin):
963
1083861bd16c Replaced size literals with computed constant.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 962
diff changeset
55         p1, p2, l = struct.unpack(BDIFF_PATT, bin[pos:pos + BDIFF_PATT_SIZE])
1083861bd16c Replaced size literals with computed constant.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 962
diff changeset
56         pos += BDIFF_PATT_SIZE
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
57         t.append(bin[pos:pos + l])
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
58         pos += l
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
59     return "".join(t)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
60
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
61 def patch(a, bin):
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
62     """ Patches the string a with the binary patch bin. """
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
63     c = last = pos = 0
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
64     r = []
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
65
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
66     while pos < len(bin):
963
1083861bd16c Replaced size literals with computed constant.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 962
diff changeset
67         p1, p2, l = struct.unpack(BDIFF_PATT, bin[pos:pos + BDIFF_PATT_SIZE])
1083861bd16c Replaced size literals with computed constant.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 962
diff changeset
68         pos += BDIFF_PATT_SIZE
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
69         r.append(a[last:p1])
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
70         r.append(bin[pos:pos + l])
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
71         pos += l
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
72         last = p2
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
73         c += 1
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
74     r.append(a[last:])
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
75
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
76     return "".join(r)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
77
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
78 def test():
962
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
79     a = ("foo\n" * 30)
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
80     b = (" fao" * 30)
1004
7828d27399d4 whitespace-only cleanup and minor style changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 970
diff changeset
81
1268
7ef804645070 Cleaned file.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 1004
diff changeset
82     a = file(r"test.1").read()
7ef804645070 Cleaned file.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 1004
diff changeset
83     b = file(r"test.2").read()
962
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
84     a = a.splitlines(1)
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
85     b = b.splitlines(1)
1004
7828d27399d4 whitespace-only cleanup and minor style changes
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 970
diff changeset
86
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
87     d = diff(a, b)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
88     z = compress(d)
1867
8166b37db555 remove backticks, using repr() instead
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1268
diff changeset
89     print repr(patchtext(d))
8166b37db555 remove backticks, using repr() instead
Thomas Waldmann <tw AT waldmann-edv DOT de>
parents: 1268
diff changeset
90     print repr(d)
962
930c9e82a60b Optimised diff for text files.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 961
diff changeset
91     print "".join(b) == patch("".join(a), d)
961
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
92     print len(d), len(z)
21eb4cb11e2c Added binary diffing! Not much left for the getDiff function.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents:
diff changeset
93
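
The hunk layout used by diff(), patchtext() and patch() above can be illustrated with a small round trip. The following is only a sketch, assuming Python 3 and bytes input (the listed module is Python 2 and works on str). The names HUNK_HEADER, make_diff, iter_hunks and apply_patch are hypothetical and not part of MoinMoin. Each hunk is a 12-byte big-endian header (start offset in the old text, end offset in the old text, length of the new data) followed by the new data itself.

```python
import difflib
import struct

# Same layout as BDIFF_PATT in the listing: three big-endian signed 32-bit ints.
HUNK_HEADER = ">lll"
HUNK_HEADER_SIZE = struct.calcsize(HUNK_HEADER)  # 12 bytes


def make_diff(a_parts, b_parts):
    """Build a binary diff from two lists of bytes chunks (mirrors diff())."""
    if not a_parts:
        s = b"".join(b_parts)
        return struct.pack(HUNK_HEADER, 0, 0, len(s)) + s if s else b""

    # offsets[i] is the byte offset in the joined old text where chunk i starts.
    offsets = [0]
    for part in a_parts:
        offsets.append(offsets[-1] + len(part))

    hunks = []
    la = lb = 0  # end of the previous matching block in a_parts / b_parts
    matcher = difflib.SequenceMatcher(None, a_parts, b_parts)
    for am, bm, size in matcher.get_matching_blocks():
        s = b"".join(b_parts[lb:bm])  # new data between two matching blocks
        if am > la or s:
            header = struct.pack(HUNK_HEADER, offsets[la], offsets[am], len(s))
            hunks.append(header + s)
        la = am + size
        lb = bm + size
    return b"".join(hunks)


def iter_hunks(delta):
    """Yield (start, end, data) for every hunk stored in a binary diff."""
    pos = 0
    while pos < len(delta):
        start, end, length = struct.unpack_from(HUNK_HEADER, delta, pos)
        pos += HUNK_HEADER_SIZE
        yield start, end, delta[pos:pos + length]
        pos += length


def apply_patch(old, delta):
    """Replace old[start:end] with each hunk's data (mirrors patch())."""
    out = []
    last = 0
    for start, end, data in iter_hunks(delta):
        out.append(old[last:start])  # unchanged bytes before the hunk
        out.append(data)
        last = end
    out.append(old[last:])  # unchanged tail
    return b"".join(out)


old = b"one\ntwo\nthree\n"
new = b"one\n2\nthree\n"
delta = make_diff(old.splitlines(True), new.splitlines(True))
print(list(iter_hunks(delta)))  # [(4, 8, b'2\n')]
assert apply_patch(old, delta) == new
```

Splitting on lines first, as textdiff() does, means the matcher compares whole lines, so a one-line change yields a single hunk covering bytes 4 to 8 of the old text.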