annotate MoinMoin/util/bdiff.py @ 6025:05a6897ee496
"userdir" caching scope: use it for userid lookup caches, reduce code duplication
For same reasons (see previous changesets), we need to use the "userdir" scope
for the name2id / openid2id lookup caches, too.
author  Thomas Waldmann <tw AT waldmannedv DOT de> 

date  Wed, 12 Feb 2014 12:07:22 +0100 
parents  01f05e74aa9c 
children 
"""
    MoinMoin - Binary patching and diffing

    @copyright: 2005 Matt Mackall <mpm@selenic.com>,
                2006 MoinMoin:AlexanderSchremmer

    Algorithm taken from mercurial's mdiff.py

    @license: GNU GPL, see COPYING for details.
"""

import zlib, difflib, struct

BDIFF_PATT = ">lll"
BDIFF_PATT_SIZE = struct.calcsize(BDIFF_PATT)

def compress(text):
    return zlib.compress(text) # here we could tune the compression level

def decompress(bin):
    return zlib.decompress(bin)

def diff(a, b):
    """ Generates a binary diff of the passed strings.
        Note that you can pass arrays of strings as well.
        This might give you better results for text files. """
    if not a:
        s = "".join(b)
        return s and (struct.pack(BDIFF_PATT, 0, 0, len(s)) + s)

    bin = []
    la = lb = 0

    # p[i] is the offset in "".join(a) at which element i of a starts
    p = [0]
    for i in a: p.append(p[-1] + len(i))

    for am, bm, size in difflib.SequenceMatcher(None, a, b).get_matching_blocks():
        s = "".join(b[lb:bm])
        if am > la or s:
            bin.append(struct.pack(BDIFF_PATT, p[la], p[am], len(s)) + s)
        la = am + size
        lb = bm + size

    return "".join(bin)

def textdiff(a, b):
    """ A diff function optimised for text files. Works with binary files as well. """
    return diff(a.splitlines(1), b.splitlines(1))

def patchtext(bin):
    """ Returns the new hunks that are contained in a binary diff."""
    pos = 0
    t = []
    while pos < len(bin):
        p1, p2, l = struct.unpack(BDIFF_PATT, bin[pos:pos + BDIFF_PATT_SIZE])
        pos += BDIFF_PATT_SIZE
        t.append(bin[pos:pos + l])
        pos += l
    return "".join(t)

def patch(a, bin):
    """ Patches the string a with the binary patch bin. """
    c = last = pos = 0
    r = []

    while pos < len(bin):
        p1, p2, l = struct.unpack(BDIFF_PATT, bin[pos:pos + BDIFF_PATT_SIZE])
        pos += BDIFF_PATT_SIZE
        r.append(a[last:p1])
        r.append(bin[pos:pos + l])
        pos += l
        last = p2
        c += 1
    r.append(a[last:])

    return "".join(r)

def test():
    a = ("foo\n" * 30)
    b = (" fao" * 30)

    a = file(r"test.1").read()
    b = file(r"test.2").read()
    a = a.splitlines(1)
    b = b.splitlines(1)

    d = diff(a, b)
    z = compress(d)
    print repr(patchtext(d))
    print repr(d)
    print "".join(b) == patch("".join(a), d)
    print len(d), len(z)

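The module above targets Python 2 (str-based data, print statements, file()). As an illustration only, the following is a minimal Python 3 sketch of the same hunk format and round trip, assuming the inputs are lists of bytes chunks such as lines. The names HUNK_HEADER, HUNK_HEADER_SIZE, old, new and delta are introduced here for the example and are not part of MoinMoin. Each hunk is a ">lll" header (start offset in the old data, end offset in the old data, length of the replacement) followed by the replacement bytes.

```python
import difflib
import struct
import zlib

# Hypothetical names for this sketch; the module calls these BDIFF_PATT / BDIFF_PATT_SIZE.
HUNK_HEADER = ">lll"  # big-endian: start in old data, end in old data, length of new data
HUNK_HEADER_SIZE = struct.calcsize(HUNK_HEADER)


def diff(a, b):
    """Return a binary diff that turns the chunk list a into the chunk list b."""
    if not a:
        s = b"".join(b)
        return struct.pack(HUNK_HEADER, 0, 0, len(s)) + s if s else b""

    hunks = []
    la = lb = 0
    # offsets[i] is the byte offset at which chunk i of a starts
    offsets = [0]
    for chunk in a:
        offsets.append(offsets[-1] + len(chunk))

    matcher = difflib.SequenceMatcher(None, a, b)
    for am, bm, size in matcher.get_matching_blocks():
        s = b"".join(b[lb:bm])  # data in b that sits between two matching blocks
        if am > la or s:
            header = struct.pack(HUNK_HEADER, offsets[la], offsets[am], len(s))
            hunks.append(header + s)
        la = am + size
        lb = bm + size
    return b"".join(hunks)


def patchtext(delta):
    """Return only the replacement data carried by the hunks of delta."""
    pos = 0
    parts = []
    while pos < len(delta):
        _start, _end, length = struct.unpack_from(HUNK_HEADER, delta, pos)
        pos += HUNK_HEADER_SIZE
        parts.append(delta[pos:pos + length])
        pos += length
    return b"".join(parts)


def patch(a, delta):
    """Apply delta to the bytes object a and return the patched bytes."""
    last = pos = 0
    result = []
    while pos < len(delta):
        start, end, length = struct.unpack_from(HUNK_HEADER, delta, pos)
        pos += HUNK_HEADER_SIZE
        result.append(a[last:start])            # unchanged data before the hunk
        result.append(delta[pos:pos + length])  # replacement data
        pos += length
        last = end
    result.append(a[last:])                     # unchanged tail
    return b"".join(result)


# Round trip on two small line-based inputs
old = b"foo\nbar\nbaz\n"
new = b"foo\nqux\nbaz\nend\n"
delta = diff(old.splitlines(True), new.splitlines(True))
restored = patch(old, delta)
compressed = zlib.compress(delta)
```

Splitting on lines before diffing mirrors what textdiff does: SequenceMatcher then compares whole lines instead of single characters, which tends to give fewer and larger hunks for text.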