2 unstable releases

| Version | Released |
|---|---|
| 0.2.0 | Jan 3, 2023 |
| 0.1.0 | Dec 19, 2022 |
natural-xml-diff
The natural-xml-diff crate implements a diffing algorithm that attempts to produce correct and human-readable differences between two XML documents.
Algorithm
The algorithm implemented by this library is based on the paper "Bridging the gap between tracking and detecting changes on XML". It is also implemented by the Java-based jndiff library.
Work in progress
This is still a work in progress!
Credits
Structural diffing
Let's consider the following XML document, taken from the "Bridging the Gap" paper:
<?xml version="1.0"?>
<book>
  <chapter>
    <title>Text 1</title>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 5</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 11</para>
    <para>Text 12</para>
  </chapter>
</book>
We'll call that "document A", the "before" of the diffing. Here's the "after", "document B":
<?xml version="1.0"?>
<book>
  <chapter>
    <para>Text 2</para>
  </chapter>
  <chapter>
    <title>Text 4</title>
    <para>Text 25</para>
    <para>Text 11</para>
  </chapter>
  <chapter>
    <title>Text 6</title>
    <para>Text 7<img/>Text 8</para>
  </chapter>
  <chapter>
    <title>Text 9</title>
    <para>Text 10</para>
  </chapter>
  <chapter>
    <para>Text 12</para>
  </chapter>
</book>
Let's present both as trees with numbered nodes (the root node, 0, is not shown). Here's document A:
graph TD;
1[1 book]-->2
2[2 chapter]-->3
2-->5
3[3 title]-->4
4[4 Text 1]
5[5 para]-->6
6[6 Text 2]
1-->7
7[7 chapter] --> 8
8[8 title] --> 9
9[9 Text 4]
7 --> 10
10[10 para] --> 11
11[11 Text 5]
1 --> 12
12[12 chapter] --> 13
13[13 title] --> 14
14[14 Text 6]
12-->15
15[15 para] --> 16
15 --> 17
15 --> 18
16[16 Text 7]
17[17 img]
18[18 Text 8]
1 --> 19
19[19 chapter]
19 --> 20
20[20 title] --> 21
21[21 Text 9]
19 --> 22
22[22 para] --> 23
23[23 Text 10]
1 --> 24
24[24 chapter] --> 25
25[25 para] --> 26
26[26 Text 11]
24 --> 27
27[27 para] --> 28
28[28 Text 12]
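
The node numbers in the tree follow document order: each node is numbered before its children, and children are visited left to right. The sketch below illustrates that numbering on the third chapter of document A, which occupies nodes 12 through 18. The `Node` type and the `el`, `text`, `number_nodes` and `third_chapter` helpers are hypothetical names made up for this illustration; they are not the crate's API.

```rust
/// A minimal XML-like tree node (hypothetical, for illustration only):
/// either an element with children or a text leaf.
#[derive(Debug)]
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

/// Convenience constructor for an element node.
fn el(name: &str, children: Vec<Node>) -> Node {
    Node::Element { name: name.to_string(), children }
}

/// Convenience constructor for a text node.
fn text(content: &str) -> Node {
    Node::Text(content.to_string())
}

/// Number the nodes of `root` in pre-order (document order), starting at
/// `start`. Returns `(number, label)` pairs, where the label is the element
/// name or the text content.
fn number_nodes(root: &Node, start: usize) -> Vec<(usize, String)> {
    fn visit(node: &Node, next: &mut usize, out: &mut Vec<(usize, String)>) {
        let label = match node {
            Node::Element { name, .. } => name.clone(),
            Node::Text(content) => content.clone(),
        };
        // The node receives its number before any of its children do.
        out.push((*next, label));
        *next += 1;
        if let Node::Element { children, .. } = node {
            for child in children {
                visit(child, next, out);
            }
        }
    }

    let mut out = Vec::new();
    let mut next = start;
    visit(root, &mut next, &mut out);
    out
}

/// The third chapter of document A, built by hand.
fn third_chapter() -> Node {
    el(
        "chapter",
        vec![
            el("title", vec![text("Text 6")]),
            el("para", vec![text("Text 7"), el("img", vec![]), text("Text 8")]),
        ],
    )
}
```

Calling `number_nodes(&third_chapter(), 12)` yields seven pairs, beginning with `(12, "chapter")` and ending with `(18, "Text 8")`.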
Maintaining the tests
Some tests use test_generator to generate tests from the testdata directory.
New tests in that directory aren't picked up automatically, however; you have to force a recompile of the .rs files that run the tests. You can do this by making a non-significant whitespace edit in each .rs file that uses test_generator and saving it. I hope there's a better solution.
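
One possible alternative, untested against this crate and offered only as an assumption, is a Cargo build script that declares the testdata directory as a rebuild trigger through the `cargo:rerun-if-changed` directive. The helper names below (`rerun_if_changed`, `emit_rerun_directives`) are hypothetical, and the sketch assumes a `build.rs` at the crate root is acceptable.

```rust
// Sketch of helpers for a hypothetical `build.rs` at the crate root.
// Assumption: the generated tests read from a directory named `testdata`.

/// Build the Cargo directive that asks for the build script to be rerun
/// when `path` (a file or a directory) changes.
fn rerun_if_changed(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

/// Print one directive per watched path. Cargo reads these lines from the
/// build script's standard output.
fn emit_rerun_directives(paths: &[&str]) {
    for path in paths {
        println!("{}", rerun_if_changed(path));
    }
}
```

In `build.rs`, the body of `fn main()` would then be the single call `emit_rerun_directives(&["testdata"]);`. Whether this fully replaces the whitespace edit depends on how Cargo propagates the rerun to the test targets, so it is worth verifying by adding a new file to testdata and running the tests again.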