analysis of tree edit distance algorithms serge dulucq and hélène b89902009 黃鼎翔 b89902011...

Analysis of Tree Edit Distance Algorithms

Serge Dulucq and Hélène

B89902009 黃鼎翔, B89902011 田知本, B89902045 巨彥霖

Outline

• Introduction
• Edit Distance for Trees and Forests
• Cover Strategies

Motivation

One way of comparing two ordered trees is by measuring their edit distance

Application areas
• Comparison of hierarchically structured data
• Alignment of RNA secondary structures in computational biology

Two algorithms using dynamic programming
• Zhang-Shasha
• Klein

Purpose

A general analysis of dynamic programming for tree edit distance algorithms
• Study the complexity of the decompositions by counting the exact number of distinct recursive calls

Define a new edit distance algorithm for trees that improves on the original algorithms with respect to the number of recursive calls

Trees and forests

A tree is a node (called the root) connected to an ordered sequence of disjoint trees

Such a sequence is called a forest

We write l(A1◦…◦An) for the tree composed of the node l connected to the sequence of trees A1, …, An

[Figure: the tree l(A1◦…◦An), and two ordered trees on the nodes 1 to 5 that are not equal (≠)]

|F| denotes the number of nodes of the forest F

SF(F) is the set of all subforests of F

F(i), i is a node of F, denotes the subtree of F rooted at i

deg(i) is the degree of i, that is the number of children of i

[Figure: an example forest F on the nodes 1 to 10, with |F| = 10 and deg(4) = 2, together with one of its subforests (∈ SF(F)) and the subtree F(2)]

Edit distance

Let F and G be two forests. The edit distance between F and G, denoted d(F, G), is the minimal cost of the edit operations needed to transform F into G

Operations
• Substitution
• Insertion
• Deletion

Let Cs, Ci, Cd denote the costs of substitution, insertion, deletion

Recursive relationship (1/3)

Strings
• u, v are strings; x, y are alphabet symbols
• d(xu, yv) = min{ Cd(x) + d(u, yv), Ci(y) + d(xu, v), Cs(x, y) + d(u, v) }
• d(ux, vy) = min{ Cd(x) + d(u, vy), Ci(y) + d(ux, v), Cs(x, y) + d(u, v) }
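The first recurrence, d(xu, yv), can be sketched in Python as a memoized recursion. This is only an illustration: the function name `string_edit_distance` and the constant costs `cd`, `ci`, `cs` (with a free substitution when the two symbols are equal) are assumptions, not part of the slides.

```python
from functools import lru_cache

def string_edit_distance(s, t, cd=1, ci=1, cs=1):
    """Edit distance following d(xu, yv); costs are assumed constant."""
    @lru_cache(maxsize=None)
    def d(i, j):
        # i and j point at the first unread symbol (x and y) of s and t.
        if i == len(s):
            return ci * (len(t) - j)    # only insertions remain
        if j == len(t):
            return cd * (len(s) - i)    # only deletions remain
        sub = 0 if s[i] == t[j] else cs
        return min(cd + d(i + 1, j),        # Cd(x) + d(u, yv)
                   ci + d(i, j + 1),        # Ci(y) + d(xu, v)
                   sub + d(i + 1, j + 1))   # Cs(x, y) + d(u, v)
    return d(0, 0)

print(string_edit_distance("kitten", "sitting"))
```

The suffix form d(ux, vy) gives the same value; for strings the choice between the two only changes the order in which the same subproblems are visited.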


Recursive relationship (2/3)

Trees
• l, l’ are roots; F, F’ are forests
• d(l(F), l’(F’)) = min{ Cd(l) + d(F, l’(F’)), Ci(l’) + d(l(F), F’), Cs(l, l’) + d(F, F’) }

Recursive relationship (3/3)

Forests
• T, T’ are forests
• Left decomposition
d(l(F)◦T, l’(F’)◦T’) = min{ Cd(l) + d(F◦T, l’(F’)◦T’), Ci(l’) + d(l(F)◦T, F’◦T’), d(l(F), l’(F’)) + d(T, T’) }
• Right decomposition
d(T◦l(F), T’◦l’(F’)) = min{ Cd(l) + d(T◦F, T’◦l’(F’)), Ci(l’) + d(T◦l(F), T’◦F’), d(l(F), l’(F’)) + d(T, T’) }
• A direction indicates whether the left or the right decomposition is applied
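A minimal Python sketch of the left decomposition, combined with the tree recurrence from the previous slide, may help to see where the recursive calls come from. The encoding is an assumption made for this example: a tree is a pair `(label, children)`, a forest is a tuple of trees, and all costs are unit costs (substitution is free when the labels agree).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """d(f, g) with the left decomposition everywhere (unit costs assumed)."""
    if not f and not g:
        return 0
    if not g:                               # delete the leftmost root of f
        (_, kids), rest = f[0], f[1:]
        return 1 + forest_distance(kids + rest, g)
    if not f:                               # insert the leftmost root of g
        (_, kids), rest = g[0], g[1:]
        return 1 + forest_distance(f, kids + rest)
    (l, F), T = f[0], f[1:]
    (l2, F2), T2 = g[0], g[1:]
    if not T and not T2:
        # Both forests are single trees: Cs(l, l') + d(F, F').
        match = (0 if l == l2 else 1) + forest_distance(F, F2)
    else:
        # d(l(F), l'(F')) + d(T, T')
        match = forest_distance((f[0],), (g[0],)) + forest_distance(T, T2)
    return min(1 + forest_distance(F + T, g),    # Cd(l) + d(F◦T, l'(F')◦T')
               1 + forest_distance(f, F2 + T2),  # Ci(l') + d(l(F)◦T, F'◦T')
               match)

leaf = lambda name: (name, ())
A = ("a", (leaf("b"), leaf("c")))
B = ("a", (leaf("c"), leaf("b")))
print(forest_distance((A,), (B,)))
```

The number of distinct argument pairs stored by the cache is exactly the quantity that the rest of the talk counts.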

Example

[Figure: the subforests produced from a tree on the nodes 1 to 5 by repeated left decomposition and by repeated right decomposition]

Strategy & Relevant forests

Let F and G be two forests. A strategy is a mapping from SF(F)×SF(G) to {left, right}

Let (F, F’) be a pair of forests provided with a strategy φ. The set RFφ(F, F’) of relevant forests is defined as the least subset of SF(F)×SF(F’) such that if the decomposition of (F, F’) meets the pair (G, G’), then (G, G’) belongs to RFφ(F, F’)

RFφ(F) and RFφ(F’) denote the projections of RFφ(F, F’) on SF(F) and SF(F’)

#relevant denotes the number of relevant forests
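To make the definition concrete, the pairs met by the decomposition can be collected by brute force. The sketch below assumes the same hypothetical encoding as before (a tree is `(label, children)`, a forest is a tuple of trees, labels are unique so that equal tuples mean equal subforests) and takes the strategy as a Python function returning `"left"` or `"right"`. The pair (Ø, Ø) is dropped; pairs with exactly one empty side are kept in the returned set, and the example counts only the pairs in which both forests are non-empty.

```python
def split(forest, left):
    """Return (root tree, remaining trees, forest without that root)."""
    if left:
        tree, rest = forest[0], forest[1:]
        return tree, rest, tree[1] + rest
    tree, rest = forest[-1], forest[:-1]
    return tree, rest, rest + tree[1]

def relevant_pairs(f, g, strategy):
    """Set of pairs of subforests reached when decomposing (f, g)."""
    seen, stack = set(), [(f, g)]
    while stack:
        a, b = stack.pop()
        if (a, b) in seen or (not a and not b):
            continue
        seen.add((a, b))
        left = strategy(a, b) == "left"
        sa = split(a, left) if a else None
        sb = split(b, left) if b else None
        if sa:
            stack.append((sa[2], b))            # root deleted in the first forest
        if sb:
            stack.append((a, sb[2]))            # root inserted from the second forest
        if sa and sb:
            stack.append(((sa[0],), (sb[0],)))  # tree against tree
            stack.append((sa[1], sb[1]))        # remaining forests
    return seen

leaf = lambda name: (name, ())
A = ("a", (leaf("b"), leaf("c")))
B = ("x", (leaf("y"), leaf("z")))
pairs = relevant_pairs((A,), (B,), lambda f, g: "left")
print(sum(1 for f, g in pairs if f and g))
```

Any other strategy can be plugged in by replacing the lambda, which makes it easy to compare the number of distinct recursive calls on small inputs.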

Proposition (1/2)

• F=F’=Ø → RFφ(F, F’)=Ø
• φ(F, F’)=left, F=l(G)◦T, F’=Ø → RFφ(F, F’) = {(F, F’)}∪RFφ(G◦T, F’)
• φ(F, F’)=right, F=T◦l(G), F’=Ø → RFφ(F, F’) = {(F, F’)}∪RFφ(T◦G, F’)
• φ(F, F’)=left, F=Ø, F’=l’(G’)◦T’ → RFφ(F, F’) = {(F, F’)}∪RFφ(F, G’◦T’)

Reminder (left and right decomposition):
d(l(G)◦T, l’(G’)◦T’) = min{ Cd(l) + d(G◦T, l’(G’)◦T’), Ci(l’) + d(l(G)◦T, G’◦T’), d(l(G), l’(G’)) + d(T, T’) }
d(T◦l(G), T’◦l’(G’)) = min{ Cd(l) + d(T◦G, T’◦l’(G’)), Ci(l’) + d(T◦l(G), T’◦G’), d(l(G), l’(G’)) + d(T, T’) }

Proposition (2/2)

• φ(F, F’)=right, F=Ø, F’=T’◦l’(G’) → RFφ(F, F’) = {(F, F’)}∪RFφ(F, T’◦G’)
• φ(F, F’)=left, F=l(G)◦T, F’=l’(G’)◦T’ → RFφ(F, F’) = {(F, F’)}∪RFφ(G◦T, F’)∪RFφ(F, G’◦T’)∪RFφ(l(G), l’(G’))∪RFφ(T, T’)
• φ(F, F’)=right, F=T◦l(G), F’=T’◦l’(G’) → RFφ(F, F’) = {(F, F’)}∪RFφ(T◦G, F’)∪RFφ(F, T’◦G’)∪RFφ(l(G), l’(G’))∪RFφ(T, T’)

Reminder (left and right decomposition):
d(l(G)◦T, l’(G’)◦T’) = min{ Cd(l) + d(G◦T, l’(G’)◦T’), Ci(l’) + d(l(G)◦T, G’◦T’), d(l(G), l’(G’)) + d(T, T’) }
d(T◦l(G), T’◦l’(G’)) = min{ Cd(l) + d(T◦G, T’◦l’(G’)), Ci(l’) + d(T◦l(G), T’◦G’), d(l(G), l’(G’)) + d(T, T’) }

Lemma 1

Given a tree A=l(A1◦…◦An), for any strategy we have
#relevant(A) ≥ |A| - |Ai| + #relevant(A1) +…+ #relevant(An)
where i∈[1…n] is such that the size of Ai is maximal

Proof (1/2)

Let F = A1◦…◦An ⇒ RF(A) = {A}∪RF(F) ⇒ #relevant(A) = 1 + #relevant(F)

When n=1:
F = A1, A=l(A1) ⇒ #relevant(A) = 1 + #relevant(A1) ≥ |A| - |A1| + #relevant(A1)

When n>1:
Suppose the direction is left. Let A1 = l(F1), T = A2◦…◦An
RF(F) = {F}∪RF(A1)∪RF(T)∪RF(F1◦T)
| RF(F1◦T) - (RF(F1)∪RF(T)) | ≥ min{|F1|, |T|}
⇒ #relevant(F) ≥ 1 + #relevant(A1) + #relevant(T) + min{|F1|, |T|}

Let j∈[2…n] be such that |Aj| is maximal among |A2|, …, |An|
⇒ #relevant(F) ≥ 1 + #relevant(A1) +…+ #relevant(An) + |T| - |Aj| + min{|F1|, |T|}

Take a look

To prove: #relevant(A) ≥ |A| - |Ai| + #relevant(A1) +…+ #relevant(An)

Since #relevant(A) = 1 + #relevant(F) and |A| = 1 + |F|, it suffices to show
#relevant(F) ≥ |F| - |Ai| + #relevant(A1) +…+ #relevant(An)

So far: #relevant(F) ≥ 1 + |T| - |Aj| + min{|F1|, |T|} + #relevant(A1) +…+ #relevant(An)

Proof (2/2)

It remains to show 1 + |T| - |Aj| + min{|F1|, |T|} ≥ |F| - |Ai|

1) If |F1| ≤ |T|
⇒ 1 + |T| + min{|F1|, |T|} = |F|
Since |Aj| ≤ |Ai|,
∴ 1 + |T| - |Aj| + min{|F1|, |T|} = |F| - |Aj| ≥ |F| - |Ai|

2) If |F1| > |T|
⇒ |F| - |Ai| = |T| (∵ i=1)
∴ 1 + |T| - |Aj| + min{|F1|, |T|} = 1 + |T| + |T| - |Aj| ≥ 1 + |T| > |F| - |Ai|

∴ #relevant(F) ≥ |F| - |Ai| + #relevant(A1) +…+ #relevant(An)
⇒ #relevant(A) ≥ |A| - |Ai| + #relevant(A1) +…+ #relevant(An)

Lemma 2

For every natural number n, there exists a tree A of size n such that, for any strategy, #relevant(A) has a lower bound in Ω(n log n)

• For the complete balanced binary tree Tn of size n, prove by induction on n that
#relevant(Tn) ≥ (n+1)log2(n+1)/2
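The induction can be checked numerically on small cases. The sketch below unrolls the right-hand side of Lemma 1 on complete binary trees (both children have the same size, so either one may play the role of the largest child) and compares it with (n+1)log2(n+1)/2. The function name `lemma1_unrolled` and the parameter `height` are illustrative assumptions.

```python
import math

def lemma1_unrolled(height):
    """Return (n, bound): the size of the complete binary tree of the given
    height and the bound obtained by applying Lemma 1 at every node."""
    if height == 0:
        return 1, 1                      # a single node is its only relevant forest
    child_size, child_bound = lemma1_unrolled(height - 1)
    n = 1 + 2 * child_size
    # |A| - |Ai| + #relevant(A1) + #relevant(A2), with |A1| = |A2|
    return n, n - child_size + 2 * child_bound

for h in range(5):
    n, bound = lemma1_unrolled(h)
    print(n, bound, (n + 1) * math.log2(n + 1) / 2)
```

On these trees the two quantities coincide, which is consistent with the inequality stated in the lemma.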

Idea

Suppose the direction is left

RF(l(F)◦T) = {l(F)◦T}∪RF(l(F))∪RF(F◦T)∪RF(T)

Since T⊆F◦T, we want to eliminate the nodes of F in F◦T first, so that RF(F◦T) and RF(T) share as many relevant forests as possible!

Cover

Let F be a forest. A cover r of F is a mapping from F to F∪{left, right} satisfying, for each node i in F
• if deg(i) = 0 or 1, then r(i)∈{left, right}
• if deg(i) > 1, then r(i) is a child of i

[Figure: two example covers of a tree on the nodes 1 to 4, with values in {left, right}]

Cover strategy

Given a pair of trees (A, B) and a cover r for A, we associate a unique strategy φ as follows.
• If deg(i) = 0 or 1, then φ(A(i), G) = r(i), for each forest G in B
• If A(i) is of the form l(A1◦…◦An) with n > 1, then let p∈{1, …, n} be such that the favorite child r(i) is the root of Ap. For each forest G of B, we define
  φ(A(i), G) = right whenever p = 1, left otherwise
  φ(T◦Ap◦…◦An, G) = left, for each forest T of A1◦…◦Ap-1
  φ(Ap◦T, G) = right, for each forest T of Ap+1◦…◦An

The tree A is called the cover tree. A strategy is a cover strategy if there exists a cover tree associated to it

[Figure: a node i of A with subtrees A1, A2, A3, A4, compared with a forest G]

Some Tasks

The order of our tasks
• Study tree A
• Study tree B
• Combine the results for tree A and tree B
• Obtain the number of distinct pairs (recursively)

Study tree A

Tree A

• Focus on relevant(A) (in detail)
• Cover strategies in A
• A leads B along

Lemma 3

(F(i), G(j)) ∈ RF(F, G)

[Figure: a subtree F(i) of F and a subtree G(j) of G]

This is trivial

Lemma 4

RF(l(F)◦T) = {l(F)◦T, F1◦T, …, Fk◦T}∪RF(l(F))∪RF(T)

What is this for?

Terms: k = |F| is the number of nodes of F. Fk+1 is the forest obtained from Fk by a left decomposition, so F1, F2, …, Fk are the forests produced by a sequence of left decompositions.

Goal: with a cover strategy such that φ(l(F)◦T) = left, see whether the number of recursive calls can be reduced.

Since this is a cover strategy, the direction is left

RF(l(F)◦T) = {l(F)◦T}∪RF(l(F))∪RF(T)∪RF(F◦T)

[Figure: l(F)◦T is decomposed into l(F), T and F◦T]

Since this is a cover strategy, the direction is still left for F◦T = F1◦T

[Figure: F1◦T is decomposed again; its leftmost tree belongs to RF(l(F)), the rest is handled by RF(T) and the next forest of the sequence]

Continue...

[Figure: the sequence {F1◦T, …, Fk◦T}]

Conclusion

RF(l(F)◦T) = {l(F)◦T, F1◦T, …, Fk◦T}∪RF(l(F))∪RF(T)

Lemma 5

#relevant(A) = |A| - |Aj| + #relevant(A1) + #relevant(A2) + … + #relevant(An)

Terms: A = l(A1◦A2◦…◦An). Aj is the favorite child of A.

Goal: compute the number of relevant forests of a cover tree

[Figure: the tree A = l(A1◦…◦An) with its favorite child Aj, j∈[1…n]]

Part 1: |A| - |Aj|

Note:
φ(A(i), G) = right whenever p = 1, left otherwise
φ(T◦Ap◦…◦An, G) = left, for each forest T of A1◦…◦Ap-1
φ(Ap◦T, G) = right, for each forest T of Ap+1◦…◦An

Explanation: since Aj is the favorite child of A, |A| - |Aj| is the number of elements of {A} ∪ {all forests that contain Aj} (Aj itself is counted in #relevant(Aj))

Part 2: #relevant(A1) + #relevant(A2) + … + #relevant(An)

Note: RF(A1◦A2◦A3◦A4◦…◦An) = {A1◦A2◦A3◦A4◦…◦An}∪RF(F1◦A2◦A3◦A4◦…◦An)∪RF(A1)∪RF(A2◦A3◦A4◦…◦An)

[Figure: the forest A1◦A2◦A3◦A4◦…◦An]

Conclusion

• #relevant(A) = |A| - |Aj| + #relevant(A1) + #relevant(A2) + … + #relevant(An)
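The formula of Lemma 5 is easy to evaluate for different covers. The sketch below is an illustration under stated assumptions: trees are encoded as `(label, children)`, `choose` is a hypothetical callback returning the index of the favorite child, and a node with a single child is treated with that child in the role of Aj (which gives 1 + #relevant(A1), as in the n=1 case of the proof of Lemma 1).

```python
def size(tree):
    return 1 + sum(size(child) for child in tree[1])

def count_relevant(tree, choose):
    """#relevant(A) = |A| - |Aj| + #relevant(A1) + ... + #relevant(An)."""
    _, children = tree
    if not children:
        return 1
    j = choose(children)                 # index of the favorite child
    return (size(tree) - size(children[j])
            + sum(count_relevant(child, choose) for child in children))

leftmost = lambda children: 0
rightmost = lambda children: len(children) - 1
heaviest = lambda children: max(range(len(children)),
                                key=lambda k: size(children[k]))

leaf = lambda name: (name, ())
chain = ("c1", (("c2", (("c3", (leaf("c4"),)),)),))
A = ("r", (leaf("x"), chain, leaf("y")))
for name, choose in [("leftmost", leftmost), ("rightmost", rightmost),
                     ("heaviest", heaviest)]:
    print(name, count_relevant(A, choose))
```

Picking the largest child as the favorite child makes the count meet the lower bound of Lemma 1 at every node, which is why `heaviest` never does worse than the other two choices here.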

Free node

What is a free node?
• Not an only child
• Not its parent's favorite child

Definition: a node is free if it is
• the root of A, or
• a node whose parent is of degree greater than 1 and which is not the favorite child

[Figure: a favorite child and a free node]

Study tree B

Tree B

B is led along by A, so there is no cover strategy on B. Focus on the following three things:
• Rightmost forests
• Leftmost forests
• Special forests

Three Things (1)

Definition
• Rightmost forests: all subforests produced by starting from B and applying left decompositions until the end
• Leftmost forests: all subforests produced by starting from B and applying right decompositions until the end
• Special forests: all subforests produced by starting from B and applying left or right decompositions until the end

Rightmost ∪ leftmost = special? No!

[Figure: example of repeated left decomposition on a tree B with nodes 1 to 7, showing all rightmost forests of B]


Three categories

• The relevant forests of A fall into three categories
(α) those compared with all rightmost forests of B
(β) those compared with all leftmost forests of B
(γ) those compared with all special forests of B

Why?


The number of rightmost, leftmost and special forests (#right, #left, #special)

• #right(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a rightmost child)
• #left(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a leftmost child)
• #special(B) = |B|(|B|+3) / 2 - ∑(|B(i)|, i∈B)
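The three closed forms can be compared with a brute-force enumeration on small trees. The sketch below assumes trees encoded as `(label, children)` with unique labels and forests as tuples of trees; an only child is counted as both the leftmost and the rightmost child. The function names are illustrative.

```python
def size(forest):
    return sum(1 + size(kids) for _, kids in forest)

def subtree_rows(tree, leftmost=False, rightmost=False):
    """Yield (|B(i)|, i is a leftmost child, i is a rightmost child)."""
    yield size((tree,)), leftmost, rightmost
    kids = tree[1]
    for k, child in enumerate(kids):
        yield from subtree_rows(child, k == 0, k == len(kids) - 1)

def closed_forms(tree):
    """(#right, #left, #special) computed from the formulas on the slide."""
    rows = list(subtree_rows(tree))
    total = sum(s for s, _, _ in rows)
    n = size((tree,))
    right = total - sum(s for s, _, r in rows if r)
    left = total - sum(s for s, l, _ in rows if l)
    return right, left, n * (n + 3) // 2 - total

def reachable(tree, directions):
    """Non-empty subforests reached using only the given decompositions."""
    seen, stack = set(), [(tree,)]
    while stack:
        f = stack.pop()
        if not f or f in seen:
            continue
        seen.add(f)
        if "left" in directions:
            stack += [(f[0],), f[1:], f[0][1] + f[1:]]
        if "right" in directions:
            stack += [(f[-1],), f[:-1], f[:-1] + f[-1][1]]
    return seen

leaf = lambda name: (name, ())
B = ("a", (leaf("b"), leaf("c"), leaf("d"), leaf("e")))
print(closed_forms(B))
print(len(reachable(B, {"left"})), len(reachable(B, {"right"})),
      len(reachable(B, {"left", "right"})))
```

For this tree the forest c◦d needs one left and one right decomposition, so it is a special forest that is neither rightmost nor leftmost.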

Explanation of #right(B), #left(B)

Rightmost forests: they correspond to the cover strategy in which every favorite child is the rightmost child, because every decomposition is a left decomposition

Leftmost forests: they correspond to the cover strategy in which every favorite child is the leftmost child, because every decomposition is a right decomposition

#relevant(B) = |B| - |Bj| + #relevant(B1) + … + #relevant(Bn)

With Bn the rightmost and B1 the leftmost subtree of the root:

#right(B) = |B| - |Bn| + #right(B1) + … + #right(Bn)

#left(B) = |B| - |B1| + #left(B1) + … + #left(Bn)

Applying these recursively gives

#right(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a rightmost child)

#left(B) = ∑(|B(i)|, i∈B) - ∑(|B(i)|, i is a leftmost child)

Review

Combining the two

Comparison

Two types (with respect to A)
• Tree comparison
  free node
  favorite child
• Forest comparison

Lemma 6

Let F be a relevant forest of A
• if the direction is left, then F is compared at least with all rightmost forests of B
• if the direction is right, then F is compared at least with all leftmost forests of B

Why?

A quick warm-up

Free node comparison

Lemma 7
• Let i be a free node of A
  if the direction of i is left, then A(i) is (α)
  if the direction of i is right, then A(i) is (β)

(α) those compared with all rightmost forests of B
(β) those compared with all leftmost forests of B
(γ) those compared with all special forests of B

Lemma 7: explanation

Claim: if the direction of i is left, then A(i) is (α)

Consider G, the largest forest of B such that (A(i), G) belongs to RF(A, B) and G is not a rightmost forest. G is certainly not B, so consider how the pair (A(i), G) can be produced. There are four possible cases:

Case 1: the left side is unchanged, the root is removed on the right side

Since the direction of A(i) is left, there exist a node l and two forests H and P such that G = H◦P

Then (A(i), l(H)◦P) is in RF(A, B)

(A(i), l(H)◦P) -> (A(i), G) by removing the root on the right side

G is the largest and not rightmost => l(H)◦P is a rightmost forest of B

=> G = H◦P is also a rightmost forest of B

Case 2: the root is removed on the left side, the right side is unchanged

There exists a node l such that one of the following holds

(l◦A(i), G) -> (A(i), G) by removing the root on the left side

(A(i)◦l, G) -> (A(i), G) by removing the root on the left side

(l(A(i)), G) -> (A(i), G) by removing the root on the left side

Case 3: tree against tree

(A(i)◦F1, G◦F2) -> (A(i), G) by the tree comparison

(F1◦A(i), F2◦G) -> (A(i), G) by the tree comparison

Case 4: forest against forest

(T1◦A(i), T2◦G) -> (A(i), G) by the forest comparison

(A(i)◦T1, G◦T2) -> (A(i), G) by the forest comparison

Contradiction: not a free node! G is a tree!

Forest comparison

Lemma 8
• Let F be a relevant forest of A that is not a tree. Let i be the lowest common ancestor of the set of nodes of F and j be the favorite child of i
  (1) if F is a rightmost forest whose leftmost tree is not A(j), then F has the same category as A(i)
  (2) if F is a leftmost forest, then F has the same category as A(i)
  (3) otherwise F is (γ)

Lemma 8: explanation of (1) and (2)

The fact: (1) and (2) are very trivial!

For any forest,
• if it satisfies (1), the decomposition has not yet reached the favorite child (which lies on the right)
• if it satisfies (2), the decomposition has not yet reached the favorite child (which lies on the left)

All the forests that descend from the ancestor (LCA) have the same category

Lemma 8: explanation of (3)

If a forest satisfies neither (1) nor (2), then its leftmost tree must be the favorite child of the ancestor, so its current direction is right

Now consider a forest G

If G is a rightmost forest of B:
• since F is not a leftmost forest, the favorite child of the ancestor of F is not the leftmost child => the direction of A(i) is left
• by the lemma => the pair (A(i), G) exists
• (A(i), G) -> (F, G) by a sequence of root removals on the left side, with the right side unchanged

If G is not a rightmost forest of B:
• turning B into G requires a right decomposition
• the current direction of F happens to be right
• so the pair (F, G) exists

Favorite child comparison

Lemma 9
• Let i be a node of A that is not free, and j be the parent of i
  if the direction of i is left, i is the rightmost child of j and A(j) is left, then A(i) is (α)
  if the direction of i is right, i is the leftmost child of j and A(j) is right, then A(i) is (β)
  otherwise A(i) is (γ)

Lemma 9: explanation

The fact: all are very trivial!

(1) the world of left
(2) the world of right
(3) the rest

Is it really trivial?

Final Task

Notation

Let i be a node of A, and let j be the parent of i (if i is not the root)
• Free(A(i)): #relevant(A(i), B) if i is free
• Right(A(i)): #relevant(A(i), B) if A(j) is (α)
• Left(A(i)): #relevant(A(i), B) if A(j) is (β)
• All(A(i)): #relevant(A(i), B) if A(j) is (γ)

So, #relevant(A, B) = Free(A)

Theorem

Let (A, B) be a pair of trees, A being a cover tree
• 7 cases

Case 1

If A is reduced to a single node whose direction is right

Free(A) = #left(B)

Right(A) = #special(B)

Left(A) = #left(B)

All(A) = #special(B)

Case 2

If A is reduced to a single node whose direction is left

Free(A) = #right(B)

Right(A) = #right(B)

Left(A) = #special(B)

All(A) = #special(B)

Case 3

If A = l(A’) and the direction of l is right (A’ is a tree)

Free(A) = #left(B) + Left(A’)

Right(A) = #special(B) + All(A’)

Left(A) = #left(B) + Left(A’)

All(A) = #special(B) + All(A’)

Case 4

If A = l(A’) and the direction of l is left (A’ is a tree)

Free(A) = #right(B) + Right(A’)

Right(A) = #right(B) + Right(A’)

Left(A) = #special(B) + All(A’)

All(A) = #special(B) + All(A’)

Case 5

If A = l(A1◦…◦An) and the favorite child is the leftmost child

Free(A) = #left(B)(|A|-|A1|) + Left(A1) + Free(A2) +…+ Free(An)

Right(A) = #special(B)(|A|-|A1|) + All(A1) + Free(A2) +…+ Free(An)

Left(A) = #left(B)(|A|-|A1|) + Left(A1) + Free(A2) +…+ Free(An)

All(A) = #special(B)(|A|-|A1|) + All(A1) + Free(A2) +…+ Free(An)

Case 6

If A = l(A1◦…◦An) and the favorite child is the rightmost child

Free(A) = #right(B)(|A|-|An|) + Right(An) + Free(A1) +…+Free(An-1)

Right(A) = #right(B)(|A|-|An|) + Right(An) + Free(A1) +…+Free(An-1)

Left(A) = #special(B)(|A|-|An|) + All(An) + Free(A1) +…+Free(An-1)

All(A) = #special(B)(|A|-|An|) + All(An) + Free(A1) +…+Free(An-1)

Case 7

If A = l(A1◦…◦An) and the favorite child is Aj, with 1<j<n

Free(A) = #right(B)(1+|A1◦…◦Aj-1|) + #special(B)(|Aj+1◦…◦An|) + All(Aj) + Free(A1) +…+ Free(Aj-1) + Free(Aj+1) +…+ Free(An)

Right(A) = #right(B)(1+|A1◦…◦Aj-1|) + #special(B)(|Aj+1◦…◦An|) + All(Aj) + Free(A1) +…+ Free(Aj-1) + Free(Aj+1) +…+ Free(An)

Left(A) = #special(B)(|A|-|Aj|) + All(Aj) + Free(A1) +…+ Free(Aj-1) + Free(Aj+1) +…+ Free(An)

All(A) = #special(B)(|A|-|Aj|) + All(Aj) + Free(A1) +…+ Free(Aj-1) + Free(Aj+1) +…+ Free(An)
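The seven cases can be folded into one recursive function. The following Python sketch is a reading of the theorem under several assumptions that are not in the slides: a cover tree is encoded as `(cover, children)`, where `cover` is `"left"` or `"right"` for a node of degree 0 or 1 and the 0-based index of the favorite child otherwise; the counts of B are passed in a dictionary `b`; Case 2 is taken as the mirror image of Case 1 (Right(A) = #right(B)); and in Case 7 the #special(B) term is weighted by |Aj+1◦…◦An|, so that the two weights add up to |A| - |Aj| as in Left(A).

```python
def size(tree):
    return 1 + sum(size(child) for child in tree[1])

# For (Free, Right, Left, All): which count of B weights the new forests and
# which entry of the favorite (or only) child's result is added.
LEFT_RULE = (("right", "right", "special", "special"), (1, 1, 3, 3))
RIGHT_RULE = (("left", "special", "left", "special"), (2, 3, 2, 3))

def cover_counts(tree, b):
    """Return (Free(A), Right(A), Left(A), All(A)) for the cover tree A."""
    cover, kids = tree
    subs = [cover_counts(kid, b) for kid in kids]
    if len(kids) <= 1:                       # Cases 1 to 4
        j = 0
        rule = LEFT_RULE if cover == "left" else RIGHT_RULE
    else:                                    # Cases 5 to 7
        j = cover
        rule = RIGHT_RULE if j == 0 else LEFT_RULE
    fav = subs[j] if kids else (0, 0, 0, 0)
    outside = size(tree) - (size(kids[j]) if kids else 0)   # |A| - |Aj|
    others = sum(s[0] for k, s in enumerate(subs) if k != j)
    if 0 < j < len(kids) - 1:                # Case 7: favorite child in the middle
        before = 1 + sum(size(kid) for kid in kids[:j])
        after = outside - before
        mixed = b["right"] * before + b["special"] * after + fav[3] + others
        full = b["special"] * outside + fav[3] + others
        return (mixed, mixed, full, full)
    names, pick = rule
    return tuple(b[n] * outside + fav[p] + others for n, p in zip(names, pick))

leaf = ("left", ())
A = (1, (leaf, leaf))                        # favorite child = rightmost child
print(cover_counts(A, {"right": 4, "left": 4, "special": 4})[0])
```

With every favorite child on the right and every direction left, the first component reduces to #right(A) times #right(B), which matches the Zhang-Shasha count given later in the talk.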

Conclusion

Steps
• Take two trees A and B
• Compute #right(B), #left(B), #special(B)
• #relevant(A, B) = Free(A), where Free(A) is computed recursively by the theorem

Example

For the Zhang-Shasha algorithm

#relevant(A,B) = #right(A) * #right(B)

Why?

High hopes for the favorite child

Choose the favorite child (1)

Choose a good favorite child to minimize Free(A)

Free(A) = min of
• Case 5 (the favorite child is the leftmost child)
• Case 6 (the favorite child is the rightmost child)
• Case 7 (the favorite child is in the middle)

Choose the favorite child (2)

Is this really good?

Not necessarily! Why? It needs preprocessing time!

The end

Thanks to Professor 呂學一

Director: 田知本, 黃鼎翔, 巨彥霖

Producer: 田知本, 黃鼎翔, 巨彥霖

Sound: 田知本, 黃鼎翔, 巨彥霖

Lighting: 田知本, 黃鼎翔, 巨彥霖

Stunts: 田知本, 黃鼎翔, 巨彥霖

Starring: 田知本, 黃鼎翔, 巨彥霖

Happy New Year!
