text indexing and dictionary matching with one error
DESCRIPTION
Text Indexing and Dictionary Matching with One Error. Amir, A., KeseIman, D., Landau, G., M. and etc, Journal of Algorithm, 37, 2000, pp. 309-325. Adviser: R. C. T. Lee Speaker: C. W. Cheng. Problem Definition. The Indexing Problem : - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/1.jpg)
1
Text Indexing and Dictionary Matching with One ErrorAmir, A., KeseIman, D., Landau, G., M. and etc, Journal of Algorithm, 37, 2000,
pp. 309-325
Adviser: R. C. T. LeeSpeaker: C. W. Cheng
![Page 2: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/2.jpg)
2
Problem Definition
• The Indexing Problem :– Input : A Text T of length n over alphabet Σ,
a pattern P of length m over alphabet Σ and an integer k.
– Output : All occurrences of P in T with at most k mismatches.
![Page 3: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/3.jpg)
3
Main idea
• In this algorithm, we construct suffix tree and prefix tree with text T. We set an integer j, j=1,2…m. Then we find the prefix P1,j-1 in prefix tree and the suffix Pj+1,m in suffix tree. If both of them exist, an approximation string matching with one error occurs.
![Page 4: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/4.jpg)
4
Processing
• 1.Construct a suffix tree ST of the text string T and suffix tree ST
R of the string TR is the reversed text TR = tn … t1.
![Page 5: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/5.jpg)
5
Ex : T=AGCAGAT
TR=TAGACGA
![Page 6: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/6.jpg)
6
g
a
t$
tagacga$
g
at
$
cagat$
t$
cagat$
at$
cagat$
a g
ga$
acga$
cga$
$cga$
gacga$
Ex : T=AGCAGAT
TR=TAGACGA
![Page 7: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/7.jpg)
7
Processing
• 2. For each of the suffix trees, link all leaves of the suffix tree in a left-to-right order.
![Page 8: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/8.jpg)
8
g
a
t$
tagacga$
g
at
$
cagat$
t$
cagat$
at$
cagat$
a g
ga$
acga$
cga$
$cga$
gacga$
Ex : T=AGCAGAT
TR=TAGACGA
![Page 9: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/9.jpg)
9
Processing
• 3. For each of the suffix trees, set pointers from each tree node v to its left most leaf vl and rightmost leave vr in the linked list.
![Page 10: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/10.jpg)
10
g
a
t$
tagacga$
g
at
$
cagat$
t$
cagat$
at$
cagat$
a g
ga$
acga$
cga$
$cga$
gacga$
Ex : T=AGCAGAT
TR=TAGACGA
![Page 11: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/11.jpg)
11
Processing
• 4. Designate each leaf in ST by the starting location of its suffix. Designate each leaf in ST
R by n – i + 3, where i is the starting position of the leaf’s suffix in TR.
![Page 12: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/12.jpg)
12
4 1 6 3 5 2 7 3 6 8 5 4 7 9
g
a
t$
tagacga$
g
at
$
cagat$
t$
cagat$
at$
cagat$
a g
ga$
acga$
cga$
$cga$
gacga$
Ex : T=AGCAGAT
TR=TAGACGA
![Page 13: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/13.jpg)
13
Query Processing
• For j = 1, …., m do– 1. Find node v, the location of Pj+1 … Pm in ST, if such a
node exists.– 2. Find node w, the location of Pj-1 .. P1 in ST
R, if such a node exist.
– 3. If v and w exist, the values of leaves under v and w are V[vl….vr] and W[wl…wr], to find the intersections I of V[vl….vr] and W[wl…wr]. If the intersections exist, the approximate string matching occurs on Ti-3…Ti-3+m, for all i I.
![Page 14: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/14.jpg)
14
ExampleEx : T=actgacctcagctta P=ctga k=1
![Page 15: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/15.jpg)
1515 5
a
c
g tc
1 10 9 6 7 2 12 4 11 14 8 3 13
tgacctcagctta$
ctcagctta$
$gctta$
ag
ct
ta
$
ctcagctta$
cagctta$
gacctcagctta$
ta$
a$
acctcagctta$
ctta$
cagctta$
gacctcagctta$
ta$
Ex : T=actgacctcagctta TR=attcgactccagtca P=ctaa
Suffix Tree of T
![Page 16: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/16.jpg)
163 12
a g tc
7 17 4 8 9 14 11 13 6 5 10 15 14
ttcgactccagtca$
a$
ctccagtca$
gtca$
a
$gtca$
cagtca$
gactccagtca$
tccagtca$
actccagtca$
tca$
c
a$
cagtca$
gactccagtca$
tcgactccagtca$
Ex : T=actgacctcagctta TR=attcgactccagtca P=ctaa
Suffix Tree of TR
![Page 17: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/17.jpg)
1715 5
a
c
g tc
1 10 9 6 7 2 12 4 11 14 8 3 13
tgacctcagctta$
ctcagctta$
$gctta$
ag
ct
ta
$
ctcagctta$
cagctta$
gacctcagctta$
ta$
a$
acctcagctta$
ctta$
cagctta$
gacctcagctta$
ta$
Ex : T=actgacctcagctta TR=attcgactccagtca P=ctaa
Suffix Tree of T
j=1v=Pj+1…Pm=taaw=Pj-1…P1=ε
V[vl….vr]={ε}
![Page 18: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/18.jpg)
183 12
a g tc
7 17 4 8 9 14 11 13 6 5 10 15 14
ttcgactccagtca$
a$
ctccagtca$
gtca$
a
$gtca$
cagtca$
gactccagtca$
tccagtca$
actccagtca$
tca$
c
a$
cagtca$
gactccagtca$
tcgactccagtca$
Suffix Tree of TREx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=1v=Pj+1…Pm=taaw=Pj-1…P1=ε
V[vl….vr]={ε}W[vl….vr]={3,12,…,14}
I={ε}
![Page 19: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/19.jpg)
1915 5
a
c
g tc
1 10 9 6 7 2 12 4 11 14 8 3 13
tgacctcagctta$
ctcagctta$
$gctta$
ag
ct
ta
$
ctcagctta$
cagctta$
gacctcagctta$
ta$
a$
acctcagctta$
ctta$
cagctta$
gacctcagctta$
ta$
Suffix Tree of TEx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=2v=Pj+1…Pm=aaw=Pj-1…P1=c
V[vl….vr]={ε}
![Page 20: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/20.jpg)
203 12
a g tc
7 17 4 8 9 14 11 13 6 5 10 15 14
ttcgactccagtca$
a$
ctccagtca$
gtca$
a
$gtca$
cagtca$
gactccagtca$
tccagtca$
actccagtca$
tca$
c
a$
cagtca$
gactccagtca$
tcgactccagtca$
Suffix Tree of TREx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=2v=Pj+1…Pm=aaw=Pj-1…P1=c
V[vl….vr]={ε}W[vl….vr]={4,8,9,14,11}
I={ε}
![Page 21: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/21.jpg)
2115 5
a
c
g tc
1 10 9 6 7 2 12 4 11 14 8 3 13
tgacctcagctta$
ctcagctta$
$gctta$
ag
ct
ta
$
ctcagctta$
cagctta$
gacctcagctta$
ta$
a$
acctcagctta$
ctta$
cagctta$
gacctcagctta$
ta$
Suffix Tree of TEx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=3v=Pj+1…Pm=aw=Pj-1…P1=tc
V[vl….vr]={15,5,1,10}
![Page 22: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/22.jpg)
223 12
a g tc
7 17 4 8 9 14 11 13 6 5 10 15 14
ttcgactccagtca$
a$
ctccagtca$
gtca$
a
$gtca$
cagtca$
gactccagtca$
tccagtca$
actccagtca$
tca$
c
a$
cagtca$
gactccagtca$
tcgactccagtca$
Suffix Tree of TREx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=3v=Pj+1…Pm=aw=Pj-1…P1=tc
V[vl….vr]={15,5,1,10}W[vl….vr]={5,10,15}
I={15,5,10}
![Page 23: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/23.jpg)
23
When j=3, the intersection of V[15,5,1,10] and W[5,10,15] is I={5,10,15}. Therefore approximate string matching occurs on Ti-j…Ti-j+m, for all i I.
T2…T6 ,T7…T11 ,T12…T15 。
T=actgacctcagcttaP=ctaa
![Page 24: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/24.jpg)
2415 5
a
c
g tc
1 10 9 6 7 2 12 4 11 14 8 3 13
tgacctcagctta$
ctcagctta$
$gctta$
ag
ct
ta
$
ctcagctta$
cagctta$
gacctcagctta$
ta$
a$
acctcagctta$
ctta$
cagctta$
gacctcagctta$
ta$
Suffix Tree of TEx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=4v=Pj+1…Pm=εw=Pj-1…P1=atc
V[vl….vr]={15,5,…,13}
![Page 25: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/25.jpg)
253 12
a g tc
7 17 4 8 9 14 11 13 6 5 10 15 14
ttcgactccagtca$
a$
ctccagtca$
gtca$
a
$gtca$
cagtca$
gactccagtca$
tccagtca$
actccagtca$
tca$
c
a$
cagtca$
gactccagtca$
tcgactccagtca$
Suffix Tree of TREx : T=actgacctcagctta TR=attcgactccagtca P=ctaa
j=3v=Pj+1…Pm=εw=Pj-1…P1=atc
V[vl….vr]={15,5,…,13}W[vl….vr]={ε}
I={ε}
![Page 26: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/26.jpg)
26
Range Query Problem
• In step 3, given nodes v and w, we want to find the leaves that appear both in interval [vl … vr] and in the interval [wl … wr], where the four end points of the two intervals are defined in step P.3 of the preprocessing. Thus, we are seeking a solution to the range query problem.
![Page 27: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/27.jpg)
27
Problem Definition of Range Query
• Input : Let V=[v1,v2 … vn] and W=[w1,w2 … wn] be two permutation arrays, where n is the number of elements. Four constants i,j,k and l, where both i+k < n and j+l < n.
• Output : Find the intersection of elements of V[i … i+k] and W[j … j+l].
![Page 28: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/28.jpg)
28
Example :V=[8,5,1,4,3,7,6,2]W=[3,6,4,7,2,1,5,8]i=3,k=4j=2,l=5
Output: the intersection of V[v3,v4,v5,v6] and W[w2,w3,w4,w5,w6]
![Page 29: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/29.jpg)
29
Preprocessing
V=
W=
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8
8 5 1 4 3 7 6 2
3 6 4 7 2 1 5 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
![Page 30: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/30.jpg)
30
Preprocessing
V=
W=
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8
8 5 1 4 3 7 6 2
3 6 4 7 2 1 5 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
8
5
1
4
3
6
7
2
![Page 31: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/31.jpg)
31
Preprocessing
V=
W=
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8
8 5 1 4 3 7 6 2
3 6 4 7 2 1 5 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
8
5
1
4
3
7
6
2
The intersection of V[v3,v4,v5,v6] and W[w2,w3,w4,w5,w6] is {1,4,7}.
![Page 32: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/32.jpg)
32
Time Complexity of Range Query Problem
• By using Overmars’ algorithm, the range query problem can be solved with preprocessing time and , where k is the number of points in the range.
[O88] Overmars, M. H., Efficient data structures for range searching on a grid, J. Algorithms 9, 1988,pp. 254-275.
)log( nkO )log( nnO
![Page 33: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/33.jpg)
33
Time Complexity
• For the indexing problem, the preprocessing time is and the query can be implemented in , where tocc is the number of occurrences of the pattern in the text with one error.
)log( nnO)log( nmtoccO
![Page 34: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/34.jpg)
34
The Dictionary Matching Problem
![Page 35: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/35.jpg)
35
Problem Definition
• The Dictionary Matching Problem– Input :
• 1. A dictionary P = {p1,…., ps}, where pi, i = 1,…., s, are patterns over alphabet Σ, and is the sum of the lengths of all the dictionary patterns.
• 2. A Text T of length n over alphabet Σ.• 3. An integer k.
– Output :• All occurrences of any dictionary patterns in T with a
t most k mismatches.
isi pd 1
![Page 36: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/36.jpg)
36
Main idea
• In this algorithm, we construct suffix tree and prefix tree with D which is concatenation of all patterns in dictionary. We set an integer j, j=1,2…n. Then we find the prefix T1,j-1 in prefix tree and the suffix Tj+1,m in suffix tree. If both of them exist, an approximation string matching with one error occurs.
![Page 37: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/37.jpg)
37
Processing
• 1. Construct a suffix tree SD of string D and suffix tree SD
R of the string DR, where D is the concatenation of all dictionary patterns, with a separator at the end of each pattern, and where DR is the reversal of string D.
![Page 38: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/38.jpg)
38
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
![Page 39: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/39.jpg)
39
a$ g tc
gc
tga$gca$
a$
gctga$gca$
a$
tga$gca$
a$gca$
a$
c
tga$gca$
ca$gctga$gca$
ga$gca$
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
Suffix Tree of D (SD)
![Page 40: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/40.jpg)
40
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
a g tc
g$a
gt
cg
$a
ct
$
cg
$$
t$
gtcg$act$
agtcg$act$
act$
act$
agtcg$act$
tcg$act$
cg$act$
$t$
Suffix Tree of DR (SDR)
![Page 41: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/41.jpg)
41
Processing
• 2. Modify suffix tree SD, and SDR respectivel
y, as follows. For each separator which is treefirst but not edgefirst, i.e., it appears on an edge (u,v) labeled σ$σ”, where σ≠ε, break (u,v) into (u,w) and (w,v). Label (u,v) with σ and (w,v) with $σ’.
![Page 42: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/42.jpg)
42
a$ g tc
a$
tga$
a$
a$
c
tga$
ca$
ga$
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
Suffix Tree of D (SD)
![Page 43: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/43.jpg)
43
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
a g tc
g$
cg
$$
t$
gtcg$
tcg$
cg$$t
$
Suffix Tree of DR (SDR)
![Page 44: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/44.jpg)
44
Preprocessing
• 3. Scan suffix tree SD, respectively SDR, and modi
fy as follows. For each vertex v consider the associated string L(v), i.e., the string from the root to v. Label v with all the locations of the pattern suffixes, resp. prefixes, that are equal to L(v). To implement this note that all the relevant suffixes share a prefix of L(v)$. So, go to edge (v,w) with label beginning with $, assuming such exists, and scan the subtree rooted at w to find all relevant suffixes.
![Page 45: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/45.jpg)
4511
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
Suffix Tree of D (SD)
![Page 46: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/46.jpg)
46
Example :P={tca,gctga,gca}D=TCA$GCTGA$GCA$DR=ACG$AGTCG$ACT$
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)
![Page 47: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/47.jpg)
47
Query Processing
• For j = 1,…., n do– 1. Find node v, the location of the longest pref
ix of tj+1 … tn in SD.– 2. Find node w, the location of the longest pre
fix of tj-1 … t1 in SDR.
– 3. Find intersection of markings of nodes on the path from the root to v in SD and on the path from the root to w in SD
R.
![Page 48: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/48.jpg)
48
Example
T=acagccgaD={tca,gctga,gca}K=1
![Page 49: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/49.jpg)
4911
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
![Page 50: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/50.jpg)
50
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
![Page 51: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/51.jpg)
5111
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=1v=Tj+1…Tm=cagccgaw=Tj-1…T1=ε
V={10,2}
![Page 52: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/52.jpg)
52
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$t
$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=1v=Tj+1…Tm=cagccgaw=Tj-1…T1=ε
V={10,2}W={13,5,…, 8}I={10,2}
![Page 53: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/53.jpg)
53
When j=1, the intersection of V[10,2] and W[13,5,…,8] is I={10,2}. Therefore approximate string matching occurs on T with P…P.
T=acagccgaP={tca,gctga,gca}
![Page 54: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/54.jpg)
5411
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=2v=Tj+1…Tm=agccgaw=Tj-1…T1=a
V={11,8,3}
![Page 55: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/55.jpg)
55
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=2v=Tj+1…Tm=agccgaw=Tj-1…T1=a
V={11,8,3}W={ε}I={ε}
![Page 56: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/56.jpg)
5611
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=3v=Tj+1…Tm=gccgaw=Tj-1…T1=ca
V={ε}
![Page 57: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/57.jpg)
57
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=3v=Tj+1…Tm=gccgaw=Tj-1…T1=ca
V={ε}W={ε}I={ε}
![Page 58: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/58.jpg)
5811
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=4v=Tj+1…Tm=ccgaw=Tj-1…T1=aca
V={ε}
![Page 59: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/59.jpg)
59
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=4v=Tj+1…Tm=ccgaw=Tj-1…T1=aca
V={ε}W={ε}I={ε}
![Page 60: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/60.jpg)
6011
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=5v=Tj+1…Tm=cgaw=Tj-1…T1=gaca
V={ε}
![Page 61: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/61.jpg)
61
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=5v=Tj+1…Tm=cgaw=Tj-1…T1=gaca
V={ε}W={6,11}I={ε}
![Page 62: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/62.jpg)
6211
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=6v=Tj+1…Tm=gaw=Tj-1…T1=cgaca
V={7}
![Page 63: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/63.jpg)
63
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=6v=Tj+1…Tm=gaw=Tj-1…T1=cgaca
V={7}W={7,12}I={7}
![Page 64: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/64.jpg)
64
When j=6, the intersection of V[7] and W[7,12] is I={7}. Therefore approximate string matching occurs on T with P.
T=acagccgaP={tca,gctga,gca}
![Page 65: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/65.jpg)
6511
a$ g tc
8 3 10 2 5 7 9 4 1 6
a$
tga$
a$
a$
c
tga$
ca$
ga$
Suffix Tree of D (SD)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=7v=Tj+1…Tm=εw=Tj-1…T1=ccgaca
V={11,8,…,6}
![Page 66: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/66.jpg)
66
13
a g tc
5 10 7 12 4 6 11 9 3 8
g$
cg
$$
t$
gtcg$
tcg$
cg$$
t$
Suffix Tree of DR (SDR)Example :P={tca,gctga,gca}D=tca$gctga$gca$DR=acg$agtcg$act$T=acagccga
j=1v=Tj+1…Tm=εw=Tj-1…T1=ccgaca
V={11,8,…,6}W={ε}I={ε}
![Page 67: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/67.jpg)
67
Time Complexity
• For the indexing problem, the preprocessing time is and the query can be implemented in , where tocc is the number of occurrences of the dictionary in the text with one error.
)log( ddO)log( dntoccO
![Page 68: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/68.jpg)
68
References[AC75] Efficient string matching, A. V. Aho and M. J. Corasick, Comm. Assoc. Comput. Mach. 18, No.
6, 1975, pp.333-340.[AF91] Adaptive dictionary matching, A. Amir and M. Farach, Proc. 32nd IEEE FOCS, 1991, pp.760-7
66.[AFI95] A. Amir, M. Farach, R. M. Idury and etc, Improved dynamic dictionary matching , Inform. And
Comput. 119, 1995, pp.258-282.[AKL99] A. Amir, D. Keselman, G. M. Landau and etc, Indexing and dictionary matching with one erro
r, Proc. 1999 Workshop on Algorithms and Data Structures (WADS), 1999, pp.181-192.[AG97] Pattern Matching Algorithms, Oxford University Press, 1997.[BM77] A fast string searching algorithm, Comm. Assoc. Comput. Mach. 20, 1977, pp.762-772.[BG96] G. S. Brodal and L. Gasieniec, Approximate Dictionary queries, Proc. 7th Annual Symposium
on Combinatorial Pattern Matching (CPM96), 1996, pp.65-74.[O88] M. H. Overmars, Efficient data structures for range searching on a grid, J. Algorithms 9, 1988,p
p. 254-275.
![Page 69: Text Indexing and Dictionary Matching with One Error](https://reader036.vdocuments.pub/reader036/viewer/2022062305/56815dc7550346895dcbf45d/html5/thumbnails/69.jpg)
69
Thanks for your listening