wavelet tree wo tears

Upload: toshiyuki-maezawa

Post on 16-Jul-2015


echizen_tm Mar. 24, 2012

Outline:

Self-introduction (1 slide)
Succinct data structures (2 slides)
Full-text search (2 slides)
FM-Index (13 slides)
Wavelet Tree (12 slides)
(1 slide)
(1 slide)
Shellinford (2 slides)
(1 slide)

ID: echizen_tm

Blog: EchizenBlog-Zwei (http://d.hatena.ne.jp/echizen_tm/)

web ()

(1/2) Succinct data structures such as LOUDS: their size is close to the Information Theoretical Lower Bound (ITLB), and their operations are still fast (O(1) to O(log N)).

(2/2) (DSIRNLP#2)

LOUDS (DSIRNLP#1)

(DSIRNLP#3)

(1/2) Full-Text Search Engine

Index types: Inverted Index, Suffix Array

(2/2) FM-Index

Suffix Array: the index is about 4 times the size of the text

FM-Index: the index is about 0.3 times the size of the text

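FM-Index (1/13) The FM-Index was proposed by Ferragina and Manzini [Ferragina+ 2000] and is named after them (Ferragina & Manzini Index).

It is based on the Burrows-Wheeler Transform (BWT).

It is a self-index: the text can be restored from the index itself.

[Ferragina+ 2004]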

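FM-Index (2/13) Building the Suffix Array of mississippi

(1) List the suffixes with their start positions (written here as rotations of mississippi#, where # is the terminator). Left: in text order. Right: after sorting.

Start  Suffix          Start  Suffix (sorted)
 0     mississippi#    11     #mississippi
 1     ississippi#m    10     i#mississipp
 2     ssissippi#mi     7     ippi#mississ
 3     sissippi#mis     4     issippi#miss
 4     issippi#miss     1     ississippi#m
 5     ssippi#missi     0     mississippi#
 6     sippi#missis     9     pi#mississip
 7     ippi#mississ     8     ppi#mississi
 8     ppi#mississi     6     sippi#missis
 9     pi#mississip     3     sissippi#mis
10     i#mississipp     5     ssippi#missi
11     #mississippi     2     ssissippi#mi

FM-Index (3/13) Building the Suffix Array of mississippi

(2) Sort the suffixes lexicographically. (3) The start positions, read in sorted order, are the suffix array.

Start  Suffix (sorted)
11     #mississippi
10     i#mississipp
 7     ippi#mississ
 4     issippi#miss
 1     ississippi#m
 0     mississippi#
 9     pi#mississip
 8     ppi#mississi
 6     sippi#missis
 3     sissippi#mis
 5     ssippi#missi
 2     ssissippi#mi

Suffix array: 11 10 7 4 1 0 9 8 6 3 5 2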
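The construction above can be sketched in C++ as follows. This is a minimal, naive version that sorts start positions by comparing whole suffixes; the function name `build_suffix_array` is made up for illustration, and real implementations use much faster construction algorithms.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort the start positions 0..N-1 by the suffix that
// begins at each position. Assumes the text ends with a terminator ('#' in
// the slides) that is smaller than every other character.
std::vector<int> build_suffix_array(const std::string& text) {
    assert(!text.empty());
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // step (1): all start positions
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        // step (2): lexicographic comparison of text[a..] and text[b..]
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;  // step (3): positions in sorted order
}
```

Calling `build_suffix_array("mississippi#")` should yield 11 10 7 4 1 0 9 8 6 3 5 2, the same order as the table.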

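FM-Index (4/13) For a text of N bytes, the suffix array stores one 4-byte integer per position.

Text (N) + suffix array (4N) = 5N bytes (5 times the text). In bits: text (O(N)) + suffix array (O(N log N)) = O(N log N)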

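FM-Index (5/13) The FM-Index uses the Burrows-Wheeler Transform (BWT). The BWT has the same size as the text (N), and the text can be restored from the BWT.

With auxiliary data of o(N), the FM-Index takes N + o(N) (with the o(N) part at around 0.3N, about 1.3N in total).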

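FM-Index (6/13) Burrows-Wheeler Transform (BWT): sort the rotations of the text and take the last character of each row.

Sorted rotation  Last character
#mississippi     i
i#mississipp     p
ippi#mississ     s
issippi#miss     s
ississippi#m     m
mississippi#     #
pi#mississip     p
ppi#mississi     i
sippi#missis     s
sissippi#mis     s
ssippi#missi     i
ssissippi#mi     i

BWT = ipssm#pissii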
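As a rough sketch, the transform can be written exactly as the table reads: build every rotation, sort them, and collect the last characters. The function name `bwt_by_rotations` is hypothetical, and this version uses O(N^2) memory, so it is only meant for small examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// BWT by explicit rotation sorting. Assumes the text already ends with a
// unique terminator ('#' in the slides).
std::string bwt_by_rotations(const std::string& text) {
    assert(!text.empty());
    std::vector<std::string> rotations;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // rotation that starts at position i
        rotations.push_back(text.substr(i) + text.substr(0, i));
    }
    std::sort(rotations.begin(), rotations.end());
    std::string last_column;
    for (const std::string& row : rotations) {
        last_column.push_back(row.back());  // last character of each sorted row
    }
    return last_column;
}
```

For `"mississippi#"` this should return `ipssm#pissii`.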

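FM-Index (7/13) The text can be restored from the BWT string T in O(N): each step applies the LF mapping, which costs O(1), and the text comes out from its last character to its first.

LF(0) = 1     T[0] = i
LF(1) = 6     T[1] = p
LF(6) = 7     T[6] = p
LF(7) = 2     T[7] = i
LF(2) = 8     T[2] = s
LF(8) = 10    T[8] = s
LF(10) = 3    T[10] = i
LF(3) = 9     T[3] = s
LF(9) = 11    T[9] = s
LF(11) = 4    T[11] = i
LF(4) = 5     T[4] = m

Read top to bottom, T[i] gives ippississim, which is mississippi reversed.

 i   Sorted rotation  T[i]
 0   #mississippi     i
 1   i#mississipp     p
 2   ippi#mississ     s
 3   issippi#miss     s
 4   ississippi#m     m
 5   mississippi#     #
 6   pi#mississip     p
 7   ppi#mississi     i
 8   sippi#missis     s
 9   sissippi#mis     s
10   ssippi#missi     i
11   ssissippi#mi     i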

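FM-Index (8/13) Definition of the LF mapping: LF(i) = (number of characters in T smaller than T[i]) + (number of occurrences of T[i] in T before position i)

Example with T = ipssm#pissii: T[9] = s, so LF(9) = (characters smaller than s: # 1 + i 4 + m 1 + p 2 = 8) + (occurrences of s before position 9: T[2], T[3], T[8] = 3) = 8 + 3 = 11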
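The LF formula and the restoration walk can be sketched as below. The names `lf_mapping` and `inverse_bwt` are made up for this example, and the occurrence counts are computed with a plain linear pass rather than the compact structure the talk is building towards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// LF(i) = (# of characters in bwt smaller than bwt[i])
//       + (# of occurrences of bwt[i] in bwt[0..i)).
std::vector<int> lf_mapping(const std::string& bwt) {
    std::array<int, 256> count{};
    for (unsigned char c : bwt) ++count[c];
    std::array<int, 256> smaller{};  // smaller[c] = # of characters < c
    int total = 0;
    for (int c = 0; c < 256; ++c) {
        smaller[c] = total;
        total += count[c];
    }
    std::array<int, 256> seen{};  // occurrences of each character so far
    std::vector<int> lf(bwt.size());
    for (std::size_t i = 0; i < bwt.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bwt[i]);
        lf[i] = smaller[c] + seen[c];
        ++seen[c];
    }
    return lf;
}

// Restores the text (without the terminator) by following LF from row 0.
// Assumes the terminator is unique and smaller than all other characters,
// so that row 0 of the sorted rotations starts with it.
std::string inverse_bwt(const std::string& bwt, char terminator = '#') {
    assert(bwt.find(terminator) != std::string::npos);
    const std::vector<int> lf = lf_mapping(bwt);
    std::string reversed;
    int i = 0;
    while (bwt[i] != terminator) {
        reversed.push_back(bwt[i]);  // characters come out last to first
        i = lf[i];
    }
    return std::string(reversed.rbegin(), reversed.rend());
}
```

With `ipssm#pissii` as input, `lf_mapping` should give LF(9) = 11 as in the worked example, and `inverse_bwt` should return `mississippi`.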

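FM-Index (9/13) Cost of the two terms of LF(i):

The number of characters in T smaller than T[i] fits in a table with one entry per character (256 entries). The number of occurrences of T[i] in T before position i would naively need a table of (256) x (N) entries.

FM-Index (10/13) The remaining problem is how to count the occurrences of T[i] in T before position i.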

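FM-Index (11/13) The bit vector presented at DSIRNLP#2

supports the following operations quickly (O(1) to O(log N)): rank(i) = number of 1s before position i; select(i) = position of the i-th 1

What is needed here is a rank() over characters:

the number of occurrences of c before position i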
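To make the two operations concrete, here is a simple, non-succinct stand-in: it keeps a full prefix-count array, so rank is a lookup and select is a binary search. The class name `BitVector`, the `'0'`/`'1'` string input, and the convention that `select1(k)` counts from k = 0 are all assumptions of this sketch; a real succinct bit vector stores far less auxiliary data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class BitVector {
public:
    // bits is a string of '0' and '1' characters.
    explicit BitVector(const std::string& bits) : prefix_(bits.size() + 1, 0) {
        for (std::size_t i = 0; i < bits.size(); ++i) {
            assert(bits[i] == '0' || bits[i] == '1');
            prefix_[i + 1] = prefix_[i] + (bits[i] == '1' ? 1 : 0);
        }
    }

    std::size_t size() const { return prefix_.size() - 1; }

    // Number of bits equal to `bit` in positions [0, i).
    std::size_t rank(std::size_t i, int bit) const {
        assert(i <= size());
        return bit ? prefix_[i] : i - prefix_[i];
    }

    // Position of the k-th 1, with k counted from 0 (assumed convention).
    std::size_t select1(std::size_t k) const {
        assert(k < prefix_.back());
        // first prefix count greater than k sits just past the k-th 1
        auto it = std::upper_bound(prefix_.begin(), prefix_.end(), k);
        return static_cast<std::size_t>(it - prefix_.begin()) - 1;
    }

private:
    std::vector<std::size_t> prefix_;  // prefix_[i] = number of 1s in [0, i)
};
```

For example, on the bits `0110`, `rank(3, 1)` counts the 1s among the first three bits and `select1(0)` returns the position of the first 1.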

FM-Index(12/13) LOUDS

BP

DFUDS

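FM-Index (13/13)

The suffix array (4 times the text size) is replaced by the BWT (FM-Index); what the BWT still needs is the number of occurrences of c before position i.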

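(1/12) Wavelet Tree: for a string of length N it takes O(N) + o(N) space and supports, in O(1) to O(log N) time: rank(i, c) = number of occurrences of c before position i; select(i, c) = position of the i-th c

The FM-Index only uses rank, so only rank is explained here.

(2/12) A bit vector handles two symbols, 0 and 1.

The next step is four symbols: a, b, c, d.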

(3/12) Example: abcdabdc

rank(5, a) = 2

The first 5 characters of abcdabdc (abcda) contain 2 occurrences of a.

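(4/12) How to compute rank(5, a) on abcdabdc

Split the 4 symbols into 2 groups, which splits abcdabdc into abab (symbols a, b) and cddc (symbols c, d).

(5/12) Each half has only two symbols, so it can be written as a bit vector: abab => 0101, cddc => 0110

The bit-vector rank can then be used.

However, rank(5, a) on abcdabdc corresponds to rank(i, a) on abab for a different position i.

This i is also found with rank.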

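(6/12) The first 5 characters of abcdabdc contain 2 a's and 1 b, so 3 characters from the group a, b.

rank(5, a) is the number of a's in the first 5 characters.

Those 5 characters contribute the first 3 characters of abab, so the answer is rank(3, a) on abab.

(7/12) How to count the characters of the first 5 of abcdabdc that belong to a, b (that is, to abab):

Label the abab group 0 and the cddc group 1: abcdabdc => 00110011

rank(5, 0) = 3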

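(8/12) rank(5, a) on abcdabdc

equals rank(3, a) on abab,

which equals rank(3, 0) on 0101: rank(3, 0) = 2

(9/12) Summary for rank(5, a) on abcdabdc: split into abab and cddc with a,b => 0 and c,d => 1, then compute rank(5, 0) on 00110011: rank(5, 0) = 3

Within abab, map a => 0 and b => 1, then compute rank(3, 0) on 0101: rank(3, 0) = 2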

(10/12) bv = abcdabdc

x    = 00110011 (abcdabdc)
y[0] = 0101 (abab)
y[1] = 0110 (cddc)

a = {0, 0}, b = {0, 1}, c = {1, 0}, d = {1, 1}

bv.rank(5, a)
= y[a[0]].rank(x.rank(5, a[0]), a[1])
= y[0].rank(x.rank(5, 0), 0)
= y[0].rank(3, 0)
= 2

(11/12) bv = abcdabdc

x    = 00110011 (abcdabdc)
y[0] = 0101 (abab)
y[1] = 0110 (cddc)

a = {0, 0}, b = {0, 1}, c = {1, 0}, d = {1, 1}

bv.rank(6, c)
= y[c[0]].rank(x.rank(6, c[0]), c[1])
= y[1].rank(x.rank(6, 1), 0)
= y[1].rank(2, 0)
= 1
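The two worked examples can be reproduced with a small sketch that mirrors the slide's x, y[0], y[1] layout for the four symbols a to d. The struct name `FourSymbolWaveletTree` and the helper `bit_rank` are hypothetical, and the bit-level rank is a linear scan here purely for readability.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many times `bit` ('0' or '1') occurs in bits[0..i).
std::size_t bit_rank(const std::string& bits, std::size_t i, char bit) {
    assert(i <= bits.size());
    std::size_t n = 0;
    for (std::size_t k = 0; k < i; ++k) {
        if (bits[k] == bit) ++n;
    }
    return n;
}

struct FourSymbolWaveletTree {
    std::string x;                 // first bit of every symbol
    std::array<std::string, 2> y;  // y[0]: second bits of a/b, y[1]: of c/d

    explicit FourSymbolWaveletTree(const std::string& text) {
        for (char ch : text) {
            assert(ch >= 'a' && ch <= 'd');
            const int code = ch - 'a';  // a = 00, b = 01, c = 10, d = 11
            const int hi = code >> 1;
            const int lo = code & 1;
            x.push_back(static_cast<char>('0' + hi));
            y[hi].push_back(static_cast<char>('0' + lo));
        }
    }

    // Number of occurrences of ch in the first i characters:
    // y[ch[0]].rank(x.rank(i, ch[0]), ch[1]) in the slide's notation.
    std::size_t rank(std::size_t i, char ch) const {
        assert(ch >= 'a' && ch <= 'd');
        const int code = ch - 'a';
        const int hi = code >> 1;
        const int lo = code & 1;
        const std::size_t j = bit_rank(x, i, static_cast<char>('0' + hi));
        return bit_rank(y[hi], j, static_cast<char>('0' + lo));
    }
};
```

Built from `abcdabdc`, the members should come out as x = 00110011, y[0] = 0101, y[1] = 0110, with `rank(5, 'a')` giving 2 and `rank(6, 'c')` giving 1.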

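(12/12) With 4 symbols,

rank(i, c) on characters costs 2 bit-vector ranks. 4 symbols => 2 ranks, 256 symbols => 8 ranks

With 1 character = 1 byte, one character rank takes 8 bit-vector ranks.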
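Extending the same idea to 1-byte characters gives one level per bit, so 8 levels. The sketch below is one possible layout, not the author's implementation: the class name `ByteWaveletTree` is invented, each node keeps a plain prefix-count array instead of a succinct bit vector, and the tree is built recursively from the most significant bit down.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class ByteWaveletTree {
public:
    explicit ByteWaveletTree(const std::string& text)
        : size_(text.size()), root_(build(text, 7)) {}

    // Occurrences of c in text[0..i): one bit-level rank per level, 8 levels.
    std::size_t rank(std::size_t i, unsigned char c) const {
        assert(i <= size_);
        const Node* node = root_.get();
        for (int level = 7; level >= 0 && node != nullptr; --level) {
            const int bit = (c >> level) & 1;
            // map position i into the child that holds symbols with this bit
            i = bit ? node->ones[i] : i - node->ones[i];
            node = node->child[bit].get();
        }
        return i;  // an absent child means no such symbols, and i is 0 there
    }

private:
    struct Node {
        std::vector<std::size_t> ones;  // ones[k] = 1 bits among first k symbols
        std::unique_ptr<Node> child[2];
    };

    // Splits s by the bit at `level` and recurses on both halves.
    static std::unique_ptr<Node> build(const std::string& s, int level) {
        if (s.empty() || level < 0) return nullptr;
        auto node = std::make_unique<Node>();
        node->ones.assign(s.size() + 1, 0);
        std::string part[2];
        for (std::size_t k = 0; k < s.size(); ++k) {
            const int bit = (static_cast<unsigned char>(s[k]) >> level) & 1;
            node->ones[k + 1] = node->ones[k] + bit;
            part[bit].push_back(s[k]);
        }
        node->child[0] = build(part[0], level - 1);
        node->child[1] = build(part[1], level - 1);
        return node;
    }

    std::size_t size_;
    std::unique_ptr<Node> root_;
};
```

On the BWT string `ipssm#pissii`, `rank(9, 's')` should return 3, which is the second term of LF(9) = 8 + 3 = 11.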

FM-Index

FM-Index FM-IndexBurrows-Wheeler

FM-Index

LOUDS

The Burrows-Wheeler Transform (BWT)

(1/2) Shellinford: an FM-Index library

Shellinford()

(2/2)

shellinford::fm_index fm;
fm.push_back(…);
fm.push_back(…);
fm.push_back(…);
fm.search(…, values);
i = values.begin();
while (i != values.end()) {
  cout << …(i->first)