Data Structures and Algorithms

NTU CSIE (台大資工系), 呂學一, http://www.csie.ntu.edu.tw/~hil/algo/

Upload: joben

Post on 24-Feb-2016



DESCRIPTION

Data Structures and Algorithms. NTU CSIE (台大資工系), 呂學一. http://www.csie.ntu.edu.tw/~hil/algo/. Today: a query that strengthens suffix trees. Range Minima Query (RMQ); RMQ for ±-sequences; RMQ for general sequences. Map: +/-RMQ, LCA, RMQ, LCE, document listing, wildcard matching, fuzzy matching.

TRANSCRIPT

http://www.csie.ntu.edu.tw/~hil/algo/

Today
A query that strengthens suffix trees
Range Minima Query (RMQ)
RMQ for ±-sequences.
RMQ for general sequences.

Map: +/-RMQ, LCA, RMQ, LCE, Document listing, Wildcard matching, Fuzzy matching

RMQ: Range Minima Query
S: a sequence of numbers.
rmq(S, i, j) = k if i ≤ k ≤ j and S[k] = min(S[i], S[i+1], …, S[j]).
index: 1 2 3 4 5 6 7 8 9
S =    3 4 0 1 4 1 9 3 2

rmq(S, 2, 6) = 3
rmq(S, 4, 10) = 4 (or 6).

The RMQ challenge
Input: a sequence S of numbers
Output: a data structure D for S
Time complexity
Constant query time: each query rmq(S, i, j) for S can be answered from D and S in O(1) time.
Linear preprocessing time: D can be computed in O(|S|) time.

Naïve approach
Storing the answer of rmq(S, i, j) in a table for all index pairs i and j with 1 ≤ i ≤ j ≤ |S|.
Query time = O(1).
Preprocessing time = Ω(|S|²).

Faster Preprocessing
Assumption (without loss of generality): |S| = 2^k for some positive integer k.
Idea: precomputing the values of rmq(S, i, j) only for those indices i and j with j − i + 1 = 1, 2, 4, 8, …, 2^k = |S|.
Preprocessing time: O(|S| log |S|).

rmq(S, i, j) still in O(1) time
Let k be the (unique) integer that satisfies 2^k ≤ j − i + 1 < 2^(k+1).
Then rmq(S, i, j) is x = rmq(S, i, i + 2^k − 1) or y = rmq(S, j − 2^k + 1, j), whichever has the smaller value in S.
(Figure: the windows [i, i + 2^k − 1] and [j − 2^k + 1, j] overlap and together cover [i, j].)

As a result
RMQ on O(n) numbers: preprocessing O(n log n) time, query O(1) time.
RMQ on O(n / log n) numbers: preprocessing O(n) time, query O(1) time.

The RMQ Challenge for ±-sequences

±-sequences
S is a ±-sequence if S[i] − S[i − 1] = ±1 for each index i with 2 ≤ i ≤ |S|.
For example,
S = 5 6 5 4 3 2 3 2 3 4 5 6 5 6 7
     + - - - - + - + + + + - + +
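The "Faster Preprocessing" idea and the O(1) query from the slides above can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: the names `build_sparse_table` and `rmq` are assumptions chosen for illustration, the table stores 0-based positions internally while `rmq` takes the 1-based indices used on the slides, and ties are broken toward the leftmost minimum. The sketch does not need |S| to be a power of two.

```python
def build_sparse_table(S):
    """table[p][i] is the 0-based position of a minimum of S[i : i + 2**p].

    Only windows whose length is a power of two are stored, so the table
    has O(n log n) entries and is filled in O(n log n) time.
    """
    n = len(S)
    table = [list(range(n))]  # windows of length 1
    p = 1
    while (1 << p) <= n:
        half = 1 << (p - 1)
        below = table[p - 1]
        row = []
        for i in range(n - (1 << p) + 1):
            a, b = below[i], below[i + half]  # minima of the two halves
            row.append(a if S[a] <= S[b] else b)
        table.append(row)
        p += 1
    return table


def rmq(S, table, i, j):
    """Return a 1-based index k with i <= k <= j and S[k] minimal."""
    i0, j0 = i - 1, j - 1
    k = (j0 - i0 + 1).bit_length() - 1  # 2**k <= length < 2**(k+1)
    x = table[k][i0]                    # window starting at i
    y = table[k][j0 - (1 << k) + 1]     # window ending at j
    return (x if S[x] <= S[y] else y) + 1


S = [3, 4, 0, 1, 4, 1, 9, 3, 2]
table = build_sparse_table(S)
print(rmq(S, table, 2, 6))  # 3, since S[3] = 0 is the minimum of S[2..6]
```

The two windows in `rmq` may overlap; that is harmless because taking a minimum twice over the shared part does not change the answer.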

S = 3 4 3 2 1 0 -1 -2 -1 0 1 2 1
     + - - - - - - + + + + -

The RMQ Challenge for ±-sequences
Input: a ±-sequence S of numbers
Output: a data structure D for S
Time complexity
Constant query time: each query rmq(S, i, j) for S can be answered from D and S in O(1) time.
Linear preprocessing time: D can be computed in O(|S|) time under the unit-cost RAM model.

Unit-Cost RAM model
Operations such as add, minus, comparison on consecutive O(log n) bits can be performed in O(1) time.

Idea: compression
Breaking S into blocks of length L = ½ log |S|.
There are B = 2|S| / log |S| blocks.
Let M[t] be the minimum of the t-th block of S:
M[t] = min {S[j] | (t − 1)L < j ≤ tL} for t = 1, 2, …, B.
Computable in O(|S|) time.
RMQ on M: rmq(M, x, y)
O(1) query time.
O(|S|) preprocessing time. (Why?)
In L = c log |S|, any constant c < 1 is OK.

rmq(S, i, j) via rmq(M, s, t)
Suppose S[i] is in the …-th block of S. (… − 1)L …

… k) then return no;
j += 1 + lce(i + j + 1, j + 2);
return yes.

O(k|S|) time
O(|P| + |S|) = O(|S|) time: preprocessing for supporting each lce(i, j) query in O(1) time.
O(|S|) iterations, each takes time O(k).

The RMQ (i.e., rmq(S, i, j)) challenge for general sequences
Another application of lowest common ancestor
Map: +/-RMQ, LCA, RMQ, LCE, Document listing, Wildcard matching, Fuzzy matching

The RMQ challenge
Input: a sequence S of numbers
Output: a data structure D for S
Time complexity
Constant query time: each query rmq(S, i, j) for S can be answered from D and S in O(1) time.
Linear preprocessing time: D can be computed in O(|S|) time.

Idea: Minima Tree
index: 1 2 3 4 5 6 7 8 9
S =    4 3 2 4 1 7 3 6 3
rmq(S, i, j) = lca(i, j).
(Figure: the minima tree of S, with nodes labeled by the indices 1 to 9 and index 5 at the root.)
Exercise: give an O(n)-time algorithm for computing the minima tree for an n-element array.

Listing source strings that contain a pattern string [Muthukrishnan, SODA'02]
An application of RMQ for general sequences
Map: +/-RMQ, LCA, RMQ, LCE, Document listing, Wildcard matching, Fuzzy matching

The problem
Input: strings S1, S2, …, Sm, which can be preprocessed in linear time, and a string P.
Output: the index j of each Sj that contains P.

Preliminary attempts
Obtaining the suffix tree for S1#S2#…#Sm$.
Find all occurrences of P, i.e., exact string matching for S1#S2#…#Sm$ and P.
Time = O(|P| + total number of occurrences of P).
Obtaining the suffix tree for each Si.
Determining whether P occurs in Si, i.e., the substring problem for each pair Si and P.
Time = O(|P|m).

The challenge
Input: strings S1, S2, …, Sm, which can be preprocessed in linear time, and a string P.
Output: the index j of each Sj that contains P.
Objective: O(|P| + out(P)) time, where out(P) is the number of output indices.

The second attempt
Constructing the suffix tree for S1#S2#…#Sm$.
Keeping the distinct descendant leaf colors for each internal node.
Query time?
Preprocessing time?

The second attempt
Each query takes O(|P| + out(P)) time. (Why?)
The preprocessing may need Ω(m |S1#S2#…#Sm$|) time. (Why?)
Q: Any suggestions for resolving this problem?

Compact Representation
Keeping the list of leaf colors from left to right.
Each internal node keeps the indices of its leftmost and rightmost descendant leaves.
(Figure: a tree with leaf positions 1 to 8 and leaf labels 4 6 7 3 8 1 5 2; internal nodes carry the leaf ranges [1,8], [5,8], [1,4], [2,4], [3,4], [6,7], [6,8].)

The challenge of listing distinct colors
Input: a sequence C of colors.
Output: a data structure D for C such that
D is computable in O(|C|) time.
Each query colors(i, j) = {C[i], …, C[j]} can be answered from D in O(|colors(i, j)|) time.

An auxiliary index array
index: 1 2 3 4 5 6 7 8
prev = 0 0 0 2 3 1 5 6
Let prev[i] = 0 if C[j] ≠ C[i] for all j < i.
Otherwise, let prev[i] be the largest index j with j < i such that C[i] = C[j].

An observation
index: 1 2 3 4 5 6 7 8
prev = 0 0 0 2 3 1 5 6
A color c is in colors(i, j) if and only if there is an index k in [i, j] such that C[k] = c and prev[k] < i.

The algorithm colors(i, j)
Just recursively call list(i, j, i);
Subroutine list(p, q, ℓ):
If (p > q) then return;
Let k = rmq(prev, p, q);
If (prev[k] ≥ ℓ) then return;
Output C[k];
Call list(p, k − 1, ℓ);
Call list(k + 1, q, ℓ);

colors(3, 7) = list(3, 7, 3)
index: 1 2 3 4 5 6 7 8
prev = 0 0 0 2 3 1 5 6

Time = O(|colors(i, j)|)
Why?
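The distinct-color listing procedure from the last slides can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the lecture's code: the names `build_prev`, `build_rmq`, and `list_colors` are hypothetical, the color sequence `a b c b c a c a` is an assumed example chosen only because it yields the auxiliary array 0 0 0 2 3 1 5 6 shown on the slides, and the RMQ is a simple sparse table with O(n log n) preprocessing standing in for the linear-time structure the slides ask for.

```python
def build_prev(C):
    """prev[i] = largest j < i with C[j] == C[i], or 0 if none (1-based).

    prev[0] is unused padding so that indices match the slides.
    """
    last = {}
    prev = [0] * (len(C) + 1)
    for i, color in enumerate(C, start=1):
        prev[i] = last.get(color, 0)
        last[color] = i
    return prev


def build_rmq(A):
    """Sparse table over A[1..n]; returns rmq(lo, hi) -> index of a minimum."""
    n = len(A) - 1
    table = [list(range(n + 1))]  # windows of length 1 (index 0 is padding)
    p = 1
    while (1 << p) <= n:
        half = 1 << (p - 1)
        below = table[-1]
        row = [0] * (n + 1)
        for i in range(1, n - (1 << p) + 2):
            a, b = below[i], below[i + half]
            row[i] = a if A[a] <= A[b] else b
        table.append(row)
        p += 1

    def rmq(lo, hi):
        k = (hi - lo + 1).bit_length() - 1
        a, b = table[k][lo], table[k][hi - (1 << k) + 1]
        return a if A[a] <= A[b] else b

    return rmq


def list_colors(C, prev, rmq, i, j):
    """Return each distinct color of C[i..j] (1-based, inclusive) exactly once."""
    out = []

    def visit(p, q):
        if p > q:
            return
        k = rmq(p, q)        # position of the smallest prev value in [p, q]
        if prev[k] >= i:     # every color in [p, q] already occurs earlier in [i, j]
            return
        out.append(C[k - 1])  # first occurrence of this color within [i, j]
        visit(p, k - 1)
        visit(k + 1, q)

    visit(i, j)
    return out


C = list("abcbcaca")          # assumed example colors
prev = build_prev(C)          # prev[1:] is [0, 0, 0, 2, 3, 1, 5, 6]
rmq = build_rmq(prev)
print(list_colors(C, prev, rmq, 3, 7))  # ['c', 'a', 'b']
```

Every call to `visit` that does not return immediately outputs one color, and each such call makes at most two further calls, which is the reason behind the O(|colors(i, j)|) bound once RMQ queries cost O(1).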