fundamentals of algorithm analysis algorithm : design & analysis [tutorial - 1]

Download Fundamentals of Algorithm Analysis Algorithm : Design & Analysis [Tutorial - 1]

If you can't read please download the document

Upload: eunice-anderson

Post on 18-Jan-2018

261 views

Category:

Documents


0 download

DESCRIPTION

In Previous Classes…  Introduction to Algorithm Analysis  Asymptotic Behavior of Functions  Recursion and Master Theorem  Sorting

TRANSCRIPT

Fundamentals of Algorithm Analysis Algorithm : Design & Analysis [Tutorial - 1] Qian Zhuzhong Research: Distributed Computing Pervasive Computing Service Oriented Computing Office: 503A, MMW In Previous Classes Introduction to Algorithm Analysis Asymptotic Behavior of Functions Recursion and Master Theorem Sorting In Tutorial One About the Tutorial Algorithm Analysis Revisiting Asymptotic Behavior Revisiting Recursion About the Tutorial The course Algorithm design and analysis Coverage The tutorial: Reemphasize important issues by Further explanation Typical examples Interaction Find efficient solutions Algorithm Analysis Before learning algorithm analysis Learn to solve problems by computational thinking* Data structures After leaning algorithm analysis How efficient is my first solution? How to improve? Better solutions The optimal solution *Solve problems Nave solution Better solutions optimal solution efficiency Asymptotic Behavior Discussion of asymptotic notations Some properties An example: Maximum Subsequence Sum Improvement of Algorithm Comparison of Asymptotic Behavior The definition of O, and O Giving g:NR +, then (g) is the set of f:NR +, such that for some c R + and some n 0 N, f(n) cg(n) for all n n 0. A function f (g) if lim n [f(n)/g(n)]=c< Giving g:NR +, then (g) is the set of f:NR +, such that for some c R + and some n 0 N, f(n) cg(n) for all n n 0. A function f (g) if lim n [f(n)/g(n)]=c>0 Giving g:NR +, then (g) = (g) (g) A function f (g) if lim n [f(n)/g(n)]=c, 0 b Properties of O(o), () and Transitive property(O, , , , o): If f O(g) and g O(h), then f O(h), Reflexive property: f(n) (f(n)), f(n) O(f(n)), f(n) (f(n)) Symmetric properties f (g) if and only if g (f) f O(g) if and only if g (f) f o(g) if and only if g (f) Order of sum function O(f+g)=O(max(f, g)) Maximum Subsequence Sum The problem: Given a sequence S of integer, find the largest sum of a consecutive subsequence of S. (0, if all negative items) An example: -2, 11, -4, 13, -5, -2; the result 20: (11, -4, 13) A brute-force algorithm: MaxSum = 0; for (i = 0; i < N; i++) for (j = i; j < N; j++) { ThisSum = 0; for (k = i; k MaxSum) MaxSum = ThisSum; } return MaxSum; i=0 i=1 i=2 i=n-1 k j=0 j=1 j=2 j=n-1 in O(n 3 ) the sequence More Precise Complexity Decreasing the number of loops An improved algorithm MaxSum = 0; for (i = 0; i < N; i++) { ThisSum = 0; for (j = i; j < N; j++) { ThisSum += A[j]; if (ThisSum > MaxSum) MaxSum = ThisSum; } return MaxSum; the sequence i=0 i=1 i=2 i=n-1 j in O(n 2 ) Power of Divide-and-Conquer Part 1 Part 2 the sub with largest sum may be in: Part 1 Part 2 or: Part 1 Part 2 recursion The largest is the result in O(nlogn) Divide-and-Conquer: the Procedure Center = (Left + Right) / 2; MaxLeftSum = MaxSubSum(A, Left, Center); MaxRightSum = MaxSubSum(A, Center + 1, Right); MaxLeftBorderSum = 0; LeftBorderSum = 0; for (i = Center; i >= Left; i--) { LeftBorderSum += A[i]; if (LeftBorderSum > MaxLeftBorderSum) MaxLeftBorderSum = LeftBorderSum; } MaxRightBorderSum = 0; RightBorderSum = 0; for (i = Center + 1; i MaxRightBorderSum) MaxRightBorderSum = RightBorderSum; } return Max3(MaxLeftSum, MaxRightSum, MaxLeftBorderSum + MaxRightBorderSum); Note: this is the core part of the procedure, with base case and wrap omitted. A Linear Algorithm ThisSum = MaxSum = 0; for (j = 0; j < N; j++) { ThisSum += A[j]; if (ThisSum > MaxSum) MaxSum = ThisSum; else if (ThisSum < 0) ThisSum = 0; } return MaxSum; the sequence j Negative item or subsequence cannot be a prefix of the subsequence we want. This is an example of online algorithm in O(n) ThisSum MaxSum Recursion Problem solving Divide and conquer Recurrence equation Solve the recurrence Characteristic Equation Master Theorem How do we obtain the results? Rationale behind the detailed mathematical proof External Path Length The external path length of a 2-tree t is defined as follows: The external path length of a leaf, which is a 2-tree consisting of a single external node, is 0 If t is a nonleaf 2-tree, with left subtree L and right subtree R, then the external path length of t is the sum of: the external path length of L; the number of external node of L; the external path length of R; the number of external node of R; In fact, the external path length of t is the sum of the lengths of all the paths from the root of t to any external node in t. 2-Tree 2-Tree Common Binary Tree internal nodes external nodes no child any type Both left and right children of these nodes are empty tree Calculating the External Path Length EplReturn calcEpl(TwoTree t) EplReturn ansL, ansR; EplReturn ans=new EplReturn(); 1. if (t is a leaf) 2. ans.epl=0; ans.extNum=1; 3. else 4. ansL=calcEpl(leftSubtree(t)); 5. ansR=calcEpl(rightSubtree(t)); 6. ans.epl=ansL.epl+ansR.epl+ansL.extNum +ansR.extNum; 7. ans.extNum=ansL.extNum+ansR.extNum 8. Return ans; TwoTree is an ADT defined for 2-tree EplReturn is a organizer class with two field epl and extNum Correctness of Procedure calcEpl Let t be any 2-tree. Let epl and m be the values of the fields epl and extNum, respectively, as returned by calcEpl(t). Then: 1. epl is the external path length of t. 2. m is the number of external nodes in t. 3. epl mlg(m) (note: for 2-tree with internal n nodes, m=n+1) Proof on Procedure calcEpl Induction on t, with the subtree partial order: Base case: t is a leaf. (line 2) Inductive hypothesis: the 3 statements hold for any proper subtree of t, say s. Inductive case: by ind. hyp., epl L, epl R, m L, m R,are expected results for L and R(both are proper subtrees of t), so: Statement 1 is guranteed by line 6 Statement 2 is guranteed by line 7 (any external node is in either L or R) Statement 3: by ind.hyp. epl=epl L +epl R +m m L lg(m L )+m R lg(m R )+m, note f(x)+f(y) 2f((x+y)/2) if f is convex, and xlgx is convex for x>0, so, epl 2((m L +m R )/2)lg((m L +m R )/2)+m = m(lg(m)-1)+m =mlgm. Characteristic Equation If the characteristic equation of the recurrence relation has two distinct roots s 1 and s 2, then where u and v depend on the initial conditions, is the explicit formula for the sequence. Number of Valid Strings String to be transmitted on the channel Length n Consisting of symbols a, b, c If aa exists, cannot be transmitted E.g. strings of length 2: ab, ac, ba, bb, bc, ca, cc, cb Number of valid strings? Divide and conquer f(n)=2f(n-1)+2f(n-2), n>2 f(1)=3, f(2)=8 b a c bac n-1 n-2 Analysis of the D&C solution Characteristic equation Solution Recursion Tree for T(n)=bT(n/c)+f(n) f(n)f(n) T(1) f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2)f(n/c2) f(n/c)f(n/c)f(n/c)f(n/c)f(n/c)f(n/c) log c n f(n)f(n) Note: b b Total ? Divide-and-Conquer: the Solution The recursion tree has depth D=lg(n)/ lg(c), so there are about that many row-sums. The solution of divide-and-conquer equation is the nonrecursive costs of all nodes in the tree, which is the sum of the row-sums. The 0 th row-sum is f(n), the nonrecursive cost of the root. The D th row-sum is n E, assuming base cases cost 1, or (n E ) in any event. Solution by Row-sums [Little Master Theorem] Row-sums decide the solution of the equation for divide-and-conquer: Increasing geometric series: T(n) (n E ) Constant: T(n) (f(n) log n) Decreasing geometric series: T(n) (f(n)) This can be generalized to get a result not using explicitly row-sums. Master Theorem Loosening the restrictions on f(n) Case 1: f(n) O(n E- ), ( >0), then: T(n) (n E ) Case 2: f(n) (n E ), as all node depth contribute about equally: T(n) (f(n)log(n)) case 3: f(n) (n E+ ), ( >0), and f(n) O(n E+ ), ( ), then: T(n) (f(n)) The positive is critical, resulting gaps between cases as well Looking at the Gap T(n)=2T(n/2)+nlgn a=2, b=2, E=1, f(n)=nlgn We have f(n)= (n E ), but no >0 satisfies f(n)= (n E+ ), since lgn grows slower that n for any small positive . So, case 3 doesnt apply. However, neither case 2 applies. Why is important? Example: Matrix Multiplication Standard Algorithm by definition Run time = (n 3 ) Divide-and-conquer Algorithm Idea: n*n matrix = 2*2 of (n/2) * (n/2) sub-matrices: Analysis of the D&C Algorithm Sub-matrix size # sub-matrices Adding sub-matrices Do you have any questions?