CS 3343: Analysis of Algorithms Lecture 14: Order Statistics

Upload: elsa

Post on 09-Jan-2016


TRANSCRIPT

Page 1: CS 3343: Analysis of Algorithms

CS 3343: Analysis of Algorithms

Lecture 14: Order Statistics

Page 2: CS 3343: Analysis of Algorithms

Order statistics

• The ith order statistic in a set of n elements is the ith smallest element
• The minimum is thus the 1st order statistic
• The maximum is the nth order statistic
• The median is the n/2 order statistic
• If n is even, there are 2 medians
• How can we calculate order statistics?
• What is the running time?

Page 3: CS 3343: Analysis of Algorithms

Order statistics – selection problem

• Select the ith smallest of n elements
• Naive algorithm: Sort.
  – Worst-case running time Θ(n log n) using merge sort or heapsort (not quicksort).
• We will show:
  – A practical randomized algorithm with Θ(n) expected running time
  – A cool algorithm of theoretical interest only with Θ(n) worst-case running time
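The naive approach can be sketched in a few lines of Python. This is only an illustration: it uses Python's built-in `sorted` (O(n log n) worst case) instead of the merge sort or heapsort named on the slide, and the function name `select_by_sorting` and the sample data are made up for the example.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) by sorting a copy."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]  # the sort dominates the running time


data = [7, 10, 5, 8, 11, 3, 2, 13]
n = len(data)
minimum = select_by_sorting(data, 1)                  # 1st order statistic
maximum = select_by_sorting(data, n)                  # n-th order statistic
lower_median = select_by_sorting(data, (n + 1) // 2)  # the two medians differ
upper_median = select_by_sorting(data, (n + 2) // 2)  # only when n is even
```

Sorting does more work than needed: it fixes the position of every element even though only one rank is requested.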

Page 4: CS 3343: Analysis of Algorithms

Recall: Quicksort

• The function Partition gives us the rank k of the pivot
• If we are lucky, k = i. Done!
• If not, we at least get a smaller subarray to work with
  – k > i: the ith smallest is in the left subarray
  – k < i: the ith smallest is in the right subarray
• Divide and conquer
  – If we are lucky, k is close to n/2, or the desired element is in the smaller subarray
  – If unlucky, the desired element is in the larger subarray (possible size n-1)

[Figure: array A[p..q] after Partition, with the elements ≤ x to the left of the pivot x at index r and the elements ≥ x to its right; k is the rank of the pivot.]

Page 5: CS 3343: Analysis of Algorithms

Randomized divide-and-conquer algorithm

RAND-SELECT(A, p, q, i)    ⊳ i th smallest of A[p . . q]
    if p = q & i > 1 then error!
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1    ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)

[Figure: array A[p..q] after partitioning, with the elements ≤ A[r] to the left of index r and the elements ≥ A[r] to its right; k is the rank of A[r].]

Page 6: CS 3343: Analysis of Algorithms

Randomized Partition

• Randomly choose an element as pivot
  – Every time we need to do a partition, throw a die to decide which element to use as the pivot
  – Each of the n elements has probability 1/n of being selected

Rand-Partition(A, p, q){
    d = random();                     // draw a random number in [0, 1); d = 1 would give index = q+1
    index = p + floor((q-p+1) * d);   // p <= index <= q
    swap(A[p], A[index]);
    return Partition(A, p, q);        // now use A[p] as pivot; returns the pivot's index r
}
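The two routines above can be combined into a runnable Python sketch. The slides do not spell out the body of Partition, so the `partition` helper below is an assumed single-scan variant that uses `A[p]` as pivot. `random.randrange(p, q + 1)` stands in for the `p + floor((q-p+1) * d)` computation and picks each index in `p..q` with equal probability.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] around the pivot a[p]; return the pivot's final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def rand_partition(a, p, q):
    """Swap a uniformly random element of a[p..q] into a[p], then partition."""
    index = random.randrange(p, q + 1)  # uniform over p..q inclusive
    a[p], a[index] = a[index], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the i-th smallest element (1-indexed) of a[p..q]; reorders a."""
    if not 1 <= i <= q - p + 1:
        raise ValueError("i is out of range for a[p..q]")
    r = rand_partition(a, p, q)
    k = r - p + 1  # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)
```

The returned value does not depend on the random choices; only the running time does.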

Page 7: CS 3343: Analysis of Algorithms

Example

Select the i = 6th smallest:

7 10 5 8 11 3 2 13    (pivot = 7)

Partition:

3 2 5 7 11 8 10 13    (k = 4)

Select the 6 – 4 = 2nd smallest recursively.

Page 8: CS 3343: Analysis of Algorithms

Complete example: select the 6th smallest element.

i = 6:            7 10 5 8 11 3 2 13  →  partition  →  3 2 5 7 11 8 10 13    k = 4
i = 6 – 4 = 2:    11 8 10 13  →  partition  →  10 8 11 13    k = 3
i = 2 < k:        10 8  →  partition  →  8 10    k = 2
i = 2 = k:        return 10

Note: here we always used the first element as pivot to do the partition (instead of rand-partition).
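The trace above can be replayed with a short Python sketch. The slides do not state which partition procedure produced these arrays. The two-pointer scheme in `partition_first` is an assumption, chosen because it gives the same intermediate arrays for this input. Both function names are illustrative.

```python
def partition_first(a, p, q):
    """Two-pointer partition of a[p..q] around a[p]; returns the pivot's index."""
    x = a[p]
    i, j = p + 1, q
    while True:
        while i <= j and a[i] <= x:   # skip elements already on the correct left side
            i += 1
        while i <= j and a[j] > x:    # skip elements already on the correct right side
            j -= 1
        if i > j:
            break
        a[i], a[j] = a[j], a[i]
    a[p], a[j] = a[j], a[p]           # move the pivot between the two sides
    return j


def select_with_trace(a, i):
    """Select the i-th smallest of a, recording (subarray, k, i) after each partition."""
    p, q = 0, len(a) - 1
    trace = []
    while True:
        r = partition_first(a, p, q)
        k = r - p + 1
        trace.append((a[p:q + 1], k, i))
        if i == k:
            return a[r], trace
        if i < k:
            q = r - 1
        else:
            p, i = r + 1, i - k


value, trace = select_with_trace([7, 10, 5, 8, 11, 3, 2, 13], 6)
for subarray, k, i in trace:
    print(subarray, "k =", k, "i =", i)
```

The loop replaces the tail recursion of RAND-SELECT with updates to `p`, `q`, and `i`.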

Page 9: CS 3343: Analysis of Algorithms

Intuition for analysis

Lucky:
    T(n) = T(9n/10) + Θ(n)
    $n^{\log_{10/9} 1} = n^0 = 1$, so CASE 3 of the master theorem applies
    T(n) = Θ(n)

Unlucky:
    T(n) = T(n – 1) + Θ(n) = Θ(n²)    (arithmetic series)
    Worse than sorting!

(All our analyses today assume that all elements are distinct.)
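The unlucky case is easy to provoke when the pivot is chosen deterministically. The sketch below is an illustration only: it always uses the first element as pivot and counts element comparisons while selecting the maximum of an already sorted input. The function name and the single-scan partition are assumptions made for the example.

```python
def count_comparisons_first_pivot(values, i):
    """Select the i-th smallest using a[p] as pivot; return (value, comparisons)."""
    a = list(values)
    p, q = 0, len(a) - 1
    comparisons = 0
    while True:
        x, last_small = a[p], p
        for j in range(p + 1, q + 1):
            comparisons += 1              # one comparison against the pivot
            if a[j] <= x:
                last_small += 1
                a[last_small], a[j] = a[j], a[last_small]
        a[p], a[last_small] = a[last_small], a[p]
        k = last_small - p + 1            # rank of the pivot in a[p..q]
        if i == k:
            return a[last_small], comparisons
        if i < k:
            q = last_small - 1
        else:
            p, i = last_small + 1, i - k


n = 200
value, comparisons = count_comparisons_first_pivot(range(1, n + 1), n)
```

On sorted input every partition removes only the pivot, so the comparison counts form the arithmetic series (n–1) + (n–2) + ... + 1 = n(n–1)/2.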

Page 10: CS 3343: Analysis of Algorithms

Running time of randomized selection

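• For an upper bound, assume the ith element always falls in the larger side of the partition

$$T(n) \le \begin{cases} T(\max(0, n-1)) + n & \text{if } 0 : n-1 \text{ split,} \\ T(\max(1, n-2)) + n & \text{if } 1 : n-2 \text{ split,} \\ \qquad\vdots \\ T(\max(n-1, 0)) + n & \text{if } n-1 : 0 \text{ split.} \end{cases}$$

• The expected running time is an average over all cases

Expectation:

$$T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k, n-k-1)) + n$$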

Page 11: CS 3343: Analysis of Algorithms

Substitution method

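Want to show T(n) = O(n). So need to prove T(n) ≤ cn for n > n0.

Assume: T(k) ≤ ck for all k < n

$$\begin{aligned}
T(n) &\le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k, n-k-1)) + n \\
     &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + n \\
     &\le \frac{2c}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} k + n \\
     &\le \frac{2c}{n} \cdot \frac{3n^2}{8} + n \\
     &= cn - \left(\frac{cn}{4} - n\right) \\
     &\le cn \qquad \text{if } c \ge 4
\end{aligned}$$

Therefore, T(n) = O(n).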
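As a numerical sanity check on the bound with c = 4, the recurrence can be tabulated directly. This sketch treats the inequality as an equality and assumes the base case T(0) = 0, which the slides do not specify. The function name is illustrative.

```python
def expected_time_table(n_max):
    """Tabulate T(n) = n + (1/n) * sum_{k=0}^{n-1} T(max(k, n-k-1)), with T(0) = 0."""
    T = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = sum(T[max(k, n - k - 1)] for k in range(n))
        T[n] = n + total / n
    return T


T = expected_time_table(500)
ratios = [T[n] / n for n in range(1, 501)]  # each ratio should stay at or below c = 4
```

A table like this does not replace the proof; it only confirms that the constant c = 4 is consistent with the recurrence over the range checked.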

Page 12: CS 3343: Analysis of Algorithms

Summary of randomized selection

• Works fast: linear expected time.
• Excellent algorithm in practice.
• But, the worst case is very bad: Θ(n²).

Q. Is there an algorithm that runs in linear time in the worst case?

A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].

IDEA: Generate a good pivot recursively.

Page 13: CS 3343: Analysis of Algorithms

Worst-case linear-time selection

SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
       then recursively SELECT the i th smallest element in the lower part
       else recursively SELECT the (i–k)th smallest element in the upper part

(Step 4 is the same as in RAND-SELECT.)
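The four steps can be sketched in Python as follows. This is a minimal illustration, not the slides' own code. It assumes distinct elements (as the lecture does), sorts each 5-element group to find its median, skips leftover elements when forming groups, and partitions with list comprehensions rather than in place.

```python
def select(a, i):
    """Return the i-th smallest (1-indexed) of the distinct values in list a."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(a) <= 5:
        return sorted(a)[i - 1]  # small base case handled directly
    # Step 1: median of each full group of 5 (leftover elements form no group).
    groups = [a[j:j + 5] for j in range(0, len(a) - len(a) % 5, 5)]
    medians = [sorted(g)[2] for g in groups]
    # Step 2: recursively select the median of the group medians as the pivot x.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x and compute k = rank(x).
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1
    # Step 4: return x or recurse into the side that contains the answer.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)
```

Building `lower` and `upper` as new lists costs extra memory, but the work per call is still linear, so the recurrence analysed later is unchanged.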

Page 14: CS 3343: Analysis of Algorithms

Choosing the pivot

Page 15: CS 3343: Analysis of Algorithms

Choosing the pivot

1. Divide the n elements into groups of 5.

Page 16: CS 3343: Analysis of Algorithms

Choosing the pivot

[Figure: the 5-element groups, each ordered from lesser to greater.]

1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

Page 17: CS 3343: Analysis of Algorithms

Choosing the pivot

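[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked among the group medians.]

1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.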

Page 18: CS 3343: Analysis of Algorithms

Analysis

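[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked among the group medians.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.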

Page 19: CS 3343: Analysis of Algorithms

Analysis

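[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked among the group medians.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x.

(Assume all elements are distinct.)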

Page 20: CS 3343: Analysis of Algorithms

Analysis

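[Figure: the 5-element groups, each ordered from lesser to greater, with the pivot x marked among the group medians.]

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.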
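The counting argument can be checked empirically. In this sketch the median of the group medians is found by sorting, purely for brevity (the algorithm itself uses a recursive SELECT), and only full groups of 5 are formed. The function names and the test data are illustrative.

```python
def median_of_medians_pivot(a):
    """Return the median of the medians of the full 5-element groups of a."""
    groups = [a[j:j + 5] for j in range(0, len(a) - len(a) % 5, 5)]
    medians = sorted(sorted(g)[2] for g in groups)
    return medians[(len(medians) - 1) // 2]  # lower median of the group medians


def pivot_counts(a):
    """Return (#elements <= x, #elements >= x, 3 * floor(n/10)) for the pivot x."""
    x = median_of_medians_pivot(a)
    at_most = sum(1 for v in a if v <= x)
    at_least = sum(1 for v in a if v >= x)
    return at_most, at_least, 3 * (len(a) // 10)


data = [(37 * v) % 101 for v in range(101)]  # 101 distinct values in scrambled order
at_most, at_least, bound = pivot_counts(data)
```

Both counts should reach the bound 3⌊n/10⌋ for any input of at least 5 distinct elements, since each qualifying group median brings two more elements of its own group with it.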

Page 21: CS 3343: Analysis of Algorithms

Analysis

• At least 3⌊n/10⌋ elements are ≤ x ⇒ at most n − 3⌊n/10⌋ elements are > x
• At least 3⌊n/10⌋ elements are ≥ x ⇒ at most n − 3⌊n/10⌋ elements are < x
• Need “at most” for the worst-case runtime
• The recursive call to SELECT in Step 4 is executed on at most n − 3⌊n/10⌋ elements.

[Figure: the elements in sorted order, with a block of 3⌊n/10⌋ elements at each end and the possible positions for the pivot in between.]

Page 22: CS 3343: Analysis of Algorithms

Analysis

• Use the fact that ⌊a/b⌋ > a/b − 1
• n − 3⌊n/10⌋ < n − 3(n/10 − 1) = 7n/10 + 3 ≤ 3n/4 if n ≥ 60
• The recursive call to SELECT in Step 4 is executed on at most 7n/10 + 3 elements.
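The chain of inequalities can be spot-checked numerically. The helper below is a hypothetical name introduced only for this check. It returns the exact bound n − 3⌊n/10⌋ together with the two simpler upper bounds used in the analysis.

```python
def recursive_call_size_bounds(n):
    """Return (n - 3*floor(n/10), 7n/10 + 3, 3n/4) for a given n."""
    return n - 3 * (n // 10), 7 * n / 10 + 3, 3 * n / 4


# Each row holds the exact bound followed by the two looser bounds.
rows = [recursive_call_size_bounds(n) for n in (60, 100, 1000)]
```

The threshold n ≥ 60 comes from solving 7n/10 + 3 ≤ 3n/4, which rearranges to 3 ≤ n/20.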

Page 23: CS 3343: Analysis of Algorithms

Developing the recurrence

SELECT(i, n)    ⊳ total: T(n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    ⊳ Θ(n)
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.    ⊳ T(n/5)
3. Partition around the pivot x. Let k = rank(x).    ⊳ Θ(n)
4. if i = k then return x    ⊳ T(7n/10+3)
   elseif i < k
       then recursively SELECT the i th smallest element in the lower part
       else recursively SELECT the (i–k)th smallest element in the upper part

Page 24: CS 3343: Analysis of Algorithms

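Solving the recurrence

$$T(n) \le T\left(\tfrac{1}{5}n\right) + T\left(\tfrac{7}{10}n + 3\right) + n$$

Assumption: T(k) ≤ ck for all k < n

$$\begin{aligned}
T(n) &\le c(n/5) + c(7n/10 + 3) + n \\
     &\le cn/5 + 3cn/4 + n && \text{if } n \ge 60 \\
     &= 19cn/20 + n \\
     &= cn - (cn/20 - n) \\
     &\le cn && \text{if } c \ge 20 \text{ and } n \ge 60
\end{aligned}$$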
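The bound can also be checked by evaluating the recurrence directly. This sketch makes two assumptions that are not on the slides: arguments are rounded down to integers, and small inputs (n < 60) are charged a cost of n as the base case. The function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def worst_case_time(n):
    """Evaluate T(n) = T(floor(n/5)) + T(floor(7n/10) + 3) + n, with T(n) = n for n < 60."""
    if n < 60:
        return n  # assumed base case: small inputs cost at most n
    return worst_case_time(n // 5) + worst_case_time(7 * n // 10 + 3) + n


# T(n)/n for a few sizes; the substitution proof predicts values of at most c = 20.
ratios = {n: worst_case_time(n) / n for n in (60, 1000, 100000)}
```

The recursion terminates because ⌊7n/10⌋ + 3 < n whenever n ≥ 60.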

Page 25: CS 3343: Analysis of Algorithms

Conclusions

• Since the work at each level of recursion is basically a constant fraction (19/20) smaller, the work per level is a geometric series dominated by the linear work at the root.

• In practice, this algorithm runs slowly, because the constant in front of n is large.

• The randomized algorithm is far more practical.

Exercise: Try to divide into groups of 3 or 7.

Exercise: Think about an application in sorting.