lecture 8: sorting in linear time - github pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท...

17
Lecture 8: Sorting in Linear Time

Upload: others

Post on 09-Jul-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Lecture 8: Sorting in Linear Time

Page 2: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Running time of sorting algorithms

Do you still remember what these statements mean?

Sorting algorithm A runs in ๐‘‚(๐‘› log ๐‘›) time.

Sorting algorithm A runs in ฮฉ(๐‘› log ๐‘›) time.

So far, all algorithms have running time ฮฉ(๐‘› log ๐‘›)

We didnโ€™t show this for heap sort, though it is true.

Q: Is it possible to design an algorithm that runs faster than ฮฉ(๐‘› log ๐‘›)?

A: No, if the algorithm is comparison-based.

Remark: A comparison-based sorting algorithm is more general, e.g., the

sorting algorithm implemented in C++ STL.

Corollary: Heap sort has running time ฮ˜(๐‘› log ๐‘›).

2

Page 3: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Lower bounds for problems

What does each of the following statements mean?

Sorting can be done in ๐‘‚(๐‘› log ๐‘›) time.

โ€“ There is a sorting algorithm that runs in ๐‘‚(๐‘› log ๐‘›) time.

Sorting requires ฮฉ(๐‘› log ๐‘›) time.

โ€“ Any sorting algorithm must run in ฮฉ ๐‘› log๐‘› time.

This is a very strong statement, which we unfortunately cannot prove.

In fact, we have no lower bound higher than ฮฉ(๐‘›) for sorting.

But we can show an ฮฉ(๐‘› log ๐‘›) lower bound for comparison-based sorting

algorithms

More generally, in the decision-tree model of computation.

3

Page 4: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

The decision-tree model

4

An algorithm in the decision-tree model

Solves the problem by asking questions with binary answers

Cannot ask questions like โ€œhow many legs does it have?โ€

The worst-case running time is the height of the decision tree.

Page 5: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

The decision-tree for binary search

5

Theorem: Any algorithm for finding a given element in a sorted array of

size ๐‘› must have running time ฮฉ(log ๐‘›) in the decision-tree model.

Proof:

The algorithm must have at least ๐‘› different outputs.

The decision-tree has at least ๐‘› leaves.

Any binary tree with ๐‘› leaves must have height ฮฉ(log ๐‘›).

Page 6: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

The decision-tree for sorting

6

Theorem: Any algorithm for sorting ๐‘› elements must have running time

ฮฉ(๐‘› log๐‘›) in the decision-tree model.

Pf:

The algorithm must have at least ๐‘›! different outputs (there are so

many different permutations).

The decision-tree has at least ๐‘›! leaves.

Any binary tree with ๐‘› leaves must have height ฮฉ log ๐‘›! = ฮฉ(๐‘› log๐‘›).

Page 7: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Can we do better?

Implication of the lower bound

Have to use non-comparison based algorithms.

Donโ€™t use worse-case analysis

Integer sorting

We will assume that the elements are integers from 0 to ๐‘˜.

We will use both ๐‘› and ๐‘˜ to express the running time.

Both ๐‘› < ๐‘˜ are ๐‘› > ๐‘˜ possible.

Exercise. Compare the following functions asymptotically:

๐‘›, log ๐‘˜, ๐‘› + 1000, ๐‘› + ๐‘˜, ๐‘› log ๐‘˜ ,max ๐‘›, ๐‘˜ ,min(๐‘›, ๐‘˜)

7

Page 8: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Counting sort

๐ด: input array

๐ต: output array

๐ถ: counting array, then position array

8

Page 9: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Counting sort

9

Counting-Sort(๐ด, ๐ต):

let ๐ถ[0 . . ๐‘˜] be a new array

for ๐‘– โ† 0 to ๐‘˜

๐ถ ๐‘– โ† 0

for ๐‘— โ† 1 to ๐‘›

๐ถ ๐ด ๐‘— โ† ๐ถ[๐ด[๐‘—]] + 1

// ๐ถ[๐‘–] now contains the number of ๐‘–โ€™s

for ๐‘– โ† 1 to ๐‘˜

๐ถ ๐‘– โ† ๐ถ[๐‘–] + ๐ถ[๐‘– โˆ’ 1]

// ๐ถ[๐‘–] now contains the number of elements โ‰ค ๐‘–

for ๐‘— โ† ๐‘› downto 1

๐ต ๐ถ ๐ด ๐‘— โ† ๐ด[๐‘—]

๐ถ ๐ด ๐‘— โ† ๐ถ[๐ด[๐‘—]] โˆ’ 1

Running time: ฮ˜ ๐‘› + ๐‘˜

Working space: ฮ˜ ๐‘› + ๐‘˜

It is a stable sorting algorithm.

Page 10: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Radix sort

10

Radix-Sort(๐ด, ๐‘‘):

for ๐‘– โ† 1 to ๐‘‘

use counting sort to sort array ๐‘จ on digit ๐‘–

Page 11: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Radix sort: Correctness

11

Proof: (induction on digit position)

Assume that the numbers are sorted by their low-order ๐‘– โˆ’ 1

digits

Sort on digit ๐‘–

โ€“ Two numbers that differ on digit ๐‘–

are correctly sorted by their

low-order ๐‘– digits

โ€“ Two numbers equal on digit ๐‘– are put

in the same order as the input โ‡’

correctly sorted by their low-order ๐‘–

digits

Page 12: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Radix sort: Running time analysis

Q: How large should the โ€œdigitsโ€ be?

Analysis: Let each โ€œdigitโ€ take values from 0 to ๐‘ โˆ’ 1.

Counting sort takes ฮ˜(๐‘› + ๐‘) time.

An integer โ‰ค ๐‘˜ has ๐‘‘ = log ๐‘˜/ log ๐‘ such โ€œdigitsโ€.

So the running time of radix sort is ฮ˜log ๐‘˜

log ๐‘โ‹… ๐‘› + ๐‘

This is minimized when ๐‘ = ฮ˜(๐‘›).

The running time is ฮ˜ ๐‘› log๐‘› ๐‘˜ , which is ๐‘‚(๐‘›) when log ๐‘˜ = ๐‘‚(log ๐‘›)

Q: When is this faster than ฮ˜(๐‘› log๐‘›)?

A: When log๐‘› ๐‘˜ < log ๐‘› โ‡” log๐‘˜ < log2 ๐‘›

Implementation:

Choose ๐‘› โ‰ค ๐‘ < 2๐‘› such that ๐‘ is a power of 2.

Use bit-wise operation for more efficient implementation.

12

Page 13: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Summary of sorting algorithms

13

Insertion sort

Merge sort Quicksort Heapsort Radix sort

Running time ฮ˜(๐‘›2) ฮ˜(๐‘› log ๐‘›) ฮ˜(๐‘› log ๐‘›) ฮ˜(๐‘› log ๐‘›) ฮ˜(๐‘› log๐‘› ๐‘˜)

Workingspace

ฮ˜(1) ฮ˜(๐‘›) ฮ˜(log ๐‘›) ฮ˜(1) ฮ˜(๐‘›)

Randomized No No Yes No No

Cache performance

Good Good Good Bad Bad

Parallelization No Excellent Good No No

Stable Yes Yes No No Yes

Comparison-based

Yes Yes Yes Yes No

Page 14: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Bucket sort

Average-case analysis

Bucket sort is the only algorithm for which we do an average-case

analysis in this course.

We will assume that the elements are drawn uniformly and

independently from a range.

โ€“ We can assume the range is [0,1) without loss of generality.

โ€“ They can be real numbers; while radix sort only works on

integers

Non-comparison based

Bucket sort is non-comparison based

It breaks the lower bound by breaking both conditions.

Q: Is it possible to design a comparison-based sorting algorithm with

expected or average running time better than ฮ˜(๐‘› log ๐‘›)?

A: No. (Proof omitted)

14

Page 15: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Bucket sort

15

Bucket-Sort(๐ด)

let ๐ต[0. . ๐‘› โˆ’ 1] be a new array

for ๐‘– โ† 0 to ๐‘› โˆ’ 1

make ๐ต[๐‘–] an empty list

for ๐‘– โ† 1 to ๐‘›

insert ๐ด[๐‘–] into list ๐ต[โŒŠ๐‘›๐ด[๐‘–]โŒ‹]

for ๐‘– โ† 0 to ๐‘› โˆ’ 1

sort list ๐ต[๐‘–] with insertion sort

concatenate the lists ๐ต[0], ๐ต[1], โ€ฆ , ๐ต[๐‘› โˆ’ 1]

together in order

Worst-case: ฮ˜(๐‘›2)

Page 16: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Average-case analysis of bucket sort

๐‘‡ ๐‘› = ๐‘› +

๐‘–=0

๐‘›โˆ’1

๐‘›๐‘–2

๐ธ ๐‘‡ ๐‘› = ๐ธ ๐‘› + ๐ธ

๐‘–=0

๐‘›โˆ’1

๐‘›๐‘–2

= ๐‘› +

๐‘–=0

๐‘›โˆ’1

๐ธ ๐‘›๐‘–2

Claim: ๐ธ ๐‘›๐‘–2 = 2 โˆ’

1

๐‘›

Pf: Let ๐‘‹๐‘–๐‘— = 1 if ๐ด[๐‘—]

falls into bucket ๐‘–.

16

Page 17: Lecture 8: Sorting in Linear Time - GitHub Pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท Sorting algorithm A runs in ๐‘‚(๐‘›log๐‘›)time. Sorting algorithm A runs in ฮฉ(๐‘›log๐‘›)time

Average-case analysis of bucket sort

17

Theorem: When the inputs are

drawn from a range uniformly

and independently, bucket sort

runs in expected ฮ˜(๐‘›) time.