lecture 8: sorting in linear time - github pagesclcheungac.github.io/comp3711/08linearsort.pdfย ยท...
TRANSCRIPT
Lecture 8: Sorting in Linear Time
Running time of sorting algorithms
Do you still remember what these statements mean?
Sorting algorithm A runs in ๐(๐ log ๐) time.
Sorting algorithm A runs in ฮฉ(๐ log ๐) time.
So far, all algorithms have running time ฮฉ(๐ log ๐)
We didnโt show this for heap sort, though it is true.
Q: Is it possible to design an algorithm that runs faster than ฮฉ(๐ log ๐)?
A: No, if the algorithm is comparison-based.
Remark: A comparison-based sorting algorithm is more general, e.g., the
sorting algorithm implemented in C++ STL.
Corollary: Heap sort has running time ฮ(๐ log ๐).
2
Lower bounds for problems
What does each of the following statements mean?
Sorting can be done in ๐(๐ log ๐) time.
โ There is a sorting algorithm that runs in ๐(๐ log ๐) time.
Sorting requires ฮฉ(๐ log ๐) time.
โ Any sorting algorithm must run in ฮฉ ๐ log๐ time.
This is a very strong statement, which we unfortunately cannot prove.
In fact, we have no lower bound higher than ฮฉ(๐) for sorting.
But we can show an ฮฉ(๐ log ๐) lower bound for comparison-based sorting
algorithms
More generally, in the decision-tree model of computation.
3
The decision-tree model
4
An algorithm in the decision-tree model
Solves the problem by asking questions with binary answers
Cannot ask questions like โhow many legs does it have?โ
The worst-case running time is the height of the decision tree.
The decision-tree for binary search
5
Theorem: Any algorithm for finding a given element in a sorted array of
size ๐ must have running time ฮฉ(log ๐) in the decision-tree model.
Proof:
The algorithm must have at least ๐ different outputs.
The decision-tree has at least ๐ leaves.
Any binary tree with ๐ leaves must have height ฮฉ(log ๐).
The decision-tree for sorting
6
Theorem: Any algorithm for sorting ๐ elements must have running time
ฮฉ(๐ log๐) in the decision-tree model.
Pf:
The algorithm must have at least ๐! different outputs (there are so
many different permutations).
The decision-tree has at least ๐! leaves.
Any binary tree with ๐ leaves must have height ฮฉ log ๐! = ฮฉ(๐ log๐).
Can we do better?
Implication of the lower bound
Have to use non-comparison based algorithms.
Donโt use worse-case analysis
Integer sorting
We will assume that the elements are integers from 0 to ๐.
We will use both ๐ and ๐ to express the running time.
Both ๐ < ๐ are ๐ > ๐ possible.
Exercise. Compare the following functions asymptotically:
๐, log ๐, ๐ + 1000, ๐ + ๐, ๐ log ๐ ,max ๐, ๐ ,min(๐, ๐)
7
Counting sort
๐ด: input array
๐ต: output array
๐ถ: counting array, then position array
8
Counting sort
9
Counting-Sort(๐ด, ๐ต):
let ๐ถ[0 . . ๐] be a new array
for ๐ โ 0 to ๐
๐ถ ๐ โ 0
for ๐ โ 1 to ๐
๐ถ ๐ด ๐ โ ๐ถ[๐ด[๐]] + 1
// ๐ถ[๐] now contains the number of ๐โs
for ๐ โ 1 to ๐
๐ถ ๐ โ ๐ถ[๐] + ๐ถ[๐ โ 1]
// ๐ถ[๐] now contains the number of elements โค ๐
for ๐ โ ๐ downto 1
๐ต ๐ถ ๐ด ๐ โ ๐ด[๐]
๐ถ ๐ด ๐ โ ๐ถ[๐ด[๐]] โ 1
Running time: ฮ ๐ + ๐
Working space: ฮ ๐ + ๐
It is a stable sorting algorithm.
Radix sort
10
Radix-Sort(๐ด, ๐):
for ๐ โ 1 to ๐
use counting sort to sort array ๐จ on digit ๐
Radix sort: Correctness
11
Proof: (induction on digit position)
Assume that the numbers are sorted by their low-order ๐ โ 1
digits
Sort on digit ๐
โ Two numbers that differ on digit ๐
are correctly sorted by their
low-order ๐ digits
โ Two numbers equal on digit ๐ are put
in the same order as the input โ
correctly sorted by their low-order ๐
digits
Radix sort: Running time analysis
Q: How large should the โdigitsโ be?
Analysis: Let each โdigitโ take values from 0 to ๐ โ 1.
Counting sort takes ฮ(๐ + ๐) time.
An integer โค ๐ has ๐ = log ๐/ log ๐ such โdigitsโ.
So the running time of radix sort is ฮlog ๐
log ๐โ ๐ + ๐
This is minimized when ๐ = ฮ(๐).
The running time is ฮ ๐ log๐ ๐ , which is ๐(๐) when log ๐ = ๐(log ๐)
Q: When is this faster than ฮ(๐ log๐)?
A: When log๐ ๐ < log ๐ โ log๐ < log2 ๐
Implementation:
Choose ๐ โค ๐ < 2๐ such that ๐ is a power of 2.
Use bit-wise operation for more efficient implementation.
12
Summary of sorting algorithms
13
Insertion sort
Merge sort Quicksort Heapsort Radix sort
Running time ฮ(๐2) ฮ(๐ log ๐) ฮ(๐ log ๐) ฮ(๐ log ๐) ฮ(๐ log๐ ๐)
Workingspace
ฮ(1) ฮ(๐) ฮ(log ๐) ฮ(1) ฮ(๐)
Randomized No No Yes No No
Cache performance
Good Good Good Bad Bad
Parallelization No Excellent Good No No
Stable Yes Yes No No Yes
Comparison-based
Yes Yes Yes Yes No
Bucket sort
Average-case analysis
Bucket sort is the only algorithm for which we do an average-case
analysis in this course.
We will assume that the elements are drawn uniformly and
independently from a range.
โ We can assume the range is [0,1) without loss of generality.
โ They can be real numbers; while radix sort only works on
integers
Non-comparison based
Bucket sort is non-comparison based
It breaks the lower bound by breaking both conditions.
Q: Is it possible to design a comparison-based sorting algorithm with
expected or average running time better than ฮ(๐ log ๐)?
A: No. (Proof omitted)
14
Bucket sort
15
Bucket-Sort(๐ด)
let ๐ต[0. . ๐ โ 1] be a new array
for ๐ โ 0 to ๐ โ 1
make ๐ต[๐] an empty list
for ๐ โ 1 to ๐
insert ๐ด[๐] into list ๐ต[โ๐๐ด[๐]โ]
for ๐ โ 0 to ๐ โ 1
sort list ๐ต[๐] with insertion sort
concatenate the lists ๐ต[0], ๐ต[1], โฆ , ๐ต[๐ โ 1]
together in order
Worst-case: ฮ(๐2)
Average-case analysis of bucket sort
๐ ๐ = ๐ +
๐=0
๐โ1
๐๐2
๐ธ ๐ ๐ = ๐ธ ๐ + ๐ธ
๐=0
๐โ1
๐๐2
= ๐ +
๐=0
๐โ1
๐ธ ๐๐2
Claim: ๐ธ ๐๐2 = 2 โ
1
๐
Pf: Let ๐๐๐ = 1 if ๐ด[๐]
falls into bucket ๐.
16
Average-case analysis of bucket sort
17
Theorem: When the inputs are
drawn from a range uniformly
and independently, bucket sort
runs in expected ฮ(๐) time.