lecture-36-cs210-2012.pptx
TRANSCRIPT
-
8/11/2019 Lecture-36-CS210-2012.pptx
1/42
Data Structures and Algorithms
(CS210/ESO207/ESO211)
Lecture 36
Sorting
beyond the O(n log n) bound
-
Overview of today's lecture
The sorting algorithms you have studied till now
Integer sorting
Solving 2 problems from Practice sheet 6 and one problem from Practice sheet 5.
-
Sorting algorithms studied till now
-
Algorithms for sorting n elements
Insertion sort: O(n²)
Selection sort: O(n²)
Bubble sort: O(n²)
Merge sort: O(n log n)
Quick sort: worst case O(n²), average case O(n log n)
Heap sort: O(n log n)
Question: What is common among these algorithms?
Answer: All of them are allowed to use only the comparison operation to perform sorting.
-
Question: Can we sort in O(n) time?
The answer depends upon:
the model of computation;
the domain of the input.
Theorem (to be proved in CS345): Every comparison-based sorting algorithm must perform Ω(n log n) comparisons in the worst case.
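For intuition only (the slide defers the proof to CS345), the standard decision-tree argument behind the theorem works as follows. A comparison sort must distinguish all n! orderings of its input, so in the worst case it needs at least ⌈log2(n!)〈 comparisons, and log2(n!) = Θ(n log n). The short Python sketch below simply evaluates that quantity. The function name `comparison_lower_bound` is our own.

```python
import math


def comparison_lower_bound(n: int) -> int:
    """Return ceil(log2(n!)): a lower bound on the worst-case number of
    comparisons any comparison-based sort needs for n distinct keys."""
    # log2(n!) = log2(2) + log2(3) + ... + log2(n)
    return math.ceil(sum(math.log2(i) for i in range(2, n + 1)))


# Compare the bound with n log2 n for a few sizes.
for n in (4, 5, 16, 1024):
    print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```

For example, the bound is 5 for n = 4 and 7 for n = 5.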
-
word RAM model of computation:
Characteristics
A word is the basic storage unit of RAM. A word is a collection of a few bytes.
Each input item (number, name) is stored in binary format.
RAM can be viewed as a huge array of words. Any location of RAM can be accessed in the same time, irrespective of the location.
Data as well as program reside fully in RAM.
Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving a constant number of words takes a constant number of steps by the CPU.
Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving O(log n) bits takes a constant number of steps by the CPU, where n is the number of bits of the input instance.
-
Integer sorting
-
Counting sort: algorithm for sorting integers
Input: An array A storing n integers in the range [0..k-1].
Output: Sorted array A.
Running time: O(n + k) in the word RAM model of computation.
Extra space: O(n + k)
-
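Counting sort: algorithm for sorting integers
Example (n = 8, k = 6), after computing Count and Place:
A[0..7] = 2 5 3 0 2 3 0 3
Count[0..5] = 2 0 2 3 0 1
Place[0..5] = 2 2 4 7 7 8
Scanning A from right to left: A[7] = 3 is written to B[Place[3] - 1] = B[6].
B[0..7] = _ _ _ _ _ _ 3 _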
-
Counting sort: algorithm for sorting integers
A[0..7] = 2 5 3 0 2 3 0 3
Count[0..5] = 2 0 2 3 0 1
Place[0..5] = 2 2 4 6 7 8 (Place[3] decremented from 7 to 6)
Next, A[6] = 0 is written to B[Place[0] - 1] = B[1].
B[0..7] = _ 0 _ _ _ _ 3 _
-
Counting sort: algorithm for sorting integers
A[0..7] = 2 5 3 0 2 3 0 3
Count[0..5] = 2 0 2 3 0 1
Place[0..5] = 1 2 4 6 7 8 (Place[0] decremented from 2 to 1)
Next, A[5] = 3 is written to B[Place[3] - 1] = B[5].
B[0..7] = _ 0 _ _ _ 3 3 _
-
Counting sort: algorithm for sorting integers
Algorithm (A[0..n-1], k)
For j = 0 to k-1 do Count[j] ← 0;
For i = 0 to n-1 do Count[A[i]] ← Count[A[i]] + 1;
For j = 0 to k-1 do Place[j] ← Count[j];
For j = 1 to k-1 do Place[j] ← Place[j-1] + Count[j];
For i = n-1 down to 0 do
{ B[Place[A[i]] - 1] ← A[i];
  Place[A[i]] ← Place[A[i]] - 1;
}
return B;
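Here is a direct Python transcription of the counting-sort pseudocode. It is a sketch that assumes every key is an int in [0, k-1], with Python lists standing in for the arrays Count, Place and B. The last loop decrements Place[A[i]] before writing. This is equivalent to writing to B[Place[A[i]] - 1] and then decrementing.

```python
from typing import List


def counting_sort(A: List[int], k: int) -> List[int]:
    """Counting sort of integers in [0, k-1]; returns a new sorted list B."""
    n = len(A)
    count = [0] * k
    for x in A:                      # Count[x] = number of occurrences of x
        count[x] += 1
    place = count[:]                 # Place[j] <- Count[j]
    for j in range(1, k):            # prefix sums: Place[j] = #elements <= j
        place[j] = place[j - 1] + count[j]
    B = [0] * n
    for i in range(n - 1, -1, -1):   # scan A from right to left
        place[A[i]] -= 1             # same as B[Place[A[i]] - 1] <- A[i], then decrement
        B[place[A[i]]] = A[i]
    return B
```

On the example from the slides, `counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 6)` returns `[0, 0, 2, 2, 3, 3, 3, 5]`. Scanning A from right to left keeps equal keys in their original relative order, so the sort is stable.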
-
Counting sort: algorithm for sorting integers
Note: The algorithm performs arithmetic operations involving O(log n + log k) bits. In the word RAM model, each such operation takes O(1) time.
Theorem: An array storing n integers in the range [0..k-1] can be sorted in O(n + k) time, using O(n + k) total space, in the word RAM model.
For k = O(n), we get an optimal algorithm for sorting. But what if k is large?
In the next class:
We shall discuss an algorithm for sorting integers in the range [0..n^c - 1] in O(cn) time, using O(n) space, in the word RAM model.
-
Practice sheet 6
We shall solve exercises 5 and 1 from this sheet.
-
Important note
Though the solution to this problem is provided here, one should NOT expect such a problem to be asked in the end-semester exam of this course. It was a mistake of the instructor to put it in the practice sheet.
-
Problem 5 of practice sheet 6.
Description (in terms of intervals):
Given a set A of n intervals, compute the smallest set B ⊆ A of intervals such that, for every interval I in A \ B, some interval in B overlaps/intersects I.
[Figure: a set A of intervals. The set of green intervals is a solution but not an optimal solution.]
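The brute-force Python sketch below makes the problem statement concrete. It runs in exponential time and is meant only for checking small instances. The names `overlaps` and `smallest_overlapping_set` are our own. We assume intervals are closed pairs (start, finish), so two intervals that share an endpoint count as overlapping.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, finish), treated as a closed interval


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def smallest_overlapping_set(A: List[Interval]) -> List[Interval]:
    """Smallest B (a subset of A) such that every interval of A \\ B overlaps
    some interval of B. Tries all subsets in increasing size (exponential)."""
    n = len(A)
    for size in range(n + 1):
        for chosen in combinations(range(n), size):
            picked = set(chosen)
            if all(any(overlaps(A[i], A[j]) for j in picked)
                   for i in range(n) if i not in picked):
                return [A[j] for j in chosen]
    return list(A)  # unreachable: B = A always satisfies the condition
```

For instance, on `[(0, 2), (1, 3), (2.5, 10), (4, 5), (6, 7)]` no single interval suffices, and the sketch returns a set of size 2.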
-
Solution of Problem 5 of practice sheet 6.
Let I* be the interval with the earliest finish time.
Let I' be the interval with the maximum finish time among those overlapping I*.
Lemma 1: There is an optimal solution for set A that contains I'.
[Figure: instance A with the intervals I* and I'.]
-
Solution of Problem 5 of practice sheet 6.
Question: How do we obtain a smaller instance A' using this greedy approach?
Naive approach (again inspired by the job scheduling problem): remove from A all intervals which overlap with I'. The result is A'.
This approach does not work! Here is a counterexample.
The problem is that some deleted interval (in this case I'') could have been used to intersect many intervals if it had not been deleted. Deleting it from the instance prevents it from being selected in the solution.
[Figure: counterexample instance A with intervals I*, I' and I''.]
-
Overview of the approach
We must not delete intervals (like I'' on the previous slide) that are essential for covering many other intervals. To ensure this, we make some observations and introduce a notion called a uniquely covered interval. It turns out that we need to keep I'' in the smaller instance if some interval there is uniquely covered by I''. Otherwise, we may discard I''.
-
An Observation
We can delete all intervals whose finish time is before the finish time of I', because any interval overlapped by such an interval will anyway be overlapped by I'. Now consider the intervals which overlap with I' but have a finish time greater than that of I'. In the example shown below, these are the three intervals which cross the red line (the red line marks the finish time of I').
Observation 1: Among the intervals crossing the red line, we need to keep only the one with the maximum finish time (I'' in this picture).
Proof: Each of these intervals is already intersected by I'. As far as using them to intersect other intervals is concerned, we may as well choose I'' for this purpose.
So from now on, we shall assume that exactly one interval I'' in A overlaps I' (intersects the red line) and has a finish time larger than that of I'.
[Figure: instance A with intervals I*, I' and I''; three intervals cross the red line.]
-
Uniquely covered interval
I2 is said to be uniquely covered by I1 if
I2 is fully covered by I1, and
every interval overlapping I2 is also fully covered by I1.
Lemma 2: There is an optimal solution containing I1.
Proof: Surely I2 or some other interval overlapping it must be in the optimal solution. That interval lies inside I1, so every interval it overlaps is also overlapped by I1. Hence, if we replace it by I1, we still get a solution of the same size, and therefore an optimal solution.
[Figure: interval I2 lying inside interval I1.]
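The definition translates almost literally into a predicate. The sketch below makes the same assumption as before: intervals are closed (start, finish) pairs. The helper names `covers` and `uniquely_covered` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, finish), treated as a closed interval


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def covers(outer: Interval, inner: Interval) -> bool:
    """True if inner lies fully inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def uniquely_covered(i2: Interval, i1: Interval, intervals: List[Interval]) -> bool:
    """I2 is uniquely covered by I1: I2 lies inside I1, and so does
    every interval of the instance that overlaps I2."""
    if not covers(i1, i2):
        return False
    return all(covers(i1, x) for x in intervals if overlaps(x, i2))
```

For example, (4, 5) is uniquely covered by (0, 10) in the instance [(0, 10), (4, 5), (3, 4.5)]. It stops being uniquely covered once (4.5, 12) is added, because that interval overlaps (4, 5) but sticks out of (0, 10).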
-
We are now ready to describe the construction of A' from A. There will be two cases. We shall then prove that |Opt(A)| = |Opt(A')| + 1 in each of these cases.
Important note:
The reader is advised to fully understand Lemma 1, Lemma 2, Observation 1, and the notion of a uniquely covered interval. Also fully internalize the notations I*, I', and I''. This will help in understanding the rest of the solution.
-
Constructing A' from A
-
Constructing A' from A
[Figure: instance A with intervals I*, I' and I''; the red line marks the finish time of I'.]
We need to take care of the intervals whose starting point is to the right of the red line (the finish time of I'). We can partition these intervals into two sets:
D: those which overlap with I''.
E: those that start after the end of I'' and hence do not overlap with I''.
[Figure: the same instance with the sets D and E marked.]
Now we shall describe the two cases for the construction of A'.
Case 1: There is an interval I ∈ D uniquely covered by I''.
-
Constructing A' from A
If there is an interval I ∈ D uniquely covered by I'', we define A' as follows. Remove from A all intervals which overlap with I' (this was how we defined A' in our wrong solution). Then add I'' back to this set. The resulting set is the smaller instance A' for Case 1.
We shall now define A' for Case 2.
-
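Constructing A' from A
Case 2: There is no interval in D uniquely covered by I''.
[Figure: instance A with intervals I*, I', I'' and the sets D and E; below it, the smaller instance A', which contains D and E.]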
-
Constructing A' from A
If there is no interval in D uniquely covered by I'', we define A' as follows. Remove from A all intervals which overlap with I' (this was how we defined A' in our wrong solution). The resulting set is the smaller instance A' for Case 2.
-
Theorem 1: |Opt(A)| = |Opt(A')| + 1
We shall prove this theorem for Case 1 as well as for Case 2.
-
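Case 1: There is an interval I ∈ D uniquely covered by I''. Claim: |Opt(A)| ≤ |Opt(A')| + 1
[Figure: instance A and the smaller instance A', which consists of I'' together with D and E.]
Using Lemma 2, there is an optimal solution for A' containing I''. What must we add to this solution to get a solution for A? We need to add just I', since every interval of A that is missing from A' overlaps I'. This gives a solution for A, and we are done.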
-
Case 1: There is an interval I ∈ D uniquely covered by I''. Claim: |Opt(A')| ≤ |Opt(A)| - 1
[Figure: instance A and the smaller instance A', which consists of I'' together with D and E.]
Using Lemma 1 and Lemma 2, there is an optimal solution for A containing both I' and I''.
Removing I' from this optimal solution for A gives a solution for A', and we are done.
-
This finishes the proof of Theorem 1 for Case 1.
We shall now analyze Case 2 and prove Theorem 1 for this case as well.
-
Case 2: There is no interval in D uniquely covered by I''. Claim: |Opt(A)| ≤ |Opt(A')| + 1
[Figure: instance A and the smaller instance A', which consists of D and E.]
Consider any optimal solution for A'. This optimal solution takes care of D and E.
So we only need to take care of the intervals of A that overlap I' (these include the intervals intersecting the red line).
Adding I' to the solution takes care of them. We are done.
-
Case 2: There is no interval in D uniquely covered by I''. Claim: |Opt(A')| ≤ |Opt(A)| - 1
[Figure: instance A and the smaller instance A', which consists of D and E; the violet line marks the finish time of I''.]
Using Lemma 1, there is an optimal solution for A containing I'.
If I'' is not in this optimal solution, removing I' from it gives a valid solution for A'.
So consider the case when I'' is present in the optimal solution of A. The problem is that I'' is not present in A', so we need a substitute for I'' from A'.
Notice that I'' serves only to overlap intervals from D. So we should search for a substitute for I'' in D only.
We replace I'' by the interval from D which intersects the violet line and has the earliest start time. The following slide justifies this choice.
-
Let J be the interval in D which intersects the violet vertical line (that is, has a finish time greater than that of I'') and has the earliest start time. It suffices to show that every interval of D overlaps with J. Consider any interval K in D. There are two cases.
The finish time of K is less than that of I''; in other words, K does not intersect the violet line. In this case, some other interval in D must overlap K and intersect the violet line (otherwise, K would be uniquely covered by I''). J starts no later than this interval, which in turn starts before K ends. Also, J ends after the violet line, and hence after K ends. So K is overlapped by J as well.
The finish time of K is more than that of I''; in other words, K does intersect the violet line. Hence K overlaps with J, since J also intersects the violet line.
Hence, if we remove I' and I'' from the given optimal solution of A and add J to it, we get a solution for A'. An optimal solution for A' is no larger than this solution, so |Opt(A')| ≤ |Opt(A)| - 1 for Case 2.
This completes the proof of Theorem 1: |Opt(A)| = |Opt(A')| + 1.
Now we need to design the algorithm for our problem based on the greedy strategy that we used for constructing A' from A.
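The choice of the substitute J in the Case 2 argument is easy to express in code. The sketch below assumes closed (start, finish) pairs and that D already holds the intervals that start to the right of the red line and overlap I''. The name `case2_substitute` is hypothetical. The usage lines check, on a small hand-made Case 2 instance, that J overlaps every interval of D, as the argument claims.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, finish), treated as a closed interval


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def case2_substitute(D: List[Interval], i2: Interval) -> Optional[Interval]:
    """Among intervals of D crossing the violet line (finish > f(I'')),
    return the one with the earliest start time, or None if there is none."""
    crossing = [x for x in D if x[1] > i2[1]]
    return min(crossing, key=lambda x: x[0]) if crossing else None


I2 = (2.5, 6)                      # plays the role of I''
D = [(3.2, 4), (3.5, 7), (5, 9)]   # no interval of D is uniquely covered by I2
J = case2_substitute(D, I2)
print(J, all(overlaps(J, x) for x in D))   # (3.5, 7) True
```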
-
Simplification and efficient implementation of the algorithm
Though the algorithm looks complex, it is in fact quite simple to implement, as will soon become clear. We first introduce some notation to allow a clean presentation of the algorithm.
Notations:
f(I): finish time of interval I;
Maxf(I, A): maximum finish time of an interval from A that overlaps with I. (If no interval overlaps with I, then Maxf(I, A) = f(I).)
Maxf-Interval(I, A): the interval from A with maximum finish time that overlaps with I. (If no interval overlaps with I, then Maxf-Interval(I, A) = I.)
Cover: set of intervals selected so far. (At the end of the algorithm, Cover will be an optimal solution.)
][: empty interval; we take f(][) = -∞.
-
Algorithm
I'' ← ][; Cover ← ∅; A' ← A;
While A' ≠ ∅ do
{ If (I'' = ][)
  { let I* be the interval in A' with the earliest finish time;
    let I' ← Maxf-Interval(I*, A');
    Cover ← Cover ∪ {I'};
    I'' ← Maxf-Interval(I', A');
    remove all intervals from A' that are overlapped by I';
  }
  Else If (there is an interval I ∈ A' with Maxf(I, A') < f(I''))
  { I' ← I'';
    Cover ← Cover ∪ {I'};
    I'' ← Maxf-Interval(I', A');
    remove all intervals from A' that are overlapped by I';
  }
  Else I'' ← ][;
}
return Cover;
-
Algorithm
(further refinement of the same algorithm)
I'' ← ][; Cover ← ∅; A' ← A;
While A' ≠ ∅ do
{ let I* be the interval in A' with the earliest finish time;
  If (Maxf(I*, A') < f(I'')) I' ← I''; Else I' ← Maxf-Interval(I*, A');
  Cover ← Cover ∪ {I'};
  I'' ← Maxf-Interval(I', A');
  remove all intervals from A' that are overlapped by I';
}
return Cover;
It is easy to observe that each iteration of the while loop can be implemented in O(n) time.
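Below is a Python sketch of the refined algorithm. It assumes intervals are closed (start, finish) pairs. `None` plays the role of the empty interval ][, so the Case 1 test is skipped while I'' is empty (matching f(][) = -∞), and the list `rest` plays the role of A'. The function names are our own. Each iteration performs a constant number of linear scans, so the sketch runs in O(n²) time overall.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, finish), treated as a closed interval


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def maxf_interval(i: Interval, pool: List[Interval]) -> Interval:
    """Maxf-Interval(I, A): the interval of pool overlapping i with maximum
    finish time, or i itself if no interval of pool overlaps i."""
    cands = [x for x in pool if overlaps(x, i)]
    return max(cands, key=lambda x: x[1]) if cands else i


def min_overlapping_cover(A: List[Interval]) -> List[Interval]:
    rest = list(A)                 # A'
    cover: List[Interval] = []     # Cover
    i2: Optional[Interval] = None  # I''; None stands for the empty interval ][
    while rest:
        i_star = min(rest, key=lambda x: x[1])       # I*: earliest finish time in A'
        if i2 is not None and maxf_interval(i_star, rest)[1] < i2[1]:
            i1 = i2                                   # Case 1: select I''
        else:
            i1 = maxf_interval(i_star, rest)          # I' as in Lemma 1
        cover.append(i1)
        i2 = maxf_interval(i1, rest)                  # new I'', computed before deletion
        rest = [x for x in rest if not overlaps(x, i1)]
    return cover
```

On the counterexample-style instance `[(0, 2), (1, 3), (2.5, 10), (4, 5), (6, 7)]`, the sketch first selects (1, 3). In the next iteration, (4, 5) is uniquely covered by I'' = (2.5, 10), so it selects (2.5, 10), giving a cover of size 2. The naive approach would need 3 intervals on this instance.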
-
Proof of correctness for the algorithm
Though we derived a proof of correctness while arriving at the algorithm, a proof can also be given directly. This may be helpful if you are not interested in how we arrived at the algorithm and just wish to see why it is correct.
Let Overlapped = A \ A'. In plain words, Overlapped is the set of intervals from A which are overlapped by some interval from Cover.
At the beginning of each iteration, the following assertions hold:
1. There is an optimal solution for A containing Cover.
2. Every interval from Overlapped is overlapped by an interval from Cover, and I'' is an interval with the maximum finish time in the set Overlapped.
3. Every interval in A' has a start time greater than the finish time of every interval in Cover.
These assertions can be proved by induction on the number of iterations. The arguments needed are a small subset of those used to prove Theorem 1.
-
Concluding slide for exercise 5
Theorem:
There is an O(n²) time algorithm that, given a set A of n intervals, computes the smallest subset B ⊆ A such that every interval of A \ B overlaps some interval of B.
-
Problem 1 of practice sheet 6
Given an array A storing n elements and a number k, compute the k elements nearest to the median. The time complexity should be O(n).
Hint: Use the following tools.
A divide and conquer strategy like the one used in problem 2 of the same practice sheet.
The linear time median finding algorithm.
You need to reduce the problem to half its size in each step.
-
First, we may prune our search domain from n to 2k elements as follows.
Find the median; let it be m.
Find the element with rank n/2 - k; let it be a. Remove all elements smaller than a (justify it).
Find the element with rank n/2 + k; let it be b. Remove all elements greater than b (justify it).
The time spent so far is O(n). The k elements nearest to the median are surely among the remaining 2k elements.
Now find the element with rank n/2 - k/2; let it be x. Find the element with rank n/2 + k/2; let it be y. If x is closer to m than y, we can conclude the following:
1. All elements that are at least x and less than m must be among the k elements nearest to m. These k/2 elements are eliminated from the input and added to our solution.
2. None of the elements greater than y can be among the k elements nearest to m. These k/2 elements are also removed from the input.
In this way, we have found k/2 elements nearest to m. Moreover, the input has been reduced from 2k to k elements. Keep repeating this step. The work in each round is linear in the current input size, and the size halves every round, so we get the k elements nearest to m in O(n) time.
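The halving scheme above relies on deterministic linear-time selection. As a simpler stand-in, the sketch below swaps in randomized quickselect (expected O(n) time) and a different idea: select the median m, then select the k-th smallest distance |x - m|, and keep every element within that distance. The sketch assumes distinct elements, uses the lower median, excludes m itself from the answer, and requires 0 ≤ k ≤ n - 1. `quickselect` and `k_nearest_to_median` are our own names.

```python
import random
from typing import List


def quickselect(xs: List[float], r: int) -> float:
    """Return the element of 0-based rank r in xs (expected linear time)."""
    while True:
        pivot = random.choice(xs)
        lo = [x for x in xs if x < pivot]
        hi = [x for x in xs if x > pivot]
        n_eq = len(xs) - len(lo) - len(hi)
        if r < len(lo):
            xs = lo
        elif r < len(lo) + n_eq:
            return pivot
        else:
            r -= len(lo) + n_eq
            xs = hi


def k_nearest_to_median(A: List[float], k: int) -> List[float]:
    """Return k elements of A, other than the lower median m, closest to m."""
    if k == 0:
        return []
    m = quickselect(A, (len(A) - 1) // 2)
    others = [x for x in A if x != m]                      # assumes distinct elements
    d = quickselect([abs(x - m) for x in others], k - 1)   # k-th smallest distance
    closer = [x for x in others if abs(x - m) < d]         # fewer than k of these
    tied = [x for x in others if abs(x - m) == d]          # elements at distance exactly d
    return closer + tied[: k - len(closer)]
```

For `A = [1, 9, 3, 7, 5, 2, 8]` the median is 5, and the two nearest elements are 3 and 7. The output order is arbitrary, so sort it if a canonical order is needed.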
-
Finding the DFS tree from start and finish times
There was a problem in practice sheet 5 where, given the start time and finish time of a DFS traversal for all vertices, the aim is to compute the DFN numbers and the DFS tree.
A few students had difficulty determining the children of a node in the DFS tree. An easy way to achieve this goal is an indirect one: to compute the children of a vertex in the DFS tree, it suffices to compute the parent of each vertex. We can do the latter as follows.
Among all neighbors of a vertex u, find all those whose start time is smaller than that of u. All these vertices are ancestors of u. Which of them is the parent of u? Surely, the one with the maximum start time, since the parent is itself a neighbor of u and is the deepest of u's proper ancestors.
So we can compute the parent of vertex u in O(deg(u)) time, and the time spent over all vertices is O(m + n). Hence we can compute the children of each vertex, and thus the entire DFS tree structure, in O(m + n) time.
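The parent computation can be sketched in a few lines of Python. The sketch assumes an undirected graph given as adjacency lists (a dict mapping each vertex to its list of neighbours) together with a dict of DFS start times. Finish times are not needed for this step. `dfs_parents` and `dfs_children` are illustrative names. A vertex with no earlier-started neighbour is treated as the root of its DFS tree.

```python
from typing import Dict, Hashable, List, Optional


def dfs_parents(adj: Dict[Hashable, List[Hashable]],
                start: Dict[Hashable, int]) -> Dict[Hashable, Optional[Hashable]]:
    """Parent of u = neighbour of u with the largest start time smaller than start[u]."""
    parent = {}
    for u in adj:
        best = None
        for v in adj[u]:             # O(deg(u)) work per vertex
            if start[v] < start[u] and (best is None or start[v] > start[best]):
                best = v
        parent[u] = best             # None marks a DFS root
    return parent


def dfs_children(parent: Dict[Hashable, Optional[Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Invert the parent map to get the children of every vertex."""
    children = {u: [] for u in parent}
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)
    return children
```

For the triangle 0-1-2 with a pendant edge 2-3, a DFS from 0 gives start times {0: 1, 1: 2, 2: 3, 3: 4}. The sketch recovers parents 1 → 0, 2 → 1 and 3 → 2, even though 0 is also a neighbour of 2.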