lecture-36-cs210-2012.pptx

8/11/2019 Lecture-36-CS210-2012.pptx


Data Structures and Algorithms

(CS210/ESO207/ESO211)

Lecture 36

Sorting

beyond the $O(n \log n)$ bound


Overview of today's lecture

The sorting algorithms you studied till now

Integer sorting

Solving 2 problems from Practice sheet 6 and one problem from Practice sheet 5.


Sorting algorithms studied till now


Algorithms for sorting $n$ elements

Insertion sort: $O(n^2)$
Selection sort: $O(n^2)$
Bubble sort: $O(n^2)$
Merge sort: $O(n \log n)$
Quick sort: worst case $O(n^2)$, average case $O(n \log n)$
Heap sort: $O(n \log n)$

Question: What is common among these algorithms?

Answer: All of them use only the comparison operation to perform sorting.


Question: Can we sort in $O(n)$ time?

The answer depends upon
the model of computation;
the domain of input.

Theorem (to be proved in CS345): Every comparison-based sorting algorithm must perform $\Omega(n \log n)$ comparisons in the worst case.
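For intuition only (this is the standard decision-tree argument, not the CS345 proof itself): a comparison-based algorithm run on $n$ distinct items can be modelled as a binary tree of comparisons whose leaves must distinguish all $n!$ input orderings, so its height $h$, the worst-case number of comparisons, satisfies the chain below. The middle step uses $n! \ge (n/2)^{n/2}$, since the largest $n/2$ factors of $n!$ are each at least $n/2$.

```latex
2^{h} \ge n! \;\Longrightarrow\; h \ge \log_2 n! \;\ge\; \log_2\left(\frac{n}{2}\right)^{n/2} \;=\; \frac{n}{2}\log_2\frac{n}{2} \;=\; \Omega(n \log n)
```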


Word RAM model of computation:

Characteristics

A word is the basic storage unit of RAM. A word is a collection of a few bytes.

Each input item (number, name) is stored in binary format.

RAM can be viewed as a huge array of words. Any arbitrary location of RAM can be accessed in the same time, irrespective of the location.

Data as well as program reside fully in RAM.

Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving a constant number of words takes a constant number of steps by the CPU.

Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving $O(\log n)$ bits takes a constant number of steps by the CPU, where $n$ is the number of bits of the input instance.


Integer sorting


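Counting sort: an algorithm for sorting integers

Input: An array A storing $n$ integers in the range $[0..k]$.

Output: Sorted array A.

Running time: $O(n+k)$ in the word RAM model of computation.

Extra space: $O(n+k)$ (arrays Count and Place of size $k+1$, and the output array B of size $n$)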

Example (shown as figures on three slides): A = [2, 5, 3, 0, 2, 3, 0, 3] (indices 0..7), with values in [0..5].

Count (indices 0..5) = [2, 0, 2, 3, 0, 1]; Place (indices 0..5) = [2, 2, 4, 7, 7, 8].

Scanning A from right to left: A[7] = 3 is written to B[Place[3] - 1] = B[6], and Place[3] drops to 6; then A[6] = 0 is written to B[1], and Place[0] drops to 1; then A[5] = 3 is written to B[5], and so on.



Algorithm Counting-sort(A[0..n-1], k)

For j = 0 to k do Count[j] ← 0;
For i = 0 to n-1 do Count[A[i]] ← Count[A[i]] + 1;
For j = 0 to k do Place[j] ← Count[j];
For j = 1 to k do Place[j] ← Place[j-1] + Count[j];
For i = n-1 down to 0 do
{ B[Place[A[i]] - 1] ← A[i];
  Place[A[i]] ← Place[A[i]] - 1;
}
return B;
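A runnable Python sketch of the counting-sort pseudocode, assuming the input values lie in [0..k]. Python lists stand in for the arrays Count, Place and B. The function name is an illustrative choice, not from the slides.

```python
from typing import List


def counting_sort(a: List[int], k: int) -> List[int]:
    """Return a sorted copy of a, whose values are assumed to lie in [0..k]."""
    count = [0] * (k + 1)
    for x in a:                      # Count[v] = number of occurrences of v
        count[x] += 1
    place = count[:]                 # after the prefix sums: Place[v] = number of elements <= v
    for v in range(1, k + 1):
        place[v] = place[v - 1] + count[v]
    b = [0] * len(a)
    for x in reversed(a):            # scan A from right to left, as in the pseudocode
        place[x] -= 1                # decrement first, so this writes to the old Place[x] - 1
        b[place[x]] = x
    return b


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
```

On the array from the example it returns [0, 0, 2, 2, 3, 3, 3, 5]. The right-to-left scan keeps equal keys in their original order (stability), which matters once keys carry satellite data.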



Note: The algorithm performs arithmetic operations involving $O(\log n + \log k)$ bits. In the word RAM model, each such operation takes $O(1)$ time.

Theorem: An array storing $n$ integers in the range $[0..k]$ can be sorted in $O(n+k)$ time and using total $O(n+k)$ space in the word RAM model.

For $k = O(n)$, we get an optimal algorithm for sorting. But what if $k$ is large?

In the next class:

We shall discuss an algorithm for sorting integers in the range [..] in O() time and using O() space in the word RAM model.


Practice sheet 6

We shall solve exercises 5 and 1 from this sheet.


Important note

Though the solution to this problem is provided here, one should NOT expect such a problem to be asked in the end-semester exam of this course. It was a mistake on the part of the instructor to put it in the practice sheet.


Problem 5 of practice sheet 6.

Description (in terms of intervals):

Given a set A of n intervals, compute a smallest set B ⊆ A of intervals such that for every interval I in A \ B, there is some interval in B which overlaps/intersects I.

(Figure: an instance A; the set of green intervals is a solution, but not an optimal solution.)


Solution of Problem 5 of practice sheet 6.


Let I* be the interval with the earliest finish time.

Let I' be the interval with the maximum finish time among the intervals overlapping I*.

Lemma 1: There is an optimal solution for set A that contains I'.

(Figure: instance A with I* and I' marked.)
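To make the notation concrete, here is a minimal Python sketch that picks I* and I' from a list of intervals. It assumes closed intervals given as (start, finish) pairs, with touching endpoints counting as an overlap. The helper names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed representation: closed interval (start, finish)


def overlaps(a: Interval, b: Interval) -> bool:
    # closed intervals intersect iff each one starts no later than the other finishes
    return a[0] <= b[1] and b[0] <= a[1]


def pick_istar_and_iprime(intervals: List[Interval]) -> Tuple[Interval, Interval]:
    """Hypothetical helper: I* = earliest finish; I' = max-finish interval overlapping I*."""
    i_star = min(intervals, key=lambda iv: iv[1])
    # I* overlaps itself, so the candidate sequence below is never empty
    i_prime = max((iv for iv in intervals if overlaps(iv, i_star)), key=lambda iv: iv[1])
    return i_star, i_prime


print(pick_istar_and_iprime([(0, 2), (1, 4), (3, 10), (5, 5)]))
```

On this small made-up input it returns ((0, 2), (1, 4)): (0, 2) finishes first, and among the intervals overlapping it, (1, 4) finishes last.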


Solution of Problem 5 of practice sheet 6.

Question: How do we obtain a smaller instance A' using this greedy approach?

Naive approach (again inspired by the job scheduling problem): remove from A all intervals which overlap with I'. This is A'.

This approach does not work! Here is a counterexample.

(Figure: an instance A with I*, I', and an interval I'' that overlaps I' and extends far to its right.)

The problem is that some deleted interval (in this case I'') could have been used for intersecting many intervals if it were not deleted. But deleting it from the instance disallows it from being selected in the solution.
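A small hypothetical instance, constructed here for illustration rather than taken from the slide's figure, makes the failure concrete. The sketch below implements the naive reduction literally, assuming closed intervals as (start, finish) pairs; the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed representation: closed interval (start, finish)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def naive_greedy(intervals: List[Interval]) -> List[Interval]:
    """The naive approach: pick I', then DELETE every interval that overlaps I'."""
    remaining = list(intervals)
    chosen: List[Interval] = []
    while remaining:
        i_star = min(remaining, key=lambda iv: iv[1])
        i_prime = max((iv for iv in remaining if overlaps(iv, i_star)), key=lambda iv: iv[1])
        chosen.append(i_prime)
        # deleted intervals can no longer be picked later; this is the flaw
        remaining = [iv for iv in remaining if not overlaps(iv, i_prime)]
    return chosen


example = [(0, 2), (1, 4), (3, 10), (5, 5), (7, 7), (9, 9)]
print(naive_greedy(example))
```

On this instance it returns four intervals, [(1, 4), (5, 5), (7, 7), (9, 9)], whereas the two intervals (1, 4) and (3, 10) already suffice. Here (3, 10) plays the role of I'' and is deleted in the very first step.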


Overview of the approach

In order to make sure we do not delete intervals (like I'' in the previous counterexample) that are essential to select for covering many other intervals, we make some observations and introduce a notion called a uniquely covered interval. It turns out that we need to keep I'' in the smaller instance if there is an interval there which is uniquely covered by I''. Otherwise, we may discard I''.


An Observation

We can delete all intervals whose finish time is before the finish time of I', because any interval overlapped by such an interval will anyway be overlapped by I'. Let us consider the intervals which overlap with I' but have finish time greater than that of I'. In the figure below, these are the three intervals which cross the red line (drawn at the finish time of I').

Observation 1: Among the intervals crossing the red line, we need to keep only the interval which has the maximum finish time (I'' in this picture).

Proof: Notice that each of these intervals is anyway intersected by I'. As far as using them to intersect other intervals is concerned, we may as well choose I'' for this purpose: any interval that one of them intersects but I' does not starts after the red line and no later than that interval's finish time, and hence it also intersects I''.

So from now onwards, we shall assume that there is exactly one interval I'' in A which overlaps I' (intersects the red line) and has finish time larger than that of I'.

(Figure: instance A with I*, I', I'' and the red line.)


Uniquely covered interval

I2 is said to be uniquely covered by I1 if
I2 is fully covered by I1, and
every interval overlapping I2 is also fully covered by I1.

Lemma 2: If some interval I2 is uniquely covered by I1, then there is an optimal solution containing I1.

Proof: Surely I2 or some other interval overlapping it must be in the optimal solution. That interval lies inside I1, so every interval it intersects is also intersected by I1. If we replace that interval by I1, we still get a solution of the same size, and hence an optimal solution.

(Figure: interval I2 nested inside interval I1.)
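The definition translates directly into a predicate. A minimal Python sketch, assuming closed intervals as (start, finish) pairs and that `intervals` is the whole instance; the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed representation: closed interval (start, finish)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def fully_covers(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def uniquely_covered(i2: Interval, i1: Interval, intervals: List[Interval]) -> bool:
    """True iff i2 is uniquely covered by i1 with respect to the given instance."""
    if not fully_covers(i1, i2):
        return False
    # every interval overlapping i2 (including i2 and i1 themselves) must lie inside i1
    return all(fully_covers(i1, j) for j in intervals if overlaps(j, i2))


A = [(0, 10), (3, 4), (2, 5), (6, 12)]
print(uniquely_covered((3, 4), (0, 10), A))
print(uniquely_covered((3, 4), (0, 10), A + [(4, 11)]))
```

The first call returns True, since (6, 12) does not touch (3, 4). The second returns False, because (4, 11) overlaps (3, 4) but sticks out of (0, 10).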


We are now ready to give the construction of A' from A. There will be two cases. We shall then prove that $|\mathrm{Opt}(A)| = |\mathrm{Opt}(A')| + 1$ for each of these cases.

Important note:

The reader is advised to fully understand Lemma 1, Lemma 2, Observation 1, and the notion of a uniquely covered interval. Also fully internalize the notations I*, I', and I''. This will help the reader understand the rest of the solution.


Constructing A' from A


(Figure: instance A with I*, I' and I''; a red vertical line marks the finish time of I'. A second panel, labelled "Case 1: there is an interval in D uniquely covered by I''", shows the smaller instance A'.)

We need to take care of the intervals whose starting point is to the right of the red line (the finish time of I'). We can partition these intervals into two sets:
D: those which overlap with I''.
E: those that start after the end of I'' and hence do not overlap with I''.

Now we shall describe the two cases for the construction of A'.


Case 1: If there is an interval in D uniquely covered by I'', then we define A' as follows. Remove all intervals from A which overlap with I' (this was our usual way of defining A' in our wrong solution). Now add I'' to this set. The resulting set is the smaller instance A' for Case 1.

We shall now define A' for Case 2.


Case 2: There is no interval in D uniquely covered by I''

(Figure: instance A with I*, I', I'', D and E, and the smaller instance A' containing D and E.)


If there is no interval in D uniquely covered by I'', then we define A' as follows. Remove all intervals from A which overlap with I' (this was our usual way of defining A' in our wrong solution). The resulting set is the smaller instance A' for Case 2.


Theorem 1: $|\mathrm{Opt}(A)| = |\mathrm{Opt}(A')| + 1$

We shall prove this theorem for Case 1 as well as Case 2.


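Case 1 (there is an interval in D uniquely covered by I''): $|\mathrm{Opt}(A)| \le |\mathrm{Opt}(A')| + 1$

(Figure: instance A with I*, I', I'', D and E; below it, A' for Case 1, which contains I'', D and E.)

Using Lemma 2, it follows that there is an optimal solution for A' containing I''. What should we add to this solution to get a solution for A? Every interval of A that is not in A' overlaps I', so we need to add just I' to get a solution for A, and we are done.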


Case 1 (there is an interval in D uniquely covered by I''): $|\mathrm{Opt}(A')| \le |\mathrm{Opt}(A)| - 1$

(Figure: instances A and A' for Case 1, with I*, I', I'', D and E marked.)

Using Lemma 1 and Lemma 2, it follows that there is an optimal solution for A containing I' and I''. We just need to remove I' from this optimal solution for A to get a solution for A', and we are done.


This finishes the proof of Theorem 1 for Case 1.

We shall now analyze Case 2 and prove Theorem 1 for this case as well.


Case 2 (there is no interval in D uniquely covered by I''): $|\mathrm{Opt}(A)| \le |\mathrm{Opt}(A')| + 1$

(Figure: instance A with I*, I', I'', D and E; below it, A' for Case 2, which contains D and E.)

Consider any optimal solution for A'. Note that this optimal solution takes care of D and E. So we just need to take care of the intervals of A that are not in A', namely those overlapping I'. These are taken care of by adding I' to this solution. We are done.


Case 2 (there is no interval in D uniquely covered by I''): $|\mathrm{Opt}(A')| \le |\mathrm{Opt}(A)| - 1$

(Figure: instances A and A' for Case 2, with I*, I', I'', D and E marked.)

Using Lemma 1, it follows that there is an optimal solution for A containing I'.

If I'' is not in this optimal solution, removing I' from it gives a valid solution for A'.

So let us consider the case when I'' is present in the optimal solution of A. The problem is that I'' is not present in A', so we need a substitute for I'' from A'.

Notice that, within A', I'' can serve the purpose of overlapping intervals from D only. So we should search for a substitute for I'' from D only.

We replace I'' by the interval from D which intersects the violet line (the vertical line at the finish time of I'') and has the earliest start time. The justification follows.


Let J be the interval in D which intersects the violet vertical line (that is, has finish time greater than that of I'') and has the earliest start time. It suffices to show that every interval of D overlaps J. Consider any interval K in D. There are two cases.

1. The finish time of K is less than that of I''. In other words, K does not intersect the violet line. In this case, there must be some other interval in D that overlaps K and intersects the violet line (otherwise, K would be uniquely covered by I''). The start time of J is at most the start time of this interval, which in turn is at most the finish time of K; and J finishes after the violet line, hence after K starts. So K is overlapped by J as well.

2. The finish time of K is more than that of I''. In other words, K intersects the violet line. Hence K overlaps J, since J also intersects the violet line.

Hence if we remove I' and I'' from the given optimal solution of A, and add J to it, we get a solution for A'. Since an optimal solution for A' is no larger than this solution, we get $|\mathrm{Opt}(A')| \le |\mathrm{Opt}(A)| - 1$ for Case 2.

Hence we have proved Theorem 1: $|\mathrm{Opt}(A)| = |\mathrm{Opt}(A')| + 1$.

Now we need to design the algorithm for our problem based on the greedy strategy that we used for constructing A' from A.


Simplification and efficient implementation of the algorithm

Though the algorithm looks quite complex, as will soon become clear, it is quite simple to implement. We first introduce some notation to facilitate a clean presentation of the algorithm.

Notation:

f(I): finish time of interval I;

Maxf(I, A): maximum finish time of an interval from A that overlaps with I. (If no interval overlaps with I, then Maxf(I, A) = f(I).)

Maxf-Interval(I, A): the interval from A with maximum finish time that overlaps with I. (If no interval overlaps with I, then Maxf-Interval(I, A) = I.)

Cover: set of intervals selected till now. (At the end of the algorithm, Cover will be an optimal solution.)

][ : empty interval.


Algorithm

I'' ← ][; Cover ← ∅; A' ← A;
While A' ≠ ∅ do
{ If (I'' = ][)
  { let I* be the interval in A' with earliest finish time;
    let I' ← Maxf-Interval(I*, A');
    Cover ← Cover ∪ {I'};
    I'' ← Maxf-Interval(I', A');
    remove all intervals from A' that are overlapped by I';
  }
  Else If (there is an interval I ∈ A' with Maxf(I, A') < f(I''))
  { I' ← I'';
    Cover ← Cover ∪ {I'};

  }
  Else I'' ← ][;
}
return Cover;


Algorithm

(further refinement of the same algorithm)

I'' ← ][; Cover ← ∅; A' ← A;
While A' ≠ ∅ do
{ let I* be the interval in A' with earliest finish time;
  If (Maxf(I*, A') < f(I'')) I' ← I''; Else I' ← Maxf-Interval(I*, A');
  Cover ← Cover ∪ {I'};

}
return Cover;

It is easy to observe that each iteration of the while loop can be implemented in $O(n)$ time.
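The slide pseudocode above lost some of its final statements in extraction, so the following is not a transcription of it. It is a minimal Python sketch of a closely related greedy built on the same idea, assuming closed intervals as (start, finish) pairs:
repeatedly take the not-yet-overlapped interval with the earliest finish time (the role of I*), and add the interval with maximum finish time that overlaps it, searching over all intervals, including those already overlapped. Keeping already-overlapped intervals available as candidates is exactly what the naive approach failed to do. The names are hypothetical. Each of the at most n iterations scans all intervals, so the sketch runs in $O(n^2)$ time.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed representation: closed interval (start, finish)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def min_overlapping_subset(intervals: List[Interval]) -> List[int]:
    """Indices of a smallest subset B such that every interval overlaps some member of B."""
    n = len(intervals)
    covered = [False] * n            # covered[i]: interval i overlaps a chosen interval
    chosen: List[int] = []
    # visit intervals by increasing finish time; the first uncovered one plays the role of I*
    for u in sorted(range(n), key=lambda i: intervals[i][1]):
        if covered[u]:
            continue
        # search ALL intervals, covered ones included; u overlaps itself, so this is non-empty
        best = max((j for j in range(n) if overlaps(intervals[j], intervals[u])),
                   key=lambda j: intervals[j][1])
        chosen.append(best)
        for j in range(n):
            if overlaps(intervals[j], intervals[best]):
                covered[j] = True
    return chosen


print(min_overlapping_subset([(0, 2), (1, 4), (3, 10), (5, 5), (7, 7), (9, 9)]))
```

On the counterexample instance used for the naive approach, it returns [1, 2], that is, the intervals (1, 4) and (3, 10).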


Proof of correctness for the algorithm

Though we derived a proof of correctness while arriving at the algorithm, a direct proof can be given as well. This may be helpful if you are not interested in how we arrived at the algorithm and just wish to see why it is correct.

Let Overlapped = A \ A'. In plain words, Overlapped is the set of intervals from A which are overlapped by some interval from Cover.

At the beginning of each iteration, the following assertions hold:

1. There is an optimal solution for A containing Cover.

2. Every interval from Overlapped is overlapped by an interval from Cover, and I'' is an interval with maximum finish time from the set Overlapped.

3. Every interval from A' has start time greater than the finish time of any interval from Cover.

These assertions can be proved by induction on the number of iterations. The arguments needed are a small subset of those used for proving Theorem 1.


Concluding slide for exercise 5

Theorem:

There is an $O(n^2)$ time algorithm for computing a smallest subset of intervals overlapping a given set of intervals.


Problem 1 of practice sheet 6

Given an array A storing $n$ elements and a number $k$, compute the $k$ nearest elements to the median. The time complexity should be $O(n)$.

Hint: Use the following tools.

Divide and conquer strategy like the one used in problem 2 of the same practice sheet.

Linear time median finding algorithm.

You need to reduce the problem to half its size in each step.


Firstly, we may prune our search domain from $n$ to $2k$ elements as follows.

Find the median; let it be $m$.

Find the element with rank $n/2 - k$; let it be $a$. Remove all elements smaller than $a$ (justify it).

Find the element with rank $n/2 + k$; let it be $b$. Remove all elements greater than $b$ (justify it).

Time spent till now is $O(n)$. The $k$ nearest elements of the median are surely among these remaining $2k$ elements.

Now find the element with rank $k/2$; let it be $x$. Find the element with rank $3k/2$; let it be $y$. If $x$ is closer to $m$ than $y$, then we can conclude the following:

1. All elements greater than $x$ and less than $m$ must be among the set of $k$ nearest elements from $m$. These $k/2$ elements are eliminated from the input and added to our solution.

2. None of the elements which are greater than $y$ can be among the set of $k$ nearest elements from $m$. These $k/2$ elements are also removed from the input.

The case where $y$ is closer to $m$ is symmetric. In this way, we have found $k/2$ nearest elements from $m$. Moreover, the input has been reduced from $2k$ to $k$ elements. Keep repeating it. Since the work in successive rounds forms a geometric series $O(2k) + O(k) + O(k/2) + \dots = O(k)$, we get the $k$ nearest elements from $m$ in $O(n)$ time.
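The method above relies on a deterministic linear-time selection routine. As a simpler point of comparison, here is a hedged Python sketch of a different technique:
use randomized quickselect (expected $O(n)$ time, not worst case) to find the median $m$, then quickselect again on the distances $|x - m|$ to find the $k$-th smallest distance, and collect the elements within it. Assumptions not fixed by the problem statement: the median is the lower median, one copy of it is excluded from the answer, and ties in distance are broken arbitrarily. The function names are hypothetical.

```python
import random
from typing import List


def quickselect(a: List[int], r: int) -> int:
    """Return the element of 0-based rank r in a (randomized, expected linear time)."""
    a = list(a)
    while True:
        pivot = random.choice(a)
        smaller = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        if r < len(smaller):
            a = smaller
        elif r < len(smaller) + len(equal):
            return pivot
        else:
            r -= len(smaller) + len(equal)
            a = [x for x in a if x > pivot]


def k_nearest_to_median(a: List[int], k: int) -> List[int]:
    """Return k elements of a (other than one copy of the median) closest to the median."""
    m = quickselect(a, (len(a) - 1) // 2)              # lower median
    rest = list(a)
    rest.remove(m)                                     # exclude the median itself (one copy)
    if k <= 0:
        return []
    if k >= len(rest):
        return rest
    t = quickselect([abs(x - m) for x in rest], k - 1)  # k-th smallest distance
    closer = [x for x in rest if abs(x - m) < t]        # strictly fewer than k of these
    tied = [x for x in rest if abs(x - m) == t]         # enough of these to reach k
    return closer + tied[: k - len(closer)]


print(k_nearest_to_median([1, 2, 3, 4, 5, 6, 7, 8], 3))
```

For this input the lower median is 4 and the call returns [3, 5, 2] (distances 1, 1 and 2). Replacing quickselect by a worst-case linear selection algorithm, as the hint suggests, would make the bound deterministic.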


Finding DFS tree from start and finish time

There was a problem in practice sheet 5 where, given the start time and finish time of a DFS traversal for all vertices, the aim is to compute the DFN number and the DFS tree.

A few students were facing the problem of determining the children of a node in the DFS tree. An easy way to achieve this goal is an indirect one: in order to compute the children of a vertex in the DFS tree, it suffices to compute the parent of each vertex. We can do the latter task as follows.

Among all vertices neighboring a vertex u, find all those vertices whose start time is smaller than that of u. All these vertices are ancestors of u. Which among them is the parent of u? Surely, the vertex with the maximum start time.

So we can compute the parent of vertex u in O(deg(u)) time. Time spent over all vertices will be O(m+n). Hence we can compute the children of each vertex in the DFS tree, and hence the entire DFS tree structure, in O(m+n) time.
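A minimal Python sketch of this parent computation follows. It assumes an undirected graph given as adjacency lists (a dict from vertex to neighbour list) and the DFS start times as a dict; the function names are hypothetical. Children lists are then obtained by inverting the parent map.

```python
from typing import Dict, List, Optional


def dfs_parents(adj: Dict[int, List[int]], start: Dict[int, int]) -> Dict[int, Optional[int]]:
    """Recover DFS-tree parents from start times (undirected graph assumed)."""
    parent: Dict[int, Optional[int]] = {}
    for u, nbrs in adj.items():
        best: Optional[int] = None
        for v in nbrs:                                   # O(deg(u)) work for vertex u
            # earlier-started neighbours are ancestors; the latest-started one is the parent
            if start[v] < start[u] and (best is None or start[v] > start[best]):
                best = v
        parent[u] = best                                 # None marks the root of a DFS tree
    return parent


def children_from_parents(parent: Dict[int, Optional[int]]) -> Dict[int, List[int]]:
    children: Dict[int, List[int]] = {u: [] for u in parent}
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)
    return children


g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
start_times = {0: 1, 1: 2, 2: 3, 3: 5}                  # DFS from 0; the clock also ticks at finish
print(dfs_parents(g, start_times))
```

Here the call returns {0: None, 1: 0, 2: 1, 3: 1}. Vertex 2 has two earlier-started neighbours, 0 and 1, and the later-started one, 1, is its parent.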
