lecture-36-cs210-2012.pptx

Upload: moazzam-hussain

Post on 03-Jun-2018


  • 8/11/2019 Lecture-36-CS210-2012.pptx

    1/42

    Data Structures and Algorithms

    (CS210/ESO207/ESO211)

    Lecture 36

    Sorting

    beyond O(n log n) bound


    Overview of today's lecture

    The sorting algorithms you have studied till now

    Integer sorting

    Solving two problems from Practice sheet 6 and one problem from Practice sheet 5.


    Sorting algorithms studied till now


    Algorithms for sorting n elements

    Insertion sort: O(n^2)

    Selection sort: O(n^2)

    Bubble sort: O(n^2)

    Merge sort: O(n log n)

    Quick sort: worst case O(n^2), average case O(n log n)

    Heap sort: O(n log n)

    Question: What is common among these algorithms?

    Answer: All of them are allowed to use only the comparison operation to perform sorting.


    Question: Can we sort in O(n) time?

    The answer depends upon

    the model of computation.

    the domain of input.

    Theorem (to be proved in CS345): Every comparison-based sorting algorithm must perform Ω(n log n) comparisons in the worst case.
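    For intuition, here is the standard decision-tree argument, only as a sketch (the theorem itself is deferred to CS345). An algorithm that sorts n distinct elements using only comparisons must be able to produce each of the n! possible orderings, so its binary decision tree has at least n! leaves, and its height h, which is the worst-case number of comparisons, satisfies:

```latex
2^{h} \ge n! \quad\Longrightarrow\quad h \;\ge\; \log_2 n! \;\ge\; \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right) \;=\; \tfrac{n}{2}\log_2\tfrac{n}{2} \;=\; \Omega(n \log n)
```

    The second inequality holds for n ≥ 2 because the ⌊n/2⌋ + 1 largest factors of n! are each at least n/2.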


    Word RAM model of computation:

    Characteristics

    A word is the basic storage unit of RAM. A word is a collection of a few bytes.

    Each input item (number, name) is stored in binary format.

    RAM can be viewed as a huge array of words. Any arbitrary location of RAM can be accessed in the same time, irrespective of the location.

    Data as well as program reside fully in RAM.

    Each arithmetic or logical operation (+, -, *, /, or, xor, …) involving a constant number of words takes a constant number of steps by the CPU.

    Each arithmetic or logical operation (+, -, *, /, or, xor, …) involving O(log n) bits takes a constant number of steps by the CPU, where n is the number of bits of the input instance.


    Integer sorting


    Counting sort: algorithm for sorting integers

    Input: An array A storing n integers in the range [0..k−1].

    Output: Sorted array A.

    Running time: O(n + k) in the word RAM model of computation.

    Extra space: O(n + k)

    Counting sort: algorithm for sorting integers

    [Figure (three animation frames): counting sort on A = [2, 5, 3, 0, 2, 3, 0, 3] (indices 0..7) with values in [0..5]. Count = [2, 0, 2, 3, 0, 1] and Place = [2, 2, 4, 7, 7, 8]. A is scanned from right to left; each element is copied into the output array B (indices 0..7) and its Place entry is decremented, so Place[3] becomes 6 and then Place[0] becomes 1.]
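    The arrays in the figure can be reproduced with the short Python sketch below. The helper names `count_and_place` and `trace_placements` are hypothetical, introduced only for this illustration; the three traced steps mirror the three animation frames and use the same rule as the pseudocode of the algorithm, B[Place[A[i]] − 1] ← A[i] followed by decrementing Place[A[i]].

```python
def count_and_place(A, k):
    """Count and Place arrays of counting sort for integers in [0..k-1]."""
    count = [0] * k
    for x in A:
        count[x] += 1
    place = count[:]                        # Place[0] = Count[0]
    for j in range(1, k):
        place[j] = place[j - 1] + count[j]  # Place[j] = number of elements <= j
    return count, place


def trace_placements(A, place, steps):
    """Scan A from the right for `steps` elements, writing each into B as the slides do."""
    place = place[:]                        # work on a copy; the caller's Place is unchanged
    B = [None] * len(A)
    for i in range(len(A) - 1, len(A) - 1 - steps, -1):
        B[place[A[i]] - 1] = A[i]           # B[Place[A[i]] - 1] <- A[i]
        place[A[i]] -= 1                    # Place[A[i]] <- Place[A[i]] - 1
    return B, place


A = [2, 5, 3, 0, 2, 3, 0, 3]
count, place = count_and_place(A, 6)
print(count)         # [2, 0, 2, 3, 0, 1]
print(place)         # [2, 2, 4, 7, 7, 8]
B, place_after = trace_placements(A, place, 3)
print(B)             # [None, 0, None, None, None, 3, 3, None]
print(place_after)   # [1, 2, 4, 5, 7, 8]
```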


    Counting sort: algorithm for sorting integers

    Algorithm (A[0..n−1], k)
    For j = 0 to k−1 do Count[j] ← 0;
    For i = 0 to n−1 do Count[A[i]] ← Count[A[i]] + 1;
    For j = 0 to k−1 do Place[j] ← Count[j];
    For j = 1 to k−1 do Place[j] ← Place[j−1] + Count[j];
    For i = n−1 down to 0 do
    {   B[ Place[A[i]]−1 ] ← A[i];
        Place[A[i]] ← Place[A[i]]−1;
    }
    return B;
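    A minimal Python sketch of the pseudocode above, assuming 0-based Python lists and k ≥ 1. The optional `key` parameter is an addition that is not on the slide: it shows why the final loop runs from right to left, namely that this makes the sort stable, which matters when the integers are keys of larger records.

```python
from itertools import accumulate


def counting_sort(A, k, key=lambda x: x):
    """Stable counting sort of A by an integer key in [0..k-1]: O(n + k) time and extra space."""
    count = [0] * k
    for item in A:
        count[key(item)] += 1
    # Place[j] = number of items whose key is <= j (prefix sums of Count)
    place = list(accumulate(count))
    B = [None] * len(A)
    # Scanning from right to left keeps items with equal keys in their original order
    for item in reversed(A):
        j = key(item)
        B[place[j] - 1] = item
        place[j] -= 1
    return B


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 6))  # [0, 0, 2, 2, 3, 3, 3, 5]
records = [(1, "a"), (0, "b"), (1, "c"), (0, "d")]
print(counting_sort(records, 2, key=lambda r: r[0]))
# [(0, 'b'), (0, 'd'), (1, 'a'), (1, 'c')]
```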


    Counting sort: algorithm for sorting integers

    Note: The algorithm performs arithmetic operations involving O(log n + log k) bits. In the word RAM model, such an operation takes O(1) time.

    Theorem: An array storing n integers in the range [0..k−1] can be sorted in O(n + k) time and using a total of O(n + k) space in the word RAM model.

    For k = O(n), we get an optimal algorithm for sorting. But what if k is large?

    In the next class:

    We shall discuss an algorithm for sorting integers in the range [..] in O() time and using O() space in the word RAM model.


    Practice sheet 6

    We shall solve exercises 5 and 1 from this sheet.


    Important note

    Though a solution to this problem is provided here, one should NOT feel that such a problem will be asked in the end-semester exam of this course. It was a mistake of the instructor to put it in the practice sheet.


    Problem 5 of practice sheet 6.

    Description (in terms of intervals):

    Given a set A of n intervals, compute a smallest set B ⊆ A of intervals such that for every interval I in A\B, there is some interval in B which overlaps/intersects with I.

    [Figure: an instance A.] The set of green intervals is a solution, but not an optimal solution.
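    To experiment with small instances, a candidate solution can be checked directly against the definition. The sketch below is a brute-force checker, not the lecture's algorithm; it assumes each interval is a closed pair `(start, finish)` in which touching endpoints count as overlapping, and the names `overlaps`, `is_solution` and `brute_force_opt` are hypothetical.

```python
from itertools import combinations


def overlaps(a, b):
    """Closed intervals (start, finish) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_solution(A, B):
    """True iff every interval of A that is not in B overlaps some interval of B.
    Every interval overlaps itself, so it suffices to check all of A."""
    return all(any(overlaps(x, b) for b in B) for x in A)


def brute_force_opt(A):
    """A smallest solution, found by trying subsets of A in order of size (exponential time)."""
    for size in range(len(A) + 1):
        for B in combinations(A, size):
            if is_solution(A, B):
                return list(B)


A = [(0, 2), (1, 5), (4, 6), (7, 8)]
print(is_solution(A, [(1, 5)]))   # False: (7, 8) is not overlapped
print(brute_force_opt(A))         # [(1, 5), (7, 8)]
```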


    Solution of Problem 5 of practice sheet 6.

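    Let I* be the interval with the earliest finish time.

    Let I' be the interval with the maximum finish time overlapping I*.

    Lemma1: There is an optimal solution for set A that contains I'.

    (Proof sketch: an optimal solution must contain some interval X overlapping I*, possibly I* itself, and X finishes no later than I'. Every interval Y overlapped by X starts no later than X finishes, hence no later than I' finishes; and Y finishes no earlier than I* (which has the earliest finish time), hence no earlier than I' starts, since I' overlaps I*. So Y is also overlapped by I', and replacing X by I' keeps the solution valid and of the same size.)

    [Figure: instance A with I* and I' marked.]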
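    The two intervals named in this slide are easy to compute. A minimal Python sketch, assuming closed `(start, finish)` tuples; `lemma1_choice` is a hypothetical name, and ties are broken by whichever interval Python's `min`/`max` meet first.

```python
def overlaps(a, b):
    """Closed intervals (start, finish) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def lemma1_choice(A):
    """Return (I*, I'): the interval with earliest finish time, and the interval with
    maximum finish time among those overlapping I* (I* itself qualifies)."""
    i_star = min(A, key=lambda iv: iv[1])
    i_prime = max((iv for iv in A if overlaps(iv, i_star)), key=lambda iv: iv[1])
    return i_star, i_prime


print(lemma1_choice([(0, 2), (1, 5), (4, 6), (7, 8)]))   # ((0, 2), (1, 5))
```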


    Solution of Problem 5 of practice sheet 6.

    Question: How do we obtain a smaller instance A' using this greedy approach?

    Naive approach (again inspired by the job scheduling problem): remove from A all intervals which overlap with I'. This is A'.

    This approach does not work! Here is a counterexample.

    The problem is that some deleted interval (in this case I'') could have been used for intersecting many intervals if it were not deleted. But deleting it from the instance prevents it from being selected in the solution.

    [Figure: a counterexample instance A with intervals I*, I' and I''.]


    Overview of the approach

    In order to make sure we do not delete intervals (like I'' in the previous slide) that are essential for covering many other intervals, we make some observations and introduce a notion called a uniquely covered interval. It turns out that we need to keep I'' in the smaller instance if there is an interval there which is uniquely covered by I''. Otherwise, we may discard I''.


    An Observation

    We can delete all intervals whose finish time is before the finish time of I', because any interval overlapped by such an interval will anyway be overlapped by I'. Let us consider the intervals which overlap with I' but have finish time greater than that of I'. In the example shown below, these are the three intervals which cross the red line (the vertical line at the finish time of I').

    Observation1: Among the intervals crossing the red line, we need to keep only the interval which has the maximum finish time (I'' in this picture).

    Proof: Notice that each of these intervals is anyway intersected by I'. As far as using them to intersect other intervals is concerned, we may as well choose I'' for this purpose, since it reaches at least as far to the right as any of them.

    So from now onwards, we shall assume that there is exactly one interval I'' in A which overlaps I' (intersects the red line) and has finish time larger than that of I'.

    [Figure: instance A with I*, I', I'' and the red line at the finish time of I'.]


    Uniquely covered interval

    I2 is said to be uniquely covered by I1 if

    I2 is fully covered by I1, and

    every interval overlapping I2 is also fully covered by I1.

    Lemma2: There is an optimal solution containing I1.

    Proof: Surely I2 or some other interval overlapping it must be present in the optimal solution. If we replace that interval by I1, we still get a solution of the same size, and hence an optimal solution.

    [Figure: interval I2 lying inside interval I1.]
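    The definition translates directly into a predicate. A minimal sketch, assuming closed `(start, finish)` tuples; `fully_covers` and `uniquely_covered` are hypothetical helper names. By Lemma2, whenever the predicate holds for some I2, the interval I1 can safely be put into the solution.

```python
def overlaps(a, b):
    """Closed intervals (start, finish) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def fully_covers(outer, inner):
    """True iff interval `inner` lies entirely inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def uniquely_covered(i2, i1, A):
    """True iff I2 is fully covered by I1 and every interval of A overlapping I2
    is also fully covered by I1."""
    return fully_covers(i1, i2) and all(
        fully_covers(i1, j) for j in A if overlaps(j, i2)
    )


print(uniquely_covered((3, 4), (2, 10), [(2, 10), (3, 4), (5, 6)]))   # True
print(uniquely_covered((3, 4), (2, 10), [(2, 10), (3, 4), (4, 12)]))  # False: (4, 12) sticks out
```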


    We are now ready to describe the construction of A' from A. There will be two cases. We shall then prove that |Opt(A)| = |Opt(A')| + 1 for each of these cases.

    Important note:

    The reader is advised to fully understand Lemma1, Lemma2, Observation1, and the notion of a uniquely covered interval. Also fully internalize the notations I*, I', and I''. This will help the reader understand the rest of the solution.

  • 8/11/2019 Lecture-36-CS210-2012.pptx

    23/42

    Constructing A' from A

    Case1: There is an interval I ∈ D uniquely covered by I''.

    [Figure: instance A with I*, I', I'' and the red line at the finish time of I'; the intervals starting to the right of the red line are grouped into D and E.]

    We need to take care of the intervals whose starting point is to the right of the red line (the finish time of I'). We can partition these intervals into two sets.

    D: those which overlap with I''.

    E: those that start after the end of I'' and hence do not overlap with I''.

    Now we shall describe the two cases for the construction of A'.


    Constructing A' from A

    If there is an interval I ∈ D uniquely covered by I'', then we define A' as follows. Remove all intervals from A which overlap with I' (this was our usual way of defining A' in our wrong solution). Now add I'' to this set. This set is the smaller instance A' for Case 1.

    We shall now define A' for Case 2.


    Constructing A' from A

    Case2: There is no interval in D uniquely covered by I''.

    [Figure: instance A with I*, I', D and E (top), and the resulting smaller instance A' (bottom).]


    Constructing A' from A

    If there is no interval in D uniquely covered by I'', then we define A' as follows. Remove all intervals from A which overlap with I' (this was our usual way of defining A' in our wrong solution). This set is the smaller instance A' for Case 2.
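    The two-case construction can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the lecture: intervals are closed `(start, finish)` tuples; `reduce_instance` and its helpers are hypothetical names; I* is the interval with the earliest finish time, I' the maximum-finish interval overlapping I*, and I'' the maximum-finish interval overlapping I'; uniqueness is checked in the view left after Observation1 (the intervals to the right of the red line plus I'').

```python
def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def fully_covers(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def maxf_interval(iv, pool):
    """Interval of `pool` with maximum finish time overlapping `iv` (iv itself if none)."""
    cands = [j for j in pool if overlaps(iv, j)]
    return max(cands, key=lambda j: j[1]) if cands else iv


def reduce_instance(A):
    """Return (I', A') following the two-case construction of A' from A."""
    i_star = min(A, key=lambda j: j[1])           # I*: earliest finish time
    i1 = maxf_interval(i_star, A)                 # I'  (Lemma1)
    i2 = maxf_interval(i1, A)                     # I'' (Observation1)
    rest = [j for j in A if not overlaps(i1, j)]  # intervals right of the red line: D and E
    D = [j for j in rest if overlaps(i2, j)]
    # After Observation1, the only intervals that can overlap a member of D are
    # those in `rest` and I'' itself, so uniqueness is checked against rest + [I''].
    view = rest + [i2]
    case1 = any(
        fully_covers(i2, j) and all(fully_covers(i2, x) for x in view if overlaps(x, j))
        for j in D
    )
    return i1, (rest + [i2] if case1 else rest)


A1 = [(0, 2), (2, 4), (4, 10), (6, 8), (9, 14), (12, 16)]  # Case 1: (6, 8) uniquely covered by (4, 10)
print(reduce_instance(A1))   # ((2, 4), [(6, 8), (9, 14), (12, 16), (4, 10)])
A2 = [(0, 1), (0, 2), (2, 10), (3, 4), (4, 12), (13, 14)]  # Case 2
print(reduce_instance(A2))   # ((0, 2), [(3, 4), (4, 12), (13, 14)])
```

    For A1, a smallest solution has three intervals (for example (2, 4), (4, 10), (9, 14)), while the returned A' has a smallest solution with two ((4, 10) and (12, 16)), in line with Theorem1.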


    Theorem1: |Opt(A)| = |Opt(A')| + 1

    We shall prove this theorem for Case 1 as well as Case 2.


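    Case1: There is an interval I ∈ D uniquely covered by I''.

    |Opt(A)| ≤ |Opt(A')| + 1

    [Figure: instance A (top) and the smaller instance A' (bottom), each showing D and E.]

    Now, using Lemma2, it follows that there is an optimal solution for A' containing I''. What should we add to this solution to get a solution for A? Since every interval of A that is not in A' overlaps I', we need to add just I' to get a solution for A, and we are done.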


    Case1: There is an interval I ∈ D uniquely covered by I''.

    |Opt(A')| ≤ |Opt(A)| − 1

    [Figure: instance A (top) and the smaller instance A' (bottom), each showing D and E.]

    Using Lemma1 and Lemma2, it follows that there is an optimal solution for A containing I' and I''.

    We just need to remove I' from this optimal solution for A to get a solution for A', and we are done.


    This finishes the proof of Theorem1 for Case 1.

    We shall now analyze Case 2 and prove Theorem1 for this case as well.


    Case2: There is no interval in D uniquely covered by I''.

    |Opt(A)| ≤ |Opt(A')| + 1

    [Figure: instance A (top) and the smaller instance A' (bottom), each showing D and E.]

    Consider any optimal solution for A'. Note that this optimal solution takes care of D and E. So we just need to take care of the intervals from A which intersect the red line. These are taken care of by adding I' to this solution. We are done.


    Case2: There is no interval in D uniquely covered by I''.

    |Opt(A')| ≤ |Opt(A)| − 1

    [Figure: instance A (top) and the smaller instance A' (bottom), each showing D and E.]

    Using Lemma1, it follows that there is an optimal solution for A containing I'.

    If I'' is not in this optimal solution, we can see that removing I' from this optimal solution gives a valid solution for A'.

    So let us consider the case when I'' is present in the optimal solution of A. The problem is that I'' is not present in A', so we need a substitute for I'' from A'.

    Notice that I'' can serve the purpose of overlapping intervals from D only. So we should search for a substitute for I'' from D only.

    We replace I'' by the interval from D which intersects the violet line (the vertical line at the finish time of I'') and has the earliest start time. See the following slide for its justification.


    Let J be the interval in D which intersects the violet vertical line (that is, has finish time greater than that of I'') and has the earliest start time. It suffices to show that every interval of D overlaps with J. We proceed as follows. Consider any interval K in D. There are two cases.

    The finish time of K is less than that of I''. In other words, K does not intersect the violet line. In this case, there must be some other interval in D that overlaps K and intersects the violet line (otherwise, K would be uniquely covered by I''). J starts no later than this interval, which in turn starts no later than K finishes, and J ends beyond the violet line; so K is overlapped by J as well.

    The finish time of K is more than that of I''. In other words, K does intersect the violet line. Hence K overlaps with J as well, since the latter also intersects the violet line.

    Hence if we remove I' and I'' from the given optimal solution of A and add J to it, we get a solution for A'. Since an optimal solution for A' can be no larger than this solution, we get |Opt(A')| ≤ |Opt(A)| − 1 for Case 2.

    Hence we have proved Theorem1: |Opt(A)| = |Opt(A')| + 1

    Now we need to design the algorithm for our problem based on the greedy strategy that we used for constructing A' from A.


    Simplification and efficient implementation of the algorithm

    Though the algorithm looks quite complex, it is quite simple to implement, as will soon become clear. We first introduce some notation to facilitate a clean presentation of the algorithm.

    Notations:

    f(I): finish time of interval I;

    Maxf(I, A): maximum finish time of an interval from A that overlaps with I. (If no interval overlaps with I, then Maxf(I, A) = f(I).)

    Maxf-Interval(I, A): the interval from A with maximum finish time that overlaps with I. (If no interval overlaps with I, then Maxf-Interval(I, A) = I.)

    Cover: set of intervals selected till now. (At the end of the algorithm, Cover will be an optimal solution.)

    ][: Empty interval.
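    The two notations can be written as small Python helpers. This is a sketch assuming closed `(start, finish)` tuples, with hypothetical names `maxf` and `maxf_interval`; note that an interval belonging to the pool counts as overlapping itself.

```python
def overlaps(a, b):
    """Closed intervals (start, finish) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def maxf_interval(iv, pool):
    """Maxf-Interval(I, A): interval of `pool` with maximum finish time overlapping I; I itself if none."""
    cands = [j for j in pool if overlaps(iv, j)]
    return max(cands, key=lambda j: j[1]) if cands else iv


def maxf(iv, pool):
    """Maxf(I, A): finish time of Maxf-Interval(I, A), hence f(I) when nothing overlaps I."""
    return maxf_interval(iv, pool)[1]


pool = [(0, 2), (1, 5), (7, 8)]
print(maxf((0, 2), pool), maxf_interval((0, 2), pool))     # 5 (1, 5)
print(maxf((10, 12), pool), maxf_interval((10, 12), pool)) # 12 (10, 12)
```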


    Algorithm

    I ← ][; Cover ← ∅; A' ← A;
    While A' ≠ ∅ do
    {   If (I = ][)
        {   let I* be the interval in A' with earliest finish time;
            let I' ← Maxf-Interval(I*, A');
            Cover ← Cover ∪ {I'};
            I ← Maxf-Interval(I', A');
            remove all intervals from A' that are overlapped by I';
        }
        Else If (there is an interval J ∈ A' with Maxf(J, A') < f(I))
        {   I' ← I;
            Cover ← Cover ∪ {I'};
            I ← Maxf-Interval(I', A');
            remove all intervals from A' that are overlapped by I';
        }
        Else I ← ][;
    }
    return Cover;


    Algorithm

    (further refinement of the same algorithm)

    I ← ][; Cover ← ∅; A' ← A;
    While A' ≠ ∅ do
    {   let I* be the interval in A' with earliest finish time;
        If (Maxf(I*, A') < f(I)) I' ← I; Else I' ← Maxf-Interval(I*, A');
        Cover ← Cover ∪ {I'};
        I ← Maxf-Interval(I', A');
        remove all intervals from A' that are overlapped by I';
    }
    return Cover;

    Here f(][) is taken to be −∞, so the test fails whenever I = ][. It is easy to observe that each iteration of the while loop can be implemented in O(n) time.
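    A runnable sketch of the refined algorithm, under the same assumptions as before: closed `(start, finish)` tuples, `None` standing for the empty interval ][ with f(][) = −∞, and hypothetical function names throughout. An exhaustive search is included only so that the greedy result can be compared with the optimum on small random instances.

```python
from itertools import combinations


def overlaps(a, b):
    """Closed intervals (start, finish) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def maxf_interval(iv, pool):
    """Maxf-Interval(I, A'): interval of pool with max finish time overlapping I; I if none."""
    cands = [j for j in pool if overlaps(iv, j)]
    return max(cands, key=lambda j: j[1]) if cands else iv


def min_overlapping_cover(A):
    """Refined greedy: O(n) work per iteration, at most n iterations."""
    pending = None                    # the variable I; None plays the role of ][
    cover = []
    rest = list(A)                    # A'
    while rest:
        i_star = min(rest, key=lambda j: j[1])
        pending_f = pending[1] if pending is not None else float("-inf")  # f(][) = -infinity
        if maxf_interval(i_star, rest)[1] < pending_f:
            chosen = pending          # I* is uniquely covered by I: select I
        else:
            chosen = maxf_interval(i_star, rest)   # Lemma1 choice
        cover.append(chosen)
        pending = maxf_interval(chosen, rest)
        rest = [j for j in rest if not overlaps(chosen, j)]
    return cover


def is_solution(A, B):
    return all(any(overlaps(x, b) for b in B) for x in A)


def brute_force_size(A):
    """Size of a smallest solution by exhaustive search (small inputs only)."""
    for size in range(len(A) + 1):
        if any(is_solution(A, B) for B in combinations(A, size)):
            return size


A = [(0, 2), (2, 4), (4, 10), (6, 8), (9, 14), (12, 16)]
print(min_overlapping_cover(A))   # [(2, 4), (4, 10), (12, 16)]
```

    Every iteration removes at least I* from A' (the chosen interval always overlaps it), so there are at most n iterations, each scanning the current A' a constant number of times.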


    Proof of correctness for the algorithm

    Though we derived a proof of correctness while arriving at the algorithm, a direct proof can be given as well. This may be helpful if you are not interested in the way we arrived at the algorithm and just wish to see why it is correct.

    Let Overlapped = A \ A'. In plain words, Overlapped is the set of intervals from A which are overlapped by some interval from Cover.

    At the beginning of each iteration, the following assertions hold:

    1. There is an optimal solution for A containing Cover.

    2. Every interval from Overlapped is overlapped by an interval from Cover, and I is an interval with maximum finish time from the set Overlapped.

    3. Every interval from A' has start time greater than the finish time of any interval from Cover.

    The above assertions can be proved by induction on the number of iterations. The arguments needed are a small subset of the arguments used for proving Theorem 1.


    Concluding slide for exercise 5

    Theorem:

    There is an O(n^2) time algorithm for computing a smallest subset of intervals overlapping a given set of intervals.


    Problem 1 of practice sheet 6

    Given an array A storing n elements, and a number k, compute the k nearest elements to the median. Time complexity should be O(n).

    Hint: Use the following tools.

    Divide and conquer strategy like the one used in problem 2 of the same practice sheet.

    Linear time median finding algorithm.

    You need to reduce the problem to half its size in each step.


    Firstly, we may prune our search domain from n to 2k elements as follows.

    Find the median; let it be m.

    Find the element with rank n/2 − k; let it be a. Remove all elements smaller than a (justify it).

    Find the element with rank n/2 + k; let it be b. Remove all elements greater than b (justify it).

    Time spent till now is O(n). The k nearest elements of the median are surely among these remaining 2k elements.

    Now, among the remaining elements, find the element with rank k/2; let it be x. Find the element with rank 3k/2; let it be y. If x is closer to m than y, then we can conclude the following (the other case is symmetric):

    1. All elements greater than x and less than m must be among the set of k nearest elements from m. These k/2 elements are eliminated from the input and added to our solution.

    2. None of the elements which are greater than y can be among the set of k nearest elements from m. These k/2 elements are also removed from the input.

    In this way, we have found k/2 nearest elements from m. Moreover, the input has reduced from 2k to k elements. Keep repeating this step on the remaining input. Since the input size halves in each step, we get the k nearest elements from m in O(n) time.
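    A sketch of the halving scheme in Python. Assumptions not in the slide: the elements are distinct, m is the lower median, m itself is not counted among its k nearest elements, and randomized quickselect stands in for the deterministic linear-time selection algorithm (so the bound is expected O(n) rather than worst case). The names `select` and `k_nearest_to_median` are hypothetical. Each round keeps only the j closest candidates on each side of m, then compares the ⌈j/2⌉-th nearest element below m with the ⌈j/2⌉-th nearest above m, in the same way x and y are compared above.

```python
import random


def select(a, i):
    """i-th smallest element (0-indexed) of list a by randomized quickselect, expected O(len(a))."""
    while True:
        pivot = random.choice(a)
        lo = [x for x in a if x < pivot]
        hi = [x for x in a if x > pivot]
        n_eq = len(a) - len(lo) - len(hi)
        if i < len(lo):
            a = lo
        elif i < len(lo) + n_eq:
            return pivot
        else:
            i -= len(lo) + n_eq
            a = hi


def k_nearest_to_median(a, k):
    """k elements of a (distinct values) nearest to the lower median m, m excluded."""
    n = len(a)
    assert 0 <= k <= n - 1
    m = select(a, (n - 1) // 2)
    left = [x for x in a if x < m]      # candidates below m
    right = [x for x in a if x > m]     # candidates above m
    answer, j = [], k                   # j = number of nearest elements still missing
    while j > 0:
        # Pruning: only the j elements closest to m on each side can still qualify
        if len(left) > j:
            t = select(left, len(left) - j)
            left = [x for x in left if x >= t]
        if len(right) > j:
            t = select(right, j - 1)
            right = [x for x in right if x <= t]
        h = (j + 1) // 2
        # distance to m of the h-th nearest element on each side (infinite if the side is too small)
        dl = m - select(left, len(left) - h) if len(left) >= h else float("inf")
        dr = select(right, h - 1) - m if len(right) >= h else float("inf")
        if dl <= dr:
            # the h nearest elements below m are in the answer; those above m beyond dr are not
            answer += [x for x in left if m - x <= dl]
            left = [x for x in left if m - x > dl]
            right = [x for x in right if x - m <= dr]
        else:
            answer += [x for x in right if x - m <= dr]
            right = [x for x in right if x - m > dr]
            left = [x for x in left if m - x <= dl]
        j -= h
    return answer


print(sorted(k_nearest_to_median([9, 1, 7, 3, 5, 11, 2], 3)))   # [2, 3, 7]
```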


    Finding DFS tree from start and finish time

    There was a problem in practice sheet 5 where, given the start time and finish time of the DFS traversal for all vertices, the aim is to compute the DFN number and the DFS tree.

    A few students were facing the problem of determining the children of a node in the DFS tree. An easy way to achieve this goal is an indirect one: in order to compute the children of a vertex in the DFS tree, it suffices to compute the parent of each vertex. We can do the latter task as follows.

    Among all vertices neighboring a vertex u, find all those vertices whose start time is smaller than that of u. All these vertices are ancestors of u. Which of them is the parent of u? Surely, the vertex with the maximum start time.

    So we can compute the parent of vertex u in O(deg(u)) time. The time spent over all vertices will be O(m+n). Hence we can compute the children of each vertex in the DFS tree, and hence the entire DFS tree structure, in O(m+n) time.
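    The parent rule can be checked mechanically. A Python sketch, assuming an undirected graph stored as a dictionary of adjacency lists (for a directed graph, a neighbour with a smaller start time need not be an ancestor); `dfs_times`, `parents_from_start_times` and `children` are hypothetical names. Only the start times are needed to recover the parents, and inverting the parent map gives the children.

```python
def dfs_times(adj):
    """DFS over an undirected graph {vertex: [neighbours]}.
    Returns start times, finish times and the DFS-tree parent of every vertex."""
    clock = 0
    start, finish, parent = {}, {}, {}

    def visit(u):
        nonlocal clock
        start[u] = clock
        clock += 1
        for v in adj[u]:
            if v not in start:
                parent[v] = u
                visit(v)
        finish[u] = clock
        clock += 1

    for u in adj:
        if u not in start:
            parent[u] = None            # root of a new DFS tree
            visit(u)
    return start, finish, parent


def parents_from_start_times(adj, start):
    """Parent of u = the neighbour of u with the largest start time below start[u]."""
    parent = {}
    for u in adj:
        earlier = [v for v in adj[u] if start[v] < start[u]]   # all are ancestors of u
        parent[u] = max(earlier, key=lambda v: start[v]) if earlier else None
    return parent


def children(parent):
    """Invert the parent map to get the children of every vertex in the DFS forest."""
    kids = {u: [] for u in parent}
    for u, p in parent.items():
        if p is not None:
            kids[p].append(u)
    return kids


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
start, finish, parent = dfs_times(adj)
print(parents_from_start_times(adj, start))   # {0: None, 1: 0, 2: 1, 3: 2}
```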