data structure


Upload: avarielle-smith

Post on 03-Jan-2016



TRANSCRIPT

Page 1: Data Structure

Data Structure

Spring Semester 2012

School of Computer Science & Engineering

Chung-Ang University

Chung-Ang University Spring 2012

Page 2: Data Structure

Administrative Matters

http://ec.cse.cau.ac.kr (web site)

Teaching Assistant 최승진 (02-824-1187, 010-9071-7598)

Sang Yong Han (Professor)
Office hour: Tuesday 14:00 – 14:50
Email: [email protected]

Page 3: Data Structure

Grading

10% - Class attitude and attendance

25% - Assignments

65% - Mid and Final Exams


Page 4: Data Structure

Administrative Matters

1. Two late arrivals are treated as one absence.
2. Arriving after the break counts as an absence.
3. Assignments submitted after the due date will not be accepted.

Page 5: Data Structure

Prerequisites

C


Page 6: Data Structure

Tentative Class Schedule

1. Week 1 – Course introduction
2. Week 2 – Performance Analysis
3. Week 3 – Array and Stack
4. Week 4 – Stack
5. Week 5 – Queue
6. Week 6 – Linked List
7. Week 7 – Review
8. Midterm

Page 7: Data Structure

Tentative Class Schedule (Cont.)

1. Week 9 – Tree (binary tree, etc.)
2. Week 10 – Tree (Heap, etc.)
3. Week 11 – Graph (ADT, etc.)
4. Week 12 – Graph (BFS, DFS, etc.)
5. Week 13 – Graph
6. Week 14 – Graph (maybe on Saturday)
7. Week 15 – Review
8. Week 16 – Final Exam

Page 8: Data Structure

What The Course Is About

• We shall study ways to represent data and algorithms to manipulate these representations.

• The study of data structures is fundamental to Computer Science & Engineering.


Page 9: Data Structure

What The Course Is About

• The study of data structures is concerned with the representation and manipulation of data.
• All programs manipulate data.
• So, all programs represent data in some way.
• Data manipulation requires an algorithm.

Page 10: Data Structure

Data Structures and Algorithms

• Algorithm: outlines the essence of a computational procedure
• Program: an implementation of an algorithm in some programming language
• Data Structure: organization of data needed to solve the problem

[Diagram: Algorithm, Data Structure, Program]

Page 11: Data Structure

Algorithmic Problem

Specification of input: an infinite number of input instances satisfy the specification. For example: a sorted, non-decreasing sequence of natural numbers of non-zero, finite length, such as

1, 20, 908, 909, 10000, 20000

Specification of output as a function of input

Page 12: Data Structure

Algorithmic Solution

An algorithm describes the actions to perform on an input instance.

There are infinitely many correct algorithms for the same algorithmic problem.

Page 13: Data Structure

What is a Good Algorithm ?

Efficient:
• Running time
• Space used

Data Structure: organization of data needed to solve the problem

Page 14: Data Structure

Performance Analysis


Page 15: Data Structure

Problem Solving: Main Steps

1. Problem definition
2. Algorithm design / Algorithm specification
3. Algorithm analysis
4. Implementation
5. Testing
6. [Maintenance]

Page 16: Data Structure

1. Problem Definition

What is the task to be accomplished?
• Calculate the average of the grades for a given student
• Understand the talks given by politicians and translate them into Chinese

What are the time / space / performance requirements?

Page 17: Data Structure

2. Algorithm Design / Specifications

Algorithm: a finite set of instructions that, if followed, accomplishes a particular task.
Describe: in natural language / pseudo-code / diagrams / etc.

Criteria to follow:
• Input: zero or more quantities (externally produced)
• Output: one or more quantities
• Definiteness: clarity and precision of each instruction
• Finiteness: the algorithm has to stop after a finite (possibly very large) number of steps
• Effectiveness: each instruction has to be basic enough and feasible

Page 18: Data Structure

4, 5, 6: Implementation, Testing, Maintenance

Implementation
• Decide on the programming language to use: C, C++, Lisp, Java, Perl, Prolog, assembly, etc.
• Write clean, well-documented code

Testing
• Test, test, test

Maintenance
• Integrate feedback from users, fix bugs, ensure compatibility across different versions

Page 19: Data Structure

3. Algorithm Analysis

Space complexity
• How much space is required

Time complexity
• How much time does it take to run the algorithm

Often, we deal with estimates!

Page 20: Data Structure

Space Complexity

Space complexity = the amount of memory required by an algorithm to run to completion

Some algorithms may be more efficient if the data is completely loaded into memory

Need to look also at system limitations
• E.g. classify 2 GB of text into various categories [politics, tourism, sport, natural disasters, etc.]: can I afford to load the entire collection?

Page 21: Data Structure

Space Complexity (cont’d)

1. Fixed part: the space required to store certain data/variables, independent of the size of the problem (for example, simple variables and constants).

2. Variable part: the space needed by variables whose size depends on the size of the problem (for example, dynamically allocated arrays and the recursion stack).

Page 22: Data Structure

Space Complexity (cont’d)

S(P) = c + S_P(instance characteristics), where c is a constant

Example:

    float sum(float *a, int n)
    {
        float s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];    /* accumulate the running total */
        }
        return s;
    }

Space? One word for n, one for a (passed by reference!), one for i, and one for s: constant space!
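For contrast with the constant-space loop on this slide, here is a minimal sketch of a recursive version. The name rsum is hypothetical and does not come from the slides. Each call adds a stack frame holding its own copies of a and n, and the recursion is about n calls deep, so the variable part of the space grows linearly with n.

```c
#include <assert.h>
#include <stddef.h>

/* Recursive sum of a[0..n-1] (hypothetical helper, for illustration only).
   The recursion depth is n + 1, so stack usage grows linearly with n. */
float rsum(const float *a, int n)
{
    assert(a != NULL && n >= 0);
    if (n == 0)
        return 0.0f;                    /* an empty prefix sums to zero */
    return rsum(a, n - 1) + a[n - 1];   /* sum of the first n-1, plus the last */
}
```

Calling rsum(v, 3) on an array v holding 1.0, 2.0 and 3.5 returns the same value as the loop version; only the memory behaviour differs.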

Page 23: Data Structure

Time Complexity

Often more important than space complexity:
• the space available (for computer programs!) tends to be larger and larger
• time is still a problem for all of us

Even with 3-4 GHz processors on the market, researchers estimate that computing the various transformations for one single DNA chain for one single protein on a 1 THz computer would take about 1 year to run to completion.

An algorithm's running time is an important issue.

Page 24: Data Structure

Experimental Approach

• Write a program that implements the algorithm.
• Run the program with data sets of varying size.
• Determine the actual running time using a system call to measure time (e.g. system(date)).
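The experimental approach can be sketched in C as follows. This is only an illustration under stated assumptions: it uses the standard clock() function rather than the system(date) call mentioned on the slide, and the names quadratic_work and time_run are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Keeps the loop result live so the work is not discarded outright. */
static volatile long sink;

/* Hypothetical workload: a doubly nested loop doing about n*n additions. */
void quadratic_work(int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            s += j;
    sink = s;
}

/* Return the CPU time, in seconds, used by work(n), measured with clock(). */
double time_run(void (*work)(int), int n)
{
    assert(work != NULL);
    clock_t start = clock();
    work(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_run(quadratic_work, n) for n = 1000, 2000, 4000 and comparing the results gives one set of measurements; the numbers depend on the machine, compiler, and optimization level.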

Problems?


Page 25: Data Structure

Experimental Approach

• It is necessary to implement and test the algorithm in order to determine its running time.
• Experiments can be done only on a limited set of inputs, and may not be indicative of the running time for other inputs.
• The same hardware and software should be used in order to compare two algorithms, a condition that is very hard to achieve!

Page 26: Data Structure

Use a Theoretical Approach

• Based on a high-level description of the algorithms, rather than language-dependent implementations
• Makes possible an evaluation of the algorithms that is independent of the hardware and software environments

Generality

Page 27: Data Structure

Some Uses of a Theoretical Approach

determine practicality of algorithm

predict run time on large instance

compare 2 algorithms that have different asymptotic complexity, e.g., O(n) and O(n^2)

Page 28: Data Structure

Algorithm Description

How to describe algorithms independent of a programming language?

Pseudo-code = a description of an algorithm that is more structured than usual prose but less formal than a programming language (or diagrams).

Example: find the maximum element of an array.

    Algorithm arrayMax(A, n):
        Input: An array A storing n integers.
        Output: The maximum element in A.
        currentMax ← A[0]
        for i ← 1 to n - 1 do
            if currentMax < A[i] then
                currentMax ← A[i]
        return currentMax
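The arrayMax pseudo-code translates almost line for line into C. The sketch below assumes an int array and a caller-supplied length; the function name array_max is chosen here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Return the maximum of a[0..n-1]; requires n >= 1. */
int array_max(const int *a, int n)
{
    assert(a != NULL && n >= 1);
    int current_max = a[0];
    for (int i = 1; i < n; i++)
        if (current_max < a[i])
            current_max = a[i];     /* found a larger element */
    return current_max;
}
```

The loop body runs n - 1 times regardless of the input, which is what makes the operation count of this algorithm easy to bound.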

Page 29: Data Structure

Pseudo Code

Expressions: use standard mathematical symbols
• use ← for assignment (? in C/C++)
• use = for the equality relationship (? in C/C++)

Method declarations:
• Algorithm name(param1, param2)

Programming constructs:
• decision structures: if ... then ... [else ...]
• while-loops: while ... do
• repeat-loops: repeat ... until ...
• for-loops: for ... do
• array indexing: A[i]

Methods:
• calls: object method(args)
• returns: return value

Use comments.
Instructions have to be basic enough and feasible!

Page 30: Data Structure

Low Level Algorithm Analysis

Based on primitive operations (low-level computations independent of the programming language). E.g.:
• Making an addition = 1 operation
• Calling a method or returning from a method = 1 operation
• Indexing into an array = 1 operation
• Comparison = 1 operation
• etc.

Method: inspect the pseudo-code and count the number of primitive operations executed by the algorithm.

Page 31: Data Structure

Example

    Algorithm arrayMax(A, n):
        Input: An array A storing n integers.
        Output: The maximum element in A.
        currentMax ← A[0]
        for i ← 1 to n - 1 do
            if currentMax < A[i] then
                currentMax ← A[i]
        return currentMax

How many operations?

Page 32: Data Structure

Sorting

Rearrange a[0], a[1], …, a[n-1] into ascending order.
When done, a[0] <= a[1] <= … <= a[n-1]
8, 6, 9, 4, 3 => 3, 4, 6, 8, 9

Page 33: Data Structure

Sort Methods

• Insertion Sort
• Bubble Sort
• Selection Sort
• Count Sort
• Shaker Sort
• Shell Sort
• Heap Sort
• Merge Sort
• Quick Sort

Page 34: Data Structure

Insert An Element

Given a sorted list/sequence, insert a new element
Given 3, 6, 9, 14
Insert 5
Result 3, 5, 6, 9, 14

Page 35: Data Structure

Insert an Element

3, 6, 9, 14; insert 5
• Compare new element (5) and last one (14)
• Shift 14 right to get 3, 6, 9, _, 14
• Shift 9 right to get 3, 6, _, 9, 14
• Shift 6 right to get 3, _, 6, 9, 14
• Insert 5 to get 3, 5, 6, 9, 14

Page 36: Data Structure

Insert An Element

    /* insert t into a[0:i-1] */
    int j;
    for (j = i - 1; j >= 0 && t < a[j]; j--)
        a[j + 1] = a[j];
    a[j + 1] = t;
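The fragment on this slide can be wrapped in a function so it can be tried directly. This is a sketch: the name insert_sorted and the parameter order are assumptions made here, and the caller must ensure the array has room for one more element.

```c
#include <assert.h>
#include <stddef.h>

/* Insert t into the sorted prefix a[0..i-1].
   The array must have room for at least i + 1 elements. */
void insert_sorted(int *a, int i, int t)
{
    assert(a != NULL && i >= 0);
    int j;
    for (j = i - 1; j >= 0 && t < a[j]; j--)
        a[j + 1] = a[j];    /* shift each larger element one slot right */
    a[j + 1] = t;           /* place t in the gap that was opened */
}
```

With a holding 3, 6, 9, 14 and one free slot, insert_sorted(a, 4, 5) leaves 3, 5, 6, 9, 14, matching the hand trace on the previous slide.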

Page 37: Data Structure

Insertion Sort

• Start with a sequence of size 1
• Repeatedly insert remaining elements

Page 38: Data Structure

Insertion Sort

Sort 7, 3, 5, 6, 1
• Start with 7 and insert 3 => 3, 7
• Insert 5 => 3, 5, 7
• Insert 6 => 3, 5, 6, 7
• Insert 1 => 1, 3, 5, 6, 7

Page 39: Data Structure

Insertion Sort

    for (i = 1; i < n; i++)
    {   /* insert a[i] into a[0:i-1] */
        /* code to insert comes here */
    }

Page 40: Data Structure

Insertion Sort

    for (i = 1; i < n; i++)
    {   /* insert a[i] into a[0:i-1] */
        int t = a[i];
        int j;
        for (j = i - 1; j >= 0 && t < a[j]; j--)
            a[j + 1] = a[j];
        a[j + 1] = t;
    }
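For reference, the loop on this slide can be packaged as a complete function. This is a minimal sketch; the function name insertion_sort and its signature are assumptions made here, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Sort a[0..n-1] into ascending order using insertion sort. */
void insertion_sort(int *a, int n)
{
    assert(a != NULL && n >= 0);
    for (int i = 1; i < n; i++) {
        /* insert a[i] into the already sorted prefix a[0..i-1] */
        int t = a[i];
        int j;
        for (j = i - 1; j >= 0 && t < a[j]; j--)
            a[j + 1] = a[j];
        a[j + 1] = t;
    }
}
```

Applied to 7, 3, 5, 6, 1 it produces 1, 3, 5, 6, 7, the same sequence of states as the trace two slides earlier.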

Page 41: Data Structure

Complexity

Space/Memory
Time
• Count a particular operation
• Count number of steps
• Asymptotic complexity

Page 42: Data Structure

Comparison Count

    for (i = 1; i < n; i++)
    {   /* insert a[i] into a[0:i-1] */
        int t = a[i];
        int j;
        for (j = i - 1; j >= 0 && t < a[j]; j--)
            a[j + 1] = a[j];
        a[j + 1] = t;
    }

Page 43: Data Structure

Comparison Count

• Pick an instance characteristic: n, where n = a.length for insertion sort
• Determine the count as a function of this instance characteristic

Page 44: Data Structure

Comparison Count

for (j = i - 1; j >= 0 && t < a[j]; j--) a[j + 1] = a[j];

How many comparisons are made?


Page 45: Data Structure

Comparison Count

for (j = i - 1; j >= 0 && t < a[j]; j--) a[j + 1] = a[j];

The number of compares depends on the values in a[] and on t, as well as on i.

Page 46: Data Structure

Comparison Count

• Worst-case count = maximum count
• Best-case count = minimum count
• Average count

Page 47: Data Structure

Worst-Case Comparison Count

for (j = i - 1; j >= 0 && t < a[j]; j--) a[j + 1] = a[j];

a = [1, 2, 3, 4] and t = 0 => 4 compares

a = [1,2,3,…,i] and t = 0 => i compares


Page 48: Data Structure

Worst-Case Comparison Count

    for (i = 1; i < n; i++)
        for (j = i - 1; j >= 0 && t < a[j]; j--)
            a[j + 1] = a[j];

total compares = 1 + 2 + 3 + … + (n-1) = (n-1)n/2
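The worst-case count can be checked empirically with an instrumented sort. The sketch below is an illustration, not the slides' code: the name insertion_sort_compares is hypothetical, and it counts only evaluations of the element comparison t < a[j], not the index test j >= 0.

```c
#include <assert.h>
#include <stddef.h>

/* Insertion sort that returns how many times t < a[j] was evaluated. */
long insertion_sort_compares(int *a, int n)
{
    assert(a != NULL && n >= 0);
    long compares = 0;
    for (int i = 1; i < n; i++) {
        int t = a[i];
        int j = i - 1;
        while (j >= 0) {
            compares++;             /* one evaluation of t < a[j] */
            if (!(t < a[j]))
                break;
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = t;
    }
    return compares;
}
```

On a reverse-sorted array of 5 elements this makes 1 + 2 + 3 + 4 = 10 comparisons, which is (n-1)n/2 for n = 5; on an already sorted array it makes only n - 1 = 4.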

Page 49: Data Structure

Step Count

A step is an amount of computing that does not depend on the instance characteristic n

10 adds, 100 subtracts, 1000 multiplies can all be counted as a single step

n adds cannot be counted as 1 step

Page 50: Data Structure

Step Count

    for (i = 1; i < n; i++)
    {   /* insert a[i] into a[0:i-1] */
        int t = a[i];
        int j;
        for (j = i - 1; j >= 0 && t < a[j]; j--)
            a[j + 1] = a[j];
        a[j + 1] = t;
    }

Page 51: Data Structure

Step Count

Step count isn’t always 0 or 1

x = sum(a, n);

where n is the instance characteristic and sum adds a[0:n-1], has an s/e (steps per execution) count of n

Page 52: Data Structure

Asymptotic Complexity of Insertion Sort

O(n^2)
What does this mean?

Page 53: Data Structure

Complexity of Insertion Sort

• Time or number of operations does not exceed c·n^2 on any input of size n (n suitably large).
• Actually, the worst-case time is Θ(n^2) and the best-case is Θ(n).
• So, the worst-case time is expected to quadruple each time n is doubled.

Page 54: Data Structure

Complexity of Insertion Sort

Is O(n^2) too much time?
Is the algorithm practical?

Page 55: Data Structure

Some Numbers

log n    n     n log n    n^2     n^3      2^n
0        1     0          1       1        2
1        2     2          4       8        4
2        4     8          16      64       16
3        8     24         64      512      256
4        16    64         256     4096     65536
5        32    160        1024    32768    4294967296

Page 56: Data Structure

Practical Complexities

10^9 instructions/second

n        n        n log n    n^2       n^3
1000     1 µs     10 µs      1 ms      1 sec
10000    10 µs    130 µs     100 ms    17 min
10^6     1 ms     20 ms      17 min    32 years

Page 57: Data Structure

Impractical Complexities

10^9 instructions/second

n        n^4               n^10                 2^n
1000     17 min            3.2 x 10^13 years    3.2 x 10^283 years
10000    116 days          ???                  ???
10^6     3 x 10^7 years    ??????               ??????

Page 58: Data Structure

Faster Computer Vs Better Algorithm

Algorithmic improvement is more useful than hardware improvement.

E.g. 2^n to n^3

Page 59: Data Structure

Limitation of Analysis

• Doesn't account for constant factors.

• But a constant factor may dominate: 1000n vs n^2

• and we may be interested only in n < 1000

Page 60: Data Structure

Asymptotic Notation

Need to abstract further
Give an “idea” of how the algorithm performs
• n steps vs. n+5 steps
• n steps vs. n^2 steps

Page 61: Data Structure

Asymptotic Notation

Goal: to simplify analysis by getting rid of unneeded information (like “rounding” 1,000,001 ≈ 1,000,000)
We want to say in a formal way that 3n^2 ≈ n^2

The “Big-Oh” notation: given functions f(n) and g(n), we say that f(n) is O(g(n)) if and only if there are positive constants c and n0 such that f(n) ≤ c g(n) for n ≥ n0

Page 62: Data Structure

Graphic Illustration

f(n) = 2n + 6

By the definition, we need to find a function g(n) and a constant c such that f(n) < c g(n) for all sufficiently large n.

g(n) = n and c = 4, so f(n) is O(n).
The order of f(n) is n.

[Figure: plot of g(n) = n, f(n) = 2n + 6, and c g(n) = 4n against n]
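As a worked check of the constants on this slide, the inequality can be solved directly. The threshold n_0 = 3 is derived here; the slide itself states only c = 4.

```latex
\begin{align*}
2n + 6 \le 4n &\iff 6 \le 2n \iff n \ge 3,\\
\text{so } f(n) = 2n + 6 &\le c\,g(n) \text{ with } c = 4,\ g(n) = n,\ n_0 = 3.
\end{align*}
```

Other choices work too: with c = 8 the bound 2n + 6 ≤ 8n already holds from n = 1, which shows that the constants in the definition are not unique.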

Page 63: Data Structure

More examples

What about f(n) = 4n^2? Is it O(n)?
• Find a c such that 4n^2 < cn for any n > n0

50n^3 + 20n + 4 is O(n^3)
• Would it be correct to say it is O(n^3 + n)? Not useful, as n^3 far exceeds n for large values.
• Would it be correct to say it is O(n^5)? OK, but g(n) should be as close as possible to f(n).

3 log(n) + log(log(n)) = O( ? )

Simple rule: drop lower-order terms and constant factors.
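The three examples on this slide can be worked through with the Big-Oh definition. The constants below are one possible choice, picked here for illustration.

```latex
\begin{align*}
50n^3 + 20n + 4 &\le 50n^3 + 20n^3 + 4n^3 = 74\,n^3 && (n \ge 1),\\
3\log n + \log\log n &\le 3\log n + \log n = 4\log n && (n \ge 2),\\
4n^2 \le c\,n &\iff n \le c/4 && \text{(fails for all } n > c/4\text{)}.
\end{align*}
```

So the first expression is O(n^3) with c = 74 and n_0 = 1, the second is O(log n), and no constant c makes 4n^2 ≤ cn hold for all large n, so 4n^2 is not O(n).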

Page 64: Data Structure

Properties of Big-Oh

• If f(n) is O(g(n)) then a f(n) is O(g(n)) for any constant a.
• If f(n) is O(g(n)) and h(n) is O(g'(n)) then f(n) + h(n) is O(g(n) + g'(n))
• If f(n) is O(g(n)) and h(n) is O(g'(n)) then f(n) h(n) is O(g(n) g'(n))
• If f(n) is O(g(n)) and g(n) is O(h(n)) then f(n) is O(h(n))
• If f(n) is a polynomial of degree d, then f(n) is O(n^d)
• n^x = O(a^n), for any fixed x > 0 and a > 1

An algorithm of order n to a certain power is better than an algorithm of order a (> 1) to the power of n

Page 65: Data Structure

Asymptotic analysis - terminology

Special classes of algorithms:
• logarithmic: O(log n)
• linear: O(n)
• quadratic: O(n^2)
• polynomial: O(n^k), k ≥ 1
• exponential: O(a^n), a > 1

Polynomial vs. exponential?
Logarithmic vs. polynomial?

Page 66: Data Structure

“Relatives” of Big-Oh

“Relatives” of the Big-Oh:
• Ω(f(n)): Big Omega – asymptotic lower bound
• Θ(f(n)): Big Theta – asymptotic tight bound

Big-Omega – think of it as the inverse of Big-Oh: g(n) is Ω(f(n)) if f(n) is O(g(n))

Big-Theta – combine both Big-Oh and Big-Omega: f(n) is Θ(g(n)) if f(n) is O(g(n)) and f(n) is Ω(g(n))

Make the difference:
• 3n+3 is O(n) and is Θ(n)
• 3n+3 is O(n^2) but is not Θ(n^2)

Page 67: Data Structure

More “relatives”

• Little-oh – f(n) is o(g(n)) if for any c > 0 there is an n0 such that f(n) < c g(n) for n > n0.
• Little-omega
• Little-theta

2n + 3 is o(n^2)
2n + 3 is o(n)?

Page 68: Data Structure

One More Example – Prefix Average

Problem: prefix averages
• Given an array X
• Compute the array A such that A[i] is the average of elements X[0] … X[i], for i = 0..n-1

Sol 1
• At each step i, compute the element A[i] by traversing the array X, determining the sum of its elements X[0] … X[i] and then their average

Sol 2
• At each step i, update a running sum of the elements of the array X
• Compute the element A[i] as sum/(i+1)

Big question: Which solution to choose?

Page 69: Data Structure

Example

Remember the algorithm for computing prefix averages:
- compute an array A starting with an array X
- every element A[i] is the average of all elements X[j] with j ≤ i

Remember some pseudo-code … Solution 1

    Algorithm prefixAverages1(X):
        Input: An n-element array X of numbers.
        Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ... , X[i].
        Let A be an array of n numbers.
        for i ← 0 to n - 1 do
            a ← 0
            for j ← 0 to i do
                a ← a + X[j]
            A[i] ← a/(i + 1)
        return array A

Analyze this
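A direct C rendering of prefixAverages1 makes the nested loops explicit. This is a sketch under assumptions made here: the arrays hold double values, and the output array is supplied by the caller rather than allocated inside the function.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a[i] with the average of x[0..i], recomputing each sum from scratch. */
void prefix_averages1(const double *x, double *a, int n)
{
    assert(x != NULL && a != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j <= i; j++)    /* i + 1 additions for this i */
            s += x[j];
        a[i] = s / (i + 1);
    }
}
```

The inner statement executes 1 + 2 + … + n = n(n+1)/2 times in total, so this version takes quadratic time.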

Page 70: Data Structure

Example (cont’d)

    Algorithm prefixAverages2(X):
        Input: An n-element array X of numbers.
        Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ... , X[i].
        Let A be an array of n numbers.
        s ← 0
        for i ← 0 to n - 1 do
            s ← s + X[i]
            A[i] ← s/(i + 1)
        return array A
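The running-sum version looks like this in C. As with the previous sketch, the double element type and the caller-supplied output array are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a[i] with the average of x[0..i], keeping a running sum. */
void prefix_averages2(const double *x, double *a, int n)
{
    assert(x != NULL && a != NULL && n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += x[i];              /* s now holds x[0] + ... + x[i] */
        a[i] = s / (i + 1);
    }
}
```

Each element of x is added exactly once, so the loop body runs n times and the running time is linear in n.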

Page 71: Data Structure

Analyzing recursive algorithms

    int fact(int n)
    {
        if (n < 1)
            return 1;
        else
            return n * fact(n - 1);
    }

Page 72: Data Structure

Solving recursive equations by repeated substitution

T(n) = T(n-1) + c              substitute for T(n-1)
     = T(n-2) + c + c          substitute for T(n-2)
     = T(n-3) + c + c + c
     = T(n-4) + 4c             in more compact form
     = …
     = T(n-k) + kc             “inductive leap”

T(n) = ?
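One way to finish the substitution is to choose k so that the recursion reaches the base case. The sketch below assumes that T(0), the cost of the base case of fact, is a constant and that c is the constant cost of one call excluding the recursive call.

```latex
\begin{align*}
T(n) &= T(n-k) + kc && \text{for any } 1 \le k \le n,\\
T(n) &= T(0) + nc && \text{taking } k = n,\\
T(n) &= O(n) && \text{since } T(0) \text{ is a constant.}
\end{align*}
```

Under these assumptions the recursive factorial makes a number of calls proportional to n.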

Page 73: Data Structure

Solving recursive equations by telescoping

T(n)   = T(n-1) + c      initial equation
T(n-1) = T(n-2) + c      so this holds
T(n-2) = T(n-3) + c      and this …
T(n-3) = T(n-4) + c      and this …
…
T(3)   = T(2) + c        eventually …
T(2)   = T(1) + c        and this …
T(1)   = T(0) + c

Sum the equations, canceling the terms appearing on both sides.

T(n) = O(?)