
Post on 21-Dec-2015


Moore’s law

Gordon Moore, cofounder of Intel

1965: the number of transistors per chip doubles every year

After 1970: the number of transistors per chip doubles every 1.5 years

Growth in CPU transistor count

Consequences of Moore’s law

Cost of a chip remains roughly unchanged as density grows => cost per circuit goes down

Electrical path length is shortened => operating speed increases

Computer becomes smaller

Reduction in power requirements

More circuitry on each chip => fewer inter-chip connections => more reliable

Chap.4 The Role of Performance

Jen-Chang Liu, Spring 2005

Hardware performance is often key to the effectiveness of an entire system of hardware and software.

What do we mean by saying one computer has better performance than another?

Example: performance of airplanes

Performance of a hardware system

What do we mean by better performance? Higher speed?

Response time (execution time): the time between the start and completion of a task

Throughput: the total amount of work done in a given time

Ex. multi-user system

Performance measure

Quantitative relation of performance and execution time on machine X:

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

* Relative performance: machine A is n times faster than B when

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$

Ex. Machine A runs a program in 10 sec. and machine B runs the same program in 15 sec.:

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1.5$$
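A minimal Python sketch of the definition above, using the 10 sec. and 15 sec. machines from the example. The function names `performance` and `relative_performance` are illustrative and do not come from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def relative_performance(time_a_s: float, time_b_s: float) -> float:
    """Return n such that machine A is n times faster than machine B."""
    return performance(time_a_s) / performance(time_b_s)


# Machine A needs 10 sec., machine B needs 15 sec. for the same program.
n = relative_performance(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")
```

Note that the ratio of performances is the inverse of the ratio of execution times, so the slower machine's time ends up in the numerator.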

Problem with previous definition of performance

The definition of execution time

How about multiple tasks run concurrently?

Use which programs to evaluate the performance of a computer?

Execution time?

The total time to complete a task: response time, elapsed time

In a timeshared system, such as Unix, a processor works on several programs

Including disk access, memory access, I/O, OS overhead…

Definition of execution time: the user's point of view

Timeline: Program A | swap | Prog. B | I/O | Program A

Response time for A spans the whole timeline

CPU time (CPU execution time)

Does not include time spent waiting for I/O or running other programs

CPU exec. time = user CPU time + system CPU time

user CPU time: CPU time spent in the program

system CPU time: CPU time spent in the OS on behalf of our program

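Example: CPU time

Unix command: time

90.7u 12.9s 2:39 65%

90.7u: user CPU time (sec.); 12.9s: system CPU time (sec.); 2:39: elapsed time (159 sec.); 65%: CPU time as a share of elapsed time

$$\frac{90.7 + 12.9}{159} = 0.65$$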
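The percentage reported by `time` can be checked with a few lines of Python. This is only a sketch of the arithmetic on the numbers shown above; the helper name `cpu_share` is made up for illustration.

```python
def cpu_share(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed time during which the CPU worked on our program."""
    return (user_s + system_s) / elapsed_s


# 2:39 of elapsed time is 2 minutes and 39 seconds.
elapsed_s = 2 * 60 + 39
share = cpu_share(90.7, 12.9, elapsed_s)
print(f"CPU time / elapsed time = {share:.2f}")
```

The remaining share of the elapsed time went to I/O waits and to other programs.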

In the following discussion, CPU performance refers to user CPU time.

Unit of time

Seconds

Clock cycle

Ex. Clock cycle time = 2 ns

$$\text{Clock rate} = \frac{1}{2 \times 10^{-9}\ \text{s}} = 500\ \text{MHz}$$

$$\text{CPU time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

$$= \text{Instructions for a program} \times \text{Average clock cycles per instruction (CPI)} \times \text{Clock cycle time}$$

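Example 1

Machines A and B have the same ISA and run the same program (instruction count I)

Machine A: clock cycle = 1 ns, CPI = 2

Machine B: clock cycle = 2 ns, CPI = 1.2

$$\text{CPU time}_A = \text{Inst. count} \times \text{CPI} \times \text{Clock cycle time} = I \times 2 \times 1\ \text{ns} = 2I\ \text{ns}$$

$$\text{CPU time}_B = I \times 1.2 \times 2\ \text{ns} = 2.4I\ \text{ns}$$

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{2.4I}{2I} = 1.2$$

A is 1.2 times faster than B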
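Example 1 can be reproduced with a small Python sketch of the CPU time equation. The instruction count used here is an arbitrary assumption: it cancels out of the ratio, so any positive value gives the same result.

```python
def cpu_time_ns(instruction_count: float, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Assumed instruction count; both machines run the same program.
instruction_count = 1_000_000

time_a = cpu_time_ns(instruction_count, cpi=2.0, cycle_time_ns=1.0)
time_b = cpu_time_ns(instruction_count, cpi=1.2, cycle_time_ns=2.0)

# Performance ratio A/B equals the execution time ratio B/A.
ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```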

Example 2

Compiler generates 2 different code sequences

Instruction class   CPI
A                   1
B                   2
C                   3

Instruction counts  A  B  C  Total inst.
Code 1              2  1  2  5
Code 2              4  1  1  6

CPU clock cycles 1 = 2x1 + 1x2 + 2x3 = 10 cycles

CPU clock cycles 2 = 4x1 + 1x2 + 1x3 = 9 cycles

Code 1 executes fewer instructions (faster?), but code 2 needs fewer clock cycles and is faster.
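The cycle counts in Example 2 are weighted sums of instruction counts by class CPI. A short Python sketch, with dictionary names chosen only for illustration:

```python
# CPI of each instruction class, as given in the example.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Sum of (instruction count x CPI) over all instruction classes."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())


code_1 = {"A": 2, "B": 1, "C": 2}
code_2 = {"A": 4, "B": 1, "C": 1}

for name, counts in (("Code 1", code_1), ("Code 2", code_2)):
    instructions = sum(counts.values())
    cycles = total_cycles(counts)
    print(f"{name}: {instructions} instructions, {cycles} cycles, "
          f"average CPI {cycles / instructions:.1f}")
```

Code 2 executes more instructions but has the lower average CPI, which is why instruction count alone does not decide which sequence is faster.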

Short conclusion

Computer performance (software and hardware)

Response time = CPU time + I/O and other programs

CPU time = Instruction count x CPI x Clock cycle length

How to optimize them in a hardware design?

Problem with previous definition of performance

The definition of execution time

How about multiple tasks run concurrently?

Use which programs to evaluate the performance of a computer?

Choose programs to evaluate performance

Benchmarks: programs chosen to measure performance

SPEC (System Performance Evaluation Cooperative) suite of benchmarks

Started in 1989

http://open.specbench.org/

SPEC95 in textbook is retired…

SPECx contains a set of benchmark programs

SPEC – money…

SPEC95 benchmarks

Integer benchmarks: written in C

Floating-point benchmarks: written in Fortran 77

Summarize performance

Which is faster?

                   Computer A  Computer B
Program 1 (sec)    1           10
Program 2 (sec)    1000        100
Total time (sec)   1001        110

$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution time}_A}{\text{Execution time}_B} = \frac{1001}{110} = 9.1$$

* Assume the programs occur with equal probability.
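The comparison by total execution time can be sketched in Python as follows. The data structure is an illustrative choice; the timings are the ones in the table.

```python
# Execution times in seconds for Program 1 and Program 2 on each computer.
times_s = {
    "A": [1, 1000],
    "B": [10, 100],
}

totals = {machine: sum(t) for machine, t in times_s.items()}

# Performance B / Performance A = Execution time A / Execution time B.
ratio = totals["A"] / totals["B"]
print(f"Totals: {totals}; B is {ratio:.1f} times faster than A")
```

Summing the times weights each program equally, which matches the equal-probability assumption stated above. A is faster on Program 1 alone, so the conclusion depends on the workload mix.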

SPEC ratio

The execution time of a benchmark program is normalized (compared to a baseline system)

$$\text{SPEC ratio} = \frac{\text{Exec. time on Sun SPARCstation 10/40}}{\text{Exec. time on the measured machine}}$$

SPECint95, SPECfp95

SPECint95 = geometric mean of SPEC ratios
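A hedged Python sketch of how a SPEC-style summary number is formed from per-benchmark ratios. The reference and measured times below are invented purely for illustration and are not real SPEC95 results.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Baseline execution time divided by the measured machine's time."""
    return reference_time_s / measured_time_s


def geometric_mean(values: list) -> float:
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical timings (sec.) for three benchmark programs.
reference_times = [100.0, 200.0, 400.0]
measured_times = [50.0, 50.0, 50.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
summary = geometric_mean(ratios)
print(f"SPEC ratios: {ratios}, geometric mean: {summary:.2f}")
```

With a geometric mean, the ranking of two machines does not depend on which machine is chosen as the baseline, which is one reason it suits normalized ratios.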

Example: SPECint95 for Pentium and Pentium Pro

[Figure: SPECint (axis 0 to 10) versus clock rate (axis 50 to 250 MHz) for the Pentium and the Pentium Pro]

1. Performance improvement

2. Clock rate x 2 => SPECint x 1.7 ?

Amdahl’s law in computing

$$\text{CPU time for a program} = \text{CPU clock cycles for a program} \times \frac{1}{\text{Clock rate}}$$

Clock rate x 2 => CPU time / 2 ?

* Improvement of one aspect of a machine does not increase performance by the same ratio (partial improvement)

* Ex. The bottleneck in the memory system does not improve, as in the previous example

$$\text{Exec. time after improvement} = \frac{\text{Exec. time affected by improvement}}{\text{Amount of improvement}} + \text{Exec. time unaffected}$$

Example: Amdahl’s law

A program takes 100 s to run: 20% multiplication, 50% memory op., 30% others

What’s the speedup for

Multiply speed x 4:

$$\frac{100}{20/4 + 50 + 30} = 1.18$$

Memory access x 2:

$$\frac{100}{20 + 50/2 + 30} = 1.33$$
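The two speedups can be verified with a short Python sketch of the execution time formula. The function name and argument names are illustrative.

```python
def time_after_improvement(affected_s: float, unaffected_s: float,
                           improvement: float) -> float:
    """Exec. time after = affected time / amount of improvement + unaffected time."""
    return affected_s / improvement + unaffected_s


total_s = 100.0

# Multiplication (20 s of the 100 s) becomes 4 times faster.
multiply_case = time_after_improvement(20.0, 80.0, 4.0)
# Memory operations (50 s of the 100 s) become 2 times faster.
memory_case = time_after_improvement(50.0, 50.0, 2.0)

print(f"Multiply x4: speedup {total_s / multiply_case:.2f}")
print(f"Memory x2:   speedup {total_s / memory_case:.2f}")
```

The smaller improvement factor wins here because it applies to a larger share of the run time.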

MIPS as a measurement (not good…)

MIPS = Million Instructions Per Second

High MIPS => faster?

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$

Pitfalls:

MIPS cannot be used to compare computers with different instruction sets => inst. count differs

MIPS varies between programs on the same computer => no single MIPS for a machine

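Example: MIPS?

500 MHz machine, 2 compilers for the same source program

Instruction class   CPI
A                   1
B                   2
C                   3

Inst. count (x10^9) for each inst. class
          A   B   C
Code 1    5   1   1
Code 2    10  1   1

$$\text{Exec. time}_1 = \frac{(5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9\ \text{cycles}}{500 \times 10^6\ \text{cycles/sec}} = 20\ \text{sec.}$$

$$\text{Exec. time}_2 = \frac{(10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9\ \text{cycles}}{500 \times 10^6\ \text{cycles/sec}} = 30\ \text{sec.}$$

$$\text{MIPS}_1 = \frac{\text{Inst. count}}{\text{Exec. time} \times 10^6} = \frac{(5 + 1 + 1) \times 10^9}{20 \times 10^6} = 350$$

$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{30 \times 10^6} = 400$$

Exec. time 1 < Exec. time 2, yet MIPS 1 < MIPS 2: the code with the higher MIPS rating is the slower one.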
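A Python sketch that recomputes this example, showing execution time and MIPS side by side. Names such as `exec_time_s` and `mips` are illustrative.

```python
CLOCK_RATE_HZ = 500e6                 # 500 MHz machine
CLASS_CPI = {"A": 1, "B": 2, "C": 3}  # CPI per instruction class


def exec_time_s(counts: dict) -> float:
    """Total clock cycles divided by the clock rate."""
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts: dict) -> float:
    """Instruction count / (execution time x 10^6)."""
    return sum(counts.values()) / (exec_time_s(counts) * 1e6)


code_1 = {"A": 5e9, "B": 1e9, "C": 1e9}
code_2 = {"A": 10e9, "B": 1e9, "C": 1e9}

for name, counts in (("Code 1", code_1), ("Code 2", code_2)):
    print(f"{name}: {exec_time_s(counts):.0f} sec., {mips(counts):.0f} MIPS")
```

Code 2 earns the higher MIPS rating only because it executes many extra cheap class A instructions; it still takes longer to finish.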
