chapter 1 fundamentals of computer design 吳俊興 高雄大學資訊工程學系 september 2004...

55
Chapter 1 Fundamentals of Computer Design 吳吳吳 吳吳吳吳吳吳吳吳吳吳 September 2004 EEF011 Computer Architecture 吳吳吳吳吳

Post on 19-Dec-2015

283 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Chapter 1Fundamentals of Computer Design

吳俊興高雄大學資訊工程學系

September 2004

EEF011 Computer Architecture計算機結構

Page 2: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Chapter 1. Fundamentals of Computer Design

1.1 Introduction1.2 The Changing Face of Computing and the Task of

the Computer Designer1.3 Technology Trends1.4 Cost, Price, and Their Trends1.5 Measuring and Reporting Performance1.6 Quantitative Principles of Computer Design1.7 Putting It All Together1.8 Another View1.9 Fallacies and Pitfalls1.10 Concluding Remarks

Page 3: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Computer Architecture –1.1 Introduction

• Rapid rate of improvement in the roughly 55 years– the technology used to build computers– innovation in computer design

Beginning in about 1970,computer designers became largely dependent upon IC technology

• Emergence of microprocessor: late 1970s– C: virtual elimination of assembly language programming

(portability)– UNIX: standardized, vendor-independent OS

Lowered the cost and risk of bringing out a new architecture

Page 4: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

• Development of RISC (Reduced Instruction Set Computer): early 1980sFocused the attention of designers on– The exploitation of instruction-level parallelism

• Initially through pipelining, and

• later through multiple instruction issue

– The use of caches• initially in simple forms and

• later using more sophisticated organizations and optimizations

1.1 Introduction (cont.)

Page 5: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Figure 1.1: 20 years of sustained growth in performance at an annual rate of over 50% (two different versions of SPEC, i.e., SPEC92 and SPEC95)

EffectsSignificantly enhanced the capability available to usersDominance of microprocessor-based computers across the entire

range of computer design• Minicomputers and mainframes replaced• Supercomputers built with collections of microprocessors

Page 6: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

1.2 Changing Face of Computing

• 1960s: dominated by mainframes– business data processing, large-scale scientific computing

• 1970s: the birth of minicomputers– scientific laboratories, time-sharing multiple-user

environment• 1980s: the rise of the desktop computer (personal

computer and workstation)• 1990s: the emergence of the Internet and the World

Wide Web– handheld computing devices– emergence of high-performance digital consumer

electronics

Page 7: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Segmented Markets•Desktop computing: to optimize price-performance

–the largest market in dollar terms–Web-centric, interactive applications

•Servers–Web-based services, replacing traditional mainframe

Availability, Scalability, Throughput

•Embedded Computers–Features

•The presence of the computers is not immediately obvious•The application can usually be carefully tuned for the processor and system

–Widest range of processing power and cost•price is a key factor in the design of computers•Real-time performance requirement

–Key characteristics•the need to minimize memory and power•The use of processor cores together with application-specific circuitry

Page 8: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構
Page 9: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Task of Computer Designer

• Tasks– Determine what attributes are important for a new

machine, then– design a machine to maximize performance while staying

within cost and power constraints• Efforts

– instruction set design, functional organization, logic design, and implementation (IC design, packaging, power, and cooling)

Design a computer to meet functional requirements as well as price, power, and performance goals

(see Figure 1.4)

Page 10: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Task of a Computer Designer

Evaluate ExistingEvaluate ExistingSystems for Systems for BottlenecksBottlenecks

Simulate NewSimulate NewDesigns andDesigns and

OrganizationsOrganizations

Implement NextImplement NextGeneration SystemGeneration System

TechnologyTrends

Benchmarks

Workloads

ImplementationComplexity

Effect of trendsDesigner often needs to design for the next technology because during a single product cycle (typically 4-5 years), key technologies, such as memory, change sufficiently that the designer must plan for these changes.

Page 11: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構
Page 12: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

1.3 Technology TrendsThe designer must be especially aware of rapidly occurring changes in implementation technology

1.Integrated circuit logic technology– transistor density increases by about 35% per year– die size are less predictable and slower

combined effect: growth rate in transistor count about 55% per year

2.Semiconductor DRAM– density increases by between 40% and 60% per year– cycle time decreased by about one-third in 10 years– bandwidth per chip increases about twice

3.Magnetic disk technology

4.Network technology

Page 13: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Moore’s Law

Gordon Moore (Founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year.

http://www.intel.com/research/silicon/mooreslaw.htm

Page 14: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Technology Trends

Based on SPEED, the CPU has increased dramatically, but memory and disk have increased only a little. This has led to dramatic changed in architecture, Operating Systems, and Programming practices.

Capacity Speed (latency)

Logic 2x in 3 years 2x in 3 years

DRAM 4x in 3 years 2x in 10 years

Disk 4x in 3 years 2x in 10 years

Page 15: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

1.4 Cost, Price, and Their Trends

• Price: what you sell a finished good– 1999: more than half the PCs sold were

priced at less than $1000

• Cost: the amount spent to produce it, including overhead

Page 16: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

• Time– Learning curve: manufacturing cost reduces over time– Yield: the percentage of the manufactured devices that survi

ve the testing procedure– As the technology matures over time, the yield improves and

hence, things get cheaper• Volume

– Increasing volume decreases cost (time for learning curve),– Increases purchasing and manufacturing efficiency

• Commodities– Products are sold by multiple vendors in large volumes and

are essentially identical– The low-end business such as DRAMs, disks, monitors– Improved competition reduced cost

Cost and Its Trend

Page 17: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Between the start of a project and the shipping of a produce, say, two years, the cost of a new DRAM drops by a factor of between 5 and 10 in constant dollars.

Prices of six generations of DRAMs

Page 18: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Price of an Intel Pentium III at a Given Frequency

It decreases over time as yield enhancements decrease the cost of a good die and competition forces price reductions.

Page 19: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Cost and Its Trend (Cont.)

• Time– Learning curve: manufacturing cost reduces over time– Yield: the percentage of the manufactured devices that survive

the testing procedure– As the technology matures over time, the yield improves and h

ence, things get cheaper• Volume

– Increasing volume decreases cost (time for learning curve),– Increases purchasing and manufacturing efficiency

• Commodities– Products are sold by multiple vendors in large volumes and ar

e essentially identical– The low-end business such as DRAMs, disks, monitors– Improved competition reduced cost

Page 20: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Figure 1.8. This wafer contains 564 MIPS64 R20K processors in 0.18Figure 1.7. Intel Pentium 4 microprocessor Die

Wafer and DiesExponential cost decrease – technology basically the same:

A wafer is tested and chopped into dies that are packaged

Page 21: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Cost of IC =

Cost of an Integrated Circuit (IC)

yield Die per wafer Dies#

wafterofCost die ofCost

)area Die desity Detect

1( yield Wafer yield Die

Today’s technology: 4, detect density 0.4 and 0.8 per cm2

(A greater portion of the cost that varies between machines)

(sensitive to die size)

# of dies along the edge

Page 22: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Cost of an IC (cont.)

Page 23: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Cost Example in a $1000 PC in 2001

Page 24: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

1.5 Measuring and Reporting Performance

• Designing high performance computers is one of the major goals of any computer architect.

• As a result, assessing the performance of computer hardware is the at the heart of computer design and greatly affects the demand and market value of the computer.

• However, measuring performance of a computer system is not a straightforward task:

Metrics – How do we describe in a numerical way the performance of a computer?

What tools do we use to find those metrics? How do we summarize the performance?

Page 25: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

What is the computer user interested in?

• Reduce the time to run certain task–Execution time (response time) –The time between the start and the completion of

an event• Increase the tasks per week, day, hour, sec, ns …

–Throughput –The total amount of work done in a given time

Page 26: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Do the following changes to a computer system increase throughput, reduce response time, or both?

1) Replacing the processor in a computer with a faster version

2) Adding additional processors to a system that uses multiple processors for separate tasks –for example handling an airline reservation system

Answer1) Both response time and throughput are improved2) Only throughput increases

Example

Page 27: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Execution Time and Performance Comparison

• In this subject, we will be primarily interested in execution time as a measure of performance

• The relationship between performance and execution time on a computer X (reciprocal) is defined as follows:

• To compare design alternatives, we use the following equation:

• “X is n times faster than Y” or “the throughput of X is n times higher than Y” means that the execution time is n times less on X than Y

• To maximize performance of an application, we need to minimize its execution time

XX timeExecution

1 ePerformanc

nY

X

X

Y

ePerformanc

ePerformanc

TimeExecution

TimeExecution

Page 28: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Measure Performance – user CPU time

• Response time may include disk access, memory access, input/output activities, CPU event and operating system overhead - everything

• In order to get an accurate measure of performance, we use CPU time instead of using response time

• CPU time is the time the CPU spends computing a program and does not include time spent waiting for I/O or running other programs

• CPU time can also be divided into user CPU time (program) and system CPU time (OS)

• Key in UNIX command time, we have,90.7u 12.9s 2.39 65% (user CPU, system CPU, total response,%)

• In our performance measures, we use user CPU time – because of its independence on the OS and other factors

Page 29: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Programs for Measuring Performance

• Real applications: text processing software (Word), compliers (C), and other applications like Photoshop – have inputs, outputs, and options when the user wants to use them

One major downside: Real applications often encounter portability problems arising from dependences on OS or complier

• Modified (or scripted) applications: Modification is to enhance portability or focus on one particular aspect of system performance. Scripts are used to simulate the interaction behaviors

• Kernels: small, key pieces from real programs. Typically used to evaluate individual features of the machine

• Toy benchmarks: typically between 10 and 100 lines of code and produce a known result

• Synthetic benchmarks: artificially created code to match an average execution profile

Page 30: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

• A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks

• Good vs. Bad benchmarks

– Improving product for real programs vs. improving product for benchmarks to get more sales

– If benchmarks are inadequate, then sales wins!

Benchmark Suites

• They are a collection of programs (workload) that try to explore and capture all the strengths and weaknesses of a computer system (real programs, kernels)

Page 31: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

SPEC Benchmarks

• Most successful attempts and widely adopted• First generation 1989

– 10 programs yielding a single number (“SPECmarks”)• Second generation 1992

– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)

– Unlimited compiler flags • Third generation 1995

– New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)

– Single flag setting for all programs: SPECint_base95, SPECfp_base95

– “benchmarks useful for 3 years”• SPEC CPU 2000

– CINT2000 (11 integer programs) andCFP2000 (14 floating point programs)

SPEC: Standard Performance Evaluation Cooperation

Page 32: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

SPEC BenchmarksCINT2000 (Integer Component of SPEC CPU2000)

Program Language What Is It164.gzip C Compression175.vpr C FPGA Circuit Placement and Routing176.gcc C C Programming Language Compiler181.mcf C Combinatorial Optimization186.crafty C Game Playing: Chess197.parser C Word Processing252.eon C++ Computer Visualization253.perlbmkC PERL Programming Language254.gap C Group Theory, Interpreter255.vortex C Object-oriented Database256.bzip2 C Compression300.twolf C Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/

Page 33: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

SPEC BenchmarksCFP2000 (Floating Point Component of SPEC CPU2000)

Program Language What Is It168.wupwise Fortran 77 Physics / Quantum Chromodynamics171.swim Fortran 77 Shallow Water Modeling172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field173.applu Fortran 77 Parabolic / Elliptic Differential Equations177.mesa C 3-D Graphics Library178.galgel Fortran 90 Computational Fluid Dynamics179.art C Image Recognition / Neural Networks183.equake C Seismic Wave Propagation Simulation187.facerec Fortran 90 Image Processing: Face Recognition188.ammp C Computational Chemistry189.lucas Fortran 90 Number Theory / Primality Testing191.fma3d Fortran 90 Finite-element Crash Simulation 200.sixtrack Fortran 77 High Energy Physics Accelerator Design 301.apsi Fortran 77 Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/

Page 34: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

More Benchmarks

TPC: Transaction Processing Council

– Measure the ability of a system to handle transactions, which consist of database accesses and updates.

– Many variants depending on transaction complexity

– TPC-A: simple bank teller transaction style

– TPC-C: complex database query

– 34 kernels in 5 classes

– 16 automotive/industrial; 5 consumer; 3 networking; 4 office

automation; 6 telecommunications

EDN Embedded Microprocessor Benchmark Consortium (EEMBC, “embassy”)

Page 35: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Management would like to have one numberTechnical people want more:1.They want to have evidence of reproducibility – there should be

enough information so that you or someone else can repeat the experiment

2.There should be consistency when doing the measurements multiple times

How to Summarize Performance

How would you report these results?

Computer A Computer B Computer C

Program P1 (secs) 1 10 20

Program P2 (secs) 1000 100 20

Total Time (secs) 1001 110 40

Page 36: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Comparing and Summarizing Performance

Comparing the performance by looking at individual programs is not fair

Total execution time: a consistent summary measure

Arithmetic Mean – provides a simple average

– Timei: execution time for program i in the workload

– Doesn’t account for weight: all programs are treated equal

n

iiTime

n 1

1

Page 37: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Weighted Variants

What is the proper mixture of programs for the workload? Weight is a weighting factor to each program to indicate t

he relative frequency of the program in the workload: % of use

Weighted Arithmetic Mean

– Weighti: frequency of program i in the workload

– May be better but beware the dominant program time

n

iii TimeWeight

1

Page 38: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

A B C W(1) W(2) W(3)

Program P1 (secs) 1 10 20 0.5 0.909 0.999

Program P2 (secs) 1000 100 20 0.5 0.091 0.001

Arithmetic Mean 500.5 55 20

Weighted Arithmetic Mean (1) 500.5 55 20

Weighted Arithmetic Mean (2) 91.82 18.18 20

Weighted Arithmetic Mean (3) 2 10.09 20

Example: Figure 1.16

n

iii TimeWeight

1

Weighted Arithmetic Mean =

Page 39: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Normalized Time Metrics

Normalized execution time metricsMeasure the performance by normalizing it to a reference machine: Execution time ratioi

Geometric Mean

Geometric mean is consistent no matter which machine is the reference

The arithmetic mean should not be used to average normalized execution times

However, geometric mean still doesn’t form accurate predication models (doesn’t predict execution time). It encourages designers to improve the easiest benchmarks rather the slowest benchmarks (due to no weighting)

ni

n

1iRatio TimeExecution

Page 40: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Example

A B C W(1)

Program P1 (secs) 1 10 20 100/101

Program P2 (secs) 1000 100 20 1/101

Program P1 (secs) 1 10 20 Normalized to A

Program P2 (secs) 1 0.1 0.02

Geometric Mean 1 1 0.63

Arithmetic Mean 500.5 55 20

Weighted Arithmetic Mean (1) 10.89 10.89 20

Machines A and B have the same performance according to the Geometric Mean measure, yet this would only be true for a workload that P1 runs 100 times more than P2 according to the Weighted Arithmetic Mean measure

Page 41: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

1.6 Quantitative Principles of Computer Design

Already known how to define, measure and summarize performance, then we can explore some of the principles and guidelines in design and analysis of computers.

Make the common case fast

In making a design trade-off, favor the frequent case over the infrequent case Improving the frequent event, rather than the rare event, will obviously help

performance Frequent case is often simpler and can be done faster than the infrequent

case We have to decide what the frequent case is and how much performance

can be improved by making the case faster

Page 42: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Two equations to evaluate design alternatives

Amdahl’s Law states that the performance improvement to be gained from using some fast mode of execution is limited by the fraction of the time the faster mode can be used

Amdahl’s Law defines the speedup that can be gained by using a particular feature

The performance gain that can be obtained by improving some porting of a computer can be calculated using Amdahl’s Law

Amdahl’s Law

The CPU Performance Equation Essentially all computers are constructed using a clock running

at a constant rate CPU time then can be expressed by the amount of clock cycles

Page 43: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Amdahl's Law

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected

tEnhancemenWithTimeExecution

tEnhancemenWithoutTimeExecutionESpeedup

___

___)(

Speedup due to enhancement E:

This fraction enhanced

• Fractionenhanced: the fraction of the execution time in the original machine that can be converted to take advantage of the enhancement

• Speedupenhanced: the improvement gained by the enhanced execution mode

Page 44: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

ExTimenew = ExTimeold x (1 - Fractionenhanced) +

Speedupoverall =ExTimeold

ExTimenew

Speedupenhanced

=1

(1 - Fractionenhanced) +Speedupenhanced

This fraction enhanced

ExTimeold ExTimenew

Fractionenhanced

Fractionenhanced

Amdahl's Law

Page 45: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Example: Floating point (FP) instructions are improved to run faster by a factor of 2; but only 10% of actual instructions are FP. What’s the overall speedup gained by this improvement?

Speedupoverall = 10.95

= 1.053

ExTimenew = ExTimeold x (0.9 + 0.1/2) = 0.95 x ExTimeold

Amdahl's Law

Answer

Amdahl’s Law can serve as a guide to how much an enhancement will improve performance and how to distribute resource to improve cost-performance

It is particularly useful for comparing performances both of the overall system and the CPU of two design alternatives

More example: page 41 and page 42

Page 46: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

All computers are constructed using a clock to operate its circuits

• Typically measured by two basic metrics

• Clock rate – today in MHz and GHz

• Clock cycle time: clock cycle time = 1/clock rate

• E.g., 1 GHz rate corresponds to 1 ns cycle time

Thus the CPU time for a program is given by:

Or,

timecycleClock program afor cyclesclock CPUTime CPU

rateClock

program afor cyclesclock CPU Time CPU

CPU Performance

Page 47: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

More typically, we tend to count # instructions executed, known as Instruction Count (IC)

CPI: average number of clock cycles per instruction

Hence, alternative method to get the CPU timeIC

program afor cyclesclock CPU CPI

timecycleClock CPIIC timeCPU rateClock

CPIIC

CPU performance is equally dependent upon 3 characteristics: clock cycle time, CPI, IC. They are independent and making one better often makes another worse because the basic technologies involved in changing each characteristics are interdependent.

Clock cycle time: hardware technology and organizationCPI: organization and instruction set architectureIC: instruction set architecture and complier technology

Depends on IS of the computer and its compiler

CPU Performance

Page 48: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Example: Suppose we have 2 implementations of the same instruction set architecture. Computer A has a clock cycle time of 10 nsec and a CPI of 2.0 for some program, and computer B has a clock cycle time of 20 nsec and a CPI of 1.2 for the same program.

What machine is faster for this program?

Answer

Assume the program require IN instructions to be executed: CPU clock cycleA = IN x 2.0

CPU clock cycleB = IN x 1.2CPU timeA = IN x 2.0 x 10 = 20 x IN nsecCPU timeB = IN x 1.2 x 20 = 24 x IN nsec

So computer A is faster than computer B.

timecycleClock CPIIC timeCPU

CPU Performance

Page 49: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Often the overall performance is easy to deal with on a

per instruction set basis

The overall CPI can be expressed as:

n

ii

i

n

iii

1

1 CPIcountn Instructio

IC

countn Instructio

CPIIC CPI

timecycleClock CPIIC timeCPU1

n

iii

n

iii

1

CPIIC cyclesclock CPU

# times instruction i is executed

CPI for instruction i

CPU Performance

Page 50: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Cycles Per Instruction

Base Machine (Reg / Reg)

Instruction Freq CPI (% Time)

ALU 50% 1 (33%)

Load 20% 2 (27%)

Store 10% 2 (13%)

Branch 20% 2 (27%)

(Total CPI 1.5)

Example: Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

How do we get total CPI?How do we get %time?

Page 51: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Example: Suppose we have made the following measurements: Frequency of FP operations (other than FPSQR) = 25% Avg. CPI of FP operations = 4.0 Avg. CPI of other instructions = 1.33 Frequency of FPSQR = 2% CPI of FPSQR = 20Compare two designs:

1) decrease CPI of FPSQR to 2;2) decrease the avg. CPI of all FP operations to 2.5.

Answer

First, find the original CPI:

The CPI with design 1:

The CPI with design 2:

0.233.1%754%25CPIcountn Instructio

IC CPI

1o

n

ii

i

1.642)-(202%-2.0FPSQRfor CPI Improved%2CPI CPI1 o

1.6251.3375%2.525% CPI2 So design 2 is better

CPU Performance

Page 52: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Other important fundamental observations come from the properties of programs.

Principle of locality: Programs tend to reuse data and instructions they have used recently.

There are two different types of locality: (Ref. Chapter 5) Temporal Locality (locality in time): If an item is referenced, it

will tend to be referenced again soon (loops, reuse, etc.) Spatial Locality (locality in space/location): If an item is

referenced, items whose addresses are close one another tend to be referenced soon (straight line code, array access, etc.)

We can predict with reasonable accuracy what instructions and data a program will use in the near future based on its accesses in the past.

A program spends 90% of its execution time in only 10% of the code.

Principle of Locality

Page 53: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Some “Misleading” Performance Measures

There are certain computer performance measures which are famous with computer manufactures and sellers – but may be misleading.

MIPS (Million Instructions Per Second)

MIPS depends on the instruction set to make it difficult to compare MIPS of different instructions.

MIPS varies between programs on the same computer – different programs use different instruction mix.

MIPS can vary inversely to performance –most importantly.

Page 54: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Some “Misleading” Performance Measures

MFLOPS: Focus on one type of work MFLOPS (Million Floating-point Operations Per Second) depends on the

program. Must be FP intensive.

MFLOPS depends on the computer as well.

The floating-point operations vary in complexity (e.g., add & divide).

Peak Performance: Performance that the manufacture guarantees you won’t exceed

Difference between peak performance and average performance is huge.

Peak performance is not useful in predicting observed performance.

Page 55: Chapter 1 Fundamentals of Computer Design 吳俊興 高雄大學資訊工程學系 September 2004 EEF011 Computer Architecture 計算機結構

Summary1.1 Introduction1.2 The Changing Face of Computing and the Task of

the Computer Designer1.3 Technology Trends1.4 Cost, Price, and Their Trends1.5 Measuring and Reporting Performance

– Benchmarks– (Weighted) Arithmetic Mean, Normalized Geometric Mean

1.6 Quantitative Principles of Computer Design– Amdahl’s Law– CPU Performance: CPU Time, CPI– Locality