csc317l13
TRANSCRIPT
7/30/2019 csc317l13
Computer Organization & Architecture
Lecture #13
Computer Evolution and Performance
The evolution of computers has been characterized by increasing processor speed, decreasing component size, increasing memory size, and increasing I/O capacity and speed.
One factor responsible for the great increase in processor speed is the shrinking size of the microprocessor components; this reduces the distance between components and hence increases speed. However, the true gains in speed in recent years have come from the organization of the processor, including heavy use of pipelining and parallel execution techniques and the use of speculative execution techniques, which result in the tentative execution of future instructions that might be needed. All of these techniques are designed to keep the processor busy as much of the time as possible.
A critical issue in computer system design is balancing the performance of the various elements, so that gains in performance in one area are not handicapped by a lag in other areas. In particular, processor speed has increased more rapidly than memory access time. A variety of techniques are used to compensate for this mismatch, including caches, wider data paths from memory to processor, and more intelligent memory chips.
A Brief History of Computers
The First Generation: Vacuum Tubes
ENIAC (Electronic Numerical Integrator And Computer)

- Designed by John Mauchly and John Presper Eckert
- University of Pennsylvania
- 1943 to 1946
- Developed for calculating artillery firing tables
- Generally regarded as the first electronic computer
- Enormous!!!
  o 30 tons
  o 1500 square feet of floor space
  o 18,000 tubes
  o 140 kW of power
- 5000 additions per second
- Decimal number system
- 20 accumulators, each holding a 10-digit number
- Programmed manually with switches and cables
- Disassembled in 1955

The von Neumann Machine
The task of entering and altering programs for the ENIAC was extremely tedious. The programming process could be facilitated if the program could be represented in a form suitable for storing in memory alongside the data. Then, a computer could get its instructions by reading them from memory, and a program could be set or altered by setting the values of a portion of memory.
- Developed by John von Neumann
- Princeton Institute for Advanced Studies
- 1945 to 1952
- Prototype of all subsequent general-purpose computers
- IAS computer
  o Stored-program concept
  o Main memory stores both data and instructions
  o Arithmetic and logic unit (ALU) capable of operating on binary data
  o Control unit, which interprets and executes the instructions in memory
  o Input and output (I/O) equipment operated by the control unit
Shown below is the general structure of the IAS computer:
With rare exceptions, all of today's computers have this same general structure and function and are referred to as von Neumann machines.
IAS details
- 1,000 x 40-bit words of storage, holding both data and instructions
- Each word can hold two 20-bit instructions

Shown below is the number word format:
Shown below is the instruction word format:
The control unit operates the IAS by fetching instructions from memory and executing them one at a time. The control unit and the ALU contain storage locations, called registers.
- Memory buffer register (MBR): contains a word to be stored in memory, or is used to receive a word from memory.
- Memory address register (MAR): specifies the address in memory of the word to be written from or read into the MBR.
- Instruction register (IR): contains the 8-bit opcode of the instruction being executed.
- Instruction buffer register (IBR): employed to hold temporarily the right-hand instruction from a word in memory.
- Program counter (PC): contains the address of the next instruction pair to be fetched from memory.
- Accumulator (AC) and multiplier quotient (MQ): employed to hold temporarily operands and results of ALU operations.
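The instruction word format can be sketched in code. This is a minimal illustration assuming the standard IAS layout (two 20-bit instructions per 40-bit word, each an 8-bit opcode followed by a 12-bit address); the function names are mine, not from the lecture.

```python
# Pack/unpack the IAS 40-bit instruction word: [left 20 bits | right 20 bits],
# each 20-bit half being [8-bit opcode | 12-bit address].

def pack_instruction_word(left_op, left_addr, right_op, right_addr):
    """Pack two (opcode, address) pairs into one 40-bit word."""
    assert 0 <= left_op < 256 and 0 <= right_op < 256        # 8-bit opcodes
    assert 0 <= left_addr < 4096 and 0 <= right_addr < 4096  # 12-bit addresses
    left = (left_op << 12) | left_addr      # left 20-bit instruction
    right = (right_op << 12) | right_addr   # right 20-bit instruction
    return (left << 20) | right             # full 40-bit word

def unpack_instruction_word(word):
    """Split a 40-bit word back into two (opcode, address) pairs."""
    left, right = word >> 20, word & 0xFFFFF
    return (left >> 12, left & 0xFFF), (right >> 12, right & 0xFFF)

word = pack_instruction_word(0x01, 500, 0x21, 501)
assert word < 2 ** 40
assert unpack_instruction_word(word) == ((0x01, 500), (0x21, 501))
```

The shifts and masks mirror how the IBR holds the right-hand 20 bits while the IR and MAR receive the opcode and address of the instruction being executed.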
Shown below is the expanded structure of the IAS computer:
Shown below is the IAS instruction cycle:
The IAS operates by repetitively performing an instruction cycle. Each instruction cycle consists of two subcycles.

- Fetch cycle: the opcode of the next instruction is loaded into the IR and the address portion is loaded into the MAR. This instruction may be taken from the IBR, or it can be obtained from memory by loading a word into the MBR, and then down to the IBR, IR, and MAR.
- Execute cycle: the control circuitry interprets the opcode and executes the instruction by sending out the appropriate control signals to cause data to be moved or an operation to be performed by the ALU.
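The two subcycles above can be sketched as a loop. This is a simplified illustration, not the real IAS: it stores one (opcode, address) pair per word, ignores the IBR, and uses a made-up four-instruction opcode set rather than the actual 21 instructions.

```python
# Toy fetch/execute loop in the spirit of the IAS instruction cycle.

def run(memory, pc=0, max_steps=100):
    AC = 0                        # accumulator
    for _ in range(max_steps):
        # Fetch cycle: load the word at PC into the MBR, then split it
        # into opcode (IR) and address (MAR); advance the PC.
        MBR = memory[pc]
        IR, MAR = MBR             # word stored as an (opcode, address) pair
        pc += 1
        # Execute cycle: interpret the opcode and perform the operation.
        if IR == "LOAD":          # AC <- M(X)
            AC = memory[MAR]
        elif IR == "ADD":         # AC <- AC + M(X)
            AC += memory[MAR]
        elif IR == "STOR":        # M(X) <- AC
            memory[MAR] = AC
        elif IR == "HALT":
            break
    return AC

# Program: AC = M(8) + M(9); M(10) = AC
mem = {0: ("LOAD", 8), 1: ("ADD", 9), 2: ("STOR", 10), 3: ("HALT", 0),
       8: 2, 9: 3, 10: 0}
assert run(mem) == 5 and mem[10] == 5
```

Note how every instruction, whatever its opcode, passes through the same fetch machinery; only the execute step differs.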
The IAS computer had 21 instructions, which can be grouped as follows:

- Data transfer: move data between memory and ALU registers or between two ALU registers.
- Unconditional branch: used to facilitate repetitive operations.
- Conditional branch: the branch can be made dependent on a condition, thus allowing decision points.
- Arithmetic: operations performed by the ALU.
- Address modify: permits addresses to be computed in the ALU and then inserted into instructions stored in memory.
Commercial Computers
- 1947: Eckert-Mauchly Computer Corporation formed to manufacture computers commercially
- 1950: UNIVAC I (Universal Automatic Computer) commissioned by the Bureau of the Census
  o First successful commercial computer
  o Scientific and commercial applications
- Eckert-Mauchly Computer Corporation became part of the UNIVAC division of Sperry-Rand Corporation
- Late 1950s: UNIVAC II released with greater memory capacity and higher performance than UNIVAC I; upward compatible
- IBM: major manufacturer of punched-card processing equipment
- 1953: IBM 701, IBM's first electronic stored-program computer
  o Scientific applications
- 1955: IBM 702 introduced
  o Business applications
- The IBM 700/7000 series established IBM as the overwhelmingly dominant computer manufacturer
The Second Generation: Transistors
- Transistors replaced vacuum tubes
  o Smaller
  o Cheaper
  o Less heat
  o Same functionality
  o Solid-state device made from silicon (sand)
- Invented at Bell Labs in 1947
- Fully transistorized computers commercially available in the late 1950s
- NCR and RCA first to produce small transistor machines
- IBM 7000 series
- Digital Equipment Corporation (DEC) PDP-1
- High-level programming languages
- Provision of system software with computers
Third Generation: Integrated Circuits
- A single, self-contained transistor is a discrete component
- Manufacturing with discrete components was very expensive and cumbersome
- Early second-generation computers contained about 10,000 transistors, expanding to hundreds of thousands in newer machines
- 1958: integrated circuit invented
- IBM System/360
- DEC PDP-8

Microelectronics

- Means "small electronics"
- A computer consists of logic gates, memory cells, and interconnections
- Manufactured on a semiconductor such as silicon
- Many transistors can be produced on a single wafer of silicon

Shown below is the relationship between wafer, chip, and gate:
The table below shows a summary of technology generations:
Generation Dates Technology Speed (ops per sec)
1 1946-1957 Vacuum Tube 40,000
2 1958-1964 Transistor 200,000
3 1965-1971 SSI and MSI 1,000,000
4 1972-1977 LSI 10,000,000
5 1978- VLSI 100,000,000
Moore's Law - Gordon Moore, cofounder of Intel, 1965

- The number of transistors on a chip will double every year
- Since the 1970s the number of transistors has doubled every 18 months
- The cost of a chip has remained virtually unchanged, so the cost of computer logic and memory circuitry has fallen at a dramatic rate
- Higher packing density means shorter electrical paths and increased operating speed
- Computers become smaller and available in more environments
- Reduced power and cooling requirements
- Fewer interconnections increase reliability

Shown below is the growth in CPU transistor count:
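The doubling rate above translates directly into a growth formula: with one doubling every 18 months, transistor count grows by a factor of 2^(months/18). A quick check of the arithmetic:

```python
# Growth factor implied by Moore's law at a given doubling period.

def transistor_growth(years, doubling_months=18):
    """Factor by which transistor count grows over `years`."""
    return 2 ** (years * 12 / doubling_months)

# 18 months is exactly one doubling.
assert transistor_growth(1.5) == 2.0

# Over a decade at the 18-month rate: roughly a 100x increase.
assert 100 < transistor_growth(10) < 102
```

This is why the transistor-count curve shown in such figures is plotted on a logarithmic axis: exponential growth appears as a straight line.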
IBM System/360 Series (see Table 2.4)

- 1964: replaced the 7000 series, with which it was not compatible
- Industry's first planned family of computers
  o Similar or identical instruction sets
  o Similar or identical operating systems (O/S)
  o Increasing speed
  o Increasing number of I/O ports (more terminal connections)
  o Increasing memory size
  o Increasing cost
- Multiplexed switch structure (see Figure 2.5)
DEC PDP-8 (see Table 2.5)

- 1964: first minicomputer (named after the miniskirt)
- Did not need an air-conditioned room
- Small enough to sit on a lab bench
- Could not do everything that a mainframe computer could
  o $16,000 versus $100,000+ for an IBM System/360
- Original equipment manufacturers (OEMs) would integrate the PDP-8 as part of an integrated system package
- Introduced the bus structure that is virtually universal for all minicomputers and microcomputers
  o Omnibus: 96 signal paths carrying control, address, and data signals

Shown below is the Omnibus:
Semiconductor Memory
- 1950s and 1960s: core memory
- Tiny rings of ferromagnetic material strung on grids of fine wire suspended on small screens inside the computer
- Magnetized one way for a one, and the other way for a zero
- Relatively fast: 1 millionth of a second to read a stored bit
- Expensive and bulky
- Destructive read
  o Data erased during read
  o Extra circuits required to restore data after a read
- 1970: Fairchild produced a semiconductor memory chip the size of a single core holding 256 bits
- Nondestructive read
- Much faster than core: 70 billionths of a second to read a stored bit
- Cost initially much higher than core; changed in 1974
- 11 generations, each providing four times the storage density of its predecessor
Microprocessors
- 1971: Intel 4004 (4 bit)
  o First microprocessor
  o All CPU components on a single chip
  o Designed for specific applications
- 1972: Intel 8008 (8 bit)
  o Twice as complex as the 4004
  o Designed for specific applications
- 1974: Intel 8080 (8 bit)
  o First general-purpose microprocessor

Table 2.6 shows the evolution of the Intel microprocessors.
Designing for Performance
Microprocessor Speed
- Chipmakers release new generations of chips every three years, each with four times as many transistors
- Memory chips have quadrupled the capacity of dynamic random-access memory (DRAM) every three years
- Microprocessor speed boosts that come from reducing the distance between circuits have improved performance four- or fivefold every three years since Intel launched the x86 family in 1978
The raw speed of the microprocessor will not achieve its potential unless it is fed a constant stream of work to do in the form of computer instructions. While the chipmakers have been busy learning how to fabricate chips of greater and greater density, the processor designers must come up with ever more elaborate techniques for feeding the monster.
- Branch prediction: the processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
- Data flow analysis: the processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.
- Speculative execution: using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations.
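As an illustration of the branch-prediction idea above, here is one classic scheme, a 2-bit saturating counter per branch. This is a textbook example of the technique, not a claim about which predictor any particular processor uses.

```python
# 2-bit saturating counter branch predictor: the counter must be wrong
# twice in a row before the prediction flips, so a loop's single
# not-taken exit does not disturb the "taken" prediction.

class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # 0,1 = predict not taken; 2,3 = predict taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
# A loop branch: taken 8 times, not taken once at loop exit, taken 8 more.
outcomes = [True] * 8 + [False] + [True] * 8
hits = 0
for taken in outcomes:
    hits += (p.predict() == taken)
    p.update(taken)
# Misses only the first two iterations (warm-up) and the single loop exit.
assert hits == len(outcomes) - 3
```

The payoff is that the fetch unit can keep pulling instructions from the predicted path instead of stalling at every branch.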
Performance Balance
While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components.

Nowhere is the problem created by such mismatches more critical than in the interface between processor and main memory.
Shown below is the evolution of DRAM and Processor Characteristics:
While processor speed and memory capacity have grown rapidly, the speed with which data can be transferred between main memory and the processor has not.

The interface between processor and main memory is the most critical pathway in the entire computer, because it is responsible for carrying a constant flow of program instructions and data between memory chips and the processor.
Shown below are the trends in DRAM use:
The amount of main memory is going up, but DRAM density is going up faster. The net result is that, on average, the number of DRAMs per system is going down. This has an effect on transfer rates, because there is less opportunity for parallel transfer of data.
There are a number of ways that a system architect can attack this problem:

- Increase the number of bits that are retrieved at one time (wider data paths)
- Change the DRAM interface to include a cache or other buffering techniques
- Reduce the frequency of memory accesses by including one or more levels of cache, both on- and off-chip, between the processor and main memory
- Increase the interconnect bandwidth between processors and memory, using higher-speed buses and a hierarchy of buses to buffer and structure data flow
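The payoff of the cache-based techniques above can be estimated with the standard average memory access time formula, AMAT = hit time + miss rate x miss penalty. The cycle counts below are illustrative assumptions, not figures from the lecture.

```python
# Average memory access time: every access pays the cache hit time, and
# the fraction that miss also pays the main-memory penalty.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Assume main memory costs 50 cycles per access. Without a cache, every
# access pays the full 50 cycles; with a cache hitting 95% of accesses
# in 1 cycle, the average drops dramatically.
no_cache = 50
with_cache = amat(hit_time=1, miss_rate=0.05, miss_penalty=50)
assert with_cache == 3.5   # 1 + 0.05 * 50
```

The same formula nests for multiple cache levels: the L1 miss penalty is itself the AMAT of the L2 and below, which is why on- and off-chip cache hierarchies are so effective.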
Pentium and PowerPC Evolution
Pentium Evolution
- 8080
  o First general-purpose microprocessor
  o 8-bit data path
  o Used in the first personal computer, the Altair
- 8086
  o Much more powerful
  o 16 bit
  o Instruction cache prefetches a few instructions
  o 8088 (8-bit external bus) used in the first IBM PC
- 80286
  o 16 Mbytes of memory, up from 1 Mbyte
- 80386
  o 32 bit
  o Support for multitasking
- 80486
  o Sophisticated, powerful cache and instruction pipelining
  o Built-in math coprocessor
- Pentium
  o Introduced superscalar techniques
  o Multiple instructions executed in parallel
- Pentium Pro
  o Increased superscalar organization
  o Aggressive register renaming
  o Branch prediction
  o Data flow analysis
  o Speculative execution
- Pentium II
  o MMX technology: video, audio, graphics processing
- Pentium III
  o Additional floating-point instructions for graphics
- Pentium 4
  o The "4" is an Arabic, not Roman, numeral
  o Additional floating-point and multimedia enhancements
- Itanium
  o 64 bit
PowerPC Evolution
- 601
  o Introduced the market to the PowerPC architecture
  o 32 bit
- 603
  o Used for low-end desktop and portable computers
  o 32 bit
  o Lower cost and a more efficient implementation
- 604
  o Used for desktop computers and low-end servers
  o 32 bit
  o Used advanced superscalar design techniques
- 620
  o Used in high-end servers
  o 64 bit
- 740/750 (G3)
  o Two levels of cache in the main processor: significant performance improvement over machines with off-chip cache
- G4
  o Increases the parallelism and internal speed of the processor