Chapter 1: An Introduction to Processor Design (Pusan National University, Department of Computer Engineering)


Upload: tracey-bates

Post on 13-Jan-2016



Chapter 1: An Introduction to Processor Design

Pusan National University, Department of Computer Engineering


23年 4月 21日 PNU Computer Eng. 2

1.1 Processor Architecture & Organization

All modern general-purpose computers employ the "stored program" concept:
- IAS computer, designed by von Neumann at the Institute for Advanced Study, Princeton (1946)
- First implemented in the 'Baby' machine at the University of Manchester, England (1948)

[Figure 1.1] The state in a stored-program digital computer (a processor with registers exchanges addresses, instructions and data with a memory that holds instructions and data at addresses 00..00 hex to FF..FF hex)


1.1 Processor Architecture & Organization

50 years of development:
- Processor performance has risen and cost has fallen, giving cost-effective computers (the principles of operation have not changed much)

Most of the improvements:
- Advances in electronics technology: vacuum tubes -> transistors -> ICs -> VLSI (millions of transistors on a single chip)

New insights:
- Virtual memory (early 1960s)
- Cache memory
- Pipelining
- RISC

1.2 Abstraction in Hardware Design

Transistors (the elementary component)
- Logically act as inverters

Logic gates
- CMOS NAND gate (using 4 transistors)
  - If A = B = Vdd, output = Vss
  - If either A or B (or both) = Vss, output = Vdd
  => output = not(A.B)

[Figure] CMOS NAND gate: transistor circuit (inputs A and B, supply rails Vdd and Vss), logic symbol, truth table

Truth table

A   B   Output
0   0   1
0   1   1
1   0   1
1   1   0
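The NAND behaviour above can be checked with a short sketch. The function name `nand` and the use of the integers 0 and 1 for Vss and Vdd are illustrative assumptions, not something the slides define.

```python
def nand(a: int, b: int) -> int:
    """Return not(A.B), with 0 standing for Vss and 1 for Vdd (an assumed encoding)."""
    return 0 if (a == 1 and b == 1) else 1


# Rebuild the truth table row by row, in the same order as the slide.
truth_table = [(a, b, nand(a, b)) for a in (0, 1) for b in (0, 1)]

for a, b, out in truth_table:
    print(a, b, out)
```

Working with a function like this, rather than with four transistors, is the gate abstraction in miniature: the caller only needs the input/output behaviour.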


1.2 Abstraction in Hardware Design

The gate abstraction
- Simplifies the process of designing circuits with a great number of transistors
- Removes the need to know that the gate is built from transistors
- At the function level, the design is free from the implementation technology
  - E.g. field-effect transistors, bipolar transistors, etc.
  - However, performance differences exist

Levels of abstraction
- Transistors
- Gates, memory cells
- Adders, multiplexers, decoders, registers
- ALUs, shifters, memory blocks
- Processors, peripherals, memories
- ICs
- PCBs
- PCs, controllers, mobile phones


1.3 MU0 – a simple processor

A simple form of processor can be built from a few basic components:
- PC (program counter)
- ACC (accumulator)
- ALU (arithmetic-logic unit)
- IR (instruction register)
- Instruction decoder, control logic

The MU0 instruction set
- A 16-bit machine with a 12-bit address space (4K x 2 bytes: 8 Kbytes of memory)
- Instructions: 16 bits long (opcode: 4 bits, address field S: 12 bits)

[Figure] MU0 instruction format: opcode (4 bits) | S (12 bits)
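The instruction format can be illustrated with a small encode/decode sketch. It assumes the opcode occupies the four most significant bits, as the format figure suggests; the names `encode`, `decode` and `memory_bytes` are hypothetical.

```python
OPCODE_BITS = 4    # width of the opcode field
ADDRESS_BITS = 12  # width of the address field S


def encode(opcode: int, s: int) -> int:
    """Pack an opcode and an address S into one 16-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits")
    if not 0 <= s < (1 << ADDRESS_BITS):
        raise ValueError("S must fit in 12 bits")
    return (opcode << ADDRESS_BITS) | s


def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit word into (opcode, S)."""
    return (word >> ADDRESS_BITS) & 0xF, word & 0xFFF


word = encode(0b0010, 0x123)
print(hex(word), decode(word))

# A 12-bit address selects one of 4K words, each 2 bytes wide.
memory_bytes = (1 << ADDRESS_BITS) * 2
print(memory_bytes)
```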


1.3 MU0 – a simple processor

[Table 1.1] The MU0 instruction set

Instruction   Opcode   Effect
LDA S         0000     ACC := mem16[S]
STO S         0001     mem16[S] := ACC
ADD S         0010     ACC := ACC + mem16[S]
SUB S         0011     ACC := ACC - mem16[S]
JMP S         0100     PC := S
JGE S         0101     if ACC >= 0 PC := S
JNE S         0110     if ACC != 0 PC := S
STP           0111     stop
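The effects in Table 1.1 can be sketched as a tiny interpreter. This is a minimal model under stated assumptions, not the MU0 hardware: memory is word-addressed, execution starts at address 0, the PC is incremented after each fetch, arithmetic wraps to 16 bits, "ACC >= 0" is read as "bit 15 of ACC is clear", and opcodes outside the table raise an error. The name `run_mu0` is hypothetical.

```python
MASK16 = 0xFFFF  # ACC and memory words are 16 bits wide
MASK12 = 0x0FFF  # addresses are 12 bits wide


def run_mu0(program, max_steps=10_000):
    """Run a list of 16-bit words from address 0; return (acc, memory)."""
    mem = list(program) + [0] * (4096 - len(program))
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = mem[pc]                 # fetch
        pc = (pc + 1) & MASK12       # assumed: PC advances after the fetch
        opcode, s = ir >> 12, ir & MASK12
        if opcode == 0b0000:         # LDA S
            acc = mem[s]
        elif opcode == 0b0001:       # STO S
            mem[s] = acc
        elif opcode == 0b0010:       # ADD S
            acc = (acc + mem[s]) & MASK16
        elif opcode == 0b0011:       # SUB S
            acc = (acc - mem[s]) & MASK16
        elif opcode == 0b0100:       # JMP S
            pc = s
        elif opcode == 0b0101:       # JGE S: sign bit clear means ACC >= 0
            if acc & 0x8000 == 0:
                pc = s
        elif opcode == 0b0110:       # JNE S
            if acc != 0:
                pc = s
        elif opcode == 0b0111:       # STP
            return acc, mem
        else:
            raise ValueError(f"opcode {opcode:04b} is not in Table 1.1")
    raise RuntimeError("program did not stop within max_steps")


# LDA 4; ADD 5; STO 6; STP; followed by the data words 2, 3, 0
program = [0x0004, 0x2005, 0x1006, 0x7000, 2, 3, 0]
acc, mem = run_mu0(program)
print(acc, mem[6])
```

The example program adds the words at addresses 4 and 5 and stores the sum at address 6.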


1.3 MU0 – a simple processor

Datapath
- A register transfer level (RTL) design style based on registers, multiplexers, and so on

[Figure 1.5] MU0 datapath example (blocks: PC, IR, ACC, ALU, control, memory; connected by the address bus and the data bus)


Register transfer level design

Control signals:
- Enables on all of the registers
- Function select lines to the ALU
- Select control lines for the two multiplexers
- Control for a tri-state driver to send the ACC value to memory
- MEMrq (memory request) and RnW (read/write control) lines

[Figure 1.6] MU0 register transfer level organization (labels: memory, PC, IR, ACC, ALU with inputs A and B, multiplexers with inputs 0 and 1; signals IRce, PCce, ACCce, ACCoe, ALUfs, Asel, Bsel, MEMrq, RnW, ACC[15], ACCz, opcode)


1.4 Instruction set design

To build a high-performance processor (beyond the MU0 instruction set), instruction set design is important.

4-address instructions (the most general form)
- Ex) add d, s1, s2, next_i ; d := s1 + s2
- Format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) | next_i addr. (n bits)

3-address instructions
- Make the address of the next instruction implicit by using the PC (except for branches)
- Ex) add d, s1, s2 ; d := s1 + s2
- Format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits)


1.4 Instruction set design

2-address instructions
- Make the destination register the same as one of the source registers
- Ex) add d, s1 ; d := d + s1
- Format: function (f bits) | op 1 addr. (n bits) | dest. addr. (n bits)

1-address instructions
- The accumulator (AC) is used as the destination
- Ex) add s1 ; AC := AC + s1
- Format: function (f bits) | op 1 addr. (n bits)

0-address instructions (using a stack)
- Ex) add ; tos := tos + next on stack
- Format: function (f bits)
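The formats for the 4-, 3-, 2-, 1- and 0-address cases differ only in how many n-bit address fields follow the f-bit function field, so the instruction length is f + k x n bits for k addresses. A minimal sketch follows; the values f = 4 and n = 12 are borrowed from MU0 purely for illustration, and `instruction_bits` is a hypothetical helper name.

```python
def instruction_bits(f: int, n: int, addresses: int) -> int:
    """Length in bits of an instruction with an f-bit function field
    and `addresses` address fields of n bits each."""
    return f + addresses * n


# Illustrative values only: a 4-bit function field and 12-bit addresses.
widths = {k: instruction_bits(4, 12, k) for k in (4, 3, 2, 1, 0)}

for k, bits in widths.items():
    print(f"{k}-address: {bits} bits")
```

With these values the 1-address case comes to 16 bits, which matches the MU0 format of a 4-bit opcode plus a 12-bit address.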


1.4 Instruction set design

Addressing modes
- Immediate addressing: the instruction contains the data itself
- Absolute addressing: the instruction contains the full address of the data
- Indirect addressing: the instruction contains the address of a location that contains the address of the data
- Register addressing: the data is in a register
- Register indirect addressing
- Index addressing
- Stack addressing
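The first four modes in the list can be sketched as different ways of turning an instruction field into an operand. The sketch also includes register indirect addressing using its conventional meaning (the register holds the address of the data), which the slide names but does not define. The function `fetch_operand`, the mode strings, and the sample memory and register contents are all hypothetical.

```python
def fetch_operand(mode: str, field: int, memory: dict, registers: list) -> int:
    """Return the operand selected by `field` under the given addressing mode."""
    if mode == "immediate":          # the field is the data itself
        return field
    if mode == "absolute":           # the field is the address of the data
        return memory[field]
    if mode == "indirect":           # the field addresses a location holding the address
        return memory[memory[field]]
    if mode == "register":           # the field is a register number
        return registers[field]
    if mode == "register_indirect":  # conventional meaning: the register holds the address
        return memory[registers[field]]
    raise ValueError(f"unsupported mode: {mode}")


memory = {0x10: 0x20, 0x20: 99}  # sample contents, chosen for illustration
registers = [0, 0x20]

for mode, field in [("immediate", 0x10), ("absolute", 0x10), ("indirect", 0x10),
                    ("register", 1), ("register_indirect", 1)]:
    print(mode, fetch_operand(mode, field, memory, registers))
```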


1.4 Instruction set design

Control flow instructions
- Branch, jump
- Conditional branch
- Subroutine calls & returns
- System calls: branch to an operating system routine
- Exceptions: error handling


1.5 Processor design trade-offs

CISC vs RISC

CISC
- Aims to reduce the semantic gap between high-level languages and machine instructions
- Instructions perform complex sequences of operations
- Intended to make the compiler's job easy

RISC
- RISC is ARM's middle name
- Reducing the semantic gap is not the right way to make an efficient computer

[Table 1.3] Typical dynamic instruction usage

Instruction type        Dynamic usage
Data movement           43%
Control flow            23%
Arithmetic operations   15%
Comparisons             13%
Logical operations      5%
Other                   1%


1.5 Processor design trade-offs

- Data movement between registers and memory: almost half
- Control flow such as branches and procedure calls: almost a quarter
- Arithmetic operations: only 15%
  - Complex arithmetic instructions do not help much
- The most important techniques for making processors go faster: pipelining and cache memory
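Why complex arithmetic instructions do not help much can be shown with a rough calculation. The sketch below assumes every instruction type takes the same time, so the 15% dynamic usage is also 15% of execution time; that simplification is not from the slides. The helper name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


ARITHMETIC_FRACTION = 0.15  # from the dynamic usage figures

for factor in (2, 10, 1e9):
    print(f"arithmetic {factor:g}x faster -> overall {overall_speedup(ARITHMETIC_FRACTION, factor):.3f}x")
```

Under this assumption, doubling the speed of arithmetic gives an overall gain of about 8%, and even arbitrarily fast arithmetic cannot exceed 1 / 0.85, roughly 1.18x.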


1.5 Processor design trade-offs

Pipelines
1. Fetch
2. Decode
3. REG: get operands from the register bank
4. ALU
5. MEM: access memory for an operand, if necessary
6. RES: write the result back to the register bank

[Figure 1.13] Pipelined instruction execution (instructions 1 to 3 plotted against time; each passes through fetch, dec, reg, ALU, mem, res, overlapping with its neighbours)
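The overlap shown in the pipeline figure can be sketched as follows. The sketch assumes one cycle per stage, one new instruction entering per cycle, and no stalls; under those assumptions n instructions finish in (stages + n - 1) cycles. The names `total_cycles` and `diagram` are hypothetical.

```python
STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def total_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles needed to complete n instructions in an ideal, stall-free pipeline."""
    if n_instructions <= 0:
        return 0
    return n_stages + n_instructions - 1


def diagram(n_instructions: int) -> list[str]:
    """One text row per instruction, each shifted one stage to the right."""
    rows = []
    for i in range(n_instructions):
        cells = ["     "] * i + [f"{stage:<5}" for stage in STAGES]
        rows.append(f"{i + 1}: " + " ".join(cells).rstrip())
    return rows


print("\n".join(diagram(3)))
print("cycles:", total_cycles(3))
```

Without pipelining the same three instructions would occupy 3 x 6 = 18 stage-times, which is the motivation for overlapping them.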


1.5 Processor design trade-offs

Pipeline hazards
- Read-after-write hazard (data hazard): the result from one instruction is used as an operand by the next instruction
  => instruction 2 must stall until the result is available

[Figure 1.14] Read-after-write pipeline hazard (instructions 1 and 2 plotted against time; instruction 2 stalls)
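The length of the stall can be estimated with a small model of the six-stage pipeline. The model rests on assumptions that the slides do not state: there is no forwarding, the producer writes its result in the res stage (stage 6), the consumer reads its operands in the reg stage (stage 3), and the read must happen in a later cycle than the write. The function name `raw_stall_cycles` is hypothetical.

```python
def raw_stall_cycles(distance: int = 1, read_stage: int = 3, write_stage: int = 6) -> int:
    """Stall cycles for a consumer issued `distance` instructions after its producer."""
    if distance < 1:
        raise ValueError("the consumer must follow the producer")
    # The producer is fetched in cycle 1, so its stage k occupies cycle k.
    write_cycle = write_stage
    # The consumer is fetched in cycle 1 + distance, so without stalling
    # it would reach the read stage in this cycle.
    natural_read_cycle = distance + read_stage
    # Assumed rule: the register read must come strictly after the write.
    earliest_read_cycle = write_cycle + 1
    return max(0, earliest_read_cycle - natural_read_cycle)


for d in (1, 2, 3, 4):
    print(f"consumer {d} instruction(s) later: stall {raw_stall_cycles(d)} cycle(s)")
```

Under these assumptions an immediately following consumer stalls for three cycles, and the stall disappears once three unrelated instructions separate the pair.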


1.5 Processor design trade-offs

Branch hazard
- Solutions:
  - Compute the branch target earlier (if possible)
  - The target may be computed speculatively
  - Delayed branch

[Figure 1.15] Pipelined branch behavior (instructions 1 to 5 plotted against time, each passing through fetch, dec, reg, ALU, mem, res; instruction 1 is the branch and instruction 5 is the branch target)

Pipeline efficiency
- The deeper the pipeline, the worse the problems get; the RISC approach is better


1.6 RISC

In 1980, Patterson: the RISC I project

RISC I architecture
- Fixed (32-bit) instruction size with few formats
- Load-store architecture:
  - Instructions that process data operate only on registers
  - Separate instructions access memory
- A large register bank (thirty-two 32-bit registers) to allow the load-store architecture to operate efficiently

RISC I organization
- Hard-wired instruction decode logic
- Pipelined execution
- Single-cycle execution

RISC I advantages
- A smaller die size
- A shorter development time
- Higher performance (controversial)