Chapter 1 An Introduction to Processor Design
Pusan National University, Dept. of Computer Engineering
April 21, 2023, PNU Computer Eng.
1.1 Processor Architecture & Organization
All modern general-purpose computers employ the “stored-program concept”:
- it originated with von Neumann's IAS computer at the Institute for Advanced Study, Princeton (1946);
- it was first implemented in the ‘Baby Machine’ at the University of Manchester, England (1948).
[Figure 1.1] The state in a stored-program digital computer
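(Figure: the processor, which holds the registers, sends addresses to memory and exchanges instructions and data with it; the memory, spanning addresses 00..00₁₆ to FF..FF₁₆, holds both instructions and data.)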
Over 50 years of development, processor performance has risen and cost has fallen, giving ever more cost-effective computers, yet the principles of operation have not changed much. Most of the improvements come from:
- advances in electronics technology: vacuum tubes -> transistors -> ICs -> VLSI;
- new insights: virtual memory (early 1960s), cache memory, pipelining, RISC.
1.2 Abstraction in Hardware Design
Transistors: the elementary components; logically, they act as inverters.
Logic gates: e.g. the CMOS NAND gate, built from 4 transistors.
- If A = B = Vdd, the output is Vss.
- If either A or B (or both) is Vss, the output is Vdd.
- Hence output = not(A.B).
(Figure: the NAND gate's transistor circuit between Vdd and Vss with inputs A and B, and its logic symbol with inputs A, B and output not(A.B).)
Truth table:
A B Output
0 0 1
0 1 1
1 0 1
1 1 0
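As a quick check of the truth table, the gate can be modelled as a Boolean function. This is a minimal Python sketch, assuming logic 1 stands for Vdd and logic 0 for Vss; the name `nand` is illustrative only.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """CMOS NAND behaviour: output is 0 (Vss) only when A = B = 1 (Vdd)."""
    return 0 if (a == 1 and b == 1) else 1


# Rows in the same order as the truth table: (A, B, Output).
TRUTH_TABLE = [(a, b, nand(a, b)) for a, b in product((0, 1), repeat=2)]

if __name__ == "__main__":
    print("A B Output")
    for a, b, out in TRUTH_TABLE:
        print(a, b, out)
```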
The gate abstraction:
- simplifies the design of circuits with a great number of transistors;
- removes the need to know that a gate is built from transistors;
- makes the design independent of the implementation technology at the functional level (e.g. field-effect or bipolar transistors), although performance differs between technologies.
Levels of abstraction, from lowest to highest: transistors; gates, memory cells; adders, MUXes, decoders, registers; ALUs, shifters, memory blocks; processors, peripherals, memories; ICs; PCBs; PCs, controllers, mobile phones.
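To make the climb through these levels concrete, here is a minimal sketch that treats NAND as the only primitive and builds other gates, then a 1-bit full adder, then a multi-bit adder, without ever mentioning transistors. All function names are illustrative; the 16-bit default width is an assumption chosen to match MU0's word size.

```python
from __future__ import annotations


def nand(a: int, b: int) -> int:
    # Gate level: the only primitive. Whether it is built from field-effect
    # or bipolar transistors is invisible from here.
    return 1 - (a & b)


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: A + B = not(not A . not B)
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Four-NAND exclusive-OR.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One level up: a 1-bit adder made only of gates. Returns (sum, carry_out)."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out


def ripple_add(x: int, y: int, width: int = 16) -> int:
    """Another level up: a width-bit adder chained from full adders (wraps modulo 2**width)."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result


if __name__ == "__main__":
    print(ripple_add(1234, 4321))  # 5555
```

Each function only uses the level directly beneath it, which is the point of the abstraction: the adder is correct regardless of how `nand` is realised.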
1.3 MU0 – a simple processor
A simple processor can be built from a few basic components:
- PC (program counter)
- ACC (accumulator)
- ALU (arithmetic-logic unit)
- IR (instruction register)
- instruction decoder and control logic
The MU0 instruction set:
- a 16-bit machine with a 12-bit address space (4K words x 2 bytes = 8K bytes of memory);
- instructions are 16 bits long: a 4-bit opcode and a 12-bit address field S.
Instruction format: opcode (4 bits) | S (12 bits)
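The instruction format can be expressed as bit packing. A minimal sketch, assuming the opcode occupies the top 4 bits (15..12) and S the low 12 bits (11..0); the opcode values are those listed in Table 1.1, and `encode`/`decode` are hypothetical helper names.

```python
from __future__ import annotations

# Opcodes as listed in Table 1.1.
OPCODES = {"LDA": 0x0, "STO": 0x1, "ADD": 0x2, "SUB": 0x3,
           "JMP": 0x4, "JGE": 0x5, "JNE": 0x6, "STP": 0x7}

ADDRESS_BITS = 12
WORD_BYTES = 2
MEMORY_BYTES = (1 << ADDRESS_BITS) * WORD_BYTES  # 4K words x 2 bytes = 8K bytes


def encode(mnemonic: str, s: int = 0) -> int:
    """Pack a 4-bit opcode (bits 15..12) and a 12-bit address S (bits 11..0)."""
    if not 0 <= s < (1 << ADDRESS_BITS):
        raise ValueError("S must fit in 12 bits")
    return (OPCODES[mnemonic] << ADDRESS_BITS) | s


def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, S)."""
    return (word >> ADDRESS_BITS) & 0xF, word & 0xFFF


if __name__ == "__main__":
    word = encode("ADD", 0x00A)
    print(f"{word:04X}", decode(word))  # 200A (2, 10)
```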
[Table 1.1] The MU0 instruction set
Instruction  Opcode  Effect
LDA S        0000    ACC := mem16[S]
STO S        0001    mem16[S] := ACC
ADD S        0010    ACC := ACC + mem16[S]
SUB S        0011    ACC := ACC - mem16[S]
JMP S        0100    PC := S
JGE S        0101    if ACC >= 0 then PC := S
JNE S        0110    if ACC != 0 then PC := S
STP          0111    stop
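The table fully specifies MU0's behaviour, so a small interpreter can follow it line by line. This is a minimal sketch, not the reference MU0 design: it assumes the opcode/S split above, a 16-bit wrapping ACC, two's-complement sign testing for JGE (ACC >= 0 when bit 15 is clear), and execution starting at address 0. Program and data share one memory, which is the stored-program concept from Section 1.1.

```python
from __future__ import annotations

MASK16 = 0xFFFF
MEM_WORDS = 4096  # 12-bit address space


def run_mu0(program: list[int], max_steps: int = 10_000) -> list[int]:
    """Run MU0 code from address 0 until STP and return the final memory image."""
    if len(program) > MEM_WORDS:
        raise ValueError("program does not fit in 4K words")
    mem = list(program) + [0] * (MEM_WORDS - len(program))  # one memory: code and data
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = mem[pc]                     # fetch
        pc = (pc + 1) % MEM_WORDS        # PC now points at the next instruction
        op, s = ir >> 12, ir & 0xFFF     # decode
        if op == 0x0:                    # LDA S: ACC := mem16[S]
            acc = mem[s]
        elif op == 0x1:                  # STO S: mem16[S] := ACC
            mem[s] = acc
        elif op == 0x2:                  # ADD S: ACC := ACC + mem16[S]
            acc = (acc + mem[s]) & MASK16
        elif op == 0x3:                  # SUB S: ACC := ACC - mem16[S]
            acc = (acc - mem[s]) & MASK16
        elif op == 0x4:                  # JMP S: PC := S
            pc = s
        elif op == 0x5:                  # JGE S: ACC >= 0 means bit 15 is clear
            if (acc & 0x8000) == 0:
                pc = s
        elif op == 0x6:                  # JNE S: if ACC != 0 then PC := S
            if acc != 0:
                pc = s
        elif op == 0x7:                  # STP: stop
            return mem
        else:
            raise ValueError(f"undefined opcode {op:#x}")
    raise RuntimeError("no STP within max_steps")


# Example: total := n + (n-1) + ... + 1, with n = 5 (data layout is hypothetical).
SUM_DOWN = [
    0x000A,  # 0: LDA 10   ACC := total
    0x200B,  # 1: ADD 11   ACC := total + n
    0x100A,  # 2: STO 10   total := ACC
    0x000B,  # 3: LDA 11
    0x300C,  # 4: SUB 12   ACC := n - 1
    0x100B,  # 5: STO 11   n := ACC
    0x6000,  # 6: JNE 0    loop while n != 0
    0x7000,  # 7: STP
    0, 0,    # 8, 9: unused
    0,       # 10: total
    5,       # 11: n
    1,       # 12: constant 1
]

if __name__ == "__main__":
    print(run_mu0(SUM_DOWN)[10])  # 15
```

Note that the loop counter, the running total and the constant 1 all live in the same memory as the instructions that use them.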
Datapath
A register transfer level (RTL) design style, built from registers, multiplexers (MUXes), and similar blocks.
[Figure 1.5] MU0 datapath example
(Figure: IR, PC, ACC, and ALU connected to memory through the address bus and data bus, with a control block.)
RTL design
[Figure 1.6] MU0 register transfer level organization
Control signals:
- enables on all registers (IRce, PCce, ACCce);
- function-select lines to the ALU (ALUfs);
- select lines for the two MUXes (Asel, Bsel);
- control for a tri-state driver that sends the ACC value to memory (ACCoe);
- memory request (MEMrq) and read/write control (RnW).
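(Figure: MU0 RTL organization showing memory, PC, IR, ACC, and an ALU with inputs A and B; MUXes controlled by Asel and Bsel; and control logic driven by the IR opcode and the status signals ACC[15] (accumulator sign bit) and ACCz (accumulator is zero).)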
1.4 Instruction set design
To build a processor with higher performance than MU0, instruction set design becomes important.
- 4-address instructions (the most general form)
  e.g. add d, s1, s2, next_i; d := s1 + s2
  Format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) | next_i addr. (n bits)
- 3-address instructions: the address of the next instruction is made implicit by using the PC (except for branches)
  e.g. add d, s1, s2; d := s1 + s2
  Format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits)
- 2-address instructions: the destination register is made the same as one of the source registers
  e.g. add d, s1; d := d + s1
  Format: function (f bits) | op 1 addr. (n bits) | dest. addr. (n bits)
- 1-address instructions: the accumulator (AC) is used as the destination
  e.g. add s1; AC := AC + s1
  Format: function (f bits) | op 1 addr. (n bits)
- 0-address instructions (using a stack)
  e.g. add; tos := tos + next on stack
  Format: function (f bits)
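The formats above differ mainly in width: each explicit address costs n bits on top of the f-bit function field. A minimal sketch of that arithmetic, with field sizes f = 8 and n = 16 chosen purely as hypothetical examples, plus a tiny stack machine showing `d := s1 + s2` with a 0-address `add` (the `push`/`pop` instructions it needs still name one address each).

```python
from __future__ import annotations


def instruction_bits(n_addresses: int, f: int, n: int) -> int:
    """Width of a format with an f-bit function field and n_addresses n-bit address fields."""
    return f + n_addresses * n


# Hypothetical field sizes, chosen only for illustration.
F_BITS, N_BITS = 8, 16
WIDTHS = {k: instruction_bits(k, F_BITS, N_BITS) for k in (4, 3, 2, 1, 0)}


def run_stack(program: list[tuple], memory: dict[str, int]) -> dict[str, int]:
    """Tiny stack machine: 'add' is a 0-address instruction; push/pop name one address."""
    stack: list[int] = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "pop":
            memory[args[0]] = stack.pop()
        elif op == "add":            # tos := tos + next on stack
            tos = stack.pop()
            stack.append(stack.pop() + tos)
        else:
            raise ValueError(f"unknown op {op!r}")
    return memory


if __name__ == "__main__":
    print(WIDTHS)  # {4: 72, 3: 56, 2: 40, 1: 24, 0: 8}
    # d := s1 + s2 as stack code
    print(run_stack([("push", "s1"), ("push", "s2"), ("add",), ("pop", "d")],
                    {"s1": 2, "s2": 3, "d": 0}))
```

Fewer addresses give shorter instructions, but the same work then needs more of them: one 3-address `add` becomes four stack instructions here.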
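Addressing modes
- Immediate addressing: the data value is in the instruction itself.
- Absolute addressing: the instruction contains the full address of the data.
- Indirect addressing: the instruction contains the address of a location that contains the address of the data.
- Register addressing: the data is in a register.
- Register indirect addressing: a register contains the address of the data.
- Index addressing: the address is formed by adding an index (offset) to a base address.
- Stack addressing: the data is addressed through a stack pointer.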
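The modes differ only in how the operand value is found. A minimal sketch that returns the operand for each mode, using a Python list for registers and a dict for memory; the keyword names (`imm`, `addr`, `reg`, `offset`, `sp`) are hypothetical, and modelling index mode as base register plus offset, and stack mode as "`sp` points at the top-of-stack word", are assumptions.

```python
from __future__ import annotations


def operand(mode: str, *, regs: list[int], mem: dict[int, int],
            imm: int = 0, addr: int = 0, reg: int = 0, offset: int = 0, sp: int = 0) -> int:
    """Return the operand value an instruction would obtain under each addressing mode."""
    if mode == "immediate":          # the value is in the instruction
        return imm
    if mode == "absolute":           # the instruction holds the full data address
        return mem[addr]
    if mode == "indirect":           # mem[addr] holds the address of the data
        return mem[mem[addr]]
    if mode == "register":           # the value is in a register
        return regs[reg]
    if mode == "register_indirect":  # the register holds the data address
        return mem[regs[reg]]
    if mode == "index":              # base address in a register plus an offset
        return mem[regs[reg] + offset]
    if mode == "stack":              # assumed: sp points at the top-of-stack word
        return mem[sp]
    raise ValueError(f"unknown addressing mode {mode!r}")


REGS = [0, 100, 0, 0]
MEM = {100: 7, 104: 9, 200: 100, 300: 42}

if __name__ == "__main__":
    for mode, kw in [("immediate", {"imm": 5}), ("absolute", {"addr": 300}),
                     ("indirect", {"addr": 200}), ("register", {"reg": 1}),
                     ("register_indirect", {"reg": 1}),
                     ("index", {"reg": 1, "offset": 4}), ("stack", {"sp": 300})]:
        print(mode, operand(mode, regs=REGS, mem=MEM, **kw))
```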
Control flow instructions
- Branches and jumps, including conditional branches
- Subroutine calls and returns
- System calls: branches to an operating system routine
- Exceptions: error handling
1.5 Processor design trade-offs
CISC vs RISC
- CISC: aims to reduce the semantic gap between high-level languages and machine instructions; a single instruction may perform a complex sequence of operations, to make the compiler's job easier.
- RISC (ARM's middle name comes from RISC): reducing the semantic gap is not the right way to make an efficient computer.
[Table 1.3] Typical dynamic instruction usage
Instruction type Dynamic usage
Data movement 43%
Control flow 23%
Arithmetic operations 15%
Comparisons 13%
Logical operations 5%
Other 1%
From Table 1.3:
- data movement between registers and memory accounts for almost half of all instructions executed;
- control flow (branches and procedure calls) accounts for almost a quarter;
- arithmetic operations account for only 15%.
So complex arithmetic instructions do not help much. The most important techniques for making processors go faster are pipelining and cache memory.
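One way to quantify "do not help much" is Amdahl's law, which is not part of the slides but is a standard reasoning aid. This minimal sketch assumes every instruction takes the same time, so each class's dynamic usage from Table 1.3 is treated as its share of execution time; `amdahl_speedup` and `USAGE` are illustrative names.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the execution time runs `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Dynamic usage from Table 1.3, treated (as an assumption) as a share of execution time.
USAGE = {"data movement": 0.43, "control flow": 0.23, "arithmetic": 0.15,
         "comparisons": 0.13, "logical": 0.05, "other": 0.01}

if __name__ == "__main__":
    for kind in ("arithmetic", "data movement"):
        best = amdahl_speedup(USAGE[kind], float("inf"))  # that class made infinitely fast
        print(f"{kind}: at most {best:.2f}x overall")
```

Under these assumptions, even infinitely fast arithmetic instructions give at most about 1.18x overall, while the same treatment of data movement would allow up to about 1.75x, which is consistent with the emphasis on caches and pipelining.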
Pipelines
Instruction execution is split into stages:
1. Fetch: fetch the instruction from memory
2. Decode: decode it
3. REG: get operands from the register bank
4. ALU: perform the operation
5. MEM: access memory for an operand, if necessary
6. RES: write the result back to the register bank
[Figure 1.13] Pipelined instruction execution
(Figure: instructions 1, 2 and 3 each pass through fetch, dec, reg, ALU, mem and res, with each instruction starting one cycle after the previous one, so their stages overlap in time.)
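The overlap in Figure 1.13 can be tabulated directly. A minimal sketch, assuming one stage per clock cycle and no hazards: a k-stage pipeline finishes n instructions in k + n - 1 cycles instead of k * n. The function names are illustrative.

```python
from __future__ import annotations

STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def total_cycles(n_instructions: int, pipelined: bool = True) -> int:
    """Cycles to complete n instructions, one stage per cycle and no hazards (assumed)."""
    if n_instructions == 0:
        return 0
    k = len(STAGES)
    return k + n_instructions - 1 if pipelined else k * n_instructions


def schedule(n_instructions: int) -> list[list[str]]:
    """Row i shows which stage instruction i+1 occupies in each cycle ('' = not in pipeline)."""
    width = total_cycles(n_instructions)
    rows = []
    for i in range(n_instructions):
        row = [""] * width
        for k, stage in enumerate(STAGES):
            row[i + k] = stage           # instruction i enters fetch in cycle i
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(schedule(3), start=1):
        print(i, " ".join(f"{s:5}" for s in row))
    print(total_cycles(3), "cycles pipelined vs", total_cycles(3, pipelined=False), "unpipelined")
```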
Pipeline hazards
Read-after-write (RAW) hazard, a data hazard: the result of one instruction is used as an operand by the next instruction, so instruction 2 must stall until the result is available.
[Figure 1.14] Read-after-write pipeline hazard
(Figure: instruction 2 stalls in the pipeline until instruction 1 has produced its result.)
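The stall can be counted with a simple model. This minimal sketch assumes the six stages above, operands read in REG (stage index 2), results written in RES (index 5), no forwarding, and a written value becoming readable in the following cycle; the exact stall count in a real design depends on such details. Register names are hypothetical.

```python
from __future__ import annotations

READ_STAGE, WRITE_STAGE = 2, 5  # 'reg' and 'res' in fetch, dec, reg, ALU, mem, res


def operand_read_cycles(program: list[tuple[str, list[str]]]) -> list[int]:
    """Cycle in which each instruction reads its operands, for an in-order pipeline.

    program: (destination register, [source registers]) per instruction.
    Assumed: no forwarding; a result written in RES can be read from the next cycle.
    """
    ready: dict[str, int] = {}  # register -> first cycle its pending result can be read
    reads: list[int] = []
    for dest, srcs in program:
        # Without hazards, each instruction reads one cycle after the previous one.
        cycle = READ_STAGE if not reads else reads[-1] + 1
        for src in srcs:
            cycle = max(cycle, ready.get(src, 0))  # RAW: wait for the producer
        reads.append(cycle)
        ready[dest] = cycle + (WRITE_STAGE - READ_STAGE) + 1
    return reads


def stall_cycles(program: list[tuple[str, list[str]]]) -> int:
    """Total cycles lost to RAW stalls compared with an ideal pipeline."""
    reads = operand_read_cycles(program)
    return reads[-1] - (READ_STAGE + len(program) - 1) if reads else 0


if __name__ == "__main__":
    dependent = [("r1", ["r2", "r3"]), ("r4", ["r1", "r5"])]    # inst 2 uses inst 1's result
    independent = [("r1", ["r2", "r3"]), ("r4", ["r6", "r5"])]
    print(stall_cycles(dependent), stall_cycles(independent))  # 3 0
```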
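Branch hazard
A branch's target address is known only after the branch has progressed some way down the pipeline, so the instructions fetched behind it in the meantime may have to be discarded.
Solutions:
- compute the branch target earlier (if possible); the target may be computed speculatively;
- use a delayed branch, where the instruction(s) immediately after the branch are always executed.
[Figure 1.15] Pipelined branch behavior
Pipeline efficiency: the deeper the pipeline, the worse these problems get, so the RISC approach is better.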
(Figure: pipelined execution over time of a branch (instruction 1), the instructions that follow it (2 to 4), and the branch target (instruction 5).)
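The cost of a taken branch depends on where in the pipeline its target is resolved. A minimal sketch, assuming the branch is fetched in cycle 0, its target becomes known at the end of the named stage, the target is fetched in the next cycle, and any delay slots are filled with useful instructions. Under these assumptions, resolving in the ALU stage leaves three instructions fetched between the branch and its target. `branch_penalty` is a hypothetical name.

```python
STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def branch_penalty(resolve_stage: str, delay_slots: int = 0) -> int:
    """Fetch slots wasted per taken branch in a simple in-order pipeline.

    Every instruction fetched between the branch and its target is discarded,
    except the `delay_slots` instructions that a delayed branch always executes.
    """
    fetched_in_between = STAGES.index(resolve_stage)
    return max(0, fetched_in_between - delay_slots)


if __name__ == "__main__":
    print(branch_penalty("ALU"))                 # 3: target resolved in the ALU stage
    print(branch_penalty("dec"))                 # 1: target computed earlier
    print(branch_penalty("ALU", delay_slots=1))  # 2: one slot used by a delayed branch
```

A deeper pipeline that resolves branches at a later stage index loses proportionally more slots per taken branch, which is why the problem worsens with depth.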
1.6 RISC
In 1980, Patterson started the RISC I project.
RISC I architecture:
- fixed-size (32-bit) instructions with few formats;
- a load-store architecture: instructions that process data operate only on registers, and separate instructions access memory;
- a large register bank (32 32-bit registers) to allow the load-store architecture to operate efficiently.
RISC I organization:
- hard-wired instruction decode logic;
- pipelined execution;
- single-cycle execution.
RISC I advantages:
- a smaller die size;
- a shorter development time;
- a higher performance (controversial).
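To contrast the load-store style with MU0's memory-operand `ADD S`, here is a minimal sketch of a register machine with a 32 x 32-bit register bank in which only `LD` and `ST` touch memory. The mnemonics, operand order and memory layout are hypothetical and are not RISC I's actual instruction set.

```python
from __future__ import annotations

MASK32 = 0xFFFF_FFFF


def run_load_store(program: list[tuple], mem: dict[int, int]) -> tuple[list[int], dict[int, int]]:
    """Execute a hypothetical load-store instruction list against 32 x 32-bit registers."""
    regs = [0] * 32
    for op, *args in program:
        if op == "LD":        # LD rd, addr      : rd := mem[addr]  (memory access)
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "ST":      # ST rs, addr      : mem[addr] := rs  (memory access)
            rs, addr = args
            mem[addr] = regs[rs]
        elif op == "ADD":     # ADD rd, rs1, rs2 : registers only, no memory operand
            rd, rs1, rs2 = args
            regs[rd] = (regs[rs1] + regs[rs2]) & MASK32
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs, mem


# c := a + b with a, b, c at addresses 0, 1, 2 (hypothetical layout).
PROGRAM = [("LD", 1, 0), ("LD", 2, 1), ("ADD", 3, 1, 2), ("ST", 3, 2)]

if __name__ == "__main__":
    regs, mem = run_load_store(PROGRAM, {0: 2, 1: 3, 2: 0})
    print(mem[2])  # 5
```

The same addition takes four instructions here, but each one does a single simple job, which is what makes hard-wired decode and single-cycle execution practical.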