Posted on 24-Dec-2015
1/ 75
Computer Structure 0368-2159
Lecture 1: Introduction
Nathan Intrator and Yehuda Afek
Teaching assistants: Hillel Avni, Noa Ben-Amos
2/ 75
What is Computer Structure?
Hardware: transistors
Logic circuits
Computer architecture
3/ 77
What we will talk about today:
Introduction: Computer Architecture
Administrative Matters
History
From conductors and electricity to basic binary operations in the computer
• Electrical voltage
• Conductors
• Silicon: a semiconductor
• Transistor
• Binary operations in electronic components
4/ 77
Computing Devices Then…
EDSAC, University of Cambridge, UK, 1949
5/ 77
Computing Devices Now
Robots
Supercomputers
Automobiles
Laptops
Set-top boxes
Games
Smart phones
Servers
Media Players
Sensor Nets
Routers
Cameras
6/ 77
What is Computer Structure?
7/ 77
8/ 77
Motherboard
9/ 77
10/ 77
The paradigm (Patterson)
Every Computer Scientist should master the “AAA”:
Architecture, Algorithms, Applications
11/ 77
Computer Architecture: GOAL
The goal of Computer Architecture: to build “cost-effective systems”
• How do we calculate the cost of a system?
• How do we evaluate the effectiveness of the system?
To optimize the system:
• What are the optimization points?
Fact: most computer systems still use the von Neumann principle of operation, even though internally they are very different from the computers of that time.
Fast, Effective and Cheap
12/ 77
Anatomy: 5 components of any Computer (since 1946)
Personal Computer
Processor
Computer
Control (“brain”)
Datapath (“brawn”)
Memory
(where programs, data live when running)
Devices
Input
Output
Keyboard, Mouse
Display, Printer
Disk (where programs, data live when not running)
13/ 77
Computer System Structure
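[Figure: block diagram of a computer system. Components shown: CPU, cache, CPU bus, bridge, memory, memory bus, I/O bus, USB hub, keyboard, mouse, scanner, LAN adapter, LAN, graphics adapter, video buffer, SCSI/IDE adapter, SCSI bus, hard disk.]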
14/ 77
The Instruction Set: a Critical Interface
[Figure: the instruction set as the interface between software (above) and hardware (below).]
15/ 77
What is “Computer Architecture”?
Computer Architecture =
Instruction Set Architecture +
Machine Organization + …
Architecture + engineering
16/ 77
Computer Structure
What are “Machine Structures”?
* Coordination of many levels (layers) of abstraction
[Figure: levels of abstraction, from top to bottom. Software: Application (ex: browser), Compiler, Assembler, Operating System (Linux, Win, ..). Instruction Set Architecture. Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; transistors. Physics.]
17/ 77
Levels of Representation
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        ↓ Compiler
Assembly Language Program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
        ↓ Assembler
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        ↓ Machine Interpretation
Control Signal Specification:
    ...
    ALUOP[0:3] <= InstReg[9:11] & MASK
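The high-level statements above form the body of a swap routine. Below is a minimal C sketch that does not come from the slides: the function name `swap` and the `size_t k` parameter are illustrative choices. The comments assume, as the MIPS code does, that register `$2` holds the address of `v[k]` and that each `int` occupies 4 bytes (hence the offsets 0 and 4).

```c
#include <stddef.h>

/* Swap v[k] and v[k+1] (illustrative name). In the MIPS version, $2 holds
 * &v[k], and 4-byte ints put v[k+1] at offset 4 from it. */
void swap(int v[], size_t k)
{
    /* lw $15, 0($2): load v[k] into a temporary register. */
    int temp = v[k];

    /* lw $16, 4($2) then sw $16, 0($2): copy v[k+1] into v[k]. */
    v[k] = v[k + 1];

    /* sw $15, 4($2): store the saved old v[k] into v[k+1]. */
    v[k + 1] = temp;
}
```

Each C statement expands to one or two machine instructions. The compiler chooses the registers (`$15`, `$16`) and the assembler turns each line into one 32-bit word.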
18/ 77
Computer Architecture’s Changing Definition
1950s to 1960s Computer Architecture Course
• Computer Arithmetic
1970s to mid 1980s Computer Architecture Course
• Instruction Set Design, especially ISA appropriate for compilers
1990s Computer Architecture Course
• Design of CPU, memory system, I/O system, Multiprocessors, Networks
2000s Computer Architecture Course
• Special purpose architectures, Functionally reconfigurable, Special considerations for low power/mobile processing
2005 – future (?): Multiprocessors, Parallelism
• Synchronization, Speed-up, How to Program ??? !!!
19/ 77
Forces on Computer Architecture
[Figure: forces acting on Computer Architecture: Technology, Programming Languages, Operating Systems, History, Applications, Cleverness.]
20/ 77
Computers in the News: Sony Playstation 2000
As reported in Microprocessor Report, Vol 13, No. 5:
• Emotion Engine: 6.2 GFLOPS, 75 million polygons per second
• Graphics Synthesizer: 2.4 billion pixels per second
• Claim: Toy Story realism brought to games!
The Playstation 3 will deliver nearly 2 teraflops overall performance, said Ken Kutaragi, president and group CEO of Sony Computer Entertainment.
21/ 77
Where are We Going??
Computer Structure
[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) vs. time, 1980 to 2000. µProc: 60%/yr (2X/1.5 yr), “Moore’s Law”; DRAM: 9%/yr (2X/10 yrs); the gap grows 50% / year.]
Arithmetic
[Figure: Booth-encoded multiplier datapath: 34-bit ALU, HI and LO registers (16x2 bits), multiplicand register, Booth encoder, Sub/Add and x2/x1 control logic.]
Single/multicycle Datapaths
Pipelining
[Figure: four overlapped instructions, each passing through the stages IFetch, Dcd, Exec, Mem, WB.]
Memory Systems
I/O
22/ 77
A slide from one of the lectures near the end of the semester
23/ 77
Course Administration
Instructors:
Yehuda Afek (afek@post.tau.ac.il)
Nathan Intrator (nin@post.tau.ac.il)
TAs: Hillel Avni (hillel.avni@gmail.com)
Noa Ben Amos (noaben4@post.tau.ac.il)
http://cs.tau.ac.il/~nin/Courses/CompStruct/CompStruct.htm
http://virtual.tau.ac.il
Books:
1. V. C. Hamacher, Z. G. Vranesic, S. G. Zaky, Computer Organization. McGraw-Hill, 1982
2. H. Taub, Digital Circuits and Microprocessors. McGraw-Hill, 1982
3. Digital Systems. Published by the Open University
4. Hennessy and Patterson, Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 1998
24/ 77
Grading:
Final exam: 80%
Exercises: 20%
7 exercises
25/ 77
Architecture & Microarchitecture Elements
Architecture:
• Register data width (8/16/32/64)
• Instruction set
• Addressing modes
• Addressing methods (Segmentation, Paging, etc...)
Microarchitecture:
• Physical memory size
• Cache size and structure
• Number of execution units, number of execution pipelines
• Branch prediction
• TLB
Timing is considered microarchitecture (though it is user visible!)
Processors with the same architecture may have different microarchitectures
26/ 77
Compatibility
Backward compatibility
– New hardware can run existing software
– Example: Pentium 4 can run software originally written for Pentium III, Pentium II, Pentium, 486, 386, 286
Forward compatibility
– New software can run on existing (old) hardware
– Example: new software written with MMX™ must still run on older Pentium processors which do not support MMX™
– Less important than backward compatibility
New ideas: architecture independent
– JIT (just-in-time compiler): Java and .NET
– Binary translation
27/ 77
How do we compare different systems?
28/ 77
Benchmarks – Programs for Evaluating Processor Performance
Toy Benchmarks
– 10-100 line programs
– e.g.: sieve, puzzle, quicksort
Synthetic Benchmarks
– Attempt to match average frequencies of real workloads
– e.g., Whetstone, Dhrystone
Real programs
– e.g., gcc, spice
SPEC: System Performance Evaluation Cooperative
– SPECint (8 integer programs)
– and SPECfp (10 floating point)
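A toy benchmark in the sense above fits in a few dozen lines. This is a sketch that assumes the sieve of Eratosthenes as the workload and standard C `clock()` as the timer (which measures processor time, not wall-clock time). The function names `sieve_count` and `time_sieve` are hypothetical.

```c
#include <stdlib.h>
#include <time.h>

/* Sieve of Eratosthenes: returns the number of primes below n,
 * or -1 if memory allocation fails. */
long sieve_count(long n)
{
    if (n < 3) return 0;                     /* no primes below 2 */
    char *composite = calloc((size_t)n, 1);  /* composite[i] != 0 => i is not prime */
    if (composite == NULL) return -1;

    long count = 0;
    for (long i = 2; i < n; i++) {
        if (composite[i]) continue;
        count++;
        if (i > (n - 1) / i) continue;       /* i*i >= n: nothing left to mark, no overflow */
        for (long j = i * i; j < n; j += i)  /* smaller multiples were marked earlier */
            composite[j] = 1;
    }
    free(composite);
    return count;
}

/* Run the toy benchmark once; returns elapsed processor time in seconds
 * and stores the prime count in *primes_out (if non-NULL). */
double time_sieve(long n, long *primes_out)
{
    clock_t start = clock();
    long c = sieve_count(n);
    clock_t end = clock();
    if (primes_out) *primes_out = c;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_sieve(100000, &p)` sets `p` to 9592 and returns the elapsed processor time. Like any toy benchmark, that timing says little about real workloads.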
29/ 77
CPI – to compare systems with the same instruction set architecture (ISA)
The CPU is synchronous: it works according to a clock signal.
• Clock cycle is measured in nsec (10^-9 of a second).
• Clock rate (= 1/clock cycle) is measured in MHz (10^6 cycles/second).
CPI: cycles per instruction
• Average #cycles per instruction (in a given program)
• IPC (= 1/CPI): instructions per cycle
Clock rate is mainly affected by technology, CPI by the architecture
CPI breakdown: how many cycles (on average) the program spends on different causes, e.g., executing, memory I/O, etc.
$$\text{CPI} = \frac{\#\text{cycles required to execute the program}}{\#\text{instructions executed in the program}}$$
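The CPI formula is a weighted average over the instruction classes a program executes. This is a minimal sketch; the struct `instr_class`, the function `cpi`, and the instruction mix in the usage note are hypothetical and serve only as an illustration.

```c
#include <stddef.h>

/* One class of instructions in a program's dynamic instruction mix. */
struct instr_class {
    const char *name;
    double count;   /* number of instructions of this class executed */
    double cycles;  /* cycles per instruction of this class */
};

/* CPI = total cycles / total instructions executed. */
double cpi(const struct instr_class *mix, size_t n)
{
    double cycles = 0.0, instrs = 0.0;
    for (size_t i = 0; i < n; i++) {
        cycles += mix[i].count * mix[i].cycles;
        instrs += mix[i].count;
    }
    return instrs > 0.0 ? cycles / instrs : 0.0;
}
```

Take a hypothetical mix of 50 ALU instructions at 1 cycle, 20 loads and 10 stores at 2 cycles, and 20 branches at 3 cycles. For that mix `cpi` returns 170 / 100 = 1.7, so IPC is about 0.59. Each term `count * cycles / total instructions` is that class's share of the CPI breakdown.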
31/ 77
CPU Time
CPU Time: the time required by the CPU to execute a given program:
$$\text{CPU Time} = \text{clock cycle} \times \#\text{cycles} = \text{clock cycle} \times \text{CPI} \times \text{IC}$$
Our goal: minimize CPU Time
– Minimize clock cycle: more MHz (process, circuit, microarchitecture)
– Minimize CPI: microarchitecture (e.g.: more execution units)
– Minimize IC: architecture (e.g.: MMX™ technology)
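The CPU Time equation can be evaluated directly. The sketch below uses a hypothetical helper `cpu_time_seconds`. The clock rate, CPI and instruction count in the usage note are illustrative, not measured.

```c
/* CPU Time = clock cycle x CPI x IC.
 * clock_rate_hz is cycles per second, so clock cycle = 1 / clock_rate_hz. */
double cpu_time_seconds(double clock_rate_hz, double cpi, double instruction_count)
{
    double clock_cycle_s = 1.0 / clock_rate_hz;
    return clock_cycle_s * cpi * instruction_count;
}
```

At 500 MHz (a 2 ns clock cycle), CPI 1.7 and 10^9 instructions, the result is 3.4 s. Halving any one of the three factors halves the CPU time.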
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
32/ 77
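Amdahl’s Law
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected (F = Fraction_enhanced, S = Speedup_enhanced); then:
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}\right]$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$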
33/ 77
Amdahl’s Law: Example
• Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$
$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
Corollary: Make The Common Case Fast
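Amdahl's Law translates directly into code. The sketch below uses a hypothetical function `amdahl_speedup`. Like the example above, it assumes that the 10% of FP instructions account for 10% of the execution time.

```c
/* Amdahl's Law: overall speedup when a fraction (0..1) of the execution
 * time is sped up by speedup_enhanced and the remainder is unaffected. */
double amdahl_speedup(double fraction_enhanced, double speedup_enhanced)
{
    /* ExTime_new / ExTime_old */
    double new_time = (1.0 - fraction_enhanced)
                    + fraction_enhanced / speedup_enhanced;
    return 1.0 / new_time;
}
```

`amdahl_speedup(0.1, 2.0)` gives 1/0.95 ≈ 1.053. Even an essentially infinite FP speedup (`amdahl_speedup(0.1, 1e9)`) stays below 1/0.9 ≈ 1.11, which is why the common case matters most.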
34/ 77
Instruction Set Design
[Figure: the instruction set as the interface between software (above) and hardware (below).]
The ISA is what the user and the compiler see
The ISA is what the hardware needs to implement
35/ 77
Why is the ISA important?
Code size
• Long instructions may take more time to be fetched
• Requires large memory (important in small devices, e.g., cell phones)
Number of instructions (IC)
• Reducing IC reduces execution time (assuming the same CPI and frequency)
Code “simplicity”
• Simple HW implementation, which leads to higher frequency and lower power
• Code optimization can be applied more easily to “simple code”
36/ 77
The impact of the ISA
RISC vs CISC
37/ 77
CISC Processors
CISC - Complex Instruction Set Computer
The idea: a high-level machine language
Characteristics:
• Many instruction types, with many addressing modes
• Some of the instructions are complex:
- Perform complex tasks
- Require many cycles
• ALU operations directly on memory
- Usually uses a limited number of registers
• Variable length instructions
- Common instructions get short codes, which saves code length
Example: x86
38/ 77
CISC Drawbacks
Compilers do not take advantage of the complex instructions and the complex indexing methods
Implementing complex instructions and complex addressing modes complicates the processor, which slows down the simple, common instructions
This contradicts Amdahl’s law corollary:
Make The Common Case Fast
Variable length instructions are a real pain in the neck:
• It is difficult to decode several instructions in parallel
- As long as an instruction is not decoded, its length is unknown
→ It is unknown where the instruction ends
→ It is unknown where the next instruction starts
• An instruction may not fit into the “right behavior” of the memory hierarchy (to be discussed in later lectures)
Examples: VAX, x86 (!?!)
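A toy model makes the decoding problem visible. This sketch uses a hypothetical encoding, not the real VAX or x86 formats: the low two bits of an instruction's first byte give its total length, from 1 to 4 bytes. To locate instruction `i`, the decoder must work out the lengths of all earlier instructions one after another. With fixed 4-byte instructions, the offset is simply 4 × i.

```c
#include <stddef.h>

/* Hypothetical encoding: total length in bytes = (first byte & 0x3) + 1. */
static size_t insn_length(unsigned char first_byte)
{
    return (size_t)(first_byte & 0x3u) + 1;
}

/* Variable length: the start of instruction `index` is known only after the
 * lengths of instructions 0..index-1 are decoded, strictly in sequence.
 * Returns the byte offset, or (size_t)-1 if the stream ends first. */
size_t var_insn_offset(const unsigned char *code, size_t code_len, size_t index)
{
    size_t offset = 0;
    for (size_t i = 0; i < index; i++) {
        if (offset >= code_len) return (size_t)-1;
        offset += insn_length(code[offset]);  /* each step depends on the previous one */
    }
    return offset < code_len ? offset : (size_t)-1;
}

/* Fixed length (4 bytes): every start is known up front, so several
 * instructions can be located and decoded in parallel. */
size_t fixed_insn_offset(size_t index)
{
    return 4 * index;
}
```

In the byte stream `01 AA 03 00 00 00 00 02 00 00`, `var_insn_offset` finds instructions at offsets 0, 2, 6 and 7. It finds each one only after the previous length is known, whereas `fixed_insn_offset(i)` needs no such chain.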
39/ 77
RISC Processors
RISC - Reduced Instruction Set Computer
The idea: simple instructions enable fast hardware
Characteristics
• A small instruction set, with only a few instruction formats
• Simple instructions
- execute simple tasks
- require a single cycle (with pipeline)
• A few indexing methods
• ALU operations on registers only
- Memory is accessed using Load and Store instructions only
- Many orthogonal registers
- Three-address machine: Add dst, src1, src2
• Fixed length instructions
Examples: MIPS™, Sparc™, Alpha™, PowerPC™
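A toy register machine can mimic the load/store, three-address style. In this sketch the register count, the word-addressed memory, and the function names `lw`, `sw` and `add` are hypothetical, chosen to echo the mnemonics on this slide. There is no bounds checking.

```c
#include <stdint.h>

#define NREGS  32    /* hypothetical: 32 general-purpose registers */
#define NWORDS 256   /* hypothetical: 256 words of data memory     */

struct machine {
    int32_t reg[NREGS];   /* orthogonal registers: any can be any operand */
    int32_t mem[NWORDS];  /* word-addressed memory, for simplicity        */
};

/* Memory is touched only by loads and stores. */
void lw(struct machine *m, int dst, int addr) { m->reg[dst] = m->mem[addr]; }
void sw(struct machine *m, int src, int addr) { m->mem[addr] = m->reg[src]; }

/* ALU operations work on registers only: Add dst, src1, src2. */
void add(struct machine *m, int dst, int src1, int src2)
{
    m->reg[dst] = m->reg[src1] + m->reg[src2];
}
```

Computing `mem[2] = mem[0] + mem[1]` takes four instructions: `lw(&m, 8, 0); lw(&m, 9, 1); add(&m, 10, 8, 9); sw(&m, 10, 2);`. A machine with memory operands might do the same with a single instruction, which is one reason RISC code executes more instructions.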
40/ 77
RISC Processors (Cont.)
Simple architecture → simple micro-architecture
• Simple, small and fast control logic
• Simpler to design and validate
• Room for on-die caches: instruction cache + data cache
- Parallelize data and instruction access
• Shorter time-to-market
Using a smart compiler
• Better pipeline usage
• Better register allocation
Existing RISC processors are not “pure” RISC
• e.g., they support division, which takes many cycles
41/ 77
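RISC and Amdahl’s Law (Example)
Compared to the CISC architecture:
• 10% of the static code, which executes 90% of the dynamic instructions, has the same CPI
• 90% of the static code, which is only 10% of the dynamic instructions, has its CPI increased by 60%
• The number of instructions executed increases by 50%
• The speed of the processor is doubled
- This was true at the time the RISC processors were invented
We get (with Fraction_enhanced = 0.1, the dynamic fraction whose CPI grows by a factor of 1.6):
$$\frac{\text{CPI}_{new}}{\text{CPI}_{old}} = (1 - \text{Fraction}_{enhanced}) \times 1 + \text{Fraction}_{enhanced} \times 1.6 = 0.9 + 0.16 = 1.06$$
And then (clock = clock cycle time):
$$\text{Speedup}_{overall} = \frac{\text{CPU Time}_{old}}{\text{CPU Time}_{new}} = \frac{\text{clock}_{old}}{\text{clock}_{new}} \times \frac{\text{CPI}_{old}}{\text{CPI}_{new}} \times \frac{\text{IC}_{old}}{\text{IC}_{new}} = \frac{2}{1.06 \times 1.5} = 1.26$$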
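The arithmetic of this example takes only a few lines to check. The sketch uses hypothetical helper names `cpi_ratio` and `overall_speedup`. Each speedup argument is an old/new ratio, and "clock" means the clock cycle time.

```c
/* CPI_new / CPI_old: the unchanged part keeps ratio 1.0, the changed
 * part (fraction_changed of the dynamic instructions) scales by cpi_factor. */
double cpi_ratio(double fraction_changed, double cpi_factor)
{
    return (1.0 - fraction_changed) * 1.0 + fraction_changed * cpi_factor;
}

/* Speedup = (clock_old/clock_new) * (CPI_old/CPI_new) * (IC_old/IC_new). */
double overall_speedup(double clock_old_over_new,
                       double cpi_old_over_new,
                       double ic_old_over_new)
{
    return clock_old_over_new * cpi_old_over_new * ic_old_over_new;
}
```

`cpi_ratio(0.1, 1.6)` gives 1.06, and `overall_speedup(2.0, 1 / 1.06, 1 / 1.5)` gives about 1.26.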
42/ 77
So, which is better, RISC or CISC?
Today CISC architectures (x86) run as fast as RISC (or even faster)
The main reasons are:
• They translate CISC instructions into RISC instructions (ucode)
• CISC architectures use a “RISC-like engine”
We will discuss this kind of solutions later on in this course.
43/ 77
Technology Trends: Microprocessor Complexity
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) vs. year, 1970 to 2000, for i4004, i8080, i8086, i80286, i80386, i80486 and Pentium.]
2X transistors/chip every 1.5 years, called “Moore’s Law”
• Alpha 21264: 15 million
• Pentium Pro: 5.5 million
• PowerPC 620: 6.9 million
• Alpha 21164: 9.3 million
• Sparc Ultra: 5.2 million
• Athlon (K7): 22 million
• Itanium 2: 410 million
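Moore's Law as stated above (2X every 1.5 years) compounds quickly. The sketch below uses hypothetical helpers `pow_int` and `moore_factor`. It restricts itself to integer exponents so that it does not need the math library.

```c
/* base^n for a non-negative integer n, by repeated multiplication. */
static double pow_int(double base, int n)
{
    double r = 1.0;
    for (int i = 0; i < n; i++) r *= base;
    return r;
}

/* Growth factor after `doublings` doubling periods (1.5 years each). */
double moore_factor(int doublings)
{
    return pow_int(2.0, doublings);
}
```

Fifteen years is 10 doubling periods, so `moore_factor(10)` gives 1024X. The 60%-per-year processor rate quoted in this lecture agrees with this: `pow_int(1.6, 3)` ≈ 4.1, close to the 4X (two doublings) expected over 3 years.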
44/ 77
45/ 77
46/ 77
Technology Trends: Processor Performance
[Figure: performance measure (0 to 900) vs. year, 1987 to 1997, for IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500 and DEC Alpha 21264/600, growing 1.54X/yr; Intel P4 2000 MHz (Fall 2001) is also marked.]
47/ 77
Technology Trends: Memory Capacity (Single-Chip DRAM)
[Figure: bits per chip (log scale, 1,000 to 1,000,000,000) vs. year, 1970 to 2000.]
Year  Size (Mbit)
1980  0.0625
1983  0.25
1986  1
1989  4
1992  16
1996  64
1998  128
2000  256
2002  512
• Now 1.4X/yr, or 2X every 2 years.
• 8000X since 1980!
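The table supports the “8000X since 1980” claim. The sketch below copies the table into arrays; the names `dram_year`, `dram_mbit` and the helper functions are hypothetical.

```c
/* Single-chip DRAM sizes (Mbit) from the table above. */
static const int    dram_year[] = {1980, 1983, 1986, 1989, 1992, 1996, 1998, 2000, 2002};
static const double dram_mbit[] = {0.0625, 0.25, 1, 4, 16, 64, 128, 256, 512};
#define DRAM_POINTS (sizeof dram_year / sizeof dram_year[0])

/* Growth factor between the first and last table entries. */
double dram_total_growth(void)
{
    return dram_mbit[DRAM_POINTS - 1] / dram_mbit[0];
}

/* Number of capacity doublings between the first and last entries
 * (all sizes in this table are powers of two). */
int dram_doublings(void)
{
    int d = 0;
    for (double x = dram_mbit[0]; x < dram_mbit[DRAM_POINTS - 1]; x *= 2.0) d++;
    return d;
}

/* Average years per doubling over the whole table. */
double dram_years_per_doubling(void)
{
    return (double)(dram_year[DRAM_POINTS - 1] - dram_year[0]) / dram_doublings();
}
```

512 / 0.0625 = 8192X. That is 13 doublings in 22 years, or about 1.7 years per doubling on average over the whole period.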
48/ 77
Technology Trends Imply Dramatic Change
Processor
• Logic capacity: about 30% per year
• Clock rate: about 20% per year
Memory
• DRAM capacity: about 60% per year (4x every 3 years)
• Memory speed: about 10% per year
• Cost per bit: improves about 25% per year
Disk
• Capacity: about 60% per year
• Total data use: 100% per 9 months!
Network Bandwidth
• Bandwidth increasing more than 100% per year!
49/ 77
1980-2003, CPU-DRAM Speed Gap
[Figure: performance (1/latency), log scale 10 to 10000, vs. year, 1980 to 2005. CPU: 60% per yr (2X in 1.5 yrs), with “the power wall” marked at 2005; DRAM: 9% per yr (2X in 10 yrs). The gap grew 50% per year.]
Q. How do architects address this gap?
A. Put smaller, faster “cache” memories between CPU and DRAM.
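The 50%-per-year figure follows from the two growth rates. The sketch below uses hypothetical helper names and treats the slide's rates (CPU 1.60X per year, DRAM 1.09X per year) as exact.

```c
/* base^n for a non-negative integer n (no math library needed). */
static double pow_int(double base, int n)
{
    double r = 1.0;
    for (int i = 0; i < n; i++) r *= base;
    return r;
}

/* Yearly growth of the CPU/DRAM performance ratio, given each side's
 * yearly improvement factor (e.g. 1.60 and 1.09). */
double gap_growth_per_year(double cpu_per_year, double dram_per_year)
{
    return cpu_per_year / dram_per_year;
}

/* Factor by which the gap widens over `years` years. */
double gap_after_years(double cpu_per_year, double dram_per_year, int years)
{
    return pow_int(gap_growth_per_year(cpu_per_year, dram_per_year), years);
}
```

`gap_growth_per_year(1.60, 1.09)` ≈ 1.47, i.e. roughly 50% per year. Over a decade the gap widens by a factor of about 46 (`gap_after_years(1.60, 1.09, 10)`).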
50/ 77
Dimensions
[Figure: length scale from 1 cm down to 1 Å: chip size (1 cm); diameter of human hair (25 µm); 1996 devices (0.35 µm); 2001 devices (0.18 µm); deep UV wavelength (0.248 µm); 2007 devices (0.01 µm); X-ray wavelength (0.6 nm); silicon atom radius (1.17 Å).]
2005: 0.12 × 10^-6 m = 1.2 × 10^-7 m
2006: 0.04 × 10^-6 m
Demo
51/ 77
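Computer architecture in the coming years:
In the past: energy / power consumption was a non-issue.
Today: the Power Wall. Power is expensive; transistors are free.
In the past: performance improved through parallelism at the machine-instruction level within a single CPU (smart compilers, pipelining, and superscalar, out-of-order execution and speculation architectures).
Today: the ILP Wall. Hardware improvements aimed at raising performance no longer pay off.
In the past: multiplication was slow and memory access was fast.
Today: the Memory Wall. Multiplication is fast and memory accesses are slow
(200 clock cycles for a DRAM access, 4 cycles for a multiply).
In the past: single-processor performance doubled (2X) every 1.5 years.
Today: given all of the above, perhaps 2X every 5 years??
But the number of processors (cores) per chip doubles every two years. Today: 4 to 40 cores.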
52/ 77
Physics / Transistor’s History
First point contact transistor (germanium), 1947: John Bardeen and Walter Brattain, Bell Laboratories
Audion (Triode), 1906: Lee De Forest
53/ 77
History
Intel Pentium II, 1997: Clock: 233 MHz; Number of transistors: 7.5 M; Gate Length: 0.35 µm
First integrated circuit (germanium), 1958: Jack S. Kilby, Texas Instruments
Contained five components of three types: transistors, resistors and capacitors
54/ 77
Annual Sales
10^18 transistors manufactured in 2003 alone
• 100 million for every human on the planet
[Figure: global semiconductor billings (billions of US $, 0 to 200) by year, 1982 to 2002.]
55/ 77
56/ 77
57/ 77
58/ 77
59/ 77
Integrated Circuits (2003 state-of-the-art)
Primarily Crystalline Silicon
1mm - 25mm on a side
2003 - feature size ~ 0.13 µm = 0.13 × 10^-6 m
100 - 400M transistors
(25 - 100M “logic gates”)
3 - 10 conductive layers
“CMOS” (complementary metal oxide semiconductor) - most common.
Package provides:
• spreading of chip-level signal paths to board-level
• heat dissipation.
Ceramic or plastic with gold wires.
Chip in Package
Bare Die
60/ 77
Printed Circuit Boards
fiberglass or ceramic
1-20 conductive layers
1-20 in on a side
IC packages are soldered down.
61/ 77
nMOS Transistor
Four terminals: gate, source, drain, body
Gate – oxide – body stack looks like a capacitor
• Gate and body are conductors
• SiO2 (oxide) is a very good insulator
• Called metal – oxide – semiconductor (MOS) capacitor
• Even though gate is no longer made of metal
[Figure: nMOS cross-section: n+ source and drain in a p-type body (bulk Si), with a polysilicon gate above a thin SiO2 gate oxide (a good insulator, ε_ox = 3.9 ε0); channel width W, channel length L, oxide thickness t_ox; OFF and ON states indicated.]
62/ 77
nMOS Operation
Body is commonly tied to ground (0 V)
When the gate is at a low voltage:
• P-type body is at low voltage
• Source-body and drain-body diodes are OFF
• No current flows, transistor is OFF
[Figure: nMOS cross-section with gate = 0: no channel between source S and drain D; transistor OFF.]
63/ 77
nMOS Operation Cont.
When the gate is at a high voltage:
• Positive charge on gate of MOS capacitor
• Negative charge attracted to body
• Inverts a channel under gate to n-type
• Now current can flow through n-type silicon from source through channel to drain, transistor is ON
[Figure: nMOS cross-section with gate = 1: an n-type channel connects source S and drain D; transistor ON.]
64/ 77
pMOS Transistor
Similar, but doping and voltages reversed
• Body tied to high voltage (VDD)
• Gate low: transistor ON
• Gate high: transistor OFF
• Bubble indicates inverted behavior
[Figure: pMOS cross-section: p+ source and drain in an n-type body (bulk Si), with a polysilicon gate over SiO2.]
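At the logic level, the nMOS and pMOS behavior described above reduces to switches controlled by the gate. This switch-level sketch ignores voltages, currents and timing, and its function names are hypothetical.

```c
#include <stdbool.h>

/* Switch-level model: each transistor is a switch between source and
 * drain, opened or closed by its gate. */

/* nMOS (body tied to GND): conducts when the gate is high. */
bool nmos_on(bool gate) { return gate; }

/* pMOS (body tied to VDD): conducts when the gate is low; the bubble
 * in the symbol marks this inverted behavior. */
bool pmos_on(bool gate) { return !gate; }
```

Complementary pairs of these switches are the building blocks of the CMOS gates that follow.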
65/ 77
66/ 77
Example: Inverter
67/ 77
Example: NAND3
Horizontal N-diffusion and p-diffusion strips
Vertical polysilicon gates
Metal1 VDD rail at top
Metal1 GND rail at bottom
32 λ by 40 λ
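The layout above is a 3-input NAND gate. Static CMOS builds such a gate from three nMOS transistors in series to GND and three pMOS transistors in parallel to VDD; this is the standard structure, though the slide does not spell it out. Below is a switch-level sketch with hypothetical function names.

```c
#include <stdbool.h>

/* Switch-level model of a static CMOS 3-input NAND. The output is 1
 * when the pull-up network connects it to VDD. */
bool nand3(bool a, bool b, bool c)
{
    /* Pull-up: three pMOS in parallel, conducts if any input is low. */
    bool pull_up = !a || !b || !c;
    return pull_up;
}

/* Check that exactly one network conducts: the output never floats and
 * VDD is never shorted to GND. */
bool nand3_networks_ok(bool a, bool b, bool c)
{
    bool pull_down = a && b && c;      /* three nMOS in series to GND */
    bool pull_up = !a || !b || !c;     /* three pMOS in parallel to VDD */
    return pull_up != pull_down;
}
```

Because the series and parallel networks are complementary, the output is 0 only when all three inputs are 1.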
68/ 77
69/ 77
70/ 77
CMOS Inverter
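A | Y
0 |
1 |
[Figure: CMOS inverter: a pMOS transistor from VDD to Y and an nMOS transistor from Y to GND, both gates driven by A; logic symbol A → Y.]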
71/ 77
CMOS Inverter
A | Y
0 |
1 | 0
[Figure: with A=1 the nMOS transistor is ON and the pMOS transistor is OFF, so Y is pulled to GND: Y=0.]
72/ 77
CMOS Inverter
A | Y
0 | 1
1 | 0
[Figure: with A=0 the pMOS transistor is ON and the nMOS transistor is OFF, so Y is pulled to VDD: Y=1.]
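The three inverter slides can be summarized in a switch-level model. This sketch uses hypothetical function names and treats each transistor as an ideal switch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Switch-level CMOS inverter: pMOS from VDD to Y, nMOS from Y to GND,
 * both gates driven by A. */
bool inverter(bool a)
{
    bool pmos_conducts = !a;  /* pMOS is ON when its gate is low  */
    bool nmos_conducts = a;   /* nMOS is ON when its gate is high */
    /* Exactly one transistor is ON: Y is pulled to VDD (1) or to GND (0). */
    return pmos_conducts && !nmos_conducts;
}

/* Print the truth table together with the state of each transistor. */
void print_inverter_table(void)
{
    printf("A Y  pMOS nMOS\n");
    for (int a = 0; a <= 1; a++)
        printf("%d %d  %-4s %-4s\n", a, inverter(a),
               a ? "OFF" : "ON", a ? "ON" : "OFF");
}
```

`print_inverter_table()` reproduces the truth table above, adding which transistor conducts in each row.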
73/ 77
74/ 77
75/ 77
Multiplexers
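2:1 multiplexer chooses between two inputs
S D1 D0 | Y
0  X  0 |
0  X  1 |
1  0  X |
1  1  X |
[Figure: 2:1 mux symbol: data inputs D0 (selected when S=0) and D1 (selected when S=1), select input S, output Y.]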
76/ 77
Multiplexers
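2:1 multiplexer chooses between two inputs
S D1 D0 | Y
0  X  0 | 0
0  X  1 | 1
1  0  X | 0
1  1  X | 1
[Figure: 2:1 mux symbol: data inputs D0 (selected when S=0) and D1 (selected when S=1), select input S, output Y.]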
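The truth table above describes a 2:1 multiplexer. The sketch below gives two equivalent functions with hypothetical names: a behavioral version, and the sum-of-products form Y = S'·D0 + S·D1.

```c
#include <stdbool.h>

/* 2:1 multiplexer, behavioral: Y = D1 if S else D0. */
bool mux2(bool s, bool d0, bool d1)
{
    return s ? d1 : d0;
}

/* Same function as a sum of products: Y = S'·D0 + S·D1. */
bool mux2_sop(bool s, bool d0, bool d1)
{
    return (!s && d0) || (s && d1);
}
```

The X entries in the table show up here: when S = 0, `d1` has no effect on the result, and when S = 1, `d0` has none.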
77/ 77
Transmission Gate Mux
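Nonrestoring mux uses two transmission gates
• Only 4 transistors
[Figure: transmission-gate mux: D0 and D1 each pass through a transmission gate, controlled by S and its complement, to the common output Y.]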
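The transmission-gate mux can also be modeled at the switch level. This sketch uses hypothetical function names and assumes ideal switches. Each transmission gate is an nMOS and a pMOS in parallel, driven by complementary controls. Because the mux is nonrestoring, it passes the selected input through rather than regenerating it, which a logic-level model like this one cannot show.

```c
#include <stdbool.h>

/* Transmission gate: nMOS and pMOS in parallel. It conducts if the nMOS
 * gate is high or the pMOS gate is low. */
bool tgate_on(bool n_gate, bool p_gate)
{
    return n_gate || !p_gate;
}

/* Transmission-gate 2:1 mux: the D0 gate conducts when S = 0, the D1 gate
 * when S = 1. *ok reports whether exactly one gate conducts. */
bool tgate_mux(bool s, bool d0, bool d1, bool *ok)
{
    bool s_bar = !s;
    bool pass0 = tgate_on(s_bar, s);  /* nMOS gate S', pMOS gate S  */
    bool pass1 = tgate_on(s, s_bar);  /* nMOS gate S,  pMOS gate S' */
    if (ok) *ok = (pass0 != pass1);   /* exactly one path drives Y  */
    return pass1 ? d1 : d0;
}
```

With complementary controls, exactly one transmission gate conducts for either value of S. The output therefore always equals the selected input, matching the 2:1 mux truth table.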
78/ 77
out
79/ 77
What we learned today:
Computer Architecture: integrates several levels, from programming languages to logic design.
Instruction Set Architecture (ISA)
Amdahl’s law
Moore’s law
Processor (CPU)-memory speed gap
History
Transistors: what they are and how they work.
From transistors to logic design