struktury počítačových systémů -...
TRANSCRIPT
6.1.2016
1
Computer System Structures
cz:Struktury počítačových systémů
Lecturer: Richard Šusta
ČVUT-FEL in Prague, CR – subject A0B35SPS
Version: 1.1
Von Neumann Machine, 1945
Stored-program Model
Both data and programs
are stored in the same main memory
Sequential Execution
Memory,
Input/Output,
Arithmetic/Logic Unit,
Control Unit
[Arnold S. Berger: Hardware Computer Organization for the Software Professional]
2
6.1.2016
2
von Neumann and Harvard Architectures
von Neumann
CPU
Memory
Instructions
Data
Address,
Data and
Status
Busses
von Neumann
“bottleneck”
Harvard
CPU
Instruction
memory
Data
Memory
Instruction
Address,
Data and
Status
Busses
Data space
Address,
Data and
Status
Busses
[Arnold S. Berger: Hardware Computer Organization for the Software Professional]
3
Processor 42 has Harvard architecture
Arithmetic
and Logic
Unit (ALU)
Data Memories
Outputs
Inputs
Control
Unit
Program
Counter
Program
Memory
selecting data source
sto
re re
sult
fetc
h o
pera
nds
PC instruction
immediate jump address/offset in instruction
immediate address/offset in instruction
control commands
Status signals
selecting destination of result
operation
result
operands
4
6.1.2016
3
Microprocessors
5
Definition: Microprocessor
A Microprocessor is a Single VLSI Chip that has a CPU and may also have some Other Units (for example, caches, floating point processing arithmetic unit, pipelining, and super-scaling units)
Source Microprocessor Family
Motorola 68HCxxx
Intel 80x86 & i860
Sun SPARC
IBM PowerPC
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 6
6.1.2016
4
Existovalo a dodnes existuje mnoho typů od různých výrobců
http://research.microsoft.com/en-us/um/people/gbell/CyberMuseum_contents/Microprocessor_Evolution_Poster.jpg
Minimum computer with
8008 microprocessor
Source: http://juliepalooza.8m.com/sl/80088080.htm
6.1.2016
5
From 8 bit to 32 bit CISC processors
8 bit µ-processor 8008 8080 8085 Z80
Year 1972 1974 1976 1976
Instructions 66 111 113 158
MIPS 0.06 0.29 0.37 0.58
Minimum system (IO circuits) 20-40 7 3 3
16 and 32 bit
µ-processor 8086/88 80286 80386 80486 Pentium
64bit example: Intel
I5-2500K Quad core
Year 1978/79 1982 1985-91 1989-94 1993-98 2011
Instructions processor
+ numeric cooprocessor 133 ~175 ~195 ~200 ~225 ~500
MIPS 0.33-0.75 0.9-2.6 2.5-11 16-70 100-300 ~80000
Memory Physical / Virtual 1 MB 16 MB/1 GB 4 GB / 64TB 32 GB / 256 TB
Definition: Microcontroller
A microcontroller is a single-chip VLSI unit which, though having limited computational capabilities possesses enhanced input-output capabilities and a number of on-chip functional units
Source Microcontroller Family
Motorola 68HC11xx, HC12xx, HC16xx
Intel 80x86, 8051, 80251
Microchip PIC 16F84, 16F876, PIC18
ARM ARM9, ARM7
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 10
6.1.2016
6
Microcontroller
Microcontroller
(uC)
Outp
ut in
terfaces
Inp
ut in
tefaces
In device1
In device2
In device3
Out device1
Out device2
Auxiliary
units
11
Example: “Original” 8051 Microcontroller
Oscillator
and timing
4096 Bytes
Program
Memory
128 Bytes
Data
Memory
Two 16 Bit
Timer/Event
Counters
8051
CPU
64 K Byte Bus
Expansion
Control
Programmable
I/O
Programmable
Serial Port Full
Duplex UART
Synchronous Shifter
Internal data bus
External interrupts
subsystem interrupts
Control Parallel ports
Address Data Bus
I/O pins Serial Input
Serial Output
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 12
6.1.2016
7
SPS 13
Hardcore/Softcore Processors
Hard-core Processors
A Fabricated Integrated Circuit that may or
may not be Embedded into Additional Logic
Soft-core Processors
A Processor Described in a HDL that is
implemented in Reconfigurable Logic (ie, in
an FPGA), e.g NIOS, MicroBlaze, OpenRISC
SPS 14
Soft-core Processors Logic Elements
192
200
204
390
510
1170
1324
1800
1928
1984
2536
2600
3500
5000
6000
37000
60000
100 1000 10000 100000
Xilinx PicoBlaze
LatticeMico8
PacoBlaze
Nios II/e
DSPuva16
Nios II/s
Xilinx MicroBlaze
Nios II/f
OpenFire
LatticeMico32
aeMB
ARM Cortex-M1
LEON3
LEON2
OpenRISC 1200
S1 Core
Ultra SPARC S1 Core/f
Su
n M
icro
syste
ms
Op
en
so
urc
e
143 Our CPU Simple
6.1.2016
8
Copyright © 2005 Altera Corporation
Nios II: RISC soft-core processor
Copyright © 2005 Altera Corporation
Nios II Versions
Nios II Processor Comes In Three ISA (Instruction Set Architecture) Compatible Versions
Software Code is Binary Compatible
No Changes Required When CPU is Changed
FAST: Optimized for Speed
STANDARD: Balanced for Speed and
Size
ECONOMY: Optimized for Size
16
6.1.2016
9
Copyright © 2005 Altera Corporation
Nios II: Hard Numbers
Nios II/ f Nios II /s Nios II /e
Stratix II 200 DMIPS @
175MHz
1180 LEs
90 DMIPS @
175MHz
800 LEs
28 DMIPS @
190MHz
400 LEs
Cyclone 100 DMIPS @
125MHz
1800 LEs
62 DMIPS @
125MHz
1200 LEs
20 DMIPS @
140MHz
550 LEs
DMIPS = Dhrystone benchmark MIPS (million instructions per second)
Dhrystone is without float point operations
(Note: Whetstone [cz brousek] has float points)
Comparing with Pentium (1996) 224 DMIPS @ 200MHz
cca 196 DMIPS@175MHz - cca 140 DMIPS@125 MHz
17
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Minimum Nios II Processor
Program Controller
& Address
Generation
clock
reset
Status & Control
Registers
Instruction Master Port
Data Master Port
General Purpose Registers
Nios II Processor Core
= Configurable
= Fixed
Arithmetic Logic Unit
18
6.1.2016
10
Inside / Outside of Processors
Image source: Tron
Setup-hold problems
Synchronous
hazards
20
6.1.2016
11
SPS 21
Metastability in FSM
[image source: http://www.asic-world.com/]
21
Synchronous mod-5 divider
22
State
Current Q2 Q1 Q0
Next Q2 Q1 Q0
0 0 0 0 0 1
0 0 1 0 1 0
0 1 0 0 11
0 11 1 0 0
1 0 0 0 0 0
1 0 1 x x x
11 0 x x x
111 x x x
0 0 0 1 1 1 1 0
0 0 0 1 0 1 0 1 0 0 0 11
1 0 0 0 x x x x x x x x x
Q1 Q0
Q2
Q2 Q1 Q0
6.1.2016
12
SPS
Synchronous mod-5 divider
23
0 0 0 1 1 1 1 0
0 0 0 1 0 1 0 1 0 0 0 11
1 0 0 0 x x x x x x x x x
Q1 Q0
Q2 Q2 Q1 Q0
0 0 0 1 11 1 0
0 0 0 1 0
1 0 x x x
Q1 Q0
Q2
Q2
0 0 0 1 11 1 0
0 0 1 0 1
1 0 x x x
Q1 Q0
Q2
Q1
0 0 0 1 11 1 0
0 1 0 0 1
1 0 x x x
Q1 Q0
Q2
Q0
Q0 = Q2' . Q0'
= (Q2 + Q0)'
Q1 = Q0 xor Q1 Q2= Q1 . Q0
SPS
Result: Synchronous Counter 5 / Divider 5
VCC
CLOCK_50
INPUT
CLRN
D PRN
Q
DFF
inst15
CLRN
D PRN
Q
DFF
inst17
CLRN
D PRN
Q
DFF
inst18
VCC VCC
VCC
VCC VCC
VCC
AND2
inst25
XOR
inst26
NOR2
inst27
SQ2 OUTPUT
SQ1 OUTPUT
SQ0 OUTPUT
Warning: This circuit can have synchronous hazards under some
conditions - before copying, see the next slides dealing with metastability.
24
6.1.2016
13
SPS 25
Conditions for DFF Operation
Trigger on positive clock edge
Setup time – minimum time for which input must be maintained prior to
the occurrence of the clock transition.
Hold time – minimum time for which the input must not change after the
clock transition.
Propagation delay – time interval between clock transition and
stabilization of the output to a new state.
clock
input changing stable
setup hold
propagation delay output
Cyclone II speed grade 6 for DFFE
without LUT and interconnections
delayes
tsetup>-36 ps, thold>266 ps
tprop= 141ps
After adding delays of LUT and
interconnection, actual times are
much longer typically 5x, 10x or
more….
SPS 26
Minimum pulse width constraint
constraint: cz: omezující podmínka, ohraničení, omezení
eng: the quality or state of being checked, restricted, or compelled to avoid or
perform some action <the individual spirit anxious for freedom from constraint—
W.C.Brownell>, a restriction or limitation that contains a motion or other process
(as the action of a cam (cz:vačka) in machinery)
[quoted from Merriam-Webster's dictionary]
Cyclone II speed grade 6
tmin>>191 ps → tmin>=1,91 ns
ACLRN
minimum pulse width
output
6.1.2016
14
Clocks
In a system design, clocks may be generated by external chips
The idea assumption is that the clock edge are synchronized at each device
A digital clock signal is ideally a 50% duty cycle square wave
A “clock domain” is comprised of the a set of signals that are referenced to the same idea clock signal.
CPUs RAM Memory & I/O control
clock
27
What happens
when constraints are violated? Two stable states and one
metastable state
Ugly characteristic:
unbounded recovery time tr
Graphic source: J. Rabaey, Digital Integrated Circuits,
© Prentice-Hall, 1996
Vi1
Vo1=Vi2
Vo2
Vi1 Vo2
Vo
1
Vi2
= V
o1
Vi2
= V
o1
Vi1 = Vo2
A
C
B
Metastable
point
Vout1
=H V
out2=L
Vout1
=L V
out2=H
Metastable
State
28
6.1.2016
15
Metastability
Metastability
CLK
DIN
DOUT FF CLK
DIN DOUT
[John Brickner, FPGA Technologis, QuickLogic 2005]
Metastability Window: The
specific length of time, during
which both the data
and clock should not occur. If both
signals do occur, the output may
go metastable.
29
What happens
when constraints are violated? Three possible scenarios
1. New D value is correctly recorded
2. Circuit remains in old D value for an extra cycle
3. Metastability - “stuck” output between legal 0 and 1 until it “resolves”
CLK
D
Q1
Q2
Q3
Resolution Time tr
thold tsetup
30
6.1.2016
16
Recovery time
tclk-q
Number
Of
Occurrences
Recovery Time
Measured by statistical methods, depends on chip
technology
The resolution time is a random variable with
exponentioal distribution function (τ is decay constant)
Typical” values are for old 74 flip-flops is about 70ns,
for a 0.25µm ASIC library flip-flop is around 2.3 ns
31
Setup nad Hold Times Dependency
Setup and hold times mainly depend on
the technology of flip-flops
routing of signals, i.e. delays on wires
clock distributions to flip-flops
Setup and hold times do not depend on
frequency
32
6.1.2016
17
Example: Metastability Caused by Adding Circuit
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
External
Circuit
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
Q5OUTPUT
OK
Metastable due to hold time violation
33 An external circuit can cause different placement of flip-flops in FPGA
Delay - wires + logic
t
x
datapath Logic Element
Data busses
clock
clock skew, delay in distribution
34
6.1.2016
18
Metastability Caused by Clock Skew Flip flop 0
clock
Flip-flop 0
output
Flip-flip-1
output
Flip-flop 2
output
Flip-flop 2
clock
Flip-flip 0
-input
• Flip-flop2 clock signal is accidently ahead of flip-flop 0 clock signal in FPGA circuit.
• Hold time violation occurs at flip-flop 0 data input for the edge that depends on flip-flop 2
output.
• Flip-flop 0 metastability can result in wrong final state of flip-flop 0 output.
• Divider can accidently dived by 4 instead of 5.
35
Quartus Compiler Report
36
6.1.2016
19
Removing Metastability
Removing metastability means removing setup
and hold time violations in circuit. Many
methods can be tested, e.g.
1. Rearranging circuits
2. Increasing delay by adding LCELL
3. Synchronizers
4. Negative/Positive edges phases
5. Sometimes, it helps if you utilize correct
Synopsys Design Constraint File (.sdc) :-)
and others.....
37
1. Rearranging Circuit
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
External
Circuit
External
Circuit
Hold time violation was reduced, the divider operates OK under
normal conditions, under non-normal ???
Metastable
38
6.1.2016
20
2. Increasing Delay by LCELLs
External
Circuit
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
LCELL
inst
LCELL
inst6
LCELL
inst7
LCELL
inst8
39
Hold time violation removed, the divider is OK
The LCELL buffer allocates a logic cell for the project and always consumes one logic cell.
It is not removed from a project during logic synthesis. When you turn on the "Ignore
LCELL Buffers logic option", the Compiler automatically converts all LCELL buffers
to WIRE primitives. [Source: Quartus Guide]
3. Synchronizer
VCC CLK INPUT
Q5 OUTPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
VCC
VCC
CLRN
D PRN
Q
DFF
inst5
External
Circuit
Synchronizers isolate metastable signals
and datapaths, they are preferred solutions of metastability.
D Q
>C
D Q
>C
clk
unknown input
potentially metastable
signal “probably safe” signal
40
The divider is OK, but its
output is delayed by 1 clock.
6.1.2016
21
4. Negative/Positive Edge Phases
VCC CLK INPUT
Q5 OUTPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
VCC
VCC
CLRN
D PRN
Q
DFF
inst6
NO
T
inst
External
Circuit
41
Hold time violation was removed, the divider operates OK
without delay, but its maximum frequency is limited to 1/2
analogy of master slave
The Ten Commandments
cz:10 přikázání
6.1.2016
22
Old English
Subjective Objective Possessive
1st Pers. I me my/mine
2nd Pers. thou thee thy/thine
3rd Pers. he/she
/it him/her/it
his/her/
its
1st Pers. PL we us our
2nd Pers. P
L ye/you you your
3rd Pers. Pl. they them their
Present tense
to be to have
I am I have
thou art thou hast
he/she/it is he/she/it hath
we are we have
ye are ye have
they are they have
The Ten Commandments for Successful Design
1. All state machine outputs shall always be
registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets,
shall be synchronous.
Peter Chambers, Peter’s Provocative Pontifications, 97
6.1.2016
23
SPS 45
Recommended FSM
Present State
Register
δ Next State
function
ω Output
function
X:set of inputs
PresentStateRegister
Next State
Z:set of outputs
clock
reset_asyn
only
powerup
reset
synchronous
NextStateFunction
OutputFunction
Do not use
The Ten Commandments for Successful Design
1. All state machine outputs shall always be
registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets,
shall be synchronous
4. Beware fast paths lest they bite thine ankles
5. Minimize skew of thine clocks
Peter Chambers, Peter’s Provocative Pontifications, 97
6.1.2016
24
Fast paths
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
External
Circuit
VCC CLK INPUT
VCC VCC
VCC
VCC VCC
VCC
CLRN
D PRN
Q
DFF
inst1
CLRN
D PRN
Q
DFF
inst2
CLRN
D PRN
Q
DFF
inst3
NOR2
inst11
XOR
inst12
AND2
inst13
Q5OUTPUT
OK
Metastabile for thold violation
47
The Ten Commandments for Successful Design
1. All state machine outputs shall always be
registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets,
shall be synchronous
4. Beware fast paths lest they bite thine ankles
5. Minimize skew of thine clocks
6. Cross clock domains with the greatest of
caution. Synchronize thy signals!
Peter Chambers, Peter’s Provocative Pontifications, 97
6.1.2016
25
Clocks
A “clock domain” is comprised of the a set of signals that are referenced to the same idea clock signal.
CPUs RAM Memory & I/O control
clock
49
The Ten Commandments for Successful Design
1. All state machine outputs shall always be
registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets,
shall be synchronous
4. Beware fast paths lest they bite thine ankles
5. Minimize skew of thine clocks
6. Cross clock domains with the greatest of
caution. Synchronize thy signals!
7. Have no dead states in thy state machines
Peter Chambers, Peter’s Provocative Pontifications, 97
6.1.2016
26
SPS 51
Define paths from dead states
NextStateFunction : PROCESS (stReg, reset) -- transient function
BEGIN
IF (reset='1') THEN stNext <= state0;
ELSE
CASE stReg IS
WHEN state0 => stNext <= state1;
WHEN state1 => stNext <= state2;
WHEN state2 => stNext <= state3;
WHEN state3 => stNext <= state0;
WHEN OTHERS => stNext <= state0;
report "Reach undefined state";
-- if it is active -> the error will be in compile time
END CASE;
END IF;
END PROCESS;
The Ten Commandments for Successful Design
1. All state machine outputs shall always be
registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets,
shall be synchronous
4. Beware fast paths lest they bite thine ankles
5. Minimize skew of thine clocks
6. Cross clock domains with the greatest of
caution. Synchronize thy signals!
7. Have no dead states in thy state machines
8. Have no logic with unbroken asynchronous
feedback lest the fleas of myriad Test
Engineers infest thee
9. All decode logic must be crafted carefully —
eschew asynchronicity
Peter Chambers, Peter’s Provocative Pontifications, 97
6.1.2016
27
SPS 53
Asynchronnous divider by 5
Divider by 5
- divider is
asynchronously
cleared in state 6 to 0
and sometimes in others
states:-)
Q0 Q1
Q0
Q0
Q2
CLR CLR CLR
Q0
Q1
Q2
NAND
výstup 1 0
1 2 3 4 5 6 1 2 3 4 5
0
0
0
6 Clock
Yes, it is
deeply
analyzed
case but it is
not reliable
on FPGA
The Ten Commandments for Successful Design
1. All state machine outputs shall always be
registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets,
shall be synchronous
4. Beware fast paths lest they bite thine ankles
5. Minimize skew of thine clocks
6. Cross clock domains with the greatest of caution.
Synchronize thy signals!
7. Have no dead states in thy state machines
8. Have no logic with unbroken asynchronous
feedback lest the fleas of myriad Test Engineers
infest thee
9. All decode logic must be crafted carefully —
eschew asynchronicity
10. Trust not thy simulator — it may beguile thee
when thy design is junk
Peter Chambers, Peter’s Provocative Pontifications, 97
6.1.2016
28
Paralel Bus
It assures setup/hold
times
It is fast, but with
crosstalk problems
55
56
Simple NIOS System on DE2
32-Bit
Nios
Processor
Switch PIO
LED PIO
7-Segment
LED PIO
PIO-32
User-Defined
Interface UART Timer
Address (32)
Read
Write
Data In (32)
Data Out (32)
Nios Processor
Avalo
n B
us
6.1.2016
29
Copyright © 2005 Altera Corporation
Nios II/e on Basic Computer for DE2
57
Copyright © 2005 Altera Corporation
Nios II/s: DE2 Media Computer
58
6.1.2016
30
Interconnect Modelling for a Bus
59
CL - Ground Capacitance,
CI - Inter-wire Capacitance,
λ = CI /CL, RT -Total Resistance,
Source: Sudeep Pasricha, On Chip
Communication, Colorado 2011
Bus Clocking - setup/hold
Asynchronous Bus Not clocked
Requires a handshaking protocol
performance not as good as that of synchronous bus
No need for frequency converters, but does need extra lines
Does not suffer from clock skew like the synchronous bus
60
Source: Sudeep Pasricha, On Chip
Communication, Colorado 2011
6.1.2016
31
Bus Clocking
Synchronous Bus Includes a clock in control lines
Fixed protocol for communication that is relative to clock
Involves very little logic and can run very fast
Require frequency converters across frequency domains
61
Source: Sudeep Pasricha, On Chip
Communication, Colorado 2011
Bus Communication Protocols
Synchronous bus with fixed-latency devices.
Clock
Address placed on the bus
Wait Wait Data availability ensured
Address
Data Wait
Request
Address or data
Ack
Ready
Handshaking on an asynchronous bus for an input operation
(e.g., reading from memory).
62
6.1.2016
32
Copyright © 2005 Altera Corporation
Avalon Bus
Examples of transfers
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon bus (NIOS) read, 1 wait
Wait cycle specified by design
64
6.1.2016
33
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon read
slave, 2 wait specified by design
65
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon read slave, additional wait request generated by slave device
66
6.1.2016
34
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon read
slave, 1 set up and 1 wait
67
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon write
slave, 0 wait
68
6.1.2016
35
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon write
slave, 1 wait
69
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon write
slave, wait request generated by slave
70
6.1.2016
36
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.
and Altera marks in and outside the U.S.
Avalon write
slave, 1 set up, 1 hold, 0 wait
1 su 1 hold
71
Serial bus
Reducing problems
with wires
72
6.1.2016
37
Some Ways of Signal
Back
11 1 1 1 1 10 0+V
-V
+V
-V
1
0
11 1 1 1 1 10 0 Raw data
RZ, bipolar
1 1 1 1 1 1 100 NRZ, Manchester
Ethernet
RS232, PS/2
73
Preferred Signals with Minimized DC Component
A DC component heats channel.
cz: stejnosměrná složka vytápí linku
74
6.1.2016
38
Non-Return-to-Zero Inverted
USB, FDDI (Fiber Distributed Data Interface)
Differential in USB
75
Differential Current Drive
RTERM
High Speed Signaling Current
High Speed Differential Receiver
High Speed Signaling Current
RTERM
RTERM
RTERM
Data+
Data-
High Speed Differential Receiver
Power pair
Differential Signal pair
+5V, GND
76
6.1.2016
39
RS232 Interface Cable
Canon 9M - male Canon 9F - female
77
78
DE2 DE2
DTE - DTE
DCE - DCE DTE - DCE
DE2 PC
TxD
RxD
TxD
RxD
TxD
RxD
RxD
TxD
DTE -Data Terminal Equipment
/ DCE -Data Circuit Equipment Connections
6.1.2016
40
RS-232 Signal Levels
Data signals are shown in picture.
Control signals have opposite polarity.
Note: In devices, higher voltages are usually created by charge pumps
79
Technology for FPGA
What is inside of this miracle?
[ http://radio411.com/ ]
6.1.2016
41
The Memory Element
There are four kinds
SRAM brought to us by Xilinx
EEPROM brought to us by Altera
Flash (EEPROM) brought to us by
Actel
Antifuse brought to us by Actel
R
E
F
A
SPS 82
CMOS SRAM Cell Advantages
Static RAM (SRAM)
+ Unlimited fast read / write programming cycles
bit’ bit
word
R
6.1.2016
42
SPS 83
SRAM Drawbacks
Static (SRAM)
- Data stored only as long as supply is applied
- Large (4-6 transistors/cell)
R
SPS 84
DRAM - Dynamic RAM
Write: 1. Drive bit line
2. Select row
Read: 1. Precharge bit line to Vdd/2
2. Select row
3. Cell and bit line share charges Minute voltage changes on the bit line
4. Sense (fancy sense amp) Can detect changes of ~1 million electrons
5. Write: restore the value
Refresh 1. Just do a dummy read to every cell.
row select
bit
Read is really a read followed by a restoring write
DRAMs are smaller, but they require periodical refreshing
and are suitable only as internal memory in FPGA
[Source: Vijay Narayanan: FPGA technolgies]
R
6.1.2016
43
Volatile Memory Comparison
Larger cell lower density,
higher cost/bit
No refresh required
No dissipation of data
Read non-destructive
Simple read faster access
Standard IC process natural
for integration with logic
Smaller cell higher density,
lower cost/bit
Needs periodic refresh,
and refresh after read
Complex read longer access
time
Special IC process difficult to
integrate with logic circuits
word line
bit line bit line
word line
bit line
The primary difference between different memory types is the bit cell.
addr
data
SRAM DRAM
[Source: Vijay Narayanan: FPGA technolgies]
R
EEPROM: FLOTOX transistor
Floating gate
Source
Substrate p
Gate
Drain
n 1 n 1
FLOTOX transistor Fowler-Nordheim
I-V characteristic
20 – 30 nm
10 nm
-10 V
10 V
I
V GD
• Applying high voltage over this insulator allows electron to travel
to/from floating gate via Fowler-Nordheim tunneling.
• Erasing only requires reversing applied voltage.
[Source: Vijay Narayanan: FPGA technolgies]
E
6.1.2016
44
EEPROM Cell
WL
BL
V DD
Access Device
Programming Device
• Removing too much charge from
floating gate results in a depletion
device that cannot be turned off by
the standard word-line signals.
• Extra transistor is required to access
the transistor and FLOTOX
transistor is used only for storage
(open or closed).
• Fabrication of thin oxide is
challenging and costly
• Repeated programming causes shift
in threshold voltages
[Source: Vijay Narayanan: FPGA technolgies]
E
Flash EEPROM
Control gate
erasure
p- substrate
Floating gate
Thin tunneling oxide
n 1 source n 1 drain programming
Extra transistor of EEPROM is eliminated.
F
6.1.2016
45
Basic Operations in a NOR Flash Memory―Write
Always off
- Apply high voltage at gate to write selected device.
- Raises threshold of device. [Source: Vijay Narayanan: FPGA technolgies]
F
Basic Operations in a NOR Flash Memory―Erase
- Erase all cells simultaneously via tunneling by applying high
voltage at source.
[Source: Vijay Narayanan: FPGA technolgies]
F
6.1.2016
46
Basic Operations in a NOR Flash Memory―Read
- Read operation is the same as any NOR ROM structure.
[Source: Vijay Narayanan: FPGA technolgies]
F
Cross-sections of NVM cells
EPROM Flash Courtesy Intel
F
6.1.2016
47
SRAM / Flash F
Structure of AntiFuse
substrát
Kov 1
oxid
amorfní silikon oxid oxid
Kov 2
The drawing is related to the most used technology ViaLink or MicroVia, the both differs only in used metals 1 and 2
A
6.1.2016
48
Metal to metal antifuse moved the antifuse out of silicon
making the part denser and faster
A
Logic Logic
M1 M1
M2 M2
Logic Logic
M1
M2
M1 M1 M1
M2
M1 M1
M2
M3 M3
M2
SRAM SRAM
Antifuse
Antifuse: SX Routing Efficiency
Faster Signal Propagation than SRAM FPGA
Signals travel through more interconnect in SRAM
Interconnect dominates delays in deep sub-micron
Antifuse architecture uses minimum silicon area
SRAM needs silicon overhead for interconnect
A
6.1.2016
49
Antifuse / Flash
45 nm
Vivien, L.: FPGA Technologies
SEE problems SEU: Single Event Upset
- a change of state or transient induced by an energetic particle such as a cosmic ray or proton in a device. This may occur in digital, analog, and optical components or may have effects in surrounding interface circuitry (a subset known as Single Event Transients [SETs]). These are "soft" errors in that a reset or rewriting of the device causes normal device behavior thereafter.
SHE: Single Hard Error - an SEU which causes a permanent change to the operation of a device. An example is a stuck bit in a memory device.
SEE: Single Event Effect - any measurable effect to a circuit due to an ion strike, includes SEU, SHE and others.
from: http://nepp.nasa.gov/
R E F A
6.1.2016
50
Overview of technlogies
Source McCollum, J: Programmable Elements and Their Impact on FPGA Architecture, Performance, and
Radiation Hardness
SRAM Antifuse Flash EEPROM
Cell size 1 1/10 1/7 1/3
Speed (←capacity, resistance) Worse Excedent Worse? Middle
Density of intigration Middle Good Excedent Worse
SEE resistence Bad Excellent Middle Middle
Power consumption Different Excedent Worse Good
Criteria / Type
Reprogramming Yes,
unlimited No Yes,
<100000 Yes,
<1000
R E F A
Good Cost Excedent Worse Worse
100% Testability in production 100% No, 1-3% failures 100%
Protect design level 0,
1 when encrypted
level 3,
(ICs have 2) >4, nearly
impossible
>4, nearly
impossible
Semi-Programmable ASIC
I do not perform a construction,
I do an eternal reconstruction! [www.metropoleparis.com/]
6.1.2016
51
SPS 101
Modern ways of logic designes
ASIC-Application Specific Integrated Circuits
• Programmable Logic Device (PLD)
• Complex PLD (CPLD )
• Field Programmable Gate Arrays
(FPGA)
Graphical State Diagram
Graphical Flowchart
When clock rises
If (s == 0)
then y = (a & b) | c;
else y = c & !(d ^ e);
Textual HDL
Top-level
block-level
schematic
Block-level schematic
Not used
Microprocessor
& RAM
Full Custom
Standard
Logic
TTL
74xx
CMOS
4xxx
ASICs
Gate
Arrays
Standard
Cells
Programmable
PLDs FPGAs CPLDs
Semi-Programmable
>2000 >4000 >=1 > from 20
thousands
to milions rentability
SPS 102
Gate Array
Vd
d G
nd
Horizontal Routing Channel Vertic
al R
outin
g C
hannel
Sea of Gates: Routing Channels removed,
route at higher metal layers
G
6.1.2016
52
SPS 103
Gate Array Example
Vd
d G
nd
A A
B B
Vdd Vdd
Schematic
A
A
B
B
Out
Out
G
SPS 104
Gate-array Layout
Transistors pre-placed, fixed in size
Personalized by metal routing
Fastest to manufacture
Lowest mask cost
Lends itself to automated placement and wiring
G
6.1.2016
53
SPS 105
Gate-array Disadvantages
Non-optimized spacing
Limited transistor sizing options
Density
Performance
Power
Wiring blockages/inefficiencies
Excess circuitry
G
SPS 106
Standard Cell Layout
Routing Channel
Routing Channel Feed-through cell
Note uniform
height
S
6.1.2016
54
SPS 107
Standard Cell Layout
Design partitioned into cells of standard height
Power and Ground (Power grid) wiring preset
Technology provider supplies libraries of pre-designed cell
elements for usage (utilize varying numbers of cells)
Primitives (NAND, NOR, etc.)
Storage Elements (DFF)
Libraries can be tailored to specific applications (e.g., low
power vs. high performance)
Requires full manufacturing sequence
Typically automated place and wiring
S
SPS 108
Standard Cell Disadvantages
Cell height restrictions limits cell library contents
Full set of masks
Longer manufacturing times
S
6.1.2016
55
SPS 109
Macro-cell Layout
Library elements provided by technology
supplier (e.g., foundry)
Elements can be of varying heights and widths
Richer variety of library elements (IP friendly)
S
SPS 110
Macro-cell Disadvantages
Similar to Standard-cell in length of
manufacturing times, mask costs
Placement and wiring more complex
Pre-layout of power grid more difficult, may not
be possible
S