struktury počítačových systémů -...

55
6.1.2016 1 Computer System Structures cz:Struktury počítačových systémů Lecturer: Richard Šusta ČVUT-FEL in Prague, CR subject A0B35SPS Version: 1.1 Von Neumann Machine, 1945 Stored-program Model Both data and programs are stored in the same main memory Sequential Execution Memory, Input/Output, Arithmetic/Logic Unit, Control Unit [Arnold S. Berger: Hardware Computer Organization for the Software Professional] 2

Upload: others

Post on 16-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

1

Computer System Structures

cz:Struktury počítačových systémů

Lecturer: Richard Šusta

ČVUT-FEL in Prague, CR – subject A0B35SPS

Version: 1.1

Von Neumann Machine, 1945

Stored-program Model

Both data and programs

are stored in the same main memory

Sequential Execution

Memory,

Input/Output,

Arithmetic/Logic Unit,

Control Unit

[Arnold S. Berger: Hardware Computer Organization for the Software Professional]

2

Page 2: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

2

von Neumann and Harvard Architectures

von Neumann

CPU

Memory

Instructions

Data

Address,

Data and

Status

Busses

von Neumann

“bottleneck”

Harvard

CPU

Instruction

memory

Data

Memory

Instruction

Address,

Data and

Status

Busses

Data space

Address,

Data and

Status

Busses

[Arnold S. Berger: Hardware Computer Organization for the Software Professional]

3

Processor 42 has Harvard architecture

Arithmetic

and Logic

Unit (ALU)

Data Memories

Outputs

Inputs

Control

Unit

Program

Counter

Program

Memory

selecting data source

sto

re re

sult

fetc

h o

pera

nds

PC instruction

immediate jump address/offset in instruction

immediate address/offset in instruction

control commands

Status signals

selecting destination of result

operation

result

operands

4

Page 3: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

3

Microprocessors

5

Definition: Microprocessor

A Microprocessor is a Single VLSI Chip that has a CPU and may also have some Other Units (for example, caches, floating point processing arithmetic unit, pipelining, and super-scaling units)

Source Microprocessor Family

Motorola 68HCxxx

Intel 80x86 & i860

Sun SPARC

IBM PowerPC

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 6

Page 5: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

5

From 8 bit to 32 bit CISC processors

8 bit µ-processor 8008 8080 8085 Z80

Year 1972 1974 1976 1976

Instructions 66 111 113 158

MIPS 0.06 0.29 0.37 0.58

Minimum system (IO circuits) 20-40 7 3 3

16 and 32 bit

µ-processor 8086/88 80286 80386 80486 Pentium

64bit example: Intel

I5-2500K Quad core

Year 1978/79 1982 1985-91 1989-94 1993-98 2011

Instructions processor

+ numeric cooprocessor 133 ~175 ~195 ~200 ~225 ~500

MIPS 0.33-0.75 0.9-2.6 2.5-11 16-70 100-300 ~80000

Memory Physical / Virtual 1 MB 16 MB/1 GB 4 GB / 64TB 32 GB / 256 TB

Definition: Microcontroller

A microcontroller is a single-chip VLSI unit which, though having limited computational capabilities possesses enhanced input-output capabilities and a number of on-chip functional units

Source Microcontroller Family

Motorola 68HC11xx, HC12xx, HC16xx

Intel 80x86, 8051, 80251

Microchip PIC 16F84, 16F876, PIC18

ARM ARM9, ARM7

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 10

Page 6: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

6

Microcontroller

Microcontroller

(uC)

Outp

ut in

terfaces

Inp

ut in

tefaces

In device1

In device2

In device3

Out device1

Out device2

Auxiliary

units

11

Example: “Original” 8051 Microcontroller

Oscillator

and timing

4096 Bytes

Program

Memory

128 Bytes

Data

Memory

Two 16 Bit

Timer/Event

Counters

8051

CPU

64 K Byte Bus

Expansion

Control

Programmable

I/O

Programmable

Serial Port Full

Duplex UART

Synchronous Shifter

Internal data bus

External interrupts

subsystem interrupts

Control Parallel ports

Address Data Bus

I/O pins Serial Input

Serial Output

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design 12

Page 7: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

7

SPS 13

Hardcore/Softcore Processors

Hard-core Processors

A Fabricated Integrated Circuit that may or

may not be Embedded into Additional Logic

Soft-core Processors

A Processor Described in a HDL that is

implemented in Reconfigurable Logic (ie, in

an FPGA), e.g NIOS, MicroBlaze, OpenRISC

SPS 14

Soft-core Processors Logic Elements

192

200

204

390

510

1170

1324

1800

1928

1984

2536

2600

3500

5000

6000

37000

60000

100 1000 10000 100000

Xilinx PicoBlaze

LatticeMico8

PacoBlaze

Nios II/e

DSPuva16

Nios II/s

Xilinx MicroBlaze

Nios II/f

OpenFire

LatticeMico32

aeMB

ARM Cortex-M1

LEON3

LEON2

OpenRISC 1200

S1 Core

Ultra SPARC S1 Core/f

Su

n M

icro

syste

ms

Op

en

so

urc

e

143 Our CPU Simple

Page 8: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

8

Copyright © 2005 Altera Corporation

Nios II: RISC soft-core processor

Copyright © 2005 Altera Corporation

Nios II Versions

Nios II Processor Comes In Three ISA (Instruction Set Architecture) Compatible Versions

Software Code is Binary Compatible

No Changes Required When CPU is Changed

FAST: Optimized for Speed

STANDARD: Balanced for Speed and

Size

ECONOMY: Optimized for Size

16

Page 9: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

9

Copyright © 2005 Altera Corporation

Nios II: Hard Numbers

Nios II/ f Nios II /s Nios II /e

Stratix II 200 DMIPS @

175MHz

1180 LEs

90 DMIPS @

175MHz

800 LEs

28 DMIPS @

190MHz

400 LEs

Cyclone 100 DMIPS @

125MHz

1800 LEs

62 DMIPS @

125MHz

1200 LEs

20 DMIPS @

140MHz

550 LEs

DMIPS = Dhrystone benchmark MIPS (million instructions per second)

Dhrystone is without float point operations

(Note: Whetstone [cz brousek] has float points)

Comparing with Pentium (1996) 224 DMIPS @ 200MHz

cca 196 DMIPS@175MHz - cca 140 DMIPS@125 MHz

17

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Minimum Nios II Processor

Program Controller

& Address

Generation

clock

reset

Status & Control

Registers

Instruction Master Port

Data Master Port

General Purpose Registers

Nios II Processor Core

= Configurable

= Fixed

Arithmetic Logic Unit

18

Page 10: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

10

Inside / Outside of Processors

Image source: Tron

Setup-hold problems

Synchronous

hazards

20

Page 11: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

11

SPS 21

Metastability in FSM

[image source: http://www.asic-world.com/]

21

Synchronous mod-5 divider

22

State

Current Q2 Q1 Q0

Next Q2 Q1 Q0

0 0 0 0 0 1

0 0 1 0 1 0

0 1 0 0 11

0 11 1 0 0

1 0 0 0 0 0

1 0 1 x x x

11 0 x x x

111 x x x

0 0 0 1 1 1 1 0

0 0 0 1 0 1 0 1 0 0 0 11

1 0 0 0 x x x x x x x x x

Q1 Q0

Q2

Q2 Q1 Q0

Page 12: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

12

SPS

Synchronous mod-5 divider

23

0 0 0 1 1 1 1 0

0 0 0 1 0 1 0 1 0 0 0 11

1 0 0 0 x x x x x x x x x

Q1 Q0

Q2 Q2 Q1 Q0

0 0 0 1 11 1 0

0 0 0 1 0

1 0 x x x

Q1 Q0

Q2

Q2

0 0 0 1 11 1 0

0 0 1 0 1

1 0 x x x

Q1 Q0

Q2

Q1

0 0 0 1 11 1 0

0 1 0 0 1

1 0 x x x

Q1 Q0

Q2

Q0

Q0 = Q2' . Q0'

= (Q2 + Q0)'

Q1 = Q0 xor Q1 Q2= Q1 . Q0

SPS

Result: Synchronous Counter 5 / Divider 5

VCC

CLOCK_50

INPUT

CLRN

D PRN

Q

DFF

inst15

CLRN

D PRN

Q

DFF

inst17

CLRN

D PRN

Q

DFF

inst18

VCC VCC

VCC

VCC VCC

VCC

AND2

inst25

XOR

inst26

NOR2

inst27

SQ2 OUTPUT

SQ1 OUTPUT

SQ0 OUTPUT

Warning: This circuit can have synchronous hazards under some

conditions - before copying, see the next slides dealing with metastability.

24

Page 13: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

13

SPS 25

Conditions for DFF Operation

Trigger on positive clock edge

Setup time – minimum time for which input must be maintained prior to

the occurrence of the clock transition.

Hold time – minimum time for which the input must not change after the

clock transition.

Propagation delay – time interval between clock transition and

stabilization of the output to a new state.

clock

input changing stable

setup hold

propagation delay output

Cyclone II speed grade 6 for DFFE

without LUT and interconnections

delayes

tsetup>-36 ps, thold>266 ps

tprop= 141ps

After adding delays of LUT and

interconnection, actual times are

much longer typically 5x, 10x or

more….

SPS 26

Minimum pulse width constraint

constraint: cz: omezující podmínka, ohraničení, omezení

eng: the quality or state of being checked, restricted, or compelled to avoid or

perform some action <the individual spirit anxious for freedom from constraint—

W.C.Brownell>, a restriction or limitation that contains a motion or other process

(as the action of a cam (cz:vačka) in machinery)

[quoted from Merriam-Webster's dictionary]

Cyclone II speed grade 6

tmin>>191 ps → tmin>=1,91 ns

ACLRN

minimum pulse width

output

Page 14: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

14

Clocks

In a system design, clocks may be generated by external chips

The idea assumption is that the clock edge are synchronized at each device

A digital clock signal is ideally a 50% duty cycle square wave

A “clock domain” is comprised of the a set of signals that are referenced to the same idea clock signal.

CPUs RAM Memory & I/O control

clock

27

What happens

when constraints are violated? Two stable states and one

metastable state

Ugly characteristic:

unbounded recovery time tr

Graphic source: J. Rabaey, Digital Integrated Circuits,

© Prentice-Hall, 1996

Vi1

Vo1=Vi2

Vo2

Vi1 Vo2

Vo

1

Vi2

= V

o1

Vi2

= V

o1

Vi1 = Vo2

A

C

B

Metastable

point

Vout1

=H V

out2=L

Vout1

=L V

out2=H

Metastable

State

28

Page 15: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

15

Metastability

Metastability

CLK

DIN

DOUT FF CLK

DIN DOUT

[John Brickner, FPGA Technologis, QuickLogic 2005]

Metastability Window: The

specific length of time, during

which both the data

and clock should not occur. If both

signals do occur, the output may

go metastable.

29

What happens

when constraints are violated? Three possible scenarios

1. New D value is correctly recorded

2. Circuit remains in old D value for an extra cycle

3. Metastability - “stuck” output between legal 0 and 1 until it “resolves”

CLK

D

Q1

Q2

Q3

Resolution Time tr

thold tsetup

30

Page 16: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

16

Recovery time

tclk-q

Number

Of

Occurrences

Recovery Time

Measured by statistical methods, depends on chip

technology

The resolution time is a random variable with

exponentioal distribution function (τ is decay constant)

Typical” values are for old 74 flip-flops is about 70ns,

for a 0.25µm ASIC library flip-flop is around 2.3 ns

31

Setup nad Hold Times Dependency

Setup and hold times mainly depend on

the technology of flip-flops

routing of signals, i.e. delays on wires

clock distributions to flip-flops

Setup and hold times do not depend on

frequency

32

Page 17: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

17

Example: Metastability Caused by Adding Circuit

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

External

Circuit

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

Q5OUTPUT

OK

Metastable due to hold time violation

33 An external circuit can cause different placement of flip-flops in FPGA

Delay - wires + logic

t

x

datapath Logic Element

Data busses

clock

clock skew, delay in distribution

34

Page 18: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

18

Metastability Caused by Clock Skew Flip flop 0

clock

Flip-flop 0

output

Flip-flip-1

output

Flip-flop 2

output

Flip-flop 2

clock

Flip-flip 0

-input

• Flip-flop2 clock signal is accidently ahead of flip-flop 0 clock signal in FPGA circuit.

• Hold time violation occurs at flip-flop 0 data input for the edge that depends on flip-flop 2

output.

• Flip-flop 0 metastability can result in wrong final state of flip-flop 0 output.

• Divider can accidently dived by 4 instead of 5.

35

Quartus Compiler Report

36

Page 19: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

19

Removing Metastability

Removing metastability means removing setup

and hold time violations in circuit. Many

methods can be tested, e.g.

1. Rearranging circuits

2. Increasing delay by adding LCELL

3. Synchronizers

4. Negative/Positive edges phases

5. Sometimes, it helps if you utilize correct

Synopsys Design Constraint File (.sdc) :-)

and others.....

37

1. Rearranging Circuit

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

External

Circuit

External

Circuit

Hold time violation was reduced, the divider operates OK under

normal conditions, under non-normal ???

Metastable

38

Page 20: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

20

2. Increasing Delay by LCELLs

External

Circuit

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

LCELL

inst

LCELL

inst6

LCELL

inst7

LCELL

inst8

39

Hold time violation removed, the divider is OK

The LCELL buffer allocates a logic cell for the project and always consumes one logic cell.

It is not removed from a project during logic synthesis. When you turn on the "Ignore

LCELL Buffers logic option", the Compiler automatically converts all LCELL buffers

to WIRE primitives. [Source: Quartus Guide]

3. Synchronizer

VCC CLK INPUT

Q5 OUTPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

VCC

VCC

CLRN

D PRN

Q

DFF

inst5

External

Circuit

Synchronizers isolate metastable signals

and datapaths, they are preferred solutions of metastability.

D Q

>C

D Q

>C

clk

unknown input

potentially metastable

signal “probably safe” signal

40

The divider is OK, but its

output is delayed by 1 clock.

Page 21: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

21

4. Negative/Positive Edge Phases

VCC CLK INPUT

Q5 OUTPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

VCC

VCC

CLRN

D PRN

Q

DFF

inst6

NO

T

inst

External

Circuit

41

Hold time violation was removed, the divider operates OK

without delay, but its maximum frequency is limited to 1/2

analogy of master slave

The Ten Commandments

cz:10 přikázání

Page 22: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

22

Old English

Subjective Objective Possessive

1st Pers. I me my/mine

2nd Pers. thou thee thy/thine

3rd Pers. he/she

/it him/her/it

his/her/

its

1st Pers. PL we us our

2nd Pers. P

L ye/you you your

3rd Pers. Pl. they them their

Present tense

to be to have

I am I have

thou art thou hast

he/she/it is he/she/it hath

we are we have

ye are ye have

they are they have

The Ten Commandments for Successful Design

1. All state machine outputs shall always be

registered

2. Thou shalt use registers, never latches

3. Thy state machine inputs, including resets,

shall be synchronous.

Peter Chambers, Peter’s Provocative Pontifications, 97

Page 23: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

23

SPS 45

Recommended FSM

Present State

Register

δ Next State

function

ω Output

function

X:set of inputs

PresentStateRegister

Next State

Z:set of outputs

clock

reset_asyn

only

powerup

reset

synchronous

NextStateFunction

OutputFunction

Do not use

The Ten Commandments for Successful Design

1. All state machine outputs shall always be

registered

2. Thou shalt use registers, never latches

3. Thy state machine inputs, including resets,

shall be synchronous

4. Beware fast paths lest they bite thine ankles

5. Minimize skew of thine clocks

Peter Chambers, Peter’s Provocative Pontifications, 97

Page 24: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

24

Fast paths

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

External

Circuit

VCC CLK INPUT

VCC VCC

VCC

VCC VCC

VCC

CLRN

D PRN

Q

DFF

inst1

CLRN

D PRN

Q

DFF

inst2

CLRN

D PRN

Q

DFF

inst3

NOR2

inst11

XOR

inst12

AND2

inst13

Q5OUTPUT

OK

Metastabile for thold violation

47

The Ten Commandments for Successful Design

1. All state machine outputs shall always be

registered

2. Thou shalt use registers, never latches

3. Thy state machine inputs, including resets,

shall be synchronous

4. Beware fast paths lest they bite thine ankles

5. Minimize skew of thine clocks

6. Cross clock domains with the greatest of

caution. Synchronize thy signals!

Peter Chambers, Peter’s Provocative Pontifications, 97

Page 25: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

25

Clocks

A “clock domain” is comprised of the a set of signals that are referenced to the same idea clock signal.

CPUs RAM Memory & I/O control

clock

49

The Ten Commandments for Successful Design

1. All state machine outputs shall always be

registered

2. Thou shalt use registers, never latches

3. Thy state machine inputs, including resets,

shall be synchronous

4. Beware fast paths lest they bite thine ankles

5. Minimize skew of thine clocks

6. Cross clock domains with the greatest of

caution. Synchronize thy signals!

7. Have no dead states in thy state machines

Peter Chambers, Peter’s Provocative Pontifications, 97

Page 26: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

26

SPS 51

Define paths from dead states

NextStateFunction : PROCESS (stReg, reset) -- transient function

BEGIN

IF (reset='1') THEN stNext <= state0;

ELSE

CASE stReg IS

WHEN state0 => stNext <= state1;

WHEN state1 => stNext <= state2;

WHEN state2 => stNext <= state3;

WHEN state3 => stNext <= state0;

WHEN OTHERS => stNext <= state0;

report "Reach undefined state";

-- if it is active -> the error will be in compile time

END CASE;

END IF;

END PROCESS;

The Ten Commandments for Successful Design

1. All state machine outputs shall always be

registered

2. Thou shalt use registers, never latches

3. Thy state machine inputs, including resets,

shall be synchronous

4. Beware fast paths lest they bite thine ankles

5. Minimize skew of thine clocks

6. Cross clock domains with the greatest of

caution. Synchronize thy signals!

7. Have no dead states in thy state machines

8. Have no logic with unbroken asynchronous

feedback lest the fleas of myriad Test

Engineers infest thee

9. All decode logic must be crafted carefully —

eschew asynchronicity

Peter Chambers, Peter’s Provocative Pontifications, 97

Page 27: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

27

SPS 53

Asynchronnous divider by 5

Divider by 5

- divider is

asynchronously

cleared in state 6 to 0

and sometimes in others

states:-)

Q0 Q1

Q0

Q0

Q2

CLR CLR CLR

Q0

Q1

Q2

NAND

výstup 1 0

1 2 3 4 5 6 1 2 3 4 5

0

0

0

6 Clock

Yes, it is

deeply

analyzed

case but it is

not reliable

on FPGA

The Ten Commandments for Successful Design

1. All state machine outputs shall always be

registered

2. Thou shalt use registers, never latches

3. Thy state machine inputs, including resets,

shall be synchronous

4. Beware fast paths lest they bite thine ankles

5. Minimize skew of thine clocks

6. Cross clock domains with the greatest of caution.

Synchronize thy signals!

7. Have no dead states in thy state machines

8. Have no logic with unbroken asynchronous

feedback lest the fleas of myriad Test Engineers

infest thee

9. All decode logic must be crafted carefully —

eschew asynchronicity

10. Trust not thy simulator — it may beguile thee

when thy design is junk

Peter Chambers, Peter’s Provocative Pontifications, 97

Page 28: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

28

Paralel Bus

It assures setup/hold

times

It is fast, but with

crosstalk problems

55

56

Simple NIOS System on DE2

32-Bit

Nios

Processor

Switch PIO

LED PIO

7-Segment

LED PIO

PIO-32

User-Defined

Interface UART Timer

Address (32)

Read

Write

Data In (32)

Data Out (32)

Nios Processor

Avalo

n B

us

Page 29: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

29

Copyright © 2005 Altera Corporation

Nios II/e on Basic Computer for DE2

57

Copyright © 2005 Altera Corporation

Nios II/s: DE2 Media Computer

58

Page 30: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

30

Interconnect Modelling for a Bus

59

CL - Ground Capacitance,

CI - Inter-wire Capacitance,

λ = CI /CL, RT -Total Resistance,

Source: Sudeep Pasricha, On Chip

Communication, Colorado 2011

Bus Clocking - setup/hold

Asynchronous Bus Not clocked

Requires a handshaking protocol

performance not as good as that of synchronous bus

No need for frequency converters, but does need extra lines

Does not suffer from clock skew like the synchronous bus

60

Source: Sudeep Pasricha, On Chip

Communication, Colorado 2011

Page 31: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

31

Bus Clocking

Synchronous Bus Includes a clock in control lines

Fixed protocol for communication that is relative to clock

Involves very little logic and can run very fast

Require frequency converters across frequency domains

61

Source: Sudeep Pasricha, On Chip

Communication, Colorado 2011

Bus Communication Protocols

Synchronous bus with fixed-latency devices.

Clock

Address placed on the bus

Wait Wait Data availability ensured

Address

Data Wait

Request

Address or data

Ack

Ready

Handshaking on an asynchronous bus for an input operation

(e.g., reading from memory).

62

Page 32: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

32

Copyright © 2005 Altera Corporation

Avalon Bus

Examples of transfers

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon bus (NIOS) read, 1 wait

Wait cycle specified by design

64

Page 33: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

33

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon read

slave, 2 wait specified by design

65

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon read slave, additional wait request generated by slave device

66

Page 34: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

34

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon read

slave, 1 set up and 1 wait

67

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon write

slave, 0 wait

68

Page 35: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

35

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon write

slave, 1 wait

69

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon write

slave, wait request generated by slave

70

Page 36: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

36

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off.

and Altera marks in and outside the U.S.

Avalon write

slave, 1 set up, 1 hold, 0 wait

1 su 1 hold

71

Serial bus

Reducing problems

with wires

72

Page 37: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

37

Some Ways of Signal

Back

11 1 1 1 1 10 0+V

-V

+V

-V

1

0

11 1 1 1 1 10 0 Raw data

RZ, bipolar

1 1 1 1 1 1 100 NRZ, Manchester

Ethernet

RS232, PS/2

73

Preferred Signals with Minimized DC Component

A DC component heats channel.

cz: stejnosměrná složka vytápí linku

74

Page 38: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

38

Non-Return-to-Zero Inverted

USB, FDDI (Fiber Distributed Data Interface)

Differential in USB

75

Differential Current Drive

RTERM

High Speed Signaling Current

High Speed Differential Receiver

High Speed Signaling Current

RTERM

RTERM

RTERM

Data+

Data-

High Speed Differential Receiver

Power pair

Differential Signal pair

+5V, GND

76

Page 39: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

39

RS232 Interface Cable

Canon 9M - male Canon 9F - female

77

78

DE2 DE2

DTE - DTE

DCE - DCE DTE - DCE

DE2 PC

TxD

RxD

TxD

RxD

TxD

RxD

RxD

TxD

DTE -Data Terminal Equipment

/ DCE -Data Circuit Equipment Connections

Page 40: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

40

RS-232 Signal Levels

Data signals are shown in picture.

Control signals have opposite polarity.

Note: In devices, higher voltages are usually created by charge pumps

79

Technology for FPGA

What is inside of this miracle?

[ http://radio411.com/ ]

Page 41: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

41

The Memory Element

There are four kinds

SRAM brought to us by Xilinx

EEPROM brought to us by Altera

Flash (EEPROM) brought to us by

Actel

Antifuse brought to us by Actel

R

E

F

A

SPS 82

CMOS SRAM Cell Advantages

Static RAM (SRAM)

+ Unlimited fast read / write programming cycles

bit’ bit

word

R

Page 42: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

42

SPS 83

SRAM Drawbacks

Static (SRAM)

- Data stored only as long as supply is applied

- Large (4-6 transistors/cell)

R

SPS 84

DRAM - Dynamic RAM

Write: 1. Drive bit line

2. Select row

Read: 1. Precharge bit line to Vdd/2

2. Select row

3. Cell and bit line share charges Minute voltage changes on the bit line

4. Sense (fancy sense amp) Can detect changes of ~1 million electrons

5. Write: restore the value

Refresh 1. Just do a dummy read to every cell.

row select

bit

Read is really a read followed by a restoring write

DRAMs are smaller, but they require periodical refreshing

and are suitable only as internal memory in FPGA

[Source: Vijay Narayanan: FPGA technolgies]

R

Page 43: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

43

Volatile Memory Comparison

Larger cell lower density,

higher cost/bit

No refresh required

No dissipation of data

Read non-destructive

Simple read faster access

Standard IC process natural

for integration with logic

Smaller cell higher density,

lower cost/bit

Needs periodic refresh,

and refresh after read

Complex read longer access

time

Special IC process difficult to

integrate with logic circuits

word line

bit line bit line

word line

bit line

The primary difference between different memory types is the bit cell.

addr

data

SRAM DRAM

[Source: Vijay Narayanan: FPGA technolgies]

R

EEPROM: FLOTOX transistor

Floating gate

Source

Substrate p

Gate

Drain

n 1 n 1

FLOTOX transistor Fowler-Nordheim

I-V characteristic

20 – 30 nm

10 nm

-10 V

10 V

I

V GD

• Applying high voltage over this insulator allows electron to travel

to/from floating gate via Fowler-Nordheim tunneling.

• Erasing only requires reversing applied voltage.

[Source: Vijay Narayanan: FPGA technolgies]

E

Page 44: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

44

EEPROM Cell

WL

BL

V DD

Access Device

Programming Device

• Removing too much charge from

floating gate results in a depletion

device that cannot be turned off by

the standard word-line signals.

• Extra transistor is required to access

the transistor and FLOTOX

transistor is used only for storage

(open or closed).

• Fabrication of thin oxide is

challenging and costly

• Repeated programming causes shift

in threshold voltages

[Source: Vijay Narayanan: FPGA technolgies]

E

Flash EEPROM

Control gate

erasure

p- substrate

Floating gate

Thin tunneling oxide

n 1 source n 1 drain programming

Extra transistor of EEPROM is eliminated.

F

Page 45: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

45

Basic Operations in a NOR Flash Memory―Write

Always off

- Apply high voltage at gate to write selected device.

- Raises threshold of device. [Source: Vijay Narayanan: FPGA technolgies]

F

Basic Operations in a NOR Flash Memory―Erase

- Erase all cells simultaneously via tunneling by applying high

voltage at source.

[Source: Vijay Narayanan: FPGA technolgies]

F

Page 46: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

46

Basic Operations in a NOR Flash Memory―Read

- Read operation is the same as any NOR ROM structure.

[Source: Vijay Narayanan: FPGA technolgies]

F

Cross-sections of NVM cells

EPROM Flash Courtesy Intel

F

Page 47: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

47

SRAM / Flash F

Structure of AntiFuse

substrát

Kov 1

oxid

amorfní silikon oxid oxid

Kov 2

The drawing is related to the most used technology ViaLink or MicroVia, the both differs only in used metals 1 and 2

A

Page 48: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

48

Metal to metal antifuse moved the antifuse out of silicon

making the part denser and faster

A

Logic Logic

M1 M1

M2 M2

Logic Logic

M1

M2

M1 M1 M1

M2

M1 M1

M2

M3 M3

M2

SRAM SRAM

Antifuse

Antifuse: SX Routing Efficiency

Faster Signal Propagation than SRAM FPGA

Signals travel through more interconnect in SRAM

Interconnect dominates delays in deep sub-micron

Antifuse architecture uses minimum silicon area

SRAM needs silicon overhead for interconnect

A

Page 49: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

49

Antifuse / Flash

45 nm

Vivien, L.: FPGA Technologies

SEE problems SEU: Single Event Upset

- a change of state or transient induced by an energetic particle such as a cosmic ray or proton in a device. This may occur in digital, analog, and optical components or may have effects in surrounding interface circuitry (a subset known as Single Event Transients [SETs]). These are "soft" errors in that a reset or rewriting of the device causes normal device behavior thereafter.

SHE: Single Hard Error - an SEU which causes a permanent change to the operation of a device. An example is a stuck bit in a memory device.

SEE: Single Event Effect - any measurable effect to a circuit due to an ion strike, includes SEU, SHE and others.

from: http://nepp.nasa.gov/

R E F A

Page 50: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

50

Overview of technlogies

Source McCollum, J: Programmable Elements and Their Impact on FPGA Architecture, Performance, and

Radiation Hardness

SRAM Antifuse Flash EEPROM

Cell size 1 1/10 1/7 1/3

Speed (←capacity, resistance) Worse Excedent Worse? Middle

Density of intigration Middle Good Excedent Worse

SEE resistence Bad Excellent Middle Middle

Power consumption Different Excedent Worse Good

Criteria / Type

Reprogramming Yes,

unlimited No Yes,

<100000 Yes,

<1000

R E F A

Good Cost Excedent Worse Worse

100% Testability in production 100% No, 1-3% failures 100%

Protect design level 0,

1 when encrypted

level 3,

(ICs have 2) >4, nearly

impossible

>4, nearly

impossible

Semi-Programmable ASIC

I do not perform a construction,

I do an eternal reconstruction! [www.metropoleparis.com/]

Page 51: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

51

SPS 101

Modern ways of logic designes

ASIC-Application Specific Integrated Circuits

• Programmable Logic Device (PLD)

• Complex PLD (CPLD )

• Field Programmable Gate Arrays

(FPGA)

Graphical State Diagram

Graphical Flowchart

When clock rises

If (s == 0)

then y = (a & b) | c;

else y = c & !(d ^ e);

Textual HDL

Top-level

block-level

schematic

Block-level schematic

Not used

Microprocessor

& RAM

Full Custom

Standard

Logic

TTL

74xx

CMOS

4xxx

ASICs

Gate

Arrays

Standard

Cells

Programmable

PLDs FPGAs CPLDs

Semi-Programmable

>2000 >4000 >=1 > from 20

thousands

to milions rentability

SPS 102

Gate Array

Vd

d G

nd

Horizontal Routing Channel Vertic

al R

outin

g C

hannel

Sea of Gates: Routing Channels removed,

route at higher metal layers

G

Page 52: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

52

SPS 103

Gate Array Example

Vd

d G

nd

A A

B B

Vdd Vdd

Schematic

A

A

B

B

Out

Out

G

SPS 104

Gate-array Layout

Transistors pre-placed, fixed in size

Personalized by metal routing

Fastest to manufacture

Lowest mask cost

Lends itself to automated placement and wiring

G

Page 53: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

53

SPS 105

Gate-array Disadvantages

Non-optimized spacing

Limited transistor sizing options

Density

Performance

Power

Wiring blockages/inefficiencies

Excess circuitry

G

SPS 106

Standard Cell Layout

Routing Channel

Routing Channel Feed-through cell

Note uniform

height

S

Page 54: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

54

SPS 107

Standard Cell Layout

Design partitioned into cells of standard height

Power and Ground (Power grid) wiring preset

Technology provider supplies libraries of pre-designed cell

elements for usage (utilize varying numbers of cells)

Primitives (NAND, NOR, etc.)

Storage Elements (DFF)

Libraries can be tailored to specific applications (e.g., low

power vs. high performance)

Requires full manufacturing sequence

Typically automated place and wiring

S

SPS 108

Standard Cell Disadvantages

Cell height restrictions limits cell library contents

Full set of masks

Longer manufacturing times

S

Page 55: Struktury počítačových systémů - cvut.czdcenet.felk.cvut.cz/edu/fpga/lectures/Eng2015pr12_Inside...Resolution Time t r setup t hold 30 6.1.2016 16 Recovery time t clk-q Number

6.1.2016

55

SPS 109

Macro-cell Layout

Library elements provided by technology

supplier (e.g., foundry)

Elements can be of varying heights and widths

Richer variety of library elements (IP friendly)

S

SPS 110

Macro-cell Disadvantages

Similar to Standard-cell in length of

manufacturing times, mask costs

Placement and wiring more complex

Pre-layout of power grid more difficult, may not

be possible

S