

Page 1: SKKU 휴대폰학과 © 조준동 2008 1 조 준 동 2008.1 1 Digital Signal Processing With FPGAs Paul Ekas Jean-Charles Bouzigues

Digital Signal Processing With FPGAs

Paul Ekas, Jean-Charles Bouzigues

SKKU 휴대폰학과 (Mobile Phone Department) © 조준동 2008.1

Multiplier Options In FPGAs

Option               Resource                      Area Usage
1 Logic Multipliers  Logic Elements (Traditional)  500 LEs per 18x18 Multiplier
2 Hard Multipliers   DSP Blocks                    4 18x18 Multipliers per DSP Block
3 Soft Multipliers   RAM                           1 to 2 Embedded Memory Blocks

Logic Elements

• Smallest Unit of Logic
• Grouped into Logic Array Blocks (LABs) of Ten LEs
• Features
  – Four-Input Look-Up Table (LUT)
  – Configurable Register
  – Dynamic Add/Subtract Control
  – Carry-Select Chain Logic

[Figure: a Logic Array Block containing Logic Elements LE1 to LE10, each fed by four inputs from the Local Interconnect, plus shared Control Signals]

DSP Block: Optimized Hard MAC

9 Bit x 9 Bit
• 8 Multiplies
• 2 Multiplies with Accumulate
• 2 Sum of 2 Multipliers (Complex Multipliers)
• 2 Sum of 4 Multiplies

18 Bit x 18 Bit
• 4 Multiplies
• 2 Multiplies with Accumulate
• 1 Sum of 2 Multipliers (Complex Multiply)
• 1 Sum of 4 Multiplies

36 Bit x 36 Bit
• 1 Multiply

[Figure: DSP block datapath with a 144-bit Input Register Unit, multipliers with 36-bit results, Optional Pipelining, add/subtract stages (37 bits), a final adder (38 bits), an Output MUX, and an Output Register Unit]

Soft Multipliers: Lookup Based Multiplication

• Use Embedded RAM Blocks as Look-Up Tables (LUTs) for Generating Partial Products
• Coefficient or Sum of Coefficients Values Stored in RAM Blocks
• MSB Partial Product Shifted & Added to LSB Partial Product
• Example
  – Multiplication of 5-Bit Input with 13-Bit Coefficient
  – All Possible 18-Bit Results Stored in a 32*18 Look-Up Table

Multiplier Table (M512, 32*18; C = Coefficient[12:0]; 5-bit Address, 18-bit Data Output)

ADDRESS  MULT_RESULT
00000    0
00001    C
00010    2*C
00011    3*C
…        …
11111    31*C
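The lookup scheme on this slide can be modelled in a few lines of Python. This is only a sketch of the idea, not Altera's implementation: the coefficient value and the function names are hypothetical, and unsigned operands are assumed. The table holds every product of a 5-bit address with the coefficient, and wider inputs are handled by shifting and adding the partial products of each 5-bit slice.

```python
def build_multiplier_table(coefficient: int, address_bits: int = 5) -> list[int]:
    """Precompute address * coefficient for every possible address (the RAM contents)."""
    return [address * coefficient for address in range(1 << address_bits)]


def soft_multiply(sample: int, table: list[int], address_bits: int = 5) -> int:
    """Multiply an unsigned sample by the table's coefficient using lookups, shifts and adds."""
    mask = (1 << address_bits) - 1
    result = 0
    shift = 0
    while sample:
        # Look up the partial product of the current slice and weight it by its position.
        result += table[sample & mask] << shift
        sample >>= address_bits
        shift += address_bits
    return result


C = 0x1ABC  # hypothetical 13-bit coefficient, chosen only for illustration
table = build_multiplier_table(C)
product = soft_multiply(0b10110_01101, table)  # 10-bit input: two lookups, one shift, one add
```

With a 13-bit coefficient and a 5-bit address, every table entry fits in 18 bits, which matches the 32*18 table size on the slide.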

Altera FPGA Memory Architectures

• Today's applications need more high-performance memory
• One size does not fit all
• Wide choice of modes and widths

Embedded memory: M512 Blocks, M4K Blocks, M-RAM
External Memory Devices: DDR SDRAM & SRAM, SDR SDRAM, QDR & QDRII SRAM, ZBT SRAM, DDR FCRAM

Block features listed on the slide:
• True Dual Port RAM; Embedded Shift Register Mode; 512K bits; Operates Up to 300 MHz; Mixed Clock Mode
• True Dual Port RAM; Embedded Shift Register Mode; Operates Up to 312 MHz; Mixed Clock Mode
• Rate Changing; Embedded Shift Register Mode; Operates Up to 312 MHz; Mixed Clock Mode

More Bits For Larger Memory Buffering
More Data Ports for Greater Memory Bandwidth

Soft Multiplier: Sum of Multiplications

Example: FIR Filter (Sample 16-Bit, Coefficient 16-Bit)
Memory: 2 M512

Sum of Multiplications Table (M512, 32*18; 4-bit Address)

ADDRESS  MULT_RESULT
0000     0
0001     C0
0010     C1
0011     C0+C1
…        …
1111     C0+C1+C2+C3

[Figure: the Input feeds 16-Bit Serial Shift Registers; 4 address bits drive each of two M512 (32*18) look-up tables; the table outputs are combined by adders to form the Output]
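The sum-of-multiplications table is the core of what is usually called distributed arithmetic. The sketch below is a software model under stated assumptions: samples are unsigned 16-bit values, four taps share one table, and all names and values are hypothetical. One bit of each sample forms the table address per step, and the looked-up sum of coefficients is weighted by the bit position.

```python
def build_sum_table(coeffs: list[int]) -> list[int]:
    """Entry `addr` holds the sum of the coefficients whose address bit is set."""
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_dot_product(samples: list[int], table: list[int], sample_bits: int = 16) -> int:
    """Compute sum(samples[i] * coeffs[i]) bit-serially with one lookup per bit position."""
    acc = 0
    for bit in range(sample_bits):
        # Gather bit `bit` of every sample into one table address (serial shift registers).
        addr = 0
        for i, s in enumerate(samples):
            addr |= ((s >> bit) & 1) << i
        acc += table[addr] << bit  # weight the partial sum by the bit position
    return acc


coeffs = [5, 7, 11, 13]          # hypothetical C0..C3
samples = [3, 1000, 65535, 42]   # hypothetical unsigned 16-bit samples
table = build_sum_table(coeffs)
result = da_dot_product(samples, table)
```

No multiplier is used at any point; the cost is one table read and one shifted add per sample bit.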

Example: Direct Sequence Spread Spectrum (DSSS) Modem

DSSS Modem

• Five Independent Data Channels Spread to 3.84 Mcps
• Three-Stage FIR Interpolation-by-32
• Root-Raised-Cosine Pulse Shaping with 22% Excess Bandwidth
• 112 dB SFDR 15.36 MHz Quadrature Carriers
• 122.88 MSPS Transmitter Output with 5 MHz Bandwidth & Over 78 dB Out-of-Band Rejection
• Automatic Gain Control (AGC) Compensating for Channel Attenuation of up to 30 dB
• Costas Loop Carrier Recovery
• 4x Oversampling Code Synchronization

[Figure: data channels DCH0 to DCH4 enter the DSSS Modulator, pass through the Channel Model, and are recovered as DCH0 to DCH4 by the DSSS Demodulator]

DSSS Modulator

[Figure: modulator block diagram, containing the following blocks]
• Inputs: DCH0, DCH1, DCH2, DCH3, DCH4, PCH, SCH
• Channelization codes: Cch,16,0, Cch,16,1, Cch,16,2, Cch,16,8, Cch,16,9, Cch,16,10
• Length 256 Gold Code Spreader
• FIR1 LPF: 2-Channel 87-Tap FIR Filter, Interpolation x2
• FIR2 LPF: 2-Channel 47-Tap FIR Filter, Interpolation x4
• FIR3 RRC (shown twice): 25-Tap FIR Filter, Interpolation x4, Ex BW: 22%
• NCO: Frequency Resolution 0.03 Hz, SFDR 112 dB, driven by the Carrier Phase Increment, producing Sin(wn) and Cos(wn)
• Gains K, gi, gq; outputs Re[] and Im[]
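The rate and NCO figures quoted for the modem can be cross-checked with a short calculation. The chip rate, interpolation factors, carrier frequency and 0.03 Hz resolution come from the slides; the 32-bit phase accumulator width is an assumption made here because it reproduces the quoted resolution, and the function names are hypothetical.

```python
CHIP_RATE_HZ = 3.84e6             # from the slides: channels spread to 3.84 Mcps
INTERPOLATION_STAGES = (2, 4, 4)  # FIR1 x2, FIR2 x4, FIR3 x4
CARRIER_HZ = 15.36e6              # from the slides: 15.36 MHz quadrature carriers
ACCUMULATOR_BITS = 32             # assumption: not stated in the slides


def output_sample_rate(chip_rate_hz: float, stages: tuple[int, ...]) -> float:
    """Multiply the chip rate by every interpolation factor in the chain."""
    rate = chip_rate_hz
    for factor in stages:
        rate *= factor
    return rate


def nco_resolution_hz(sample_rate_hz: float, accumulator_bits: int) -> float:
    """Smallest frequency step of a phase-accumulator NCO."""
    return sample_rate_hz / (1 << accumulator_bits)


def phase_increment(carrier_hz: float, sample_rate_hz: float, accumulator_bits: int) -> int:
    """Accumulator increment per sample that produces the requested carrier."""
    return round(carrier_hz / sample_rate_hz * (1 << accumulator_bits))


fs = output_sample_rate(CHIP_RATE_HZ, INTERPOLATION_STAGES)
resolution = nco_resolution_hz(fs, ACCUMULATOR_BITS)
increment = phase_increment(CARRIER_HZ, fs, ACCUMULATOR_BITS)
```

Under that assumption, the three stages give the 122.88 MSPS output rate, the resolution comes out just under 0.03 Hz, and the carrier sits at exactly one eighth of the sample rate.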

DSSS Demodulator

[Figure: demodulator block diagram, containing the following blocks]
• NCO: Frequency Resolution 0.03 Hz, SFDR 112 dB, Free-Running Phase Increment
• FIR (shown twice): Altera RRC 31-Tap FIR Filter, Excess BW: 22%, Fixed Rate
• AGC
• Carrier Recovery Loop
• 8 Gold Code Correlator, 4x Oversampling
• Peak Detector (signals pn_lock, max_index)
• Buffer, I-Q Derotate
• Pilot Monitor, Pilot Output
• Hadamard Despreader, Data Channels Output 1…5

DSSS Modem Resources

Resource Usage Summary

Design Entity  Logic Elements  M512 RAM  M4K RAM  MegaRAM  DSP Block Elements
Modulator      9943            1         8        0        12
Demodulator    12196           60        8        1        60

Power Usage Estimates

Power                               mW
Total Standby Internal Power        75
Total Logic Element Internal Power  283
Total Clocktree Internal Power      175
Total DSP Internal Power            23
Other Internal Power                92
Total Power                         505

FIR Filter Example* – 16X Cost/Performance Improvement

Device         Solution                FIR Performance (MHz)  Device Cost****  Cost per FIR MHz
TI C6713-200   64 cycles** @ 200 MHz   3.125                  $24.59           $7.87
TI C6416-600   32 cycles** @ 600 MHz   18.75                  $160             $8.53
Altera 1C3-8   8 cycles*** @ 230 MHz   28.75                  $14              $0.49
Altera 1C12-8  1 cycle*** @ 170 MHz    170                    $84              $0.49

* FIR 128 Tap, 16 bit data, 14 bit coefficients
** DSPLib Optimized Assembly Libraries from Texas Instruments
*** MegaCore Optimized FIR Compiler from Altera
**** Pricing in quantity of 100 at Arrow 6/25/03
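The two derived columns of the comparison follow directly from the clock rate, the cycles per FIR output, and the device price. The sketch below recomputes them from the slide's own figures; the function and variable names are hypothetical.

```python
# (device, cycles per FIR output, clock in MHz, price in USD), all taken from the slide
DEVICES = [
    ("TI C6713-200", 64, 200.0, 24.59),
    ("TI C6416-600", 32, 600.0, 160.00),
    ("Altera 1C3-8", 8, 230.0, 14.00),
    ("Altera 1C12-8", 1, 170.0, 84.00),
]


def fir_performance_mhz(cycles: int, clock_mhz: float) -> float:
    """FIR output rate in MHz: one output every `cycles` clock cycles."""
    return clock_mhz / cycles


def cost_per_fir_mhz(price_usd: float, cycles: int, clock_mhz: float) -> float:
    """Device price divided by the achievable FIR output rate."""
    return price_usd / fir_performance_mhz(cycles, clock_mhz)


costs = {name: cost_per_fir_mhz(price, cycles, clock)
         for name, cycles, clock, price in DEVICES}

# Ratio between the cheaper DSP and the Altera 1C3-8, the basis of the "16X" headline.
improvement = costs["TI C6713-200"] / costs["Altera 1C3-8"]
```

The ratio comes out slightly above 16, which is consistent with the slide title.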

Reconfigurable video processor for SDRAM access optimization (Henriss, Ernst et al.)

Reconfigurable video platform

· SDRAM memory centered design
· FPGA based scheduler merges different streams and random accesses; exploitation of SDRAM bank structure
· supports 2 HDTV streams at 1.48 Gbit/s each plus DSP and filter unit access
· reaches 700 MByte/s in practical application for 4 Byte SDRAM memory word
· extremely cost efficient design
· used in professional video product line
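The bandwidth figures on this slide can be related with a small calculation. It is only a rough budget under the assumption of decimal units (1 Gbit = 10^9 bits, 1 MByte = 10^6 bytes); the names are hypothetical.

```python
HDTV_STREAM_GBIT_S = 1.48          # from the slide, per stream
NUM_HDTV_STREAMS = 2               # from the slide
SDRAM_THROUGHPUT_MBYTE_S = 700.0   # from the slide, practical throughput


def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to MByte/s using decimal prefixes (an assumption)."""
    return gbit_per_s * 1000.0 / 8.0


video_mbyte_s = NUM_HDTV_STREAMS * gbit_to_mbyte_per_s(HDTV_STREAM_GBIT_S)
# Bandwidth left for DSP and filter unit accesses once both streams are served.
remaining_mbyte_s = SDRAM_THROUGHPUT_MBYTE_S - video_mbyte_s
```

On these assumptions the two video streams take roughly 370 MByte/s, leaving about 330 MByte/s for the DSP and filter unit.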

Fine-Grained RSOCs: Triscend A7 CSOC

• A7 Family
• 32-bit ARM 7 with 8kB Cache
• 3200 logic cells max. (40K gates)
• Up to 3800 FFs
• Up to 300 Prog. I/O pins
• www.triscend.com

Coarse-Grained RSOCs: Chameleon Structure (2000)

Paul J.M. Havinga, Lodewijk T. Smit, Gerard J.M. Smit, Martinus Bos, Paul M. Heysters, www.chameleonsystems.com

• 32-bit ARC control processor
• Up to 84 32-bit Datapath Units
• DPU = a 32-bit ALU + a 32-bit barrel shifter
• Up to 24 of 16x24-bit multipliers
• Up to 48 of 128x32-bit local memory modules
• Up to 160 Prog. I/O pins
• Targeted at 3rd gen. wireless basestation, wireless local loop, SW radio, etc.

Design a battery powered personal mobile computing device that has multimedia functionality and can operate in a dynamic environment.

- Do just enough and not too much for a given task (QoS)

Field Programmable Function Array

• The FPFA concept has a number of advantages
  – The FPFA has a highly regular organisation
  – We use a general-purpose processor core
  – Its scalability stands in contrast to the dedicated chips designed nowadays
  – The FPFA can do media processing tasks such as compression/decompression efficiently

Field Programmable Function Array

[Figure: a processor tile with five ALUs, ten memories (M), a Crossbar, and Registers]

• Processor tiles
  – A tile consists of five identical blocks, which share a control unit and a communication unit
  – An individual block contains an ALU, two memories and four register banks of four 20-bit wide registers
  – A crossbar switch provides flexible routing between the ALUs, registers and memories
  – This structure is convenient for the Fast Fourier Transform (6-input, 4-output) and the finite impulse response filter

DSP System Architecture Options

[Figure: architecture options plotted against Performance (MMACs/sec): Stand-Alone Processor (DSP), Processor + Co-Processor (DSP), Processor Array (4 x 4 grid of DSPs), Dedicated Hardware Architecture]

Optional Coprocessor Mappings

Processor External to FPGA (Processor and Memory connected to the FPGA)
• TI c6x (EMIF)
• Mot PPC (MPX)
• Mot Starcore (MPX, AHB)
• Intel 2850 (PCI Express)
• ARM (AHB)
• …

Processor On FPGA
• Nios
• ARM (AHB)

Mapping of DSP Algorithms on the FPFA

• Fast Fourier Transform
  – FFT recursively divides a DFT into smaller DFTs

[Figure: Recursion of a radix 2 FFT with 8 inputs, from a DFT with N=8 down to four DFTs with N=2]

[Figure: The radix 2 FFT butterfly, with inputs a and b, twiddle factor W, and sum (+) and difference (-) outputs]
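The recursion and the butterfly named on this slide can be written out as a short reference model. This is a generic decimation-in-time radix-2 FFT, not the FPFA mapping itself; the function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 butterfly: returns (a + W*b, a - W*b)."""
    t = w * b
    return a + t, a - t


def fft_radix2(x: list[complex]) -> list[complex]:
    """Recursively split an N-point DFT into two N/2-point DFTs (N a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


signal = [complex(v) for v in (1, 2, 3, 4, 0, -1, -2, -3)]  # 8 inputs, as in the figure
spectrum = fft_radix2(signal)
```

For 8 inputs the recursion bottoms out after three levels, and each level performs four butterflies.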

OMAP™ (Open Multimedia Application Platform)

• The OMAP architecture includes SW/OS support that can control the clocking and the idle modes of the whole platform.
• The dual-core architecture makes it possible to assign each task to the most suitable processor.

Mapping of DSP Algorithms on the FPFA

• Five-tap finite-impulse response filter

[Figure: five ALUs (1 to 5) holding coefficients h4, h3, h2, h1, h0, connected through the Cross Bar (Level 2) to the output O]
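A five-tap FIR filter computes each output as the weighted sum of the five most recent inputs. The sketch below is a plain software model of that sum, not the FPFA mapping; the coefficient values and names are hypothetical.

```python
def fir_filter(samples: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with zero initial state."""
    delay_line = [0.0] * len(taps)
    outputs = []
    for x in samples:
        # Shift the newest sample into the delay line and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        outputs.append(sum(h * d for h, d in zip(taps, delay_line)))
    return outputs


h = [1.0, 2.0, 3.0, 4.0, 5.0]                 # hypothetical h0..h4
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = fir_filter(impulse, h)             # the impulse response replays the taps
```

The five multiply operations per output are independent of each other, which is why they map naturally onto five parallel ALUs.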

MorphoSys (1999)

Reconfigurable cell

RC Array

• Array of reconfigurable cells
• 64 cells in a 2-D matrix
• SIMD model
• Same row (column) shares configuration
• Each RC operates on different data

TinyRISC (Cont'd)

Implementation & Performance

• 0.35 micron technology
• 4 metal layers
• Operation at 100 MHz
• 170 mm²

Motion Estimation
Block size: 16x16 pixels, Image size: 352x288 pixels

Lx from STMicroelectronics

DART, Raphael David, IRISA/ENSSAT

With STMicroelectronics, UBO univ.

• Reconfigurable multigrain = DPR + FPGA
• Dynamic reconfiguration
• Low power consumption
• Hierarchical distribution of resources
• SCMD (Single Configuration Multiple Data)

DART Cluster
• 11 GOPS/cluster
• 1.6 GMACS/cluster
• 0.64 W @ 11 GOPS
• 16 MIPS/mW @ 11 GOPS
• 0.18u CMOS

Cluster architecture

[Figure: a DART cluster with Config mem., FPGA, DMA ctrl, Control, six DPRs (DPR1 to DPR6), and Data mem, connected by a Segmented network]

DPR architecture

[Figure: a DPR with reg1, reg2, MUL1, ALU1, MUL2, ALU2 on a Multibus network; Datamem1 to Datamem4 with AG1 to AG4; Loop management; Global bus]

• Run-time configurable ASIC: DS spreading, Chip shaping (FIR filter), Timing recovery, Antijam, Transmission security, Correlator (low precision arithmetic to reduce power consumption)
• Maximize the number of functions performed by the DSP: Data burst, FEC, Interleaving
• Adaptive S.P. Deinterleaver, Adaptive Decoder
• Areas where SDR technology can be applied

[Figure: radio functions arranged by degree of programmability, under the headings Hardware, Software-Controlled Hardware, Programmable Software, Post-Shipping Programmable Software. Functions shown: Antenna, VCO, Baseband B/W, Output Power, Modulator (Switched), Encryption, RF Selectivity, IF, Chip-rate processing, Modulation, Encryption, Smart Antenna Signal Processing, Source coding, IF Selectivity, Power-Management, Symbol-rate processing, User-interface]

• Typical Signal Processing blocks in Software Defined Radio
  – SDR Forum Recommended

[Figure: block diagram with ANTENNA, RF, Channel Selector/Combiner, BB/IF (Real/Complex, Digital/Analog), Baseband Processing (DSP), Call/Message Processing & I/O, Common System Equipment, MONITOR/CONTROL, and ROUTING; interfaces AIR, Voice/PSTN, Data/IP, Flow Control, NSS/Network, Multimedia/WAP; BB Text Flow and Control Bits; I, C and AUX connections between blocks; Ext. Ref, Clock/Strobe, Ref, Power; Remote Control/Display and Local Control]

• ADC sampling rate
• dynamic range (determines precision of arithmetic operations)
• translation of digital IF to baseband
• modulation/demodulation algorithms
• error coding/decoding algorithms
• synchronization algorithms

Soft Radio Research Group

• DARPA's Adaptive Computing Systems Project
• Virginia Tech
• University of California at Berkeley
• Brigham Young University
• Chameleon Systems Inc.
• Morphic Inc.
• Quicksilver Technology Inc.
• Sirius Inc.

• low power: low-power DSP and MCU processor in combination with a small, low-power programmable logic device (PLD)
  – Functions needed for GSM Phase 2+ or UMTS terminal
  – DSP16000 and ARM7 MCU, Xilinx's CoolRunner PLD with extremely low power consumption (<0.5 mA)
• serve as HW co-processor for MCU, DSP or both
• reconfigurable coprocessor
• SW part designed in Processor Expert™
• Embedded Beans library

• Object oriented, component based embedded application CASE development tool
  – code portability, component reusability
  – expert knowledge system assistance
  – virtual prototyping
  – IP sharing by embedded components exchange
• GSM - UMTS
  – components (Embedded Beans) as building blocks
• MCU expert knowledge system
  – calculates overall system timing propagation
  – automatic connection of peripherals
  – verifies the application timing
• Processor Expert™ generates resulting source code (in selected language, typically C, ASM, C++ or VHDL).

Xilinx Virtex FPGA: intelligent configuration mechanism for fast and partial

• Increasing density and reducing power
• Included extra functions to support digital signal operations such as extra arithmetic support and increased RAM
• Dynamic reconfiguration is also supported.

[Figure: Virtex floorplan with a Standard Array of CLBs flanked by BRAMs, surrounded by the VersaRing and IOBs, with DLLs at the corners; each CLB contains LUT, Control, and Configurable storage element blocks]

LUT:
o look up table for logic functions
o wide RAM or ROM
o shift register

Control:
o Combination of both LUTs
o Arithmetic support
o Carry control
o Route through

Configurable Storage element:
o clocking mode
o polarity asynchronous reset

Block RAM: large resource for storage of application data

Input Output Blocks (IOBs): configurable interfacing

[Figure: design flow with Algorithm Definition & Specification, Optimization of Hardware Structure, Performance Est., DSP/MCU Requirement, ASIC/FPGA, and Verification]

Complexity of Reconfiguration
• processor technology, such as DSPs, FPGAs,
• Complexity & Levels of Reconfiguration Complexity
• Software Repository and Access Methods
• Transparent Reconfiguration
• Reconfiguration Signalling
• Verifying the Reconfiguration
• Selective Redefinition of Module(s)
• Micro and Macro level Process Management

[Figure: three terminal architectures]
• Multi-mode terminal with parallel modes: Mode 1, Mode 2, … Mode n, each with its own RF and BB signal Processing
• Multi-mode terminal with software defined signal processing: separate RF blocks, Memory for parameter set, shared Baseband signal processing
• Fully adaptive software reconfigurable system: Flexible and adaptive RF frontend, Programmable high power Baseband signal processing

RF: converts the received signal into an IF or baseband signal

BaseBand: modulation unit, channel codec unit, channelizer, encryption unit, time/phase tracking unit

• Multi-band antenna
• Linear wideband RF components
• Wideband A/D and D/A converters
• High-performance DSP / reconfigurable logic

Antenna
• Smart antenna
• High-efficiency linear antenna

RF
• Wideband, miniaturized
• High-efficiency, linear RF power amplifier
• Design free of interference and noise when operating at the same time as other signals
• High-frequency components with characteristics equivalent to single-mode parts

ADC
• First IF stage (analog down-conversion) - ADC - second IF stage (digital down-conversion)
• Band pass sigma delta architecture

DSP
• Enough performance to implement the baseband section in software
• TMS320C62X: peak performance 1600 MIPS, TMS320C64X: 4800 MIPS

Reconfigurable Logic
• FPGA, RC (Reconfigurable Computing) ASIC

[Figure: RF Conversion to IF and A/D; I/O controller; Process Controller; Temporary Storage Buffer; Program Memory (two blocks); Formation of Stream Packets/Interpretation; Interconnecting Array of Processing Elements; Output and interface with host PC]

Configurable ASIC
• Best solution when a moderate level of programmability and integration is to be provided
• Low programmability and integration density

FPGA
• Best programmable solution for high-speed parallel linear signal processing
• High power consumption, large chip size

DSP
• Best programmable solution for functions that involve complex analysis and decision making
• Lower performance than ASIC and FPGA

Comparison criteria: Programmability, Level of Integration, Development/Implementation/Test Cycle, Performance in required processing time, Power.

[Figure: baseband functions coordinated by a Sequencer: Multiplexing & Burst Construction, Encryption, Channel Coding, Interleaving, Data Processing, CRC insertion, Modulation, Spreading, Equalization, Rate matching, Channelization, Segmentation, Radio Resource]

Data path routing: macro functions composed of ASIC or FPGA or both, Routing Device, Sequencer

Advantage: Only a simple program (scheduling); factorization for common functions
Drawback: Restricts re-configurability to within macros

Systematic re-programming of whole baseband module; new standard is installed on same hardware

Advantage: Low complexity of hardware
Drawback: Slower reconfiguration process; if reconfiguration fails, the system will not operate, so a default mode is necessary

[Figure: an MPU with FPGAs in three phases: Previous Standard is running, Reconfiguration, Present Standard is running]

Systolic Ring: Scalable Structure

Pascal BENOIT

G. Sassatelli, L. Torres, D. Demigny, M. Robert, G. Cambon

Systolic Ring

• Based on a coarse-grained configurable PE
• Circular datapaths
  – C: # of layers (C = 4)
  – N: # of Dnodes per layer (N = 2)
  – S: # of Rings (S = 1)
• Control Units (sequencer)
  – Local Dnode unit
  – Local Ring unit
  – Global unit

[Figure: a ring of four layers (layer 1 to layer 4), each with two Dnodes, linked by Switches, with a Dnode Sequencer and a Local Ring Sequencer]

Remanence

$$R = \frac{N_{PE} \cdot F_e}{N_c \cdot F_c}$$

• NPE: # of processing elements (PE)
• Nc: # of PE configurable per cycle
• Fe: operating frequency
• Fc: configuration frequency

• Characterizes the Dynamism
• # of cycles to (re)configure the whole architecture
• Amount of data to compute between 2 configurations

[Figure: architecture model with a Sequencing Unit (Sequencer and Configuration Memory holding inst0, inst1, inst2, inst3, …, instn), Routing (Interconnection), and Processing Elements (PE)]

Operative Density

$$OD(N_{PE}) = \frac{N_{PE}}{A(N_{PE})}$$

• NPE: # of PE
• A: Core Area (relative unit²)
• Area can be expressed as a function of NPE

[Figure: architecture model with a Sequencing Unit (Sequencer and Configuration Memory holding inst0, inst1, inst2, inst3, …, instn), Routing (Interconnection), and Processing Elements (PE)]

Remanence formalisation

• # of layers: C = 8
• # of Dnodes per layer: N = 2
• 1 Systolic Ring: S = 1

[Figure: REMANENCE (0 to 40) versus # Dnodes (0 to 180), with curves for k = 1, k = 2, k = 4, and k = 8; alongside, a Systolic Ring of eight layers (layer 1 to layer 8) with two Dnodes per layer, linked by Switches]

PEPENkNR .)(

k= C/N

Architectural model Characterization

• # of layers: 4 (C = 4)
• # of Dnodes per layer: 2 (N = 2)
• 4 Systolic Rings (S = 4)

Control Units
• Local Dnode unit
• Local Ring unit
• Global unit

[Figure: four Systolic Rings of Dnodes and Switches, each with a Local Ring Sequencer, connected by a Global Bus under a Global Sequencer]

• www.qstech.com

Best OD and remanence

[Figure: Operative Density (0.000 to 0.040) and Remanence (0 to 20) versus # Dnodes (0 to 140), for S=1, S=2, S=4, and S=8]

Design Space: Worst interconnect resources and processing power

Worst OD and remanence

[Figure: Operative Density (0.000 to 0.040) and Remanence (0 to 20) versus # Dnodes (0 to 140), for S=1, S=2, S=4, and S=8]

Design Space: Best interconnect resources and processing power

Comparisons of RA

1. Only 1 cycle to (re)configure the DSP
2. Few cycles to (re)configure coarse grain RA (8)
3. Many cycles to (re)configure fine grain RA

$$R = \frac{N_{PE} \cdot F_e}{N_c \cdot F_c}$$

Name           Type             NPE   F (MHz)  Nc    R
ARDOISE        Fine Grain RA    2304  33       0.14  16457
Systolic Ring  Coarse Grain RA  24    200      4     6
DART           Coarse Grain RA  24    130      4     6
MorphoSys      Coarse Grain RA  128   100      16    8
TMS320C62      DSP VLIW         8     300      8     1

Pascal BENOIT
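The R column of the comparison can be reproduced from the remanence definition given earlier in the deck. The sketch below assumes that the operating and configuration frequencies are equal (Fe = Fc), which is an assumption made here because it matches the listed values; the function and variable names are hypothetical.

```python
def remanence(n_pe: float, n_c: float, f_e: float = 1.0, f_c: float = 1.0) -> float:
    """R = (N_PE * F_e) / (N_c * F_c): cycles needed to (re)configure the whole architecture."""
    return (n_pe * f_e) / (n_c * f_c)


# (name, N_PE, N_c) taken from the comparison table; Fe = Fc is assumed.
ARCHITECTURES = [
    ("ARDOISE", 2304, 0.14),
    ("Systolic Ring", 24, 4),
    ("DART", 24, 4),
    ("MorphoSys", 128, 16),
    ("TMS320C62", 8, 8),
]

results = {name: round(remanence(n_pe, n_c)) for name, n_pe, n_c in ARCHITECTURES}
```

The results line up with the three observations on the slide: one cycle for the DSP, a handful for the coarse grain architectures, and several thousand for the fine grain one.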