
Partial Runtime Reconfiguration for Industrial Applications – Methods and Tools

Dirk Koch ([email protected])

Why Use Runtime Reconfiguration?

• Economics of ASIC and FPGA designs
  • FPGA buyers: reduce unit cost
  • FPGA vendors: more attractive for high-volume designs

(Source: Electronic News, 16.03.2006)

Runtime Reconfiguration: Applications

• Applications for infrequent reconfiguration:
  • Rapid prototyping
  • Searching (text, genetic databases)
  • Mode changing (test equipment, radio)
  • Self-repair / self-optimization
• Applications for high-speed reconfiguration (area saving):
  • Networking (exchange packet filters according to traffic)
  • Mutually exclusive functionality (MP3 music versus phone calls)
  • Modulation/frequency/encryption hopping in military radios
• Applications for high-speed reconfiguration (acceleration):
  • Crypto (e.g., asymmetric crypto for key exchange and symmetric crypto for data)
  • Sorting (database acceleration): optimize individual sort steps
  • Possibly video processing (en-/decoding, object classification)

Applications: Area Saving

• Networking example: adapt to different protocols over time

[Figure: FPGA network processor with a dispatcher configuration; protocol modules (VoIP, SSH, HTTP, FTP) are loaded from a configuration repository. Source: www.caida.org]
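The dispatcher idea can be sketched in a few lines. This is a hypothetical policy, not the one behind the slide: assume the network processor has a fixed number of reconfigurable slots and keeps loaded the protocol handlers that saw the most recent traffic. The names `select_modules` and `plan_reconfiguration` are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def select_modules(recent_packets: Iterable[str], n_slots: int) -> Set[str]:
    """Hypothetical policy: keep the n_slots protocols with the most recent traffic."""
    counts = Counter(recent_packets)
    return {protocol for protocol, _ in counts.most_common(n_slots)}


def plan_reconfiguration(loaded: Set[str], wanted: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (modules to evict, modules to fetch from the configuration repository)."""
    return loaded - wanted, wanted - loaded


if __name__ == "__main__":
    traffic = ["HTTP"] * 50 + ["VoIP"] * 30 + ["SSH"] * 5 + ["FTP"] * 2
    wanted = select_modules(traffic, n_slots=2)
    evict, fetch = plan_reconfiguration({"HTTP", "FTP"}, wanted)
    print(sorted(wanted), sorted(evict), sorted(fetch))  # ['HTTP', 'VoIP'] ['FTP'] ['VoIP']
```

In this sketch only the modules whose demand changed are swapped; the handlers that stay in demand keep running untouched.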

Applications: Area Saving

• Time-variant resource usage

[Figure: two schedules, a) and b), of modules M0, M1 and M2 over iterations ν, ν+1, ν+2 and stages S0 to S3 along time t, with τ and the configuration overhead marked.]

• Example: SSL-encrypted data transfer
  • Asymmetric key exchange (Montgomery multiplication)
  • Symmetric data exchange (AES)
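The asymmetric phase is dominated by modular multiplication, which the slide names as Montgomery multiplication. Below is a minimal software sketch of the standard Montgomery reduction (REDC) with R = 2^k, assuming an odd modulus n < R; it only illustrates the arithmetic such a swapped-in module performs, not its hardware design. It needs Python 3.8+ for `pow(n, -1, r)`.

```python
def montgomery_setup(n: int, k: int) -> int:
    """Return n' = -n^(-1) mod R for R = 2**k; n must be odd and smaller than R."""
    r = 1 << k
    if n % 2 == 0 or n >= r:
        raise ValueError("modulus must be odd and smaller than R = 2**k")
    return (-pow(n, -1, r)) % r


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^(-1) mod n, for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # t + m*n is divisible by R
    return u - n if u >= n else u      # u < 2n, so one subtraction suffices


def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Compute a * b mod n via the Montgomery domain (requires a, b < n)."""
    n_prime = montgomery_setup(n, k)
    a_m = (a << k) % n                     # a * R mod n
    b_m = (b << k) % n                     # b * R mod n
    c_m = redc(a_m * b_m, n, k, n_prime)   # a * b * R mod n
    return redc(c_m, n, k, n_prime)        # leave the Montgomery domain


if __name__ == "__main__":
    print(mont_mul(7, 11, 13, 4))  # 12, i.e. 77 mod 13
```

The reduction uses only masks, shifts, additions and multiplications, with no division by n, which is why it maps well onto hardware.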

Applications: Area vs. Speed/Power

• Runtime reconfiguration utilizes idle parts of the FPGA
• The size of the idle parts can be tuned with the clock speed
• Simplifies the system-level design
• Simple integration of additional functionality
• Optimize clock/power for a particular system

[Figure: two schedules of stages S0 to S3 over time t, with τ marked.]
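The claim that the clock speed tunes the idle parts can be made concrete with a simple timing model, stated here as an assumption: the stages must complete within a period τ, they run one after another in the same area, each needs a fixed number of clock cycles, and the remaining time leaves the area free for other modules. The function name and cycle counts are hypothetical, and reconfiguration time is ignored.

```python
from typing import Sequence


def idle_fraction(stage_cycles: Sequence[int], clock_hz: float, period_s: float) -> float:
    """Fraction of the period tau in which the reconfigurable area is idle.

    Assumed model: stages run sequentially in the same area, each needing a
    fixed number of clock cycles; reconfiguration overhead is not included.
    """
    busy_s = sum(stage_cycles) / clock_hz
    if busy_s > period_s:
        raise ValueError("the stages do not fit into the period at this clock")
    return 1.0 - busy_s / period_s


if __name__ == "__main__":
    stages = [1000, 2000, 1000]  # hypothetical cycle counts per stage
    print(round(idle_fraction(stages, 100e6, 100e-6), 3))  # 0.6
    print(round(idle_fraction(stages, 200e6, 100e-6), 3))  # 0.8
```

Doubling the clock in this toy example raises the idle share of τ from 60% to 80%; in practice the configuration overhead consumes part of that slack.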

Applications: Acceleration

• Reduce latency by spending more area on submodules
  • Alternatively, this may allow reducing the clock frequency (and power)
  • Lower latency might reduce buffer sizes
  • May also increase throughput

[Figure: execution of submodules A, B and C in stages S0 to S2 over time t for different area allocations, with the latency marked.]

Applications: Database Acceleration

• In large databases, roughly 30% of CPU time is spent on sorting.
• Huge problem classes (the number and size of the keys can add up to several GB)
• Different data types (int, txt, ...)

1) Build the longest sorted subsequences that fit into the FPGA.
2) Merge the subsequences in one or more steps into the fully sorted result.

• Reconfigure to exchange sorters

Assumed HW architecture:
[Figure: a load unit reads from memory into a chain of sorter stages labelled 2x2, 2x4, 2x8, ..., 2x4096 and writes back to memory; smaller stages use distributed memory, larger stages use block RAM (2, 4, 6, 12 and 24 BRAMs).]

Applications: Database Acceleration

• More than 5x faster with runtime reconfiguration compared to static solutions
• Reduced I/O throughput → lower power!
• Fewer iterations required, because more work is done per step
• Useful for many huge problems (CT image processing, matrix multiplication, matrix inversion, …)

[Figure: the sorter architecture with several load units, fed by a round-robin memory dispatcher and drained by a round-robin memory collector.]
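The two-phase scheme described above (build sorted runs, then merge them) is an external merge sort. As a software sketch, assume `run_len` stands in for the number of keys that fit into the FPGA sorter and `fan_in` for how many runs one merge step can combine; both are hypothetical parameters. The pass count shows why doing more work per step means fewer iterations over memory.

```python
import heapq
from typing import List, Tuple


def build_runs(data: List[int], run_len: int) -> List[List[int]]:
    """Phase 1: sort chunks of run_len keys (the part that fits into the sorter)."""
    return [sorted(data[i:i + run_len]) for i in range(0, len(data), run_len)]


def merge_runs(runs: List[List[int]], fan_in: int) -> Tuple[List[int], int]:
    """Phase 2: merge up to fan_in runs at a time; return the result and the pass count."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1  # each pass streams all keys through memory once
    return (runs[0] if runs else []), passes


if __name__ == "__main__":
    data = list(range(100, 0, -1))
    for run_len, fan_in in [(8, 2), (8, 4), (32, 4)]:
        result, passes = merge_runs(build_runs(data, run_len), fan_in)
        assert result == sorted(data)
        print(run_len, fan_in, passes)  # 4, 2 and 1 passes, respectively
```

Longer runs and a wider merge both cut the number of passes, and every pass saved is one full round trip of the data through memory.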

Partial runtime reconfiguration is brilliant, but not used!?

[Figure: modules m1 to m6 in reconfigurable regions (M), illustrating the overhead from internal fragmentation and communication cost c (c_const).]

• Internal fragmentation dominates the overhead
• It can be reduced with small slots → 2D placement
• 2D placement improves BRAM/DSP utilization
• Requires adequate communication architectures

The Runtime Reconfiguration Paradox
• Missing: efficient methodologies for integrating partial modules
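The overhead argument can be quantified with a simple model, stated as an assumption rather than the exact model behind the slide: a module occupies a whole number of slots, the unused rest of its last slot is internal fragmentation, and each occupied slot adds a communication cost c. The module sizes and c below are hypothetical values in LUTs.

```python
import math
from typing import Sequence, Tuple


def module_overhead(module_luts: int, slot_luts: int, c: int) -> Tuple[int, int, int]:
    """Return (slots used, internal fragmentation, communication cost) in LUTs."""
    slots = math.ceil(module_luts / slot_luts)
    fragmentation = slots * slot_luts - module_luts
    return slots, fragmentation, slots * c


def total_overhead(modules: Sequence[int], slot_luts: int, c: int) -> int:
    """Sum of fragmentation and communication cost over all modules."""
    total = 0
    for luts in modules:
        _, frag, comm = module_overhead(luts, slot_luts, c)
        total += frag + comm
    return total


if __name__ == "__main__":
    modules = [1200, 300, 2500, 700]  # hypothetical module sizes
    print(total_overhead(modules, slot_luts=2500, c=20))  # 5380: one island sized for the largest module
    print(total_overhead(modules, slot_luts=250, c=20))   # 700: small slots
```

In this toy setting, small slots cut the overhead by almost an order of magnitude, but the communication share grows as the slots shrink, so the slot size has an optimum.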

OK – Small Slots: But How Small?

[Figure: impact of resource slot size and communication cost on the average module overhead]

Result: a slot size of ~200–300 LUTs, or ~25–40 CLBs

Communication Architectures: Buses

[Figure: 32-bit bus with byte lanes D31...D24, D23...D16, D15...D8 and D7...D0; slot connections are interleaved with mux select values 0 to 3; module m1 attached. Legend: start point & mux select value, used connection, unused connection.]

Communication Architectures: Buses

[Figure: the bus extended to a 2D slot grid (x = 0 to 7, y = 0 to 3) hosting modules m1 to m4; read data from the byte lanes D31...D24 to D7...D0 is combined through a chain of OR gates (≥1). Slot indexing: slot_{y,x}, e.g., slot_{3,6}. Legend: start point & mux select value, used connection, unused connection.]
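The ≥1 symbols in the figure are OR gates. A distributed read multiplexer can be built this way, assuming (as a sketch, not the exact ReCoBus circuit) that each slot gates its output with its select signal so that unselected slots drive zeros; ORing all slot outputs then delivers the selected module's data without a central multiplexer. `or_chain_read` is a hypothetical name.

```python
from typing import Sequence


def or_chain_read(slot_outputs: Sequence[int], selected: int, width: int = 32) -> int:
    """Model of a read path: per-slot AND gating followed by an OR chain."""
    mask = (1 << width) - 1
    bus = 0
    for slot, value in enumerate(slot_outputs):
        enable = mask if slot == selected else 0  # select signal of this slot
        bus |= value & enable                      # OR gate (>=1) in the chain
    return bus


if __name__ == "__main__":
    outputs = [0xDEADBEEF, 0x12345678, 0xCAFEF00D]
    print(hex(or_chain_read(outputs, selected=1)))  # 0x12345678
```

In this model the read path has no central multiplexer that depends on where a module is placed; the price is that every unselected slot, including empty ones, must reliably drive zeros.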

Communication Architectures: I/O Bars

[Figure: Slots 0 to 5 between the static system parts, connected by the ReCoBus and by I/O bars for video in/out and audio in/out.]

• Selectable read-modify-write connection
• Ideal for data streaming

Communication Architectures: I/O Bars

[Figure: I/O bar detail with incoming, outgoing and route-through signals.]
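The selectable read-modify-write behaviour of an I/O bar can be modelled in software. In this hypothetical sketch, a stream passes the slots from left to right; a slot without a connected module routes the signals through, while a connected module reads the incoming word, modifies it, and writes the result back onto the bar for the following slots.

```python
from typing import Callable, List, Optional, Sequence

Stage = Optional[Callable[[int], int]]


def io_bar(stream: Sequence[int], slots: Sequence[Stage]) -> List[int]:
    """Pass every word through the slots; None means route-through."""
    result = []
    for word in stream:
        for stage in slots:
            if stage is not None:  # connected: read, modify, write back
                word = stage(word)
        result.append(word)        # what leaves the last slot
    return result


if __name__ == "__main__":
    slots = [None, lambda x: x + 1, None, lambda x: x * 2]
    print(io_bar([1, 2, 3], slots))  # [4, 6, 8]
```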

What else do we need for PR?

• Methods for high-speed reconfiguration
  • Overclocking: more than 1 GB/s on Virtex-5
  • Swap a fully featured MicroBlaze in less than 100 µs
  • Bitstream decompression
  • Configuration prefetching
• Design tools:
  • Analysis and simulation
  • Floorplanning (partitioning between the static part and reconfigurable regions)
  • Bitstream assembly

Christopher Claus, TU München (to be published at DATE)
Koch et al., “Hardware Decompression Techniques...”
Hauck, “Configuration Prefetch for Single Context Reconfigurable Coprocessors”
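The throughput figures translate directly into reconfiguration times: t = bitstream size / configuration throughput. The helpers below are a back-of-the-envelope sketch with hypothetical names (sizes in bytes, throughput in bytes per second); the prefetch variant assumes loading can start a given lead time before the module is needed, so only the remainder stalls the system.

```python
def reconfiguration_time_us(bitstream_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to write a bitstream at the given throughput, in microseconds."""
    return bitstream_bytes / throughput_bytes_per_s * 1e6


def max_bitstream_bytes(deadline_us: float, throughput_bytes_per_s: float) -> float:
    """Largest bitstream that can be written within the deadline."""
    return deadline_us * 1e-6 * throughput_bytes_per_s


def stall_us(bitstream_bytes: float, throughput_bytes_per_s: float, prefetch_lead_us: float) -> float:
    """Remaining wait if loading starts prefetch_lead_us before the module is needed."""
    t = reconfiguration_time_us(bitstream_bytes, throughput_bytes_per_s)
    return max(0.0, t - prefetch_lead_us)


if __name__ == "__main__":
    print(round(max_bitstream_bytes(100, 1e9)))  # 100000 bytes fit into 100 us at 1 GB/s
```

At 1 GB/s, a swap in under 100 µs therefore implies a partial bitstream of at most about 100 KB.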

Design Tools: PlanAhead

• Xilinx floorplanning tool
• Originally designed for densely packed designs and incremental design flows
• Proxy logic for fixing routing between static/reconfigurable parts
• Resource estimator for fast initial budgeting
• Simple built-in DRC
• Island style → may result in low resource utilization!
• No strict module encapsulation: the static design may route through a reconfigurable region → many limitations!

http://www.xilinx.com/tools/planahead.htm

Design Tools: ReCoBus-Builder (www.recobus.de)

[Figure: design flow. Budgeting [Xilinx XST] yields the static netlist and the module1 netlist. Floorplanning and communication synthesis [ReCoBus-Builder] yields the static constraints, the module1 constraints, the ReCoBus/I/O bars and templates. The static system and module1 are each placed & routed [PAR] and turned into bitstreams [bitgen]. Partial bitstream extraction [bitscan] turns the full module1 bitfile into a partial module1 bitfile, and bitstream linking [bitscan] combines it with the static system bitfile into the initial system bitfile. Partial modules are stored in the repository for the run-time system.]

bitlink module.bit X Y \
    static.bit initial.bit

Design Tools: ReCoBus-Builder (www.recobus.de)

• Example system
  • PPC
  • Ext. SDRAM
  • Video I/O
  • 96 slots
  • PLB-compatible backplane
  • 600 MB/s DMA
  • All slots can read the video stream
• Easy to implement with ReCoBus-Builder & Xilinx ISE / EDK

[Figure: floorplan with a PPC and static top and bottom systems, connected by the ReCoBus segments ReCoBus_TT, ReCoBus_TB, ReCoBus_BT, ReCoBus_BB and the I/O bars IOBar_TT, IOBar_TB, IOBar_BT, IOBar_BB.]

Design Tools: ReCoBus-Builder (www.recobus.de)

• Easy-to-use builder for reconfigurable systems
• Available on www.recobus.de

[Figure: from the system specification (communication architecture & floorplan), ReCoBus-Builder generates the static system and the module repository.]

bitlink module.bit -pos X,Y static.bit -outfile initial.bit

Why Use Component-Based Design?

• Closing the design productivity gap
• Enhance design reuse (e.g., by using standardized interfaces)

(Source: Michael Flynn, ASAP 2005)

Context-Switching on FPGAs

• Software: runs in time
• FPGA / ASIC: runs in space
• Combine the advantages of both by using run-time reconfiguration:
  • High performance
  • Enhanced resource efficiency
  • Simplified design

COSRECOS (Context Switching Reconfigurable Hardware for Communication Systems)

Context-Switching on FPGAs

• What is the context of an FPGA?
  1) Present FPGA configuration (technology level)
  2) State of a module (logic level)
     • Register snapshot
     • RAM blocks
     • External state
• Access via the configuration port or extra logic (e.g., scan chain)
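Saving and restoring a module's register state through a scan chain can be illustrated with a toy model, assuming all registers of the module are concatenated into one shift register clocked bit by bit (class and function names are hypothetical). Shifting the chain out captures the snapshot; shifting the snapshot back in, in the same order, restores it, which is what preemption and resumption need at the logic level.

```python
from typing import List


class ScanChain:
    """Toy model: all module registers concatenated into one shift register."""

    def __init__(self, bits: List[int]):
        self.bits = list(bits)

    def shift(self, bit_in: int) -> int:
        """One scan clock: a bit enters at the head, the tail bit leaves."""
        bit_out = self.bits[-1]
        self.bits = [bit_in] + self.bits[:-1]
        return bit_out


def save_context(chain: ScanChain) -> List[int]:
    """Shift out the whole chain (zeros are shifted in) and return the snapshot."""
    return [chain.shift(0) for _ in range(len(chain.bits))]


def restore_context(chain: ScanChain, snapshot: List[int]) -> None:
    """Shift the snapshot back in; the first bit saved travels to the tail again."""
    for bit in snapshot:
        chain.shift(bit)


if __name__ == "__main__":
    chain = ScanChain([1, 0, 1, 1, 0])
    snapshot = save_context(chain)    # preempt: the chain now holds zeros
    restore_context(chain, snapshot)  # resume
    print(chain.bits)                 # [1, 0, 1, 1, 0]
```

The other access path listed above, reading the state back through the configuration port, needs no such extra logic inside the module.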

Context-Switching on FPGAs

[Figure: design space spanned by the technology level (FPGA) and the logic level (module), each either static or dynamic.]
• Static technology level, static logic level: module runs forever; single configuration/module context; ASIC-like (e.g., memory controller)
• Dynamic technology level, static logic level: configuration swapping; run-to-completion model (no module context is considered at start) (e.g., motion-JPEG)
• Static technology level, dynamic logic level: multiple module contexts on a single configuration (e.g., multi-channel crypto)
• Dynamic technology level, dynamic logic level (COSRECOS): module preemption and resuming; configuration swapping; transparent (like software)
• All variants may co-exist in a reconfigurable SoC

Thanks for your attention

• COSRECOS people:
  • Jim Tørresen
  • Dirk Koch
  • Simen Gimle Hansen
  • Alexander Wold
  • Christian Beckhoff
  • + Students
• Very active research project → follow us on

Questions, suggestions, comments

This project is funded by the Research Council of Norway.