Computer Generation of IP Cores · 2012-07-06 · © Markus Püschel, 2011


Page 1:

© Markus Püschel, 2011

© Markus Püschel Computer Science

Peter Milder (ECE, Carnegie Mellon)

James Hoe (ECE, Carnegie Mellon)

Markus Püschel (CS, ETH Zürich)

Computer Generation of IP Cores


addfxp #(16, 1) add15282(.a(a69), .b(a70), .clk(clk), .q(t45));
addfxp #(16, 1) add15297(.a(a71), .b(a72), .clk(clk), .q(t46));
subfxp #(16, 1) sub15311(.a(a69), .b(a70), .clk(clk), .q(t47));
subfxp #(16, 1) sub15325(.a(a71), .b(a72), .clk(clk), .q(t48));


Page 2:

Software | Hardware (FPGA, ASIC)

Libraries | IP Cores

C/C++/Fortran | Verilog/VHDL

Compilation | Synthesis

Performance | Area/Performance

Page 3:

Example: Discrete Fourier transform (DFT), size 64

[Figure: designs plotted by area and performance; labels: best, 6.12 Gigasamples/second]

Page 4:

Example: Discrete Fourier transform (DFT), size 64

[Figure: designs plotted by area and performance; labels: best, Pareto-optimal designs, 6.12 Gigasamples/second]

Page 5:

Problem: For a given function, how to efficiently obtain the Pareto-optimal designs?

Solution: IP core generator to enumerate design space

• DSL to represent algorithms

• Rewriting on DSL for architectural decisions

• Compiler: DSL to RTL Verilog
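To make "Pareto-optimal" concrete: a design is Pareto-optimal if no other design is at least as good in both area and performance and strictly better in one of them. The following is a minimal Python sketch of that filter. The `Design` tuple, the metric units, and the sample numbers are hypothetical and chosen only for illustration; they are not results from the talk.

```python
from typing import List, NamedTuple

class Design(NamedTuple):
    name: str
    area: float        # smaller is better (hypothetical units)
    throughput: float  # larger is better (hypothetical units)

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in both metrics and better in at least one."""
    no_worse = a.area <= b.area and a.throughput >= b.throughput
    better = a.area < b.area or a.throughput > b.throughput
    return no_worse and better

def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep every design that no other design dominates, sorted by area."""
    front = [d for d in designs if not any(dominates(o, d) for o in designs)]
    return sorted(front, key=lambda d: d.area)

# Made-up candidate designs, as a generator might enumerate them.
candidates = [
    Design("A", area=10, throughput=1.0),
    Design("B", area=20, throughput=3.0),
    Design("C", area=25, throughput=2.5),  # dominated by B
    Design("D", area=40, throughput=6.0),
    Design("E", area=40, throughput=5.0),  # dominated by D
]
front_names = [d.name for d in pareto_front(candidates)]
print(front_names)
```

This brute-force filter compares every pair of designs, so it only illustrates the definition and assumes the whole design space has already been enumerated and measured.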

Page 6:

Current IP Core Generators (www.spiral.net)

DFT, other transforms, sorting


Page 8:

DFT and FFT: Example n = 4

DFT:

Fast Fourier transform (FFT):

Description with matrix algebra (SPL)

[Figure: FFT data flow graph]
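For n = 4, the relation between the DFT matrix and an FFT written in matrix algebra can be checked numerically. The sketch below assumes the standard radix-2 Cooley-Tukey factorization, DFT_4 = (DFT_2 ⊗ I_2) · T · (I_2 ⊗ DFT_2) · L, where T is a diagonal twiddle matrix and L a stride permutation. It also assumes the root of unity e^{-2πi/4}; the slide's own sign convention may differ. All helper names are made up for this example.

```python
import cmath

def dft(n):
    """Dense n x n DFT matrix with w = exp(-2*pi*i/n) (assumed convention)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (k * l) for l in range(n)] for k in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

w4 = cmath.exp(-2j * cmath.pi / 4)
F2, I2 = dft(2), identity(2)

# Stride permutation: reorders (x0, x1, x2, x3) to (x0, x2, x1, x3).
L = [[1 if j == p else 0 for j in range(4)] for p in [0, 2, 1, 3]]

# Twiddle factors diag(1, 1, 1, w4).
T = [[0] * 4 for _ in range(4)]
for i, t in enumerate([1, 1, 1, w4]):
    T[i][i] = t

# Four sparse factors, applied right to left: L, then I2 (x) F2, then T, then F2 (x) I2.
fft4 = matmul(matmul(kron(F2, I2), T), matmul(kron(I2, F2), L))
factorization_holds = close(fft4, dft(4))
print(factorization_holds)
```

Each factor is sparse and maps to one column of the data flow graph: a permutation (wires), two 2-point butterflies, a scaling stage, and two more butterflies.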

Page 9:

Algorithms

Each algorithm describes a data flow graph and thus could be mapped directly to hardware. But how do we obtain a design space?

Page 10:

Design Space: Example 8 Point FFT

All 8 inputs in parallel

Parallelism allows folding

Page 11:

Design Space: Example 8 Point FFT

Two inputs in parallel, streamed over four cycles

Repeated stages allow folding

Page 12:

Design Space: Example 8 Point FFT

Vertical and horizontal folding: Tradeoff between area cost and performance

How to do this formally and systematically?

Page 13:

Parallelism:

• structurally parallel: mn inputs in one cycle

• streaming, full reuse: n inputs per cycle, m cycles

• streaming, partial reuse: $I_{m/k} \otimes_{sr} (I_k \otimes A_n)$, kn inputs per cycle, m/k cycles

streaming width w

[Figure: block diagrams of $A_n$ (n inputs, n outputs) for each option]
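The three options compute the same result and differ only in how many copies of the block work per cycle. The following behavioral Python sketch illustrates this for m independent applications of a block A_n. The `butterfly` block and the function `apply_streamed` are hypothetical names for this example, and the model ignores pipelining and latency.

```python
from typing import Callable, List, Sequence, Tuple

Block = Callable[[Sequence[float]], List[float]]

def butterfly(v: Sequence[float]) -> List[float]:
    """Hypothetical block A_2: sum and difference of its two inputs."""
    a, b = v
    return [a + b, a - b]

def apply_streamed(block: Block, x: Sequence[float], n: int, k: int) -> Tuple[List[float], int]:
    """Apply the block to m = len(x)/n consecutive groups of n inputs,
    using k copies of the block per cycle (k*n inputs per cycle).
    Returns (output, number of cycles)."""
    m = len(x) // n
    assert len(x) == m * n and m % k == 0
    width = k * n                      # words consumed per cycle
    out: List[float] = []
    cycles = 0
    for start in range(0, len(x), width):
        chunk = x[start:start + width]
        for j in range(k):             # the k block copies work side by side
            out.extend(block(chunk[j * n:(j + 1) * n]))
        cycles += 1
    return out, cycles

x = [1, 2, 3, 4, 5, 6, 7, 8]           # m = 4 groups of n = 2
parallel, c_par = apply_streamed(butterfly, x, n=2, k=4)   # structurally parallel
partial, c_part = apply_streamed(butterfly, x, n=2, k=2)   # partial reuse
full, c_full = apply_streamed(butterfly, x, n=2, k=1)      # full reuse
print(parallel, c_par, c_part, c_full)
```

With k = m all inputs are consumed in one cycle. With k = 1 a single block is reused for m cycles. Values of k in between trade block count against cycle count.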

Page 14:

Repeated stages:

• cascade: m blocks

• iterative, full reuse: 1 block reused m times

• iterative, partial reuse: k blocks reused m/k times

depth d

[Figure: block diagrams of $A_n$ for each option]
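Iterative reuse can be modeled the same way: m identical stages are applied in sequence, either by m chained blocks (cascade) or by k chained blocks whose output is fed back to the input m/k times. The sketch below is a behavioral Python model under that assumption. The `stage` function and `apply_iterative` are made-up names, and real FFT stages may differ from each other (for example in their constants), which this model ignores.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[Sequence[float]], List[float]]

def stage(v: Sequence[float]) -> List[float]:
    """Hypothetical stage on 4 points: pairwise sums, then pairwise differences."""
    return [v[0] + v[1], v[2] + v[3], v[0] - v[1], v[2] - v[3]]

def apply_iterative(f: Stage, x: Sequence[float], m: int, k: int) -> Tuple[List[float], int]:
    """Apply f a total of m times using a datapath of k chained blocks.
    Returns (result, number of passes through the datapath)."""
    assert m % k == 0
    data = list(x)
    passes = 0
    for _ in range(m // k):    # each pass feeds the output back to the input
        for _ in range(k):     # k blocks chained in hardware
            data = f(data)
        passes += 1
    return data, passes

x = [1, 2, 3, 4]
cascade, p_cascade = apply_iterative(stage, x, m=4, k=4)  # m blocks, one pass
partial, p_partial = apply_iterative(stage, x, m=4, k=2)  # k blocks, m/k passes
full, p_full = apply_iterative(stage, x, m=4, k=1)        # 1 block, m passes
print(cascade, p_cascade, p_partial, p_full)
```

All three variants return the same data. Only the number of blocks (area) and the number of passes (time) change.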

Page 15:

Previous Example

All 8 inputs in parallel

Parallelism allows for streaming reuse (sr)

Page 16:

Vertical Folding By Rewriting

Repeated stages allow for iterative reuse (ir)

Page 17:

Vertical & Horizontal Folding By Rewriting

How to build streaming permutations?

Page 18:

Parallel permutation $P$

Example: 8 points, 8 words in parallel

Just wires

Streaming permutation $\underbrace{P}_{\text{stream}(2)}$

Example: 8 points, 2 points per cycle (2 words per cycle, 4 cycles)

?

Page 19:

Streaming permutation $\underbrace{P}_{\text{stream}(2)}$

Example: 8 points, 2 points per cycle (2 words per cycle, 4 cycles)

• Data stream must be buffered (using multiple banks of RAMs)

• Need to route data carefully to prevent port conflicts

[J. of the ACM 2009; DATE 2009]
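The port-conflict problem can be illustrated with a small model. The Python sketch below assumes 2 memory banks, each accepting one write and one read per cycle, and a stride permutation on 8 points streamed at 2 words per cycle. The function `conflict_cycles` and both bank assignments are hypothetical and chosen for this example; this is not the construction from the cited papers.

```python
from typing import Callable, List, Sequence, Tuple

def conflict_cycles(perm: Sequence[int], bank_of: Callable[[int], int],
                    w: int) -> Tuple[List[int], List[int]]:
    """Return (write_conflicts, read_conflicts): the cycles in which two
    words of the same cycle map to the same memory bank."""
    write_conflicts: List[int] = []
    read_conflicts: List[int] = []
    for t in range(len(perm) // w):
        slots = range(t * w, (t + 1) * w)
        written = [bank_of(i) for i in slots]     # inputs arrive in natural order
        read = [bank_of(perm[i]) for i in slots]  # outputs leave in permuted order
        if len(set(written)) < w:
            write_conflicts.append(t)
        if len(set(read)) < w:
            read_conflicts.append(t)
    return write_conflicts, read_conflicts

# Stride permutation on 8 points: y[i] = x[stride_perm[i]].
stride_perm = [0, 2, 4, 6, 1, 3, 5, 7]

def naive_bank(i: int) -> int:
    """Bank = lowest address bit."""
    return i % 2

def xor_bank(i: int) -> int:
    """Bank = XOR of the two lowest address bits."""
    return (i ^ (i >> 1)) & 1

naive_result = conflict_cycles(stride_perm, naive_bank, w=2)
xor_result = conflict_cycles(stride_perm, xor_bank, w=2)
print(naive_result, xor_result)
```

With the naive assignment every output cycle needs two words from the same bank. The XOR assignment avoids conflicts for this particular permutation, which shows why the routing into the banks has to be chosen with the permutation in mind.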

Page 20:

IP Core Generator (Sketch)

Problem specification

Hardware directives

Algorithm Generation

Algorithm Rewriting

RTL Generation

Synthesizable Verilog

[TODAES 2012; DAC 2008]

Page 21:

FPGA: Area vs. Throughput

Pareto optimal

Page 22:

FPGA: Area vs. Throughput

Similar results for

• other DFT sizes incl. non-two-power

• real DFT

• 2D DFT

• DCT

Page 23:

Verilog

Not shown:

• ASIC results, power/perf tradeoff

• IP Generator for sorters

• Finding Pareto points without exhaustive design space enumeration

www.spiral.net