Veriloggen.Thread & Stream: Toward highest-performance FPGA computing ...
Veriloggen.Thread & Stream: FPGA ...
(@shtaxxx) E-mail: takamaeda_at_ist.hokudai.ac.jp
September 24, 2017, FPGAX @ ...
BRein Memory [Ando+, VLSI2017]
- A binary/ternary DNN accelerator
  - Employs a Processing-in-Memory (PIM) architecture
- 1st test chip with 6 PIMs
  - Peak 1.38 TOPS @ 400 MHz, efficiency 2.3 TOPS/W
  - First accelerator chip for binary & ternary DNNs
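Quantization error: $\mathrm{QER}(w) = \lVert w - w_q \rVert^2$
Loss: $L(w) = E(w) + \lambda\,\mathrm{QER}(w)$, where $E(w)$ is the original loss and $\lambda$ weights the quantization-error term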
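Both formulas can be evaluated directly. The sketch below assumes ternary levels {-1, 0, +1} with nearest-level rounding to produce $w_q$, and takes the task loss E(w) as a number computed elsewhere; the slide does not specify the quantizer, and the function names are hypothetical.

```python
def quantize(w, levels=(-1.0, 0.0, 1.0)):
    """Map each weight to its nearest level (ternary levels are an assumption)."""
    return [min(levels, key=lambda q: abs(x - q)) for x in w]


def qer(w, levels=(-1.0, 0.0, 1.0)):
    """QER(w) = ||w - w_q||^2, the squared L2 distance to the quantized weights."""
    wq = quantize(w, levels)
    return sum((x - q) ** 2 for x, q in zip(w, wq))


def regularized_loss(task_loss, w, lam, levels=(-1.0, 0.0, 1.0)):
    """L(w) = E(w) + lambda * QER(w); task_loss is E(w), supplied by the caller."""
    return task_loss + lam * qer(w, levels)


w = [0.9, -0.2, 0.4, -1.3]
print(quantize(w))  # [1.0, 0.0, 0.0, -1.0]
print(qer(w))
print(regularized_loss(0.5, w, lam=0.1))
```

Minimizing L(w) pulls each weight toward its nearest level, and λ sets how strongly that pull competes with E(w).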
[Figure: array of spins σ and a Spin Unit, labeled Majority Voter Circuit, Inverters, neighbor spins σN/σS (directions N, S, E, W), random pulses (type1 & type2, 10 bit), Coefficient Memory, Spin Memory, and a phase counter cycling Phase1 to Phase4]
Veriloggen: RTL design in Python
Design Generator by Python
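    from veriloggen import *

    m = Module('blinkled')            # module declaration
    clk = m.Input('CLK')              # 1-bit clock input
    led = m.Output('LED', 8)          # 8-bit output port
    count = m.Reg('count', 32)        # 32-bit counter register
    m.Assign(led(count[31:24]))       # assign LED = count[31:24]
    m.Always(Posedge(clk))(           # always @(posedge CLK)
        count(count + 1)              # count <= count + 1
    )
    hdl = m.to_verilog()              # generate the Verilog source as a string
    print(hdl)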
[Figure: Veriloggen object for module blinkled, with CLK, RST, LED, count, assign, and always]
Generated Verilog source code:

    module blinkled
    (
      input CLK,
      output [7:0] LED
    );

      reg [31:0] count;
      assign LED = count[31:24];

      always @(posedge CLK) begin
        count <= count + 1;
      end

    endmodule
[Figure: Verilog HDL generated from Python, run on the Python interpreter. Veriloggen object → Verilog AST generator → Verilog AST (module blinkled; input CLK, input RST) → Verilog code generator → Verilog source code, invoked with to_verilog()]
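To see what the generated blinkled hardware does, the sketch below models it in plain Python: count advances once per clock edge and wraps at 2^32, and LED shows count[31:24]. The helper names are hypothetical, this is not Veriloggen's simulator, and starting count at 0 is an assumption (the generated module has no reset). The 100 MHz clock in the last line is also an assumption.

```python
WIDTH = 32  # width of reg [31:0] count


def part_select(value, hi, lo):
    """Verilog-style part-select value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def count_after(cycles):
    """count after `cycles` rising edges, assuming it starts at 0; wraps at 2**32."""
    return cycles % (1 << WIDTH)


def led_after(cycles):
    """LED = count[31:24] after the given number of clock cycles."""
    return part_select(count_after(cycles), 31, 24)


def led_step_seconds(clock_hz):
    """Time between LED changes: count[31:24] advances once every 2**24 cycles."""
    return (1 << 24) / clock_hz


print(led_after(2**24 - 1), led_after(2**24), led_after(2**32 - 1))  # 0 1 255
print(led_step_seconds(100e6))  # roughly 0.168 s at an assumed 100 MHz clock
```

Taking the top 8 bits of a wide counter is what slows the blinking down to a visible rate.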
Module
- Module: a Python object
- Reg (a method of m)
- "count <= 0"
- "count==1023"
- If
- Always (a method of m)
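Expressions such as "count <= 0" and "count==1023" are written with ordinary Python operators. Veriloggen relies on operator overloading, so count == 1023 builds a hardware expression object instead of evaluating to a Python bool. The toy classes below (Expr, Reg, and if_stmt are hypothetical names, not Veriloggen's own classes) illustrate the idea by rendering Verilog text directly; reading the slide's If as "reset count to 0 at 1023" is an assumption.

```python
class Expr:
    """Toy hardware expression: operators build Verilog text instead of values."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr(f"({self.text} + {to_text(other)})")

    def __eq__(self, other):
        # Returns an Expr, not a bool, so the comparison can become hardware
        return Expr(f"({self.text} == {to_text(other)})")

    def __str__(self):
        return self.text


class Reg(Expr):
    """Toy register: calling it, e.g. count(0), yields a non-blocking assignment."""

    def __call__(self, value):
        return f"{self.text} <= {to_text(value)};"


def to_text(value):
    return value.text if isinstance(value, Expr) else str(value)


def if_stmt(cond, then_stmt, else_stmt):
    return f"if {cond} {then_stmt} else {else_stmt}"


count = Reg("count")
print(count == 1023)  # (count == 1023)
print(count(0))       # count <= 0;
print(if_stmt(count == 1023, count(0), count(count + 1)))
```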
Thread, Stream, and RTL together: Veriloggen Mixed HW
[Figure: Veriloggen.Core (RTL) containing Threads (Python-to-FSM) with their RAMs, a Stream Computing Unit (Stream Control), RTL controlled through intrinsics, and a Thread Bus + DMA (AXI4 Master/Slave) handling DMA control and DMA burst transfers, connected through an AXI4 Interconnect to DRAM and the CPU]
Thread: Python-to-FSM (LED example)
- Module I/O: declared as in Verilog (e.g., CLK, RST, LED)
- Thread: written in Python and uses the module I/O
- Thread → FSM → RTL
FSM
- LED is updated by the th_blink FSM
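Conceptually, the Thread compiler turns sequential Python control flow into FSM states. The sketch below writes one hypothetical blink routine twice: as a sequential loop, and as a hand-written FSM that advances one state per simulated clock cycle. It illustrates the transformation only; the states Veriloggen actually generates for th_blink will differ.

```python
def blink_sequential(times):
    """What the designer writes: plain sequential control flow."""
    trace = []
    led = 0
    for _ in range(times):
        led += 1
        trace.append(led)
    return trace


def blink_fsm(times):
    """Same behavior as an explicit FSM; each while-iteration models one clock cycle."""
    INIT, CHECK, BODY, DONE = range(4)
    state, led, i = INIT, 0, 0
    trace, cycles = [], 0
    while state != DONE:
        if state == INIT:        # led = 0; i = 0
            led, i = 0, 0
            state = CHECK
        elif state == CHECK:     # loop condition: i < times
            state = BODY if i < times else DONE
        elif state == BODY:      # loop body: led += 1; i += 1
            led += 1
            i += 1
            trace.append(led)
            state = CHECK
        cycles += 1
    return trace, cycles


print(blink_sequential(3))  # [1, 2, 3]
print(blink_fsm(3))         # ([1, 2, 3], 8)
```

Both versions produce the same LED sequence; the FSM version also makes the cycle cost of the loop explicit (2 * times + 2 cycles in this model).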
RTL
- RTL: written with Seq (sequential logic)
- Thread: written in Python, alongside the RTL
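Sequential logic written with Seq behaves like a Verilog always @(posedge CLK) block with non-blocking assignments: every right-hand side reads the register values from before the edge. The toy model below (ToySeq, assign, and clock_edge are hypothetical names, not Veriloggen's Seq API) shows why two registers can be swapped without a temporary.

```python
class ToySeq:
    """Toy sequential block: statements are re-evaluated on every clock edge."""

    def __init__(self, **regs):
        self.regs = dict(regs)
        self._stmts = []

    def assign(self, name, fn):
        # fn(values) computes the next value of register `name` from current values
        self._stmts.append((name, fn))

    def clock_edge(self):
        current = dict(self.regs)          # snapshot: all right-hand sides read old values
        for name, fn in self._stmts:
            self.regs[name] = fn(current)  # updates land together, like non-blocking <=


seq = ToySeq(a=1, b=2)
seq.assign("a", lambda r: r["b"])  # a <= b;
seq.assign("b", lambda r: r["a"])  # b <= a;
seq.clock_edge()
print(seq.regs)  # {'a': 2, 'b': 1}
seq.clock_edge()
print(seq.regs)  # {'a': 1, 'b': 2}
```

With blocking, one-after-the-other assignment, both registers would end up holding the same value instead.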
Intrinsic function/method: RTL
FSM:

    send(fsm, value):
        data <= value; enable <= 1;
        /* then */
        enable <= 0;

    wait(fsm):
        if (ready) goto next;

RTL → Intrinsic function
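The per-state behavior of send and wait can be simulated cycle by cycle. In this sketch the state names and the peer RTL's ready signal (modeled as rising at a chosen cycle) are assumptions, and each output is treated as taking effect in the cycle of the state that sets it, which simplifies real register timing.

```python
def run_send_wait(value, ready_cycle):
    """Simulate send(fsm, value) followed by wait(fsm).

    Returns (data, per-cycle enable trace, cycles until the FSM reaches 'next').
    ready_cycle: first cycle at which the (hypothetical) peer RTL asserts ready.
    """
    state = "SEND"
    data = 0
    enable_trace = []
    cycle = 0
    while state != "NEXT":
        ready = cycle >= ready_cycle
        enable = 0
        if state == "SEND":          # data <= value; enable <= 1;
            data, enable = value, 1
            state = "SEND_END"
        elif state == "SEND_END":    # /* then */ enable <= 0;
            state = "WAIT"
        elif state == "WAIT":        # if (ready) goto next;
            if ready:
                state = "NEXT"
        enable_trace.append(enable)
        cycle += 1
    return data, enable_trace, cycle


print(run_send_wait(42, ready_cycle=5))  # (42, [1, 0, 0, 0, 0, 0], 6)
```

enable is high for exactly one cycle no matter when ready arrives, which is what the "then enable <= 0" step provides; the wait state simply holds the FSM until ready.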
Mutex
- lock(): acquire the Mutex
- unlock(): release the Mutex
- Both are methods of the Mutex object
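A hardware mutex can be modeled as an owner register: lock() keeps a thread's FSM in its lock state until the mutex is free, and unlock() releases it. This sketch assumes a fixed-priority arbiter (the lower thread id wins when several request in the same cycle); Veriloggen's actual arbitration may differ, and all names here are hypothetical.

```python
class ToyMutex:
    """Owner register: None means unlocked."""

    def __init__(self):
        self.owner = None

    def try_lock(self, tid):
        if self.owner is None:
            self.owner = tid
            return True
        return False

    def unlock(self, tid):
        if self.owner == tid:  # only the owner may release
            self.owner = None


def simulate(work_cycles):
    """work_cycles: {thread id: cycles spent in the critical section, >= 1}.

    Returns the mutex owner recorded at the end of each cycle.
    """
    assert all(n >= 1 for n in work_cycles.values())
    mutex = ToyMutex()
    remaining = dict(work_cycles)
    state = {tid: "LOCK" for tid in work_cycles}
    owners = []
    while any(s != "DONE" for s in state.values()):
        for tid in sorted(state):  # fixed priority: lower id is checked first
            if state[tid] == "LOCK":
                if mutex.try_lock(tid):  # lock(): stay here until acquired
                    state[tid] = "CRITICAL"
            elif state[tid] == "CRITICAL":
                remaining[tid] -= 1
                if remaining[tid] == 0:
                    mutex.unlock(tid)    # unlock()
                    state[tid] = "DONE"
        owners.append(mutex.owner)
    return owners


print(simulate({0: 2, 1: 1}))  # [0, 0, 1, None]
```

Thread 1 acquires in the same cycle that thread 0 releases only because the loop visits threads in id order; a real arbiter may add a cycle of latency.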
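DMA
- Interfaces (I/F)
  - RAM: on-chip RAM
  - AXIM: AXI4 master I/F
  - AXIS: AXI4 slave I/F
- Threads start DMA transfers (Async variants are also available)
  - dma_read: read
  - dma_write: write
- Transfers use burst read / burst write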
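A DMA engine that issues AXI4 burst reads or writes must split long transfers: an AXI4 INCR burst carries at most 256 beats and must not cross a 4 KB address boundary. The function below is a generic sketch of that splitting (its name and the 4-byte default beat width are assumptions), not the logic inside Veriloggen's dma_read/dma_write.

```python
def split_into_bursts(addr, size_bytes, beat_bytes=4, max_beats=256, boundary=4096):
    """Split [addr, addr + size_bytes) into AXI4 INCR bursts.

    Returns a list of (start address, number of beats). Assumes addr and
    size_bytes are multiples of beat_bytes.
    """
    if addr % beat_bytes or size_bytes % beat_bytes:
        raise ValueError("addr and size_bytes must be beat-aligned")
    bursts = []
    while size_bytes > 0:
        to_boundary = boundary - addr % boundary  # bytes left before the next 4 KB line
        chunk = min(size_bytes, max_beats * beat_bytes, to_boundary)
        bursts.append((addr, chunk // beat_bytes))
        addr += chunk
        size_bytes -= chunk
    return bursts


print(split_into_bursts(0, 4096))    # [(0, 256), (1024, 256), (2048, 256), (3072, 256)]
print(split_into_bursts(4000, 200))  # [(4000, 24), (4096, 26)]
```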
AXI-S
AXI4-Lite
Stream
[Figure: ram_a and ram_b feed an adder (+) and an accumulator (ACC); results go to ram_c]
- A Stream is started with run() and waited on with join()
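The run()/join() pattern can be mimicked in software. In this sketch, ToyStream and its methods are hypothetical; reading the figure as ram_c[i] = ram_a[i] + ram_b[i] plus a running accumulator is an assumption; and a Python thread stands in for the pipelined hardware, so only the control pattern (start without blocking, then wait for completion) carries over.

```python
import threading


class ToyStream:
    """Software stand-in: ram_c[i] = ram_a[i] + ram_b[i], plus an accumulator."""

    def __init__(self, ram_a, ram_b, ram_c):
        self.ram_a, self.ram_b, self.ram_c = ram_a, ram_b, ram_c
        self.acc = 0
        self._worker = None

    def _body(self, size):
        acc = 0
        for i in range(size):
            s = self.ram_a[i] + self.ram_b[i]  # adder (+)
            acc += s                           # accumulator (ACC)
            self.ram_c[i] = s
        self.acc = acc

    def run(self, size):
        """Start processing `size` elements and return immediately."""
        self._worker = threading.Thread(target=self._body, args=(size,))
        self._worker.start()

    def join(self):
        """Block until the computation started by run() has finished."""
        self._worker.join()


ram_a, ram_b, ram_c = [1, 2, 3, 4], [10, 20, 30, 40], [0] * 4
strm = ToyStream(ram_a, ram_b, ram_c)
strm.run(4)   # the caller is free to do other work here
strm.join()
print(ram_c, strm.acc)  # [11, 22, 33, 44] 110
```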
Stream, DMA
- Thread + Intrinsic: I/O
  - RTL
- Thread + Stream:
  - FPGA
- GitHub
  - Veriloggen: https://github.com/PyHDI/veriloggen
  - Pyverilog: https://github.com/PyHDI/Pyverilog
  - IPgen: https://github.com/PyHDI/ipgen
- Installable with pip (Python)
    $ pip install veriloggen
    $ pip install pyverilog
    $ pip install ipgen

    $ git clone https://github.com/PyHDI/veriloggen.git
    $ git clone https://github.com/PyHDI/Pyverilog.git
    $ git clone https://github.com/PyHDI/ipgen.git