From Embedded System to Digital IC Design, Prof. Pei-Yin Chen (陳培殷), Department of Computer Science and Information Engineering, National Cheng Kung University


Post on 15-Jan-2016


Page 1

From Embedded System to Digital IC Design

Prof. Pei-Yin Chen, Department of Computer Science and Information Engineering, National Cheng Kung University

Page 2

PC: a general-purpose computing system

[Figure labels: PC, PCB, Pentium]

Page 3

Embedded System (1/2)

Embedded system: a special-purpose computing system

Most embedded systems are designed for

1. special-purpose applications (customized and non-programmable)

2. real-time applications

3. stable applications

4. automatic applications

[Figure labels: PCB, CPUs]

Page 4

Embedded System (2/2)

Traditional embedded systems use low-end processors only.

[Figure labels: uP, UART, MPEG, ROM, RAM]

Advanced embedded systems are multi-core.

[Figure labels: ARM, DSP, PCI, USB, MPEG, FLASH, ROM, RAM, AMBA]

Page 5

Applications

Information Appliances (IA):

1. Smart phone, VoIP

2. Digital TV, set-top box

3. PlayStation

4. PDA, MP3 player

5. Camera, DV

6. Air conditioner, microwave oven, refrigerator, vacuum cleaner, sensor network

7. Motorcycles

8. Car (ABS, engine firing, air bag): >100 processors

9. … Ubiquitous computing (many computers for everyone)

Page 6

Applications Everywhere!

Page 7

Requirements

1. Friendly user interface

2. Multiple-rate matching

3. Short time-to-market

4. Real-time (speed)

5. Cost

6. Power consumption/dissipation (which cooling strategy and battery?)

7. Distributed property

Page 8

Power Consumption

The basic equation for the average power consumption in CMOS:

$P_{avg} = \alpha \cdot C \cdot V^2 \cdot f$

$V$: supply voltage

$C$: capacitance

$f$: clock frequency (*)

$\alpha$: average number of 0-to-1 transitions (*)

(*) Transition reduction, sleep mode
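A minimal sketch of the relation P_avg = alpha * C * V^2 * f, assuming the standard CMOS dynamic-power form. The function name and every numeric value below are hypothetical and chosen only for illustration. It shows that power scales with the square of the supply voltage but only linearly with the switching activity and the clock frequency.

```python
def average_dynamic_power(alpha, capacitance_f, supply_v, clock_hz):
    """Average CMOS dynamic power in watts: P = alpha * C * V^2 * f."""
    return alpha * capacitance_f * supply_v ** 2 * clock_hz


# Hypothetical operating point (illustrative values, not from the slides).
baseline = average_dynamic_power(alpha=0.2, capacitance_f=1e-9,
                                 supply_v=1.8, clock_hz=100e6)

# Halving the supply voltage cuts the power to one quarter.
half_voltage = average_dynamic_power(0.2, 1e-9, 0.9, 100e6)

# Halving the switching activity (transition reduction) cuts it in half.
half_activity = average_dynamic_power(0.1, 1e-9, 1.8, 100e6)

print(f"baseline:      {baseline * 1e3:.2f} mW")
print(f"half voltage:  {half_voltage * 1e3:.2f} mW")
print(f"half activity: {half_activity * 1e3:.2f} mW")
```

A sleep mode corresponds to driving the clock frequency (or the activity) of an idle block toward zero, which removes its contribution to this term.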

Page 9

Distributed property

[Figure: distributed applications run on a middleware layer that spans Machine #1 to Machine #4 and beyond (each with its own OS, OS#1 to OS#4), connected by a network]

Page 10

Design Flow

Specification → System Architecture → Hardware Design / Software Design → System Integration → System Verification/Testing

Hardware/software partitioning is very difficult (cost, time)!

Page 11

System Development Flow

1. Applications

2. Specifications, for example:
   Transmission range: 100 m
   Power consumption: 2 W
   Functions: video transmission, voice transmission
   Display: 176 x 220 pixels, 65535 colors, 1.8-inch TFT
   Others

3. System Design
   Hardware: CPU, RAM, I/O…
   Software: C, C++

4. Component Design, for example in Verilog:

   always @(posedge clk) begin
     if (sel1)
       out <= in1;
     else
       out <= in2;
   end

5. Synthesis

6. Layout (Placement & Routing)

7. Fabrication

8. Testing

9. Marketing

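Page 12

IC Industry in Taiwan

[Figure: IC industry chain. Main stages: design, mask, manufacturing, packaging, testing. Related steps and materials: logic design, mask design, crystal growth, wafer, wafer dicing, die testing and dicing, chemicals, lead frame, packaging, final test]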

Page 13

Hardware Design -- Chip (1/4)

ASIC: application-specific integrated circuit, such as USB, MPEG, ….

The basic design flow for a digital cell-based ASIC:

Describe the circuits with a hardware description language (HDL): VHDL or Verilog

Synthesize the circuits ….

Full-custom design vs. semi-custom (cell-based) design

[Figure labels: ARM, DSP, PCI, USB, MPEG, FLASH, ROM, RAM, AMBA, ASIC]

Page 14

Hardware Design -- Chip (2/4)

Example:

always @(IN)
begin
  OUT = (IN[0] | IN[1]) & (IN[2] | IN[3]);
end

[Figure: gate-level circuit with inputs IN[0], IN[1], IN[2], IN[3] and output OUT]

….

Page 15

Hardware Design -- Chip (3/4)

Process of logic synthesis

Synthesis = Translation + Optimization + Mapping

HDL Source → Translate into Boolean Representation → Optimize + Map → Target Technology

always @(…)
  if (a==b)
    if (c==1) d=f;
    else d=1;
  else d=0;

[Figure: two gate-level circuits with inputs a, b, c, f and output d, before and after optimization and mapping]
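To make the translation step concrete, here is a small Python sketch that checks one possible Boolean form of the nested if/else on this slide. It assumes all signals are 1 bit wide. The expression d = XNOR(a, b) AND (NOT c OR f) is derived here by hand and is not taken from the slides; a real synthesis tool would then go on to optimize it and map it to the cells of the target technology.

```python
from itertools import product


def behavioral_d(a, b, c, f):
    """Mirror of the nested if/else in the HDL source (1-bit signals)."""
    if a == b:
        if c == 1:
            return f
        return 1
    return 0


def boolean_d(a, b, c, f):
    """Hand-derived Boolean form: d = XNOR(a, b) AND (NOT c OR f)."""
    xnor_ab = 1 - (a ^ b)
    return xnor_ab & ((1 - c) | f)


# Exhaustively compare both descriptions over all 16 input combinations.
mismatches = [bits for bits in product((0, 1), repeat=4)
              if behavioral_d(*bits) != boolean_d(*bits)]
print("equivalent" if not mismatches else f"mismatch at {mismatches}")
```

Exhaustive comparison is practical here only because there are four 1-bit inputs; wider designs rely on formal equivalence checking or simulation instead.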

Page 16

Hardware Design -- Chip (4/4)

Two implementations:

1. FPGA or CPLD (PLD; vendors: Xilinx, Altera): more flexible, shorter design cycle, lower speed, lower utilization; suitable for smaller production

2. Real ASIC chip (standard cell; fabs: TSMC, UMC, ..): less flexible, longer design cycle, higher speed; larger-scale production to reduce price

Page 17

Hardware Design -- System

Input devices: keyboard, touch screen, switch, button, ..

Output devices: monitor, LCD, LED, …

Extended devices: CompactFlash card (CF), PCMCIA, SD (for storage, wireless communication, I/O)

Power system:

Transmission interface: PCI, USB, IEEE 1394, UART, Bluetooth…

Bus: AMBA (Advanced Microcontroller Bus Architecture)

[Figure labels: ARM, DSP, PCI, USB, MPEG, FLASH, ROM, RAM, AMBA, ASIC, input devices, output devices]

Page 18

Firmware Design

Device drivers for I/O devices, extended devices, and transmission interfaces

Assembly and C code for dedicated CPUs (ARM, 8086, ..)

Architectures and instruction sets of different CPUs, DMA, …

[Figure labels: ARM, DSP, PCI, USB, MPEG, FLASH, ROM, RAM, AMBA, ASIC, input devices, output devices]

Page 19

Software Design

Embedded OS: WinCE, Palm OS, uC/OS, Linux, Java

Real-time OS (time), as small as possible (memory)

Distributed embedded system (+ fault tolerance)

Application software: wireless communication, network, multimedia, health, convenience, Web, ….

Porting a customized embedded system to a different machine is very difficult (it needs large modifications).

[Figure labels: ARM, DSP, PCI, USB, MPEG, FLASH, ROM, RAM, AMBA, ASIC, input devices, output devices]

Page 20

Future

Chip: tens of millions of transistors or more (.35, .25, .18, .09)

Design shifts from ASIC/board to system: System on a Board (printed circuit board) → System on a Chip

[Figure, system on a board: uP, FPGA, MPEG, ASIC, ATM, ROM, and software (SW) blocks on a PCB]

[Figure, system on a chip: uP core, SRAM, ROM, ATM, MPEG, FPGA, glue logic, A/D block]

SOC: System-on-a-chip is possible (the whole system is built in a single chip).

Page 21

SOC is the industry trend

Page 22

Example: Mobile Phone

Yesterday: voice only; 2 processors; 4-year product life cycle; short talk time

Today: voice, data, video, SMS; <12-month product life cycle; lower power; longer talk time

Single chip:
• 5~8 processors
• Memory
• Graphics
• Bluetooth
• GPS
• Radio
• WLAN

[Figure labels: DSP, Radio, Flash Memory, Processor]

Source: EI-SONICS

Page 23

Current Work in My DIC LAB

Hardware Algorithm and VLSI Implementation for

1. H.264

2. Color Filter Array

3. Image Scaling

4. Image Noise Suppression

5. Wide Angle Correction

Page 24

Example: Barrel Distortion Correction

Wide-angle cameras are widely used in many imaging applications nowadays. Images captured by wide-angle lenses suffer from barrel distortion.

DIS: Distorted Image Space
CIS: Corrected Image Space

[Figure: barrel distortion correction between DIS and CIS]

Page 25

Motivation: low cost, real time, quality as good as possible

H. T. Ngo and V. K. Asari, "A Pipelined Architecture for Real-Time Correction of Barrel Distortion in Wide-Angle Camera Images," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 3, Mar. 2005

1. CORDIC (Cartesian to Polar)
2. Back Mapping
3. CORDIC (Polar to Cartesian)
4. Linear Interpolation

[Figure: Cartesian-to-polar coordinate transformation → back mapping → polar-to-Cartesian coordinate transformation → linear interpolation, taking (u, v) and producing I(u', v'); back mapping relates the point (u, v) to the point (u', v')]

Page 26

Block diagram of Ngo's architecture

[Figure: Cartesian-to-polar coordinate transformation → back mapping → polar-to-Cartesian coordinate transformation → linear interpolation]

Step 1: Cartesian-to-polar coordinate transformation
  Inputs: $(u, v)$, $(u_c, v_c)$
  Outputs: $\rho = \sqrt{(u - u_c)^2 + (v - v_c)^2}$, $\theta = \arctan\frac{v - v_c}{u - u_c}$

Step 2: Back mapping
  Inputs: $(\rho, \theta)$, $b_1 \sim b_N$
  Outputs: $\rho' = \sum_{n=1}^{N} b_n \rho^n$, $\theta' = \theta$

Step 3: Polar-to-Cartesian coordinate transformation
  Inputs: $(\rho', \theta')$, $(u'_c, v'_c)$
  Outputs: $u' = u'_c + \rho' \cos\theta'$, $v' = v'_c + \rho' \sin\theta'$

Step 4: Linear interpolation
  Inputs: $(u', v')$
  Outputs: $I(u', v')$
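The first three steps of Ngo's procedure can be sketched in Python as follows. This is a software illustration under stated assumptions, not the hardware design: `math.hypot` and `math.atan2` stand in for the CORDIC units (`atan2` is used instead of a plain arctangent so that all four quadrants are handled), the function name `ngo_back_mapping` is hypothetical, and the coefficient values in the usage lines are made up.

```python
import math


def ngo_back_mapping(u, v, center_cis, center_dis, b):
    """Map a CIS pixel (u, v) to its DIS location (u', v').

    b is the coefficient list [b_1, ..., b_N] of the radial polynomial
    rho' = sum(b_n * rho**n for n = 1..N).
    """
    uc, vc = center_cis
    uc_d, vc_d = center_dis

    # Step 1: Cartesian to polar, relative to the CIS centre.
    rho = math.hypot(u - uc, v - vc)
    theta = math.atan2(v - vc, u - uc)

    # Step 2: back mapping; the radius changes, the angle does not.
    rho_d = sum(b_n * rho ** n for n, b_n in enumerate(b, start=1))
    theta_d = theta

    # Step 3: polar to Cartesian, relative to the DIS centre.
    return (uc_d + rho_d * math.cos(theta_d),
            vc_d + rho_d * math.sin(theta_d))


# With b = [1.0] the mapping reduces to a pure shift between the centres.
identity = ngo_back_mapping(10.0, 20.0, (0.0, 0.0), (0.0, 0.0), [1.0])

# Hypothetical coefficients; the centre pixel always maps to the DIS centre.
centre = ngo_back_mapping(64.0, 48.0, (64.0, 48.0), (60.0, 50.0),
                          [1.0, 0.0, -1e-6])
print(identity, centre)
```

The square root and the trigonometric functions in steps 1 and 3 are the operations that require CORDIC hardware in the pipeline.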

Page 27

Proposed Method (1/2)

Drawback of Ngo: CORDIC

Goal: simplify Ngo

Proposed steps: Modified Back Mapping, then Linear Interpolation

[Figure: Ngo's four blocks (Cartesian-to-polar coordinate transformation, back mapping, polar-to-Cartesian coordinate transformation, linear interpolation) compared with the proposed two blocks (modified back mapping, linear interpolation)]

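Page 28

Proposed Method (2/2)

Input and output of Ngo

Step 1: Cartesian-to-polar coordinate transformation
  Inputs: $(u, v)$, $(u_c, v_c)$
  Outputs: $\rho = \sqrt{(u - u_c)^2 + (v - v_c)^2}$, $\theta = \arctan\frac{v - v_c}{u - u_c}$

Step 2: Back mapping
  Inputs: $(\rho, \theta)$, $b_1 \sim b_N$
  Outputs: $\rho' = \sum_{n=1}^{N} b_n \rho^n$, $\theta' = \theta$

Step 3: Polar-to-Cartesian coordinate transformation
  Inputs: $(\rho', \theta')$, $(u'_c, v'_c)$
  Outputs: $u' = u'_c + \rho' \cos\theta'$, $v' = v'_c + \rho' \sin\theta'$

Step 4: Linear interpolation
  Inputs: $(u', v')$
  Outputs: $I(u', v')$

Input and output of our circuit

Step 1: Modified back mapping
  Inputs: $(u, v)$, $(u_c, v_c)$, $c_1 \sim c_N$, $(u'_c, v'_c)$
  Outputs:
  $\rho^2 = (u - u_c)^2 + (v - v_c)^2$
  $u' = u'_c + (u - u_c)(1 + c_1\rho^2 + c_2\rho^4 + c_3\rho^6 + \dots)$
  $v' = v'_c + (v - v_c)(1 + c_1\rho^2 + c_2\rho^4 + c_3\rho^6 + \dots)$

Step 2: Linear interpolation
  Inputs: $(u', v')$
  Outputs: $I(u', v')$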
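A minimal Python sketch of the modified back mapping, assuming the form u' = u'_c + (u - u_c)(1 + c_1 rho^2 + c_2 rho^4 + ...) and likewise for v'. One way to read why this removes CORDIC: cos(theta) = (u - u_c)/rho, so rho' cos(theta) = (rho'/rho)(u - u_c), and when rho'/rho is a polynomial in rho^2 only additions and multiplications remain. The function name and the coefficient value are hypothetical.

```python
def modified_back_mapping(u, v, center_cis, center_dis, c):
    """Map a CIS pixel (u, v) to (u', v') without sqrt or trigonometry.

    c is the coefficient list [c_1, ..., c_N] of the scale factor
    1 + c_1*rho^2 + c_2*rho^4 + ... + c_N*rho^(2N).
    """
    uc, vc = center_cis
    uc_d, vc_d = center_dis

    du, dv = u - uc, v - vc
    rho2 = du * du + dv * dv          # rho^2, no square root needed

    scale = 1.0
    power = 1.0
    for c_k in c:
        power *= rho2                 # rho^2, rho^4, rho^6, ...
        scale += c_k * power

    return uc_d + du * scale, vc_d + dv * scale


# Hypothetical coefficient: rho^2 = 25, so the scale factor is 1.25.
mapped = modified_back_mapping(3.0, 4.0, (0.0, 0.0), (0.0, 0.0), [0.01])
print(mapped)
```

The loop uses only multiplies and adds, which is what lets a hardware version be built from a short chain of multipliers and adders.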

Page 29

Proposed VLSI Architecture (1/7)

Proposed VLSI architecture: the first three steps are combined into one step.

1. Mapping (Modified Back Mapping)
2. Linear Interpolation

We develop a low-cost 21-stage pipelined VLSI architecture.

[Figure: mapping block followed by linear interpolation block]

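Page 30

Proposed VLSI Architecture (2/7)

State flow chart of our two-step procedure (states S1 to S17)

INPUT: $(u, v)$, $c_1, c_2, c_3, c_4$, $(u_c, v_c)$
OUTPUT: $I(u', v')$

Start: get a new $(u, v)$.

Mapping:

$t_1 = u - u_c$, $t_2 = v - v_c$
$t_3 = t_1 \times t_1$, $t_4 = t_2 \times t_2$
$t_5 = t_3 + t_4$ $(= \rho^2)$
$t_6 = t_5 \times t_5$ $(= \rho^4)$, $t_7 = c_1 \times t_5$ $(= c_1\rho^2)$, $t_8 = c_3 \times t_5$ $(= c_3\rho^2)$
$t_9 = t_6 \times t_6$ $(= \rho^8)$, $t_{10} = t_8 \times t_6$ $(= c_3\rho^6)$, $t_{11} = c_2 \times t_6$ $(= c_2\rho^4)$
$t_{12} = c_4 \times t_9$ $(= c_4\rho^8)$, $t_{13} = t_{10} + t_{11}$ $(= c_3\rho^6 + c_2\rho^4)$, $t_{14} = t_7 + 1$ $(= 1 + c_1\rho^2)$
$t_{15} = t_{13} + t_{14}$
$t_{16} = t_{15} + t_{12}$ $(= 1 + c_1\rho^2 + c_2\rho^4 + c_3\rho^6 + c_4\rho^8)$
$t_{17} = t_{16} \times t_1$, $t_{18} = t_{16} \times t_2$
$u' = u'_c + t_{17}$, $v' = v'_c + t_{18}$

Linear Interpolation:

$x = \lfloor u' \rfloor$, $x' = u' - x$, $y = \lfloor v' \rfloor$, $y' = v' - y$
$t_{19} = x + 1$, $t_{20} = y + 1$, $t_{21} = 1 - x'$, $t_{22} = 1 - y'$
Read $I(x, y)$, $I(t_{19}, y)$, $I(x, t_{20})$, $I(t_{19}, t_{20})$
$t_{23} = t_{21} \times t_{22}$, $t_{24} = x' \times t_{22}$, $t_{25} = t_{21} \times y'$, $t_{26} = x' \times y'$
$t_{27} = I(x, y) \times t_{23}$, $t_{28} = I(t_{19}, y) \times t_{24}$, $t_{29} = I(x, t_{20}) \times t_{25}$, $t_{30} = I(t_{19}, t_{20}) \times t_{26}$
$t_{31} = t_{27} + t_{28}$, $t_{32} = t_{29} + t_{30}$
$I(u', v') = t_{31} + t_{32}$

Another pixel? If no, stop; otherwise get the next $(u, v)$.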

Page 31

Proposed VLSI Architecture (3/7)

Block diagram of our VLSI architecture. It generates the intensity value of one pixel in CIS at every clock cycle.

Components:
The mapping unit
The linear interpolation unit
The memory bank
The controller

[Figure: the mapping unit converts (u, v) to (u', v'); the memory bank (RAM1 to RAM4) is addressed at (x, y), (x+1, y), (x, y+1), (x+1, y+1) and returns I(x, y), I(x+1, y), I(x, y+1), I(x+1, y+1) to the linear interpolation unit, which outputs I(u', v'); the controller coordinates the units]

Page 32

Proposed VLSI Architecture (4/7)

15-stage pipelined architecture of the mapping unit

[Figure: pipeline Stage-1 to Stage-15. Inputs: u, v, u_c, v_c, c1 to c4, u'_c, v'_c. Intermediate values: u - u_c, v - v_c, (u - u_c)^2, (v - v_c)^2, and t1 to t18. Outputs: u' = u'_c + t16*(u - u_c), v' = v'_c + t16*(v - v_c)]

Page 33

Proposed VLSI Architecture (5/7)

Mapping Hardware Architecture

[Figure: datapath of subtractors, multipliers, and adders, each followed by a pipeline register (reg). Inputs: u, v, u_c, v_c, c1 to c4, u'_c, v'_c, and the constant 1. Internal signals: t1 to t20. Outputs: u', v']

Page 34

Proposed VLSI Architecture (6/7)

Linear Interpolation Unit

Four neighboring pixels -> one output.

6-stage pipelined architecture of the linear interpolation unit, from state S13 to state S17.

Read $I(x, y)$, $I(t_{19}, y)$, $I(x, t_{20})$, $I(t_{19}, t_{20})$ from memory.

State flow: S13 = Stage-16; S14 = Stage-16, 17; S15 = Stage-18, 19; S16 = Stage-20; S17 = Stage-21

[Figure: pipeline Stage-16 to Stage-21 with inputs x, y, x', y', 1 - x', 1 - y', intermediate values t19, t20 and t23 to t32, and output I(u', v')]
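The "four neighboring pixels to one output" step can be sketched in Python as a standard two-dimensional linear (bilinear) interpolation. Assumptions made here for illustration: the image is a list of rows indexed as `image[y][x]`, the sample point lies at least one pixel inside the right and bottom borders, and the function name is hypothetical. The hardware reads the four neighbours from separate RAMs in parallel; this sketch reads them sequentially.

```python
import math


def bilinear_sample(image, u_d, v_d):
    """Interpolate the intensity at the fractional DIS position (u', v')."""
    # Integer parts select the top-left neighbour, fractions give weights.
    x, y = math.floor(u_d), math.floor(v_d)
    xf, yf = u_d - x, v_d - y

    # Weights of the four neighbours; they always sum to 1.
    w00 = (1 - xf) * (1 - yf)   # for I(x,   y)
    w10 = xf * (1 - yf)         # for I(x+1, y)
    w01 = (1 - xf) * yf         # for I(x,   y+1)
    w11 = xf * yf               # for I(x+1, y+1)

    return (image[y][x] * w00 + image[y][x + 1] * w10 +
            image[y + 1][x] * w01 + image[y + 1][x + 1] * w11)


# Tiny made-up 2x2 image whose intensity is the plane 10*x + 20*y.
tiny = [[0, 10],
        [20, 30]]
centre_value = bilinear_sample(tiny, 0.5, 0.5)
offset_value = bilinear_sample(tiny, 0.25, 0.75)
print(centre_value, offset_value)
```

Because the test image is a plane, the interpolated values equal 10*u' + 20*v' exactly, which makes the weights easy to check by hand.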

Page 35

Proposed VLSI Architecture (7/7)

Linear Interpolation Hardware Architecture

[Figure: u' and v' are split into integer parts (x, y) and fractional parts (x', y'). Four DIS RAMs are addressed at (x, y), (x+1, y), (x, y+1), (x+1, y+1) and return I(x, y), I(x+1, y), I(x, y+1), I(x+1, y+1). The weights (1-x')(1-y'), x'(1-y'), (1-x')y', x'y' are formed and registered (t23 to t26), multiplied by the pixel values (t27 to t30), and summed by an adder tree (t31, t32, t33)]

Page 36

Results & Discussions (1/4)

Our circuit has lower hardware cost and a higher clock rate.

FPGA implementation (Altera EP20K600EBC652-1X FPGA):

Feature  | Total logic elements | Flip-flops | Clock rate | Clock cycle | Throughput    | Pipeline latency
[5]      | 18,344 (75%)         | 15,355     | 40 MHz     | 25 ns       | 30 M pixels/s | 91 clock cycles
Proposed | 7,163 (29%)          | 2,811      | 56.98 MHz  | 17.55 ns    | 40 M pixels/s | 21 clock cycles

Cell-based implementation:

Process      | Total cell area | Gate count | Clock period | Clock rate
TSMC 0.18 μm | 449928.875      | 45128.272  | 6.6 ns       | 150 MHz
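The clock-cycle column follows directly from the clock rate, and the pipeline latency in nanoseconds follows from the stage count. A short Python sketch of that arithmetic, using only the numbers reported in the table (the helper names are hypothetical):

```python
def clock_period_ns(clock_mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


def pipeline_latency_ns(stages, clock_mhz):
    """Time for one pixel to traverse the whole pipeline, in nanoseconds."""
    return stages * clock_period_ns(clock_mhz)


ref_period = clock_period_ns(40.0)               # reference design [5]
new_period = clock_period_ns(56.98)              # proposed design
ref_latency = pipeline_latency_ns(91, 40.0)
new_latency = pipeline_latency_ns(21, 56.98)
logic_ratio = 7163 / 18344                       # proposed vs. [5]

print(f"period:  {ref_period:.2f} ns vs {new_period:.2f} ns")
print(f"latency: {ref_latency:.1f} ns vs {new_latency:.1f} ns")
print(f"logic elements: {logic_ratio:.0%} of the reference design")
```

Under these figures the proposed pipeline uses roughly 39% of the logic elements of [5] and fills in about one sixth of the time.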

Page 37

Results & Discussions (2/4)

[Figure: dot test pattern, DIS vs. CIS]

Page 38

Results & Discussions (3/4)

[Figure: grid test pattern, DIS vs. CIS]

Page 39

Results & Discussions (4/4)

[Figure: lab scene, DIS vs. CIS]

Page 40

Demo

Hardware/software co-simulation and verification

[Figure: the software program on the PC sends the source image over USB to the hardware platform (SMIMS board), where the barrel distortion correction circuit runs on the FPGA board; the PC receives the image back and shows the result image]