From Embedded System to Digital IC Design

Prof. Pei-Yin Chen (陳培殷), Department of Computer Science and Information Engineering, National Cheng Kung University

PC: a general-purpose computing system

[Figure: a PC built around a Pentium processor on a PCB]

Embedded System: a special-purpose computing system

Most embedded systems are designed for:

1. special purposes (customized and non-programmable)

2. real-time applications

3. stable applications

4. automatic applications

Embedded System (1/2)

[Figure: block diagram with labels CPUs (uP), UART, MPEG, ROM, RAM]

Embedded System (2/2)

Traditional embedded systems use low-level processors only.

[Figure: block diagram with labels ARM, DSP, PCI, MPEG, USB, FLASH, ROM, RAM, AMBA]

Advanced embedded systems: multi-core

Applications

Information Appliances (IA):

1. Smart phone, VOIP

2. Digital TV, set-top box

3. PlayStation

4. PDA, mp3 player

5. Camera, DV

6. Air conditioner, microwave oven, refrigerator, vacuum cleaner, sensor network

7. Motorcycles

8. Cars (ABS, engine firing, air bags): more than 100 processors

9. … Ubiquitous computing (many computers for everyone)

Application Everywhere!

Requirements

1. Friendly user interface

2. Multiple-rate matching

3. Short time-to-market

4. Real-time (Speed)

5. Cost

6. Power consumption/dissipation (which cooling strategy and which battery?)

7. Distributed property

Power Consumption

The basic equation for the average power consumption in CMOS:

$P_{avg} = \alpha \cdot C \cdot V^2 \cdot f$

$V$: supply voltage

$C$: capacitance

$f$: clock frequency (*)

$\alpha$: average number of 0-to-1 transitions (*)

(*) Targets of transition reduction and sleep mode
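The power equation can be explored numerically. The following is a minimal sketch, assuming the standard form P_avg = alpha * C * V^2 * f; the function name `dynamic_power` and all operating-point values are hypothetical and chosen only for illustration.

```python
def dynamic_power(alpha, capacitance_f, voltage_v, freq_hz):
    """Average dynamic power in watts: P_avg = alpha * C * V^2 * f."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical operating point (not taken from the slides).
base = dynamic_power(alpha=0.2, capacitance_f=1e-9, voltage_v=1.8, freq_hz=100e6)

# Halving the clock frequency halves the power.
half_clock = dynamic_power(alpha=0.2, capacitance_f=1e-9, voltage_v=1.8, freq_hz=50e6)

# Halving the supply voltage quarters the power (quadratic term).
half_voltage = dynamic_power(alpha=0.2, capacitance_f=1e-9, voltage_v=0.9, freq_hz=100e6)

print(base, half_clock / base, half_voltage / base)
```

The quadratic dependence on V is why lowering the supply voltage is so effective, while sleep modes and transition reduction act on the f and alpha factors.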

Distributed property

[Figure: distributed applications run on top of middleware; the middleware spans Machine #1 to Machine #4 (and further machines), each with its own operating system (OS#1 to OS#4), connected by a network]

Design Flow

Specification -> System Architecture -> Hardware Design / Software Design -> System Integration -> System Verification/Testing

Hardware/software partitioning is very difficult (it costs time)!

Synthesis

System Development Flow

Applications

Specifications
    Transmission distance: 100 m
    Power consumption: 2 W
    Functions: video transmission, voice transmission
    Display: 176 x 220 pixels, 65535 colors, 1.8-inch TFT
    Others

System Design
    Hardware: CPU, RAM, I/O…
    Software: C, C++

Component Design
    always @(posedge clk) begin
      if (sel1)
        out <= in1;
      else
        out <= in2;
    end

Layout (Placement & Routing)

Fabrication

Testing

Marketing

IC Industry in Taiwan

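[Figure: the IC industry chain in Taiwan. Main stages: Design, Mask, Manufacturing, Packaging, Testing. Related activities and materials: logic design, mask design, crystal growth, wafer, chemicals, wafer dicing, die testing and dicing, lead frame, packaging, final product testing]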

Hardware Design -- Chip (1/4)

[Figure: system block diagram with labels ARM, DSP, PCI, MPEG, USB, FLASH, ROM, RAM, AMBA; the ASIC block is highlighted]

ASIC: application-specific integrated circuit, such as USB, MPEG, ….

Full-custom design vs. semi-custom (cell-based) design

The basic design flow for a digital cell-based ASIC:

Describe the circuits with a hardware description language (HDL): VHDL or Verilog

Synthesize the circuits ….


Example:

    always @(IN) begin
      OUT = (IN[0] | IN[1]) & (IN[2] | IN[3]);
    end

[Figure: the corresponding circuit, with inputs IN[0], IN[1], IN[2], IN[3] and output OUT]

….

….


Process of logic synthesis

Synthesis = Translation + Optimization + Mapping

HDL Source -> Translate into Boolean Representation -> Optimize + Map (Target Technology)

    always @(…)
      if (a==b)
        if (c==1) d=f;
        else d=1;
      else d=0;

[Figure: two circuits with inputs a, b, c, f and output d, the translated form and the optimized form]
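The translation step can be checked by hand for the nested `if` example. The sketch below is only an illustration: it assumes the comparison `a==b` has already been reduced to a single bit `eq`, and the Boolean form `eq AND (NOT c OR f)` is one candidate expression, not necessarily what a synthesis tool would emit. The function names are hypothetical.

```python
from itertools import product


def nested_if(eq, c, f):
    """Behavior of the HDL: if (a==b) { if (c==1) d=f; else d=1; } else d=0."""
    if eq:
        if c == 1:
            return f
        return 1
    return 0


def boolean_form(eq, c, f):
    """Candidate Boolean representation: d = eq AND (NOT c OR f)."""
    return eq & ((1 - c) | f)


# Exhaustively compare both descriptions over all 1-bit input combinations.
mismatches = [
    (eq, c, f)
    for eq, c, f in product((0, 1), repeat=3)
    if nested_if(eq, c, f) != boolean_form(eq, c, f)
]
print(mismatches)
```

An empty mismatch list means the two descriptions agree on every input, which is the property the optimization and mapping steps must preserve.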


Two implementations:

1. Standard cell: a real ASIC chip, fabricated by a fab (TSMC, UMC, ..)
   Less flexible, long design cycle, higher speed; needs larger-scale production to reduce the price

2. PLD: FPGA or CPLD (Xilinx, Altera)
   More flexible, shorter design cycle, lower speed, lower utilization; suitable for smaller production

Hardware Design -- System

[Figure: system block diagram with labels ARM, DSP, PCI, MPEG, USB, FLASH, ROM, RAM, AMBA, ASIC, plus input devices and output devices]

Input devices: keyboard, touch screen, switch, button, ..

Output devices: monitor, LCD, LED, …

Extended devices: CompactFlash card (CF), PCMCIA, SD (for storage, wireless communication, I/O)

Power system:

Transmission interface: PCI, USB, IEEE 1394, UART, Bluetooth…

Bus: AMBA (Advanced Microcontroller Bus Architecture)

Firmware Design

[Figure: system block diagram with labels ARM, DSP, PCI, MPEG, USB, FLASH, ROM, RAM, AMBA, ASIC, plus input devices and output devices]

Device drivers for I/O devices, extended devices, and transmission interfaces

Assembly code and C code for specific CPUs (ARM, 8086, ..)

Architectures and instruction sets of different CPUs, DMA, …

Software Design

[Figure: system block diagram with labels ARM, DSP, PCI, MPEG, USB, FLASH, ROM, RAM, AMBA, ASIC, plus input devices and output devices]

Embedded OS: WinCE, Palm OS, uC/OS, Linux, JAVA

Real-time OS (time), as small as possible (memory)

Distributed embedded system (+ fault tolerance)

Application software: wireless communication, network, multimedia, health, convenience, Web, ….

Porting a customized embedded system to a different machine is very difficult (it needs large modifications).

Future

Chip: tens of millions of transistors or more (0.35, 0.25, 0.18, 0.09 µm)

Design shifts from ASIC/board to system.

System on a board (printed circuit board)

[Figure: a PCB carrying separate chips: uP, FPGA, MPEG, ASIC, ATM, and ROMs, each with its own software (SW)]

System on a chip (SOC)

[Figure: a single chip on a PCB that integrates a uP core, SRAM, ROM, ATM, MPEG, FPGA, glue logic, and an A/D block]

A system-on-a-chip is possible (the whole system is built into a single chip).

SOC is the industry trend.

Example: Mobile Phone

Yesterday
• Voice only; 2 processors
• 4-year product life cycle
• Short talk time

Today
• Voice, data, video, SMS
• <12-month product life cycle
• Lower power; longer talk time

Single Chip
• 5~8 Processors
• Memory
• Graphics
• Bluetooth
• GPS
• Radio
• WLAN

[Figure labels: DSP, Radio, Flash Memory, Processor]

Source: EI-SONICS

Current Work in My DIC Lab

Hardware algorithms and VLSI implementations for:

1. H.264

2. Color Filter Array

3. Image Scaling

4. Image Noise Suppression

5. Wide Angle Correction

Example: Barrel Distortion Correction

Wide-angle cameras are widely used in many imaging applications nowadays. Images captured through a wide-angle lens suffer from barrel distortion.

DIS: Distorted Image Space

CIS: Corrected Image Space

Barrel Distortion Correction

Motivation: low cost, real time, quality as good as possible

T.H. Ngo and K.V. Asari, "A Pipeline Architecture for Real-Time Correction of Barrel Distortion in Wide-Angle Camera Images" (IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 3, March 2005)

1. CORDIC (Cartesian to Polar)

2. Back Mapping

3. CORDIC (Polar to Cartesian)

4. Linear Interpolation

Block diagram of Ngo's architecture

[Figure: (u, v) -> Cartesian to Polar Coordinate Transformation -> Back Mapping -> Polar to Cartesian -> (u', v') -> Linear Interpolation -> I(u, v). Back mapping takes the point (u, v) in CIS to the point (u', v') in DIS.]

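Step 1: Cartesian to Polar Coordinate
    Inputs: $(u, v)$, $(u_c, v_c)$
    Outputs: $\rho = \sqrt{(u - u_c)^2 + (v - v_c)^2}$, $\theta = \arctan\left(\frac{v - v_c}{u - u_c}\right)$

Step 2: Back Mapping
    Inputs: $(\rho, \theta)$, $b_1 \sim b_N$
    Outputs: $\rho' = \sum_{n=1}^{N} b_n \rho^n$

Step 3: Polar to Cartesian Coordinate
    Inputs: $(\rho', \theta)$, $(u'_c, v'_c)$
    Outputs: $u' = u'_c + \rho' \cos\theta$, $v' = v'_c + \rho' \sin\theta$

Step 4: Linear Interpolation
    Inputs: $(u', v')$
    Outputs: $I(u, v)$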

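Proposed Method (1/2)

Drawback of Ngo's architecture: CORDIC

Goal: simplify Ngo's architecture into two steps:

1. Modified Back Mapping

2. Linear Interpolation

[Figure: proposed flow, (u, v) -> Modified Back Mapping -> (u', v') -> Linear Interpolation -> I(u, v), shown next to Ngo's flow, Cartesian to Polar -> Back Mapping -> Polar to Cartesian -> Linear Interpolation]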

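Proposed Method (2/2)

Input and output of Ngo

Step 1: Cartesian to Polar Coordinate
    Inputs: $(u, v)$, $(u_c, v_c)$
    Outputs: $\rho = \sqrt{(u - u_c)^2 + (v - v_c)^2}$, $\theta = \arctan\left(\frac{v - v_c}{u - u_c}\right)$

Step 2: Back Mapping
    Inputs: $(\rho, \theta)$, $b_1 \sim b_N$
    Outputs: $\rho' = \sum_{n=1}^{N} b_n \rho^n$

Step 3: Polar to Cartesian Coordinate
    Inputs: $(\rho', \theta)$, $(u'_c, v'_c)$
    Outputs: $u' = u'_c + \rho' \cos\theta$, $v' = v'_c + \rho' \sin\theta$

Step 4: Linear Interpolation
    Inputs: $(u', v')$
    Outputs: $I(u, v)$

Input and output of our circuit

Step 1: Modified Back Mapping
    Inputs: $(u, v)$, $(u_c, v_c)$, $c_1 \sim c_N$, $(u'_c, v'_c)$
    Outputs: $\rho^2 = (u - u_c)^2 + (v - v_c)^2$
             $u' = u'_c + (u - u_c)(1 + c_1\rho^2 + c_2\rho^4 + c_3\rho^6 + \dots)$
             $v' = v'_c + (v - v_c)(1 + c_1\rho^2 + c_2\rho^4 + c_3\rho^6 + \dots)$

Step 2: Linear Interpolation
    Inputs: $(u', v')$
    Outputs: $I(u, v)$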
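To see why the modified back mapping can replace the two CORDIC steps, the sketch below compares a polar-coordinate route with the direct Cartesian route. It is an illustration under stated assumptions, not the authors' code: the radial polynomial is assumed to contain only odd powers of the radius (rho' = rho * (1 + c1*rho^2 + c2*rho^4 + ...)), `math.atan2` is used in place of a plain arctangent to handle all quadrants, and the function names, argument names, and coefficient values are hypothetical.

```python
import math


def polar_route(u, v, uc, vc, uc_dis, vc_dis, coeffs):
    """Cartesian -> polar, radial polynomial, polar -> Cartesian."""
    rho = math.hypot(u - uc, v - vc)
    theta = math.atan2(v - vc, u - uc)
    # Assumed polynomial: rho' = rho * (1 + c1*rho^2 + c2*rho^4 + ...).
    scale = 1 + sum(c * rho ** (2 * (k + 1)) for k, c in enumerate(coeffs))
    rho_dis = rho * scale
    return uc_dis + rho_dis * math.cos(theta), vc_dis + rho_dis * math.sin(theta)


def modified_back_mapping(u, v, uc, vc, uc_dis, vc_dis, coeffs):
    """Direct Cartesian route: only rho^2 is needed, no sqrt or trigonometry."""
    du, dv = u - uc, v - vc
    rho_sq = du * du + dv * dv
    scale = 1 + sum(c * rho_sq ** (k + 1) for k, c in enumerate(coeffs))
    return uc_dis + du * scale, vc_dis + dv * scale


# Hypothetical point, centers, and coefficients.
args = (250.0, 40.0, 160.0, 120.0, 160.0, 120.0, (-2e-6, 1e-12))
print(polar_route(*args))
print(modified_back_mapping(*args))
```

Because rho * cos(theta) equals u - u_c and rho * sin(theta) equals v - v_c, both routes give the same (u', v') up to rounding, while the direct route needs only multipliers and adders.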

Proposed VLSI Architecture (1/7)

Proposed VLSI architecture: the first three steps of Ngo's method are combined into one step.

1. Mapping (Modified Back Mapping)

2. Linear Interpolation

We develop a low-cost, 21-stage pipelined VLSI architecture.

[Figure: (u, v) -> Mapping -> (u', v') -> Linear Interpolation -> I(u, v)]

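VLSI Architecture (2/8)

State flow chart of our two-step procedure

INPUT: $(u, v)$, $c_1, c_2, c_3, c_4$, $(u_c, v_c)$

OUTPUT: $I(u, v)$

Start

Mapping (the chart labels its two parts Back I and Back II):

S1: Get new $(u, v)$

S2: $t_1 = u - u_c$, $t_2 = v - v_c$

S3: $t_3 = t_1 \times t_1$, $t_4 = t_2 \times t_2$

S4: $t_5 = t_3 + t_4$ $(= \rho^2)$

S5: $t_6 = t_5 \times t_5$ $(= \rho^4)$, $t_7 = c_1 \times t_5$ $(= c_1\rho^2)$, $t_8 = c_3 \times t_5$ $(= c_3\rho^2)$

S6: $t_9 = t_6 \times t_6$ $(= \rho^8)$, $t_{10} = t_6 \times t_8$ $(= c_3\rho^6)$, $t_{11} = c_2 \times t_6$ $(= c_2\rho^4)$

S7: $t_{12} = c_4 \times t_9$ $(= c_4\rho^8)$, $t_{13} = t_{10} + t_{11}$, $t_{14} = t_7 + 1$

S8: $t_{15} = t_{13} + t_{14}$

S9: $t_{16} = t_{15} + t_{12} = 1 + c_1\rho^2 + c_2\rho^4 + c_3\rho^6 + c_4\rho^8$

S10: $t_{17} = t_{16} \times t_1$, $t_{18} = t_{16} \times t_2$

S11: $u' = u'_c + t_{17}$, $v' = v'_c + t_{18}$

S12: $x = \lfloor u' \rfloor$, $x' = u' - x$, $y = \lfloor v' \rfloor$, $y' = v' - y$

Linear Interpolation:

S13: $t_{19} = x + 1$, $t_{20} = y + 1$, $t_{21} = 1 - x'$, $t_{22} = 1 - y'$

S14: Read $I(x, y)$, $I(t_{19}, y)$, $I(x, t_{20})$, $I(t_{19}, t_{20})$; $t_{23} = t_{21} \times t_{22}$, $t_{24} = x' \times t_{22}$, $t_{25} = t_{21} \times y'$, $t_{26} = x' \times y'$

S15: $t_{27} = I(x, y) \times t_{23}$, $t_{28} = I(t_{19}, y) \times t_{24}$, $t_{29} = I(x, t_{20}) \times t_{25}$, $t_{30} = I(t_{19}, t_{20}) \times t_{26}$

S16: $t_{31} = t_{27} + t_{28}$, $t_{32} = t_{29} + t_{30}$

S17: $I(u, v) = t_{31} + t_{32}$

Another pixel? If yes, get the next $(u, v)$; if no, Stop.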
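The mapping part of the state flow can be mirrored in software to check the schedule of intermediate values. This is a minimal floating-point sketch, assuming the schedule t1 to t18 with t16 = 1 + c1*rho^2 + c2*rho^4 + c3*rho^6 + c4*rho^8; a hardware implementation would presumably use fixed-point arithmetic, and the function and argument names here are hypothetical.

```python
def mapping_unit(u, v, uc, vc, uc_dis, vc_dis, c1, c2, c3, c4):
    """Software model of the mapping states; returns (u', v', t16)."""
    t1, t2 = u - uc, v - vc                       # offsets from the CIS center
    t3, t4 = t1 * t1, t2 * t2
    t5 = t3 + t4                                  # rho^2
    t6, t7, t8 = t5 * t5, c1 * t5, c3 * t5        # rho^4, c1*rho^2, c3*rho^2
    t9, t10, t11 = t6 * t6, t6 * t8, c2 * t6      # rho^8, c3*rho^6, c2*rho^4
    t12, t13, t14 = c4 * t9, t10 + t11, t7 + 1
    t15 = t13 + t14
    t16 = t15 + t12                               # full scale factor
    t17, t18 = t16 * t1, t16 * t2
    return uc_dis + t17, vc_dis + t18, t16


# Hypothetical example: point (3, 4), centers at the origin, only c1 non-zero.
u_dis, v_dis, scale = mapping_unit(3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.01, 0.0, 0.0, 0.0)
print(u_dis, v_dis, scale)
```

Sharing t5 and t6 across several products is what keeps the number of multipliers low: the four polynomial terms are built from eleven multiplications in total, including the two final scalings.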

Block diagram of our VLSI architecture

It generates the intensity value of one pixel in CIS at every clock cycle. It consists of:

• The mapping unit
• The linear interpolation unit
• The memory bank
• The controller

[Figure: (u, v) -> Mapping Unit -> (u', v') -> Linear Interpolation Unit -> I(u, v). The memory bank (RAM1 to RAM4) is addressed with (x, y), (x+1, y), (x, y+1), (x+1, y+1) and returns I(x, y), I(x+1, y), I(x, y+1), I(x+1, y+1) to the linear interpolation unit. The controller drives the units.]


Proposed VLSI Architecture (4/7)

15-stage pipelined architecture of the mapping unit

[Figure: pipeline Stage-1 to Stage-15. Inputs: u, v, u_c, v_c, u'_c, v'_c, c1 to c4. Intermediate signals t1 to t18, with t1 = u - u_c, t2 = v - v_c, t3 = (u - u_c)^2, t4 = (v - v_c)^2. Stage-15 outputs x, x', y, y'.]

Cf. u' = u'_c + t16*(u - u_c), v' = v'_c + t16*(v - v_c)


Mapping Hardware Architecture

[Figure: datapath of the mapping unit, built from subtractors, adders, multipliers, and pipeline registers (reg). Inputs: u, v, u_c, v_c, u'_c, v'_c, the constant 1, and c1 to c4. Intermediate signals: t1 to t20. Outputs: u', v'.]


Linear Interpolation Unit

Four neighboring pixels -> one output.

6-stage pipelined architecture of the linear interpolation unit, from state S13 to state S17.

Read $I(x, y)$, $I(t_{19}, y)$, $I(x, t_{20})$, $I(t_{19}, t_{20})$ from memory at addresses $(x, y)$, $(t_{19}, y)$, $(x, t_{20})$, $(t_{19}, t_{20})$, where $t_{19} = x + 1$ and $t_{20} = y + 1$.

[Figure: pipeline Stage-16 to Stage-21 with signals t19, t20, t23 to t32 and output I(u, v)]

State flow:

S13 = Stage-16
S14 = Stage-16, 17
S15 = Stage-18, 19
S16 = Stage-20
S17 = Stage-21
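The interpolation states can be modeled in the same way as the mapping states. The sketch below is an assumption-laden illustration: the image is indexed as `image[y][x]`, no boundary handling is done (the four neighbors must exist), and the names `linear_interpolation`, `u_dis`, and `v_dis` are hypothetical. The intermediate names t19 to t32 follow the linear interpolation stages.

```python
import math


def linear_interpolation(image, u_dis, v_dis):
    """Weighted sum of the four pixels around (u', v') in the distorted image."""
    x, y = math.floor(u_dis), math.floor(v_dis)   # integer parts
    xf, yf = u_dis - x, v_dis - y                 # fractional parts x', y'
    t19, t20 = x + 1, y + 1
    t21, t22 = 1 - xf, 1 - yf
    t23, t24, t25, t26 = t21 * t22, xf * t22, t21 * yf, xf * yf
    t27 = image[y][x] * t23                       # I(x, y)
    t28 = image[y][t19] * t24                     # I(x+1, y)
    t29 = image[t20][x] * t25                     # I(x, y+1)
    t30 = image[t20][t19] * t26                   # I(x+1, y+1)
    t31, t32 = t27 + t28, t29 + t30
    return t31 + t32


# Hypothetical 2x2 image patch.
patch = [[10, 20],
         [30, 40]]
print(linear_interpolation(patch, 0.5, 0.5))
print(linear_interpolation(patch, 0.25, 0.75))
```

The four weights always sum to 1, so the result stays within the range of the four neighboring intensities.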


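Linear Interpolation Hardware Architecture

[Figure: datapath of the linear interpolation unit. u' and v' are split into integer parts (x, y) and fractional parts (x', y'). Adders and subtractors form x+1, y+1, 1-x', 1-y'. Four DIS RAMs are addressed with (x, y), (x+1, y), (x, y+1), (x+1, y+1) and return I(x, y), I(x+1, y), I(x, y+1), I(x+1, y+1). Multipliers form the weights (1-x')(1-y'), x'(1-y'), (1-x')y', x'y' (t23 to t26) and the weighted intensities (t27 to t30); adders (t31 to t33) sum them into I. Pipeline registers (reg) separate the stages.]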

Results & Discussions (1/4)

Our circuit has a lower hardware cost and a higher clock rate.

FPGA implementation (Altera EP20K600EBC652-1X FPGA):

Feature  | Total Logic Elements | Flip-Flops | Clock rate | Clock cycle | Throughput    | Pipeline Latency
[5]      | 18,344 (75%)         | 15,355     | 40 MHz     | 25 ns       | 30 M pixels/s | 91 clock cycles
Proposed | 7,163 (29%)          | 2,811      | 56.98 MHz  | 17.55 ns    | 40 M pixels/s | 21 clock cycles

Cell-based implementation:

Process        | Total cell area | Gate count | Clock period | Clock rate
TSMC 0.18 μm   | 449928.875      | 45128.272  | 6.6 ns       | 150 MHz
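The clock-cycle column follows directly from the clock rate, and the pipeline latency in clock cycles can be converted to time. The sketch below only redoes this arithmetic on the reported FPGA figures; the function names are hypothetical, and the latency in nanoseconds is a derived value, not a number reported in the table.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1e3 / freq_mhz


def latency_ns(stages, freq_mhz):
    """Time for one pixel to pass through a pipeline of the given depth."""
    return stages * clock_period_ns(freq_mhz)


# (design, clock rate in MHz, pipeline latency in clock cycles) from the FPGA results.
designs = (("[5]", 40.0, 91), ("Proposed", 56.98, 21))

for name, freq_mhz, stages in designs:
    print(name, round(clock_period_ns(freq_mhz), 2), round(latency_ns(stages, freq_mhz), 1))
```

The shorter pipeline and the faster clock compound: the proposed design fills its pipeline in roughly a sixth of the time of the reference design.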


[Figures: test images in the distorted image space (DIS) and the corrected image space (CIS): DOT, Grid, Lab]

Demo

Hardware/Software Co-Simulation/Verification

[Figure: a software program on the PC sends the source image over USB to the hardware platform, an SMIMS FPGA board running the barrel distortion correction circuit, and receives the result image back on the PC]
