A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology (Alice Wang & Anantha Chandrakasan)

Seok-jae Lee, VLSI Signal Processing Laboratory, Korea University

Upload: nuala

Post on 23-Feb-2016



TRANSCRIPT

Page 1

VLSI Signal Processing Laboratory

A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology
-Alice Wang & Anantha Chandrakasan-

Seok-jae Lee, VLSI Signal Processing Lab., Korea University

Page 2

Why an FFT processor?

• FFT processors are used in wireless sensor networks. The FFT has been used for target tracking, localization, and radar by analyzing phase differences from multiple sensors. Such an FFT processor requires a low-power design, while chip speed is not critical.

• The FFT processor consists of multipliers, control logic, and SRAM memory.

• Various low-power design methods, such as variable bit precision and variable FFT length, achieve further power savings.

• In particular, the multipliers, control logic, and SRAM are implemented with ‘SUBTHRESHOLD’ circuits, which dissipate extremely low energy.

Page 3

Radix-2 Butterfly FFT architecture

Subthreshold circuits are used!!!
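The block diagram from this slide is not reproduced in the transcript. As a rough floating-point sketch of the radix-2 butterfly the architecture is named after, the Python below implements an iterative FFT. The function names are hypothetical, the decimation-in-time ordering is chosen only for illustration, and the sketch ignores what the chip does in hardware (fixed-point arithmetic, twiddle-factor storage, SRAM addressing).

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 DIT butterfly: returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    x = list(x)
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # log2(n) stages; each stage applies n/2 butterflies.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle-factor increment
        for start in range(0, n, size):
            w = 1.0 + 0j
            for k in range(half):
                top, bot = start + k, start + k + half
                x[top], x[bot] = butterfly(x[top], x[bot], w)
                w *= w_step
        size *= 2
    return x


# Usage: an impulse transforms to a flat spectrum (all ones, up to rounding).
print(fft_radix2([1, 0, 0, 0]))
```

Each stage touches every sample once through a butterfly, so an N-point transform takes (N/2)·log2(N) butterflies; variable FFT length simply changes the number of stages.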

Page 4

8-b and 16-b Scalable Baugh-Wooley Multiplier

With 8-b precision, only the MSB parts of the two inputs are processed.

To minimize switching in the LSB adders, the LSB inputs are gated.
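A minimal sketch of how a Baugh-Wooley multiplier forms a signed (two's-complement) product from AND-gate partial products, plus an assumed model of the 8-b mode. The sketch assumes "MSB parts" means the upper byte of each 16-b input and models the gated LSB inputs as zeros; `baugh_wooley` and `scalable_multiply` are hypothetical names, not the chip's netlist.

```python
def to_bits(x, n):
    """Two's-complement bits of x, LSB first; x must fit in n signed bits."""
    return [(x >> k) & 1 for k in range(n)]


def baugh_wooley(a, b, n):
    """Signed n-bit x n-bit product built from Baugh-Wooley partial products."""
    A, B = to_bits(a, n), to_bits(b, n)
    p = 0
    # Unsigned partial products of the n-1 magnitude bits.
    for i in range(n - 1):
        for j in range(n - 1):
            p += (A[i] & B[j]) << (i + j)
    # Sign bit times sign bit carries positive weight 2^(2n-2).
    p += (A[n - 1] & B[n - 1]) << (2 * n - 2)
    # Negatively weighted sign-times-magnitude terms become inverted partial products...
    for i in range(n - 1):
        p += (1 - (A[i] & B[n - 1])) << (i + n - 1)
        p += (1 - (A[n - 1] & B[i])) << (i + n - 1)
    # ...plus a constant correction of 2^n - 2^(2n-1).
    p += (1 << n) - (1 << (2 * n - 1))
    return p


def scalable_multiply(a, b, precision=16):
    """Hypothetical 8-b/16-b mode model for 16-b signed inputs a and b."""
    if precision == 16:
        return baugh_wooley(a, b, 16)
    if precision == 8:
        # Only the MSB bytes are multiplied; the gated LSB bytes act as zeros.
        # Arithmetic right shift keeps the sign of each input.
        return baugh_wooley(a >> 8, b >> 8, 8) << 16
    raise ValueError("precision must be 8 or 16")


print(baugh_wooley(-3, 5, 8))                 # -15
print(scalable_multiply(0x1200, -0x0300, 8))  # -3538944, exact because the LSB bytes are 0
```

The appeal of the Baugh-Wooley form is that every partial product is a plain (possibly inverted) AND term, so the same regular adder array handles signed operands.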

Page 5

Minimum Energy Point Analysis(1)

⇒ As the supply voltage is lowered from a large value, the switching (dynamic) energy, and with it the overall energy, decreases (VDD > VTH).

Page 6

Minimum Energy Point Analysis(2)

⇒ In the subthreshold region (VDD < VTH), the propagation delay increases exponentially, so leakage current flows for longer per operation and the leakage energy increases.

Computation delay!!!
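The trade-off on the last two slides can be reproduced with a simple normalized model, sketched below under stated assumptions: switching energy C·VDD², and leakage energy VDD·IOFF·Top, where the operation time Top grows as C·VDD/ION. With the exponential subthreshold current model, IOFF/ION = exp(-VDD/(n·VT)), so leakage energy rises sharply as VDD falls. The slope factor n, the lumped `leak_ratio`, and the grid bounds are assumed values, so the VDD the search returns illustrates the shape of the curve, not the chip's actual optimum.

```python
import math

VT = 0.026  # thermal voltage kT/q near room temperature (V)


def energy_per_op(vdd, n=1.5, leak_ratio=100.0):
    """Normalized energy per operation (switched capacitance C = 1).

    E_sw   = C * VDD^2
    E_leak = VDD * I_off * T_op, with T_op proportional to C * VDD / I_on.
    In the exponential subthreshold model I_off / I_on = exp(-VDD / (n * VT)),
    so E_leak = leak_ratio * C * VDD^2 * exp(-VDD / (n * VT)).
    `leak_ratio` lumps leaking width and logic depth (assumed value).
    """
    e_sw = vdd ** 2
    e_leak = leak_ratio * vdd ** 2 * math.exp(-vdd / (n * VT))
    return e_sw + e_leak


def minimum_energy_vdd(lo=0.1, hi=1.0, steps=900):
    """Grid search for the VDD in [lo, hi] that minimizes energy_per_op."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=energy_per_op)


v_opt = minimum_energy_vdd()
print(f"minimum-energy VDD in this model: {v_opt * 1e3:.0f} mV")
```

The exponential current model is only meant for VDD < VTH; at higher VDD the leakage term is already negligible in this model, so the curve there is essentially C·VDD².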

Page 7

Minimum Energy Point Analysis(3)

• Case 1 : Processing speed is not important.

⇒ The optimal operating point is the minimum energy point. ⇒ The circuit then operates at the corresponding frequency.

Minimum energy point = optimal operating point: (VDD, VTH) = (380 mV, 480 mV)

Page 8

Minimum Energy Point Analysis(4)

• Case 2 : Processing speed is critical.

⇒ The required frequency constrains VDD and VTH; within that constraint they are chosen for maximum power saving.

⇒ The optimum lies where a constant-performance contour is tangent to an energy contour.

Optimal operating point contour
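With a frequency target, the optimum becomes a constrained minimization over (VDD, VTH). Instead of constructing the tangent point geometrically, the sketch below searches a grid by brute force. It assumes an EKV-style current interpolation, ln(1 + exp((VGS - VTH)/(2·n·VT)))², which is exponential below threshold and roughly square-law above it; the logic depth, leaking width, grid, and all names are hypothetical, and frequencies are in normalized units.

```python
import math

VT = 0.026      # thermal voltage (V)
N_SLOPE = 1.5   # subthreshold slope factor (assumed)
DEPTH = 20.0    # logic depth of the critical path (assumed)
LEAK_W = 100.0  # total leaking width relative to one gate (assumed)


def drive(vgs, vth):
    """Normalized drain current: exponential below VTH, roughly square-law above."""
    return math.log1p(math.exp((vgs - vth) / (2 * N_SLOPE * VT))) ** 2


def cycle_time(vdd, vth):
    """Critical-path delay ~ DEPTH * C * VDD / I_on, with C = 1."""
    return DEPTH * vdd / drive(vdd, vth)


def energy(vdd, vth):
    """Switching energy C * VDD^2 plus leakage energy over one cycle."""
    e_sw = vdd ** 2
    e_leak = LEAK_W * drive(0.0, vth) * vdd * cycle_time(vdd, vth)
    return e_sw + e_leak


def optimal_point(f_target, vdds, vths):
    """Lowest-energy grid point (VDD, VTH) whose clock rate meets f_target."""
    feasible = [(v, t) for v in vdds for t in vths
                if 1.0 / cycle_time(v, t) >= f_target]
    if not feasible:
        return None  # target frequency unreachable on this grid
    return min(feasible, key=lambda p: energy(*p))


VDDS = [0.10 + 0.01 * k for k in range(91)]   # 0.10 V to 1.00 V
VTHS = [0.10 + 0.01 * k for k in range(61)]   # 0.10 V to 0.70 V
f_slow = 1.0 / cycle_time(0.5, 0.4)           # a normalized frequency target
print(optimal_point(f_slow, VDDS, VTHS))
```

Raising the target can only shrink the feasible set, so the constrained minimum energy never decreases as the required frequency rises.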

Page 9

Minimum Energy Point for fixed VTH

• VTH is fixed at 450 mV for the FFT processor implementation. ⇒ VDD = 400 mV minimizes energy consumption.

• The low-power FFT processor therefore operates in the SUBTHRESHOLD region (VDD < VTH)!!!

Page 10

Subthreshold Inverter

• Case 1 : Input is logical ‘0’.

⇒ In the subthreshold region, the leakage current of the off NMOS is significant, so a minimum WP (WP(min)) is needed for the PMOS to pull up the output node.

⇒ Worst case: fast NMOS & slow PMOS (FS)

• Case 2 : Input is logical ‘1’.

⇒ A minimum-sized NMOS pulls the output node down to ‘0’. However, a large PMOS produces a leakage current that is large compared with the NMOS drive current, so there is a maximum WP (WP(max)) above which the output node can no longer be pulled down.

⇒ Worst case: slow NMOS & fast PMOS (SF)

Figure: for input ‘0’, PMOS ION vs. NMOS leakage IOFF; for input ‘1’, NMOS ION vs. PMOS leakage IOFF
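The WP(min)/WP(max) argument can be made quantitative with the exponential subthreshold model. The sketch below assumes a required ION/IOFF margin (`ratio`), identical NMOS and PMOS current prefactors, and process corners modeled as a ±`shift` on VTH; these values, the NMOS width `wn`, and the function names are hypothetical. Under these assumptions WP(min) and WP(max) cross at VDD = 2·shift + n·VT·ln(ratio), which is why a minimum functional supply exists.

```python
import math

VT = 0.026      # thermal voltage (V)
N_SLOPE = 1.5   # subthreshold slope factor (assumed)


def sub_current(width, vgs, vth, i0=1e-7):
    """Subthreshold current (A) for |VDS| >> VT; vgs and vth are magnitudes."""
    return i0 * width * math.exp((vgs - vth) / (N_SLOPE * VT))


def wp_range(vdd, wn=0.44, vth=0.45, shift=0.05, ratio=10.0):
    """Return (WP_min, WP_max) in um so that ION >= ratio * IOFF for both inputs.

    Process corners are modeled as a VTH shift: fast = vth - shift, slow = vth + shift.
    NMOS and PMOS share the same current prefactor i0 (a simplification).
    """
    # Input '0': the PMOS must out-drive NMOS leakage (worst case FS).
    wp_min = ratio * sub_current(wn, 0.0, vth - shift) / sub_current(1.0, vdd, vth + shift)
    # Input '1': the NMOS must out-drive PMOS leakage (worst case SF).
    wp_max = sub_current(wn, vdd, vth + shift) / (ratio * sub_current(1.0, 0.0, vth - shift))
    return wp_min, wp_max


def min_functional_vdd(shift=0.05, ratio=10.0):
    """VDD at which WP_min == WP_max; below it no PMOS width works."""
    return 2 * shift + N_SLOPE * VT * math.log(ratio)


v_min = min_functional_vdd()
print(f"lowest functional VDD in this model: {v_min * 1e3:.0f} mV")
print(wp_range(0.3))
```

With the default values the crossover lands near 190 mV; that number follows only from the assumed parameters.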

Page 11

Operating Point for a Subthreshold Inverter

VDD = 195 mV, WP = 5.4 µm (0.18 µm technology)

Page 12

Subthreshold Standard Cell – XOR Case (1)

Conventional XOR gate scheme in subthreshold region

For A = 1, B = 0, the leakage current is large and ION/IOFF is small, so the output node cannot be fully pulled up.

Page 13

Subthreshold Standard Cell – XOR Case (2)

Because two devices pull the output node high and two devices pull it low,

ION/IOFF is not degraded!!!

A transmission gate XOR in subthreshold region

devices are balanced

Page 14

Subthreshold Memory Design

• The FFT processor contains eight 128W × 16b RAM blocks and four 256W × 16b RAM blocks.

⇒ The functionality of a conventional 6T SRAM in subthreshold is analyzed with respect to bitline capacitance, bitline leakage, speed, and PVT variation.

⇒ A hierarchical read-bitline is used in the design of the data memory and achieves acceptable ION/IOFF in subthreshold.
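The total data-memory capacity follows directly from the block counts above. If each complex FFT sample occupies two 16-b words (real and imaginary), an assumption the slide does not state, 2048 words is exactly the storage a 1024-point FFT needs.

```python
# (number of blocks, words per block, bits per word) from the slide
ram_blocks = [(8, 128, 16), (4, 256, 16)]

total_words = sum(count * words for count, words, _ in ram_blocks)
total_bits = sum(count * words * width for count, words, width in ram_blocks)
print(total_words, "words,", total_bits, "bits,", total_bits // 8, "bytes")
# 2048 words, 32768 bits, 4096 bytes
```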

Page 15

Subthreshold Write Access (1)

• NPD has to be large enough that the voltage at LO does not rise above ΔVLO due to leakage through PPU and BL.

• Worst case : Slow NMOS and Fast PMOS (SF)

Page 16

Subthreshold Write Access (2)

• Write ‘Low’ case: ⇒ determines the NPS size needed to pull HI down to ΔVLO; worst case: SF.

• Write ‘High’ case: ⇒ determines the maximum NPD and NPS. The leakage currents of NPD and NPS form a voltage divider with the drive current of PPU, which must still pull LO up to ΔVHI.

Page 17

Sizing analysis on NPD

If VDD decreases,

the cell size increases dramatically!!!

This is the optimal point,

but this value can’t satisfy both the READ and WRITE conditions!!!
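Why the cell grows so quickly follows from the write condition stated under Subthreshold Write Access (1): NPD, gated by HI = VDD, must sink the leakage of PPU and the bitline while keeping LO at ΔVLO. In the exponential subthreshold model the NPD current falls by a factor of e for every n·VT drop in VDD, while the leakage it fights barely changes, so the required width grows exponentially. The sketch assumes HI stays at VDD, a fixed ΔVLO of 50 mV, and a lumped leakage width; these values and the function names are hypothetical, so only the relative growth is meaningful.

```python
import math

VT = 0.026      # thermal voltage (V)
N_SLOPE = 1.5   # subthreshold slope factor (assumed)
VTH = 0.45      # threshold voltage (V), assumed equal for all devices


def sub_current(width, vgs, vds, i0=1e-7):
    """Subthreshold drain current (A) including the (1 - exp(-VDS/VT)) term."""
    return i0 * width * math.exp((vgs - VTH) / (N_SLOPE * VT)) * (1 - math.exp(-vds / VT))


def min_npd_width(vdd, dv_lo=0.05, leak_width=1.0):
    """Smallest NPD width that holds LO at dv_lo against PPU and BL leakage.

    Assumes HI stays at VDD (NPD gate), the leaking PPU/access devices have
    VGS = 0 and roughly VDD across them, and lumps their width into leak_width.
    """
    i_leak = sub_current(leak_width, 0.0, vdd)
    i_npd_per_um = sub_current(1.0, vdd, dv_lo)
    return i_leak / i_npd_per_um


w_ref = min_npd_width(0.4)
for vdd in (0.4, 0.3, 0.25, 0.2):
    growth = min_npd_width(vdd) / w_ref
    print(f"VDD = {vdd * 1e3:.0f} mV -> NPD width x{growth:.0f} relative to 400 mV")
```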

Page 18

A Latch-Based Write Scheme and Its Analysis

• C2MOS tristate inverters are a more robust design for subthreshold operation.

• The tristate latch memory cells are functional down to 215 mV.

Page 19

Subthreshold Read Access (1)

The conventional 128W single-ended scheme case

• During the precharge phase, Wpre is on and the read bitline (RBL) is charged to VDD.

• However, since the charge stored on the bitline leaks away through all of the pull-down devices, Wpre is sized to offset the maximum leakage current through the pull-down devices.

Page 20

Subthreshold Read Access (2)

• In the worst case (M0 = 0 and M1~M127 = 1), the bitline leakage is maximized.

• In this case, when RBL should evaluate to ‘0’, ION << IOFF, so RBL fails to evaluate to ‘0’.

Figure: read bitline with cell M0 = 0 and cells M1~M127 = 1

Page 21

Subthreshold Read Access (3)

Figure: read bitline with cell M0 = 0 and cells M1~M127 = 1

• In the worst case (M0 = 0 and M1~M127 = 1), the tristate-based read access also suffers from bitline leakage effects.

• When RBL should evaluate to ‘0’, ION << IOFF, so RBL fails to evaluate to ‘0’.

The tristate-based scheme case

Page 22

Subthreshold Read Access (4)

Proposed hierarchical-read-bitline scheme case

MUX with balanced circuit

Latency!!!

Need a decoder!!!

The proposed SRAM scheme has some area and timing overhead but achieves extremely low energy dissipation.
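The worst-case read condition can be written as ION ≥ margin·(number of leaking cells)·IOFF, and in the exponential subthreshold model one device gives ION/IOFF = exp(VDD/(n·VT)). The sketch below compares a flat 128-cell bitline with a two-level hierarchy. The segment size of 8, the margin of 2, and the way the global stage is modeled (one leaking input per other local bitline) are assumptions for illustration, not the circuit on the slide.

```python
import math

VT = 0.026      # thermal voltage (V)
N_SLOPE = 1.5   # subthreshold slope factor (assumed)


def on_off_ratio(vdd):
    """ION/IOFF of one device, VGS = VDD vs. VGS = 0 (VTH cancels in this model)."""
    return math.exp(vdd / (N_SLOPE * VT))


def worst_case_leakers(words, segment=None):
    """OFF devices opposing the single ON device for the worst-case data pattern."""
    if segment is None:
        return words - 1  # flat bitline: every other cell leaks
    # Hierarchical: a local bitline of `segment` cells, then a global stage
    # that selects among words // segment local bitlines (assumed structure).
    return max(segment - 1, words // segment - 1)


def reads_correctly(vdd, words, segment=None, margin=2.0):
    """True if ION exceeds `margin` times the summed IOFF at every level."""
    return on_off_ratio(vdd) >= margin * worst_case_leakers(words, segment)


def min_read_vdd(words, segment=None, margin=2.0):
    """Closed-form VDD at which reads_correctly() starts to hold."""
    return N_SLOPE * VT * math.log(margin * worst_case_leakers(words, segment))


flat = min_read_vdd(128)
hier = min_read_vdd(128, segment=8)
print(f"flat: {flat * 1e3:.0f} mV, hierarchical: {hier * 1e3:.0f} mV")
# flat: 216 mV, hierarchical: 133 mV
```

Splitting the bitline trades the leakage problem for an extra selection level, which is consistent with the latency and decoder overhead noted on the slide.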

Page 23

Results – Energy Dissipation as a function of VDD

• The optimal operating point for minimum energy dissipation is at VDD = 350 mV.

• In simulation, the optimum was at VDD = 400 mV.

Page 24

Results – Energy of 8-b and 16-b Processing

Page 25

Summary

Specification        Value
Technology           0.18 µm CMOS, six metal layers
Area                 2.6 × 2.1 mm²
FFT length           128, 256, 512, 1024
Bit precision        8-bit and 16-bit
Supply voltage       180 to 900 mV
Clock frequency      164 Hz to 6 MHz
Power consumption    90 nW (VDD = 180 mV); 600 nW (VDD = 350 mV, f = 10 kHz)
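Energy per clock cycle follows from the power and frequency in the table (E = P/f). Only the 350 mV entry lists both a power and a clock frequency, so only that point is converted; the 90 nW figure at 180 mV has no paired clock rate in the table.

```python
# E = P / f for the summary entry that lists both power and clock frequency
power_w = 600e-9   # 600 nW at VDD = 350 mV
freq_hz = 10e3     # 10 kHz
energy_per_cycle_j = power_w / freq_hz
print(f"{energy_per_cycle_j * 1e12:.0f} pJ per clock cycle")  # 60 pJ per clock cycle
```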