seok-jae , lee vlsi signal processing lab. korea university

VLSI Signal Processing Laboratory

A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology

Alice Wang & Anantha Chandrakasan

Seok-jae Lee

VLSI Signal Processing Lab., Korea University


Why an FFT processor?

• FFT processors are used in wireless sensor networks. The FFT has been used in target tracking, localization, and radar by analyzing phase differences from multiple sensors. These applications require a low-power design; chip speed is not critical.

• The FFT processor consists of multipliers, control logic, and SRAM memory blocks.

• Low-power design techniques such as variable bit precision and variable FFT length achieve further power savings.

• In particular, the multipliers, control logic, and SRAM are implemented with ‘SUBTHRESHOLD’ circuits, which dissipate extremely low energy.



Radix-2 Butterfly FFT architecture

Subthreshold circuits are used!!!
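As a reminder of what a radix-2 butterfly computes, here is a minimal floating-point sketch in Python. It is not the processor's datapath: the chip works on fixed-point data held in SRAM, and the function names `butterfly` and `fft_radix2` are chosen here only for illustration.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

Each stage applies N/2 butterflies, so the multiplier inside the butterfly dominates the arithmetic, which is why the following slides concentrate on it.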



8-b and 16-b Scalable Baugh-Wooley Multiplier

With 8-b precision, only the MSB parts of the two inputs are processed.

To minimize switching in the LSB adders, LSB inputs are gated.
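The following Python sketch shows the arithmetic behind a Baugh-Wooley multiplier and one possible reading of the 8-b mode. The sign-handling partial products are complemented and two constants are added, so every partial product is non-negative. The helper `scalable_multiply` and its choice of taking the upper byte of each operand are assumptions made for illustration; the slide only states that the MSB parts are processed and the LSB inputs are gated.

```python
def baugh_wooley(a, b, n):
    """Signed n-bit x n-bit multiply using Baugh-Wooley partial products."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask          # two's-complement bit patterns
    acc = 0
    for i in range(n):
        for j in range(n):
            pp = ((ua >> i) & 1) & ((ub >> j) & 1)
            # Partial products involving exactly one sign bit are complemented.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            acc += pp << (i + j)
    # Correction constants that make the complemented terms exact.
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1            # keep 2n result bits
    if acc >> (2 * n - 1):               # reinterpret as signed
        acc -= 1 << (2 * n)
    return acc


def scalable_multiply(a, b, precision=16):
    """Hypothetical precision switch: 16-b full multiply or 8-b on the MSB halves."""
    if precision == 16:
        return baugh_wooley(a, b, 16)
    if precision == 8:
        # LSB inputs are ignored (gated); only the upper bytes are multiplied.
        return baugh_wooley(a >> 8, b >> 8, 8)
    raise ValueError("precision must be 8 or 16")
```

In 8-b mode the lower half of the array never sees changing inputs, which is the software analogue of gating the LSB inputs to avoid switching in the LSB adders.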



Minimum Energy Point Analysis(1)

=> As the supply voltage is lowered from a large value, the switching (dynamic) energy and the total energy decrease. (VDD > Vth)




=> In the subthreshold region, the propagation delay increases exponentially, which increases the leakage energy per operation. (VDD < Vth)

Computation delay!!!
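The trade-off between falling dynamic energy and rising leakage energy can be sketched with a simplified model. Assume the dynamic energy scales as VDD^2, and that the leakage energy is the off-current times VDD times a delay proportional to VDD / I_on. With I_on / I_off = exp(VDD / (n·vT)), the leakage energy then scales as VDD^2 · exp(-VDD / (n·vT)). The slope factor and the lumped `LEAK_FACTOR` below are illustrative assumptions, not values from the paper.

```python
import math

VT = 0.026            # thermal voltage near room temperature (V)
N_SLOPE = 1.5         # assumed subthreshold slope factor
LEAK_FACTOR = 2000.0  # assumed; lumps leakage width, logic depth, and activity


def energy_per_op(vdd):
    """Return (dynamic, leakage, total) energy, normalized to C_eff = 1."""
    dynamic = vdd ** 2
    leakage = LEAK_FACTOR * vdd ** 2 * math.exp(-vdd / (N_SLOPE * VT))
    return dynamic, leakage, dynamic + leakage


def minimum_energy_point(v_lo=0.15, v_hi=0.60, steps=451):
    """Grid search for the supply voltage that minimizes total energy."""
    best_v, best_e = v_lo, float("inf")
    for i in range(steps):
        v = v_lo + (v_hi - v_lo) * i / (steps - 1)
        total = energy_per_op(v)[2]
        if total < best_e:
            best_v, best_e = v, total
    return best_v, best_e
```

With these assumed parameters the minimum falls inside the sweep rather than at either end: below it the exponential delay makes leakage dominate, and above it the VDD^2 term dominates.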




• Case 1 : Processing speed is not important.

=> The optimal operating point is the minimum energy point.

=> The circuit operates at the corresponding frequency.

Minimum energy point = optimal operating point: (VDD, VTH) = (380 mV, 480 mV)




• Case 2 : Processing speed is critical.

=> The given frequency constrains VDD and VTH; the pair is chosen to achieve the maximum power saving.

=> The optimum lies where a performance contour is tangent to an energy contour.

Optimal operating point contour



Minimum Energy Point for fixed VTH

• VTH is fixed at 450 mV for implementing the FFT processor.

=> VDD = 400 mV minimizes energy consumption.

• The low-power FFT processor operates in the SUBTHRESHOLD region!!!



Subthreshold Inverter

• Case 1: Input is logical ‘0’.

=> In the subthreshold region the leakage current is significant, so a minimum PMOS width (WP(min)) is needed to pull up the output node.

=> Worst case: fast NMOS & slow PMOS (FS)

• Case 2: Input is logical ‘1’.

=> A minimum-sized NMOS pulls the output node down to ‘0’. But a large PMOS produces a large leakage current relative to the drive current of the NMOS, so a maximum PMOS width (WP(max)) exists beyond which the output node cannot be pulled down.

=> Worst case: slow NMOS & fast PMOS (SF)

[Figure: inverter drive current ION and leakage current IOFF for inputs ‘0’ and ‘1’]
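The two sizing bounds above can be explored with a toy model. The sketch below uses a textbook subthreshold current expression, solves for the inverter output voltage by bisection, and collects the PMOS widths that pass at both corners. The slope factor, the 50 mV corner shift in VTH, the equal nominal thresholds, the 10%/90% output-level criterion, and all function names are assumptions for illustration, not values from the slides.

```python
import math

VT = 0.026      # thermal voltage near room temperature (V)
N_SLOPE = 1.5   # assumed subthreshold slope factor
I0 = 1e-6       # assumed current scale per unit width (arbitrary units)


def i_sub(width, v_gate, v_ds, vth):
    """Subthreshold current for gate drive v_gate and drain-source drop v_ds."""
    return (width * I0 * math.exp((v_gate - vth) / (N_SLOPE * VT))
            * (1.0 - math.exp(-v_ds / VT)))


def output_voltage(input_high, vdd, wn, wp, vth_n, vth_p):
    """Static output level where pull-up and pull-down currents balance."""
    v_gs_n = vdd if input_high else 0.0   # NMOS gate-source voltage
    v_sg_p = 0.0 if input_high else vdd   # PMOS source-gate voltage
    lo, hi = 0.0, vdd
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        pull_up = i_sub(wp, v_sg_p, vdd - mid, vth_p)
        pull_down = i_sub(wn, v_gs_n, mid, vth_n)
        if pull_up > pull_down:
            lo = mid      # node would charge further
        else:
            hi = mid      # node would discharge further
    return 0.5 * (lo + hi)


def pmos_width_window(vdd, wn=1.0, vth=0.45, dvth=0.05, candidates=None):
    """PMOS widths (relative to wn) that pass at both worst-case corners."""
    if candidates is None:
        candidates = [0.1 * k for k in range(1, 201)]
    fast, slow = vth - dvth, vth + dvth
    passing = []
    for wp in candidates:
        # Input '0', fast NMOS / slow PMOS: output must still reach a high level.
        v_high = output_voltage(False, vdd, wn, wp, vth_n=fast, vth_p=slow)
        # Input '1', slow NMOS / fast PMOS: output must still reach a low level.
        v_low = output_voltage(True, vdd, wn, wp, vth_n=slow, vth_p=fast)
        if v_high >= 0.9 * vdd and v_low <= 0.1 * vdd:
            passing.append(wp)
    return passing
```

Calling `pmos_width_window(vdd)` for decreasing `vdd` shows the window between WP(min) and WP(max) narrowing and eventually closing, which is the idea behind picking a minimum operating voltage for the inverter.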



Operating Point for a Subthreshold Inverter

VDD = 195mV, WP = 5.4um (0.18um technology)



Subthreshold Standard Cell – XOR Case (1)

Conventional XOR gate scheme in subthreshold region

In the A=1, B=0 case,

the leakage current is large and ION/IOFF is small.

So the output node cannot be fully pulled up.



Subthreshold Standard Cell – XOR Case (2)

Because there are two devices pulling the output node high and two devices pulling it low,

ION/IOFF is not degraded!!!

A transmission gate XOR in subthreshold region

devices are balanced



Subthreshold Memory Design

• The FFT processor contains eight 128W X 16b RAM blocks and four 256W X 16b blocks.

=> The functionality of the conventional 6T SRAM in subthreshold is analyzed: bitline capacitance, bitline leakage, speed, PVT variation, etc.

=> A hierarchical read bitline is used in the design of the data memory and achieves an acceptable ION/IOFF in subthreshold.



Subthreshold Write Access (1)

• NPD has to be large enough that… the voltage at LO does not rise above ΔVLO due to the leakage of PPU and BL.

• Worst case : Slow NMOS and Fast PMOS (SF)



Subthreshold Write Access (2)

• Write ‘Low’ case:

=> Determines the NPS size needed to pull HI down to ΔVLO. Worst case: SF

• Write ‘High’ case:

=> Determines the maximum NPD and NPS sizes. The leakage currents of NPD and NPS form a voltage divider against the drive current of PPU, which is used to pull LO up to ΔVHI.



Sizing analysis on NPD

If VDD decreases, the cell size increases dramatically!!!

This is the optimal point, but this value can’t satisfy both the READ and WRITE conditions!!!



A Latch-Based Write Scheme and Its Analysis

• A latch built from C2MOS tristate inverters is a more robust design for subthreshold operation.

• The tristate latch memory cell is functional down to 215 mV.



Subthreshold Read Access (1)

The conventional 128W single-ended scheme case

• During the precharge phase, Wpre is on and the read bitline (RBL) is charged to VDD.

• However, since the charge stored on the bitline leaks away through all of the pull-down devices, Wpre is sized to offset the maximum leakage current through them.



Subthreshold Read Access (2)

• In the worst case (M0 = 0 and M1~M127 = 1), the bitline leakage is maximized.

• In this case, when RBL should evaluate to ‘0’, ION << IOFF, so RBL fails to evaluate to ‘0’.


Subthreshold Read Access (3)

The tristate-based scheme case

• In the worst case (M0 = 0 and M1~M127 = 1), the tristate-based read access also suffers from bitline leakage effects.

• When RBL should evaluate to ‘0’, ION << IOFF, so RBL fails to evaluate to ‘0’.



Subthreshold Read Access (4)

Proposed hierarchical-read-bitline scheme case

MUX with balanced circuit

Latency!!!

Need a decoder!!!

The proposed SRAM scheme has some area and timing overhead but achieves extremely low energy dissipation.
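The bitline leakage problem and the benefit of a hierarchy can be illustrated with a first-order estimate: one cell drives the bitline with ION while every other cell on the same wire contributes IOFF. The sketch below assumes identical devices, a single-device ION/IOFF of exp(VDD / (n·vT)), and a hypothetical segment size of two cells; the actual grouping in the proposed scheme is not given on the slide.

```python
import math

VT = 0.026      # thermal voltage near room temperature (V)
N_SLOPE = 1.5   # assumed subthreshold slope factor


def on_off_ratio(vdd):
    """Single-device ION/IOFF in subthreshold under the assumed model."""
    return math.exp(vdd / (N_SLOPE * VT))


def bitline_drive_to_leakage(vdd, cells_on_bitline):
    """Ratio of one driving cell's ION to the summed IOFF of the other cells."""
    if cells_on_bitline < 2:
        raise ValueError("need at least two cells sharing the bitline")
    leaking_cells = cells_on_bitline - 1
    return on_off_ratio(vdd) / leaking_cells


def compare_flat_and_segmented(vdd, words=128, segment=2):
    """Return (flat_ratio, segmented_ratio); 'segment' is a hypothetical group size."""
    return (bitline_drive_to_leakage(vdd, words),
            bitline_drive_to_leakage(vdd, segment))
```

Reducing the number of cells that share a wire raises the effective ION/IOFF by roughly the same factor, at the cost of the extra MUX stages, decoder, and latency noted on the slide.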



Results – Energy Dissipation as a function of VDD

• The optimal operating point for minimum energy dissipation is at VDD = 350 mV.

• The simulation result gave VDD = 400 mV.



Results – Energy of 8-b and 16-b Processing



Summary

Specification       Value
Technology          0.18um CMOS with six metal layers
Area                2.6 X 2.1 mm2
FFT length          128, 256, 512, 1024
Bit precision       8-bit and 16-bit
Supply voltage      180~900 mV
Clock frequency     164 Hz ~ 6 MHz
Power consumption   90 nW (VDD = 180 mV); 600 nW (VDD = 350 mV, frequency = 10 kHz)
