TRANSCRIPT
-
Various Low-Power SoC Design Techniques
Chong-Min Kyung
KAIST
-
Contents
Introduction
Power Management using Voltage Island Technique
Energy (Power) Management Approach by ARM
Low Power Design Example with Samsung AP based on ARM 920T
IBM Low Power Design using PowerPC
Conclusions
-
Why Low Power?
Limited Battery Capacity (Mobile Devices)
For Minimal Heat Dissipation (Heat Sink, Cooler, System Size/Weight/Cost)
For Chip/System Reliability
Save Energy; it's limited after all!
-
Power vs. Energy
Power-Critical Applications:
  Heat Dissipation Requirement
  Power/Ground Metal Line Width
  Power/Ground Bounce due to IR drop
Energy-Critical Applications:
  Battery Lifetime
  Heat Dissipation Requirement
-
Applications for Low Power Technology
Medical: implantable hearing aid, cardiac pacemaker
Mobile Devices: cellular phone
Military Devices
Hard-to-access points: Space
Too-many-to-access points: Sensors/Actuators in Ubiquitous World
-
Power Management
using Voltage Island Technique
-
Typical Power Optimization Procedure
Applications
H/W Description and Synthesis
Standard Cell/Wire
Initial Layout
Functional Partitioning
Cell/Interconnect Delay and Power Modeling
Vdd, Vt, Wg, Wint Optimization
Technology Files
Parasitic (Resistance, Capacitance)
Interconnects from layout
Constraints (Delay, Power, Area, Noise)
Switching Activity
Gate-Level Power Optimization
Power optimized Net List
Parameterized Cell/Wire Design
Place/Route and Layout
Customized Layout
Verification for Min-Power, Delay, Area, Noise
Optimized Vdd, Vt, Wg, Wint
N
Y
Place/Route and Layout
-
Power Challenge
Source from Bergamaschi
-
Low Power Levers
Structural Techniques
Voltage Islands
Multi-threshold devices
Multi-oxide devices
Minimize capacitance by custom design
Power efficient circuits
Parallelism in micro-architecture
Dynamic Techniques
Clock gating
Data gating
Power gating
Variable frequency
Variable voltage supply
Variable device threshold
-
Standby Mode Leakage Suppression
Disconnect inactive logic from supply in standby mode
Multi-threshold: a higher-Vt header/footer suppresses logic leakage (gate and sub-threshold)
Multi-oxide: a thick-oxide header/footer suppresses gate leakage
Header/footer gate voltage: overdrive increases frequency; under-drive reduces leakage
Header/footer well bias: forward bias increases frequency; reverse bias reduces leakage
Voltage Islands
-
Standby Power Reduction Mechanism
On-chip supervisor manages standby power
Clock gating
Functional clock gating (fine clock control)
Voltage scaling, shutdown
SOC latch save/restore
Timeout and interrupt driven
[Block diagram: on-chip standby supervisor]
Domains: Battery Backed Domain; Scalable VDD Domain (1.0-1.8V); 3.3V I/O
Blocks: Suspend Ctrl Logic, RTC, SoC Logic, LSSD Latches, Scan Ctrl Logic, Reset Logic, IIC Ctrl, Serial NVRAM, DC/DC Supplies
Signals: System Clks Freeze, I/O Freeze, PG, Wake, Reset, Irq, Clk, Data, Select, Shutdown, Scan Chains
-
Voltage Island Concept
Trade off power for delay by running functional blocks at different voltages
Can use mix of Low and High Vt to balance performance and leakage
Switch off inactive blocks to reduce leakage power
Requires IP standards for power management, clock gating, etc.
[Figure: Delay vs. Voltage. Delay (ps, 0 to 30) against Vdd (0.7 to 1.3 V) for Std. Vt and Low Vt devices]
E.g.: Telecom ASIC with 1.0/1.2 V islands saved:
16 % active power
50 % standby power
[Diagram: Power Management Unit controlling switches for IP1 and IP2 (Logic and Low VT Logic) on supplies Vddo, Vdd1, Vdd2]
Source from Bergamaschi
-
Power Management Unit
Bus Interfaces
Reconfigurable Register Units
Power Management
State Machine
Timer / Counter
ControlPerformance Unit
Clock Control
Unit
Monitor Unit
Power Control Unit
DC/DC Converter
Well-bias generator
Clock generator
Clock & Power-Gating
Device Performance Monitor
Thermal Monitor
IP Core Interfaces
-
Busses with Different Voltages
One clock & one signaling voltage
Some approaches:
Temporarily scaling V & F for communication
Separate different voltages with bridges
Hot Bus
Cold Bus
Cool Bus
bridge
bridge
-
Power Management
I/Os, VReg, Gnd
Memory Arrays
Vdd 4
High Vt device arrays
Optimized for low active
power
Memory Arrays
Vdd 3
Low Vt device arrays
Optimized for low active
power
Microcontroller
Vdd 2
DSP
Vdd 2
ROM
Vdd 1
Monitor Logic Vdd 4
ROM
Vdd1
RLM 1
RLM 2
Memory Arrays
Vdd 3
Low Vt device arrays
Optimized for low active
power
I/Os, VReg, Gnd
Analog Vdd 5
RLM 3
Vdd 1
I/Os, VReg, Gnd
I/Os, VReg, Gnd
Independently controlled domain power switches
Multiple On-Chip Voltage Islands
On-Chip Voltage Regulators
-
Functional Partitioning
Identifying functional components with similar inactive periods
Assigning functional components to possible chip-level power sources capable of providing required voltage level
Identifying the optimal grouping of components, based upon power sequencing (affects static power) and operating voltage (affects active power) that minimizes chip power within the limits (such as peak power) of the SoC
Identifying or creating, and connecting, logic signals that will be used to control power-sequencing circuitry or control clock gates
Connecting alternate voltage sources to latches or arrays used to save state across power sequencing
-
Controlling VDD and VTH for low power
Software-hardware cooperation
Technology-circuit cooperation
MTCMOS: Multi-Threshold CMOS
VTCMOS: Variable Threshold CMOS
Multiple: spatial assignment
Variable: temporal assignment
Technique | Active | Stand-by
Multiple VTH | Dual-VTH | MTCMOS
Variable VTH | VTH hopping | VTCMOS
Multiple VDD | Dual-VDD | Boosted gate MOS
Variable VDD | VDD hopping |
-
Dynamic power reduction
Controller
Software
Hardware
Required
speed
Clock & VDD
Processor
If you don't need to hustle, relax and save power
Through Software-hardware cooperation OS and application programming
-
Voltage Scaling Mechanism
Four power domains
On-chip supervisor for SOC voltage supplies
Level shifting and latching circuits at domain interfaces
[Block diagram of the four power domains]
Persistent 1.8V Battery-Backed Domain
Regulated 1.0V PLL Supply Domain (Linear Regulator)
Constant 3.3V I/O Domain
Voltage Scalable 1.0V-1.8V Logic Supply Domain
Blocks shown: Suspend Ctrl Logic, RTC Logic, CPU Core, Caches, I/O Intf Logic, Memory Intf, Accelerators, Drivers, Recvrs
Supplies: Battery; DC/DC Supplies (3.3V, 1.0V-1.8V) with Select and Shutdown controls
-
Dynamic Voltage/Frequency Scaling
Frequency changed and Vdd dropped from 1.8V to 1.0V
PLL locked at 533MHz with CPU clock switched from 266MHz to 66MHz and back to 266MHz
Continues to execute Dhrystone benchmark
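As a rough check on what this transition buys, the usual dynamic-power relation (power proportional to C x V^2 x f) can be evaluated for the two operating points on this slide. This is only a sketch under stated assumptions: the function name is made up for illustration, leakage and I/O power are ignored, and the switched capacitance is assumed constant.

```python
def dynamic_power_ratio(v_high, f_high, v_low, f_low):
    """Ratio of dynamic power at (v_high, f_high) to that at (v_low, f_low).

    Assumes P is proportional to C * V^2 * f with constant switched capacitance C.
    """
    return (v_high / v_low) ** 2 * (f_high / f_low)


# Operating points from the slide: 1.8 V / 266 MHz and 1.0 V / 66 MHz.
ratio = dynamic_power_ratio(1.8, 266e6, 1.0, 66e6)
print(f"dynamic power reduction: about {ratio:.1f}x")
```

Under these assumptions the ratio comes out near 13, of which roughly 3.2x comes from the voltage term and roughly 4x from the frequency term.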
-
Low Leakage Cells Standby Power Reduction
Dual-Vt Storage Cells
Low Vt for high performance
High Vt for low leakage
Gated Vdd and DRG
Power Switch
Sub threshold leakage current dominates
-
Energy (Power) Management Approach by ARM
-
Need for Energy Management
Today's mobile consumers want:
longer battery life and smaller, lighter products
Manufacturers are adding new features and applications to add product appeal:
media players (audio, video)
gaming
video capture
Increasing processing power requirements and longer battery life are conflicting requirements
Battery technology alone offers only incremental improvement over the next several years
-
Higher performance, higher power
[Figure: power consumption (mW) vs. Dhrystone MIPS for ARM7, ARM9 and ARM10/11 cores on 0.18um and 0.13um processes]
CPU Characterisation
CPU Data 21st November 2002ARM Confidential
CPU7TDMI7TDMI-S7EJ-S720T920T922T940T946E-S966E-S926EJ-S926EJ-S1020E*1022E*1026EJ-S*1026EJ-S*1136J-S*1136JF-S*
Revision-----------------
MIPS/MHz [RVCT 2.0]0.910.911.000.911.171.171.171.171.171.061.061.241.241.351.351.181.18
Default Cache [inst/data]---8k uni16k/16k8k/8k4k/4k8k/8k16k/16k TCM8k/8k16k/16k32k/32k16k/16k16k/16k8k/8k32k/32k32k/32k
Variable cache [yes/no]NoNoNoNoNoNoNoYesYes (TCMs)YesYesNoNoYesYesYesYes
TCMs [yes/no]NoNoNoNoNoNoNoYesYesYesYesNoNoYesYesYesYes
Memory controller [MPU/MMU]---MMUMMUMMUMPUMPU-MMUMMUMMUMMUMMU/MPUMMU/MPUMMUMMU
Bus Interface [ASB/AHB/dual AHB]Yes**Yes**Yes**AHBASB**ASB**ASB**AHBAHB2x AHB2x AHB2x AHB2x AHB2x AHB2x AHB5x AHB5x AHB
ThumbYesYesYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
DSP--Yes----YesYesYesYesYesYesYesYesYesYes
Jazelle--Yes------YesYes--YesYesYesYes
Data for graph
0.18 Mhz10010010075200200185170200200200210210266266
0.18 MIPS919110068.25234234216.45198.9234212212260.4260.4313.88313.88
0.18 mW / Mhz0.230.390.450.650.800.800.800.900.900.950.951.31.3
0.18 mW23394548.75160160148153180190190345.8345.8
0.13 Mhz133133133100250250210250250250325325325325400400
0.13 MIPS121.03121.0313391292.5292.5245.7292.5265265403403438.75438.75472472
0.13 mW / Mhz0.060.110.180.200.250.250.400.400.400.400.60.60.500.500.400.40
0.13 mW7.9814.6323.942062.562.584100100100195195162.5162.5160160
-
Layers of power optimizations
Software (OS, applications)
System Architecture
Micro-architecture
Circuits
Ambient environment
Si conditions
Power delivery
Important to optimize design at each level
ARM's partners have widely varying design-time, technology, legacy, cost constraints.
IEM: current focus on top two layers
Widely applicable dynamic power-optimizations
Optimize for the requirements of the specific workload
-
Conventional Power Management
STANDBY is off, with clocks stopped but state retained
IDLE is a lower power mode with a slow clock running
ON state is fully powered up at maximum clock frequency
Conventional power management schemes manage the transitions between defined power states
Despite the changing software workload, system runs at maximum performance while there is any work to be done
-
Optimizing for utilization characteristics
Conventional power management optimizes power consumption when there is nothing to do (sleep modes).
IEM optimizes power when work is being done.
Only run fast enough to meet deadlines!
Running fast and idling wastes power.
The active- and sleep-mode techniques are orthogonal.
[Figure: utilization (0% to 100%) and energy used over time, with and without Dynamic Voltage Scaling]
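The claim that running fast and then idling wastes power can be illustrated with a toy energy model. The sketch below is not taken from the slides: it assumes energy per cycle is proportional to V^2, that idle energy is zero, and (optionally) that supply voltage can be scaled linearly with clock frequency. The function name is hypothetical.

```python
def relative_energy(utilization, voltage_tracks_frequency=True):
    """Energy of a DVS run relative to running at full speed and then idling.

    utilization: fraction of the interval the task needs at full speed (0 < u <= 1).
    Assumptions: energy per cycle is proportional to V^2, idle energy is zero,
    and (if voltage_tracks_frequency) V scales linearly with clock frequency.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Same number of cycles either way; only the energy per cycle changes.
    voltage_scale = utilization if voltage_tracks_frequency else 1.0
    return voltage_scale ** 2


for u in (1.0, 0.75, 0.5, 0.25):
    print(f"utilization {u:.0%}: DVS energy = {relative_energy(u):.0%} of run-then-idle")
```

With frequency scaling alone (voltage held fixed) the model returns 1.0, which is why the voltage reduction, not the slower clock, is what saves energy here.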
-
Meeting the Performance Requirement
Effective Energy Management requires:
Automatic Performance Prediction technology
Determining the lowest performance level that will get the software workload done just in time
Performance Scaling technology
Delivering just enough performance to meet the current requirement
Responding rapidly to changing performance levels
-
Energy Management Control Components
Software component
To automatically predict future software workloads by interacting with instrumented Operating Systems and application software
To determine the software deadlines
To balance workload and deadlines with performance
Hardware component
To accurately measure the actual system performance
To independently manage the transitions of hardware scaling blocks, e.g., clock generators and power controllers
Together these components determine and manage the lowest performance level that gets the work done
-
Adaptive Voltage Scaling (AVS)
AVS is a closed loop control mechanism.
Feedback from the PMU indicates the earliest opportunity to change processor frequency based on the voltage levels being output to the SoC.
APC monitors the difference between the requested performance level and the actual level achieved.
Taking into account variations due to differences in process technology and ambient temperature, the system dynamically changes the voltage applied.
The lowest energy consumption is achieved OR a specified performance level can be met.
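The closed-loop idea can be sketched as a simple step controller. This is not the APC algorithm from ARM or National Semiconductor: the function names, step size, voltage limits and the linear "silicon" model are all assumptions chosen to keep the example short.

```python
def avs_step(voltage_v, requested_perf, measured_perf,
             step_v=0.025, v_min=0.9, v_max=1.8):
    """One iteration of a closed-loop voltage adjustment.

    Raises the voltage when measured performance falls short of the request and
    lowers it when there is margin. All numeric defaults are illustrative.
    """
    if measured_perf < requested_perf:
        voltage_v += step_v      # too slow: add voltage headroom
    elif measured_perf > requested_perf:
        voltage_v -= step_v      # margin available: save energy
    return min(v_max, max(v_min, voltage_v))


def toy_silicon(voltage_v, temperature_derate=0.95):
    """Toy plant model (hypothetical): performance grows linearly with voltage."""
    return temperature_derate * voltage_v / 1.8


v = 1.8
for _ in range(100):
    v = avs_step(v, requested_perf=0.5, measured_perf=toy_silicon(v))
print(f"settled near {v:.3f} V")
```

In this toy model the loop ends up toggling between two adjacent voltage steps around the target; a practical controller would likely add a dead band or hysteresis to avoid that.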
-
Low Power Design Example with Samsung AP based on ARM 920T
-
Limited Battery Improvement
Power Increase vs. Battery Improvement
Year | 2001 | 2004 | 2007 | 2010 | 2013 | 2016
Feature Size (nm) | 130 | 90 | 65 | 45 | 32 | 22
Dynamic Power Reduction (X) | 0 | 1.5 | 2.5 | 4.0 | 7.0 | 20
Stand-by Power Reduction (X) | 2 | 6 | 15 | 30 | 150 | 800
[ITRS 2001]
[Figure: battery energy density for Ni-MH, Li-Ion / Polymer and Fuel Cell. Axes: Volumetric Energy Density (Whr/L) and Gravimetric Energy Density (Whr/Kg)]
Cellular Phone: Talk Time 2Hrs ~ 4Hrs, Standby about 1 week
Cellular Phone: Talk Time about 12Hrs, Standby about 1 month
Only 4~5X improvement in battery lifetime!
-
Problem Statement
Power Analysis on CMOS Inverter
-
Problem Statement
Dynamic Power
Average Short Circuit Current
Sub-threshold Leakage Current
-
Problem Statement
Domination of Leakage Current
-
Active and Leakage Power with CMOS Scaling
As CMOS scales down, the following stand-by leakage currents rise rapidly:
Source to drain leakage (diffusion+tunneling) as Lg scales down
Gate leakage current (tunneling) as Tox scales down
Body to drain leakage current (tunneling) as channel doping scales up
-
Two cases of Leakage Mechanism
-
Gate Leakage Current Reduction with High-K Gate Dielectric
-
Gate Leakage Current Reduction with High-K Gate Dielectric
As Tox scales, gate leakage current increases exponentially, because tunneling probability rises exponentially as the physical tunneling distance shrinks.
A physically thicker gate dielectric allows lower leakage current, but it lowers oxide capacitance and so reduces on-current.
Using a high-k (dielectric constant) material, both a greater physical thickness and a higher oxide capacitance can be achieved.
Applying a high-k gate dielectric, several orders of magnitude lower gate leakage current can be achieved with similar oxide capacitance.
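The trade-off can be made concrete with the parallel-plate approximation for gate capacitance per unit area. The symbols below are introduced here for illustration and do not appear on the slide.

```latex
C_{ox} = \frac{k\,\varepsilon_0}{t_{phys}}
\quad\Rightarrow\quad
t_{high\text{-}k} = \frac{k_{high\text{-}k}}{k_{SiO_2}}\; t_{SiO_2}
\quad \text{for equal } C_{ox}
```

For example, taking k of about 3.9 for SiO2 and a hypothetical dielectric with k = 20, the film can be roughly 5 times thicker for the same capacitance, which is what suppresses tunneling.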
-
Power Saving vs. Abstraction Layers
System/Algorithm/Architecture
have a large potential!
Design Time
-
System Level Consideration for Low Power Design
Mobile Devices Behavior according to Time (Operation Time is less than 10%)
Need Various Power Modes In System
-
Power Management : Example
General Clock Gating
Control the clock source of each IP block individually through its clock source enable bit
IDLE
Turn off the clock source to the CPU
STOP
Turn off all of the clock sources including the external X-tal and internal PLLs
SLEEP
Turn off all of the clock sources and also the power supply for the internal logic, except for the wake-up logic circuitry
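The mode definitions above can be captured as a small lookup, which is one way firmware might reason about what remains active in each mode. This is a sketch only: the resource names and the NORMAL mode are assumptions added for illustration, not taken from the Samsung AP documentation.

```python
from enum import Enum


class PowerMode(Enum):
    NORMAL = "normal"   # assumed baseline mode, not listed on the slide
    IDLE = "idle"
    STOP = "stop"
    SLEEP = "sleep"


# What each mode turns off, following the slide; the resource names are illustrative.
ALL_CLOCKS = {"cpu_clock", "peripheral_clocks", "external_xtal", "internal_plls"}
DISABLED = {
    PowerMode.NORMAL: set(),
    PowerMode.IDLE: {"cpu_clock"},
    PowerMode.STOP: set(ALL_CLOCKS),
    PowerMode.SLEEP: ALL_CLOCKS | {"internal_logic_power"},
}


def is_active(resource, mode):
    """Return True if the resource keeps running in the given mode."""
    if resource == "wakeup_logic":
        return True  # wake-up logic stays powered in every mode
    return resource not in DISABLED[mode]
```

For instance, `is_active("peripheral_clocks", PowerMode.IDLE)` is true while `is_active("cpu_clock", PowerMode.IDLE)` is false, matching the IDLE description.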
-
Dynamic Voltage Scaling (DVS)
Reduction of Stand-by Power in Leaky Process
By Monitoring Data Bus Congestion
By Monitoring/Guessing Performance Needed, for Specific Application
[Figure: supply voltage V vs. time for a task, without and with DVS. Power gain ∝ V²]
Need to predict task execution time!
-
Dynamic Voltage Scaling (DVS)
Stretch the execution by lowering the supply voltage
Quadratic power saving
No later than the deadline
Processors supporting DVS
Intel XScale
Transmeta Crusoe
DVS Algorithms
Can be implemented as HW or SW
Optimal solution in continuous voltage domain, but not in discrete voltage domain
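The difference between the continuous and discrete voltage domains can be seen in a small selection routine. This is a minimal sketch, not any specific published DVS algorithm: the function name and the frequency/voltage table are made up for illustration.

```python
def pick_operating_point(cycles, deadline_s, levels):
    """Pick the lowest-voltage (frequency_hz, voltage_v) level that meets the deadline.

    levels: iterable of (frequency_hz, voltage_v) pairs; hypothetical values.
    Returns None if even the fastest level misses the deadline.
    """
    required_hz = cycles / deadline_s  # ideal frequency in the continuous domain
    feasible = [lvl for lvl in levels if lvl[0] >= required_hz]
    if not feasible:
        return None
    # With discrete levels the ideal frequency is rounded up to the next level.
    return min(feasible, key=lambda lvl: lvl[1])


LEVELS = [(300e6, 1.0), (400e6, 1.2), (500e6, 1.4), (600e6, 1.6)]  # made-up table
point = pick_operating_point(cycles=9e6, deadline_s=0.025, levels=LEVELS)
print(point)
```

Here the task ideally needs 360 MHz, but the nearest available level is 400 MHz, so the task finishes early at a higher voltage than strictly necessary. That slack is why a solution that is optimal in the continuous domain is no longer optimal with discrete levels.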
-
Voltage Scaling for Low Power
Low Power
Low VDD
Low Speed
Speed Up
Low Vth
$P \propto V_{DD}^{2}$
$I_{ds} \propto (V_{DD} - V_{th})^{1 \sim 2}$
High Leakage
$I_{leakage} \propto e^{-C \times V_{th}}$
Leakage Suppression
-
Low-Leakage Solution Technology
-
VTCMOS & MTCMOS
-
MTCMOS : Reduce Stand-by Power with High Speed
With a high-VTH switch, much lower leakage current flows between Vdd and Vss.
The high-VTH MOSFET should have much lower (>10X) leakage current than a normal-VTH MOSFET.
[Figure: logic without a high-VTH switch vs. with a high-VTH switch (MTCMOS); the normal or low-VTH MOSFET logic sits on a virtual ground above the high-VTH switch]
-
Multi-Threshold CMOS (MTCMOS)
Mobile Applications
Mostly in the idle state
Sub-threshold leakage current
Power Gating
Low VTH transistors for high-performance logic gates
High VTH transistors for low leakage current gates
[Figure: MTCMOS power gating. Logic component (low-Vth MOS) between VDD and VGND; cutoff switch (high-Vth MOS) driven by Sleep Control (SC); VSS. Plot of operating mode and current vs. time]
-
CCS Sizing
The effect of CCS size
As the size decreases, logic performance also decreases.
As the size increases, leakage current and chip area also increase.
Proper sizing is very important.
CCS size should be decided within 2% performance degradation.
Vop = VDD - ΔV
ΔV must be sized within 2% performance degradation.
[Figure: low-Vt logic between VDD and GND with a switch and its control]
-
Energy Management System Open loop
IEM and IEC components work together to predict lowest acceptable processor performance level
Power Controller, PMU and Clock Generator work together to deliver that lowest performance level
-
Energy Management System Closed loop
APC operates in closed loop control mode using HPM to adapt to actual process and temperature
PowerWise Interface provides fast control of EMU and feedback of status for optimum control
-
MPEG video playback comparison
Classical interval-based algorithms (e.g. LongRun) are too conservative: they choose higher performance than necessary.
[Figure: fraction of time at each performance level, LongRun vs. Vertigo, for two MPEG clips; chart legend: 300, 400, 500 and 600 MHz]
Legendary MPEG
Performance level | LongRun | Vertigo
50 % | 0.00 % | 0.08 %
66 % | 3.04 % | 7.78 %
83 % | 17.20 % | 88.06 %
100 % | 79.15 % | 4.07 %
Danse De Cable MPEG
Performance level | LongRun | Vertigo
50 % | 5.74 % | 51.17 %
66 % | 17.04 % | 48.34 %
83 % | 29.50 % | 0.11 %
100 % | 47.72 % | 0.37 %
-
Interactive app: Konqueror
Exactly repeating the run of interactive apps is difficult.
Our methodology: with LongRun in control, estimate what IEM would have done on that same run.
[Figure: fraction of time at each performance level, Konqueror]
Performance level | LongRun | Vertigo
50 % | 10.09 % | 38.49 %
66 % | 10.44 % | 25.56 %
83 % | 5.55 % | 14.75 %
100 % | 73.92 % | 26.65 %
-
Energy Management in Action
2 seconds
100%
83%
66%
50%
Performance
-
DVS Control Sub-system
[Block diagram: IEC with an APB configuration interface, connected to the CPU, CLKGEN and PMU]
DVC: Dynamic Voltage Controller
Voltage vs. Frequency Lookup table
DCG: Dynamic Clock Generator (SoC specific)
DPC: Dynamic Performance Controller
DPM: Dynamic Performance Monitor
DEM: DVS Emulation
Interrupts (SoC specific)
Signals and registers shown: PWRREQ, MAXPERF, cpuclk, CLOCK, DATA, Target and Current Perf. Index, Config.
-
DVS operation (with MAXPERF Signalling)
-
Prototype IEM test chip
ARM926EJ-S core
Multiple power domains
Voltage and frequency scaling of CPU, caches and TCMs
First full DVS silicon with National Semiconductor PowerWise technology
NSC Adaptive Power Controller (APC) implemented in FPGA
Includes DVS emulation mode for comparative tests
TSMC 0.13 µm - CL013G - April Cyber Shuttle
Packaged parts 11 August 2003
Developed by ARM, Synopsys and National Semiconductor using Synopsys EDA tools
-
Conclusions
Along with Process Technology Scaling, Signal Integrity, SoC Integration and System Verification, Low-Power Design is a critical issue.
Low-Power Design needs to be approached at every level, from the system level (including software and algorithms) down to the device/process level.
-
Thank you for your kind attention!
-
IBM Low Power Design using PowerPC
-
Platforms for Information Appliances
IBM PowerPC platforms enable highly integrated, power efficient Information Appliance (IA) chips
PowerPC
Platform
SOC
SOC
Custom IA Chips
Application-Specific IA Chips
uP Cores
405/440
IP Cores
CoreConnectTM
Architecture
ASIC
Tools
Low Power
Optimizations
-
Scalable PowerPC 405 CPU Core
CPU Goals
Expanded operating voltage range (0.9V to 1.95V)
Maintain full software and tools compatibility with existing PowerPC 405
Provide a high performance core capable of high efficiency low power operation
CPU Optimizations
Redesigned custom circuits within CPU that were sensitive to low voltage operation
Re-optimize design and timing for extended voltage range
Verification of equivalence
Instruction Unit
Timers
Debug/Trace
I-cache
D-cache
64-bit Processor Local Bus
I-cache Control
D-cache Control
MMU
Power Mgmt.
Execution Unit
Load / Store Pipe
MAC
Branch Unit
Interrupts
PowerPC 405 Core
GPRs
-
Embedded PowerPC Cores
PowerPC 405
32-bit data, 32-bit address, MMU
Single-issue, 5-stage pipeline: 1.52 DMIPS / MHz
266 to 400 MHz
L1 Cache to 16KB/16KB
Voltage-scalable versions (405LP-1, 405LP-2)
PowerPC 440
32-bit data, 36-bit address, MMU
Dual-issue, 7-stage pipeline: 2.0 DMIPS / MHz
400 to 800 MHz
L1 Cache 32KB/32KB; L2 256 KB; L3
-
Low Power Optimizations
Active Power Reductions
Voltage Scaling
Frequency Scaling
Flexible Clock Distribution
Clock Gating
Hardware Accelerators
IBM low-power SOC designs include a wide range of optimizations to reduce both active and standby power
Standby Power Reductions
Clock Freezing
Hibernation
Cryo Standby
-
Voltage Scaling Benefits
Complementary CMOS scales well over a wide voltage range
Can be used widely over entire chip
Can optimize power/performance (MIPS / W) over a 4X range
Voltage Scaling Challenges
Custom circuits, PLLs, analog, and I/O drivers don't voltage scale easily
Avoiding increases in standby power in low active power circuits (the VTH dilemma)
Reducing operating voltage greatly reduces active power in CMOS
Operating at 1/2 normal Vdd increases delay 2.4-3.2X but reduces power by > 10X
CMOS Ring Oscillator Delay and Power VS VDD
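The power and delay figures quoted for half-Vdd operation can be combined into an energy-per-operation estimate. This is a back-of-the-envelope sketch that assumes energy per operation is simply power times delay for the ring oscillator; the subscripts are introduced here for illustration.

```latex
E = P \cdot t_d
\qquad
\frac{E_{low}}{E_{nom}}
  = \frac{P_{low}}{P_{nom}} \cdot \frac{t_{d,low}}{t_{d,nom}}
  < \frac{1}{10} \times 3.2 = 0.32
```

So even at the slow end of the quoted delay range, energy per operation falls by more than about 3X; at the 2.4X end of the range the bound tightens to 0.24.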
-
IBM Low-Power SOC Designs
Palmtops to Teraflops in a single ISA
Optimized for high-performance handheld applications, e.g., high-end PDA
PowerPC 405LP-1
Joint project of IBM Research and IBM Microelectronics
First silicon Oct. 2001
0.18 µm process
Frequency-scalable, < 66 to 266 MHz
Voltage-scalable, 1.0 to 1.8 V (0.9 to 1.65 V)
Technology evaluation platform
All power and performance data from 405LP-1 systems
PowerPC 405LP-2
0.13 µm process
Scalable to 333 MHz @ 1.5 V (est.)
Optimized for multimedia processing
Well into design
-
405LP-1 System on a Chip
[Block diagram of the 405LP-1 SoC]
Core: PPC405 CPU Core with 16K I-Cache and 16K D-Cache; Scalable Low Power PLL
Buses: Processor Local Bus (PLB), 64-bit; On-chip Peripheral Bus (OPB), 32-bit; PLB-OPB Bridge
Blocks: DMA Controller, LCD Controller, Speech Accel, CODEC INTRFC, RTC, Interrupt Controller, GPIO, SDRAM Controller, RAM/ROM/Peripheral Controller, Code Decompression, PCMCIA/CFII, UART (x2), IIC, Standby Power Management, Passive INTRFC, Sensor, Clock/Power Management, Crypto Accel
Supplies: 3.3V I/O Supply; 1.0V to 1.8V Logic; 1.8V Battery-Backed; 1.0V Internal Reg.
Legend: New Core; Pre-existing Core
-
Reducing Standby Power
Cryo mode uses
Customers/designs comfortable with clock-stop standby
Low-latency periodic sleep/wake with minimal standby power
IP cores with hidden state can cause problems for SW-based save/restore
Other methods under review
Voltage islands and power gating
State-saving latches
-
Standby Power Modes
Cryo mode sequence
Shutdown:
Save CPU core state
Flush caches and TLBs
Clocks stopped
State scanned to internal/external non-volatile storage
Power removed from logic
Suspend:
Monitor system for wake-up condition or RTC timer
Restore:
On Wake indicator
Restore power to logic
State scanned in from non-volatile storage
Restore clocks
Restore CPU state
Standby power modes enable longer battery life and instant on
Mode | System Clock | VDD Logic | State Saved | Restore Time | Power Logic
Freeze Mode | 0 Hz | 1V | All | Observe Wake-up Condition (< 1 ms) | CMOS Leakage at 1V
Hibernation Mode | 0 | 0 | Software State | OS Restore (100s of ms) | ~0
Cryo Mode | 0 | 0 | Registers and Software State | Instant On Scan Restore of State (20 - 200 ms) | ~0
-
Dynamic Power Management
System-Wide power management (PM) during application execution
Examples:
Peripheral PM, including core clock gating
PM at idle (including low-latency sleep modes)
Memory PM
Dynamic voltage and frequency scaling
Energy policy management
DPM is proposed as an architecture for policy-guided dynamic power management.
-
DPM Motivation
Embedded application requirements
Long battery life
System-specific policy requirements
Highly variable system designs
Watch, cell phone, personal server, PDA, tablet
Soft real-time (multimedia) requirements
Task-specific policy requirements
General-purpose systems and applications
No/minimal application software changes for PM
Minimal/variable firmware
PM must be in the OS/applications
-
DPM Motivation
Technology
SOC
CPU + peripheral PM
Complex clocking architectures
Decoupled CPU/bus frequencies
Heterogeneous processor architectures
Example: 405LP-2 - Asynchronous heterogeneous processing in a common voltage/memory domain
New performance and leakage control mechanisms at the circuit level
-
DPM Motivation
Linux
Platform independence desired
Community acceptance required
Simplicity, ease of maintenance
Integration with pre-existing facilities
Linux Device Model
Minimal core kernel changes
5 lines of new code in the core kernel
Scalability to server/SMP systems
-
DPM: An Architecture for Policy-Guided PM
Is:
A generic software architecture for policy-guided dynamic power management proposed by IBM and MontaVista Software
Flexible enough to implement a number of system-specific DVFS and static PM approaches
Available in an embedded Linux distribution for several embedded processors
Is Not:
PowerPC or Linux specific
A DVFS algorithm
Fully implemented yet
-
DPM Overview
[Diagram: DPM sits between software and hardware]
Software: Policy/Power Managers, Power-aware Applications, Operating System, Device Drivers
Hardware: CPU, Memory Controller, Power Supplies, System Clock Generation
Flows shown: "Provide, manage policies"; "Signal operating/task state changes"; "Requirements, power-mgmt. information"
DPM sets operating points, changing power-performance levels
-
Dynamic Voltage and Frequency Scaling
-
Idle Scaling Trace (MPEG4)
Core Voltage
Battery Power
Application | Default | Idle Scaling | Sys. Savings | Core Savings
MPEG4 A/V | 2.76 W | 2.63 W | 4.7 % | 11.4 %
MP3 | 1.42 W | 1.1 W | 22.5 % | 47.8 %
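The system savings column follows directly from the two power readings. A minimal sketch of that arithmetic is below; the function name is illustrative, and the core savings column cannot be reproduced this way because the slide gives no core power figures.

```python
def savings_percent(default_w, scaled_w):
    """Percentage of power saved relative to the default configuration."""
    return 100.0 * (default_w - scaled_w) / default_w


# System (battery) power figures from the idle-scaling results, in watts.
measurements = {"MPEG4 A/V": (2.76, 2.63), "MP3": (1.42, 1.10)}
for app, (default_w, scaled_w) in measurements.items():
    print(f"{app}: {savings_percent(default_w, scaled_w):.1f} % system savings")
```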
-
Load Scaling Trace (MPEG4/spmt)
Core Voltage
Core Voltage
Battery Power
Application | Default | Load Scaling | System Savings
MPEG4 A/V | 2.76 W | 2.54 W | 8.0 %
MP3 | 1.42 W | 1.03 W | 27.7 %
-
Application Scaling Trace
More Performance Required
Working Ahead
E
F
D
VideoThread
Task State
-
AS Results
AS achieved close to an ideal LS result with a simple policy manager and a straightforward modification of the application
Application | No DPM | DPM: Application Scaling | DPM Savings | Ideal Savings
MPEG4 A/V | 2.76 W | 2.46 W | 10.8 % | 10.8 %
-
References
Nowka et al., A 32-bit PowerPC System-on-a-chip With Support for Dynamic Voltage Scaling and Dynamic Frequency Scaling, IEEE Journal of Solid-State Circuits, vol. 37(11), Nov. 2002, pp. 1441-1447.
IBM Austin Research Laboratory (www.research.ibm.com/arl)
Dynamic Power Management for Embedded Systems (Whitepaper), http://www.research.ibm.com/arl/projects/papers/DPM_V1.1.pdf
Linux 2.4 kernel including DPM implementation (Bitkeeper) bk://source.mvista.com/linuxppc_2_4_devel-pm
[Figure: Logic VDD and power vs. time during dynamic voltage and frequency scaling]
Dynamic Voltage Scaling: 1.8V --> 1.0V at up to 1V/100us
Dynamic Frequency Scaling: 266MHz CPU to 66MHz CPU
CPU/memory frequency (MHz): 266/133, then 66/66, then 266/133
Traces: Logic VDD (0V to 2.0V), Total Chip Power, Logic Power, I/O Power (0mW to 600mW)
Uninterrupted operation: Linux 2.3.17 running Dhrystone 2.1 code, 400 loops per cycle
Power consumption for the CPU and logic was reduced by 13X dynamically under the control of the Linux kernel (no PLL relock and no stopping of the application)
[Figure: CMOS inverter with the input switching to '1' or '0', charging and discharging Cload; Vthn marked]