Various Low-Power SoC Design Techniques Chong-Min Kyung KAIST


  • Various Low-Power SoC Design Techniques

    Chong-Min Kyung

    KAIST

  • Contents

    Introduction
    Power Management using Voltage Island Technique
    Energy (Power) Management Approach by ARM
    Low Power Design Example with Samsung AP based on ARM 920T
    IBM Low Power Design using PowerPC
    Conclusions

  • Why Low Power?

    Limited battery capacity (mobile devices)
    Minimal heat dissipation (heat sink, cooler, system size/weight/cost)
    Chip/system reliability
    Save energy: it's limited after all!

  • Power vs. Energy

    Power-Critical Applications:
      Heat dissipation requirement
      Power/ground metal line width
      Power/ground bounce due to IR drop
    Energy-Critical Applications:
      Battery lifetime
      Heat dissipation requirement
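A small calculation makes the power/energy distinction above concrete. This is a sketch with made-up numbers. The task timings, the 3.7 V / 1000 mAh cell and the 500 mW average drain are assumptions, not slide data. It shows that a design drawing less power but running longer can still use more energy. That matters in the energy-critical (battery) case but not in the power-critical (heat, IR drop) case.

```python
# Power vs. energy: a toy comparison. All numbers are illustrative assumptions,
# not data from the slides.

def energy_j(avg_power_w: float, duration_s: float) -> float:
    """Energy is average power times time: E = P * t (joules)."""
    return avg_power_w * duration_s


def battery_life_h(capacity_mwh: float, avg_power_mw: float) -> float:
    """Battery lifetime is set by the average power drawn, i.e. the energy drain rate."""
    return capacity_mwh / avg_power_mw


# Hypothetical task finished two ways:
#   design A: 2.0 W for 1.0 s -> higher power, less energy
#   design B: 1.2 W for 2.0 s -> lower power (easier heat / IR drop), more energy
e_a = energy_j(2.0, 1.0)
e_b = energy_j(1.2, 2.0)

# Hypothetical 3.7 V, 1000 mAh cell = 3700 mWh, drained at 500 mW on average.
life = battery_life_h(3700.0, 500.0)

print(f"A: {e_a:.2f} J, B: {e_b:.2f} J, battery life: {life:.1f} h")
```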

  • Applications for Low Power Technology

    Medical: implantable hearing aid, cardiac pacemaker
    Mobile devices: cellular phone
    Military devices
    Hard-to-access points: space
    Too-many-to-access points: sensors/actuators in a ubiquitous world

  • Power Management using Voltage Island Technique

  • Typical Power Optimization Procedure

    [Flowchart: typical power optimization procedure]
    Inputs: Applications; Technology Files; Standard Cell/Wire; Constraints (Delay, Power, Area, Noise); Switching Activity
    Flow blocks: H/W Description and Synthesis; Initial Layout; Parasitics (Resistance, Capacitance) of interconnects from layout; Functional Partitioning; Cell/Interconnect Delay and Power Modeling; Vdd, Vt, Wg, Wint Optimization; Gate-Level Power Optimization; Power-Optimized Net List; Parameterized Cell/Wire Design; Place/Route and Layout; Customized Layout
    Verification for Min-Power, Delay, Area, Noise: N iterates the optimization; Y yields the Optimized Vdd, Vt, Wg, Wint, followed by Place/Route and Layout

  • Power Challenge

    Source from Bergamaschi

  • Low Power Levers

    Structural Techniques

    Voltage Islands

    Multi-threshold devices

    Multi-oxide devices

    Minimize capacitance by custom design

    Power efficient circuits

    Parallelism in micro-architecture

    Dynamic Techniques

    Clock gating

    Data gating

    Power gating

    Variable frequency

    Variable voltage supply

    Variable device threshold

  • Standby Mode Leakage Suppression

    Disconnect inactive logic from supply in standby mode

    Multi-threshold: a higher-Vt header/footer suppresses logic leakage (gate and sub-threshold)

    Multi-oxide: a thick-oxide header/footer suppresses gate leakage

    Header/footer gate voltage: overdrive increases frequency; underdrive reduces leakage

    Header/footer well bias: forward bias increases frequency; reverse bias reduces leakage

    Voltage Islands

  • Standby Power Reduction Mechanism

    On-chip supervisor manages standby power:
      Clock gating
      Functional clock gating (fine clock control)
      Voltage scaling, shutdown
      SOC latch save/restore
      Timeout and interrupt driven

    [Figure: standby power control block diagram. Blocks: Suspend Ctrl Logic, RTC, Reset Logic, IIC Ctrl, PG, System Clks Freeze, I/O Freeze, SoC Logic with LSSD Latches, Scan Ctrl Logic, Scan Chains, Serial NVRAM (Clk, Data), DC/DC Supplies (Select, Shutdown). Signals: Wake, Reset, Irq, Clk. Domains: Scalable VDD Domain (1.0 V to 1.8 V), 3.3 V I/O, Battery-Backed Domain.]
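The escalation on this slide is timeout- and interrupt-driven: clock gating first, then voltage shutdown with latch save/restore. It can be sketched as a small state machine. This is a hypothetical illustration. The `StandbySupervisor` class, its tick thresholds and the dict standing in for latch contents are assumptions, not the actual supervisor design.

```python
from enum import Enum, auto


class Mode(Enum):
    ACTIVE = auto()
    CLOCK_GATED = auto()  # clocks frozen, supply still on, state kept in place
    SHUTDOWN = auto()     # scalable domain powered off; state saved beforehand


class StandbySupervisor:
    """Toy timeout/interrupt-driven standby supervisor (hypothetical thresholds)."""

    def __init__(self, gate_after: int = 10, shutdown_after: int = 100):
        self.gate_after = gate_after          # idle ticks before clock gating
        self.shutdown_after = shutdown_after  # idle ticks before power-off
        self.mode = Mode.ACTIVE
        self.idle_ticks = 0
        self.live_state = {"pc": 0}           # stands in for SoC latch contents
        self.saved_state = None               # stands in for battery-backed storage

    def work(self) -> None:
        """Activity in ACTIVE mode restarts the idle timer."""
        if self.mode is not Mode.ACTIVE:
            raise RuntimeError("wake the SoC with interrupt() before doing work")
        self.idle_ticks = 0

    def tick(self) -> Mode:
        """One idle-timer tick (the timer lives in the always-on domain)."""
        if self.mode is Mode.SHUTDOWN:
            return self.mode                  # already in the deepest state
        self.idle_ticks += 1
        if self.mode is Mode.ACTIVE and self.idle_ticks >= self.gate_after:
            self.mode = Mode.CLOCK_GATED
        elif self.mode is Mode.CLOCK_GATED and self.idle_ticks >= self.shutdown_after:
            self.saved_state = dict(self.live_state)  # latch save before power-off
            self.live_state = None                    # contents lost when power is cut
            self.mode = Mode.SHUTDOWN
        return self.mode

    def interrupt(self) -> Mode:
        """Wake-up event (IRQ, RTC alarm): restore saved state, resume ACTIVE."""
        if self.mode is Mode.SHUTDOWN:
            self.live_state = self.saved_state        # latch restore after power-up
            self.saved_state = None
        self.mode = Mode.ACTIVE
        self.idle_ticks = 0
        return self.mode


sup = StandbySupervisor(gate_after=2, shutdown_after=5)
sup.live_state = {"pc": 42}
trace = [sup.tick().name for _ in range(5)]
print(trace)
```

Escalating in stages keeps short idle gaps cheap to exit, since a clock-gated block needs no state restore. The expensive save/restore is paid only after a long timeout.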

  • Voltage Island Concept

    Trade off power for delay by running functional blocks at different voltages

    Can use a mix of low and high Vt to balance performance and leakage

    Switch off inactive blocks to reduce leakage power

    Requires IP standards for power management, clock gating, etc.

    [Chart: delay vs. voltage; delay (ps) vs. Vdd from 0.7 V to 1.3 V for standard-Vt and low-Vt devices]

    Example: a telecom ASIC with 1.0 V / 1.2 V islands saved 16% active power and 50% standby power

    [Figure: power management unit and switches connecting IP1 (logic) and IP2 (low-VT logic) to supplies Vdd0, Vdd1 and Vdd2]

    Source from Bergamaschi
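The sketch below shows why voltage islands pay off by applying dynamic power ∝ (activity × capacitance) × Vdd² × f to each block. The block names, capacitances and frequencies are hypothetical. Only the 1.0 V / 1.2 V supply pair comes from the telecom ASIC example, and the resulting percentage is not meant to reproduce its reported 16% saving.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    switched_cap_f: float  # activity-weighted switched capacitance a*C in farads (hypothetical)
    vdd: float             # island supply voltage in volts
    freq_hz: float         # block clock frequency in hertz


def dynamic_power_w(blocks: list[Block]) -> float:
    """Sum of a*C * Vdd^2 * f over all blocks."""
    return sum(b.switched_cap_f * b.vdd ** 2 * b.freq_hz for b in blocks)


# Hypothetical SoC: a timing-critical core plus a non-critical peripheral block.
single_supply = [Block("core", 1e-9, 1.2, 200e6), Block("periph", 1e-9, 1.2, 100e6)]
# Same design with the peripheral moved to a 1.0 V island (assumed to still
# meet timing at 100 MHz there).
two_islands = [Block("core", 1e-9, 1.2, 200e6), Block("periph", 1e-9, 1.0, 100e6)]

saving = 1 - dynamic_power_w(two_islands) / dynamic_power_w(single_supply)
print(f"{dynamic_power_w(single_supply):.3f} W -> {dynamic_power_w(two_islands):.3f} W ({saving:.1%} saved)")
```

Only the block moved to the lower island gains, and it gains by (1.0/1.2)², about 0.69 of its former dynamic power. The chip-level saving therefore depends on how much switched capacitance can tolerate the lower voltage.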

  • Power Management Unit

    [Block diagram. Blocks: Bus Interfaces; Reconfigurable Register Units; Power Management State Machine; Timer / Counter; Control; Performance Unit; Clock Control Unit; Monitor Unit; Power Control Unit; DC/DC Converter; Well-Bias Generator; Clock Generator; Clock & Power-Gating; Device Performance Monitor; Thermal Monitor; IP Core Interfaces]

  • Busses with Different Voltages

    One clock and one signaling voltage

    Some approaches:
      Temporarily scaling V and F for communication
      Separating different voltages with bridges

    [Figure: Hot Bus, Cold Bus and Cool Bus connected through bridges]

  • Power Management

    [Figure: chip floorplan with multiple voltage islands. Memory arrays at Vdd 4 (high-Vt device arrays) and at Vdd 3 (low-Vt device arrays), each labeled "optimized for low active power"; microcontroller and DSP at Vdd 2; ROMs at Vdd 1; monitor logic at Vdd 4; analog at Vdd 5; RLM 1, RLM 2, RLM 3; I/Os, VReg, Gnd]

    Independently controlled domain power switches
    Multiple on-chip voltage islands
    On-chip voltage regulators

  • Functional Partitioning

    Identifying functional components with similar inactive periods
    Assigning functional components to possible chip-level power sources capable of providing the required voltage level
    Identifying the optimal grouping of components, based upon power sequencing (affects static power) and operating voltage (affects active power), that minimizes chip power within the limits (such as peak power) of the SoC
    Identifying or creating, and connecting, logic signals that will be used to control power-sequencing circuitry or clock gates
    Connecting alternate voltage sources to latches or arrays used to save state across power sequencing
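The first three partitioning steps can be sketched as a greedy clustering: find components with similar inactive periods, respect their required voltage, and group them. This is one possible heuristic, not the method behind the slide. The assumptions are:
- idle periods are modeled as sets of discrete time slots;
- similarity is the Jaccard index;
- the 0.5 threshold and the component data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    vdd: float                  # supply voltage the component needs (volts)
    idle_slots: frozenset[int]  # time slots in which the component is inactive


@dataclass
class Island:
    vdd: float
    members: list[str] = field(default_factory=list)
    idle_slots: frozenset[int] = frozenset()  # slots where ALL members are idle


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Similarity of two idle schedules (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def partition(components: list[Component], min_similarity: float = 0.5) -> list[Island]:
    """Greedily group components that need the same voltage and idle at similar times."""
    islands: list[Island] = []
    for c in components:
        best, best_sim = None, -1.0
        for isl in islands:
            if isl.vdd != c.vdd:
                continue  # one island = one supply voltage
            sim = jaccard(isl.idle_slots, c.idle_slots)
            if sim >= min_similarity and sim > best_sim:
                best, best_sim = isl, sim
        if best is None:
            islands.append(Island(c.vdd, [c.name], c.idle_slots))
        else:
            best.members.append(c.name)
            # The island can only be powered down when every member is idle.
            best.idle_slots = best.idle_slots & c.idle_slots
    return islands


comps = [
    Component("cpu", 1.2, frozenset({5, 6})),
    Component("dsp", 1.2, frozenset({4, 5, 6})),
    Component("usb", 1.0, frozenset({1, 2, 3, 4, 5, 6})),
    Component("uart", 1.0, frozenset({1, 2, 3, 4, 5})),
    Component("mpeg", 1.2, frozenset({0, 1, 2, 3})),
]
islands = partition(comps)
for isl in islands:
    print(isl.vdd, isl.members, sorted(isl.idle_slots))
```

Intersecting the idle sets reflects the trade-off named on the slide. Every component added to an island can shrink the time the whole island may be power-sequenced off, so grouping dissimilar components costs static power.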

  • Controlling VDD and VTH for low power

    Software-hardware cooperation

    Technology-circuit cooperation

    MTCMOS: Multi-Threshold CMOS
    VTCMOS: Variable Threshold CMOS
    Multiple: spatial assignment; Variable: temporal assignment

                      Active          Stand-by
    Multiple VTH      Dual-VTH        MTCMOS
    Variable VTH      VTH hopping     VTCMOS
    Multiple VDD      Dual-VDD        Boosted gate MOS
    Variable VDD      VDD hopping     -

  • Dynamic power reduction

    [Figure: a controller (software or hardware) takes the required speed and sets the clock and VDD of the processor]

    If you don't need to hustle, relax and save power

    Through software-hardware cooperation: OS and application programming

  • Voltage Scaling Mechanism

    Four power domains
    On-chip supervisor for SOC voltage supplies
    Level shifting and latching circuits at domain interfaces

    [Figure: four power domains: Battery-Backed Domain; Regulated 1.0 V PLL Supply Domain; Voltage-Scalable 1.0 V to 1.8 V Logic Supply Domain; Constant 3.3 V I/O Domain. Blocks: Suspend Ctrl Logic, RTC Logic, Linear Regulator, CPU Core, Caches, I/O Intf Logic, Memory Intf, Accelerators, Drivers, Recvrs, DC/DC Supplies fed by the battery (Select, Shutdown; 3.3 V, 1.0 V to 1.8 V, persistent 1.8 V)]

  • Dynamic Voltage/Frequency Scaling

    Frequency changed and Vdd dropped from 1.8 V to 1.0 V

    PLL locked at 533 MHz, with the CPU clock switched from 266 MHz to 66 MHz and back to 266 MHz

    The CPU continues to execute the Dhrystone benchmark throughout
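Here is a back-of-the-envelope check of what the 1.8 V / 266 MHz to 1.0 V / 66 MHz switch buys. It assumes that dynamic power dominates, that switched capacitance is unchanged, and that the 1.0 V point is the one used at 66 MHz.

```python
def relative_dynamic_power(f_new, v_new, f_ref, v_ref):
    """P_dyn ~ C * V^2 * f, so the ratio needs no absolute capacitance."""
    return (f_new / f_ref) * (v_new / v_ref) ** 2


def relative_energy_per_cycle(v_new, v_ref):
    """Energy per clock cycle ~ C * V^2 and does not depend on f."""
    return (v_new / v_ref) ** 2


p_ratio = relative_dynamic_power(66e6, 1.0, 266e6, 1.8)
e_ratio = relative_energy_per_cycle(1.0, 1.8)
print(f"power: {p_ratio:.3f}x, energy per cycle: {e_ratio:.3f}x")
```

Under these assumptions power falls to about 7.7% of its original value, and energy per cycle falls to about 31%. The frequency cut alone saves power but not energy per operation; the voltage cut saves both.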

  • Low Leakage Cells Standby Power Reduction

    Dual-Vt Storage Cells

    Low Vt for high performance

    High Vt for low leakage

    Gated Vdd and DRG

    Power Switch

    Subthreshold leakage current dominates

  • Energy (Power) Management Approach by ARM

  • Need for Energy Management

    Today's mobile consumers want longer battery life and smaller, lighter products

    Manufacturers are adding new features and applications to add product appeal: media players (audio, video), gaming, video capture

    Increasing processing power requirements and longer battery life are conflicting requirements

    Battery technology alone offers only incremental improvement over the next several years

  • Higher performance, higher power

    [Chart: Dhrystone MIPS vs. power consumption (mW) for ARM7, ARM9 and ARM10/11 CPUs in 0.18 µm and 0.13 µm processes]

    CPU characterisation (CPU data, 21 November 2002, ARM Confidential):

    CPU         MIPS/MHz     Default cache  Variable    TCMs  Memory      Bus interface  Thumb  DSP  Jazelle
                (RVCT 2.0)   (inst/data)    cache             controller
    7TDMI       0.91         -              No          No    -           Yes**          Yes    -    -
    7TDMI-S     0.91         -              No          No    -           Yes**          Yes    -    -
    7EJ-S       1.00         -              No          No    -           Yes**          Yes    Yes  Yes
    720T        0.91         8k unified     No          No    MMU         AHB            Yes    -    -
    920T        1.17         16k/16k        No          No    MMU         ASB**          Yes    -    -
    922T        1.17         8k/8k          No          No    MMU         ASB**          Yes    -    -
    940T        1.17         4k/4k          No          No    MPU         ASB**          Yes    -    -
    946E-S      1.17         8k/8k          Yes         Yes   MPU         AHB            Yes    Yes  -
    966E-S      1.17         16k/16k TCM    Yes (TCMs)  Yes   -           AHB            Yes    Yes  -
    926EJ-S     1.06         8k/8k          Yes         Yes   MMU         2x AHB         Yes    Yes  Yes
    926EJ-S     1.06         16k/16k        Yes         Yes   MMU         2x AHB         Yes    Yes  Yes
    1020E*      1.24         32k/32k        No          No    MMU         2x AHB         Yes    Yes  -
    1022E*      1.24         16k/16k        No          No    MMU         2x AHB         Yes    Yes  -
    1026EJ-S*   1.35         16k/16k        Yes         Yes   MMU/MPU     2x AHB         Yes    Yes  Yes
    1026EJ-S*   1.35         8k/8k          Yes         Yes   MMU/MPU     2x AHB         Yes    Yes  Yes
    1136J-S*    1.18         32k/32k        Yes         Yes   MMU         5x AHB         Yes    Yes  Yes
    1136JF-S*   1.18         32k/32k        Yes         Yes   MMU         5x AHB         Yes    Yes  Yes

  • Layers of power optimizations

    Software (OS, applications)

    System Architecture

    Micro-architecture

    Circuits

    Ambient environment

    Si conditions

    Power delivery

    It is important to optimize the design at each level
    ARM's partners have widely varying design-time, technology, legacy and cost constraints
    IEM (ARM Intelligent Energy Manager): current focus on the top two layers
      Widely applicable dynamic power optimizations
      Optimize for the requirements of the specific workload

  • Conventional Power Management

    STANDBY: off, but with state retained and clocks stopped
    IDLE: a lower-power mode with a slow clock running
    ON: fully powered up at maximum clock frequency

    Conventional power management schemes manage the transitions between defined power states

    Despite the changing software workload, the system runs at maximum performance while there is any work to be done

  • Optimizing for utilization characteristics

    Conventional power management optimizes power consumption when there is nothing to do (sleep modes)
    IEM optimizes power when work is being done
    Only run fast enough to meet deadlines!
    Running fast and then idling wastes power
    The active-mode and sleep-mode techniques are orthogonal

    [Figure: utilization (0% to 100%) and energy used, without and with dynamic voltage scaling]
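A simple model can test the claim that running fast and idling wastes power. The sketch assumes a linear voltage-frequency curve floored at a minimum voltage and a fixed idle power. It ignores leakage during the busy period, and all parameters are illustrative.

```python
def vdd_for_freq(f, f_max, v_max, v_min):
    """Hypothetical V-f curve: supply scales linearly with frequency, floored at v_min."""
    return max(v_min, v_max * f / f_max)


def energy_race_to_idle(cycles, deadline_s, c_eff, f_max, v_max, p_idle_w):
    """Run flat out at (f_max, v_max), then idle until the deadline."""
    t_busy = cycles / f_max
    if t_busy > deadline_s:
        raise ValueError("deadline cannot be met")
    return c_eff * v_max ** 2 * cycles + p_idle_w * (deadline_s - t_busy)


def energy_just_in_time(cycles, deadline_s, c_eff, f_max, v_max, v_min):
    """Run just fast enough to finish at the deadline, with the voltage lowered to match."""
    f = cycles / deadline_s
    if f > f_max:
        raise ValueError("deadline cannot be met")
    v = vdd_for_freq(f, f_max, v_max, v_min)
    return c_eff * v ** 2 * cycles


# Illustrative numbers: 100 M cycles due in 1 s on a 200 MHz / 1.8 V part.
race = energy_race_to_idle(100e6, 1.0, 1e-9, 200e6, 1.8, p_idle_w=0.01)
jit = energy_just_in_time(100e6, 1.0, 1e-9, 200e6, 1.8, v_min=1.0)
print(f"race-to-idle: {race:.3f} J, just-in-time: {jit:.3f} J")
```

With these numbers the just-in-time schedule uses 0.1 J versus 0.329 J for race-to-idle. If leakage dominated at low voltage the balance could shift, which is why sleep-mode techniques remain necessary alongside active-mode scaling.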

  • Meeting the Performance Requirement

    Effective Energy Management requires:

    Automatic Performance Prediction technology

    Determining the lowest performance level that will get the software workload done just in time

    Performance Scaling technology

    Delivering just enough performance to meet the current requirement
    Responding rapidly to changing performance levels

  • Energy Management Control Components

    Software component:
      To automatically predict future software workloads by interacting with instrumented operating systems and application software
      To determine the software deadlines
      To balance workload and deadlines with performance
    Hardware component:
      To accurately measure the actual system performance
      To independently manage the transitions of hardware scaling blocks, e.g., clock generators and power controllers
    Together these components determine and manage the lowest performance level that gets the work done
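Once the workload up to the next deadline has been predicted, choosing the lowest sufficient performance level is a small search. The sketch below is hypothetical. The exponentially weighted predictor and the function names are assumptions, not IEM's actual prediction algorithm. Only the 50/66/83/100% levels come from this deck.

```python
import bisect

# Performance levels as a fraction of maximum (the 50/66/83/100% levels in this deck).
LEVELS = (0.50, 0.66, 0.83, 1.00)


def predict_work(history_s: list[float], weight: float = 0.5) -> float:
    """Hypothetical predictor: exponentially weighted average of past work,
    each sample being the run time the work would need at 100% performance."""
    estimate = history_s[0]
    for sample in history_s[1:]:
        estimate = weight * sample + (1 - weight) * estimate
    return estimate


def lowest_sufficient_level(work_at_full_speed_s: float, time_to_deadline_s: float,
                            levels: tuple[float, ...] = LEVELS) -> float:
    """Smallest level that still finishes the predicted work by the deadline.
    Falls back to the highest level when the deadline cannot be met."""
    if time_to_deadline_s <= 0:
        return levels[-1]
    required = work_at_full_speed_s / time_to_deadline_s
    i = bisect.bisect_left(levels, required)
    return levels[i] if i < len(levels) else levels[-1]


# E.g. recent frames needed 0.2 s and 0.4 s at full speed; the next deadline is 0.5 s away.
level = lowest_sufficient_level(predict_work([0.2, 0.4]), 0.5)
print(level)
```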

  • Adaptive Voltage Scaling (AVS)

    AVS is a closed-loop control mechanism
    Feedback from the PMU indicates the earliest opportunity to change processor frequency, based on the voltage levels being output to the SoC
    The APC (Adaptive Power Controller) monitors the difference between the requested performance level and the actual level achieved
    Taking into account variations due to differences in process technology and ambient temperature, the system dynamically changes the voltage applied
    Either the lowest energy consumption is achieved, or a specified performance level can be met
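A toy controller can illustrate the closed-loop behavior of AVS. Everything here is an assumption for illustration: the linear frequency-vs-overdrive model stands in for the on-chip performance monitor, and the 25 mV step and 2% margin are arbitrary. The sketch shows the key property: two chips from different process corners settle at different voltages for the same target frequency.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """Hypothetical silicon: achievable clock grows linearly with overdrive (V - vth).
    A 'slow' process corner has a lower k or a higher vth."""
    k_hz_per_v: float
    vth: float

    def max_freq_hz(self, vdd: float) -> float:
        # Stands in for the on-chip performance monitor reading.
        return max(0.0, self.k_hz_per_v * (vdd - self.vth))


def avs_settle_mv(chip, target_hz, v_start_mv=1800, v_min_mv=900, v_max_mv=1800,
                  step_mv=25, margin=0.02, max_iters=1000):
    """Closed-loop AVS sketch: step Vdd down while the monitor still shows enough
    headroom, step it up when the target frequency is not met."""
    required = target_hz * (1 + margin)
    v = v_start_mv
    for _ in range(max_iters):
        if chip.max_freq_hz(v / 1000) < required and v < v_max_mv:
            v = min(v_max_mv, v + step_mv)        # too slow: raise the supply
        elif v - step_mv >= v_min_mv and chip.max_freq_hz((v - step_mv) / 1000) >= required:
            v -= step_mv                          # headroom left: lower the supply
        else:
            break                                 # settled
    return v


fast = Chip(k_hz_per_v=500e6, vth=0.40)
slow = Chip(k_hz_per_v=400e6, vth=0.45)
print(avs_settle_mv(fast, 300e6), avs_settle_mv(slow, 300e6))
```

An open-loop table would have to use the slow chip's voltage for every part. The closed loop lets the fast part run 200 mV lower in this example.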

  • Low Power Design Example with Samsung AP based on ARM 920T

  • Limited Battery Improvement

    Power Increase vs. Battery Improvement

    Year                            2001   2004   2007   2010   2013   2016
    Feature size (nm)               130    90     65     45     32     22
    Dynamic power reduction (X)     0      1.5    2.5    4.0    7.0    20
    Stand-by power reduction (X)    2      6      15     30     150    800

    [ITRS 2001]

    [Chart: volumetric energy density (Wh/L) vs. gravimetric energy density (Wh/kg) for Ni-MH, Li-Ion/Polymer and fuel cell]

    Cellular phone: talk time 2 to 4 hours, standby about 1 week
    Cellular phone: talk time about 12 hours, standby about 1 month

    Only a 4 to 5X improvement in battery lifetime!

  • Problem Statement

    Power Analysis on CMOS Inverter

  • Problem Statement

    Dynamic Power

    Average Short Circuit Current

    Sub-threshold Leakage Current
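The inverter power components named above are usually written in the standard textbook forms below. The slide's own equations did not survive extraction, so these are supplied as the conventional expressions. Here a is the switching activity factor, C_L the load capacitance, n the subthreshold slope factor and V_T the thermal voltage. The last line shows where the exponential dependence of leakage on V_th, used later in this deck, comes from.

```latex
\begin{aligned}
P_{total} &= \underbrace{a\, C_L V_{DD}^{2} f}_{\text{dynamic}}
           + \underbrace{I_{sc}\, V_{DD}}_{\text{short circuit}}
           + \underbrace{I_{leak}\, V_{DD}}_{\text{leakage}} \\
I_{sub} &\approx I_0\, e^{(V_{GS}-V_{th})/(n V_T)} \left(1 - e^{-V_{DS}/V_T}\right),
  \qquad V_T = \frac{kT}{q} \\
I_{sub}\big|_{V_{GS}=0,\; V_{DS} \gg V_T} &\approx I_0\, e^{-V_{th}/(n V_T)} = I_0\, e^{-C \cdot V_{th}},
  \qquad C = \frac{1}{n V_T}
\end{aligned}
```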

  • Problem Statement

    Domination of Leakage Current

  • Active and Leakage Power with CMOS Scaling

    As CMOS scales down, the following standby leakage currents rise rapidly:
      Source-to-drain leakage (diffusion + tunneling) as Lg scales down
      Gate leakage current (tunneling) as Tox scales down
      Body-to-drain leakage current (tunneling) as channel doping scales up

  • Two cases of Leakage Mechanism

  • Gate Leakage Current Reduction with High-K Gate Dielectric

  • Gate Leakage Current Reduction with High-K Gate Dielectric

    As Tox scales down, gate leakage current increases exponentially, because tunneling probability rises exponentially as the physical tunneling distance shrinks.
    A physically thicker gate dielectric gives lower leakage current, but also lower oxide capacitance, which reduces on-current.
    A high-k (high dielectric constant) material achieves both a thicker physical layer and a higher oxide capacitance.
    With a high-k gate dielectric, gate leakage current can be several orders of magnitude lower at similar oxide capacitance.
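The high-k argument rests on the parallel-plate capacitance C_ox = k·ε0/t. The sketch computes how thick a high-k film can be while matching the capacitance of a thin SiO2 layer. k = 20 and the 1.2 nm target are assumed example values, not figures from the slide.

```python
EPS0 = 8.854e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9       # relative permittivity of SiO2


def oxide_cap_f_per_m2(thickness_nm: float, k: float) -> float:
    """Parallel-plate gate capacitance per unit area: C_ox = k * eps0 / t."""
    return k * EPS0 / (thickness_nm * 1e-9)


def high_k_thickness_nm(eot_nm: float, k: float) -> float:
    """Physical thickness of a high-k film with the same C_ox as eot_nm of SiO2
    (the equivalent oxide thickness, EOT): t_phys = EOT * k / 3.9."""
    return eot_nm * k / K_SIO2


# Assumed example: 1.2 nm SiO2-equivalent target, high-k film with k = 20.
t_phys = high_k_thickness_nm(1.2, 20.0)
print(f"{t_phys:.2f} nm physical for the same capacitance as 1.2 nm SiO2")
```

The roughly fivefold thicker film keeps the same on-current drive. Because tunneling falls off exponentially with physical distance, it also cuts gate leakage by orders of magnitude.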

  • Power Saving vs. Abstraction Layers

    System/algorithm/architecture levels have a large potential!

    [Figure: power saving vs. abstraction layer, plotted against design time]

  • System Level Consideration for Low Power Design

    Mobile device behavior over time: operation time is less than 10%

    Various power modes are needed in the system

  • Power Management : Example

    General Clock Gating

    Controls the individual clock source for each IP block by switching the corresponding clock-source enable bit on or off

    IDLE

    Turn off the clock source to the CPU

    STOP

    Turn off all of the clock sources, including the external X-tal and internal PLLs

    SLEEP

    Turn off all of the clock sources and also the power supply for the internal logic, except for the wake-up logic circuitry
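The four modes above differ in which clocks and supplies stay on. Below is a hypothetical sketch of a mode table plus a helper that picks the deepest usable mode. The `NORMAL` name for the fully-on mode and the selection rule are assumptions. The on/off pattern of IDLE, STOP and SLEEP follows the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    cpu_clock: bool               # CPU clock running
    ip_clocks: bool               # peripheral (IP) clock sources available
    pll_and_xtal: bool            # external X-tal and internal PLLs running
    internal_logic_powered: bool  # internal logic supply on (state retained)


# "NORMAL" is an assumed name for the fully-on mode; the other rows follow the slide.
MODES = {
    "NORMAL": PowerMode(True, True, True, True),
    "IDLE": PowerMode(False, True, True, True),      # only the CPU clock is off
    "STOP": PowerMode(False, False, False, True),    # all clocks, X-tal and PLLs off
    "SLEEP": PowerMode(False, False, False, False),  # internal logic unpowered; wake-up logic stays on
}


def deepest_mode(cpu_busy: bool, peripherals_busy: bool, state_saved: bool) -> str:
    """Pick the deepest mode compatible with what is still running.

    state_saved: True if software has saved whatever SLEEP would lose.
    """
    for name in ("SLEEP", "STOP", "IDLE", "NORMAL"):  # deepest first
        m = MODES[name]
        if cpu_busy and not m.cpu_clock:
            continue
        if peripherals_busy and not m.ip_clocks:
            continue
        if not m.internal_logic_powered and not state_saved:
            continue
        return name
    return "NORMAL"


print(deepest_mode(cpu_busy=False, peripherals_busy=True, state_saved=True))
```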

  • Dynamic Voltage Scaling (DVS)

    Reduction of standby power in a leaky process
    By monitoring data bus congestion
    By monitoring/guessing the performance needed for a specific application

    [Figure: supply voltage V vs. time for tasks, without and with DVS; power gain ∝ V²]

    DVS needs to predict the task execution time!

  • Dynamic Voltage Scaling (DVS)

    Stretch the execution by lowering the supply voltage
      Quadratic power saving
      No later than the deadline
    Processors supporting DVS: Intel XScale, Transmeta Crusoe
    DVS algorithms:
      Can be implemented in HW or SW
      Optimal solution in the continuous voltage domain, but not in the discrete voltage domain
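With only a few discrete operating points, the ideal frequency (cycles / deadline) usually falls between two levels. A commonly used approach, assumed here rather than taken from the slide, runs part of the work at each of the two bracketing levels so the task finishes exactly at the deadline. For convex energy-per-cycle curves this split beats simply rounding up to the next level. The operating points below are hypothetical.

```python
def two_level_schedule(cycles, deadline_s, levels):
    """Split `cycles` between the two discrete levels bracketing the ideal frequency.

    levels: iterable of (freq_hz, vdd) operating points (hypothetical values).
    Returns a list of (freq_hz, vdd, cycles_at_that_level).
    """
    levels = sorted(levels)
    f_ideal = cycles / deadline_s
    if f_ideal > levels[-1][0]:
        raise ValueError("deadline infeasible even at the highest level")
    if f_ideal <= levels[0][0]:
        f, v = levels[0]
        return [(f, v, cycles)]  # slowest level already meets the deadline
    for (f_lo, v_lo), (f_hi, v_hi) in zip(levels, levels[1:]):
        if f_lo <= f_ideal <= f_hi:
            # Solve n_hi / f_hi + (cycles - n_hi) / f_lo = deadline for n_hi.
            n_hi = (cycles / f_lo - deadline_s) / (1 / f_lo - 1 / f_hi)
            return [(f_lo, v_lo, cycles - n_hi), (f_hi, v_hi, n_hi)]
    raise AssertionError("unreachable for sorted levels")


def energy_j(schedule, c_eff):
    """Dynamic energy: C_eff * V^2 per cycle, summed over the schedule."""
    return sum(c_eff * v ** 2 * n for _, v, n in schedule)


LEVELS = [(100e6, 1.0), (200e6, 1.3), (300e6, 1.8)]
plan = two_level_schedule(150e6, 1.0, LEVELS)
single = [(200e6, 1.3, 150e6)]  # round up to the next level and idle afterwards
print(plan, energy_j(plan, 1e-9), energy_j(single, 1e-9))
```

Here the task needs 150 MHz, which no level offers. Running half the time at 100 MHz and half at 200 MHz meets the deadline exactly and uses 0.219 J, against 0.2535 J for rounding up to 200 MHz.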

  • Voltage Scaling for Low Power

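    Low power requires low VDD: $P \propto V_{DD}^2$

    Low VDD means low speed: $I_{ds} \propto (V_{DD} - V_{th})^{\alpha}$, with $\alpha = 1 \sim 2$

    Speeding up again requires low Vth: $I_{ds} \propto (V_{DD} - V_{th})^{\alpha}$

    Low Vth means high leakage: $I_{leakage} \propto e^{-C \cdot V_{th}}$

    Hence leakage suppression is needed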
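The chain of trade-offs above can be explored numerically. This sketch assumes:
- the alpha-power law for delay;
- C·Vdd² switching energy;
- leakage ∝ e^(-Vth/(n·kT/q)), so the slide's constant C equals 1/(n·kT/q).

Alpha, n·kT/q and the leakage prefactor are illustrative values. The sketch grid-searches Vdd and Vth for minimum energy per cycle while keeping the delay no worse than a nominal 1.2 V / 0.4 V design.

```python
import math

ALPHA = 1.3    # alpha-power-law exponent (assumed; the slide gives a range of 1 to 2)
N_VT = 0.039   # n * kT/q in volts (assumed: n ~ 1.5 at room temperature)
I0 = 1e3       # normalized leakage prefactor (arbitrary illustrative value)
NOMINAL = (1.2, 0.4)  # reference (Vdd, Vth) whose speed must be preserved


def delay(vdd: float, vth: float) -> float:
    """Normalized gate delay, alpha-power law: t ~ Vdd / (Vdd - Vth)^alpha."""
    return vdd / (vdd - vth) ** ALPHA


def dynamic_energy(vdd: float) -> float:
    """Switching energy per cycle with normalized C = 1: E ~ C * Vdd^2."""
    return vdd ** 2


def leakage_energy(vdd: float, vth: float) -> float:
    """Leakage energy per cycle: Vdd * I_leak * cycle time, I_leak ~ exp(-Vth / (n kT/q))."""
    return vdd * I0 * math.exp(-vth / N_VT) * delay(vdd, vth)


def total_energy(vdd: float, vth: float) -> float:
    return dynamic_energy(vdd) + leakage_energy(vdd, vth)


def optimize(target_delay: float):
    """Grid search (10 mV steps) for minimum energy subject to delay <= target."""
    best = None
    for i in range(60, 141):          # Vdd from 0.60 V to 1.40 V
        for j in range(10, 61):       # Vth from 0.10 V to 0.60 V
            vdd, vth = i / 100, j / 100
            if vdd - vth < 0.1 or delay(vdd, vth) > target_delay:
                continue
            e = total_energy(vdd, vth)
            if best is None or e < best[0]:
                best = (e, vdd, vth)
    return best


e_opt, vdd_opt, vth_opt = optimize(delay(*NOMINAL))
print(f"nominal E = {total_energy(*NOMINAL):.3f}, best E = {e_opt:.3f} "
      f"at Vdd = {vdd_opt} V, Vth = {vth_opt} V")
```

Lowering Vdd saves quadratically, but each step must be paid for with a lower Vth to hold speed, and leakage grows exponentially as Vth falls. The optimum therefore sits where these two effects balance rather than at the lowest Vdd.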

  • Low-Leakage Solution Technology

  • VTCMOS & MTCMOS

  • MTCMOS : Reduce Stand-by Power with High Speed

    With a high-VTH switch, much lower leakage current flows between Vdd and Vss. The high-VTH MOSFET should have much lower (>10X) leakage current than a normal-VTH MOSFET.

    [Figure: normal- or low-VTH logic between Vdd and Vss, without a high-VTH switch and with a high-VTH switch (MTCMOS) connecting the virtual ground to Vss]

  • Multi-Threshold CMOS (MTCMOS)

    Mobile applications: mostly in the idle state
    Sub-threshold leakage current
    Power gating
      Low-VTH transistors for high-performance logic gates
      High-VTH transistors for low-leakage-current gates

    [Figure: low-Vth logic component between VDD and a virtual ground (VGND), with a high-Vth cutoff switch to VSS driven by Sleep Control (SC); current vs. time in operating mode]

  • CCS Sizing

    The effect of CCS size:
      As the size decreases, logic performance also decreases.
      As the size increases, leakage current and chip area also increase.
    Proper sizing is very important.
    CCS size should be decided within 2% performance degradation.

    $V_{op} = V_{DD} - \Delta V$

    $\Delta V$ must be kept within 2% performance degradation.

    [Figure: low-Vt logic between VDD and GND, with a switch and its control input]
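The 2% rule above can be turned into a sizing calculation in two steps. First, find the largest voltage drop ΔV across the cutoff switch that keeps the slowdown within 2%. Then size the switch so that the peak current produces no more than that drop. The sketch assumes the alpha-power delay model and made-up values for supply, Vt, peak current and switch on-resistance per µm of width.

```python
ALPHA = 1.3  # alpha-power-law exponent (assumed value)


def delay(v: float, vth: float) -> float:
    """Normalized logic delay from the alpha-power law: t ~ V / (V - Vth)^alpha."""
    return v / (v - vth) ** ALPHA


def max_rail_drop(vdd: float, vth: float, max_slowdown: float = 0.02, tol: float = 1e-9) -> float:
    """Largest drop dV across the cutoff switch such that logic running from
    Vop = VDD - dV is at most `max_slowdown` slower. Solved by bisection;
    delay rises monotonically as the rail drops when alpha >= 1."""
    target = (1 + max_slowdown) * delay(vdd, vth)
    lo, hi = 0.0, vdd - vth - 1e-6
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delay(vdd - mid, vth) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def min_switch_width_um(dv: float, peak_current_a: float, r_on_ohm_um: float) -> float:
    """Switch must satisfy I_peak * R_on <= dV, with R_on = r_on_ohm_um / W."""
    return r_on_ohm_um * peak_current_a / dv


# Assumed numbers: 1.2 V supply, 0.3 V low-Vt logic, 100 mA peak current,
# 1000 ohm*um switch on-resistance.
dv = max_rail_drop(1.2, 0.3)
w = min_switch_width_um(dv, 0.1, 1000.0)
print(f"dV <= {dv * 1000:.1f} mV, switch width >= {w:.0f} um")
```

The result is a drop of roughly 30 mV. A wider switch would waste area and add leakage, and a narrower one would break the 2% budget.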

  • Energy Management System: Open Loop

    IEM and IEC components work together to predict the lowest acceptable processor performance level
    Power Controller, PMU and Clock Generator work together to deliver that lowest performance level

  • Energy Management System: Closed Loop

    APC operates in closed-loop control mode, using HPM to adapt to actual process and temperature
    PowerWise Interface provides fast control of the EMU and status feedback for optimum control

  • MPEG video playback comparison

    Classical interval-based algorithms (e.g., LongRun) are too conservative: they choose higher performance than necessary.
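The sketch below implements a generic up/down threshold governor to illustrate why interval-based policies tend to be conservative. It is not LongRun's actual algorithm; the thresholds, the constant demand and the function names are assumptions.

```python
LEVELS = (0.50, 0.66, 0.83, 1.00)  # performance levels as a fraction of maximum


def interval_governor(demand, intervals, up=0.90, down=0.60):
    """Generic interval-based DVS policy (hypothetical thresholds): after each
    interval, step one level up if utilization exceeded `up`, or one level down
    if it fell below `down`. Returns the level used in each interval."""
    idx = len(LEVELS) - 1   # start at full performance
    used = []
    for _ in range(intervals):
        level = LEVELS[idx]
        util = min(1.0, demand / level)   # busy fraction at this level
        used.append(level)
        if util > up and idx < len(LEVELS) - 1:
            idx += 1
        elif util < down and idx > 0:
            idx -= 1
    return used


def lowest_sufficient(demand):
    """Lowest level that can still absorb the demand, for comparison."""
    return min((lvl for lvl in LEVELS if lvl >= demand), default=LEVELS[-1])


print(interval_governor(0.40, 5), lowest_sufficient(0.40))
```

With a steady demand of 40% of full performance, the governor settles at the 66% level even though 50% would suffice. Utilization at 66% (about 0.61) never drops below the down threshold, so the policy never steps lower.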

    [Charts: fraction of time at each performance level for Legendary MPEG and Danse De Cable MPEG, LongRun vs. Vertigo; levels charted as 300, 400, 500 and 600 MHz]

    Underlying data: fraction of time (%) at each performance level (% of maximum), LongRun / Vertigo

    Workload                 50%              66%              83%              100%
    Legendary MPEG (1)       0.00 / 0.31      2.84 / 7.76      17.14 / 88.11    79.07 / 3.81
    Legendary MPEG (2)       0.00 / 0.08      3.04 / 7.78      17.20 / 88.06    79.15 / 4.07
    Danse De Cable (1)       3.93 / 46.82     11.93 / 52.74    27.62 / 0.12     56.51 / 0.32
    Danse De Cable (2)       5.74 / 51.17     17.04 / 48.34    29.50 / 0.11     47.72 / 0.37
    Emacs                    67.94 / 95.57    20.48 / 2.34     5.95 / 0         5.63 / 2.10
    Xwelltris                32.65 / 81.67    49.27 / 9.38     5.26 / 3.66      12.82 / 5.29
    Acrobat Reader (1)       3.51 / 9.62      3.45 / 1.06      3.52 / 1.33      89.53 / 87.98
    Acrobat Reader (2)       8.42 / 40.37     8.37 / 2.75      7.57 / 0.98      75.63 / 55.90
    Netscape News (1)        16.73 / 44.39    8.56 / 3.42      8.61 / 5.83      66.10 / 66.10
    Netscape News (2)        10.71 / 60.70    16.30 / 14.15    7.99 / 3.02      65.00 / 22.22
    Konqueror                10.09 / 38.49    10.44 / 25.56    5.55 / 14.75     73.92 / 26.65
    FS misc                  33.76 / 85.54    16.05 / 1.28     14.43 / 0.41     35.75 / 12.77

  • Interactive app: Konqueror

    Exactly repeating the run of interactive apps is difficult.
    Our methodology: LongRun in control; estimate what IEM would have done on that same run.

    [Chart: fraction of time at each performance level for Konqueror. LongRun: 10.09% / 10.44% / 5.55% / 73.92%; Vertigo: 38.49% / 25.56% / 14.75% / 26.65% at the 50% / 66% / 83% / 100% levels]

  • Energy Management in Action

    [Chart: performance level (50%, 66%, 83%, 100%) over a 2-second interval]

  • DVS Control Sub-system

    [Block diagram. Blocks: IEC; DPM (Dynamic Performance Monitor); DPC (Dynamic Performance Controller); DVC (Dynamic Voltage Controller) with a voltage vs. frequency lookup table; DCG (Dynamic Clock Generator, SoC specific); DEM (DVS Emulation); CLKGEN; CPU; PMU; APB configuration interface. Signals: Current and Target performance index, PWRREQ, MAXPERF, cpuclk, CLOCK, DATA, interrupts (SoC specific).]
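The voltage vs. frequency lookup table in the DVC implies an ordering rule for transitions. The rule is standard DVFS practice, assumed here because the block diagram does not show it. When speeding up, raise the voltage before the frequency; when slowing down, lower the frequency before the voltage, so the clock never outruns the supply. The table entries and function names are hypothetical.

```python
# Hypothetical voltage-vs-frequency lookup table: performance index -> (MHz, mV).
# The 66 MHz / 1.0 V and 266 MHz / 1.8 V end points echo the DVFS demo slide;
# the middle entries are made up.
VF_TABLE = {0: (66, 1000), 1: (133, 1200), 2: (200, 1500), 3: (266, 1800)}


def min_mv_for(mhz: int) -> int:
    """Lowest table voltage that supports at least `mhz`."""
    return min(mv for f, mv in VF_TABLE.values() if f >= mhz)


def transition(current: int, target: int) -> list[tuple[str, int]]:
    """Order the steps so the clock never outruns the supply:
    speeding up -> raise voltage, wait for it to settle, then raise frequency;
    slowing down -> lower frequency first, then voltage."""
    f1, v1 = VF_TABLE[target]
    if target > current:
        return [("set_voltage", v1), ("wait_voltage_settled", v1), ("set_frequency", f1)]
    if target < current:
        return [("set_frequency", f1), ("set_voltage", v1)]
    return []


def is_safe(start: int, steps: list[tuple[str, int]]) -> bool:
    """Replay the steps and check that every intermediate (f, V) point is in spec."""
    f, v = VF_TABLE[start]
    for op, value in steps:
        if op == "set_voltage":
            v = value
        elif op == "set_frequency":
            f = value
        if v < min_mv_for(f):
            return False
    return True


print(transition(0, 3))
print(transition(3, 0))
```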

  • DVS operation (with MAXPERF Signalling)

  • Prototype IEM test chip

    ARM926EJ-S core
    Multiple power domains
    Voltage and frequency scaling of CPU, caches and TCMs
    First full DVS silicon with National Semiconductor PowerWise technology
    NSC Adaptive Power Controller (APC) implemented in FPGA
    Includes DVS emulation mode for comparative tests

    TSMC 0.13 µm (CL013G), April Cyber Shuttle
    Packaged parts 11 August 2003
    Developed by ARM, Synopsys and National Semiconductor using Synopsys EDA tools

  • Conclusions

    Along with process technology scaling, signal integrity, SoC integration and system verification, low-power design is a critical issue.
    Low-power design needs to be approached at every level, from the system level (software, algorithms) down to device/process standpoints.

  • Thank you for your kind attention!

  • IBM Low Power Design using PowerPC

  • Platforms for Information Appliances

    IBM PowerPC platforms enable highly integrated, power efficient Information Appliance (IA) chips

    [Figure: PowerPC platform (uP cores 405/440, IP cores, CoreConnect™ architecture, ASIC tools, low-power optimizations) → SOC → custom IA chips and application-specific IA chips]

  • Scalable PowerPC 405 CPU Core

    CPU Goals

    Expanded operating voltage range (0.9V to 1.95V)

    Maintain full software and tools compatibility with the existing PowerPC 405

    Provide a high performance core capable of high efficiency low power operation

    CPU Optimizations

    Redesigned custom circuits within the CPU that were sensitive to low-voltage operation

    Re-optimized design and timing for the extended voltage range

    Verified equivalence

    Figure: PowerPC 405 core block diagram (instruction unit, execution unit, GPRs, MAC, load/store pipe, branch unit, I-cache and D-cache with their controllers, MMU, timers, debug/trace, power management, interrupts, 64-bit Processor Local Bus).

  • Embedded PowerPC Cores

    PowerPC 405: 32-bit data, 32-bit address, MMU

    Single-issue, 5-stage pipeline: 1.52 DMIPS/MHz

    266-400 MHz

    L1 cache up to 16KB/16KB

    Voltage-scalable versions (405LP-1, 405LP-2)

    PowerPC 440: 32-bit data, 36-bit address, MMU

    Dual-issue, 7-stage pipeline: 2.0 DMIPS/MHz

    400-800 MHz

    L1 cache 32KB/32KB; L2 256 KB; L3
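    Peak Dhrystone throughput is the per-MHz rating multiplied by the clock frequency. The minimal Python sketch below applies that product to the figures quoted above. It assumes the DMIPS/MHz rating holds across each core's whole frequency range. The dictionary and function names are illustrative only.

```python
# Peak Dhrystone throughput = (DMIPS per MHz) x (clock frequency in MHz).
# Ratings and frequency ranges are the ones quoted on the slide; assuming the
# per-MHz rating is constant across the range is a simplification.
CORES = {
    "PowerPC 405": (1.52, (266, 400)),  # (DMIPS/MHz, (min MHz, max MHz))
    "PowerPC 440": (2.0, (400, 800)),
}


def dmips_range(dmips_per_mhz: float, mhz_range: tuple) -> tuple:
    """Return (low, high) peak DMIPS for a core over its frequency range."""
    low_mhz, high_mhz = mhz_range
    return dmips_per_mhz * low_mhz, dmips_per_mhz * high_mhz


for name, (rate, freqs) in CORES.items():
    low, high = dmips_range(rate, freqs)
    print(f"{name}: {low:.0f}-{high:.0f} DMIPS")
```

    On these numbers, the dual-issue 440 roughly doubles to triples the 405's peak throughput. Part of that gain comes from the higher clock rate and part from the better per-MHz rating.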

  • Low Power Optimizations

    Active Power Reductions

    Voltage Scaling

    Frequency Scaling

    Flexible Clock Distribution

    Clock Gating

    Hardware Accelerators

    IBM low-power SOC designs include a wide range of optimizations to reduce both active and standby power

    Standby Power Reductions

    Clock Freezing

    Hibernation

    Cryo Standby

  • Voltage Scaling Benefits

    Complementary CMOS scales well over a wide voltage range

    Can be used widely over entire chip

    Can optimize power/performance (MIPS / W) over a 4X range

    Voltage Scaling Challenges

    Custom circuits, PLLs, analog circuits and I/O drivers don't voltage-scale easily

    Avoiding increases in standby power in low-active-power circuits

    (the VTH dilemma: lowering VTH keeps circuits fast at low Vdd but raises subthreshold leakage)

    Reducing operating voltage greatly reduces active power in CMOS

    Operating at 1/2 normal Vdd increases delay 2.4-3.2X but reduces power by > 10X

    CMOS Ring Oscillator Delay and Power vs. VDD
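    The roughly 10X figure follows from the switching-power relation P = C·Vdd²·f. Halving Vdd cuts Vdd² by 4X. The 2.4-3.2X longer gate delay lowers the achievable clock frequency by the same factor. The minimal Python sketch below works through that arithmetic. It assumes the switched capacitance stays constant and the clock tracks gate delay, and it ignores leakage and short-circuit current. The function name is illustrative only.

```python
def dynamic_power_reduction(vdd_ratio: float, delay_factor: float) -> float:
    """Factor by which switching power P = C * Vdd**2 * f decreases.

    vdd_ratio:    new Vdd / nominal Vdd (0.5 means half the nominal supply).
    delay_factor: new gate delay / nominal gate delay. The clock is assumed to
                  slow by the same factor, so f_new = f_nominal / delay_factor.
    C is assumed constant; leakage and short-circuit power are ignored.
    """
    power_ratio = vdd_ratio ** 2 / delay_factor  # P_new / P_nominal
    return 1.0 / power_ratio


for delay in (2.4, 3.2):
    factor = dynamic_power_reduction(0.5, delay)
    print(f"half Vdd, delay x{delay}: switching power / {factor:.1f}")
```

    This simple model gives roughly a 9.6-12.8X reduction, which is an order of magnitude and consistent with the ring-oscillator result quoted above.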

  • IBM Low-Power SOC Designs

    Palmtops to Teraflops in a single ISA

    Optimized for high-performance handheld applications, e.g., high-end PDA

    PowerPC 405LP-1

    Joint project of IBM Research and IBM Microelectronics

    First silicon Oct. 2001

    0.18µm process

    Frequency-scalable, < 66-266 MHz

    Voltage-scalable, 1.0-1.8 V (0.9-1.65 V)

    Technology evaluation platform

    All power and performance data from 405LP-1 systems

    PowerPC 405LP-2

    0.13µm process

    Scalable to 333 MHz @ 1.5 V (est.)

    Optimized for multimedia processing

    Well into design

  • 405LP-1 System on a Chip

    Figure: 405LP-1 block diagram. PPC405 CPU core with 16K I-cache and 16K D-cache on the 64-bit Processor Local Bus (PLB), connected through a PLB-OPB bridge to the 32-bit On-chip Peripheral Bus (OPB). Other blocks: DMA controller, SDRAM controller, RAM/ROM/peripheral controller, code decompression, scalable low-power PLL, LCD controller, speech and crypto accelerators, CODEC interface, RTC, interrupt controller, GPIO, PCMCIA/CFII, two UARTs, IIC, passive interface, standby power management, clock/power management and sensor. Supply domains: 3.3V I/O supply, 1.0V-1.8V logic, 1.8V battery-backed, 1.0V internal regulator. The legend distinguishes new cores from pre-existing cores.

  • Reducing Standby Power

    Cryo mode uses:

    Customers/designs comfortable with clock-stop standby

    Low-latency periodic sleep/wake with minimal standby power

    IP cores with hidden state can cause problems for SW-based save/restore

    Other methods under review: voltage islands and power gating; state-saving latches

  • Standby Power Modes

    Cryo mode sequence

    Shutdown: save CPU core state; flush caches and TLBs; stop clocks; scan state out to internal/external non-volatile storage; remove power from logic

    Suspend: monitor the system for a wake-up condition or RTC timer

    Restore: on a wake indicator, restore power to logic; scan state in from non-volatile storage; restore clocks; restore CPU state
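    The Shutdown/Suspend/Restore sequence is essentially a small state machine. The minimal Python sketch below only records the steps in the order listed above. It assumes each hardware action is an opaque operation. `Phase`, the step strings and `cryo_cycle` are hypothetical names and do not come from any IBM API.

```python
from enum import Enum, auto


class Phase(Enum):
    RUNNING = auto()    # logic powered, clocks running
    SUSPENDED = auto()  # logic unpowered, waiting for a wake indicator


# Cryo-mode steps in the order given on the slide (descriptive labels only).
SHUTDOWN_STEPS = [
    "save CPU core state",
    "flush caches and TLBs",
    "stop clocks",
    "scan state out to non-volatile storage",
    "remove power from logic",
]
RESTORE_STEPS = [
    "restore power to logic",
    "scan state in from non-volatile storage",
    "restore clocks",
    "restore CPU state",
]


def cryo_cycle(wake_samples):
    """Simulate one Cryo-mode cycle and return (action log, final phase).

    wake_samples: booleans sampled while suspended; True stands for a wake
    indicator (monitored wake-up condition or RTC timer expiry).
    """
    log = list(SHUTDOWN_STEPS)             # Shutdown
    for woke in wake_samples:              # Suspend: check for wake-up
        if woke:
            log.extend(RESTORE_STEPS)      # Restore, roughly the reverse of Shutdown
            return log, Phase.RUNNING
        log.append("monitor wake-up condition")
    return log, Phase.SUSPENDED            # still asleep: no wake indicator seen


actions, phase = cryo_cycle([False, False, True])
print(phase, len(actions))
```

    The sketch polls only to keep the example simple. A real system would more likely sleep until the wake logic or the RTC raises an interrupt.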

    Standby power modes enable longer battery life and instant on

    | Mode | System Clock | Logic VDD | State Saved | Restore Time | Logic Power |
    |---|---|---|---|---|---|
    | Freeze | 0 Hz | 1 V | All | Observe wake-up condition (< 1 ms) | CMOS leakage at 1 V |
    | Hibernation | 0 | 0 | Software state | OS restore (100s of ms) | ~0 |
    | Cryo | 0 | 0 | Registers and software state | Instant on: scan restore of state (20-200 ms) | ~0 |

  • Dynamic Power Management

    System-Wide power management (PM) during application execution

    Examples:

    Peripheral PM, including core clock gating

    PM at idle (including low-latency sleep modes)

    Memory PM

    Dynamic voltage and frequency scaling

    Energy policy management

    DPM is proposed as an architecture for policy-guided dynamic power management.

  • DPM Motivation

    Embedded application requirements

    Long battery life

    System-specific policy requirements

    Highly variable system designs: watch, cell phone, personal server, PDA, tablet

    Soft real-time (multimedia) requirements

    Task-specific policy requirements

    General-purpose systems and applications

    No/minimal application software changes for PM

    Minimal/variable firmware

    PM must be in the OS/applications

  • DPM Motivation

    Technology

    SOC: CPU + peripheral PM

    Complex clocking architectures

    Decoupled CPU/bus frequencies

    Heterogeneous processor architectures

    Example: 405LP-2, asynchronous heterogeneous processing in a common voltage/memory domain

    New performance and leakage control mechanisms at the circuit level

  • DPM Motivation

    Linux

    Platform independence desired

    Community acceptance required

    Simplicity, ease of maintenance

    Integration with pre-existing facilities (Linux Device Model)

    Minimal core kernel changes: 5 lines of new code in the core kernel

    Scalability to server/SMP systems

  • DPM: An Architecture for Policy-Guided PM

    Is: a generic software architecture for policy-guided dynamic power management, proposed by IBM and MontaVista Software

    Flexible enough to implement a number of system-specific DVFS and static PM approaches

    Available in an embedded Linux distribution for several embedded processors

    Is Not:

    PowerPC or Linux specific

    A DVFS algorithm

    Fully implemented yet

  • DPM Overview

    Figure: DPM overview. Software side: power-aware applications, policy/power managers (provide and manage policies), the operating system and device drivers (requirements and power-management information), which signal operating/task state changes. DPM sets operating points, changing the power-performance levels of the hardware: CPU, memory controller, power supplies and system clock generation.
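    The overview separates three concerns: operating points (hardware settings), operating or task states signalled by software, and policies that map states to operating points. The minimal Python sketch below shows that separation. Everything in it is hypothetical: the class names, the fields and the two operating points are assumptions, with voltages and CPU/memory frequencies loosely borrowed from the 405LP-1 figures in this deck. It does not reproduce the actual DPM interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical hardware setting that DPM can switch to."""
    name: str
    core_mv: int   # logic supply voltage, millivolts
    cpu_mhz: int   # CPU clock
    mem_mhz: int   # memory/bus clock


# Assumed operating points (values taken from 405LP-1 figures in this deck).
FAST = OperatingPoint("fast", 1800, 266, 133)
SLOW = OperatingPoint("slow", 1000, 66, 66)


class PolicyManager:
    """Hypothetical DPM core: a policy maps operating states to operating points."""

    def __init__(self, policy):
        self.policy = policy      # dict: state name -> OperatingPoint
        self.current = None       # operating point currently applied
        self.transitions = []     # (old, new) pairs, for inspection

    def set_policy(self, policy):
        # Policy managers may install a different policy at run time.
        self.policy = policy

    def signal_state(self, state):
        # Software signals an operating/task state change; DPM looks up the
        # operating point and "reprograms" hardware only if it differs.
        point = self.policy[state]
        if point != self.current:
            self.transitions.append((self.current, point))
            self.current = point
        return point


pm = PolicyManager({"idle": SLOW, "video-decode": FAST})
pm.signal_state("video-decode")
pm.signal_state("idle")
print([new.name for _, new in pm.transitions])
```

    Keeping the policy as plain data means a system-specific policy manager can change DVFS behavior without touching the mechanism that applies operating points. This matches the architecture's stated aim of supporting many system-specific approaches.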

  • Dynamic Voltage and Frequency Scaling

  • Idle Scaling Trace (MPEG4)

    Figure: core voltage and battery power traces.

    | Application | Default | Idle Scaling | Sys. Savings | Core Savings |
    |---|---|---|---|---|
    | MPEG4 A/V | 2.76 W | 2.63 W | 4.7 % | 11.4 % |
    | MP3 | 1.42 W | 1.1 W | 22.5 % | 47.8 % |
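    The system-savings column is the relative drop in battery power compared with the default run. The minimal Python check below applies that to the idle-scaling rows above. The helper name is illustrative only.

```python
def savings_pct(default_w: float, scaled_w: float) -> float:
    """Percentage reduction in system (battery) power relative to the default run."""
    return 100.0 * (default_w - scaled_w) / default_w


# (default W, idle-scaling W) from the table above.
IDLE_SCALING = {"MPEG4 A/V": (2.76, 2.63), "MP3": (1.42, 1.10)}

for app, (default_w, scaled_w) in IDLE_SCALING.items():
    print(f"{app}: {savings_pct(default_w, scaled_w):.1f} % system savings")
```

    The core-savings column can't be reproduced this way, because the table does not list core power on its own.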

  • Load Scaling Trace (MPEG4/spmt)

    Figure: core voltage and battery power traces.

    | Application | Default | Load Scaling | System Savings |
    |---|---|---|---|
    | MPEG4 A/V | 2.76 W | 2.54 W | 8.0 % |
    | MP3 | 1.42 W | 1.03 W | 27.7 % |

  • Application Scaling Trace

    Figure: task state of the video thread, with regions labeled "More Performance Required" and "Working Ahead".

  • AS Results

    AS achieved close to an ideal LS result with a simple policy manager and a straightforward modification of the application

    | Application | No DPM | DPM: Application Scaling | DPM Savings | Ideal Savings |
    |---|---|---|---|---|
    | MPEG4 A/V | 2.76 W | 2.46 W | 10.8 % | 10.8 % |

  • References

    Nowka et al., "A 32-bit PowerPC System-on-a-Chip With Support for Dynamic Voltage Scaling and Dynamic Frequency Scaling," IEEE Journal of Solid-State Circuits, vol. 37, no. 11, Nov. 2002, pp. 1441-1447.

    IBM Austin Research Laboratory, www.research.ibm.com/arl

    "Dynamic Power Management for Embedded Systems" (whitepaper), http://www.research.ibm.com/arl/projects/papers/DPM_V1.1.pdf

    Linux 2.4 kernel including DPM implementation (BitKeeper), bk://source.mvista.com/linuxppc_2_4_devel-pm

    Figure: logic VDD (0-2.0 V), logic power, I/O power and total chip power (0-600 mW) as the CPU/memory frequency steps from 266/133 MHz to 66/66 MHz and back to 266/133 MHz. Uninterrupted operation: Linux 2.3.17 running Dhrystone 2.1 code, 400 loops per cycle.

    Dynamic Voltage Scaling: 1.8 V to 1.0 V at up to 1 V/100 µs

    Dynamic Frequency Scaling: 266 MHz CPU to 66 MHz CPU

    Power consumption for the CPU and logic was reduced by 13X dynamically under the control of the Linux kernel (no PLL relock and no stopping of the application).
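    The 13X figure is consistent with the C·V²·f switching-power model. Dropping logic VDD from 1.8 V to 1.0 V gives (1.8/1.0)² ≈ 3.2X, and slowing the CPU from 266 MHz to 66 MHz gives about 4X more. The minimal Python check below assumes constant switched capacitance and ignores leakage. Its ramp-time line divides the 0.8 V swing by the quoted 1 V/100 µs slew rate.

```python
V_HIGH, V_LOW = 1.8, 1.0        # logic VDD, volts
F_HIGH, F_LOW = 266e6, 66e6     # CPU clock, Hz
SLEW_V_PER_S = 1.0 / 100e-6     # maximum VDD ramp rate: 1 V per 100 us

# Switching power scales as C * V**2 * f; C is assumed unchanged.
reduction = (V_HIGH / V_LOW) ** 2 * (F_HIGH / F_LOW)

# Minimum time to ramp VDD between the two levels at the quoted slew rate.
ramp_s = (V_HIGH - V_LOW) / SLEW_V_PER_S

print(f"~{reduction:.1f}X lower switching power, ramp >= {ramp_s * 1e6:.0f} us")
```

    A VDD ramp of at least 80 µs is short compared with typical OS scheduling intervals. This helps explain why the kernel can scale voltage without stopping the application.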


    Figure: power consumption (mW) vs. Dhrystone MIPS for ARM7, ARM9 and ARM10/11 cores, 0.18µm and 0.13µm processes.

    Figure: CMOS inverter; an input switching to '1' or '0' charges or discharges Cload (labels: Input, Cload, V, Vthn).