serhey rudko 049036 snoc anoc id 309501864

Upload: vivek-pratap-singh

Post on 10-Apr-2018

TRANSCRIPT

  • 8/8/2019 Serhey Rudko 049036 SNoC ANoC Id 309501864

    1/17

    Asynchronous vs. Synchronous Network-on-Chip

    Prepared by Sergey Rudko

    Advanced Topics in VLSI 1 (NoC) 049036

  • 2/17

    Introduction
      Problem Definition
      NoC Implementation Alternatives
        Fully asynchronous
        Multi-synchronous (GALS)
        Synchronous
    Proposed Solution
      Systematic Comparison between Different Strategies
        Silicon Area
        Network Saturation Threshold
        Communication Throughput
        Packet Latency
        Power Consumption
        Implementation Flexibility and Tools
    Related Approaches
      I. Miro-Panades, F. Clermidy, P. Vivet, A. Greiner, "Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture", NoCs 2008

  • 3/17

    Synchronous Router

    Router pipeline may include many stages
      Increases communication latency
    Router pipeline may be optimized to a single-cycle router
      Possible by use of speculation
      Clock period same as the pipelined router
    Presence of a clock simplifies design
      Standard libraries and tools

    [Figure: router with VCA (virtual-channel allocation) and SA (switch allocation) stages on the data path between input and output links, driven by speculative control signals]

    A. Kumar, P. Kundu, A. Singh, L. Peh and N. Jha, "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator", International Conference on Computer Design (ICCD), October 2007.
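    To make the latency argument concrete, here is a minimal zero-load model: a flit crossing H routers pays the router pipeline depth at every router plus one link traversal between consecutive routers. The 4-stage pipeline, one-cycle links and 1 GHz clock are illustrative assumptions, not figures from the cited router, and contention is ignored.

```python
def zero_load_latency_ns(hops: int, router_stages: int, link_cycles: int, f_mhz: float) -> float:
    """Zero-load flit latency over a path crossing `hops` routers (hypothetical model).

    Each router costs `router_stages` cycles, each of the `hops - 1` links
    costs `link_cycles` cycles, and contention is ignored.
    """
    period_ns = 1000.0 / f_mhz
    cycles = hops * router_stages + (hops - 1) * link_cycles
    return cycles * period_ns


# Illustrative numbers only: a 4-stage pipeline vs. a single-cycle router at the
# same clock (the slide notes the clock period stays the same). The single-cycle
# case assumes speculation always succeeds.
F_MHZ = 1000.0
for name, stages in [("4-stage pipelined", 4), ("single-cycle", 1)]:
    print(f"{name}: {zero_load_latency_ns(5, stages, 1, F_MHZ):.1f} ns over 5 routers")
```

    With these assumed numbers the single-cycle router cuts the five-router latency from 24 to 9 cycles, and the gap grows linearly with hop count, which is the motivation for speculative single-cycle designs.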

  • 4/17

    Limitations of Fully-Synchronous Networks

    Difficult to distribute the clock
      Network spread over the die & may have an irregular layout
      Minimising skew costs complexity and power
      Solution: alternatives/extensions to PLL and H-tree
    Single network clock frequency
      Communicating synchronous IP blocks with different frequencies
      What is the most appropriate network clock frequency?

    Problem: clock distribution and frequency selection
    Solution: beyond a single global clock

  • 5/17

    Synchronous Routers with Asynchronous Links (GALS)

    Synchronization is simple
      Traditional 2-FF synchronizers
      Can support asynchronous interconnects
      No longer exploits the periodic nature of router clocks
      Correct operation is independent of the link delay
    GALS interfaces with pausible clocks
      If necessary, the clock is stretched so data is always transferred reliably
      Need to construct a local delay line

    [Figure: two routers with clocks s* and r* connected by an asynchronous FIFO]
    Connect frequency-independent routers
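    Asynchronous (dual-clock) FIFOs like the one between the two routers are commonly built by passing Gray-coded read and write pointers through 2-FF synchronizers, because only one pointer bit changes per increment. The sketch below illustrates that generic technique; the depth, pointer width and all names are hypothetical, and it is not the design shown on the slide.

```python
# Generic dual-clock FIFO pointer logic (hypothetical parameters, not the slide's design).
DEPTH = 8                # FIFO entries (power of two)
PTR_BITS = 4             # log2(DEPTH) + 1; the extra MSB tells "full" apart from "empty"
PTR_MOD = 1 << PTR_BITS  # pointers wrap modulo 16


def bin_to_gray(b: int) -> int:
    """Binary -> reflected Gray code."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Reflected Gray code -> binary (XOR of all right shifts)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


class TwoFlopSync:
    """Behavioural 2-FF synchronizer: input appears after two destination-clock edges.

    Metastability itself is not modelled; in hardware the first flop may go
    metastable and the second gives it a full cycle to resolve.
    """

    def __init__(self) -> None:
        self.ff1 = 0
        self.ff2 = 0

    def clock(self, d: int) -> int:
        self.ff2, self.ff1 = self.ff1, d
        return self.ff2


def is_empty(rd_gray: int, wr_gray_synced: int) -> bool:
    # Empty: read pointer has caught up with the synchronized write pointer.
    return rd_gray == wr_gray_synced


def is_full(wr_gray: int, rd_gray_synced: int) -> bool:
    # Full: same index but one extra wrap, which in Gray code means the
    # two MSBs are inverted and the remaining bits are equal.
    msb_mask = 0b11 << (PTR_BITS - 2)
    return wr_gray == (rd_gray_synced ^ msb_mask)


# Consecutive Gray codes (including the wrap-around) differ in exactly one bit,
# so a synchronizer sampling mid-change sees either the old or the new pointer.
for i in range(PTR_MOD):
    changed = bin_to_gray(i) ^ bin_to_gray((i + 1) % PTR_MOD)
    assert bin(changed).count("1") == 1
```

    Each side compares its own Gray pointer with the other side's pointer after it passes through a TwoFlopSync clocked in the local domain. The two-cycle synchronizer delay only makes full and empty conservative, never wrong.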

  • 6/17

    Asynchronous NoCs

    Simple, elegant solution when networked IP blocks run at different clock frequencies
    Data driven: no superfluous switching activity
    No synchronization or clock-alignment issues at interfaces
      Solves synchronization, clock-domain crossings, timing and long interconnects
    No clock distribution issues
    Security and EMI advantages
      A clock focuses EM emissions
      The presence of a clock can also aid fault-induction and side-channel analysis attacks
    Reduced design time
      Easy-to-use interfaces, modularity
      Robust and simple implementation
    Reduced power
    But network latency is significantly increased

  • 7/17

    Asynchronous NoC Approaches

    R. Dobkin et al., "An Asynchronous Router for Multiple Service Levels Networks on Chip", ASYNC 2005 (QNoC group)
    MANGO clockless Network-on-Chip
      T. Bjerregaard and J. Sparsø, "A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip", ASYNC 2005
      T. Bjerregaard and J. Sparsø, "A Router Architecture for Connection-Orientated Service Guarantees in the MANGO Clockless Network-on-Chip", DATE 2005
    R. Dobkin provides a synchronous versus asynchronous router study

  • 8/17

    Synchronous or Asynchronous NoCs?

    I. Miro-Panades, F. Clermidy, P. Vivet and A. Greiner, "Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture", NoCs 2008

  • 9/17

    Motivation

    Physically implement the DSPIN NoC in the FAUST application platform
    Compare the performance of ANOC and DSPIN on a real application and traffic
      Silicon area
      Throughput
      Packet latency
      Power consumption

  • 10/17

    FAUST Architecture with ANOC

    Asynchronous NoC (ANOC)
      QDI 4-phase/4-rail asynchronous logic
      20 routers (5-port)
      Source routing
      Wormhole packet switching
      32-bit payload
    GALS conception
      24 independent clocks
      FIFO-based interface
    Hard-macro approach for ANOC reuse
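    Assuming "4-rail" denotes 1-of-4 coding, its usual meaning in QDI design, each 2-bit digit travels on four wires and exactly one of them goes high per transfer, so the receiver detects arrival without a clock. Under the 4-phase (return-to-zero) protocol, every valid codeword is followed by an all-zero spacer. The sketch below is a behavioural model of that general technique for the 32-bit payload; the function names are hypothetical and it does not describe ANOC's actual circuits.

```python
PAYLOAD_BITS = 32
GROUPS = PAYLOAD_BITS // 2  # each 1-of-4 group carries 2 bits -> 16 groups, 64 wires


def encode_1of4(word: int) -> list[int]:
    """Return 16 one-hot 4-bit codes (one per 2-bit digit), least significant digit first."""
    return [1 << ((word >> (2 * i)) & 0b11) for i in range(GROUPS)]


def decode_1of4(groups: list[int]) -> int:
    word = 0
    for i, g in enumerate(groups):
        word |= (g.bit_length() - 1) << (2 * i)  # position of the single high rail
    return word


def complete(groups: list[int]) -> bool:
    """Completion detection: valid codeword when every group has exactly one rail high."""
    return all(bin(g).count("1") == 1 for g in groups)


def is_spacer(groups: list[int]) -> bool:
    """Return-to-zero phase: all rails low."""
    return all(g == 0 for g in groups)


def four_phase_transfer(word: int) -> tuple[int, list[int]]:
    """One return-to-zero transfer; returns the received word and the ack waveform."""
    ack_wave = []
    wires = encode_1of4(word)        # 1. sender drives a valid codeword
    assert complete(wires)           #    receiver's completion detector fires
    received = decode_1of4(wires)
    ack_wave.append(1)               # 2. receiver raises ack
    wires = [0] * GROUPS             # 3. sender returns all rails to the spacer
    assert is_spacer(wires)
    ack_wave.append(0)               # 4. receiver lowers ack; ready for the next word
    return received, ack_wave
```

    A 32-bit payload thus needs 64 data wires plus an acknowledge, and each transfer raises and lowers exactly 16 rails regardless of the data value. Dual-rail coding uses the same 64 wires but toggles 32 rails per transfer, which is why 1-of-4 is often preferred.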

  • 11/17

    DSPIN Architecture

    Packet based
    Distributed router architecture
      Suited for the GALS approach
    Mesochronous links between routers
      Metastability resolved by bi-synchronous FIFOs
    Synthesizable with standard cells

  • 12/17

    DSPIN Clock Tree

    Mesochronous link between neighbor routers

  • 13/17

    NoC Architecture Comparison

    Both implementations use GALS principles

  • 14/17

    Network Comparison

    The DSPIN clock tree consumes as much power as the router itself

    Parameter                            ANOC           DSPIN
    Implementation                       Hard-macro     Soft-macro
    Area                                 0.281 mm²      0.187 mm²
    Throughput (worst-case conditions)   ~160 Mflit/s   289 Mflit/s
    Throughput (nominal conditions)      ~220 Mflit/s   408 Mflit/s
    Power consumption (F = 150 MHz)      3.69 mW        5.89 mW
    Power consumption (F = 250 MHz)      3.69 mW        10.39 mW

    DSPIN throughput is deterministic with respect to the clock frequency
    DSPIN power issues
      Power consumption is dominated mainly by the FIFO data registers
      DSPIN clock gating reduced power consumption by 67%
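    A quick calculation on the table values. Because DSPIN throughput is deterministic with respect to the clock, this sketch assumes one flit per clock cycle (an assumption, not a figure from the slides) to convert DSPIN power into energy per flit; the area ratio uses the table directly. ANOC is left out of the energy estimate because the slides do not give its flit rate at the listed power.

```python
# Values taken from the comparison table.
AREA_MM2 = {"ANOC": 0.281, "DSPIN": 0.187}
DSPIN_POWER_MW = {150: 5.89, 250: 10.39}  # clock frequency (MHz) -> power (mW)


def energy_per_flit_pj(power_mw: float, f_mhz: float, flits_per_cycle: float = 1.0) -> float:
    """Energy per flit in pJ; `flits_per_cycle` is an assumption, not a measured value."""
    flits_per_us = f_mhz * flits_per_cycle  # 1 MHz = 1 cycle per microsecond
    nj_per_us = power_mw                    # 1 mW = 1 nJ per microsecond
    return nj_per_us / flits_per_us * 1e3   # nJ/flit -> pJ/flit


for f, p in DSPIN_POWER_MW.items():
    print(f"DSPIN @ {f} MHz: {energy_per_flit_pj(p, f):.2f} pJ/flit")

area_saving = 1 - AREA_MM2["DSPIN"] / AREA_MM2["ANOC"]
print(f"DSPIN area saving vs. ANOC: {area_saving:.1%}")
```

    Under the one-flit-per-cycle assumption, DSPIN costs roughly 39 pJ/flit at 150 MHz and 42 pJ/flit at 250 MHz, and it occupies about a third less area than ANOC. The nearly constant energy per flit is what one expects from clocked logic whose power scales with frequency.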

  • 15/17

    Network Comparison: Latency

    The DSPIN router is IP data-locality aware
    DSPIN routers resynchronize the data packets
    DSPIN should be clocked at 367 MHz

                                    F = 150 MHz             F = 250 MHz
    Flit path                       ANOC       DSPIN        ANOC       DSPIN
    Intermediate router latency     6.80 ns    16.66 ns     6.80 ns    10.00 ns
    First + last router latency     60.00 ns   56.66 ns     47.00 ns   34.00 ns
    Latency for 5-hop path          80.00 ns   106.66 ns    68.00 ns   64.00 ns
    Latency for 9-hop path          106.66 ns  173.30 ns    96.00 ns   104.00 ns
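    The DSPIN columns fit a simple cycle-count model: an intermediate router costs 2.5 clock cycles (16.66 ns at 150 MHz, 10.00 ns at 250 MHz) and the first and last routers together cost 8.5 cycles (56.66 ns and 34.00 ns), so a path through H routers takes 8.5 + 2.5(H - 2) cycles. This model is inferred from the table, not stated in the slides, and ANOC is not modelled. It also suggests where the 367 MHz figure comes from: at about 367.6 MHz, 2.5 cycles take about 6.8 ns, the ANOC intermediate-router latency.

```python
FIRST_LAST_CYCLES = 8.5      # inferred from the table: 56.66 ns @ 150 MHz, 34.00 ns @ 250 MHz
INTERMEDIATE_CYCLES = 2.5    # inferred: 16.66 ns @ 150 MHz, 10.00 ns @ 250 MHz
ANOC_INTERMEDIATE_NS = 6.80  # from the table (clockless, frequency independent)


def dspin_latency_ns(hops: int, f_mhz: float) -> float:
    """Estimated DSPIN flit latency over a path crossing `hops` routers (inferred model)."""
    cycles = FIRST_LAST_CYCLES + INTERMEDIATE_CYCLES * (hops - 2)
    return cycles * 1000.0 / f_mhz


def freq_to_match_anoc_hop_mhz() -> float:
    """Clock at which one DSPIN intermediate hop is as fast as one ANOC hop."""
    return INTERMEDIATE_CYCLES * 1000.0 / ANOC_INTERMEDIATE_NS


for f in (150, 250):
    print(f, [round(dspin_latency_ns(h, f), 2) for h in (5, 9)])
print(f"{freq_to_match_anoc_hop_mhz():.1f} MHz")
```

    The model reproduces the DSPIN rows to within rounding (the table's 173.30 ns is 26 cycles at 150 MHz, about 173.33 ns), which supports reading the slide's 367 MHz as the clock needed to match ANOC hop by hop.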

  • 16/17

    Conclusion

    Little published work on asynchronous routers and networks
    Comparing synchronous and asynchronous designs is difficult
      System timing style
      Technology
      Circuit style and architecture
    Difficult to reproduce and simulate asynchronous designs from published work
      No notion of a cycle-accurate model
      Detailed control and datapath delays are hidden
    Asynchronous performance guarantees
      Performance guarantees are required
      Less predictable, non-deterministic
      Predicting performance is more complex
    Asynchronous EDA tool requirements
    Synchronous routers
      Predictability and determinism can be exploited
      Fast single-cycle routers possible
    ANoC for low power & SNoC for small area
