TRANSCRIPT
EDA Court: Hierarchical Construction and Timing Sign-off of SoCs
TAU 2013 Panel
The good side of hierarchy
[Figure: hierarchy tree with fan-out k at each level: Chip (h=0), Chiplet (h=1), Core (h=2), Unit (h=3), Macro (h=4)]
Impact of pruning
Sweet spot:
50B objects
2M per macro
Fraction at top = 4e-5
4 levels of hierarchy
Pruning = 93%
[Plot: fraction of chip at top vs. unpruned fraction, one curve each for h = 0 through h = 5]
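The sweet-spot numbers on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are invented here, and the geometric model (the same unpruned fraction at each of the 4 levels) is an assumption, not something the slide states.

```python
# Sweet-spot figures quoted on the slide.
total_objects = 50e9      # 50B objects on the chip
objects_per_macro = 2e6   # 2M objects per macro
levels = 4                # levels of hierarchy

# Ratio of one macro-sized piece to the whole chip.
fraction_at_top = objects_per_macro / total_objects
print(f"fraction at top = {fraction_at_top:.0e}")

# ASSUMPTION (not from the slide): every level keeps the same
# unpruned fraction u, so that fraction_at_top == u ** levels.
unpruned_per_level = fraction_at_top ** (1 / levels)
pruning_per_level = 1 - unpruned_per_level
print(f"implied pruning per level = {pruning_per_level:.0%}")
```

The first ratio reproduces the 4e-5 on the slide. Under the equal-pruning assumption, the implied per-level pruning comes out near 92%, in the neighborhood of the 93% quoted.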
The bad side of hierarchy
Accuracy? Pessimism? Coupling noise? Functional noise? Multiple interacting clocks? Parasitics on boundary nets?
Is “context” required? If so, we cannot “shelve and re-use” macros
Construction flow? Draconian methodology restrictions?
Chandu Visweswariah, Distinguished Engineer, IBM East Fishkill
Larry Brown, Design Center Engineer, IBM San Jose
Alex Rubin, Senior Engineer, IBM San Jose
Amit Shaligram, Principal Engineer, STMicroelectronics Scottsdale
Oleg Levitsky, Solutions Architect, Cadence San Jose
Qiuyang Wu, Senior Staff Engineer, Synopsys Hillsboro
Igor Keller, Senior Architect, Cadence San Jose
Alexander Skourikhin, EDA Engineer, Intel Haifa
Guntram Wolski, Principal Engineer, Cisco San Jose
Panel plan
10 min Charge 1: Hierarchical implementation and hence hierarchical timing sign-off don’t have a future
Plaintiff: Oleg Levitsky, Cadence. Defendant: Qiuyang Wu, Synopsys
10 min Charge 2: EDA tools and flows are inadequate for a construction flow: budgeting, IP models and hierarchical constraint development are lacking
Plaintiff: Amit Shaligram, STMicro. Defendant: Alex Rubin, IBM
10 min Charge 3: You can never really close out-of-context + Misdemeanor charge: too much additional complexity and software
Plaintiff: Guntram Wolski, Cisco. Defendant: Alexander Skourikhin, Intel
10 min Charge 4: Hierarchical timing cannot handle multiple interacting synchronous clocks
Plaintiff: Larry Brown, IBM. Defendant: Igor Keller, Cadence
30 min Discussion and audience questions
5 min Verdicts and “damages”
Charge 1: Hierarchical implementation and hence hierarchical timing sign-off don’t have a future
Plaintiff: Oleg Levitsky, Cadence
Defendant: Qiuyang Wu, Synopsys
Evolution of design flow
[Figure, built up over four slides: a Prototype, Implement, Sign Off flow; the later builds add first one and then two rows of blocks Blk1 Blk2 … Blkn]
Quiz: Why hierarchical flow?
Create more work for managers
Contribute to real estate bubble
Control time to market schedule?
Hierarchical design flow
[Figure: Prototype, Implement, Sign Off flow with two rows of blocks Blk1 Blk2 … Blkn; annotations: Complexity, Hierarchical scalability]
Hierarchical design flow
[Figure: Prototype, Implement, Sign Off flow with two rows of blocks Blk1 Blk2 … Blkn; Step 1, Step 2, …, Step n leading to tapeout]
Flow convergence is key
Hierarchical design flow
[Figure: Prototype, Implement, Sign Off flow with two rows of blocks Blk1 Blk2 … Blkn; annotation: Convergence]
Technical challenges:
SI
Over-the-block routing
Useful skew distribution
CPPR modeling
Power budgeting
Channelless designs
…
Human factor:
Level of expertise
Human error
Lack of sleep
Hierarchical design flow
[Figure: Prototype, Implement, Sign Off flow with two rows of blocks Blk1 Blk2 … Blkn; annotations: Convergence, Complexity, Hierarchical scalability]
Failed to control TTM
What is the alternative?
Charge 1: Hierarchical implementation and hence hierarchical timing sign-off don’t have a future
Plaintiff: Oleg Levitsky, Cadence
Defendant: Qiuyang Wu, Synopsys
Hierarchical Design and Timing Closure is the Only Way to Have a Future
Qiuyang Wu
Sr. Staff Engineer, Synopsys Inc.
March 2013
Hierarchical Implementation is Proven
• Way back when, in the last century
– Designs grew beyond the reach of flat implementation
– Established hierarchical methodologies, tried and true
• The success will continue because hierarchical implementation is
– naturally an iterative and gradual refinement process
– relatively larger in error margins and tolerances for tradeoff
– more about reuse and integration, less about starting from scratch
– …
+1M Gates
+100M Gates
But, “Classic” Hierarchical Timing is Inadequate for Signoff
Gap #1 - Burden is on the users: “Garbage in, garbage out”
– Block designers do not have quality constraints
Can’t close block timing with confidence: pessimism, optimism
Can’t create quality models: pessimism, optimism
Gap #2 - Language limitations: critical details can’t be elaborated
– Chip level designers do not have means to express design intention
Can’t describe I/O timing context accurately and completely
Can’t cover different reuse scenarios
The rescue: flat signoff.
However, hierarchical signoff is the only way to stay on top of the technology curve.
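[Figure: Hier STA compared with Flat STA (golden). Hier STA: a Block instance (Inst) inside TOP; inputs are block constraints (ad-hoc), block netlist and parasitics; block models are ILM, ETM, glass-box, black-box, … Flat STA: inputs are full chip golden constraints, chip netlist and chip parasitics]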
And Here is How to do Hierarchical Signoff
• The Recipe on Top of Signoff Quality Engine
• Provide hierarchical constraint management
– Check and highlight inconsistencies
• Provide context feedback and allow refinement
– Produce accurate and elaborate timing environment
• Provide Ease-of-Use through data / flow automation
– Minimize/prevent user errors by construction
• The Benefits Go Beyond Signoff
– Design faster: throughput and interoperation with implementation
– Design better: accuracy enables further optimization for power, leakage, robustness, area, etc.
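As one illustration of "check and highlight inconsistencies", the sketch below compares the clock periods declared in block-level constraints against the top-level constraints. Everything in it is hypothetical: the dictionaries stand in for parsed constraint files, and `find_clock_mismatches` is an invented helper, not an API of any EDA tool.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for parsed constraints: clock name -> period in ns.
ClockTable = Dict[str, float]


def find_clock_mismatches(
    top: ClockTable, blocks: Dict[str, ClockTable], tol_ns: float = 1e-6
) -> List[Tuple[str, str, str]]:
    """Return (block, clock, reason) for each block clock that disagrees with top."""
    issues = []
    for block_name, clocks in sorted(blocks.items()):
        for clock, period in sorted(clocks.items()):
            if clock not in top:
                issues.append((block_name, clock, "clock not defined at top level"))
            elif abs(top[clock] - period) > tol_ns:
                issues.append(
                    (block_name, clock,
                     f"period {period} ns differs from top-level {top[clock]} ns")
                )
    return issues


# Made-up example data.
top_constraints = {"clk_core": 1.0, "clk_io": 2.5}
block_constraints = {
    "blk1": {"clk_core": 1.0},
    "blk2": {"clk_core": 1.2, "clk_dbg": 10.0},
}

mismatches = find_clock_mismatches(top_constraints, block_constraints)
for block, clock, reason in mismatches:
    print(f"{block}/{clock}: {reason}")
```

A real checker would also have to compare modes, exceptions and I/O delays; clock periods are used here only because they are the simplest thing to compare.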
Charge 2: EDA tools/flows are inadequate for a construction flow: budgeting, IP models, hierarchical constraint development are lacking
Plaintiff: Amit Shaligram, STMicroelectronics
Defendant: Alex Rubin, IBM
Hierarchical Constraints & Budgeting
Amit Shaligram, Principal Engineer
STMicroelectronics
Models – Accuracy, speed and compatibility
• Which model to use?
  • ETM or .lib – Reasonable for use before clock tree.
  • ILM – Required after clock tree insertion
• Model accuracy
  • Different modes at block and top level, block/top constraint mismatches
  • Handling of high fanout and static nets
• Model compatibility
  • Models between different vendors/tools are not compatible.
  • Some tools create “physical ILMs”, others only “timing ILMs”
• It takes time...
  • For a ~2M instance block, 1 scenario (1 mode/1 corner) takes ~6-8 hours
  • Quickly becomes impractical with 25 blocks, ~5 modes and ~16 corners
• Can someone create models on the fly? Just use the DEF!
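To make "quickly becomes impractical" concrete, the figures on this slide can be multiplied out. This is a rough sketch that assumes every block costs about as much as the ~2M instance example and that one model is generated per block per scenario; neither assumption is stated on the slide.

```python
# Figures quoted on the slide (approximate).
blocks = 25
modes = 5
corners = 16
hours_per_scenario = (6, 8)   # ~6-8 hours for a ~2M instance block

# ASSUMPTION: one model-generation run per block per (mode, corner) scenario.
model_runs = blocks * modes * corners

# ASSUMPTION: every block costs about as much as the ~2M instance example.
low_hours, high_hours = (model_runs * h for h in hours_per_scenario)

print(f"{model_runs} model-generation runs")
print(f"{low_hours}-{high_hours} machine-hours in total")
```

Under these assumptions a full refresh of all models is 2,000 runs and 12,000 to 16,000 machine-hours, before any parallelism across a compute farm.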
Budgeting
• Floorplan and constraints – a chicken and egg problem!
• Estimation of feedthru delays can be challenging.
  • Consider crosstalk effect!
• Best practices not easy to follow all the time (FF at the boundary)
  • Critical path from a macro, legacy design, cannot tolerate extra latency
• Managing hold violations with FF at the boundary
  • Uncommon clock path creates hold violations due to OCV impact.
• SDC format limitations after clock tree insertion
  • Input/Output delay is specified with respect to virtual clock
  • Latency of virtual clock changes with every step of the flow (postCTS, postRoute, postRouteSI)
Hierarchical Constraints
• Top-down or bottom-up constraints development flow?
• How to ensure that block and top constraints are aligned?
• Constraint modifications required when using .lib or ILMs in top level
  • Generated clock definitions inside blocks create “new internal” clocks/pins
  • Handling large constraint files created within ILM generation flow(s)
• Boundary conditions for hold?
  • How to estimate set_min_delay accurately?
• Crosstalk effects of top level clock tree
  • How much margin is too much margin inside the blocks?
  • Using infinite timing windows inside the blocks is overkill
Charge 2: EDA tools/flows are inadequate for a construction flow: budgeting, IP models, hierarchical constraint development are lacking
Plaintiff: Amit Shaligram, STMicroelectronics
Defendant: Alex Rubin, IBM
Living in a flat world?
March 27, 2013
Long list of charges that simply don’t stick…
Many teams have used hierarchy successfully to tape out designs!
– Large problems require the use of “divide and conquer”.
Vast amount of design experience, understanding and overcoming practical challenges.
Tools help establish hand-shake across hierarchical levels.
– Verification of boundary conditions and assumptions.
– Automatic constraint generation and management.
– Enforcement of best design practices.
Significant body of “do’s and don’ts” to help provide guidance, improve efficiency and reduce pessimism.
Follow best hierarchical design practices
Flop bound the design!
Use single macro clock input!
Simple rules can make hierarchy easy(er)!
[Figure: two macros, Macro A (Flop 1) and Macro B (Flop 2); each flop shows D, Q and CLK pins]
Avoid critical paths crossing boundaries!
Isolate output loading from internal paths!
Object count per unit
[Bar chart: millions of objects (axis 0 to 3) per unit, Full Instance vs. Abstracted Instance; x-axis is unit name x number of reused instances: top level, Ax20, Bx14, Cx20, Dx14, Ex1, Fx1, Gx1, Hx1, Ix1]
[Bar charts: run time in hours (axis 0 to 300), Hierarchical Timing vs. Full Chip Timing, labeled PASTA and SAUCE, with Statistical Timing and Deterministic Timing panels; annotations: “5X Speedup” (shown twice) and “10+ days”]
Hierarchy is a “must have”!
Parallelizes timing and optimization of independent paths to improve overall efficiency.
Better supports timing closure when different macros / top level are at different “stages” of completeness.
Fosters an uninterrupted design fix-up loop.
More resilient to failure.
44M Objects!
Charge 3: You can never really close out-of-context + Misdemeanor charge: too much additional complexity and software
Plaintiff: Guntram Wolski, Cisco
Defendant: Alexander Skourikhin, Intel
Hierarchical Timing
Felonies or Misdemeanors?
Guntram Wolski – Cisco Systems
Principal Engineer
Enterprise Networking Group
• You can come close, but that only counts in …..
  Or if you start worst casing things, you’ve overdesigned…
• You can set goals/targets for blocks, but then reality sets in.
  You end up opening block as it is the “right thing to do” in order to close.
• Multiple instances of same core
  How do you wire over/through the cores?
  Wiring bays – what if you don’t have enough in some areas?
  Wire over the top == create new extraction/unique timing problems.
  Noise issues
  Every instance doesn’t have same IR drop/noise profile
• Requires strict PD requirements to be effective
  Very strict methodology to be effective
  Need flopped boundaries
  Long distance routes/fly-overs need extra handling or must be pushed down
  Legacy designs/IP integration cause immediate loss of benefit
  Integration/adoption complexity seems greater than with other tools
  Logic designers have very little interest in helping PD
    It’s good enough, live with it.
    I’m not paid to improve your problems, I just meet timing.
    I have to work on something else, you have to fix it.
• Are we leaving performance on the table?
  Subchips need to be designed to guardbanded conditions on I/Os and IR drop
• Why are we not looking at taking advantage of parallelism?
  Are these not many individual paths?
  If DRC can run on 120 cpus and benefit, why can’t timing?
  Break up the problem and distribute to my farm….
Charge 3: You can never really close out-of-context + Misdemeanor charge: too much additional complexity and software
Plaintiff: Guntram Wolski, Cisco
Defendant: Alexander Skourikhin, Intel
Defense
• Timing closure is an iterative process
• Controllability is the key for success
  • Start from initial spec
  • Once design is getting mature, gradually refine environmental requirements and increase model accuracy
  • Finally, you see the “real” timing requirements, avoiding overdesign
• Non-overdesigned multi-instantiated blocks are reality
  • Must see all the requirements (timing, parasitics) w/o worst casing
  • Clocks handling is the real challenge
  • Noise is never an issue (at most – make worst case between instances)
• Reusable IPs are feasible
  • Have to use accurate block models (adjustable to a new env.)
  • Have to apply design restrictions on interfaces
Defense (cont.)
• Have to apply methodological restrictions to block interfaces
  • Driver size, wire length, ports, etc.
  • All of them are manageable and ease integration on top level
  • Doesn’t necessarily lead to overdesign, due to accurate block models
  • Applicable to both flop and latch based designs
• Timing analysis is highly parallelizable
  • Individual block analysis is naturally done in parallel
  • Top level analysis might
    • leverage multi-threading technologies in STA algorithms
    • be divided in clusters and every cluster is analyzed in parallel
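The claim that individual block analysis is naturally parallel can be illustrated with a toy sketch. The block names and slack values are made up, and `worst_slack` is a hypothetical stand-in for a real per-block timing run; a production flow would more likely dispatch separate tool processes to a compute farm than use threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical data: block name -> path slacks in ns (negative = violation).
block_path_slacks: Dict[str, List[float]] = {
    "blk1": [0.12, 0.03, 0.40],
    "blk2": [-0.05, 0.20],
    "blk3": [0.08],
}


def worst_slack(block: str) -> float:
    """Stand-in for a per-block timing run: return the block's worst slack."""
    return min(block_path_slacks[block])


# The blocks do not depend on each other here, so they can run concurrently.
names = sorted(block_path_slacks)
with ThreadPoolExecutor(max_workers=3) as pool:
    # pool.map returns results in the same order as the inputs.
    results = dict(zip(names, pool.map(worst_slack, names)))

failing = [name for name, slack in results.items() if slack < 0]
print(results)
print("blocks needing work:", failing)
```

The independence that makes this trivial is exactly what flopped, well-budgeted block boundaries are meant to provide; paths that cross block boundaries still have to be timed at the top level.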
Summary
• Efficient and Reliable Hierarchical Flow requires two essential factors:
  • A robust project methodology, which
    • Enforces design restrictions
    • Takes advantage of IP Reuse
    • Provides continuous timing picture throughout all project phases
    • Allows productive ECO work
  • Advanced EDA tools, which
    • Are flexible and allow controllability between accuracy and simplicity
    • Can efficiently handle Multi-X environments (X=system, corner, clocks, etc.)
    • Utilize parallel computing techniques
    • Support batch and ECO modes
Charge 4: Hierarchical timing cannot handle multiple interacting synchronous clocks
Plaintiff: Larry Brown, IBM
Defendant: Igor Keller, Cadence
Hierarchical timing cannot handle multiple interacting synchronous clocks
Define the problem:
Definition continued
If clk1X is later than clk2X, we reduce our setup margin. If clk1X is earlier than clk2X, we reduce our hold margin.
We don’t know the real relationship between the two clocks until we have our top level established. This makes it difficult to close timing on the logic macro and “put it on the shelf.”
The problem is magnified if the logic macro is re-used. In that case, the setup and hold margins of the logic macro must span all existing clk1X-clk2X relationships.
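A small worked example may help show why a re-used macro has to span every clk1X-clk2X relationship. The sketch assumes that clk1X clocks the launching flop and clk2X the capturing flop (inferred from the setup/hold statements above, since the original figure is not available), uses textbook single-cycle setup and hold equations, and invents all of the numbers.

```python
def setup_margin(period, skew, d_max, t_setup):
    """Setup margin in ns; skew = clk1X arrival minus clk2X arrival."""
    return period - skew - d_max - t_setup


def hold_margin(skew, d_min, t_hold):
    """Hold margin in ns; an early clk1X (negative skew) eats into it."""
    return d_min + skew - t_hold


# Hypothetical macro numbers, all in ns.
PERIOD, D_MAX, D_MIN, T_SETUP, T_HOLD = 1.0, 0.70, 0.10, 0.05, 0.03

# Hypothetical clk1X-clk2X skews seen at three re-use sites.
skews = [-0.04, 0.00, 0.06]

# A shelved macro must meet timing for every relationship it will see:
# setup is worst at the largest skew, hold at the smallest.
worst_setup = min(setup_margin(PERIOD, s, D_MAX, T_SETUP) for s in skews)
worst_hold = min(hold_margin(s, D_MIN, T_HOLD) for s in skews)

print(f"worst setup margin: {worst_setup:.2f} ns")
print(f"worst hold margin:  {worst_hold:.2f} ns")
```

With zero skew the same macro would have 0.25 ns of setup margin and 0.07 ns of hold margin; spanning the assumed skew range costs 0.06 ns of setup and 0.04 ns of hold.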
Fixes from timing methodology
Option 1: Assert an uncertainty between clk1X and clk2X in macro timing, and validate this uncertainty when running top level timing.
Problems with this:
Leave performance/area on the table by lowering cycle time and/or over-padding hold fails.
If top level can’t meet this requirement, we must open up logic macro for further work.
Option 2: ???
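Option 1 amounts to a two-step handshake: close the macro against an asserted clk1X-clk2X uncertainty, then check each instance at the top level against that assertion. A minimal sketch of the second step follows, assuming a symmetric uncertainty and invented arrival times; `instances_to_reopen` is a hypothetical helper, not a tool command.

```python
from typing import Dict, List, Tuple

# Uncertainty asserted between clk1X and clk2X when the macro was timed (ns).
ASSERTED_UNCERTAINTY_NS = 0.05

# Hypothetical top-level results: instance -> (clk1X arrival, clk2X arrival) in ns.
top_level_arrivals: Dict[str, Tuple[float, float]] = {
    "macro_inst0": (1.20, 1.22),
    "macro_inst1": (1.31, 1.24),
    "macro_inst2": (1.18, 1.18),
}


def instances_to_reopen(
    arrivals: Dict[str, Tuple[float, float]], limit_ns: float
) -> List[str]:
    """List instances whose actual clock skew exceeds the asserted uncertainty."""
    return sorted(
        name
        for name, (clk1x, clk2x) in arrivals.items()
        if abs(clk1x - clk2x) > limit_ns
    )


violations = instances_to_reopen(top_level_arrivals, ASSERTED_UNCERTAINTY_NS)
print("asserted uncertainty violated at:", violations)
```

Any instance in the list either forces top-level clock rework or means opening up the logic macro, which is the cost the plaintiff describes.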
The best solution: Fix the design
Update the design so we do not have multiple synchronous clock inputs in the first place.
Conclusion
Perhaps it’s more accurate to say that hierarchical timing can handle multiple synchronous clock inputs, but cannot do this without leaving performance and/or area on the table. In other words, it does not lead to the most efficient design.
Charge 4: Hierarchical timing cannot handle multiple interacting synchronous clocks
Plaintiff: Larry Brown, IBM
Defendant: Igor Keller, Cadence
Defense:
First and foremost, defendant pleads not guilty
The charge from plaintiff only means that there is no free lunch
For Hierarchical Timing to work, designers must follow certain rules
They are well described in Alex Rubin’s defense
Specifically, one should have a single clock pin in a block to avoid extra pessimism in hold/setup timing
In the case of multiple clock pins, plaintiff himself exonerated defendant by proposing a solution: it is possible to remove some of the pessimism by describing the relationship between the two clocks
Defense (cont.)
Advanced SI analysis today reduces pessimism if victim and aggressor share the same clock
SI analysis also becomes more problematic with multiple clock pins
With multiple clock pins one assumes the clocks are different, leading to
Pessimism if uncertainty is assigned to both pins
Optimism if no uncertainty is assigned
As is often true, the best way to resolve a problem is to avoid creating it: stick to the rules of hierarchy-friendly design methodology
Ways to Remove the Limitation
There are ways to define the relationship between two internal clocks:
Through parent external clock
Explicitly define ranges of skews
Parameterization of timing models with skew on two clocks is possible
These enhancements are feasible but need to be driven by real commercial interest
Q & A Verdicts Damages!!!