unit 4: floorplanning - 國立臺灣大學cc.ee.ntu.edu.tw/~ywchang/courses/pd/unit4p1.pdf · y.-w....
TRANSCRIPT
Y.-W. ChangUnit 4 1
Unit 4: Floorplanning
․Course contents: Normalized polish expression for
slicing floorplans B*-trees for compacted and
large-scale floorplans, floorplanning with various constraints
P*-admissible floorplanrepresentations (Sequence pair, TCG, etc)
Comparisons on recently developed floorplanrepresentations
ILP for general floorplans․Readings
W&C&C: Chapter 10 S&Y: Chapter 3
Y.-W. ChangUnit 4 2
Floorplanning/Placement․Partitioning leads to
Blocks with well-defined areas and shapes (hard/rigid blocks). Blocks with approximated areas and no particular shapes
(soft/flexible blocks). A netlist specifying connections between the blocks.
․Objectives Find locations for all blocks. Consider shapes of soft block and pin locations of all the
blocks.
Y.-W. ChangUnit 4 3
Floorplanning Problem․Inputs:
A set of blocks, hard or soft. Pin locations of hard blocks. A netlist.
․Objectives: minimize area, reduce wirelength for (critical) nets, maximize routability (minimize congestion), determine shapes of soft blocks
․Is NP-hard (cf. 2-D bin packing)
Y.-W. ChangUnit 4 4
Floorplan Design
Y.-W. ChangUnit 4 5
Early Layout Decision with Floorplanning
․Floorplanning gives early chip design feedback Suggests valuable architectural modifications Estimates chip area and delay & congestion due to wiring Many more, e.g., buffer location, IR drop, etc.
Apple A5 die with dual ARM cores
Y.-W. ChangUnit 4 6
Slicing Floorplan Structure․Rectangular dissection: Subdivision of a given rectangle by a finite
# of horizontal and vertical line segments into a finite # of non-overlapping rectangles.
․Slicing structure: a rectangular dissection that can be obtained by repetitively subdividing rectangles horizontally or vertically.
․Slicing tree: A binary tree, where each internal node represents a vertical cut line or horizontal cut line, and each leaf a basic rectangle.
․Skewed slicing tree: One in which no node and its right child are the same.
Y.-W. ChangUnit 4 7
Floorplan Order
․Wheel: The smallest non-slicing floorplans (Wang and Wong, TCAD, Aug. 92).
․Order of a floorplan: a slicing floorplan is of order 2.․Floorplan tree: A tree representing the hierarchy of partitioning.
Y.-W. ChangUnit 4 8
Polar Graphs for General Floorplans․Ohtsuki, Suzigama, and Hawanishi, “An optimization technique for
integrated circuit layout design,” ICCST-70.․ vertex: channel segment; edge (weight): block (dimension)․Question: How to manipulate the graphs?
vertical polar graph
horizontal polar graph
Y.-W. ChangUnit 4 9
Slicing Floorplan Design by Simulated Annealing
․Wong & Liu, “A new algorithm for floorplan design,” DAC-86.
․Kirkpatrick, Gelatt, and Vecchi, “Optimization by simulated annealing,” Science, May 1983.
Y.-W. ChangUnit 4 10
Simulated Annealing Basics
․ Non-zero probability for “up-hill” moves.․ Probability depends on
1. magnitude of the “up-hill” movement2. total search time
․ C = cost(S') - Cost(S)․ T: Control parameter (temperature)․ Annealing schedule: T=T0, T1, T2, …,
where Ti = ri T0,r < 1.
f(x) = e-x
x
1
p = min{1, e-ΔC/T}
Y.-W. ChangUnit 4 11
Generic Simulated Annealing Algorithm1 begin2 Get an initial solution S; 3 Get an initial temperature T > 0; 4 while not yet “frozen” do5 for 1 i P do6 Pick a random neighbor S' of S;7 cost(S') - cost(S);
/* downhill move */8 if 0 then S S'
/* uphill move */9 if > 0 then S S' with probability ;10 T rT; /* reduce temperature */ 11 return S;12 end
Y.-W. ChangUnit 4 12
Basic Ingredients for Simulated Annealing
․Analogy:
․Basic Ingredients for Simulated Annealing: Solution space: What are the feasible solutions? Neighborhood structure: How to find a neighboring
solution from the current one? Cost function: How to evaluate the quality of a
solution? Annealing schedule: How to conduct the search
process to find a desired solution?
Y.-W. ChangUnit 4 13
Solution Representation․An expression E = e1 e2… e2n-1, where ei {1, 2, …, n, H, V},
1 i 2n-1, is a Polish expression of length 2n-1 iff1. every operand j, 1 j n, appears exactly once in E;2. (the balloting property) for every subexpression Ei = e1 … ei,
1 i 2n-1, # operands > # operators.
․Polish expression Postorder traversal.․ ijH: rectangle i on bottom of j; ijV: rectangle i on the left of j.
Y.-W. ChangUnit 4 14
Redundant Representation
․Question: How to eliminate ambiguous representation?
Y.-W. ChangUnit 4 15
Normalized Polish Expression
․A Polish expression E = e1 e2 … e2n-1 is called normalized iff E has no consecutive operators of the same type (H or V).
․Given a normalized Polish expression, we can construct a unique rectangular slicing structure.
Y.-W. ChangUnit 4 16
Neighborhood Structure
․ Chain: HVHVH … or VHVHV …
․Adjacent: 1 and 6 are adjacent operands; 2 and 7 are adjacent operands; 5 and V are adjacent operand and operator.
․3 types of moves: M1 (Operand Swap): Swap two adjacent operands. M2 (Chain Invert): Complement some chain (V = H, H = V). M3 (Operator/Operand Swap): Swap two adjacent operand
and operator.
Y.-W. ChangUnit 4 17
Effects of Perturbation
․Question: The balloting property holds during the moves? M1 and M2 moves are OK. Check the M3 moves! Reject “illegal” M3 moves.
․Check M3 moves: Assume that M3 swaps the operand ei with the operator ei+1, 1 i k-1. Then, the swap will not violate the balloting property iff 2Ni+1 < i. Nk: # of operators in the Polish expression E = e1 e2 … ek,
1 k 2n-1
1 23
4
Y.-W. ChangUnit 4 18
Cost Function․ = A + W.
A: area of the smallest rectangle W: overall wiring length : user-specified parameter
․W= ijcij dij. cij: # of connections between blocks
i and j. dij: center-to-center distance
between basic rectangles i and j.
1 23
4
Y.-W. ChangUnit 4 19
Area Computation for Hard Blocks
․Wiring cost? Center-to-center interconnection length
• Allow rotation: each block has up to two implementations.
Y.-W. ChangUnit 4 20
Incremental Computation of Cost Function
․Each move leads to only a minor modification of the Polish expression.
․At most two paths of the slicing tree need to be updated for each move.
Y.-W. ChangUnit 4 21
Incremental Computation of Cost Function (cont'd)
Y.-W. ChangUnit 4 22
Annealing Schedule
․Initial solution: 12V3V … nV.
․Ti = ri T0, i = 1, 2, 3, …; r =0.85.․At each temperature, try kn moves (k = 5-10).․Terminate the annealing process if
# of accepted moves < 5%, temperature is low enough, or run out of time.
Y.-W. ChangUnit 4 23
1 begin2 E 12V3V4V … nV; /* initial solution */3 Best E; T0 ; M MT uphill 0; N = kn; 4 repeat 5 MT uphill reject 0; 6 repeat 7 SelectMove(M); 8 Case M of 9 M1: Select two adjacent operands ei and ej; NE Swap(E, ei, ej);10 M2: Select a nonzero length chain C; NE Complement(E, C);11 M3: done FALSE;12 while not (done) do13 Choice 1: Select two adjacent operand ei and operator ei+1;14 if (ei-1 ei+1) and (2 Ni+1 < i) then done TRUE; 13’ Choice 2: Select two adjacent operator ei and operand ei+1;14’ if (ei ei+2) then done TRUE; 15 NE Swap(E, ei, ei+1);16 MT MT+1; cost cost(NE) - cost(E);17 if (cost 0) or (Random < )18 then19 if (cost > 0) then uphill uphill + 1;20 E NE;21 if cost(E) < cost(best) then best E;22 else reject reject + 1; 23 until (uphill > N) or (MT > 2N); 24 T rT; /* reduce temperature */25 until (reject/MT > 0.95) or (T < ) or OutOfTime; 26 end
Algorithm: Wong-Liu (P, , r, k)
Y.-W. ChangUnit 4 24
Shape Curve for Floorplan Sizing․A soft (flexible) block b can have different aspect ratios, with
a fixed area A. ․The shape function of b is a hyperbola: xy = A for width x
and height y. ․Very thin blocks are often not interesting and feasible
Add two straight lines for the constraints on aspect ratios. Aspect ratio: r y/x s.
y = sx
y = rx
legal shapesy
xx
y
Y.-W. ChangUnit 4 25
Shape Curve․Since a basic block is built from discrete transistors, its shape
function contains discrete points on the hyperbola.․For a rigid/hard block, it can only be rotated and mirrored
during floorplanning or placement. ․Feasible region: points with corresponding dimensions that
can accommodate/implement the block ․Use a piecewise linear function with corner points to
approximate any shape function.
Shape curve of a 2 4 hard block
y
x
feasibleregion y
x
Corner points
Y.-W. ChangUnit 4 26
Vertical Abutment
․Composition by vertical abutment (horizontal cut) addition of shape functions.
IV dominates II (IV is a better implementation.)
Y.-W. ChangUnit 4 27
Deriving Shapes of Children
․A choice for the minimal shape of a composite blockfixes the shapes of its children blocks.
Y.-W. ChangUnit 4 28
Area Computation for Hard Blocks Revisited
• Allow rotation
Y.-W. ChangUnit 4 29
Feasible Implementations․Shape curves for different kinds of constraints where
the shaded areas are feasible regions.
(a) hard, fixed orientation
(b) hard, free orientation
(c) soft, fixedorientation
(d) soft, freeorientation
Y.-W. ChangUnit 4 30
Slicing Floorplan Sizing
․The shape functions of all leaf blocks are given as piecewise linear functions.
․Traverse the slicing tree to compute the shape functions of all composite blocks (bottom-up composition).
․Choose the desired shape of the top-level block Only the corner points of the function need to be evaluated for
area minimization.․Propagate the consequences of the choice down to the
leaf blocks (top-down propagation).․The sizing algorithm runs in polynomial time for slicing
floorplans NP-complete for general (non-slicing) floorplans
Y.-W. ChangUnit 4 31
b0
B*-Tree: Compacted Floorplan Representation․ Chang, et al., “B*-tree: A new representation for non-
slicing floorplans,” DAC-2k. Compact modules to left and bottom. Construct an ordered binary tree (B*-tree).
Left child: the lowest, adjacent block on the right (xj = xi + wi). Right child: the first block above, with the same x-coordinate
(xj = xi).
b0
b7
b8
b9b1 b2
b3
b6b5
b4
b7
b8
b9b1 b2
b3
b6b5
b4
n0
n7
n8
n9
n1
n2
n3
n4
n5
n6
A non-slicing floorplan Compact to left and down B*-tree
Y.-W. ChangUnit 4 32
n0
n2 n5
n3
n1
n4
n6
n8
n9
n7
n10
n11
B*-tree Packing․x-coordinates can be determined by the tree structure.
Left child: the lowest, adjacent block on the right (xj = xi + wi). Right child: the first block above, with the same x-coordinate
(xj = xi).․y-coordinates?
b7
b9
b8 b11
b10
b0
b1
b5
b2b4
b6
b3
b0
b7
b1
x7= x0+w0
x1= x0
(x0,y0)
w0
Y.-W. ChangUnit 4 33b7
b9
b8 b11
b10
b0
b1 b4b3
phorizontal contour
vertical contour
Computing y-coordinates․Horizontal contour: Use a doubly linked list to record the
current maximum y-coordinate for each x-range ․Based on the B*-tree topology and block width, make the
maximum y value within the x-ranges the y-coordinate․Reduce the complexity of computing a y-coordinate to
amortized O(1) time & linear time overall
n3
p
B*-tree topology
Y.-W. ChangUnit 4 34
B*-Tree Perturbation
․Op1: rotate a macro (same tree topology)․Op2: delete & insert․Op3: swap 2 nodes․Op4: resize a soft macro (same topology)
n0
n2 n5
n3
n1
n4
n6
n8
n9
n7
n10
n11
n0
n2 n5
n3
n4
n6
n8
n9
n7
n10
n11
n1
n0
n2 n5
n3
n4
n6
n8
n9
n7
n10 n11
n1
Op3(n1, n7) Op2(n11, n6)
35
B*-tree Based Placer/Floorplanner
By Tung-Chieh Chen
http://eda.ee.ntu.edu.tw/research.htm/
․ Rated the best representation for packing in [Chan, et. al, ISPD-05]
Y.-W. Chang 36
Diverse B*-tree Applications
Reconfigurable computing [ICCAD’04]& 3D IC flooprlanning
Microfluidic Biochip scheduling [DAC’06] Logistics facility layout [ICLEM’10]
Analog layout [ICCAD’02, ‘11; DAC’07, ’08, ’13, ‘16]
NOC implementation[ICRCF’09]
Test schedule of 24 TAM width; 28 cores, test time: 287,999 cc
[ASP-DAC’05]
Y.-W. ChangUnit 4 37
B*-tree Strengths․Binary tree based, efficient and easy․Transformation between a tree and its placement takes
only linear time (cf. O(n2) or O(n lg lgn) for sequence pair to be introduced shortly)
․Operate on only one B*-tree (vs. two O-trees)․Can evaluate area cost incrementally․Smaller solution space: only O(n! 4n/n1.5) combinations
(vs. O((n!)2) for sequence pair)․Flexible for dealing with various placement constraints
by augmenting the B*-tree data structure (e.g., preplaced, symmetry, alignment, bus position) and rectilinear modules
․Directly corresponds to hierarchical and multilevel frameworks for large-scale floorplan designs
Y.-W. ChangUnit 4 38
Weaknesses of Tree-Based Representations
․Representation may change after packing.
․Is only a partially topological representation; less flexible than a fully topological representation B*-tree can represent only
compacted placement․Adjacency information is
incomplete
(a)1
3
42
n1
n3
n4
n2
1
2 3
4
B*-tree??
May need to integrate with another data structure to remedy this insufficiency, e.g., with corner stitching [Tsao et al., ICCAD-11; Wu et al., DAC-16] (see Appendices C & D)
Y.-W. ChangUnit 4 39
P*-admissible Solution Space․ P-admissible solution space for Problem P (Murata et al.,
ICCAD-95)1. the solution space is finite, 2. every solution is feasible, 3. evaluation for each configuration is possible in polynomial time
and so is the implementation of the corresponding configuration, and
4. the configuration corresponding to the best evaluated solution in the space coincides with an optimal solution of P.
․ P*-admissible solution space (Lin & Chang, DAC-02, TCAD-04)5. The relationship between any two blocks is defined in the
representation (fully topological representation).․ Slicing floorplan is not P-admissible. Why?․ B*-trees are not a P*-admissible representation.․ A P*-admissible floorplan representation: Sequence Pair.
Y.-W. ChangUnit 4 40
Sequence Pair (SP)․Murata, Fujiyoshi, Nakatake, Kajitani, “Rectangle-Packing Based
Module Placement,” ICCAD-95.․Represent a packing by a pair of module-name sequences (e.g.,
(abdecf, cbfade)). Size of the solution space: (n!)2
․Correspond all pairs of the sequences to a P-admissible (P*-admissible) solution space.
․Search in the P-admissible (P*-admissible) solution space (by SA). Swap two nodes only in a sequence Swap two nodes in both sequences
Y.-W. ChangUnit 4 41
Relative Module Positions․ A floorplan is a partition of a chip into rooms, each containing at
most one block.․ Locus (right-up, left-down, up-left, down-right)
1. Take a non-empty room.2. Start at the center of the room, walk in two alternating
directions to hit the sides of rooms or previous loci.3. Continue until to reach a corner of the chip.
․ Positive locus +: Union of right-up locus and left-down locus.․ Negative locus -: Union of up-left locus and down-right locus.
Y.-W. ChangUnit 4 42
Geometrical Information
․No pair of positive (negative) loci cross each other, i.e., loci are linearly ordered.
․SP uses two sequences (+, -) to represent a floorplan. H-constraint: (..a..b.., ..a..b..) iff a is on the left of b V-constraint: (..a..b..,..b..a..) iff b is below a
(+, -) = (abdecf, cbfade)
Y.-W. ChangUnit 4 43
(+, -)-Packing․For every SP (+, -), there is a (+, -) packing.․Horizontal constraint graph GH(V, E) (similarly, GV(V, E)):
V: source s, sink t, n vertices for modules. E: (s, x) and (x, t) for each module x, and (x, y) iff x must be left
to y. Vertex weight: 0 for s and t, module widths for other vertices.
Y.-W. ChangUnit 4 44
Cost Evaluation․Optimal (+, -)-packing can be obtained in O(n2) time by
applying a longest path algorithm on a vertex-weighted directed acyclic graph. GH and GV are independent. The X and Y coordinates of each module are the values of the
longest path lengths between s and the corresponding vertex in GH and GV, respectively.
․Cost evaluation can be done in O(n lg lg n) time by computing the longest common subsequence of the two sequences (Tang & Wong, DATE-2K, ASP-DAC-01)
Y.-W. ChangUnit 4 45
P*-admissible Transitive Closure Graph (TCG)․Lin & Chang, “TCG: A transitive closure graph representation
for non-slicing floorplans,” DAC-01 (TVLSI-04)․TCG = (Ch, Cv): pair of vertical and horizontal constraint
graphs. Ch (Cv) represents the horizontal (vertical) geometric relations
between modules. Transforms diagonal relations into horizontal relations if no
vertical constraints among modules.․Vertex weights denote module widths (heights).
mamb
mc md
me
mamb
mc
ma
md
ma
me
b1mb
mc
mb
md
me
b2
mc md
me
b3 b4
b5
Ch Cv
n2n2
4
6
n3 n37 4
n4
n4
6
3n5 n53 2
n1 n16 4
Y.-W. ChangUnit 4 46
Feasibility of TCG1. Ch and Cv are acyclic.
2. Each pair of nodes must be connected by exactly one edge either in Ch or in Cv .
3. The transitive closure of Ch ( Cv ) is equal to Ch ( Cv) itself.
Ch Cv
n5
n1
n5
n3
n2
n42
4
4
n1
n3n2
n4
37
64
63
6
Ch Cv
n5
n1
n5
n3
n2
n42
4
4
n1
n3
n2
n4
37
64
63
6
b1b2
b3 b4
b5
b1b2
b3 b4
b5
b5
Infeasible placement
Y.-W. ChangUnit 4 47
Feasibility of TCG2. Each pair of nodes must be connected by exactly one edge either in
Ch or in Cv to prevent any overlap among modules. n5
n1
n5
n3
n2
n42
4
4
n1
n3
n2
n4
37
64
63
6 b1b2
b3 b4
b5
b1b2
b4b5b3 b3 and b5
overlap
Ch Cv
6Ch Cv
n5
n1
n5
n3
n2
n42
4
4
n1
n3
n2
n4
37
64
63
Y.-W. ChangUnit 4 48
Feasibility of TCG3. The transitive closure of Ch ( Cv ) is equal to Ch ( Cv ) itself to reduce
the solution space.
The transitive closure of a directed graph G=(V, E) is the graphG’ =(V, E’), where E’={(ni, nj): there is a path from ni to nj in G}.n5
n1
n5
n3
n2
n42
4
4
n1
n3
n2
n4
37
64
63
6 b1b2
b3 b4
b5
Ch Cv
Ch Cv
n5
n1
n5
n3
n2
n42
4
4
n1
n3
n2
n4
37
64
63
6 b1b2
b3 b4
b5
Redundantsolution
Y.-W. ChangUnit 4 49
Boundary Modules in TCG․ bi is a left (right) boundary module if ni’s in-degree (out-degree) in
Ch is zero.․ bi is a bottom (top) boundary module if ni’s in-degree (out-degree)
in Cv is zero.
4
1
3
25
7
6n1
Ch Cv
n2
2
n3
4
n6
n7
4
7
n5
n43
n1
n2
4
n3
5
n6
n77
n5
n4
3
10
6
3
4
1
3
2
The in-degrees of n1, n2, and n3 are zero
Y.-W. ChangUnit 4 50
P*-admissible TCG-S: Combination of Two․ Lin & Chang, “TCG-S: An orthogonal coupling of P*-
admissible representations for general floorplans,” DAC-2002 (TCAD-2004)
․TCG-S = (Ch, Cv , s): TCG + a module sequence. s represents the packing sequence of modules
Iteratively traverse the module in the leftmost with all modules below it having been traversed.
Is used for speeding up the packing scheme (O(n lg n) time).․ Leads to faster convergence speed and more stable results.
s: b1 b2 b3 b5 b4
mamb
mc md
me
mamb
mc
ma
md
ma
me
b1mb
mc
mb
md
me
b2
mc md
me
b3 b4
b5
Ch Cv
n2
n2
4
6
n3 n37 4
n4
n4
6
3n5 n53 2
n1 n16 4
Y.-W. Chang
TCG
Unit 4 51
Convergence Speed & Stability
․Convergence speed and stability are two important criteria to evaluate the quality of a representation.
․TCG-S converges very fast and is very stable.․Convergence speed and stability: TCG-S > TCG > SP.
SP TCG TCG-S
Y.-W. ChangUnit 4 52
Comparisons
3O(n)CS
3O(n)O(n!22n/n1.5)O-tree
3O(n)O(n!22n/n1.5)B*-tree
4OO(n lg n)(n!)2TCG-S
4OO(n2)(n!)2TCG
O(n)??
4OO(n2)n! C(n2, n)BSG
4OO(n2)(n!)2SP
FlexibilityGuarantee Feasible
Perturbations?Packing
TimeSolution Space
Flexibility: Can represent 4 (general; P*-admissible);3 (compacted; P-admissible); 2 (mosaic); 1 (slicing)
NormalizedPolish Exp.
O(n!22.6n/n1.5) O(n) O 1
CBL O(n!23n) x
2O((n!)2)
Represent.
Y.-W. ChangUnit 4 53
Soft Block Sizing․Key: Line up a soft block with adjacent blocks
Each soft block has four candidates for the block dimension change to optimize the cost.
․Advantage: fast and reasonably effective․Similar idea by Chi and Chen Chi, Chung Yuan Journal,
2003.
b2b1b2b0
b4
b3
b5
L3 R3
T3
B3
b0 b1
b4
b3
b5
(a) (b)
xx
y y
Y.-W. ChangUnit 4 54
Coping with Pre-placed Modules․If there are modules ahead or lower than bi so that bi
cannot be placed at its fixed position (xfi, yf
i), exchange bi with the module in Di = {bj | (xj, yj) (xf
i, yfi)} that is
closest to (xfi,yf
i).․Fix the placement for affected modules (as in slack-
based sizing)
Y.-W. ChangUnit 4 55
Coping with Rectilinear Blocks․Wu, Chang, Chang, “Rectilinear block placement using
B*-trees,“ ACM TODAES, 2003 (ICCD-01)․Convex block: partition a rectilinear block into
rectangular sub-blocks․Concave block: fill the concave holes of a concave
block to make the block convex (other approximate schemes?)
filled area
b2b1
Convex blockConcave block
Y.-W. ChangUnit 4 56
b3
b4
b1
b2
0 2
1
0
b1
b2
1
b3
2
b4
location constraint
Location Constraint for Rectilinear Blocks
․Location Constraint: Keep b2 as b1’s left child in the tree.․x-coordinates can be correctly determinated
y-coordinates? Align sub-blocks, if necessary.
․Treat the sub-blocks of a block as a whole during processing.
b2
b1b1b1b1b1b1b1b1b1
Align blocks to fix y-coordinates
Y.-W. ChangUnit 4 57
Coping with Symmetry Blocks for Analog Placement
Schematic of the CMOS comparator
․Some pairs of devices need to be placed symmetrically w.r.t. common axes to reduce parasitic mismatches and circuit sensitivities to thermal gradients or process variations for better electrical effects and higher performance
․Form symmetry islands that keep devices of the same symmetry group placed at the closest proximity
Placement with symmetry constraint
Y.-W. Chang 58
Basic Analog Placement Constraints
․Device matching/symmetry Reduce mismatches which may
cause higher offset voltages or degrade power-supply rejection ratio
Inter-digitized or common centroidplacement: current mirrors, differential pairs, ratioed devices
Mirrored placement w.r.t. the vertical or horizontal symmetry axis: differential circuits
․Device proximity Place matched devices together or
closely to reduce the impact of local variations during fabrication
common centroid placement
Symmetry placementN-well PMOS
P-substrate NMOSDevice-proximity
placement
Y.-W. Chang 59
Advanced Analog Placement Constraints1. Preplaced constraint2. Fixed-outline constraint3. Variant constraint4. Boundary constraint5. Minimum spacing constraint6. Hierarchical constraints7. Layout design hierarchy8. Regularity constraints9. Thermal consideration
Proximity
Matching
Symmetry
Proximity
Regular structure
Hierarchical constraintsPreplaced modules
changeable aspect ratio[Source: Springsoft & Po-Hung Lin]
symmetry
Y.-W. ChangUnit 4 60
Symmetry Placements
․Symmetry groups: S = {b0s, (b1, b1’), b2
s, b3s}
․Symmetry pair: (b1, b1’)․Self-symmetry blocks: b0
s, b2s, b3
s
left boundary block
b1
b3s
b0s
b1’
b0r
b3r
b1rb2
rb2s
n3r
n2r
n1r
n0r
rightmostbranch
bottom boundary blockS = {(b0, b0’), b1s, b2
s, b3s}
b1s
b0’
b0
b3s
b3r
b0r
b1r
b2s n0
r
n1r
n2r
n3r
b2r
left mostbranch
verticalsymmetry
horizontalsymmetry
B*-trees with representative nodes/blocks
Y.-W. ChangUnit 4 61
Hierarchical B*-trees (HB*-trees)․Handle symmetry-island and non-symmetry blocks together
․Keep the horizontal contour for a hierarchy node representing a rectilinear symmetry island
b1s
b2
b6’
b6
b2’b5
b7s b8
sb4b3
n5
nS1
n3
n4
nS2
n2r
n1r
n6r
n7r
n8r
hierarchy node block nodeB*-subtree in a hierarchy node
n1r
n0r
n2r
b1 b1’b2 b2’
b0 b0’
S0
c00 c02nS0
n00
n01
n02
horizontal contour hierarchy nodecontour node
c01
Y.-W. ChangUnit 4 62
Example HB*-tree for Symmetry Placement
b11b10
b6
b12
b5b8
b7
n5
n9
n6
n7
b9
b13b14
n8 n10
n00
n01
n02 n11
n12
n10
n11
n12 n13
n14
nS0
nS1
b1 b1’b2 b2’
b0 b0’
b3 b3’ b4’b4
n1r
n0r
n2r
n3r
n4r
hierarchy node
contour node
symmetrymodule nodenon-symmetrymodule node
b1 b1’b2 b2’
b0 b0’
b3 b3’ b4’b4
symmetry axis
symmetry axis
Y.-W. Chang 63
Analog Placement Examples
Circuit biasynth_2p4g# of Mod. 65
# of Sym. Mod. 8 + 5 + 12Area utilization 95.53%
Runtime 22 sec
Circuit lnamixbias_2p4g# of Mod. 110
# of Sym. Mod. 16 + 6 + 6 + 12 + 4
Area utilization 94.59%Runtime 43 sec
Y.-W. Chang64
1st Linear-Time Algorithm for All the 10 Constraints․ Wu, Ou, Chang: “QB-trees: Towards an optimal topological
representation and its applications to analog layout designs,” DAC-16.
preplaced
symmetry
boundary
max sep.
․ Lu, Chang, Chang: “WB-trees: a topological representation for FinFETanalog layout designs,” DAC-18.
Y.-W. ChangUnit 4 65
Fixed-Outline Floorplanning․Chen and Chang, “Modern floorplanning based on fast
simulated annealing,” ISPD-05 (TCAD-06)․Fixed-outline floorplanning is more prevailing in modern
VLSI design
fixed outline
net bounding-boxnet
modules
․Input Modules, netlist, fixed outline
․Output Module positions, orientations
․Objectives Minimize the half-perimeter wirelength
(HPWL) All modules are within the fixed die (fixed-
outline constraint) and no overlaps occur between modules
Y.-W. ChangUnit 4 66
Fixed-Outline Constraints
․Two user-specified parameters: Γ: maximum white-space fraction, and R*: desired aspect ratio (height/width)
․The outline (height H* and width W*) is defined by
․Use the same formulation as Adya et al. (ICCD-01).
*/)1(**)1(* RAWARH
Γ=0.15H*
W*
R*=2H*
W*
R*=1 Γ=0.50
A(1+Γ ) = H*W*R* = H* / W*
Y.-W. ChangUnit 4 67
Cost Function for Fixed-Outline Floorplanning
․Cost for a floorplan F
A Chip areaArea weight
W WirelengthWirelength weight
R* Desired aspect ratioR Current floorplan aspect ratio
2)*)(1()( RRWAF
Chip area Wirelength Aspect ratio penalty
Y.-W. ChangUnit 4 68
Adaptive Simulated Annealing․The aspect ratio of the best floorplan area in the fixed
outline is not the same as that of the outline. ․Shall decrease the weight of aspect ratio penalty
(1 - α - β) to concentrate more on the floorplanwirelength/area optimization (i.e., increase α and β). Adopt an adaptive method to control the weights in
the cost function based on n most recent floorplans. The more feasible floorplans, the less aspect ratio
penalty.
(a) (b)
2
3 1
41
2 3
4
(a) (b)
Y.-W. ChangUnit 4 69
Fixed-Outline Floorplanning (1/2)
․Success probability vs. aspect ratio on circuit n100
0102030405060708090
100
1 1.5 2 2.5 3 3.5 4
Succ
ess r
ate
(%)
Aspect Ratio
Parquet-2
Ours
GFA
0
20
40
60
80
100
1 1.5 2 2.5 3 3.5 4
Succ
ess r
ate
(%)
Aspect Ratio
Parquet-2OursGFA
n100, Γ=10% Parquet-2: SP (TVLSI-2003)
GFA: NPE(ASPDAC-04)
Fast-SA: B*-tree (ISPD-05)
Avg. success rate 16.6% 30.3% 99.7%Avg. dead space 7.32% 6.26% 5.79%
Avg. dead space ratio 1.26 1.08 1.00Avg. runtime (sec) 40.2 44.5 27.6Avg. runtime ratio 1.46 1.61 1.00
Γ=15%
Γ=10%
Y.-W. ChangUnit 4 70
B*-tree Fixed-Outline Floorplanning Results
Circuit: n100
Circuit: ami49
Y.-W. ChangUnit 4 71
Multilevel B*-trees for Large-Scale Designs․Lee, Hsu, Chang, Yang, “Multilevel
floorplanning/placement for large-scale modules using B*-trees,” DAC-03 (TCAD-07)
․Two stages for MB*-tree: clustering followed by declustering
․Clustering Iteratively groups a set of modules based on area
utilization and module connectivity Constructs a B*-tree to keep the geometric relations
for the newly clustered modules․Declustering
Iteratively ungroups a set of the previously clustered modules (i.e., perform tree expansion)
Refines the solution using simulated annealing
Y.-W. ChangUnit 4 72
MB*-tree: Λ-Shaped Multilevel Floorplanning
Cluster the modules based on area and local connectivity andcreate clustered modules for the next level.
Recursively decluster the clusters and use simulated annealing to refine the floorplan.
single clustered moduleclustering
clustering declustering
declustering
clustered blockchip boundary
Y.-W. ChangUnit 4 73
Multilevel B*-tree Example
Y.-W. ChangUnit 4 74
Multilevel B*-tree Example (cont'd)
n7n6
n7n6n3 n3
n4
Y.-W. ChangUnit 4 75
Comparison among Representations
MB*-tree
MB*-treeMB*-tree
Y.-W. ChangUnit 4 76
Layout of ami49_200․MB*-tree: 9800 modules, dead space = 3.44%, CPU
time = 256 min (450 MHz SUN Ultra 60 workstation with 2 GB memory in 2002)
Y.-W. ChangUnit 4 77
MB*-tree: Λ-Shaped Multilevel FloorplanningCluster the modules based on area and local connectivity andcreate clustered modules for the next level.
Recursively decluster the clusters and use simulated annealing to refine the floorplan.
single clustered moduleclustering
clustering declustering
declustering
clustered blockchip boundary
Drawback: Does not have the view of the global configuration at the earlier stages
Y.-W. ChangUnit 4 78
Perform partitioning to the circuit and determine the global locations of modules for the next level.
Use the flat floorplanner to pack the modules in the partitions and legalize/refine the solution.
initial floorplan
partitioning
partitioning
merging/refinement
partitioned floorplan
final floorplan
merging/refinement
floorplan region
packed modulesfloating modules
overlap area
IMF: V-Shaped Multilevel Floorplanning
Consider the global interconnect at the earlier stages
․Chen, Chang, Lin, “IMF: Interconnect-driven floorplanningfor large-scale building-module designs,” ICCAD-05
Y.-W. ChangUnit 4 79
Design Flow
Chip outline
Bi-partition
Obtain partitioned result
Refine the merged region
Final floorplan
All regions have less than
n modules?
Merge two regions
Top level?Y
N
Y
N
Partitioning stage
Merging stage
Floorplan each region
Y.-W. ChangUnit 4 80
Stage 1: Partitioning Stage
․All modules are set to the center of the chip region initially.
․Partition the circuit recursively to minimize the interconnect and assign the regions of the modules.
․The partitioning stage continues until the number of modules in each partition is smaller than a threshold, and the partitioned floorplan is obtained.
Y.-W. ChangUnit 4 81
․Construct a B*-tree and find the sub-floorplans for each sub-region (fixed-outline floorplanning).
․Cost function for the simulated annealing: Fixed-outline floorplanning
․Merge two B*-trees (sub-floorplans) to form a new B*-tree (floorplan) recursively.
․Refine the merged sub-floorplan using fixed-outline floorplanning again.
2
*
*
321 ***
HW
HWkwirelengthkareakCost
Aspect ratio penalty
Stage 2: Merging Stage
Y.-W. ChangUnit 4 82
Vertical Merging
b4 b5
b6 b7
b0 b1
b2 b3
n0
n1 n2
n3
T1
n4
n5 n6
n7
T2
heightnew ≤ height1 + height2widthnew = max( width1, width2 )
heig
ht1
heig
ht2
width2
width1
b4 b5
b6 b7
b0 b1
b2 b3
widthnew
heig
htne
w
Make the root of the top B*-tree as the right child of the right-most node of the bottom B*-tree.
n0
n1 n2
n3
T1
n4
n5 n6
n7
T2
Y.-W. ChangUnit 4 83
b4 b5
b6 b7
b0 b1
b2 b3
n0
n1 n2
n3
T1 n4
n5 n6
n7
T2
heightnew = max( height1, height2)widthnew = width1 + width2
width1 width2he
ight
1
heig
ht2
widthnew
b4 b5
b6 b7
b0 b1
b2 b3
heig
htne
w
Horizontal Merging
Make the root of the right B*-tree as the left child of the node corresponding to the right-most module of
the left B*-tree.
n0
n1 n2
n3
T1
n4
n5 n6
n7
T2
Y.-W. ChangUnit 4 84
Resulting Floorplans
n300 (300 modules) ami49_200 (9800 modules)
Y.-W. ChangUnit 4 85
Floorplanning by Mathematical Programming․Sutanthavibul, Shragowitz, and Rosen, “An analytical
approach to floorplan design and optimization,” 27th DAC, 1990.
․Notation: wi, hi: width and height of module Mi. (xi, yi): coordinate of the lower left corner of module Mi. ai wi /hi bi: aspect ratio wi /hi of module Mi. (Note: We defined
aspect ratio as hi/wi before.)․Goal: Find a mixed integer linear programming (ILP)
formulation for the floorplan design. Linear constraints? Objective function?
Y.-W. ChangUnit 4 86
Nonoverlap Constraints․Two modules Mi and Mj are nonoverlap, if at least one of the
following linear constraints is satisfied (cases encoded by pij and qij):
․Let W, H be upper bounds on the floorplan width and height.․ Introduce two 0, 1 variables pij and qij to denote that one of the
above inequalities is enforced; e.g., pij = 0, qij = 1 yi + hi yj is satisfied
Y.-W. ChangUnit 4 87
Cost Function & Constraints
․Minimize Area = xy, nonlinear! (x, y: width and height of the resulting floorplan)
․How to fix? Fix the width W and minimize the height y!
․Four types of constraints:1. no two modules overlap ( i, j: 1 i < j n);2. each module is enclosed within a rectangle of width W
and height H (xi + wi W, yi + hi H, 1 i n);3. xi 0, yi 0, 1 i n;4. pij, qij {0, 1}.
․wi, hi are known.
Y.-W. ChangUnit 4 88
Mixed ILP for Floorplanning
․Size of the mixed ILP: for n modules, # continuous variables: O(n); # integer variables: O(n2); # linear
constraints: O(n2). Unacceptably huge program for a large n! (How to cope with it?)
․Popular LP software: LINDO, lp_solve, CPLEX, etc.
Y.-W. ChangUnit 4 89
Mixed ILP for Floorplanning (cont'd)
․For each module i with free orientation, associate a 0-1 variable ri: ri = 0: 0o rotation for module i. ri = 1: 90o rotation for module i.
․M = max{W, H}.
+-
Y.-W. ChangUnit 4 90
Flexible/Soft Modules․Assumptions: wi, hi are unknown; area lower bound: Ai.․Module size constraints: wi hi Ai; ai wi / hi bi.․Hence, ․wi hi Ai nonlinear! How to fix?
Can apply a first-order approximation of the equation: a line passing through (wmin, hmax) and (wmax, hmin).
Substitute i wi + ci for hi to form linear constraints (xi, yi, wi are unknown; i, j, ci, cj can be computed as above).
Y.-W. ChangUnit 4 91
Reducing the Size of the Mixed ILP․Time complexity of a mixed ILP: exponential!․Large size of the mixed ILP: # variables, # constraints: O(n2).
How to fix it?․Key: Solve a partial problem at each step (successive augmentation)․Questions:
How to select next subgroup of modules? linear ordering based on connectivity.
How to minimize the # of required variables?
Y.-W. ChangUnit 4 92
Reducing the Size of the Mixed ILP (cont'd)․Size of each successive mixed ILP depends on (1) # of modules in the
next group; (2) “size” of the partially constructed floorplan.․Keys to deal with (2)
Minimize the problem size of the partial floorplan. Replace the already placed modules by a set of covering rectangles. # rectangles is usually much smaller than # placed modules.
Y.-W. ChangUnit 4 93
Notes on ILP Based Approaches
․Always analyze the complexity # of variables # of constraints
․Always try to reduce the problem size Hierarchical (divide-and-conquer) Successive/progressive: 1D or 2D Other reduction method?
Hierarchical 1D progressive 2D progressive
Y.-W. ChangUnit 4 94
Summary on Floorplan Representations․Generic floorplanning objectives: (1) minimize area, (2) meet timing
constraints, (3) maximize routability (minimize congestion), ((4) determine shapes of soft modules)
․Existing representations Slicing: slicing tree (DAC-82), normalized Polished expression
(DAC-86, ASPDAC-05), generalized Polish expression (ASPDAC-04)
Mosaic: CBL (ICCAD-2k), Q-Sequence (AP-CAS-2k, DATE-02), Twin binary tree (ISPD-01)
Compacted: O-tree (DAC-99), B*-tree (DAC-2k), CS (TVLSI, 2003), T-tree (3D; ICCAD-04)
General: SP (ICCAD-95), BSG (ICCAD-96), TCG (DAC-01), TCG-S (DAC-02), ACG (ICCD-04), 3D-subTCG (ASPDAC-04), Sequence triplet (3D: IEICE)
general restricted
generalcompacted
mosaicslicing
Y.-W. ChangUnit 4 95
More on Floorplan Representations․P*-admissible representations: all representations for
general floorplans.․P-admissible, non-P*-admissible representations (for
area): all for compacted floorplans.․What makes a good representation?
Easy, effective, efficient, flexible, stable․Since each representation has its pros and cons, so
maybe we can Integrate two or more representations to get a better one
(e.g., TCG-S, DAC-02; CB-tree, ICCAD-11; QB-tree, DAC-16; WB-tree, DAC-18)
Apply different representations at different stages?
general restricted
generalcompacted
mosaicslicing
Y.-W. ChangUnit 4 96
Other Floorplanning Problems․ Soft module: shape curve (NPE, DAC-86), analytical techniques (DAC-90, DAC-
2k, ASPDAC-06), stretching range (B*-tree, DAC-2k), Lagrangian relaxation (SP, ISPD-2k), ICCAD-08 (SP), ISPD-12 (SP)
․ Preplaced module: ASPDAC-98 (BSG), ASPDAC-01 (SP), DAC-2K (B*-tree), ISCAS-01 (B*-tree), DAC-02 (TCG-S)
․ Symmetry module: DAC-99 (SP), ICCAD-02 (B*-tree), ASPDAC-05 (B*-tree), ICCAD-06 (SP), DAC-07 (B*-tree), DAC-08 (B*-tree), ICCAD-11 (B*-tree), DAC-16 (B*-tree + Quad-tree)
․ Rectilinear module: TCAD-2K (SP), ICCAD-98 (SP), ISPD-98 (SP), ISPD-01 (SP), ISPD-01 (O-tree), DATE-02 (TCG), TVLSI-02 (TCG), ICCD-2K (B*-tree), ACM TODAES-03 (B*-tree).
․ Fixed-outline constraint: ICCD-01 (SP), ASPDAC-04 (NPE), ISPD-05 (B*-tree), ISPD-07 (SP), DAC-08 (NPE)
․ Abutment constraint: (slicing), (SP), ICCD-04 (B*-tree)․ Bus-driven constraint: ICCAD-2003 (SP), ISPD-05 (B*-tree)․ Range constraint: ISPD-99 (NPE), ASPDAC-01 (SP), DAC-02 (TCG-S), ICCD-04
(B*-tree)․ Boundary constraint: ASPDAC-01 (SP), DAC-02 (TCG-S), IEE Proc.-02 (B*-tree),
DAC-10 (B*-tree)․ Double patterning: DAC-13 (B*-tree), DAC-15 (NPE), DAC-18 (B*-tree)․ FinFET constraint: DAC-18 (B*-tree)
Y.-W. ChangUnit 4 97
Other Floorplanning Problems (cont’d)․Large-scale module floorplanning/placement (MB*-tree, DAC-03;
IMF: ICCAD-05; NPE: DAC-08)․Mixed-size cell/block floorplanning/placement (ISPD-02, ASPDAC-
03, ICCAD-03, ICCAD-04, ISPD-05, DAC-07, ICCAD-08, DAC-14)․Co-synthesis with floorplanning
Buffer planning (ICCAD-99, ISPD-2K, DAC-01, ASPDAC-03) Wire planning (ICCAD-99) Noise-aware floorplanning (ASPDAC-03, ICCAD-04) Power supply planning (ASPDAC-01, DAC-04, ISPD-06) SoC test scheduling (ICCAD-03, ASPDAC-05) Architecture-driven floorplanning (DAC-04) Double patterning (DAC-13, B*-tree; DAC-15, NPE) 3D (2.5D) floorplanning/placement: (DAC-13, B*-tree)
․B*-tree has minimized the gap between the representations for slicing and non-slicing floorplans.
․B*-tree-v1.0, TCG, and TCG-S packages are available at http://eda.ee.ntu.edu.tw/research.htm.
Y.-W. ChangUnit 4 98
Macro Placement/Floorplanning․Mixed sized cell/block floorplanning/placement
(DAC-07, ICCAD-08, DAC-14) Apply floorplanning techniques for macros to
address various design constraints, e.g., range constraints, block rotation, block sizing
multi-domain floorplanning/placementfloorplanning w. range constraints
Y.-W. ChangUnit 4 99
Voltage-Island-Aware Floorplanning․Floorplanning with multiple supply voltages (voltage
islands): ICCAD-06, ICCAD-07
0
100
200
300
400
500
600
700
0 50 100 150 200 250 300 350 400
VDDH block
VDDL blockLevel shifter
better floorplan
VDDH power ring
VDDL power ring
VDDH power line
VDDL power line
Y.-W. ChangUnit 4 100
2.89V 2.95V
1.46V 2.23V1.8V
SM1
SM2
HM1
HM2 HM3
3V
SM2
SM1
HM2
HM1HM3
SM2
HM1
HM2 HM3
SM1
SM2
violation
Voltage-Drop-Aware Floorplanning․Floorplan co-synthesized with other circuit components
E.g., Power/ground networks for static/dynamic IR drop minimization (ISPD-06, ASP-DAC-10), buffer blocks for timing optimization, level shifters for multiple supply voltage designs (ICCAD-07), clock network synthesis
Y.-W. ChangUnit 4 101
Beyond 2D Floorplanning․Floorplanning for digital microfluidic biochips (DAC-06)․Floorplanning for reconfigurable computing (ICCAD-04)․2.5D/3D IC floorplan and power/ground network co-synthesis
(ASP-DAC-10)․2.5D/3D floorplanning for system-in-packages & interposer
(DAC-13)
Y.-W. ChangUnit 4 102
Appendix A:Fast Simulated Annealing
Chen and Chang, “Modern floorplanning based on fast simulated annealing,” ISPD-05 (TCAD-06)
Time
Temperature
Classical SA TimberWolf SA Fast-SA
Time
I II III
Time(a) (b) (c)
Y.-W. ChangUnit 4 103
Simulated Annealing Schedules․Classical simulated annealing (SA)
Non-zero probability for up-hill move: p = min{1, e-ΔC/T}
Initial temperature: T = |Δavg / ln p|, p is the initial acceptance rate (typically, close to 1.0), Δavg is the average cost of the up-hill moves
Classical temperature updating function: λ is set to a fixed value (e.g., 0.85)
Tnew = λTold, 0 < λ< 1 ․TimberWolf annealing schedule (Sechen and
Sangiovanni-Vincentelli, DAC-86) Increase λ gradually from its lowest value (0.8) to its
highest value (approximately 0.95) and then gradually decreases λ back to its lowest value.
Y.-W. ChangUnit 4 104
Fast Simulated Annealing․Chen and Chang, “Modern floorplanning based on fast
simulated annealing,” ISPD-05 (TCAD-06)․Comparisons for the temperature vs. search time:
․Fast Simulated Annealing (Fast-SA) consists of 3 stages High-temperature random search (temperature T -> ∞ ) Pseudo-greedy local search (T -> 0) Hill-climbing search (increase T to simulate regular SA)
Time
Temperature
Classical SA TimberWolf SA Fast-SA
Time
I II III
Time(a) (b) (c)
Y.-W. ChangUnit 4 105
Fast Simulated Annealing (2/2)
․Temperature update (T1: initial temperature)
․If is larger, temperature decreases slowly. ․If is smaller, temperature decreases quickly.
cos
cos
1ln
21 tr
1 t
avg rp
TT r krc
Tr k
r
Average uphill cost
Initial acceptance rate
Average cost change since the SA started
Number of iterations
User-specified parameters (e.g., c = 100, k = 7)
cost
avg
p
r
cost
cost
c, k
Y.-W. ChangUnit 4 106
Convergence and Stability for Fast-SA (1/2)
․Classical SATimberWolf SAFast-SA, k=1 (no greedy local search)Fast-SA, k=7
․Ran the circuit n100 for 10 times.
․Fast-SA has a better convergence speed than TimberWolf SA and classical SA.
Classical SA
1718192021222324252627
0 40 80 120 160 200Runtime (sec)
Area TimberWolf SA
1718192021222324252627
0 40 80 120 160 200Runtime (sec)
Area
Fast-SA, k = 1
1718192021222324252627
0 40 80 120 160 200Runtime (sec)
Area Fast-SA, k = 7
1718192021222324252627
0 40 80 120 160 200Runtime (sec)
Area
Y.-W. ChangUnit 4 107
Appendix B:Other floorplan representations
ba bbbt
bs
b1 b2
b3 b4
b5b6
ns n5 n6Ln4 nt
Y.-W. ChangUnit 4 108
Corner Sequence (CS)• Lin, Chang, Lin, “Corner sequence: A P-admissible floorplan
representation with linear-time packing scheme,” IEEE TVLSI 2003.
• Sequence of modules and their corresponding corners CS = <(S1, D1), (S2, D2), …, (Sm, Dm) >
– Si: a module
– Di: the corresponding bend for packing Si
CS = <(b1, [bs , bt ]), (b2, [b1 , bt ]),(b3, [b1 , b2 ]), (b4, [b3, bt ]),(b5, [bs , b3]), (b6, [b5 , b4 ]) >
ba bbbt
bs
b1 b2
b3 b4
b5b6
ns n5 n6Ln4 nt
Y.-W. ChangUnit 4 109
Corner Block List (CBL)․Hong, et. al., “Corner block list: An effective and efficient
topological representation of non-slicing floorplan,” ICCAD-2K.․Each room contains one and only one block (mosaic floorplan).․CBL = (S, L, T):
S: sequence of corner modules. L: List of module orientations (0: vertical T-junction; 1:
horizontal one). T: list of T-junction information (# of T-junctions with the
corner block).
S = (fcegbad), L = (001100), T = (001010010)
f b
ca
e
g
d
Y.-W. ChangUnit 4 110
Appendix C:
Integrated Floorplan Representation:
QB-Tree = Quadtree + B*-tree
I-P. Wu, H.-C. Ou, and Y.-W. Chang,“QB-Trees: Towards An Optimal Topological
Representation and Its Applications to Analog Layout Designs,” DAC-16
Y.-W. Chang
Quadtree Data Structure
․Quadtree data structure is often used to represent a partition of a two-dimensional region Regular decomposition Irregular decomposition
111
F G
H IC
DJ K
L M
A
B C ETL TR BL BR
D
F G ITL TR BL BR
H J K MTL TR BL BR
L
Y.-W. Chang
Construction Scheme of QB-trees․Partition the placement region with partition lines along the
preplaced modules’ boundaries recursively․Select partition lines with minimum cost in each recursion
112
pj
pi
, = , ,
,
,pk
pj
preplaced module
partition lines pj
partition line i
,
=
Y.-W. Chang
Packing Scheme of QB-trees․Adopt and revise the original B*-tree packing in each
sub-tree of one QB-tree․Construct a packing list to maintain the linear-time
packing complexity
113
p2
p1 b1
TL TR BL BR
p1
p2
n1
quadtree
B*-tree
n3
n4 n6
n5
n2
b2b3b4
b5b6
HEAD
n1
n2
n3
Y.-W. Chang
Proposed Analog Placement Flow
114
QB-tree Construction
Modules Netlists Constraints
QB-tree Perturbation
Look-ahead Constraint Checking
Candidate Generation
QB-tree Packing
Packing
Perturbation
success fail
Cost Evaluation
Meet termination conditions?
no
yesFinish
success
fail
Y.-W. Chang
Look-ahead Constraint Checking․Consist of two parts
Quadtree leaf recognition Constraint checking
․Filter out infeasible QB-tree configurations before packing Maximum separation/range/close-to-boundary constraints The runtime could be reduced by avoiding infeasible solutions
115
p2
p1
infeasible
n2
n1
p2
b1
b2
p1
Y.-W. Chang
Look-ahead Constraint Checking Complexity․Time complexity
Maximum separation constraint: Range constraint: Close-to-boundary constraint:
116
p2
p1
feasible
n1
p2
b1
b2
p1
n2
Y.-W. Chang
Candidate Generation․Generate feasible QB-tree configurations to facilitate the
placement flow․Consist of two parts
Candidate quadtree leaf collection Transformation from an infeasible configuration to a feasible one
․Time complexity by setting an upper bound of / on the traversed
quadtree leaves for each constraint
117
p2
p1 n1
p2
b1p1
Y.-W. Chang
Cost Evaluation․Define the cost of a placement solution as follows:
: normalized placement area : normalized placement wirelength : normalized out-of-bound area : normalized violation cost
118
Φ ,
max , 1, 0
violation of maximum separation constraint
violation of range constraint
violation of close-to-boundary constraint
Y.-W. ChangUnit 4 119
Appendix D:CB-tree: Corner Stitching
Compliant B*-treeRepresentation
Tsao, Chou, Huang, Chang, Lin, Chen, and Liu,“A Corner Stitching Compliant B*-tree Representation
and Its Applications to Analog Placement,” ICCAD-11
Y.-W. Chang
Corner Stitching [Ousterhout et al., TCAD’84]
․Data structure for representing non-overlapping rectangular modules in a tile plane
․All tiles are linked together at their corners by four corner stitches Solid tiles and empty tiles Empty tiles must be horizontal maximal
solid tile
empty tile
rt
trbl
lb rt : rightmost top neighbortr : topmost right neighborlb: leftmost bottom neighborbl: bottommost left neighbor Tile representation in corner stitching
120
Y.-W. Chang
Point Searching
․Find the tile which contains the given point S1: Select the starting tile randomly and move up or down
until reaching a tile whose vertical range contains the desired point
S2: Move left or right until reach a tile whose horizontal range contains the desired point
S3: Perform step 1 and 2 iteratively to locate the tile containing the desired point
121
T
A
B
CP
rt
bl
Y.-W. Chang
Neighbor Searching
․Find all tiles adjacent to a given tile E.g. find the neighbors on the right sideStep 1) Follow the tr pointer of given tile to find its topmost tight
neighborStep 2) Trace down through lb pointer until all the right-side
neighbors have been found․Is applied during handling minimum separation and
variant constraint
122
bl
Trt
tr
lb
Y.-W. Chang
Area Searching
․Determine if there are any solid tiles within a given rectangular areaStep 1) Use point searching to locate the tile containing the upper-
left corner of the given areaStep 2) If the tile is solid, then the search is done; if the tile is
empty, see if its right edge is within the give areaStep 3) If no solid tile was found, move down to the next tile
touching the right edge of the area of interest․Is applied during handling preplaced constraint
123
P
Y.-W. Chang
Tile Creation
․Create a new solid tile into the data structureStep 1) Find the empty tile containing the top edge of the area to
be occupied by the new tileStep 2) Split the top empty tile along a horizontal line, and update
pointers in the tiles adjoining the new tileStep 3) Perform step 1 and 2 to the bottom edge of the areaStep 4) Work down along the left side of the area of the new tile to
split the space tile into two new tiles, and update pointers
124
Y.-W. Chang
Proposed CB-tree Representation
․Traverse the CB-tree in the DFS order․Update the tile plane every time when finishing packing
one module Create new tiles and update the pointers to record the
neighbors․Can easily get accurate neighbor information with those
pointers
n1
n2 n3b1
A B
b1
A B
b2
C
b1b2
b3
A
D B
C
125
Example CB-tree representation
Y.-W. Chang
Modifications for Point Searching
․Select the starting tile near the given point during point searching The original method selects the starting tile randomly
[Ousterhout et al., TCAD’84] Improve the time complexity from average O(n1/2) to O(1) since
the child module must be placed near the parent module Further improve the operations which adopt point searching
126
ni
nk
n1
…
b1 bi…
…
AB
p
bk
pp1
b1 bi…
AB
p
bk
pp1
The possible situation of the original point searching
The Improved point searching
Y.-W. Chang
Other Modifications (1/2)
․Organize the tile plane vertically due to the property of the B*-tree representation In B*-tree, a module tends to align an adjacent module
vertically Organizing the tile plane vertically can reduce the redundancy
of the tile creation
127
n1
n2 n6
n3 n4 n7
n5b2b1
b3
b6
b4b5
b7
A module tends to align an adjacent module vertically in B*-tree
Y.-W. Chang
Other Modifications (2/2)
․Do not merge the empty tiles which have the same upper and lower boundaries Can conform more to the B*-tree representation
128
n1
n2 n3
n4
b2b1
b3
A B C
b2b1
b3
b4A B
C
b2b1
b3
b4
C
D The packing with merging tiles A and B
The packing without merging tiles A and B
Y.-W. ChangUnit 4 129
Appendix E:Analog Circuit
Placement/Floorplanning
Y.-W. Chang 130
Schematic of an operational amplifier
Corresponding handcrafted layout
Example Analog Layout Design
Require several days to handcraft the layout
of an OPAMP
[Source:Springsoft, Inc. & Po-Hung Lin]
Y.-W. Chang 131
Analog Layout Design Flow
Circuit netlist
Technologyfile
PlacementconstraintsAnalog Placement
Analog Layout
Device Generation
Analog RoutingRoutingconstraints
DevicesNetsSub-circuitsDevice sizesDevice M-factors
Design rules
MatchingSymmetryProximity…
DRC CleanSatisfying constraints
Y.-W. Chang
Analog Circuit Placement․Devices need to be placed properly to reduce parasitic
mismatches and circuit sensitivities to thermal gradients or process variations for better electrical effects and higher performance
Symmetry constraint Proximity constraint
Preplaced constraint Variant constraint
Fixed-boundary constraint Boundary constraint
Minimum separation constraint Regularity constraint
132
Schematic of the CMOS comparator
Placement with
symmetry constraint
Y.-W. Chang 133
Basic Analog Placement Constraints (1/2)
․Device matching/symmetry Reduce mismatches which may
cause higher offset voltages or degrade power-supply rejection ratio
Inter-digitized or common centroidplacement: current mirrors, differential pairs, ratioed devices
Mirrored placement w.r.t. the vertical or horizontal symmetry axis: differential circuits
․Device proximity Place matched devices together or
closely to reduce the impact of local variations during fabrication
common centroid placement
Symmetry placementN-well PMOS
P-substrate NMOSDevice-proximity placement
Y.-W. Chang
Proximity
134
Advanced Analog Placement Constraints
1. Preplaced constraint2. Fixed-outline constraint3. Variant constraint4. Boundary constraint5. Minimum spacing constraint6. Hierarchical constraints7. Layout design hierarchy8. Regularity constraints9. Thermal consideration
Matching
Symmetry
Proximity
Regular structure
Hierarchical constraints
Y.-W. Chang
Analog Placement Constraints (1/4)
․Preplaced constraint Restrict some modules to be placed at pre-specified locations
with fixed orientations for performance specifications
․Fixed-boundary constraint The analog placement region could become irregular with
multiple concave or convex shapes along the boundaries
Preplaced module
Preplaced constraint
Placement region
Convex shape
Concave shape
Fixed-boundary constraint
135
Y.-W. Chang
Analog Placement Constraints (2/4)
․Variant constraint Provide a set of possible realizations of devices to make the
placement more flexible For transistors: different numbers of fingers For capacitances: different aspect ratios
136
Unit capacitor
Folded transistor Different aspect ratios of capacitance
Y.-W. Chang
Analog Placement Constraints (3/4)
․Minimum separation constraint Prevent latch-up for noisy devices Reduce coupling from inductors, capacitors, and interconnects Prepare separations for routing
Guard Ring
Inductor
Spacing
Routing Area
137
Y.-W. Chang
Analog Placement Constraints (4/4)
․Boundary constraint Reduce unwanted routing parasitics between preceding and
succeeding stages Consider input and output pins in placement stage
[Source: Lin et al., DAC’10]138
Place the module with ports not on the boundary
Place the module with ports on the boundary
Y.-W. ChangY.-W. Chang 139
Symmetry Islands
․ Symmetry groups: S = {b0s, (b1, b1’), b2
s, b3s}
․ Symmetry pair: (b1, b1’)․ Self-symmetry blocks: b0
s, b2s, b3
s
left boundary block
b1
b3s
b0s
b1’
b0r
b3r
b1rb2
rb2s
n3r
n2r
n1r
n0r rightmost
branch
bottom boundary blockS = {(b0, b0’), b1s, b2
s, b3s}
b1s
b0’
b0
b3s
b3r
b0r
b1r
b2s n0
r
n1r
n2r
n3r
b2r
left mostbranch
verticalsymmetry
horizontalsymmetry
B*-trees with representative nodes/blocks
․Form symmetry islands that keep devices of the same symmetry group placed at the closest proximity
Y.-W. ChangY.-W. Chang 140
Hierarchical Module Clustering
MatchingGroup
(Hierarchical)Symmetry Group
(Hierarchical)Proximity Group
Y.-W. ChangNTUEE/Y.-W. Chang 141
Hierarchy of the Device Groups
mp1 mp2
mp3
mp4 mp5 mp6
mp9
mp10
mp7 mp8mp11 mp12
mp13
mp14
mp15
mn1
mn2
mn3
mn4 mn5
mn6
mn7mn8
R1R2 R3
R4
R5
bias circuitry differential input stage gain stage output stage
G13 G14 G15 G16 G17
G6 G10 G7 G11 G8 G12
G1 G5G2 G4G3mn1mn2mn3
mp15
mp14
mn8R1R4R5
R2R3
mp1mp2mp3
mp4mp5mp6
mp7mp8
mp9mp10
mp11
mp12
mn4mn5
mn6mn7
mp13
Top
G9
Y.-W. ChangY.-W. Chang 142
Hierarchical B*-trees (HB*-trees)․Handle symmetry-island and non-symmetry blocks together
․Keep the horizontal contour for a hierarchy node representing a rectilinear symmetry island
b1s
b2
b6’
b6
b2’b5
b7s b8
sb4b3
n5
nS1
n3
n4
nS2
n2r
n1r
n6r
n7r
n8r
hierarchy node block nodeB*-subtree in a hierarchy node
n1r
n0r
n2r
b1 b1’b2 b2’
b0 b0’
S0
c00 c02nS0
n00
n01
n02
horizontal contour hierarchy nodecontour node
Y.-W. ChangY.-W. Chang 143
Hierarchical B*-tree Formulation
nG17
nmp15 nG13
nG16
nG15
nG14
nR1
nR5
nR4
nG9
nG17
nG6
nG10nG13
nrG7
nrG11nG14
nG8
nG12nG15nmn8
nmp14nG16
nmp18
nG1nG10
nrG5
nrG2nG11
nG3
nG4nG12
G6
mp18
G1mp14
mn8
Y.-W. Chang 144
Resulting Placements
Circuit biasynth_2p4g# of Mod. 65
# of Sym. Mod. 8 + 5 + 12Area utilization 95.53%
Runtime 22 sec
Circuit lnamixbias_2p4g# of Mod. 110
# of Sym. Mod. 16 + 6 + 6 + 12 + 4
Area utilization 94.59%Runtime 43 sec
Y.-W. Chang 145
Symmetry and Regularity Considerations
․Simultaneously considering symmetry and regularity is non-trivial in HB*-tree
symmetry group
B
E
AFC
D
B
E
C
D
regular structure
first considering symmetrygroups and then regular structures
simultaneously considering symmetry groups and regular structures
AF HGI
I
J J
symmetry group regular structure
G H
Modification is required for HB*-tree to consider symmetry and regularity simultaneously.
Y.-W. Chang 146
Regularity Node․Represent a regular structure․Contain only a specific tree topology for rows or arrays․Require only height and width to pack the tree inside․Generate one horizontal top contour
n1
n2
n3
n1
n2
n3
n4
n5
n6
n7
n8
n9
n4
b1 b2 b3 b4b1 b2 b3
b4 b5 b6
b7 b8 b9
A row structureAn array structure
Placement of OTA1_2
Y.-W. Chang
Comparison of Packing Complexity․Our CB-trees (B*-tree + Corner stitching) achieves the
lowest time complexity among existing work for packing with various constraints n: number of modules; k: number of constraints
Constraint Based on SP Based on B*-tree CB-treeSymmetry O(knlglgn) [ISCAS’07] O(n) [TCAD’09] O(n)Proximity O(kn2) [TVLSI’04] O(n) [DAC’08] O(n)Preplaced O(kn2) [ICCAD’06] O(n2lgk) [ISCAS’01] O(nk)
Variant O(nlgn) [ISPD’05] O(n) [TCAD’06] O(n)Fixed-
boundary N/A N/A O(n)
Min. sep. N/A * [ICCAD’08] O(nk)Boundary O(kn2) [ICCAD’06] O(n) O(n)
*: The work [ICCAD’08] employs a deterministic scheme, which needs more time than nondeterministic methods
147
Y.-W. Chang
Corner Stitching [Ousterhout et al., TCAD’84]
․Data structure for representing non-overlapping rectangular modules in a tile plane
․All tiles are linked together at their corners by four corner stitches Solid tiles and empty tiles Empty tiles must be horizontal maximal
solid tile
empty tile
rt
trbl
lb rt : rightmost top neighbortr : topmost right neighborlb: leftmost bottom neighborbl: bottommost left neighbor Tile representation in corner stitching
148
Y.-W. Chang
Comparison Based on Industry Designs
․Comparison Area: 2% – 13% better than previous works Time: much more efficient than the previous works
149
IndustryCircuit
SP [TCAD’00]
Seg. Tree[TCAD’04]
SP w/ LP[TCAD’07]
SP w/ Dummy[ICCAD’06]
Area(um2)
Time(s)
Area(um2)
Time(s)
Area(um2)
Time(s)
Area(um2)
Time(s)
biasynth_2p4g 5.40 780 5.40 246 5.00 403 5.57 134
lnamixbias_2p4g 50.80 2824 50.30 726 49.95 3252 52.21 227
Comparison 1.10 - 1.09 - 1.05 74.86 1.13 9.64
IndustryCircuit
SP w/ JPQ[ISCAS’07]
B*-tree[ICCAD’08]
HB*-tree[TCAD’09]
SP [ICCAD’10]
CB-tree
Area(um2)
Time(s)
Area(um2)
Time(s)
Area(um2)
Time(s)
Area(um2)
Time(s)
Area(um2)
Time(s)
biasynth_2p4g N/A N/A 4.93 337 4.92 22 4.92 14 4.81 12
lnamixbias_2p4g 50.14 480 49.54 387 48.63 43 48.25 53 47.25 28
Comparison 1.06 17.14 1.04 20.95 1.03 1.68 1.02 1.51 1.00 1.00
Y.-W. Chang
Resulting Placements
․The resulting layouts of biasynth_2p4g and biasynth_2p4g_7
Placement with symmetry constraint
Placement with several constraints
150
symmetrypreplaced
variant
boundary
proximity
Y.-W. Chang 151
Thermal Consideration
․The heat generated by power devices may degrade circuit performance of thermally-sensitive devices.
․Thermal profiles should be considered when placing both thermal and thermal-sensitive devices.
The block diagram of a generic RF system
Y.-W. ChangY.-W. Chang 152
Thermal Profile
․The placement of thermal devices determines the thermal profile.
․The placement of thermal-sensitive devices should be matched in the thermal profile.
․Properties of a good thermal profile Smoother thermal gradient Regular thermal contours in either the horizontal or
vertical direction Lower temperature at the thermal hot spots More separation between thermal and thermal-
sensitive devices Larger areas to accommodate thermal-sensitive
devices
Y.-W. ChangY.-W. Chang 153
Placement of Thermal Devices
Thermal Placement
Thermal Contour
Thermal Profile
Thermal Placement
Thermal Contour
Thermal Profile
Smoother gradient
Lower Temperature
More Separation
Regular Contours
Larger Area
Better
Y.-W. ChangY.-W. Chang 154
Placement of Thermal-Sensitive Devices․Devices of a symmetry group
The thermal impact on both devices of a symmetry pair should be the same.
The symmetry line must be perpendicular to the thermal contours.
․Devices of a matching group The thermal impact on devices of a
matching group should be the same. Sub-devices must be evenly
distributed in different thermal contours.
We propose a k-row thermal-driven common-centroid placement algorithm.
A CC D BDDDAC CDB DD D
A CC D BDDDAC CDB DD D
Q1 Q2
Q1
Q2
Y.-W. ChangY.-W. Chang 155
Algorithm: Thermal-Driven Common-Centroid Placement
1. Evenly distribute the sub-devices of each device into the krows while keeping the assignments of the i-th row and its symmetric row the same.
2. For each row, merge devices by constructing Eulerian trails while keeping the device merging of the i-th row and its symmetric row the same.
3. For each row, assign the devices or merged devices into the column positions in a randomized order while keeping the assignment of the symmetric row in the reverse order.
Placement of a common-centroid group in a 4-bit binary weighted current network containing 5 devices, each with 16, 16, 32, 64, and 128 sub-devices
Y.-W. ChangY.-W. Chang 156
Thermal-Driven Common-Centroid Placements
Placements of a common-centroid group in a 4-bit binary weighted current network containing 5 devices, each with 16, 16, 32, 64, and 128 sub-devices
Ourdesign
Best result
Y.-W. Chang
Capacitor-Array Placement
C1 : C2 : C3 : C4 = 1 : 2 : 16 : 45
unit capacitor or pad of C1
unit capacitor or pad of C2
unit capacitor or pad of C3
unit capacitor or pad of C4
Y.-W. Chang
Capacitor-Array Length-Ratio-Matching Routing
u1
u2
C1
C2
u2
u2
u2
u2
u2
u1
C3
u3
u3
u3
u3
u3
u3
u3
C4
u4
u4
u4
u4
u4
u4
u4
C5
u5
u5
u5
u5
u5
u5
u5
u5
TargetCapacitance
Ratio- 2 : 6 : 7 : 7 : 8
Wirelength RatioOurs 2 : 6.05 : 6.95 : 6.99 : 7.98
Other 2 : 6.19 : 6.55 : 6.34 : 7.43
Capacitance Ratio
Ours 2 : 6.02 : 6.94 : 6.90 : 7.98
Other 2 : 6.16 : 6.89 : 6.82 : 7.93
Error of C2Ours 0.40%
Other 2.70%
Error of C3Ours 0.79%
Other 1.55%
Error of C4Ours 1.43%
Other 2.51%
Error of C5Ours 0.24%
Other 0.93%
H.-C. Ou, K.-H. Ho, Y.-W. Chang, and H.-F. Tsao, "Coupling-aware length-ratio-matching routing for capacitor arrays in analog integrated circuits," DAC, 2013.