asic vs fpga
Embed Size (px)
TRANSCRIPT

AS
ICvs
FP
GA
S.M
anci
ni

Plan
Introduction
ASICsFPGAModèles de coûts
Méthodologie de conceptionDurcissement aux radiationsSoCBilan
1- ASIC vs FPGA
Problématique
FPGA ou ASIC ?
Sur quels critères fonder son choix ?Quels sont les points communs et différences desméthodes de conception ?
2- ASIC vs FPGAIntroduction
S. Mancini
Plan
Introduction
ASICsFPGAModèles de coûts
Méthodologie de conceptionDurcissement aux radiationsSoCBilan
3- ASIC vs FPGA
Les familles
Les ASICs (Application Specific Integrated Circuit) sedécomposent en plusieurs familles :
Full CustomLes masques des transistors sont dessinés.
Standard cellsLe circuit est un assemblage de cellules placées/routées.
Gate arrayUne “mer” de portes est routée.
Embedded Gate arrayC’est un Gate array avec des macro-blocs complexes (RAM).
4- ASIC vs FPGAIntroduction- ASICs
S. Mancini
Evolution des technologie
1st April, 2003 UK Design Forum 6
Technology TrendsTechnology Trends
Proc
ess
Proc
ess
Gen
erat
ion
Gen
erat
ion
YearYear19199999
100nm
130nm
180180nmnm
250250nmnm
19199797 20200101 20200303 20200505 20200707 20200909
65nm
•
′99 ITRS (International Technology Roadmap for Semiconductors)
Leading Foundry
′00 ITRS
•
•
•
• ♦′01 ITRS
90nm
♦
1
Technologie 90 nm
430 KPortes/mm2
SRAM 1.6 à 1.2 mm2 par MbitDRAM 0.5 mm2 par Mbit6 à 9 couches de métal
5- ASIC vs FPGAIntroduction- ASICs
S. Mancini
Plan
Introduction
ASICsFPGAModèles de coûts
Méthodologie de conceptionDurcissement aux radiationsSoCBilan
6- ASIC vs FPGA

Principe
Proposer des circuits génériques reconfigurables à vo-lonté. Ils sont constitués de matrices de cellules reconfi-gurables et d’un réseau d’interconnexion.
Principaux vendeurs :ActelAlteraAtmelCypress
LatticeMincQuickLogicXilinx
Les technologies diffèrent par :
La technologie de mémorisation de la configurationLe type de cellules élémentaires
7- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Technologies de programmation
Les trois principales technologies de programmation sont :
SRAM
Flash
Anti-fusibles
Q
Q’RW
Data
Reconfigurable dynamique-ment
Technologie standardPerte de configuration à lamise hors tension
8- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Technologies de programmation
Les trois principales technologies de programmation sont :
SRAM
Flash
Anti-fusibles
Grille flottante
Conserve la configuration
Circuit “autonome”
Technologie non-standard
9- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Technologies de programmation
Les trois principales technologies de programmation sont :
SRAM
Flash
Anti-fusibles
Anti−fusible
Encombrement minimal
Non reprogrammable
Technologie spécifique
10- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Actel (ProAsic)
v3.1 5
ProASICPLUS Flash Family FPGAs
ProASICPLUS ArchitectureThe proprietary ProASICPLUS architecture providesgranularity comparable to gate arrays.
The ProASICPLUS device core consists of a Sea-of-Tiles™
(Figure 1). Each tile can be configured as a 3-input logicfunction (e.g., NAND gate, D-Flip-Flop, etc.) byprogramming the appropriate Flash switchinterconnections (Figure 2 on page 6 and Figure 3 onpage 6). Tiles and larger functions are connected with anyof the four levels of routing hierarchy. Flash switches aredistributed throughout the device to provide nonvolatile,reconfigurable interconnect programming. Flash switchesare programmed to connect signal lines to the appropriatelogic cell inputs and outputs. Dedicated high-performancelines are connected as needed for fast, low-skew globalsignal distribution throughout the core. Maximum coreutilization is possible for virtually any design.
ProASICPLUS devices also contain embedded two-portSRAM blocks with built-in FIFO/RAM control logic.Programming options include synchronous or asynchronousoperation, two-port RAM configurations, user defined depthand width, and parity generation or checking. Please see
the “Embedded Memory Configurations” section on page 21for more information.
Flash Switch
Unlike SRAM FPGAs, ProASICPLUS uses a live on power-upISP Flash switch as its programming element.
In the ProASICPLUS Flash switch, two transistors share thefloating gate, which stores the programming information.One is the sensing transistor, which is only used for writingand verification of the floating gate voltage. The other is theswitching transistor. It can be used in the architecture toconnect/separate routing nets or to configure logic. It is alsoused to erase the floating gate (Figure 2 on page 6).
Logic Tile
The logic tile cell (Figure 3 on page 6) has three inputs (anyor all of which can be inverted) and one output (which canconnect to both ultra-fast local and efficient long-linerouting resources). Any three-input, one-output logicfunction (except a three-input XOR) can be configured asone tile. The tile can be configured as a latch with clear orset or as a flip-flop with clear or set. Thus, the tiles canflexibly map logic and sequential gates of a design.
Figure 1 • The ProASICPLUS Device Architecture
256x9 Two-Port SRAM or FIFO Block
Logic Tile
256x9 Two Port SRAM or FIFO Block
RAM Block
RAM Block
I/Os
1
ProASICPLUS Flash Family FPGAs
6 v3.1
Routing Resources
The routing structure of ProASICPLUS devices is designed toprovide high performance through a flexible four-levelhierarchy of routing resources: ultra-fast local resources,efficient long-line resources, high speed very long-lineresources, and high performance global networks.
The ultra-fast local resources are dedicated lines that allowthe output of each tile to connect directly to every input ofthe eight surrounding tiles (Figure 4 on page 7).
The efficient long-line resources provide routing for longerdistances and higher fanout connections. These resourcesvary in length (spanning 1, 2, or 4 tiles), run both verticallyand horizontally, and cover the entire ProASICPLUS device(Figure 5 on page 7). Each tile can drive signals onto theefficient long-line resources, which can in turn, access everyinput of every tile. Active buffers are inserted automaticallyby routing software to limit the loading effects due todistance and fanout.
The high-speed very long-line resources, which span theentire device with minimal delay, are used to route very longor very high fanout nets. (Figure 6 on page 8).
The high-performance global networks are low skew, highfanout nets that are accessible from external pins or frominternal logic (Figure 7 on page 9). These nets are typicallyused to distribute clocks, resets, and other high fanout netsrequiring a minimum skew. The global networks areimplemented as clock trees, and signals can be introducedat any junction. These can be employed hierarchically withsignals accessing every input on all tiles.
Figure 2 • Flash Switch
Figure 3 • Core Logic Tile
Switch In
Switch Out
Word
Floating Gate
Sensing Switching
Local RoutingIn 1
In 2 (CLK)
In 3 (Reset)
Efficient Long-Line Routing
1
Mot
Switch
Switch in
Switch out
Grille flottante
Test
Flash
Circuit APA100System Gates 1 000 000Tiles (Registers) 56 320RAM 198 kBit
PLL 2Clocks 88
11- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Actel (Axcelerator)Axcelerator Family FPGAs
6 Advanced v1.5
Embedded MemoryAs mentioned earlier, each core tile has either three (in asmaller tile) or four (in the regular tile) embedded SRAMblocks along the west side, and each variable-aspect-ratioSRAM block is 4,608 bits in size. Available memoryconfigurations are: 128x36, 256x18, 512x9, 1kx4, 2kx2 or4kx1 bits. The individual blocks have separate read andwrite ports that can be configured with different bit widthson each port. For example, data can be written in by 8 andread out by 1. The embedded SRAM blocks can be initializedat power up via the device JTAG port (ROM emulationmode).
In addition, every SRAM block has an embedded FIFOcontrol unit. The control unit allows the SRAM block to beconfigured as a synchronous FIFO without using core logicmodules. The FIFO width and depth are programmable. TheFIFO also features programmable ALMOST-EMPTY(AEMPTY) and ALMOST-FULL (AFULL) flags in addition tothe normal EMPTY and FULL flags. The embedded FIFOcontrol unit also contains the counters necessary for thegeneration of the read and write address pointers as well ascontrol circuitry to prevent metastability and erroneousoperation. The embedded SRAM/FIFO blocks can becascaded to create larger configurations.
Figure 6 • AX Device Architecture (AX1000 shown)
Chip Layout
SuperCluster
I/O Structure(See Figure 6)
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
RAMC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
RD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
SC
HD
SC
SC
SC
SC
Core Tile
4kRAM/FIFO
4kRAM/FIFO
4kRAM/FIFO
4kRAM/FIFO
RX
TX
BC R CC C R
RX RX RX
TX TXTX
Table 1 • Number of Core Tiles per Device
Device Number of Core Tiles
AX125 1 regular tile
AX250 4 smaller tilesAX500 4 regular tilesAX1000 9 regular tilesAX2000 16 regular tiles
1
Axcelerator Family FPGAs
4 Advanced v1.5
Logic ModulesActel’s Axcelerator family provides two types of logicmodules, the register cell (R-cell) and the combinatorialcell (C-cell). The AX C-cell can implement more than 4,000combinatorial functions of up to 5 inputs (Figure 3 onpage 5). The C-cell contains carry logic for even moreefficient implementation of arithmetic functions. With itssmall size, the C-cell structure is extremelysynthesis-friendly, simplifying the overall design as well asreducing design time.
The R-cell contains a flip-flop featuring asynchronous clear,asynchronous preset, and active-low enable control signals(Figure 3 on page 5). The R-cell registers featureprogrammable clock polarity selectable on aregister-by-register basis. This provides additional flexibility(e.g., easy mapping of dual-data-rate functions into theFPGA) while conserving valuable clock resources. The clocksource for the R-cell can be chosen from the hard-wiredclocks, the routed clocks, or the internal logic.
Two C-cells, a single R-cell, and two Transmit (TX) and twoReceive (RX) routing buffers form a Cluster, and twoClusters comprise a SuperCluster (Figure 4 on page 5).Each SuperCluster contains an independent Buffer module,which supports automatic buffer insertion on high-fanoutnets by the place-and-route tool, minimizing system delayswhile improving logic utilization.
The logic modules within the SuperCluster are arranged sothat two combinatorial modules are side by side, giving aC–C–R – C–C–R pattern to the SuperCluster. This C–C–Rpattern enables efficient implementation (minimum delay)of 2-bit carry logic for improved arithmetic performance(Figure 5 on page 5).
The AX architecture is fully fracturable, meaning that if oneor more of the logic modules in a SuperCluster are used by aparticular signal path, the other logic modules are stillavailable for use by other paths.
At the chip level, SuperClusters are organized into coretiles, which are arrayed to build up the full chip. Each coretile consists of an array of 336 SuperClusters and four SRAMblocks (176 SuperClusters and 3 SRAM blocks for theAX250). The SRAM blocks are arranged in a column on thewest side of the tile (Figure 6 on page 6). For example, theAX1000 is composed of a 3x3 array of 9 core tiles.Surrounding the array of core tiles are blocks of I/O Clustersand the I/O bank ring (Table 1 on page 6).
Figure 2 • Axcelerator Family Interconnect Elements1
Circuit AX2000System Gates 2 000 000R-Cells 10 752C-Cells 21 504
RAM 338 kBitPLL 8Clocks 4
12- ASIC vs FPGAIntroduction- FPGA
S. Mancini

Xilinx (Spartan 3/Virtex II)
Functional Description: FPGAR
36 www.xilinx.com DS083-2 (v2.7) June 2, 20031-800-255-7778 Advance Product Specification
3. NO_CHANGE
The NO_CHANGE option maintains the content of the out-put registers, regardless of the write operation. The clockedge during the write mode has no effect on the content ofthe data output register DO. When the port is configured asNO_CHANGE, only a read operation loads a new value inthe output register DO, as shown in Figure 42.
Control Pins and Attributes
Virtex-II Pro SelectRAM+ memory has two independentports with the control signals described in Table 19. All con-trol inputs including the clock have an optional inversion.
Initial memory content is determined by the INIT_xxattributes. Separate attributes determine the output registervalue after device configuration (INIT) and SSR is asserted(SRVAL). Both attributes (INIT_B and SRVAL) are availablefor each port when a block SelectRAM+ resource is config-ured as dual-port RAM.
Total Amount of SelectRAM+ Memory
Virtex-II Pro SelectRAM+ memory blocks are organized inmultiple columns. The number of blocks per columndepends on the row size, the number of Processor Blocks,and the number of RocketIO transceivers.
Table 20 shows the number of columns as well as the totalamount of block SelectRAM+ memory available for eachVirtex-II Pro device. The 18 Kb SelectRAM+ blocks arecascadable to implement deeper or wider single- or dual-portmemory resources.
Figure 43 shows the layout of the block RAM columns in theXC2VP4 device.
18-Bit x 18-Bit Multipliers
Introduction
A Virtex-II Pro multiplier block is an 18-bit by 18-bit 2’s com-plement signed multiplier. Virtex-II Pro devices incorporatemany embedded multiplier blocks. These multipliers can beassociated with an 18 Kb block SelectRAM+ resource orcan be used independently. They are optimized forhigh-speed operations and have a lower power consump-tion compared to an 18-bit x 18-bit multiplier in slices.
Figure 42: NO_CHANGE Mode
Table 19: Control Functions
Control Signal Function
CLK Read and Write Clock
EN Enable affects Read, Write, Set, Reset
WE Write Enable
SSR Set DO register to SRVAL (attribute)
CLK
WE
Data_in
Data_in
New
aa
Last Read Cycle Content (no change)
Address
Internal Memory DO No change during write
Data_out
DI
DS083-2_12_050901
RAM Contents NewOld
Table 20: Virtex-II Pro SelectRAM+ Memory Available
Device Columns
Total SelectRAM+ Memory
Blocks in Kb in Bits
XC2VP2 4 12 216 221,184
XC2VP4 4 28 504 516,096
XC2VP7 6 44 792 811,008
XC2VP20 8 88 1,584 1,622,016
XC2VP30 8 136 2,448 2,506,752
XC2VP40 10 192 3,456 3,538,944
XC2VP50 12 232 4,176 4,276,224
XC2VP70 14 328 5,904 6,045,696
XC2VP100 16 444 7,992 8,183,808
XC2VP125 18 556 10,008 10,248,192
Figure 43: XC2VP4 Block RAM Column Layout
BRAMMultiplierBlocks
PPC405CPU
CLBs
CLB
s
CLB
s CLBs
CLBs
DS083-2_11_010802
TMRocketIOSerial Transceivers
TMRocketIOSerial Transceivers
DCM DCM
DCM DCM
1
Functional Description: FPGAR
24 www.xilinx.com DS083-2 (v2.7) June 2, 20031-800-255-7778 Advance Product Specification
Configurable Logic Blocks (CLBs)The Virtex-II Pro configurable logic blocks (CLB) are orga-nized in an array and are used to build combinatorial andsynchronous logic designs. Each CLB element is tied to aswitch matrix to access the general routing matrix, asshown in Figure 23.
A CLB element comprises 4 similar slices, with fast localfeedback within the CLB. The four slices are split in two col-umns of two slices with two independent carry logic chainsand one common shift chain.
Slice Description
Each slice includes two 4-input function generators, carrylogic, arithmetic logic gates, wide function multiplexers andtwo storage elements. As shown in Figure 24, each 4-inputfunction generator is programmable as a 4-input LUT, 16bits of distributed SelectRAM+ memory, or a 16-bit vari-able-tap shift register element.
Figure 23: Virtex-II Pro CLB Element
SliceX1Y1
SliceX1Y0
SliceX0Y1
SliceX0Y0
FastConnectsto neighbors
SwitchMatrix
DS083-2_32_122001
SHIFTCIN
COUT
TBUF
COUT
CIN
TBUF
Figure 24: Virtex-II Pro Slice Configuration
Register/Latch
MUXF5
MUXFx
CYSRL16
RAM16
LUTG
Register/Latch
Arithmetic Logic
CYLUT
F
DS083-2_31_122001
SRL16
RAM16
ORCY
1
Functional Description: FPGAR
24 www.xilinx.com DS083-2 (v2.7) June 2, 20031-800-255-7778 Advance Product Specification
Configurable Logic Blocks (CLBs)The Virtex-II Pro configurable logic blocks (CLB) are orga-nized in an array and are used to build combinatorial andsynchronous logic designs. Each CLB element is tied to aswitch matrix to access the general routing matrix, asshown in Figure 23.
A CLB element comprises 4 similar slices, with fast localfeedback within the CLB. The four slices are split in two col-umns of two slices with two independent carry logic chainsand one common shift chain.
Slice Description
Each slice includes two 4-input function generators, carrylogic, arithmetic logic gates, wide function multiplexers andtwo storage elements. As shown in Figure 24, each 4-inputfunction generator is programmable as a 4-input LUT, 16bits of distributed SelectRAM+ memory, or a 16-bit vari-able-tap shift register element.
Figure 23: Virtex-II Pro CLB Element
SliceX1Y1
SliceX1Y0
SliceX0Y1
SliceX0Y0
FastConnectsto neighbors
SwitchMatrix
DS083-2_32_122001
SHIFTCIN
COUT
TBUF
COUT
CIN
TBUF
Figure 24: Virtex-II Pro Slice Configuration
Register/Latch
MUXF5
MUXFx
CYSRL16
RAM16
LUTG
Register/Latch
Arithmetic Logic
CYLUT
F
DS083-2_31_122001
SRL16
RAM16
ORCY
1
Circuit Spartan 3 VirtexIILogic Cells 74 880 125 136Slices 33080 55 616RAM 2,5 MBit 11 MBit
Circuit Spartan 3 VirtexIIMult. (18x18) 104 556Clock man. 4 12µP 0 4 PPC
13- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Altera (Apex/Stratix)
Circuit Apex II (EP2A70) Stratix (EP1S80) Excalibur (EPXA10)LEs 67 200 79 040 38 400RAM 1 Mbit 7 Mbit 3 MbitMult. (9x9) 176PLL 4 12 ?µP ARM922T
14- ASIC vs FPGAIntroduction- FPGA
S. Mancini
Plan
Introduction
ASICsFPGAModèles de coûts
Méthodologie de conceptionDurcissement aux radiationsSoCBilan
15- ASIC vs FPGA
Coûts des FPGAs
Exemple de prix unitaires pour de grandes quantités :
Société Référence Prix
Altera EP20K200 (Apex 20k) 340 $
Altera EP1S80 800 $
Altera EPXA1 (Excalibur ARM) 27 $
Xilinx XC3S1000 (Spartan 3) 200 $
Xilinx XC2V8000 (Virtex II) 8000 $
Xilinx XC2VP100 (Virtex II Pro) 11000 $
Actel APA1000 (ProAsic+) 400 $
Actel AX2000 (Axcelerator) 630 $
16- ASIC vs FPGAIntroduction- Modèles de coûts
S. Mancini
S’ajoute{
Outils de CAOEEPROMs externes
18
Coût des ASICS
Troix composantes :
Coût de conceptionIngénieurs
Outils de CAO ≈ 500 000 $ par an.
NRE (Non-Recurring Engineering Charges)Coûts de fabrication incompressibles (masques, . . . )
≈ 50 000 $, jusqu’à 1,5 M$ pour wafer 300 mm techno 90 nm
Coût unitaireCoût de fabrication unitaire ≈ 0.2 $ par mm2
Un wafer 300 mm (90000 mm2)= 300 $
Les gate-arrays réduisent les NRE.
17- ASIC vs FPGAIntroduction- Modèles de coûts
S. Mancini

Comparaison
Données : système de 250K portes
NRE ($) Coût unitaire ($)
FPGA 3 200ASIC 350 000 30
Device Only Cost (ASIC includes NRE)Unit’s FPGA Cost ASIC Cost FPGA 3,200$ Each
5 16,000$ 350,150$ FPGA NRE -$
10 32,000$ 350,300$ ASIC 30$ Each
50 160,000$ 351,500$ ASIC NRE 350,000$
100 320,000$ 353,000$ 150 480,000$ 354,500$
Device + EDA Tools Estimate (ASIC includes NRE)FPGA EDA 82,000$ Simulation+Synthesis+FPGA Place&RouteASIC EDA 343,000$ Simulation+Synthesis+Timing+ATPG
Unit’s FPGA Cost ASIC Cost FPGA 3,200$ Each
10 114,000$ 693,300$ FPGA NRE -$
50 242,000$ 694,500$ ASIC 30$ Each
100 402,000$ 696,000$ ASIC NRE 350,000$
150 562,000$ 697,500$ 200 722,000$ 699,000$ 250 882,000$ 700,500$
FPGA/ASIC Cost vs Units (250KGates)
$-
$100,000
$200,000
$300,000
$400,000
$500,000
$600,000
5 10 50 100 150
# of Units
To
tal U
nit
Co
st (
US
$)
FPGA Cost
ASIC Cost
FPGA/ASIC Cost vs Units (250KGates)
$-
$200,000
$400,000
$600,000
$800,000
$1,000,000
10 50 100 150 200 250
# of Units
To
tal U
nit
Co
st (
US
$)
FPGA Cost
ASIC Cost
1
Device Only Cost (ASIC includes NRE)Unit’s FPGA Cost ASIC Cost FPGA 3,200$ Each
5 16,000$ 350,150$ FPGA NRE -$
10 32,000$ 350,300$ ASIC 30$ Each
50 160,000$ 351,500$ ASIC NRE 350,000$
100 320,000$ 353,000$ 150 480,000$ 354,500$
Device + EDA Tools Estimate (ASIC includes NRE)FPGA EDA 82,000$ Simulation+Synthesis+FPGA Place&RouteASIC EDA 343,000$ Simulation+Synthesis+Timing+ATPG
Unit’s FPGA Cost ASIC Cost FPGA 3,200$ Each
10 114,000$ 693,300$ FPGA NRE -$
50 242,000$ 694,500$ ASIC 30$ Each
100 402,000$ 696,000$ ASIC NRE 350,000$
150 562,000$ 697,500$ 200 722,000$ 699,000$ 250 882,000$ 700,500$
FPGA/ASIC Cost vs Units (250KGates)
$-
$100,000
$200,000
$300,000
$400,000
$500,000
$600,000
5 10 50 100 150
# of Units
To
tal U
nit
Co
st (
US
$)
FPGA Cost
ASIC Cost
FPGA/ASIC Cost vs Units (250KGates)
$-
$200,000
$400,000
$600,000
$800,000
$1,000,000
10 50 100 150 200 250
# of Units
To
tal U
nit
Co
st (
US
$)
FPGA Cost
ASIC Cost
1Coût du circuit
Device Only Cost (ASIC includes NRE)Unit’s FPGA Cost ASIC Cost FPGA 3,200$ Each
5 16,000$ 350,150$ FPGA NRE -$
10 32,000$ 350,300$ ASIC 30$ Each
50 160,000$ 351,500$ ASIC NRE 350,000$
100 320,000$ 353,000$ 150 480,000$ 354,500$
Device + EDA Tools Estimate (ASIC includes NRE)FPGA EDA 82,000$ Simulation+Synthesis+FPGA Place&RouteASIC EDA 343,000$ Simulation+Synthesis+Timing+ATPG
Unit’s FPGA Cost ASIC Cost FPGA 3,200$ Each
10 114,000$ 693,300$ FPGA NRE -$
50 242,000$ 694,500$ ASIC 30$ Each
100 402,000$ 696,000$ ASIC NRE 350,000$
150 562,000$ 697,500$ 200 722,000$ 699,000$ 250 882,000$ 700,500$
FPGA/ASIC Cost vs Units (250KGates)
$-
$100,000
$200,000
$300,000
$400,000
$500,000
$600,000
5 10 50 100 150
# of Units
To
tal U
nit
Co
st (
US
$)
FPGA Cost
ASIC Cost
FPGA/ASIC Cost vs Units (250KGates)
$-
$200,000
$400,000
$600,000
$800,000
$1,000,000
10 50 100 150 200 250
# of Units
To
tal U
nit
Co
st (
US
$)
FPGA Cost
ASIC Cost
1. . . et la CAO
http ://www.altera.com/products/devices/cost/cst-cost_step1.jsp
18- ASIC vs FPGAIntroduction- Modèles de coûts
S. Mancini
Les circuits multi-projets
Plusieurs projets/circuits sont faits sur le même waferpour partager les NRE.
Europractice : AMI Semiconductor 0,35 µm CMOS680 Euro/mm2
CMP : STMicroelectronics 0,18 µm CMOS HC-MOS8D 990 Euro/mm2
19- ASIC vs FPGAMéthodologie de conception
S. Mancini
Plan
IntroductionMéthodologie de conception
Méthodes communesSpécificité des ASICsSpécificité des FPGAsLe prototypage : FPGA vers ASICExemple de projet “multi-plateforme” : LEON
Durcissement aux radiationsSoCBilan
20- ASIC vs FPGA
Flot de conception
nonoui oui
FPGA
ASICFabrication
Programmation
Validation
Simulation
routagePlacementSynthèse
Simulation
ValidationValidation
Simulation
Spécification
de testVecteurs
VHDL(RTL)
21- ASIC vs FPGAMéthodologie de conception- Méthodes communes
S. Mancini
Plan
IntroductionMéthodologie de conception
Méthodes communesSpécificité des ASICsSpécificité des FPGAsLe prototypage : FPGA vers ASICExemple de projet “multi-plateforme” : LEON
Durcissement aux radiationsSoCBilan
22- ASIC vs FPGA
Synthèse directe
Les descriptions à un "haut" niveau d’abstraction desblocs fonctionnels sont transformées en cellules stan-dards.
Entity
e
ee
s
3
21
VHDL
LAYOUT
NETLIST
Synthèse
PlacementRoutage
Pas de circuits spécifiques de type RAM/CAM, PLL
23- ASIC vs FPGAMéthodologie de conception- Spécificité des ASICs
S. Mancini

Composants "précaractérisés"-IP
Les circuits complexes sont proposés sous la forme demacro-blocs.
Les fondeurs pro-posent des modèlesde simulation etdes masques (vueabstraite).La synthèse se faitpar instanciation de“boîte noire”.
e
ee
s
3
21Entity
VHDL NETLIST
LAYOUT
IP
RAM
24- ASIC vs FPGAMéthodologie de conception- Spécificité des ASICs
S. Mancini
Le “Back-End”
Le placement/routage se décompose en plusieursétapes :
PlacementInsertion testInsertion arbre d’horlogeRoutage des horloges
Routage completAnalyse de timingVérification (DRC, LVS, simu-lation post placement/routage,. . . )
Les blocs fonctionnelspeuvent être décom-posés et placés/routésséparement
chiplet timing, clock matching, and I/O tim-
ing analysis.
To achieve timing closure, we made engi-
neering change orders to the netlist after routing.
Following each manipulation step, formal verifi-
cation ensured that the modified netlist was func-
tionally equivalent to the one after test insertion.
We aligned all clock domains having syn-
chronous chiplet crossings. For example, if the
memory interface clock in one chiplet was syn-
chronously connected to the same clock in
another chiplet, we phase-aligned these clocks
and analyzed the signal paths to meet timing
constraints. We achieved clock alignment by
tweaking the clock insertion delays, using align-
ers in the clock module. Similarly, we made the
clock trees as structurally identical as possible.
As part of the physical design process, we met
design completion and manufacturability goals
by implementing techniques such as design rule
checks, antenna fixes, track filling, and doubling
of vias wherever possible. Figure 4 shows the lay-
out plot for the Viper design’s initial version.
Table 3 summarizes the major design
parameters.
WE HAVE LEARNED much from the Viper design
experience and trust it will guide us in the
future, particularly since the next-generation
SOC designs are significantly more complex,
calling for still higher levels of integration. Some
of our current activities, in addition to regular
chip-development tasks, are investigating more
efficient on-chip bus architectures and better
design-reuse methodologies. �
AcknowledgmentsWe thank the Viper management and design
teams for their hard work, particularly chief
architects Gert Slavenburg and Lane Albanese,
without whose foresight and leadership the pro-
ject never would have been successful.
References1. S. Rathnam and G. Slavenburg, “An Architectural
Overview of the Programmable Multimedia
Processor, TM-1,’’ Proc. 41st IEEE Computer
Society Int’l Conf. (COMPCON 96), IEEE CS
Press, Los Alamitos, Calif., 1996, pp. 319-326.
2. D. Paret and C. Fenger, The I2C Bus, John Wiley
& Sons, New York, 1997.
Santanu Dutta is a designengineering manager atPhilips Semiconductors inSunnyvale, California. Hisresearch interests includedesign of high-performance
Application-Specific SOC Multiprocessors
30 IEEE Design & Test of Computers
CAB MPEG
MBS+
VIP1+
VIP2
ICP1 + ICP2 + MMI
Conditionalaccess
(MSP1 + MSP2)T-PI
M-PI
TM32
1394
MSP3
PR3940
Figure 4. Layout of Viper (PNX8500).
Table 3. Design statistics.
Parameter Value
Process technology TSMC 0.18 µm, six metal layers
Transistors About 35 million
Instances 1.2 million instances, or 8 million gates
Memories 243 instances, 750-Kbit memory
CPUs 2 (TriMedia TM32 and MIPS PR3940)
Peripherals 50
Clock domains 82
Clock speed TM32: 200 MHz; PR3940: 150 MHz;
SDRAM: 143 MHz
Power 4.5 W
Supply voltage 1.8-V core and 3.3-V I/O
Package BGA456
1PNX8500 (philips)
La physique des interconnexions doit être prise en compte.
25- ASIC vs FPGAMéthodologie de conception- Spécificité des ASICs
S. Mancini
Plan
IntroductionMéthodologie de conception
Méthodes communesSpécificité des ASICsSpécificité des FPGAsLe prototypage : FPGA vers ASICExemple de projet “multi-plateforme” : LEON
Durcissement aux radiationsSoCBilan
26- ASIC vs FPGA
Modèles d’entrées
Les vendeurs de FPGA proposent des outils“propriétaires” pour utiliser les FPGAs :
Saisie de schématiqueLangages de description spécifiques
AHDL - AlteraABEL - Xilinx
La synthèse peut être réalisée par des outils tiers(Leonardo, Synplicity, Synopsys, etc ...).
27- ASIC vs FPGAMéthodologie de conception- Spécificité des FPGAs
S. Mancini
Placement/routage
Le placement/routage est réalisé par des outilspropriétaires. Ces outils permettent :
d’allouer les blocs fonctionnelsd’extraire une analyse de timing
L’acroissement de com-plexité des FPGA imposel’utilisation de méthodolo-gies hiérarchiques.
28- ASIC vs FPGAMéthodologie de conception- Spécificité des FPGAs
S. Mancini
Utilisation des ressources
?Comment utiliser les ressources des FPGAs ?
Instanciation directePrimitives (macro-cells,RAM, etc ...)
Bibliothèques de macro-fonctions
Selon les outils de syn-thèse ces instances nepeuvent pas être synthé-tisées de façon classique
MainMacro
Enveloppe
Enveloppe
Synthèse
RoutagePlacement
Description de haut niveau/ inférenceLes synthétiseurs détectent les blocs complexes.
Exemple : RAM, multiplieurs, etc ...
29- ASIC vs FPGAMéthodologie de conception- Spécificité des FPGAs
S. Mancini

Plan
IntroductionMéthodologie de conception
Méthodes communesSpécificité des ASICsSpécificité des FPGAsLe prototypage : FPGA vers ASICExemple de projet “multi-plateforme” : LEON
Durcissement aux radiationsSoCBilan
30- ASIC vs FPGA
Principe
On utilise des FPGAs pour valider la conception d’unASIC.
Il existe des plateformes d’émulation génériques degrandes complexité (Aptix, Quickturn, . . . ).
Accroissement de la vitesse de simulation
Pas de vérification temporelle
L’architecture de l’émulateur peut être inadaptée auprojet
31- ASIC vs FPGAMéthodologie de conception- Le prototypage : FPGA vers ASIC
S. Mancini
Exemple : Aptix
“Nokia made a commitment to create real-time prototypes of
all its new mobile phone designs. Prototypes are the only way
to validate our algorithms by testing actual voice transmission
quality. We adopted the Aptix solution because it provides a
productive debug environment while maintaining our objective
of real-time verification.
Stelios Podimatis Member of Technical Staff, ASIC Engineering, Nokia (San Diego, CA)
The System Explorer MP3CF is optimizedfor prototyping DSP-based pipelineddesigns with moderate requirements forinterconnect between prototyping compo-nents. The MP3CF architecture providesmaximum performance for prototypesincorporating fixed-pin prototyping com-ponents such as CPUs, DSPs, memorycards, etc. Use the MP3CF for buildinghigh-speed prototypes of wireless commu-nication and digital-imaging applications.
FPCB® user “freehole” areawith 1,920 routable pins
accommodates a wide varietyof prototyping components
FPIC® Programmable InterconnectComponents (3) provides soft-
ware-controlled interconnect anddiagnostic probing
Microcontroller configures all programmable hardware,
performs system self-test andstores data for stand-alone
configurationBoard-edge I/O
Modular hard-wired buses forhigh-fanout bi-directional nets
System Explorer MP3CF hardware
I/O cable connectors (20) withinterleaved grounds provide flexibleconnection to target systems
System Explorer MP3CF interconnect architecture
/ /
/
REGION #3REGION #2REGION #1
FPGA FPGA
FPGA FPGA
FPGA FPGA
FPGA FPGA
FPGA FPGA
FPGA FPGA
USER COMPONENT HOLES
FPIC#1
FPIC#2
FPIC#3
140 140
140
GLOBAL INTERCONNECT LINES
One-to-one connectionsbetween FPIC®
Device and component pins
All component pins in agiven region connectthrough one FPIC® device
Component pins in differentregions connect through twoFPIC® devices
Solutions for Wireless Communications and Image Processing
User-controlled power supply voltageselection and monitoring to supportadvanced prototyping components todayand tomorrow
Modular low-skewclock circuits (8)
12
3
5
4
6
7
8
1
“Nokia made a commitment to create real-time prototypes of
all its new mobile phone designs. Prototypes are the only way
to validate our algorithms by testing actual voice transmission
quality. We adopted the Aptix solution because it provides a
productive debug environment while maintaining our objective
of real-time verification.
Stelios Podimatis Member of Technical Staff, ASIC Engineering, Nokia (San Diego, CA)
The System Explorer MP3CF is optimizedfor prototyping DSP-based pipelineddesigns with moderate requirements forinterconnect between prototyping compo-nents. The MP3CF architecture providesmaximum performance for prototypesincorporating fixed-pin prototyping com-ponents such as CPUs, DSPs, memorycards, etc. Use the MP3CF for buildinghigh-speed prototypes of wireless commu-nication and digital-imaging applications.
FPCB® user “freehole” areawith 1,920 routable pins
accommodates a wide varietyof prototyping components
FPIC® Programmable InterconnectComponents (3) provides soft-
ware-controlled interconnect anddiagnostic probing
Microcontroller configures all programmable hardware,
performs system self-test andstores data for stand-alone
configurationBoard-edge I/O
Modular hard-wired buses forhigh-fanout bi-directional nets
System Explorer MP3CF hardware
I/O cable connectors (20) withinterleaved grounds provide flexibleconnection to target systems
System Explorer MP3CF interconnect architecture
/ /
/
REGION #3REGION #2REGION #1
FPGA FPGA
FPGA FPGA
FPGA FPGA
FPGA FPGA
FPGA FPGA
FPGA FPGA
USER COMPONENT HOLES
FPIC#1
FPIC#2
FPIC#3
140 140
140
GLOBAL INTERCONNECT LINES
One-to-one connectionsbetween FPIC®
Device and component pins
All component pins in agiven region connectthrough one FPIC® device
Component pins in differentregions connect through twoFPIC® devices
Solutions for Wireless Communications and Image Processing
User-controlled power supply voltageselection and monitoring to supportadvanced prototyping components todayand tomorrow
Modular low-skewclock circuits (8)
12
3
5
4
6
7
8
132- ASIC vs FPGAMéthodologie de conception- Le prototypage : FPGA vers ASIC
S. Mancini
Plan
IntroductionMéthodologie de conception
Méthodes communesSpécificité des ASICsSpécificité des FPGAsLe prototypage : FPGA vers ASICExemple de projet “multi-plateforme” : LEON
Durcissement aux radiationsSoCBilan
33- ASIC vs FPGA
Architecture de LEON
LEON-2 User’s Manual 9 Version 1.0.19
Gaisler Research
1.4 Functional overview
A block diagram of LEON-2 can be seen in figure 1.
1.4.1 Integer unit
The LEON integer unit implements the full SPARC V8 standard, including all multiply anddivide instructions. The number of register windows is configurable within the limit of theSPARC standard (2 - 32), with a default setting of 8. To aid software debugging, up to fourwatchpoint registers can be configured. Each register can cause a trap on an arbitraryinstruction or data address range. If the debug support unit is enabled, the watchpoints canbe used to enter debug mode.
1.4.2 Floating-point unit and co-processor
The LEON model does not include an FPU, but provides a direct interface to the Meiko FPUcore, and a general interface to connect other floating-point units. A generic co-processorinterface is provided to allow interfacing of custom co-processors.
1.4.3 Cache sub-system
Separate, multi-set instruction and data caches are provided, each configurable with 1 - 4sets, 1 - 64 kbyte/set, 16 - 32 bytes per line. Sub-blocking is implemented with one valid bitper 32-bit word. The instruction cache uses streaming during line-refill to minimise refill
Figure 1: LEON-2 block diagram
Integer unit
I-Cache D-Cache
FPU
MemoryController
AMBA AHB
UARTS
Timers IrqCtrl
I/O port
AMBA APB
AHB/APBBridge
AHBController
PCI
LEON processor
I/OPROM SRAM
8/16/32-bits memory bus
DebugSupport Unit CP
DebugSerial Link
MMU
SDRAM
EthernetLocal ram
Local ram
1Références : http ://www.gaisler.com
34- ASIC vs FPGAMéthodologie de conception- Exemple de projet “multi-plateforme” : LEON
S. Mancini
Cibles technologiques
Technologie RAM PADSModèle comportemental inférée inférésXilinx VIRTEX/2 FPGA instanciée inférésAtmel ATC18/25/35 instanciée instanciésUMC FS90A/B instanciée instanciésUMC 0.18 um CMOS instanciée instanciésTSMC 0.25 um w. Artisan rams instanciée instanciésActel Proasic FPGA instanciée inférésActel AX anti-fuse FPGA instanciée inférés
35- ASIC vs FPGAMéthodologie de conception- Exemple de projet “multi-plateforme” : LEON
S. Mancini

Organisation du projet
hdss1_128x32cm4sw0
atc18_syncram
Code VHDL
generic_syncram
RAMB16_S36
virtex2_syncram
RAM256x9SST
proasic_syncram
syncram
cache
Les mémoires instanciées sont à la fois :
Des boîtes noires pour la synthèseLes entités sont considérées comme des cellules de la biblio-thèque.
Des descriptions comportementales pour la simulationElles peuvent être fournies par le vendeur de RAM.
36- ASIC vs FPGAMéthodologie de conception- Exemple de projet “multi-plateforme” : LEON
S. Mancini
Exemple de codecachemem.vhdentity cachemem is...dtags0 : syncram port map (......
tech_map.vhdentity syncram is...inf : if INFER_RAM generateu0 : generic_syncram generic map (...hb : if (not INFER_RAM) generateatc1 : if TARGET_TECH = atc18 generateu0 : atc18_dpram generic map (......
tech_act18.vhd– pragma translate_offentity hdss2_512x32cm4sw0 is...architecture behavioral of hdss2_512x32cm4sw0 is...– pragma translate_on
entity atc18_syncram is...id0 : hdss1_128x32cm4sw port map (......
37- ASIC vs FPGADurcissement aux radiations
S. Mancini
Plan
IntroductionMéthodologie de conceptionDurcissement aux radiations
Durcissement des ASICsDurcissement des FPGAs
SoCBilan
38- ASIC vs FPGA
Single Event Upset (SEU)
Une particule peut faire changer d’état les éléments de mé-morisation (Latch, registres, SRAM, . . . ) .
� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � �
sgnd vdd
N+ P P N N P+
Substrat NCaisson P
Select
Select
se
e
0
39- ASIC vs FPGADurcissement aux radiations
S. Mancini
Single Event Transient (SET)
La circuiterie combinatoire peut être altérée :
Une erreur à l’instant d’échantillonnage peut être mé-morisée
L’arbre d’horloge génère des fronts parasites
Q
D
Clk
Q
D
Clk
Clk
QD
SET sur l’horlogeSET sur la donnée
40- ASIC vs FPGADurcissement aux radiations
S. Mancini
Latchup
gnd vdd
P P N+P+ N N
Caisson NSubstrat P
41- ASIC vs FPGADurcissement aux radiations
S. Mancini

Plan
IntroductionMéthodologie de conceptionDurcissement aux radiations
Durcissement des ASICsDurcissement des FPGAs
SoCBilan
42- ASIC vs FPGA
Principales méthodes
Utilisation de technologies :
Sur-mesures
Dissipation des charges (dimensionnement, capacités)Filtrage temporel (retard+vote)Isolation des transistorsCellules intra-redondantes
Standards
TMRCodes correcteurs d’erreurAuto-test
43- ASIC vs FPGADurcissement aux radiations- Durcissement des ASICs
S. Mancini
Les registres
TMR : Triple Modular Redundancy
CLK
Vote
Les registres doivent être éloignés pour ne pas subirle même défaut. Il doivent être mis à jour par la valeurcorrigée.
44- ASIC vs FPGADurcissement aux radiations- Durcissement des ASICs
S. Mancini
Les mémoires
SRAM
StandardDes codes correcteurs d’erreurs protègent les donnéesstockées. Des bits supplémentaires sont nécessaires.
SpécifiquesLes bits d’un mot sont spatialement séparés. La surface estaccrue.
(S)DRAMLes SEU accélèrent la décharge des points mémoire.
On peut accroître le taux de rafraîchissement.
45- ASIC vs FPGADurcissement aux radiations- Durcissement des ASICs
S. Mancini
Méthodologies de durcissement
Méthodes automatiques
Technologies spécifiquesLes cellules durcies sont utilisées au lieu des cellulesstandards.
Atmel propose la technologie durcie 0.18µ
ATC18RHA.TMRla synthèse “classique” est suivie d’une modification denetlist.
Cela peut être fait par des scripts des outils de synthèse ou par
modification des fichiers résultats.
Utilisation de gate-array durcisPar conception
46- ASIC vs FPGADurcissement aux radiations- Durcissement des ASICs
S. Mancini
Introduire des technologies d’auto-test dans les circuits.
49

Plan
IntroductionMéthodologie de conceptionDurcissement aux radiations
Durcissement des ASICsDurcissement des FPGAs
SoCBilan
47- ASIC vs FPGA
Origine des disfonctionnements
Les éléments des FPGAs qui sont susceptibles deprovoquer des disfonctionnements :
Registres des cellulesRAM embarquéeLa configuration est sensible aux SEU
La SRAM peut être altérée (XC2VP125 : 43 Mbits de configuration)
Les Anti-fusibles peuvent “claquer”Les EEPROM peuvent changer d’état
La logique générique génère des SETLogique d’interconnexionArbre d’horloge
48- ASIC vs FPGADurcissement aux radiations- Durcissement des FPGAs
S. Mancini
Les éléments de configuration externe (pour les FPGAs de typeSRAM) doivent aussi être protégés.
52
Remèdes
Les FPGAs sont plus délicats à durcir :Les registres et la RAM
Ce sont les mêmes méthodes que les ASICs.
La configuration
Adopter des technologies moins sensibles aux SEUsLes anti-fusibles sont moins sensibles que les SRAM/EEPROM
Vérifier la configurationUtilisation de la configuration partielle des FPGAs pour vérifier les cellules
automatiquement.
Insérer de l’auto-contrôle des calculsInsérer des séquences connues dans les calculs pour vérifier les résultats
ROM de séquences et référencesLFSR
Une détection de faute provoque la reconfiguration du FPGA.
49- ASIC vs FPGADurcissement aux radiations- Durcissement des FPGAs
S. Mancini
Méthodologie de durcissement
Il est possible d’implanter des TMR de façon transparente.
Pour les FPGAS d’Actel, Synplify permet d’implanterdirectement :
des Flip-flop combinatoiredes TMRdes Flip-flop combinatoire avec TMR
En VHDL, cela se fait à l’aide d’attributs :architecture top of top isattribute syn_radhardlevel of top : architecture is "tmr_cc" ;...attribute syn_radhardlevel of counter_q : signal is "tmr" ;...
50- ASIC vs FPGADurcissement aux radiations- Durcissement des FPGAs
S. Mancini
Composants spécifiques
Actel propose des circuits résistants aux radiations :
Programmation par anti-fusibles résistantsSans registres
Les registres sont faits avec des éléments combinatoire
Avec des registres durcis
RT54SX-S RadTolerant FPGAs for Space Applications
4 Advanced v1.4
To achieve the SEU requirements, the D flip-flop in the RT54SX-S R-cell is enhanced (Figure 3). Both the master and slave “latches” are actually implemented with three latches. The feedback path of each of the three latches is voted with the outputs of the other two latches. If one of the three latches is struck by an ion and starts to change state, the voting with the other two latches prevents the change from feeding back and permanently latching. Care was taken in the layout to ensure that a single ion strike could not affect more than one latch.
Figure 4 is a simplified schematic of the test circuitry that has been added to test the functionality of all the components of the flip-flop. The inputs to each of the three latches are independently controllable so the voting circuitry in the feedback paths can be exhaustively tested. This testing is performed on an unprogrammed array during wafer sort, final test and post burn-in test. This test circuitry cannot be used to test the flip-flops once the device has been programmed.
Figure 3 • RT54SX-S R-Cell Implementation of D Flip-Flop Using Voter Gate Logic
Figure 4 • R-Cell Implementation— Test Circuitry
CLKCLK
D
CLK
Q
VoterGate
CLK
CLK
CLK
CLK
CLK
Tst1
CLK
D Q
VoterGate
Tst2
Tst3Test
Circuitry
1Les latchs sont séparées pour ne pas subir les mêmes
rayonnements.
51- ASIC vs FPGADurcissement aux radiations- Durcissement des FPGAs
S. Mancini

Efficacité des durcissements
Quelques circuits d’Actel :
LRH1280 0.8 µm ( A1280 )
GEO SEUFlip Flop 10−6
Flip Flop (CC) 10−7
TMR 10−10
RTAX 0.15 µm (AX 0.15 µm S-cell=TMR)SRAM Registre
Famille LETTH GEO SEU LETTH GEO SEUAX 1, 4 3.10−7 3, 36 > .. > 2, 89 10−6
RTAX 1, 4 10−10 (EDAC) > 37 < 10−10
Pas de SEL pour LET = 120 MeV-cm2/mgLETTH en MeV-cm2/mg
GEO SEU= erreur/bit/jour en orbite géostationnaire
52- ASIC vs FPGASoC
S. Mancini
Plan
IntroductionMéthodologie de conceptionDurcissement aux radiationsSoC
Rappels sur les SoCsEtude comparative
Bilan
53- ASIC vs FPGA
Constituants des SoCs
Les technologies actuelles permettent de mettre sur unmême circuit :
ASICProcesseursMémoire (SRAM et DRAM)Bus systèmesAnalogique
SoC=System on Chip.
Les circuit programmables permettent le même type deréalisation : les SoPC (System on Programmable Chip).
54- ASIC vs FPGASoC- Rappels sur les SoCs
S. Mancini
Un SoPC : Excalibur (Altera)
55- ASIC vs FPGASoC- Rappels sur les SoCs
S. Mancini
Plan
IntroductionMéthodologie de conceptionDurcissement aux radiationsSoC
Rappels sur les SoCsEtude comparative
Bilan
56- ASIC vs FPGA
Les microprocesseurs
ASIC
FPGA
Ils sont disponibles selon les besoins.
PrécaractérisésOptimisés par les fondeurs sous licence.
SynthétisablesModèles disponibles de haut niveau pour la synthèse. Certaines
parties doivent être adaptées à la technologies.
ParamétrablesLes processeurs s’adaptent aux besoins de l’application :
Taille et type des cachesMécanismes systèmes (TLB, adressage virtuel, . . . )Co-processeurs
Performances : MIPS 32 bits = 300 MHz
57- ASIC vs FPGASoC- Etude comparative
S. Mancini

Les microprocesseurs
ASIC
FPGA
On trouve deux type de processeurs :Synthétisables
Modèles génériques (ex Leon) ou processeur fournis par ven-
deurs de FPGAs (ex : NIOS (Altera), MicroBlaze (Xilinx)).
Ressources utilisées : RAM double port, CAM.
Performance ≈ 50 MHz
La limitation des ressources impose desprocesseurs simples.
Intégrés dans les FPGAExempleExcalibur ARM (Altera), Virtex II Pro (Xilinx)
Performance ≈ 300 MH
Leurs caractéristiques sont figées.
58- ASIC vs FPGASoC- Etude comparative
S. Mancini
Les bus
ASIC
FPGA
Les technologies sont adaptées aux be-soins.
Maître Maître
Esclave Esclave Esclave
Mux Mux Mux
Maître Maître
Esclave Esclave Esclave
Bus Trois-états Bus à multiplexeurs
et peuvent cohabiter dans un même circuit.
59- ASIC vs FPGASoC- Etude comparative
S. Mancini
Les bus
ASIC
FPGA
La technologie est imposée par les res-sourcesLes bus trois-états sont peu recomman-dés (et même souvent impossibles).Pour économi-ser la logique,l’arbitrage peutêtre fait au ni-veau de chaqueesclave : les filsd’interconnexionssont nombreux.
6 Altera Corporation
Avalon Bus Specification
The Avalon bus module is generated automatically by the SOPC Builder, so that the system designer is spared the task of connecting the bus and peripherals together. The Avalon bus module is very rarely used as a discrete unit, because the SOPC Builder will almost always be used to automate the integration of processors and other Avalon bus peripherals into a system module. The designer’s view of the Avalon bus module usually is limited to the specific ports that relate to the connection of custom Avalon peripherals.
Note that the Avalon bus module (an Avalon bus) is a unit of active logic that takes the place of passive, metal bus lines on a physical PCB. (See Example 2). In this context, the ports of the Avalon bus module could be thought of as the pin connections for all peripheral devices connected to a passive bus. The Avalon Bus Specification Reference Manual defines only the ports, logical behavior and signal sequencing that comprise the interface to the Avalon bus module. It does not specify any electrical or physical characteristics of a physical bus.
Figure 2. Avalon Bus Module Block Diagram - an example system
The Avalon bus module provides the following services to Avalon peripherals connected to the bus:
1Bus AvalonLes CPUs embarqués imposent des bus sys-tèmes.
60- ASIC vs FPGASoC- Etude comparative
S. Mancini
La mémoire
ASIC
FPGA
Les mémoires sont disponibles sousforme de blocs pré-caractérisés .ROM et RAM sont générées selon lesbesoins.Les technologies actuelles permettentla cohabitation de plusieurs types demémoires (SRAM, SDRAM, associatives,. . . ).Les ROMs sont crées sur-mesures.
UMC propose des bibliothèque et généra-teurs de SRAM.http ://www.umc.com/english/design/b_1.asp
Performances 0,13 µm : SRAM 1K x 16 ac-cess time = 1,1 ns
61- ASIC vs FPGASoC- Etude comparative
S. Mancini
La mémoire
ASIC
FPGA
Les FPGAs fournissent des blocs de mé-moire élémentaires (≈ 4 KOctets).Ils peuvent être assemblées pour former degrandes quantités.Les ROMs sont synthétisées en circuitscombinatoires.Pas de SDRAMs .
Xilinx XC2VP125 (Virtex II Pro) (0,13 µm )556 blocs de SRAM de 18Kbits = 10,008KbitsConfigurations
{16K x 1 bit 4K x 4 bits 1K x 18 bits8K x 2 bits 2K x 9 bits 512 x 36 bits
Timings
{Setup Prop Clk min
SelectRAM 0,4 1,5 1,3CLB 0,5 1,8 1,4
62- ASIC vs FPGASoC- Etude comparative
S. Mancini
Horloges multiples
ASIC
FPGA
Les ASICs permettent des architecturesde domaines d’horloges complexes.
Des FIFOs asynchrones adaptées per-mettent les changements de domaines : lesméta-stabilitées sont résolues.Chaque domained’horloge a sonarbre d’hor-loge propre.
82 horlogesdans lePNX8500(Philips).
chiplet timing, clock matching, and I/O tim-
ing analysis.
To achieve timing closure, we made engi-
neering change orders to the netlist after routing.
Following each manipulation step, formal verifi-
cation ensured that the modified netlist was func-
tionally equivalent to the one after test insertion.
We aligned all clock domains having syn-
chronous chiplet crossings. For example, if the
memory interface clock in one chiplet was syn-
chronously connected to the same clock in
another chiplet, we phase-aligned these clocks
and analyzed the signal paths to meet timing
constraints. We achieved clock alignment by
tweaking the clock insertion delays, using align-
ers in the clock module. Similarly, we made the
clock trees as structurally identical as possible.
As part of the physical design process, we met
design completion and manufacturability goals
by implementing techniques such as design rule
checks, antenna fixes, track filling, and doubling
of vias wherever possible. Figure 4 shows the lay-
out plot for the Viper design’s initial version.
Table 3 summarizes the major design
parameters.
WE HAVE LEARNED much from the Viper design
experience and trust it will guide us in the
future, particularly since the next-generation
SOC designs are significantly more complex,
calling for still higher levels of integration. Some
of our current activities, in addition to regular
chip-development tasks, are investigating more
efficient on-chip bus architectures and better
design-reuse methodologies. �
AcknowledgmentsWe thank the Viper management and design
teams for their hard work, particularly chief
architects Gert Slavenburg and Lane Albanese,
without whose foresight and leadership the pro-
ject never would have been successful.
References1. S. Rathnam and G. Slavenburg, “An Architectural
Overview of the Programmable Multimedia
Processor, TM-1,’’ Proc. 41st IEEE Computer
Society Int’l Conf. (COMPCON 96), IEEE CS
Press, Los Alamitos, Calif., 1996, pp. 319-326.
2. D. Paret and C. Fenger, The I2C Bus, John Wiley
& Sons, New York, 1997.
Santanu Dutta is a designengineering manager atPhilips Semiconductors inSunnyvale, California. Hisresearch interests includedesign of high-performance
Application-Specific SOC Multiprocessors
30 IEEE Design & Test of Computers
CAB MPEG
MBS+
VIP1+
VIP2
ICP1 + ICP2 + MMI
Conditionalaccess
(MSP1 + MSP2)T-PI
M-PI
TM32
1394
MSP3
PR3940
Figure 4. Layout of Viper (PNX8500).
Table 3. Design statistics.
Parameter Value
Process technology TSMC 0.18 µm, six metal layers
Transistors About 35 million
Instances 1.2 million instances, or 8 million gates
Memories 243 instances, 750-Kbit memory
CPUs 2 (TriMedia TM32 and MIPS PR3940)
Peripherals 50
Clock domains 82
Clock speed TM32: 200 MHz; PR3940: 150 MHz;
SDRAM: 143 MHz
Power 4.5 W
Supply voltage 1.8-V core and 3.3-V I/O
Package BGA456
1
63- ASIC vs FPGASoC- Etude comparative
S. Mancini

Horloges multiples
ASIC
FPGA
les arbres d’horloge sont déjà construits.Le nombre d’horloges est limité.
macro bloc Apex 20k
Les changements dedomaines sont déli-cats.
Xilinx propose desDigital Clock Ma-nagerles FIFOs asyn-
chrones sontfaites de cellulesdu FPGA : leurperformances sontlimitées .
Functional Description: FPGAR
38 www.xilinx.com DS083-2 (v2.7) June 2, 20031-800-255-7778 Advance Product Specification
Each global clock multiplexer buffer can be driven either bythe clock pad to distribute a clock directly to the device, orby the Digital Clock Manager (DCM), discussed in DigitalClock Manager (DCM), page 40. Each global clock multi-
plexer buffer can also be driven by local interconnects. TheDCM has clock output(s) that can be connected to globalclock multiplexer buffer inputs, as shown in Figure 47.
Global clock buffers are used to distribute the clock to someor all synchronous logic elements (such as registers inCLBs and IOBs, and SelectRAM+ blocks.
Eight global clocks can be used in each quadrant of theVirtex-II Pro device. Designers should consider the clockdistribution detail of the device prior to pin-locking and floor-planning. (See the Virtex-II Pro Platform FPGA UserGuide.)
Figure 48 shows clock distribution in Virtex-II Pro devices.
In each quadrant, up to eight clocks are organized in clockrows. A clock row supports up to 16 CLB rows (eight up andeight down).
To reduce power consumption, any unused clock branchesremain static.
Figure 47: Virtex-II Pro Clock Multiplexer Buffer Configuration
ClockPad
LocalInterconnect
ClockPad
ClockBuffer
Clock Multiplexer
I
O
Clock Distribution
CLKIN
CLKOUT
DCM
DS083-2_43_122001
Figure 48: Virtex-II Pro Clock Distribution
8
88
8
NW NE
SWSE
DS083-2_45_122001
8 BUFGMUX
8 max
8 BUFGMUX
16 Clocks
NW NE
SW SE
8 BUFGMUX
8 BUFGMUX
16 Clocks
1Horloges Virtex II Pro
64- ASIC vs FPGASoC- Etude comparative
S. Mancini
L’analogique
ASIC
FPGA
La plupart des technologies numériquessont compatibles avec l’analogique.
Les blocs analogiques sont conçus à partet intégrés à l’assemblage.Les zones numériques/analogiques sontséparées pour réduire le bruit d’horloge.
65- ASIC vs FPGASoC- Etude comparative
S. Mancini
L’analogique
ASIC
FPGA
Pas d’analogique intégrée.Les circuit analogiques programmablesexistent mais ils sont peu performants.
66- ASIC vs FPGABilan
S. Mancini
Plan
IntroductionMéthodologie de conceptionDurcissement aux radiationsSoCBilan
67- ASIC vs FPGA
Comparaisons de performances
Performances et complexité de la réalisation dumicroprocesseur LEON pour différentes ciblestechnologiques :
Technologie Complexité Fréquence
ASIC
Atmel 0.18 CMOS std-cell 35K gates + RAM 165 MHz (pre-layout)Atmel 0.25 CMOS std-cell 33K gates + RAM 140 MHz (pre-layout)UMC 0.25 CMOS std-cell 35K gates + RAM 130 MHz (pre-layout)Atmel 0.35 CMOS std-cell 2 mm2+ RAM 65 MHz (pre-layout))
FPGA
Xilinx XC2V500-6 (0.15 µm ) 4,800 LUT + 14/32 block RAM 65 MHz (post-layout)Altera 20K200C-7 (0.15 µm ) 5,700 LCELLs + EAB RAM (52%) 49 MHz (post-layout)Actel AX1000-3 (0.15 µm ) 7,600 cells + 14/36 RAM 48 MHz (post-layout)
http ://www.gaisler.com/
68- ASIC vs FPGABilan
S. Mancini
Bilan
ASIC
FPGA
Maîtrise complète du projet
Maîtrise de la résistance aux radiationsCoûts réduits à grande échelle
Fort taux d’intégration
Performances maximum
Les erreurs coûtent cherConnaissance approfondie de la tech-nologie
NRE
69- ASIC vs FPGABilan
S. Mancini

Bilan
ASIC
FPGA
Temps de développement réduits
Familles résistantes aux radiationsInvestissements réduitsContraintes d’architectureMéconnaissance des détails internes/caractéristiques
Relachement de l’attentionAccroissement des risques de pannes
Coûts unitaires élevéesComplexité limitée
Performances limitées70- ASIC vs FPGA
BilanS. Mancini
Conclusion
Choisir entre un FPGA et un ASIC ?Surface/coût
Efficacité
Souplesse
Réutilisabilité
Temps de développement
DébitsConsommationArchitecture mémoire
Puissance de calcul
Technologie
Fonctionnalité
?
... ça dépend ...
71- ASIC vs FPGABilan
S. Mancini
Références
72- ASIC vs FPGABilan
S. Mancini
ASIC vs FPGA
S. Mancini
Plan Détaillé
Introduction
Problématique
ASICs
Les famillesEvolution des technologie
FPGA
PrincipeTechnologies de programmationActel (ProAsic)Actel (Axcelerator)Xilinx (Spartan 3/Virtex II)Altera (Apex/Stratix)
Modèles de coûts
Coûts des FPGAsCoût des ASICSComparaisonLes circuits multi-projets
Méthodologie de conception
Méthodes communes
Flot de conception
Spécificité des ASICs
Synthèse directe
Composants "précaractérisés"-IPLe “Back-End”
Spécificité des FPGAs
Modèles d’entréesPlacement/routageUtilisation des ressources
Le prototypage : FPGA vers ASIC
PrincipeExemple : Aptix
Exemple de projet “multi-plateforme” :LEON
Architecture de LEONCibles technologiquesOrganisation du projetExemple de code
Durcissement aux radiations
Single Event Upset (SEU)Single Event Transient (SET)Latchup
Durcissement des ASICs
Principales méthodesLes registresLes mémoires
Méthodologies de durcissement
Durcissement des FPGAs
Origine des disfonctionnementsRemèdesMéthodologie de durcissementComposants spécifiquesEfficacité des durcissements
SoC
Rappels sur les SoCs
Constituants des SoCsUn SoPC : Excalibur (Altera)
Etude comparative
Les microprocesseursLes busLa mémoireHorloges multiplesL’analogique
Bilan
Comparaisons de performancesBilanConclusionRéférences