dependable memory techniques for highly reliable vlsi system
TRANSCRIPT
1
Dependable Memory Techniques for Dependable Memory Techniques for Highly Reliable VLSI SystemHighly Reliable VLSI System
JST/CREST/DVLSI Meeting, June 9, 2012
Masahiko Yoshimoto, Kobe Univ.Masahiko Yoshimoto, Kobe Univ.Hiroshi Kawaguchi, Kobe Univ.Hiroshi Kawaguchi, Kobe Univ.
Makoto Nagata, Kobe Univ.Makoto Nagata, Kobe Univ.Koji Nii, Renesas ElectronicsKoji Nii, Renesas Electronics
Shigeru Oho, Nippon Institute of TechnologyShigeru Oho, Nippon Institute of TechnologyYasuoYasuo SugureSugure, Hitachi, Ltd., Hitachi, Ltd.
2
Vth variation
NBTI, temperature fluctuation
Power supply noise
Soft error
• QoB Cache• FGVC memory
(with BIST)
• Power supply noise filter• Autonomic dependable
memory with ODM
• Soft error tolerant memory cell• Lockstep with QoB multicore
Improvement / tolerant technique
Dependability degradation factor
Dependability degradation factorDependability degradation factor
QoB: Quality of BitFGVC: Fine-Grain Voltage Control
3
SRAM operating voltage trendSRAM operating voltage trend
Presented in ISSCC
0.5
0
1.0
1.5
2.0
180nm 130nm 90nm 65nm
Supp
ly v
olta
ge V
dd[V
]
45nm
HitachiHitachi
Minimum Minimum VVdddd of SRAMof SRAM
Standard operating voltageStandard operating voltage
Hitachi
Intel
Samsung
Intel
NEC
Hitachi
Infineon
Intel (Dual-Vdd)
Hitachi
Matsushita NEC(7T)
Intel (Dual-Vdd)
Technology node
HP (SOI) IBM (SOI)
Intel (Dual-Vdd)IBM (SOI)
IBM (SOI)Sony
Renesas (8T)
32nm 28nm
Toshiba
Intel
Intel
Toshiba
Toshiba
AMD
AMD
4
Dependability evaluation with operation fault and soft errorDependability evaluation with operation fault and soft error
Margin improvement and soft error tolerant technique is necessary.
Rel
iabi
lity
(FIT
)
0.6 0.8 1.0 1.2 1.4 Voltage (V)
Operation fault
Soft error
Nominal voltage
1E+20
1E+08
1E+02
1E-04
1E+14
In advanced process,operation fault caused by
variation and NBTI is dominant in the memory
dependability.
NBTI degradation
5
AssociativelyAssociatively--Reconfigurable Cache with Reconfigurable Cache with QoBQoB SRAMSRAMMinimum op. voltage and Minimum op. voltage and Performance evaluationPerformance evaluation
Proposed cache memoryProposed cache memory
256KB L2 Cache 32KB IL1 Cache 32KB DL1 Cache
0.940.950.960.970.980.991
Nor
mal
ized
IPC
8-way /256 KB
7-way /224 KB
6-way /192 KB
5-way /160 KB
4-way /128 KB
0.5
0.55
0.6
0.65
0.7
8 way 7 way 6 way 5 way 4 wayAssociativity
Vmin
[V] 115 mV
- 4.93 %
Normal mode operation Dependable mode operation
Consist one dependable way in dependable mode operation by pair structure of cache ways and QoB SRAMs
Consist one dependable way in dependable mode operation by pair structure of cache ways and QoB SRAMs
115 mV Vmin improvement with only 4.93% IPC degradationProp. cache can trade off processor performance for reliability
115 mV Vmin improvement with only 4.93% IPC degradationProp. cache can trade off processor performance for reliability
6
ASR0 ASR1
WL0
WL1
WL31
......
WL driver Pull-down NMOS
ASW1
VDDCVDD0
...
ASW0
WE(write-enable)
Y0
VDDCVDD31
Y31
...
...
...
Read assist circuit
TEG chip and 128-Kb SRAM layout
5Mb SRAM (128kb x 40)and
memory BIST w/ assist logic
プログラマブル電源制御部
電圧モニタ
Cell array Cell array
Colum I/O Colum I/O
Add. buffer and control
Row
dec
oder
Write-assistcircuits
Write-assistcircuits
10.5 µm
495 µm
580
µm
Read-assistcircuits
2.2 µm
0.60
0.65
0.70
0.75
0.80
0.85
0.90
#1 #2 #3 #4 #5 #6 #7 #8 #9Samples
Vmin
(V)
w/o AssistConv. AssistProp. Assist
Measurement resultWrite assist circuit
Fine Grain Voltage Control (FGVC) SRAM Fine Grain Voltage Control (FGVC) SRAM
7
Confirmation of SRAM margin improvement scheme and failure detect technique.
• QoB-SRAM and FGVC-SRAM are implemented.• Autonomic control logic applying BIST and power supply noise monitor
are combined with QoB and FGVC SRAM.Chip layoutBlock diagram
(40nm process, 5mm x 5mm)
QoB-SRAM
ODM FGVC-SRAM
Controller
QoB-SRAM
Autonomic dependable memory chipAutonomic dependable memory chip
On-die monitor
Flexible power supply network
Logic circuit
BIST
Power supply monitorTemperature monitor Aging
monitor
Autonomic dependable memory system
Programmable power switch
controller
• FGVC• QoB• Failure block
replace
Voltage change
Reconfig.
Replace
Memory bank
8
MCU tolerant 6T SRAM cell layoutMCU tolerant 6T SRAM cell layout
0
50
100
150
200
250
300
350
-86%
-86%
(c) CS pattern, 1.2 V
0
50
100
150
200
250
300
350
MC
USE
R [F
IT/M
b]
NPN cellTwin well
PNP cellTwin well
NPN cellTriple well
PNP cellTriple well
0
50
100
150
200
250
300
350
MC
USE
R [F
IT/M
b]
(d) RS pattern, 1.2 V
-95%
-98%
NPN cellTwin well
PNP cellTwin well
NPN cellTriple well
PNP cellTriple well
0
50
100
150
200
250
300
350
Irradiation board
256 KbNPN
SRAM
256 KbPNP
SRAM
256 KbNPN
SRAM
256 KbPNP
SRAM
I/O buffer I/O buffer
750 μm
1050
μm
Twin well Triple well
W target
400 MeV proton beam
7562 mm 150 mm 180 mm
Measurement board
2.11 μm
0.60 μm
PL0
PL1
ND0
ND1NA0
NA1
2.11 μm
PL0
PL1 ND1
NA1ND0
NA0
PMOS(N-well)
NMOS(P-well)
NMOS(P-well)
(a) NMOS-PMOS-NMOS (NPN) 6T cell
NMOS(P-well)
PMOS(N-well)
PMOS(N-well)
(b) PMOS-NMOS-PMOS (PNP) 6T cell
BL BLN
WL WL
BL BLN
0.60 μm
0
50
100
150
200
250
300
350
-97%
-93%
(a) CKB pattern, 1.2 V
0
50
100
150
200
250
300
350
MC
USE
R [F
IT/M
b]NPN cellTwin well
PNP cellTwin well
NPN cellTriple well
PNP cellTriple well
MCUBL=1
MCUBL>1
0
50
100
150
200
250
300
350
-67% -67%
(b) All0 pattern, 1.2 V
0
50
100
150
200
250
300
350
MC
USE
R [F
IT/M
b]
NPN cellTwin well
PNP cellTwin well
NPN cellTriple well
PNP cellTriple well
(a) Conventional NPN 6T cell (b) Proposed PNP 6T cell
1Mb SRAM TEG
Neutron acceleration test at RCNP
Spallation neutron beam irradiates to the measurement board at RCNP. The measurement board includes 256-Kb
SRAMs w/ and w/o triple wells. Horizontal MCU SER is improved by 67%‒98%
PMOS-NMOS-PMOS(NMOS-inside) 6T SRAM cell layout is proposed
The proposed layout mitigates horizontal
multiple cell upset (MCU)
9
Rel
iabi
lity
(FIT
)
0
1E+10
1E+05
1E+00
1E-05
1E-10
1.00.80.60.40.2
Dependability is improved by QoB cache and noise filterPower supply noise (Vpp)
Power Supply Noise filter
@ Nominal supply voltage
1E-15
1E-20
Power supply noise tolerancePower supply noise tolerance
Conventional 6T/ QoB 8way
QoB 7way
6way
5way
4way
1E+15
10
Rel
iabi
lity
(FIT
)
1E+21
1E+16
1E+06
1E+01
1E-040.6 0.8 1.0 1.2 1.4
Voltage (V)
Conv. SRAM
QoB SRAM(8way - 4way)
Prop. SRAM(MCU tolerant cell)
Operation fault
Soft error
Reliability improvement evaluationReliability improvement evaluation
①
②
①Operation faultin low voltage region is saved by QoB cache.
②Soft error in nominal voltage region is saved by MCU tolerant cell layout.
1E+11
108improvement
NBTI
11
Hazard analysis and Risk assessment
(H&R)
Hazard analysis and Risk assessment
(H&R)
Mitigate rate of occurrence of hazard
Mitigate rate of occurrence of hazard
Remove hazard(Intrinsically safety)
Remove hazard(Intrinsically safety)
Minimize hazard exposure time
Minimize hazard exposure time
Removal of the source of risk
Mitigate risk to the acceptance level
Functional safety
QoB memory
FGVC
Autonomic dependable memory
Lockstep with QoB multicore
Power supply noise filter
Our research contributions
Functional safety
Source: ISO26262, Nikkei Electronics 2011.1.10, Fig.4, pp.50
12
Controller model
Vehicle engine model
Collaborative sim.
Control softwareControl software
ECUMicro-controller
CPU Peripheral
SH-2A CPU
Bridge
INTC CMT
DMAC
A/D ATU
Fault-Injectablebus bridge
Micro-controllermodel
InternalSRAMmodel
SRAM failuredata pattern
Matlab®/Simulink®simulator
CoMETTM
simulatorVerification conditions for system
Fault-injection scheme
Fault-injection scheme
Fault CaseGenerator
System-level verification environment (PILS)
System-level verification environment (PILS)
Proposed Fault-Injection System (FIS)
13
Controller model
Vehicle engine model
Collaborative sim.
Control softwareControl software
ECUMicro-controller
CPU Peripheral
SH-2A CPU
Bridge
INTC CMT
DMAC
A/D ATU
Fault-Injectablebus bridge
Micro-controllermodel
InternalSRAMmodel
SRAM failuredata pattern
Matlab®/Simulink®simulator
CoMETTM
simulatorVerification conditions for system
Fault-injection scheme
Fault-injection scheme
Fault CaseGenerator
System-level verification environment (PILS)
System-level verification environment (PILS)
Can inject device-level SRAM failures into the system-level verification environmentCan inject deviceCan inject device--level SRAM failures into level SRAM failures into the systemthe system--level verification environmentlevel verification environment
The proposed fault-injection system can evaluate the impacts of SRAMs reliability on
the operating stability of system
Proposed Fault-Injection System (FIS)
14
Fault Case Generator (FCG)
FCG can generates time-series SRAM failure data patternscorrespond to arbitrary waveforms for the power supply noise and operating temperature, and device parameters
15
Supply voltage (V)Temp. (degC)
0.4
0.5
0.6
0.7
0.8
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
-50
080150
-254012
5
Abn
orm
al te
rmin
atio
n ra
te
Supply voltage (V)Temp. (degC)
0.4
0.5
0.6
0.7
0.8
-50
080150
-254012
5
System-Level Evaluation Result ECU System w/ 6T SRAM ECU System w/ 7T/14T SRAMw/ 7T/14T SRAM
ECU w/ 7T/14T SRAM improves the minimum op. voltage by 0.05–0.15 V
16
SummaryQuestion:
What is the definition of “dependability”?
Answer:Reliability (FIT) against …
variation (mismatch),aging (NBIT,HC) and RTN,supply noise,soft error
Reducing SRAM Vmin by autonomic controlled QoB and FGVCPower noise monitor and filterSER tolerant design (bitcell and system)