7/28/2019 06212621
3D-IC BISR for Stacked Memories Using Cross-Die Spares
Chun-Chuan Chi1,2 Yung-Fa Chou2 Ding-Ming Kwai2 Yu-Ying Hsiao2
Cheng-Wen Wu1,2 Yu-Tsao Hsing3 Li-Ming Denq3 Tsung-Hsiang Lin3
1 Department of
Electrical Engineering
National Tsing-Hua University
Hsinchu, Taiwan
{ccchi, cww}@larc.ee.nthu.edu.tw
2 Information and Communications
Research Laboratories
Industrial Technology Research Institute
Hsinchu, Taiwan
{yfchou, dmkwai, yuyinghsiao}@itri.org.tw
3 HOY Technologies
Hsinchu, Taiwan
{john.hsing, taros.denq,
sean.lin}@hoy-tech.com
Abstract
3D ICs based on Through-Silicon Vias (TSVs) enable the stacking of logic and memory dies to manufacture chips with higher performance, lower power, and smaller form factor. To improve the yield of the memory dies in 3D ICs, this paper proposes a Built-In Self-Repair (BISR) architecture that allows spares to be shared between different layers of dies. The corresponding pre-bond (before the memory dies are bonded together) and post-bond (after the memory dies are bonded together) test flow is presented as well. In order to maximize the yield gain introduced by the cross-die spares, a die matching algorithm is proposed to determine which dies should be stacked together so that the spare sharing is most efficient. Experimental results show that the area overhead of the proposed BISR circuit is only 2.43%, which can be smaller if larger logic and memory dies are adopted, and that the yield gain achieved by cross-die spare sharing can be up to […].
1. Introduction
Through-Silicon Via (TSV) is an emerging process technology that provides inter-die connections through the silicon substrate. TSVs are manufactured by drilling through a silicon substrate and filling the holes with metal, such as copper or tungsten, so that they can provide high-density, low-latency, and low-power vertical interconnects between dies [1–6].
TSVs enable three-dimensional ICs (3D ICs) that integrate multiple dies into a single chip. Due to the shorter vertical inter-die connections, 3D ICs can offer benefits such as higher performance, lower power, and smaller form factor. In addition, 3D ICs enable heterogeneous integration, allowing each die to be manufactured in a different process technology. One of the most promising 3D integration paradigms is to stack processor and memory dies together, which is effective in addressing the memory wall problem that limits processor performance [7–14].
Recently, several techniques have been proposed to improve the yield of 3D-stacked memories. [15] presents a redundancy scheme that can be shared between different memory dies, by means of which the yield of memory stacks can be increased. [16] proposes a die matching algorithm to select which memory dies should be stacked together to maximize the stack yield, under the assumption that the redundancy can be shared between different memory dies; however, that algorithm only addresses two-die stacks. [17] tries to combine two bad dies into a good stack, so that the number of memory products that can be shipped is increased. However, prior works lack a detailed discussion of BISR architectures.
This paper focuses on the repair of memory dies in processor-memory stacks. We propose a 3D BISR architecture that adopts a cross-die spare scheme, allowing spares to be shared among memory dies. The hardware is identical for each memory die, independent of the layer at which the die is located. Based on the proposed 3D BISR, a pre-bond and post-bond test/repair flow is also presented to improve the yield of memory stacks. In order to share the spares efficiently between dies, we propose a heuristic die matching algorithm that can quickly determine which dies should be stacked together to make the number of good stacks as large as possible.
The rest of this paper is organized as follows. Section 2 presents the target 3D memory and the proposed BISR architecture, followed by the corresponding pre-bond and post-bond test/repair flow. Section 3 describes the proposed die matching algorithm, which is essential to make the spare sharing effective. Experimental results on the area cost of the proposed BISR and the yield gain brought by cross-die spares are shown in Section 4. Finally, Section 5 concludes this paper.
978-1-4577-2081-9/12/$26.00 ©2012 IEEE
2. 3D BISR Architecture
2.1 The Target 3D Memory
The target 3D memory in this paper is shown in Figure 1: the bottom die is assumed to be a logic die, and several memory dies are stacked on it. In a typical paradigm, the logic die is a processor, and the memory dies are either SRAM or DRAM. The interconnects between dies are implemented by TSVs, and external I/Os are assumed to be located on the bottom logic die.
Figure 1: The target 3D memory in a logic-memory stack.
The access of the memory is controlled by Die 0, which generates memory enable signals and broadcasts control signals to the memory dies. Since the address and data buses are shared, only one memory die is enabled and responds to the control signals at a time. Each memory die consists of a main memory partitioned into several banks, as well as a small amount of spare memory (not shown in Figure 1).
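To make the shared-bus access model concrete, the sketch below shows how Die 0 might turn a global word address into a one-hot die enable, so that exactly one memory die responds. The paper does not specify the address map; the contiguous split (die = address // words per die, memory dies numbered from 1 because Die 0 is the logic die) and the names `decode` and `die_enables` are hypothetical.

```python
from typing import List, Tuple


def decode(global_addr: int, words_per_die: int, num_dies: int) -> Tuple[int, int]:
    """Hypothetical map: split a global word address into (memory die, local address)."""
    if not 0 <= global_addr < words_per_die * num_dies:
        raise ValueError("address out of range")
    # Memory dies are numbered from 1; Die 0 is the logic die.
    return global_addr // words_per_die + 1, global_addr % words_per_die


def die_enables(global_addr: int, words_per_die: int, num_dies: int) -> List[bool]:
    """One-hot enable vector broadcast by Die 0: only the addressed die responds."""
    die, _ = decode(global_addr, words_per_die, num_dies)
    return [d == die for d in range(1, num_dies + 1)]


# Two memory dies of 8192 words each (the 8k-word size used in Section 4).
print(decode(8192, 8192, 2), die_enables(8192, 8192, 2))
```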
2.2 3D BISR
Figure 2 shows the proposed BISR architecture for the memory dies in a logic-memory stack. Each memory die contains a dedicated local Built-In Self-Test (BIST) unit and a Built-In Redundancy Analysis (BIRA) unit. The BIST generates test patterns for the main and spare memories on the die and compares the test responses with the expected results to locate fault sites. Based on the test results from the BIST, the BIRA performs a built-in redundancy analysis (RA) algorithm and determines how many and what types of spares are required to repair the faults in the main memory. The Repair Sig. registers temporarily store the repair signatures generated by the BIRA, which indicate the addresses of fault locations and the types of spares required to repair those faults. The signatures are shifted out after testing is finished. The added test-related hardware is identical for each memory die and is independent of the layer at which the die is stacked.
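The BIST/BIRA cooperation can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions: the memory is modeled as a bit array with stuck-at faults, the BIST runs the standard March C- sequence (the paper does not name its test algorithm), and `repair_signatures` is a deliberately naive stand-in for the built-in RA algorithm that ignores spare limits. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, col)


class FaultyMemory:
    """Bit-oriented memory model; a stuck-at cell always reads its stuck value."""

    def __init__(self, rows: int, cols: int, stuck_at: Dict[Cell, int]):
        self.rows, self.cols = rows, cols
        self.bits = [[0] * cols for _ in range(rows)]
        self.stuck_at = stuck_at

    def write(self, r: int, c: int, v: int) -> None:
        self.bits[r][c] = v

    def read(self, r: int, c: int) -> int:
        return self.stuck_at.get((r, c), self.bits[r][c])


def march_c_minus(mem: FaultyMemory) -> Set[Cell]:
    """BIST sketch: run March C- and return every cell whose read value mismatches."""
    up = [(r, c) for r in range(mem.rows) for c in range(mem.cols)]
    down = up[::-1]
    elements = [  # (address order, operations) of each March element
        (up, [("w", 0)]),
        (up, [("r", 0), ("w", 1)]),
        (up, [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up, [("r", 0)]),
    ]
    faults: Set[Cell] = set()
    for order, ops in elements:
        for r, c in order:
            for op, val in ops:
                if op == "w":
                    mem.write(r, c, val)
                elif mem.read(r, c) != val:
                    faults.add((r, c))
    return faults


def repair_signatures(faults: Set[Cell]) -> List[Tuple[str, int]]:
    """Naive BIRA: a row with 2+ faults gets a spare row; other faults get spare columns."""
    per_row = Counter(r for r, _ in faults)
    rows = sorted(r for r, n in per_row.items() if n >= 2)
    cols = sorted({c for r, c in faults if r not in rows})
    return [("row", r) for r in rows] + [("col", c) for c in cols]


mem = FaultyMemory(4, 4, {(1, 0): 1, (1, 2): 0, (3, 3): 1})
faults = march_c_minus(mem)
print(sorted(faults), repair_signatures(faults))
```

A production BIRA would also check the signatures against the available spares; that check is what lets the pre-bond flow label a die good or bad.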
On the bottom die (Die 0), a Global Spare Assignment Unit (GSAU) receives the repair signatures from the memory dies and assigns spares globally. The scope of the GSAU covers the entire memory stack: since all repair signatures generated by the local BIRAs are transferred to the GSAU after testing, it knows which dies have spares available and which dies need additional spares. Therefore, the GSAU can allocate spares on all memory dies to repair faults across dies; that is, one die's spares may be used to fix another die's faults.
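A minimal sketch of global spare assignment, assuming each repair signature is a (spare type, faulty address) pair and that the GSAU tries the requesting die's own spares before borrowing from other dies in ascending die order. The allocation policy and the names `assign_spares`, `free`, and `requests` are assumptions for illustration, not the paper's GSAU design.

```python
from typing import Dict, List, Optional, Tuple

Signature = Tuple[str, int]  # ("row" or "col", faulty row/column address on the requesting die)
Assignment = Dict[Tuple[int, str, int], Tuple[int, int]]  # (die, kind, addr) -> (donor die, spare index)


def assign_spares(free: Dict[int, Dict[str, int]],
                  requests: Dict[int, List[Signature]]) -> Optional[Assignment]:
    """Hypothetical GSAU: allocate spares from any die in the stack, own die first."""
    left = {d: dict(kinds) for d, kinds in free.items()}  # unused spares per die (copied)
    used = {d: {"row": 0, "col": 0} for d in free}        # next spare index per die and kind
    result: Assignment = {}
    for die in sorted(requests):
        donors = [die] + [d for d in sorted(left) if d != die]
        for kind, addr in requests[die]:
            donor = next((d for d in donors if left[d][kind] > 0), None)
            if donor is None:
                return None  # requests exceed the spares of the whole stack
            left[donor][kind] -= 1
            result[(die, kind, addr)] = (donor, used[donor][kind])
            used[donor][kind] += 1
    return result


free = {1: {"row": 0, "col": 2}, 2: {"row": 1, "col": 1}}
plan = assign_spares(free, {1: [("row", 5), ("col", 3)], 2: [("col", 7)]})
print(plan)  # Die 1's faulty row 5 borrows Die 2's only spare row
```

Returning `None` mirrors the post-bond rule that a stack is irreparable when more spares are requested than the stack provides.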
Note that we assume the main and spare memories in every memory die meet the timing specification. Hence, using spares on one die to repair faults on another die does not introduce timing problems. This assumption is reasonable because all memory dies in the 3D IC must meet the specified timing in order to support random access.
Figure 2: The proposed 3D BISR architecture.
The Address Remapping Unit (ARU) remaps faulty addresses to spare addresses according to the spare assignment results from the GSAU, so that the faulty memory is repaired. The Remap Test Unit (RTU) tests the memory after address remapping to check whether the remapping is correct. Because the RTU accesses the memory dies from Die 0, its test also detects faults on the TSVs that carry the address and data buses and the other control signals.
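The remapping performed by the ARU amounts to a lookup in front of every access. The sketch below assumes a remap table keyed by (die, spare type, faulty address) and holding (donor die, spare index), and it gives spare rows priority over spare columns; both choices and the class name `AddressRemapUnit` are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[int, str, int]       # (die, "row" or "col", faulty address)
Location = Tuple[str, int, int, int]  # (region, physical die, row or spare index, col or spare index)


class AddressRemapUnit:
    """Hypothetical ARU: redirect accesses that hit a remapped row or column."""

    def __init__(self, table: Dict[Key, Tuple[int, int]]):
        self.table = dict(table)

    def translate(self, die: int, row: int, col: int) -> Location:
        if (die, "row", row) in self.table:   # faulty row: use the whole spare row
            spare_die, idx = self.table[(die, "row", row)]
            return ("spare_row", spare_die, idx, col)
        if (die, "col", col) in self.table:   # faulty column: use the spare column
            spare_die, idx = self.table[(die, "col", col)]
            return ("spare_col", spare_die, row, idx)
        return ("main", die, row, col)        # fault-free address: no remapping


aru = AddressRemapUnit({(1, "row", 5): (2, 0), (1, "col", 3): (1, 0)})
print(aru.translate(1, 5, 9), aru.translate(1, 4, 3), aru.translate(2, 5, 3))
```

An RTU-style check would then write and read back through `translate` to confirm that every remapped address lands on a working spare.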
2.3 Test Flow
Figure 3: Pre- and post-bond test/repair flow: (a) pre-bond; (b) post-bond.
Based on the BISR architecture presented above, this subsection presents a corresponding pre-bond and post-bond test/repair flow that aims to maximize the yield of stacked memories. Figure 3(a) shows the pre-bond test flow, which is a typical memory test flow with BIRA. The BIST and BIRA cooperate to classify memory dies into Good Dies and Bad Dies. Since spares can be shared between dies in the proposed scheme, a die is considered a Bad Die only if its faults cannot be repaired even by all the spares in the memory stack. For example, if a memory die
contains 4 spare rows and the number of memory dies in a stack is set to 2, then a die is considered a Bad Die only if its faults cannot be repaired by 4 × 2 = 8 spare rows.
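The pre-bond decision and the two-valued vector used by the die matching algorithm can be computed together. This sketch assumes the BIRA reports a fixed requirement (rows needed, columns needed) and that every die carries the same number of spares; `prebond_classify` is a hypothetical name. The first check reproduces the example above (4 spare rows per die, k = 2).

```python
from typing import Tuple


def prebond_classify(rows_needed: int, cols_needed: int,
                     spare_rows: int, spare_cols: int, k: int) -> Tuple[str, Tuple[int, int]]:
    """Return ("Good" or "Bad", remaining-spare vector) for one die before bonding."""
    # Remaining spares on this die; a negative entry must be borrowed from other dies.
    vector = (spare_rows - rows_needed, spare_cols - cols_needed)
    # A die is bad only if even all spares of a k-die stack cannot repair it.
    if rows_needed > k * spare_rows or cols_needed > k * spare_cols:
        return "Bad", vector
    return "Good", vector


print(prebond_classify(8, 0, 4, 0, 2))  # needs every spare row of a 2-die stack
print(prebond_classify(9, 0, 4, 0, 2))  # one row too many
```

A Good die with a negative entry is exactly the "only possibly repairable" case: it becomes a good stack only if its partners supply the missing spares.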
Note that some of the Good Dies here are only possibly repairable, because they may require spares from other dies to be repaired. We propose an algorithm that selects which dies should be bonded together so that spares are shared efficiently across dies and the yield of memory stacks is maximized; it is detailed in Section 3.
After pre-bond testing, the dies are stacked together and a post-bond test flow is performed, as shown in Figure 3(b). The post-bond test detects the faults introduced by 3D process steps such as wafer thinning and bonding. During post-bond testing, all memory dies are tested in parallel, and if any memory die is found irreparable, the test operations stop. Otherwise, if all memory dies are still Good Dies after all of the BIST circuits finish testing, the GSAU allocates spares to repair faults according to the repair signatures from every memory die. If the number of requested spares exceeds the total number of available spares in the memory stack, the 3D memory is considered irreparable; otherwise, the address remapping configuration is stored, and an optional test can be performed to check whether the remapping is successful and to detect faults on the TSVs that interconnect the logic and memory dies.
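The decision sequence of the post-bond flow in Figure 3(b) can be summarized as follows. The sketch assumes each die reports whether its local BIRA found it irreparable plus its requested spare rows and columns, and that requests are compared with the stack totals per spare type; `DieReport` and `post_bond_flow` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DieReport:
    """Hypothetical per-die post-bond result from the local BIST/BIRA."""
    repairable: bool   # False if the local BIRA flags the die as irreparable
    rows_needed: int
    cols_needed: int


def post_bond_flow(reports: List[DieReport], stack_spare_rows: int, stack_spare_cols: int) -> str:
    # 1. Any irreparable die stops the test operations.
    if not all(r.repairable for r in reports):
        return "stop"
    # 2. The GSAU compares the total requests with the spares of the whole stack.
    rows = sum(r.rows_needed for r in reports)
    cols = sum(r.cols_needed for r in reports)
    if rows > stack_spare_rows or cols > stack_spare_cols:
        return "irreparable"
    # 3. Otherwise the remapping configuration is stored (an optional RTU test may follow).
    return "repaired"


dies = [DieReport(True, 3, 1), DieReport(True, 4, 0)]
print(post_bond_flow(dies, stack_spare_rows=8, stack_spare_cols=2))
```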
3. Die Matching Algorithm
Figure 4: Optimization flow of the proposed die matching algorithm.
After pre-bond testing, a die matching algorithm is required to
select which dies should be bonded together, so that the spares in
different dies can be shared efficiently and the stack yield can be
maximized. The optimization flow of the proposed die matching
algorithm is shown in Figure 4.
A two-valued vector (# spare rows, # spare cols) is attached to each die, representing the spares remaining on the die after pre-bond testing; a negative entry indicates spares that must be borrowed from other dies. The input of the algorithm is a set of dies with their corresponding two-valued vectors and a predefined number of dies per stack, k; the output is a set of die stacks, each of which has two non-negative values in its overall two-valued vector, defined as the sum of the individual vectors of all dies in the stack.
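The output condition can be written directly as a check on a candidate stack. A minimal sketch with hypothetical function names; vectors are (remaining spare rows, remaining spare columns), with negative entries meaning spares to borrow.

```python
from typing import List, Tuple

Vec = Tuple[int, int]  # (remaining spare rows, remaining spare cols)


def stack_vector(stack: List[Vec]) -> Vec:
    """Overall two-valued vector: element-wise sum of the dies' vectors."""
    return (sum(r for r, _ in stack), sum(c for _, c in stack))


def is_valid_stack(stack: List[Vec], k: int) -> bool:
    """A stack is acceptable if it has k dies and no negative entry overall."""
    rows, cols = stack_vector(stack)
    return len(stack) == k and rows >= 0 and cols >= 0


print(stack_vector([(2, 1), (-1, -1)]), is_valid_stack([(2, 1), (-1, -1)], 2))
```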
At the beginning of the optimization flow, the dies are categorized into 4 bins according to their two-valued vectors. Bin1 contains the dies that have both spare rows and columns available (the amounts are indicated by positive values); Bin2 contains the dies that have only spare rows available and need spare columns from other dies to be repaired (the required amount is indicated by a negative value); Bin3 contains the dies that have only spare columns available and need spare rows from other dies to be repaired; Bin4 contains the dies that have run out of all their own spares and need both spare rows and columns from other dies to be repaired.
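Binning by sign pattern can be sketched as below. The text does not say where zero entries go; this sketch assumes zero counts as non-negative, so a (0, 0) die lands in Bin1. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, int]  # (remaining spare rows, remaining spare cols)


def bin_of(v: Vec) -> int:
    rows, cols = v
    if rows >= 0 and cols >= 0:
        return 1  # spare rows and columns available
    if rows >= 0:
        return 2  # rows available, columns needed
    if cols >= 0:
        return 3  # columns available, rows needed
    return 4      # both rows and columns needed


def make_bins(dies: List[Vec]) -> Dict[int, List[Vec]]:
    bins: Dict[int, List[Vec]] = {1: [], 2: [], 3: [], 4: []}
    for v in dies:
        bins[bin_of(v)].append(v)
    return bins


print(make_bins([(2, 1), (3, -1), (-2, 1), (-1, -1), (0, 0)]))
```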
The basic idea of the flow in Figure 4 is to keep the resulting overall vector positive at every step. Whenever the resulting vector contains a negative value, the newly stacked die is removed, and a die from another bin is stacked according to the previously obtained vector. For example, if the previous vector is (+, -), i.e., it has a positive value in the first position and a negative value in the second, then a die from Bin3, which has a vector of (-, +), is stacked; if the previous vector is (-, -), then a die from Bin1 with a vector of (+, +) is stacked, and so on. The choice is made in an attempt to neutralize the negative values. After the neutralization, the algorithm tries to stack a Bin4 die.
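The neutralization rule maps the sign pattern of the previous vector to the bin to draw from next. A minimal sketch of that mapping (the name `neutralizing_bin` is hypothetical), covering the two cases in the example, the symmetric (-, +) case implied by "and so on", and the Bin4 attempt once nothing is negative; zero is treated as non-negative.

```python
from typing import Tuple

Vec = Tuple[int, int]  # overall (spare rows, spare cols) of the partial stack


def neutralizing_bin(total: Vec) -> int:
    """Pick the bin whose sign pattern cancels the negative entries of `total`."""
    rows_ok, cols_ok = total[0] >= 0, total[1] >= 0
    if rows_ok and cols_ok:
        return 4  # nothing to neutralize: try to absorb a Bin4 (-, -) die
    if rows_ok:
        return 3  # (+, -) needs columns: a Bin3 (-, +) die
    if cols_ok:
        return 2  # (-, +) needs rows: a Bin2 (+, -) die
    return 1      # (-, -) needs both: a Bin1 (+, +) die


print([neutralizing_bin(v) for v in [(2, -1), (-1, -1), (-1, 3), (0, 2)]])
```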
In Figure 4, the circle labeled "Stack a Bin2 die & a Bin3 die" represents the process that exhaustively searches for a Bin2 die and a Bin3 die whose two-die combination results in an overall vector of (+, +). This process is performed when the algorithm needs to stack a Bin1 die but Bin1 is already empty. The matching process stops when Bin1 is empty and no Bin2-Bin3 combination has a (+, +) vector.
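Putting the pieces together, the sketch below is one possible reading of Figure 4, not the authors' implementation. Assumptions: each stack is anchored by a Bin1 die, or by a Bin2-Bin3 pair with a non-negative sum when Bin1 is empty; remaining slots are filled by trying Bin4, Bin2, Bin3, then Bin1, rejecting any die that would make the overall vector negative; an anchor that cannot be completed is set aside; zero entries count as non-negative. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vec = Tuple[int, int]  # (remaining spare rows, remaining spare cols); negative = spares to borrow


def bin_of(v: Vec) -> int:
    """Bin1 (+,+), Bin2 (+,-), Bin3 (-,+), Bin4 (-,-); zero counts as non-negative."""
    rows, cols = v
    if rows >= 0 and cols >= 0:
        return 1
    if rows >= 0:
        return 2
    return 3 if cols >= 0 else 4


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1])


def non_negative(v: Vec) -> bool:
    return v[0] >= 0 and v[1] >= 0


def take_bin2_bin3_pair(bins: Dict[int, List[Vec]]) -> Optional[List[Vec]]:
    """Exhaustively search for a Bin2 die and a Bin3 die whose sum is non-negative."""
    for i, a in enumerate(bins[2]):
        for j, b in enumerate(bins[3]):
            if non_negative(add(a, b)):
                bins[2].pop(i)
                bins[3].pop(j)
                return [a, b]
    return None


def build_stack(bins: Dict[int, List[Vec]], k: int, unmatched: List[Vec]) -> Optional[List[Vec]]:
    """Return a k-die stack, [] if the anchor could not be completed, None if no anchor is left."""
    if bins[1]:
        stack = [bins[1].pop(0)]  # anchor: a die with spare rows and columns to give
    elif k >= 2 and (pair := take_bin2_bin3_pair(bins)) is not None:
        stack = pair              # Bin1 is empty: use a (+, +) Bin2-Bin3 pair instead
    else:
        return None               # stopping condition of the matching process
    anchors = len(stack)
    total = (0, 0)
    for v in stack:
        total = add(total, v)
    while len(stack) < k:
        chosen = None
        for b in (4, 2, 3, 1):    # try the neediest dies first
            for i, v in enumerate(bins[b]):
                if non_negative(add(total, v)):  # reject dies that would make it negative
                    chosen = bins[b].pop(i)
                    break
            if chosen is not None:
                break
        if chosen is None:
            for v in stack[anchors:]:           # return the filler dies to their bins
                bins[bin_of(v)].append(v)
            unmatched.extend(stack[:anchors])   # set the anchor(s) aside
            return []
        stack.append(chosen)
        total = add(total, chosen)
    return stack


def match_dies(dies: List[Vec], k: int) -> Tuple[List[List[Vec]], List[Vec]]:
    bins: Dict[int, List[Vec]] = {1: [], 2: [], 3: [], 4: []}
    for v in dies:
        bins[bin_of(v)].append(v)
    stacks: List[List[Vec]] = []
    unmatched: List[Vec] = []
    while (stack := build_stack(bins, k, unmatched)) is not None:
        if stack:
            stacks.append(stack)
    unmatched.extend(v for b in bins.values() for v in b)
    return stacks, unmatched


stacks, unmatched = match_dies([(2, 1), (-1, -1), (1, -1), (-1, 2), (-2, -2), (0, 0)], k=2)
print(stacks)     # [[(2, 1), (-1, -1)], [(1, -1), (-1, 2)]]
print(unmatched)  # [(0, 0), (-2, -2)]
```

For these six dies, the (2, 1) die absorbs a Bin4 die, the (0, 0) anchor cannot take any remaining die without going negative, and the Bin2-Bin3 search then pairs (1, -1) with (-1, 2). Every loop iteration removes at least one die from the bins, so the matching always terminates.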
4. Experimental Results
4.1 Area Cost of BISR
To evaluate the area cost of the proposed 3D BISR, we have implemented an example design: a three-die stack with a logic die at the bottom (Die 0) and two SRAM dies (Dies 1 and 2) stacked on it. Each memory die consists of an 8k × 64 main memory and three spare columns.
The area results are listed in Table 1. For each memory die, the 3D BISR hardware amounts to 2.3% of the main memory area (excluding the area of the spares). Relative to the entire logic-memory stack, the 3D BISR hardware accounts for approximately 2.43%.
The area of Die 0 (the logic die) is set to the same value as that of the memory dies. Although this value is unrealistically small for a real processor, it is sufficient to estimate the BISR area overhead. Since the BISR hardware does not increase with the area of the logic die, it is expected that
the area overhead will be smaller if a more realistic, larger processor die is adopted. Similarly, if the size of the main memory increases, the percentage of area occupied by the BIST and BIRA will be smaller.
        Functional Circuits   GSAU     ARU      RTU      Area
        (µm²)                 (µm²)    (µm²)    (µm²)    Overhead
Die 0   1.78M                 20k      7k       21k      2.70%

        Main Memory           Spares   BIST     BIRA     Area
        (µm²)                 (µm²)    (µm²)    (µm²)    Overhead
Die 1   1.78M                 410k     39k      2k       2.30%
Die 2   1.78M                 410k     39k      2k       2.30%

Total area overhead                                      2.43%
Table 1: Area costs of the proposed 3D BISR in 0.13 µm technology.
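The percentages in Table 1 can be reproduced from the listed areas. The sketch assumes each overhead is the BISR area (spares excluded) divided by the functional area, and that the total is taken over the functional area of all three dies; under that assumption it yields the 2.43% figure. `stack_overhead` is a hypothetical helper for exploring the scaling argument of this section.

```python
# Areas from Table 1, in µm² (0.13 µm technology).
FUNCTIONAL = 1.78e6              # functional area of each die (logic, or main memory)
LOGIC_BISR = 20e3 + 7e3 + 21e3   # GSAU + ARU + RTU on Die 0
MEMORY_BISR = 39e3 + 2e3         # BIST + BIRA on each memory die (spares excluded)


def overhead(bisr_area: float, functional_area: float) -> float:
    return bisr_area / functional_area


def stack_overhead(logic_area: float = FUNCTIONAL, memory_dies: int = 2) -> float:
    """BISR area over the functional area of the whole stack; the BISR size stays fixed."""
    bisr = LOGIC_BISR + memory_dies * MEMORY_BISR
    return bisr / (logic_area + memory_dies * FUNCTIONAL)


die0 = overhead(LOGIC_BISR, FUNCTIONAL)   # Die 0 row of Table 1
die1 = overhead(MEMORY_BISR, FUNCTIONAL)  # Die 1 and Die 2 rows of Table 1
total = stack_overhead()                  # total row of Table 1
print(f"{die0:.2%} {die1:.2%} {total:.2%}")  # prints: 2.70% 2.30% 2.43%
```

With a hypothetical logic die ten times larger, `stack_overhead(10 * FUNCTIONAL)` drops to roughly 0.61%, consistent with the expectation that the fixed-size BISR matters less on a realistic processor die.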