06212621

7/28/2019 06212621

1/4

3D-IC BISR for Stacked Memories Using Cross-Die Spares

Chun-Chuan Chi1,2 Yung-Fa Chou2 Ding-Ming Kwai2 Yu-Ying Hsiao2

Cheng-Wen Wu1,2 Yu-Tsao Hsing3 Li-Ming Denq3 Tsung-Hsiang Lin3

1 Department of Electrical Engineering, National Tsing-Hua University, Hsinchu, Taiwan
{ccchi, cww}@larc.ee.nthu.edu.tw

2 Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
{yfchou, dmkwai, yuyinghsiao}@itri.org.tw

3 HOY Technologies, Hsinchu, Taiwan
{john.hsing, taros.denq, sean.lin}@hoy-tech.com

Abstract

3D ICs based on Through-Silicon Vias (TSVs) enable the stacking of logic and memory dies to manufacture chips with higher performance, lower power, and smaller form factor. To improve the yield of the memory dies in 3D ICs, this paper proposes a Built-In Self-Repair (BISR) architecture, which allows the sharing of spares between different layers of dies. The corresponding pre-bond (before the memory dies are bonded together) and post-bond (after the memory dies are bonded together) test flow is presented as well. In order to maximize the yield gain introduced by the cross-die spares, a die matching algorithm is proposed to determine which dies should be stacked together, so that the spare sharing can be most efficient. Experimental results show that the area overhead of the proposed BISR circuit is only 2.43% (which can be smaller if larger logic and memory dies are adopted), and the yield gain achieved by cross-die spare sharing can be up to [...].

1 Introduction

Through-Silicon Via (TSV) is an emerging process technology that provides inter-die connections through the silicon substrate. TSVs are manufactured by drilling through a silicon substrate and filling the holes with metal, such as copper or tungsten, so that they can provide high-density, low-latency, and low-power vertical interconnects between dies [1–6].

TSVs enable three-dimensional ICs (3D ICs) that integrate multiple dies into a single chip. Due to the shorter vertical inter-die connections, 3D ICs can offer benefits such as higher performance, lower power, and smaller form factor. In addition, 3D ICs enable heterogeneous integration, allowing each individual die to be manufactured with a different process technology. One of the most promising 3D integration paradigms is to stack processor and memory dies together, which is effective in addressing the memory wall problem that limits processor performance [7–14].

Recently, several techniques have been proposed to improve the yield of 3D-stacked memories. [15] presents a redundancy scheme that can be shared between different memory dies, which increases the yield of memory stacks. [16] proposes a die matching algorithm that selects which memory dies should be stacked together to maximize the stack yield, under the assumption that the redundancy can be shared between different memory dies. This algorithm addresses only two-die stacks. [17] tries to combine two bad dies into a good stack, so that the number of memory products that can be shipped is increased. However, these prior works do not discuss the details of BISR architectures.

This paper focuses on the repair of memory dies in processor-memory stacks. We propose a 3D BISR architecture, which adopts a cross-die spare scheme that allows the sharing of spares among memory dies. The hardware is identical for each memory die, independent of the layer at which it is located. A pre-bond and post-bond test/repair flow based on the proposed 3D BISR is presented as well, to improve the yield of memory stacks. In order to share the spares efficiently between dies, we propose a heuristic die matching algorithm, which can quickly determine which dies should be stacked together to make the number of good stacks as large as possible.

The rest of this paper is organized as follows. Section 2 presents the target 3D memory and the proposed BISR architecture; then the corresponding pre-bond and post-bond test/repair flow is detailed. Section 3 describes the proposed die matching algorithm, which is essential to make the spare sharing effective. Experimental results on the area cost of the proposed BISR and the yield gain brought by cross-die spares are shown in Section 4. Finally, Section 5 concludes this paper.

978-1-4577-2081-9/12/$26.00 2012 IEEE


2 3D BISR Architecture

2.1 The Target 3D Memory

The target 3D memory in this paper is shown in Figure 1, in which the bottom die is assumed to be a logic die, and several memory dies are stacked on it. In a typical paradigm, the logic die can be a processor; the memory dies can be either SRAM or DRAM. The interconnects between dies are implemented by TSVs; external I/Os are assumed to be located on the bottom logic die.

Figure 1: The target 3D memory in a logic-memory stack.

Memory access is controlled by Die 0, which generates memory enable signals and broadcasts control signals to the memory dies. Only one memory die is enabled and responds to the control signals at a time, since the address and data buses are shared. Each memory die consists of a main memory, which is partitioned into several banks, as well as a small amount of spare memory (not shown in Figure 1).

2.2 3D BISR

Figure 2 shows the proposed BISR architecture for the memory dies in a logic-memory stack. Each memory die has a dedicated local Built-In Self-Test (BIST) circuit and a Built-In Redundancy Analysis (BIRA) circuit. The BIST is responsible for generating test patterns for the main and spare memories on the die and comparing test responses with expected results to locate fault sites. Based on the test results from the BIST, the BIRA performs a built-in redundancy analysis (RA) algorithm and determines how many and what types of spares are required to repair the faults in the main memory. The Repair Sig. registers temporarily store the repair signatures generated by the BIRA, which indicate the addresses of fault locations and the types of spares required to repair such faults. The signatures are shifted out after testing is finished. The added testing-related hardware is identical for each memory die, and is independent of the layer at which the memory is stacked.

On the bottom die (Die 0), there is a Global Spare Assignment Unit (GSAU), which receives repair signatures from the memory dies and assigns spares globally. The scope of the GSAU covers the entire memory stack, and it knows which dies have spares available and which do not, since all repair signatures generated by the local BIRAs are transferred to the GSAU after testing. Therefore, the GSAU can allocate spares on all memory dies to repair faults across dies; that is, using one die's spares to fix another die's faults is allowed. It should be noted that we assume the main and spare memories in every memory die meet the timing specification. Hence, using spares on one die to repair faults on another die does not introduce timing problems. This assumption is reasonable, because all memory dies in the 3D IC must meet the timing specification in order to enable random access.
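The global assignment described above can be illustrated with a minimal software sketch. It is not the GSAU hardware of this paper: the `RepairSignature` record, the `assign_spares` function, the restriction to spare rows, and the local-die-first preference are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RepairSignature:
    """Hypothetical repair signature: one faulty row reported by a die's BIRA."""
    die: int         # die that contains the fault
    faulty_row: int  # address of the faulty row on that die


def assign_spares(signatures, spare_rows_per_die):
    """Assign one spare row to each signature (illustrative policy only).

    spare_rows_per_die maps die id -> number of usable spare rows on that die.
    The faulty die's own spares are tried first, then any other die's spares.
    Returns a list of (signature, donor_die) pairs, or None if the stack
    does not hold enough spares.
    """
    remaining = dict(spare_rows_per_die)
    assignment = []
    for sig in signatures:
        candidates = [sig.die] + sorted(d for d in remaining if d != sig.die)
        donor = next((d for d in candidates if remaining.get(d, 0) > 0), None)
        if donor is None:
            return None  # more spares requested than available in the stack
        remaining[donor] -= 1
        assignment.append((sig, donor))
    return assignment


sigs = [RepairSignature(1, 5), RepairSignature(1, 9), RepairSignature(1, 12)]
result = assign_spares(sigs, {1: 2, 2: 2})
print([donor for _, donor in result])  # [1, 1, 2]
```

In this example Die 1 has three faulty rows but only two spare rows, so the third fault borrows a spare row from Die 2.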

Figure 2: The proposed 3D BISR architecture.

The Address Remapping Unit (ARU) remaps faulty addresses to spare addresses according to the spare assignment results from the GSAU, thereby repairing the faulty memory. The Remap Test Unit (RTU) tests the memory after address remapping to check whether the remapping is correct. Another purpose of the RTU test is to access the memory dies from Die 0, so that faults on the TSVs that serve as address buses, data buses, and other control signals can be detected.
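As a rough software analogy of the remapping and its check, consider the sketch below. The lookup-table representation, the class name `AddressRemapper`, the tagged location tuples, and the write/read-back pattern are assumptions for illustration; the actual ARU and RTU are hardware blocks whose implementation is not detailed here.

```python
class AddressRemapper:
    """Toy model: redirect a faulty (die, row) to an assigned spare location."""

    def __init__(self):
        self._table = {}

    def program(self, faulty, spare):
        """Record that accesses to `faulty` = (die, row) go to `spare`."""
        self._table[faulty] = spare

    def translate(self, die, row):
        """Return the physical location used for a logical (die, row)."""
        return self._table.get((die, row), ("main", die, row))


def remap_test(remapper, memory, locations, pattern=0xA5):
    """Write a pattern through the remapper and read it back (RTU-like check)."""
    for die, row in locations:
        memory[remapper.translate(die, row)] = pattern
    return all(memory.get(remapper.translate(die, row)) == pattern
               for die, row in locations)


aru = AddressRemapper()
aru.program((1, 5), ("spare", 2, 0))  # row 5 of Die 1 uses spare row 0 of Die 2
print(aru.translate(1, 5))            # ('spare', 2, 0)
print(aru.translate(1, 6))            # ('main', 1, 6)
```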

2.3 Test Flow

Figure 3: Pre- and post-bond test/repair flow: (a) pre-bond; (b) post-bond.

Based on the BISR architecture presented above, this sub-section presents a corresponding pre-bond and post-bond test/repair flow, which tries to maximize the yield of stacked memories. Figure 3(a) shows the pre-bond test flow, which is a typical memory test flow with BIRA. The BIST and BIRA cooperate to classify memory dies into Good Dies and Bad Dies. Since spares can be shared between dies in our proposed scheme, a die is considered a bad die only if its faults cannot be repaired by all the spares in the memory stack. For example, if a memory die contains 4 spare rows, and the number of memory dies in a stack is set to 2, then a die will be considered a bad die only if its faults cannot be repaired by 4 × 2 = 8 spare rows.

Note that some of the Good Dies here are only possibly repairable, because they may require spares from other dies to be repaired. We propose an algorithm that can select which dies should be bonded together to efficiently share spares across dies and maximize the yield of memory stacks, which will be detailed in Section 3.
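The pre-bond classification can be sketched as follows, assuming for simplicity that only spare rows exist and that every die carries the same number of them. The function name `classify_die` and the three labels are hypothetical; the flow in Figure 3(a) only distinguishes Good Dies from Bad Dies.

```python
def classify_die(rows_needed, rows_per_die, dies_per_stack):
    """Classify a die by the number of spare rows its faults require."""
    if rows_needed <= rows_per_die:
        return "good (self-repairable)"
    if rows_needed <= rows_per_die * dies_per_stack:
        return "good (possibly repairable with cross-die spares)"
    return "bad"


# With 4 spare rows per die and 2 dies per stack, up to 4 * 2 = 8 rows can be covered.
for needed in (3, 6, 9):
    print(needed, classify_die(needed, rows_per_die=4, dies_per_stack=2))
```

A die that needs 6 spare rows is therefore kept, although it can only be repaired if its stack partner leaves at least 2 spare rows unused.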

After pre-bond testing, the dies are stacked together and a post-bond test flow is performed, as shown in Figure 3(b). The post-bond test detects faults introduced by 3D process steps, such as wafer thinning and bonding. During post-bond testing, all memory dies are tested in parallel, and if any memory die is irreparable, the test operations stop. On the other hand, if all memory dies are still Good Dies after all of the BIST circuits finish testing, the GSAU allocates spares to repair faults according to the repair signatures from every memory die. If the number of requested spares exceeds the total number of available spares in the memory stack, the 3D memory is considered irreparable; otherwise, the address remapping configuration is stored, and an optional test can be performed to check whether the remapping is successful as well as to test for faults on the TSVs that interconnect the logic and memory dies.
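The decision part of this post-bond flow can be summarized in a short sketch. The data layout is an assumption: each die reports a (rows needed, columns needed) pair, or `None` when its local BIST/BIRA already finds it irreparable, and `available` holds the total usable spare rows and columns in the stack. The function name `post_bond_decision` is hypothetical.

```python
def post_bond_decision(requests, available):
    """Return 'irreparable' or 'repairable' for one memory stack."""
    # Any die that is irreparable on its own terms stops the flow.
    if any(req is None for req in requests):
        return "irreparable"
    rows_requested = sum(rows for rows, _ in requests)
    cols_requested = sum(cols for _, cols in requests)
    total_rows, total_cols = available
    # More spares requested than the whole stack can offer.
    if rows_requested > total_rows or cols_requested > total_cols:
        return "irreparable"
    # Otherwise the remapping configuration would be stored, followed by
    # the optional remap/TSV test.
    return "repairable"


print(post_bond_decision([(5, 0), (2, 1)], available=(8, 2)))  # repairable
print(post_bond_decision([(5, 0), (4, 1)], available=(8, 2)))  # irreparable
```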

3 Die Matching Algorithm

Figure 4: Optimization flow of the proposed die matching algorithm.

After pre-bond testing, a die matching algorithm is required to select which dies should be bonded together, so that the spares in different dies can be shared efficiently and the stack yield can be maximized. The optimization flow of the proposed die matching algorithm is shown in Figure 4.

A two-valued vector (# spare rows, # spare cols) is attached to each die, representing the spares remaining on that die after pre-bond testing. The input of the algorithm is a set of dies with their corresponding two-valued vectors and a pre-defined number of dies per stack, k; the output is a set of die stacks, each having two non-negative values in its overall two-valued vector, which is defined as the sum of the individual vectors of the dies in the stack.
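As a small illustration of this definition, the sketch below sums the per-die vectors of a stack and checks the non-negativity condition. The function names `overall_vector` and `is_good_stack` are hypothetical, and a negative entry is taken to mean the number of spares the die still needs from other dies.

```python
def overall_vector(stack):
    """Sum the (spare rows, spare cols) vectors of all dies in a stack."""
    rows = sum(r for r, _ in stack)
    cols = sum(c for _, c in stack)
    return (rows, cols)


def is_good_stack(stack):
    """A stack is acceptable when both entries of its overall vector are non-negative."""
    rows, cols = overall_vector(stack)
    return rows >= 0 and cols >= 0


# One die lacks a spare column, the other lacks a spare row; together they balance.
print(overall_vector([(2, -1), (-1, 3)]))  # (1, 2)
print(is_good_stack([(2, -1), (-1, 3)]))   # True
print(is_good_stack([(2, -1), (1, 0)]))    # False
```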

At the beginning of the optimization flow, the dies are categorized into 4 bins according to their two-valued vectors. Bin1 contains the dies which have both spare rows and columns available (the amount is indicated by positive values); Bin2 contains the dies which have only spare rows available, and need spare columns from other dies to be repaired (the required amount is indicated by a negative value); Bin3 contains the dies which have only spare columns available, and need spare rows from other dies to be repaired; Bin4 contains the dies which have run out of all spares on themselves, and need both spare rows and columns from other dies to be repaired.
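The binning step can be sketched as a sign test on the two vector entries. The text does not say where an entry of exactly zero belongs, so the sketch assumes that zero counts as non-negative (no spares needed from other dies); the function name `bin_of` is hypothetical.

```python
def bin_of(vector):
    """Return the bin number (1 to 4) for a (spare rows, spare cols) vector."""
    rows, cols = vector
    if rows >= 0 and cols >= 0:
        return 1  # (+, +): spares of both types remain
    if rows >= 0:
        return 2  # (+, -): needs spare columns from other dies
    if cols >= 0:
        return 3  # (-, +): needs spare rows from other dies
    return 4      # (-, -): needs both spare rows and columns


print([bin_of(v) for v in [(2, 1), (3, -1), (-2, 1), (-1, -1)]])  # [1, 2, 3, 4]
```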

The basic idea of the flow in Figure 4 is to keep the resulting overall vector consisting of positive values at each step. Whenever a resulting vector contains negative values, the newly stacked die is removed, and a die from another bin is stacked according to the previously obtained vector. For example, if the previous vector is (+, -), indicating that the vector has a positive value in the first position and a negative value in the second, then a die from Bin3, which has a vector of (-, +), will be stacked; if the previous vector is (-, -), then a die from Bin1 with a vector of (+, +) will be stacked, and so on. The choice is made in an attempt to neutralize the negative values. After the neutralization, the algorithm tries to stack a Bin4 die.

In Figure 4, the circle labeled "Stack a Bin2 die & a Bin3 die" represents the process that exhaustively searches for a combination of a Bin2 die and a Bin3 die, so that the two-die combination results in an overall vector which is (+, +). This process is performed when the algorithm needs to stack a Bin1 die but Bin1 is already empty. The matching process stops when Bin1 is empty and there is no Bin2-Bin3 combination that has a (+, +) vector.
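The following is a simplified greedy sketch in the spirit of this heuristic, not a transcription of the flow in Figure 4. It seeds each stack with a Bin1 die (or, if Bin1 is empty, with a Bin2-Bin3 pair whose sum is non-negative), then repeatedly adds the neediest die that keeps the overall vector non-negative. All function names, the treatment of zero as non-negative, and the tie-breaking order are assumptions.

```python
def bin_of(v):
    rows, cols = v
    if rows >= 0 and cols >= 0:
        return 1
    if rows >= 0:
        return 2
    if cols >= 0:
        return 3
    return 4


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def non_negative(v):
    return v[0] >= 0 and v[1] >= 0


def build_stack(pool, k):
    """Try to build one k-die stack from pool; return None if the greedy search fails."""
    avail = list(pool)
    bin1 = [d for d in avail if bin_of(d) == 1]
    if bin1:
        stack = [bin1[0]]
    elif k >= 2:
        # Bin1 is empty: search exhaustively for a Bin2-Bin3 pair that balances out.
        pairs = [(a, b) for a in avail if bin_of(a) == 2
                 for b in avail if bin_of(b) == 3
                 if non_negative(add(a, b))]
        if not pairs:
            return None
        stack = list(pairs[0])
    else:
        return None
    vec = (0, 0)
    for d in stack:
        avail.remove(d)
        vec = add(vec, d)
    while len(stack) < k:
        fits = [d for d in avail if non_negative(add(vec, d))]
        if not fits:
            return None
        choice = max(fits, key=bin_of)  # prefer the neediest die (Bin4 first)
        avail.remove(choice)
        stack.append(choice)
        vec = add(vec, choice)
    return stack


def match_dies(dies, k):
    """Greedily partition dies into k-die stacks with non-negative overall vectors."""
    pool = list(dies)
    stacks = []
    while len(pool) >= k:
        stack = build_stack(pool, k)
        if stack is None:
            break
        for d in stack:
            pool.remove(d)
        stacks.append(stack)
    return stacks


dies = [(2, 2), (-1, -1), (3, -1), (-1, 2), (1, 1), (-2, -2)]
print(match_dies(dies, k=2))  # [[(2, 2), (-1, -1)], [(1, 1), (-1, 2)]]
```

In this example the last two dies, (3, -1) and (-2, -2), remain unmatched because no combination of them yields a non-negative overall vector. A greedy search of this kind is fast but can miss matchings that an exhaustive search would find.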

4 Experimental Results

4.1 Area Cost of BISR

To evaluate the area cost of the proposed 3D BISR, we have implemented an example design, which is a three-die stack with a logic die at the bottom (Die 0) and two SRAM dies (Dies 1 and 2) stacked on it. Each memory die consists of a main memory of 8k × 64 and three spare columns.

The area results are listed in Table 1. For each memory die, the 3D BISR hardware adds an area overhead of 2.3% (excluding the area of spares). For the entire logic-memory stack, the 3D BISR hardware represents an overhead of approximately 2.43%.

The area of Die 0 (the logic die) is set to the same value as that of the memory dies. Although this value is unrealistically small compared to a real processor, it is sufficient to give us an estimate of the BISR area overhead. Since the BISR hardware does not grow with the area of the logic die, it is expected that the area overhead will be smaller if a more realistic, larger processor die is adopted. Similarly, if the size of the main memory increases, the area percentage occupied by the BIST and BIRA will be smaller.

         Functional Circuits   GSAU     ARU      RTU      Area
         (µm²)                 (µm²)    (µm²)    (µm²)    Overhead
Die 0    1.78M                 20k      7k       21k      2.70%

         Main Memory           Spares   BIST     BIRA     Area
         (µm²)                 (µm²)    (µm²)    (µm²)    Overhead
Die 1    1.78M                 410k     39k      2k       2.30%
Die 2    1.78M                 410k     39k      2k       2.30%

Total area overhead                                       2.43%

Table 1: Area costs of the proposed 3D BISR in 0.13 µm technology.
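The percentages in Table 1 can be reproduced with a short calculation, assuming that each overhead is the BISR area divided by the functional or main-memory area of the die, with the spares excluded. The helper name `overhead` is hypothetical; the areas are the rounded values from the table.

```python
def overhead(base_area, bisr_blocks):
    """Ratio of the added BISR area to the base (functional or main memory) area."""
    return sum(bisr_blocks) / base_area


die0 = overhead(1.78e6, [20e3, 7e3, 21e3])  # GSAU + ARU + RTU on the logic die
die1 = overhead(1.78e6, [39e3, 2e3])        # BIST + BIRA on one memory die
total = (20e3 + 7e3 + 21e3 + 2 * (39e3 + 2e3)) / (3 * 1.78e6)

print(f"{die0:.2%} {die1:.2%} {total:.2%}")  # 2.70% 2.30% 2.43%
```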
