
    3D-IC BISR for Stacked Memories Using Cross-Die Spares

    Chun-Chuan Chi1,2 Yung-Fa Chou2 Ding-Ming Kwai2 Yu-Ying Hsiao2

    Cheng-Wen Wu1,2 Yu-Tsao Hsing3 Li-Ming Denq3 Tsung-Hsiang Lin3

    1 Department of Electrical Engineering, National Tsing-Hua University, Hsinchu, Taiwan

    {ccchi, cww}@larc.ee.nthu.edu.tw

    2 Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan

    {yfchou, dmkwai, yuyinghsiao}@itri.org.tw

    3 HOY Technologies, Hsinchu, Taiwan

    {john.hsing, taros.denq, sean.lin}@hoy-tech.com

    Abstract

    3D ICs based on Through-Silicon Vias (TSVs) enable the stacking of logic and memory dies to manufacture chips with higher performance, lower power, and smaller form factor. To improve the yield of the memory dies in 3D ICs, this paper proposes a Built-In Self-Repair (BISR) architecture, which allows the sharing of spares between different layers of dies. The corresponding pre-bond (before the memory dies are bonded together) and post-bond (after the memory dies are bonded together) test flow is presented as well. In order to maximize the yield gain introduced by the cross-die spares, a die matching algorithm is proposed to determine which dies should be stacked together, so that the spare sharing can be most efficient. Experimental results show that the area overhead of the proposed BISR circuit is only 2.43%, which can be smaller if larger logic and memory dies are adopted, and the yield gain achieved by cross-die spare sharing can be up to [...].

    1 Introduction

    Through-Silicon Via (TSV) is an emerging process technology that provides inter-die connections through the silicon substrate. TSVs are manufactured by drilling through a silicon substrate and filling the holes with metal, such as copper or tungsten, so that they can provide high-density, low-latency, and low-power vertical interconnects between dies [1-6].

    TSVs enable three-dimensional ICs (3D ICs) that integrate multiple dies into a single chip. Due to the shorter vertical inter-die connections, 3D ICs can offer benefits like higher performance, lower power, and smaller form factor. In addition, 3D ICs enable heterogeneous integration, allowing each individual die to be manufactured by different process technologies. One of the most promising 3D integration paradigms is to stack processor and memory dies together, which is effective in addressing the memory wall problem that limits the processor performance [7-14].

    Recently, several techniques have been proposed to improve the yield of 3D-stacked memories. [15] presents a redundancy scheme that can be shared between different memory dies, by means of which the yield of memory stacks can be increased. [16] proposes a die matching algorithm to select which memory dies should be stacked together to maximize the stack yield, under the assumption that the redundancy can be shared between different memory dies. That algorithm addresses only two-die stacks. [17] tries to combine two bad dies into a good stack, so that the number of memory products that can be shipped is increased. However, these prior works do not discuss the details of BISR architectures.

    This paper focuses on the repair of memory dies in processor-memory stacks. We propose a 3D BISR architecture, which adopts a cross-die spare scheme that allows the sharing of spares among memory dies. The hardware is identical for each memory die, independent of the layer it is located at. A pre-bond and post-bond test/repair flow is presented as well, based on the proposed 3D BISR to improve the yield of memory stacks. In order to share the spares efficiently between dies, we propose a die matching algorithm, which uses heuristics, and therefore can quickly determine which dies should be stacked together to make the number of good stacks as large as possible.

    The rest of this paper is organized as follows. Section 2 presents the target 3D memory and the proposed BISR architecture; then the corresponding pre-bond and post-bond test/repair flow is detailed. Section 3 describes the proposed die matching algorithm, which is essential to make the spare sharing effective. Experimental results on area costs of the proposed BISR and the yield gain brought by cross-die spares are shown in Section 4. Finally, Section 5 concludes this paper.

    978-1-4577-2081-9/12/$26.00 2012 IEEE


    2 3D BISR Architecture

    2.1 The Target 3D Memory

    The target 3D memory in this paper is shown in Figure 1, in which the bottom die is assumed to be a logic die, and several memory dies are stacked on it. In a typical paradigm, the logic die can be a processor; the memory dies can be either SRAM or DRAM. The interconnects between dies are implemented by TSVs; external I/Os are assumed to be located on the bottom logic die.

    Figure 1: The target 3D memory in a logic-memory stack.

    The access of the memory is controlled by Die 0, which generates

    memory enable signals and broadcasts control signals to memory

    dies. Only one memory die is enabled and responds to the control

    signals at a time, since address and data buses are shared. Each

    memory die consists of a main memory which is partitioned into

    several banks, as well as a small amount of spare memory (not

    shown in Figure 1).

    2.2 3D BISR

    Figure 2 shows the proposed BISR architecture for the memory dies in a logic-memory stack. In each memory die, there is a dedicated local Built-In Self-Test (BIST) and Built-In Redundancy Analysis (BIRA). The BIST is responsible for generating test patterns for the main and spare memories on the die and comparing test responses with expected results to locate fault sites. Based on the test results from the BIST, the BIRA can perform a built-in RA algorithm and determine how many and what types of spares are required to repair the faults in the main memory. The Repair Sig. registers are used to temporarily store repair signatures generated by the BIRA, which indicate the addresses of fault locations and the required types of spares to repair such faults. The signatures are shifted out after the testing is finished. The added testing-related hardware is identical for each memory die, and is independent of which layer the memory is stacked at.

    On the bottom die (Die 0), there is a Global Spare Assignment Unit (GSAU), which receives repair signatures from memory dies and assigns spares globally. The scope of the GSAU covers the entire memory stack, and it is aware of which dies have spares available and which dies need them, since all repair signatures generated by the local BIRAs are transferred to the GSAU after testing. Therefore, the GSAU can allocate spares on all memory dies to repair faults across dies; that is, using one die's spares to fix another die's faults is allowed. It should be noted that we assume the main and spare memories in every memory die meet the timing specification. Hence, using spares on a die to repair faults on another die does not introduce timing problems. This assumption is reasonable, because all memory dies in the 3D IC must meet the timing specification in order to enable random access.

    Figure 2: The proposed 3D BISR architecture.

    The Address Remapping Unit (ARU) remaps faulty addresses to spare addresses according to the spare assignment results from the GSAU, and hence the faulty memory can be repaired. The Remap Test Unit (RTU) is used to test the memory after address remapping, to check whether the remapping is correct. Another purpose of this RTU test is to access the memory dies from Die 0, so that faults on the TSVs that carry the address bus, the data bus, and other control signals can be detected.
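    The remapping step can be pictured as a lookup table that is consulted on every access. The sketch below is only an illustration, assuming addresses are (die, row) pairs and spare rows are named by a hypothetical (die, "spare", index) triple; the class `AddressRemapper` and its methods are invented here and are not the paper's ARU design.

```python
class AddressRemapper:
    """Toy model of an address remapping table (hypothetical, not the paper's ARU)."""

    def __init__(self):
        self._table = {}  # faulty address -> spare address

    def program(self, faulty, spare):
        """Record one spare assignment, refusing to hand out a spare twice."""
        if spare in self._table.values():
            raise ValueError(f"spare {spare} is already in use")
        self._table[faulty] = spare

    def translate(self, address):
        """Return the spare address for a faulty address, else the address itself."""
        return self._table.get(address, address)


aru = AddressRemapper()
# Assumed example: a fault on die 1, row 17 is repaired with spare row 0 of die 2.
aru.program((1, 17), (2, "spare", 0))
print(aru.translate((1, 17)))  # remapped to the cross-die spare
print(aru.translate((1, 18)))  # healthy address passes through unchanged
```

    A check in the spirit of the RTU would then read back through `translate` and confirm that every faulty address resolves to a distinct spare.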

    2.3 Test Flow

    Figure 3: Pre- and post-bond test/repair flow: (a) pre-bond; (b) post-bond.

    Based on the BISR architecture presented above, this sub-section presents a corresponding pre-bond and post-bond test/repair flow, which tries to maximize the yield of stacked memories. Figure 3 (a) shows the pre-bond test flow, which is a typical memory test flow with BIRA. The BIST and BIRA cooperate to classify memory dies into Good Dies and Bad Dies. Since spares can be shared between dies in our proposed scheme, a die is considered a bad die only if its faults cannot be repaired even by all the spares in the memory stack. For example, if a memory die contains 4 spare rows, and the number of memory dies in a stack is set to 2, then a die will be considered a bad die only if its faults cannot be repaired by 4 × 2 = 8 spare rows.

    Note that some of the Good Dies here are only possibly repairable, because they may require spares from other dies to be repaired. We propose an algorithm that can select which dies should be bonded together to efficiently share spares across dies and maximize the yield of memory stacks, which will be detailed in Section 3.
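    The pre-bond classification rule can be sketched in a few lines. This is a simplified illustration that counts spare rows only, as in the example above; the function name and the three labels are assumptions made here, not terms from the paper.

```python
def classify_pre_bond(required_rows, spare_rows_per_die, dies_per_stack):
    """Classify a die by how many spare rows its faults require.

    Assumption: only spare rows are modeled, and every die in the stack
    carries the same number of spare rows.
    """
    if required_rows <= spare_rows_per_die:
        return "good: self-repairable"
    if required_rows <= spare_rows_per_die * dies_per_stack:
        # Needs spares borrowed from other dies in the same stack.
        return "good: possibly repairable"
    return "bad"


# The paper's example: 4 spare rows per die, 2 memory dies per stack.
for need in (3, 6, 9):
    print(need, classify_pre_bond(need, 4, 2))
```

    With these inputs, a die needing 6 spare rows is still kept as a candidate, and only a die needing more than 8 is rejected.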

    After pre-bond testing, once the dies are stacked together, a post-bond test flow is performed, as shown in Figure 3 (b). This post-bond test detects the faults introduced by 3D process steps, such as wafer thinning and bonding. During the post-bond testing, all memory dies are tested in parallel, and if there is any memory die that is irreparable, then the test operations stop. On the other hand, if all memory dies are still Good Dies after all of the BIST circuits finish the testing, the GSAU allocates spares to repair faults according to the repair signatures from every memory die. If the amount of requested spares is more than the total amount of available spares in the memory stack, the 3D memory is considered irreparable; otherwise the address remapping configuration is stored, and an optional test can be performed to check whether the remapping is successful as well as to test the faults on TSVs that interconnect logic and memory dies.
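    The post-bond allocation decision can be illustrated with a small greedy sketch. It is not the GSAU logic from the paper: it assumes a single spare type (rows), lets each die use its own spares first, and then borrows from any die with spares left over. The function name and the (faulty die, donor die, count) plan format are hypothetical.

```python
def assign_spares(needed, available):
    """Return a list of (faulty_die, donor_die, count), or None if irreparable.

    needed[i]    -- spare rows requested by die i (from its repair signature)
    available[i] -- spare rows physically present on die i
    """
    if sum(needed) > sum(available):
        return None  # more spares requested than the whole stack holds

    left = list(available)
    remaining = []
    plan = []

    # Pass 1: each die repairs as much as it can with its own spares.
    for die, need in enumerate(needed):
        used = min(need, left[die])
        if used:
            plan.append((die, die, used))
        left[die] -= used
        remaining.append(need - used)

    # Pass 2: outstanding requests borrow from dies with spares left.
    for die, need in enumerate(remaining):
        for donor in range(len(left)):
            if need == 0:
                break
            borrowed = min(need, left[donor])
            if borrowed:
                plan.append((die, donor, borrowed))
                left[donor] -= borrowed
                need -= borrowed
    return plan


print(assign_spares([6, 1], [4, 4]))  # die 0 borrows from die 1
print(assign_spares([6, 3], [4, 4]))  # 9 requested, 8 available
```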

    3 Die Matching Algorithm

    Figure 4: Optimization flow of the proposed die matching algorithm.

    After pre-bond testing, a die matching algorithm is required to

    select which dies should be bonded together, so that the spares in

    different dies can be shared efficiently and the stack yield can be

    maximized. The optimization flow of the proposed die matching

    algorithm is shown in Figure 4.

    A two-valued vector (# spare rows, # spare cols) is attached to each die, which represents the remaining spares on the die after pre-bond testing. The input of the algorithm is a set of dies with their corresponding two-valued vectors, and a pre-defined number of dies that should be in a stack, k; the output is a set of die stacks, each having two non-negative values in its overall two-valued vector, which is defined as the sum of all individual vectors of the dies in a stack.
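    The overall vector and the acceptance condition for a stack follow directly from this definition. The sketch below assumes each die is represented as a (rows, cols) tuple; the function names are illustrative only.

```python
def stack_vector(dies):
    """Sum the per-die (spare rows, spare cols) vectors of one stack."""
    return (sum(rows for rows, _ in dies), sum(cols for _, cols in dies))


def is_good_stack(dies, k):
    """A stack is acceptable if it has k dies and no negative overall value."""
    return len(dies) == k and all(value >= 0 for value in stack_vector(dies))


# A die short of 1 spare column paired with a die short of 2 spare rows.
pair = [(3, -1), (-2, 2)]
print(stack_vector(pair), is_good_stack(pair, k=2))
```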

    At the beginning of the optimization flow, the dies are categorized into 4 bins, according to their two-valued vectors. Bin1 contains the dies which have both spare rows and columns available (the amount is indicated by positive values); Bin2 contains the dies which have only spare rows available, and need spare columns from other dies to be repaired (the required amount is indicated by a negative value); Bin3 contains the dies which have only spare columns available, and need spare rows from other dies to be repaired; Bin4 contains the dies which have run out of all spares on themselves, and need both spare rows and columns from other dies to be repaired.
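    The binning step amounts to looking at the signs of the two values. In the sketch below, a value of zero is treated like a positive value (the die needs nothing of that type); the paper does not say how zero is handled, so this is an assumption, as are the function names.

```python
def bin_of(vector):
    """Map a (spare rows, spare cols) vector to its bin number, 1 to 4."""
    rows, cols = vector
    if rows >= 0 and cols >= 0:
        return 1  # needs nothing from other dies
    if rows >= 0:
        return 2  # needs spare columns
    if cols >= 0:
        return 3  # needs spare rows
    return 4      # needs both rows and columns


def categorize(dies):
    """Group die vectors into a dict keyed by bin number."""
    bins = {1: [], 2: [], 3: [], 4: []}
    for vector in dies:
        bins[bin_of(vector)].append(vector)
    return bins


print(categorize([(2, 1), (3, -1), (-2, 1), (-1, -1)]))
```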

    The basic idea of the flow in Figure 4 is to keep the resulting overall vector consisting of positive values at each step. Whenever a resulting vector contains negative values, the newly stacked die is removed, and a die from another bin is stacked according to the previously obtained vector. For example, if the previous vector is (+, -), indicating that the vector has a positive value in the first position and a negative value in the second, then a die from Bin3, which has a vector of (-, +), will be stacked; if the previous vector is (-, -), then a die from Bin1 with a vector of (+, +) will be stacked, and so on. The choice is made based on an attempt to neutralize the negative values. After the neutralization, the algorithm tries to stack a Bin4 die.
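    The bin-selection rule described above can be written as a small decision function. This is only a reading of the text, not a transcription of Figure 4: the (-, +) case is inferred by symmetry from the "and so on", and zero is treated as non-negative, which is an assumption.

```python
def next_bin(previous_vector):
    """Pick the bin to draw the next die from, given the current overall vector."""
    rows_ok = previous_vector[0] >= 0
    cols_ok = previous_vector[1] >= 0
    if rows_ok and cols_ok:
        return 4  # nothing to neutralize: try to absorb a Bin4 die
    if rows_ok:
        return 3  # (+, -): a Bin3 die (-, +) brings spare columns
    if cols_ok:
        return 2  # (-, +): a Bin2 die (+, -) brings spare rows (inferred)
    return 1      # (-, -): a Bin1 die (+, +) brings both


for vector in [(2, -1), (-1, -3), (-2, 4), (1, 1)]:
    print(vector, "-> Bin", next_bin(vector))
```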

    In Figure 4, the circle labeled "Stack a Bin2 die & a Bin3 die" represents the process that exhaustively searches for a combination of Bin2 and Bin3 dies, so that the two-die combination results in an overall vector which is (+, +). This process is performed when the algorithm needs to stack a Bin1 die but Bin1 is already empty. The matching process stops when Bin1 is empty, and there is no Bin2-Bin3 combination that has a (+, +) vector.
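    The exhaustive Bin2-Bin3 search can be sketched with `itertools.product`. The function name is hypothetical, and the acceptance test uses non-negative sums (so zero counts as acceptable), which is an assumption about how "(+, +)" is evaluated.

```python
from itertools import product


def find_bin2_bin3_pair(bin2, bin3):
    """Return the first (Bin2 die, Bin3 die) pair whose sum has no negative value.

    Returns None when no such pair exists, which, together with an empty
    Bin1, is the stopping condition described in the text.
    """
    for die_a, die_b in product(bin2, bin3):
        if die_a[0] + die_b[0] >= 0 and die_a[1] + die_b[1] >= 0:
            return die_a, die_b
    return None


bin2 = [(1, -3), (4, -1)]  # spare rows left, columns needed
bin3 = [(-2, 2)]           # spare columns left, rows needed
print(find_bin2_bin3_pair(bin2, bin3))
```

    The search is quadratic in the bin sizes, which is acceptable here because it only runs as a fallback once Bin1 is empty.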

    4 Experimental Results

    4.1 Area Cost of BISR

    We have implemented an example design, which is a three-die stack, with a logic die at the bottom (Die 0), and two SRAM dies (Dies 1 and 2) stacked on it, to evaluate the area cost of the proposed 3D BISR. Each memory die consists of a main memory of 8k x 64, and three spare columns.

    The area result is listed in Table 1. For each memory die, the area

    introduced by the 3D BISR occupies 2.3% (excluding the area

    of spares). Compared to the entire logic-memory stack, the 3D

    BISR hardware represents approximately 2.43%.

    The area of Die 0 (the logic die) is set to the same value as the memory dies. Although this value is unrealistically small compared to a real processor, it is sufficient to give us an estimate of the BISR area overhead. Since the BISR hardware does not grow with the area of the logic die, the area overhead is expected to be smaller if a more realistic, larger processor die is adopted. Similarly, if the size of the main memory increases, the area percentage occupied by the BIST and BIRA will be smaller.

              Functional Circuits   GSAU     ARU      RTU      Area
              (µm²)                 (µm²)    (µm²)    (µm²)    Overhead
    Die 0     1.78M                 20k      7k       21k      2.70%

              Main Memory           Spares   BIST     BIRA     Area
              (µm²)                 (µm²)    (µm²)    (µm²)    Overhead
    Die 1     1.78M                 410k     39k      2k       2.30%
    Die 2     1.78M                 410k     39k      2k       2.30%

    Total area overhead                                        2.43%

    Table 1: Area costs of the proposed 3D BISR in 0.13 µm technology.
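    The percentages in Table 1 can be reproduced with simple arithmetic. The sketch below assumes one particular reading of the table: overhead is the BISR hardware area divided by the base area (functional circuits or main memory), with the spares left out of both, as the text states for the memory dies. Areas are in square micrometers.

```python
BASE_AREA = 1_780_000  # functional circuits (Die 0) or main memory (Dies 1, 2)

die0_bisr = 20_000 + 7_000 + 21_000  # GSAU + ARU + RTU
mem_bisr = 39_000 + 2_000            # BIST + BIRA, per memory die


def overhead_pct(bisr_area, base_area):
    """BISR area as a percentage of the base area."""
    return 100 * bisr_area / base_area


die0 = overhead_pct(die0_bisr, BASE_AREA)
mem = overhead_pct(mem_bisr, BASE_AREA)
total = overhead_pct(die0_bisr + 2 * mem_bisr, 3 * BASE_AREA)

print(f"Die 0: {die0:.2f}%  memory die: {mem:.2f}%  total: {total:.2f}%")
```

    Under this reading, enlarging the logic die or the main memory increases only the denominator, which is why the overhead is expected to shrink for larger dies.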