

Spintronics-Based Nonvolatile Logic-in-Memory Architecture Towards an Ultra-Low-Power and Highly Reliable VLSI Computing Paradigm

Takahiro Hanyu*1), Daisuke Suzuki*2), Naoya Onizawa*3), Shoun Matsunaga*4), Masanori Natsui*1), and Akira Mochizuki*2)

*1) Research Institute of Electrical Communication, Tohoku University, Japan
*2) Center for Innovative Integrated Electronic Systems, Tohoku University, Japan
*3) Frontier Research Institute for Interdisciplinary Science, Tohoku University, Japan
*4) AC Technologies Co., Ltd., Japan

Abstract—A novel logic-LSI architecture, called the “spintronics-based nonvolatile logic-in-memory (NV-LIM) architecture,” in which nonvolatile spintronic storage elements are distributed over a logic-circuit plane, is proposed as a promising candidate to overcome the performance wall and power wall of present CMOS-only logic LSIs. Some concrete design examples based on the NV-LIM architecture are demonstrated, and their usefulness is discussed in comparison with the corresponding CMOS-only realizations.

I. INTRODUCTION

In the Internet of Things (IoT) era, there is a strong need for ultra-low-power computer architectures that still deliver increasing computing performance. However, in present CMOS-only VLSI, the communication bottleneck between memory and logic modules inside a chip, together with increasing standby power dissipation and device-characteristic variation, makes these goals difficult to meet [1]. Fig. 1(a) shows a conventional logic-LSI architecture, in which logic and memory modules are implemented separately and connected to each other through global interconnections. Even if the device feature size is scaled down in accordance with the semiconductor technology roadmap, the global interconnections do not become shorter; rather, they become longer, which results in longer delay and higher power dissipation in the interconnections. In addition, since on-chip memory modules are “volatile,” they continuously consume static power to retain the stored data.

On the other hand, several emerging storage devices are being developed to overcome the weak points of ordinary semiconductor memories, namely dynamic random-access memory (DRAM) and static random-access memory (SRAM). In particular, magnetoresistive random-access memory (MRAM), which has already undergone a few incarnations, is now converging on a scheme that could upend the memory business. Spin-transfer torque (STT) MRAM promises speed and reliability comparable to those of SRAM, the quick-access memory embedded inside microprocessors, along with the “non-volatility” of flash, the storage used in smartphones and other portables [2,3]. Since the magnetic tunnel junction (MTJ) device, the key element of MRAM, is easily distributed over a logic-circuit plane by using a three-dimensional (3D) stack structure as shown in Fig. 1(b), performance degradation due to intra-chip global wires could be drastically mitigated, which leads to high-performance, ultra-low-power, and highly reliable (or highly resilient) logic LSIs.

One of the most useful methods for cutting off leakage power is power gating. Fig. 2(a) shows a time chart of power dissipation in a conventional logic LSI without power gating. If power gating is applied to the conventional logic LSI, part of the standby power can be eliminated, but two additional operations, the “back-up” and “boost-up” procedures, must be performed before and after power gating, respectively, as shown in Fig. 2(b); this overhead may discourage the use of power gating. In contrast, non-volatility combines well with power gating and ideally eliminates the wasted power dissipation, as shown in Fig. 2(c).
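The trade-off illustrated in Fig. 2 can be made concrete with a simple energy model. The following is a minimal sketch and not the authors' model: it assumes that leakage is zero while the block is gated and that the back-up and boost-up cost is a fixed energy per standby interval. All numerical values are hypothetical placeholders chosen only for illustration.

```python
def standby_energy(p_leak_w, t_standby_s, gated, e_overhead_j=0.0):
    """Energy spent over one standby interval.

    Without power gating the block leaks for the whole interval.
    With power gating the leakage is assumed to be zero, but a fixed
    overhead (back-up plus boost-up) is paid once per interval.
    """
    if gated:
        return e_overhead_j
    return p_leak_w * t_standby_s


def break_even_time(p_leak_w, e_overhead_j):
    """Shortest standby interval for which power gating saves energy."""
    return e_overhead_j / p_leak_w


# Hypothetical numbers, not taken from the paper.
P_LEAK = 1e-3          # 1 mW standby leakage
E_VOLATILE = 5e-6      # back-up + boost-up energy with volatile storage
E_NONVOLATILE = 5e-8   # residual overhead when state is kept in NV storage

t_volatile = break_even_time(P_LEAK, E_VOLATILE)
t_nonvolatile = break_even_time(P_LEAK, E_NONVOLATILE)
print(f"break-even, volatile:    {t_volatile * 1e3:.3f} ms")
print(f"break-even, nonvolatile: {t_nonvolatile * 1e3:.3f} ms")
```

Under these assumptions, a smaller back-up/boost-up overhead shortens the break-even interval, so power gating pays off for shorter standby periods, which is the qualitative point of Fig. 2(c).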

Fig. 3(a) shows a nonvolatile VLSI processor architecture, where high-density, high-speed MRAMs and nonvolatile flip-flops are used to realize nonvolatile logic LSIs in a straightforward way [4,5]. If part of the nonvolatile on-chip memory is merged into the logic-circuit modules as shown in Fig. 3(b), the performance of the nonvolatile logic LSI could be improved further. The spintronics-based nonvolatile logic-in-memory (NV-LIM) architecture thus improves not only the performance but also the reliability of future logic LSIs. In the following sections, some concrete design examples using the spintronics-based NV-LIM architecture, namely a nonvolatile field-programmable gate array (FPGA) and its basic components [6-11] and a nonvolatile ternary content-addressable memory (TCAM) [12-14], are demonstrated, and their advantages in terms of reliability enhancement as well as power reduction are discussed.

II. DESIGN OF SPINTRONICS-BASED NONVOLATILE FIELD-PROGRAMMABLE GATE ARRAYS (FPGAS)

Two major challenges in present FPGAs are reducing power dissipation and achieving a compact implementation. The NV-LIM architecture makes it possible to completely cut off the power supply during the sleep mode of a VLSI chip, which eliminates static power dissipation. In this section, some concrete design examples demonstrate that the spintronics-based NV-LIM architecture makes the basic components of an FPGA compact while maintaining the same functionality as the conventional CMOS-only realization.

978-3-9815370-4-8/DATE15/ ©2015 EDAA

A. Compact programmable switch using spintronics-based NV-LIM architecture

Fig. 4 shows a schematic diagram of the nonvolatile programmable switch (NVPS) that is used to route data from/to the logic block. In order to reduce the effective area of the routing block, control transistors are shared among all the NVPSs. The output of the nonvolatile storage element (Q) is used to turn the NMOS pass switch on and off. The nonvolatile storage element consists of a sense amplifier built from two inverters, two local write-control transistors, and two perpendicular MTJ (p-MTJ) devices, in which routing information is programmed in a complementary fashion. The sense amplifier reads the stored state M and holds it as Q during the power-on state without steady current. Once configuration data are programmed into the NVPSs, they never change, which eliminates the need for additional control transistors.

Fig. 5 compares the areas of three typical NVPSs. The area of the conventional NVPS [15] is large because a bi-directional write current must be applied simultaneously to the series-connected p-MTJ devices. By utilizing a two-step MTJ configuration, the conventional NVPS [16] achieves a smaller area than that of [15]. However, it requires additional control transistors to operate as an NVPS. Moreover, it is somewhat difficult to explore the design space between reading and writing properties because the reading and writing functions are merged into the same circuit. In contrast, the proposed NVPS is implemented in the smallest area by removing all the unnecessary functions and sharing a portion of the read/write-control transistors. Moreover, the reading and writing properties of the proposed circuit can be optimized independently by using a pseudo three-terminal structure. Note that the effective voltage applied to a p-MTJ device is lower than 0.63 V, which is much lower than both the breakdown voltage and the back-hopping voltage [17,18]. Although the write current IM (or I'M) for a p-MTJ device in the anti-parallel state decreases as the tunnel magneto-resistance (TMR) ratio increases, the proposed NVPS can drive sufficient write current even if the TMR ratio reaches 200%, which is sufficiently high for reading (the voltage applied to a p-MTJ device is lower than 0.69 V).
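The dependence of the anti-parallel write current on the TMR ratio can be illustrated with a first-order resistive estimate. This is a rough sketch only: it uses the definition TMR ratio = (RAP - RP)/RP given in the footnote of Table 2, treats the MTJ as a fixed resistor (the bias dependence of a real MTJ is ignored), and assumes a drive voltage and a series on-resistance that are hypothetical and not taken from the paper.

```python
def r_ap(r_p_ohm, tmr):
    """Anti-parallel resistance from the definition TMR = (R_AP - R_P) / R_P."""
    return r_p_ohm * (1.0 + tmr)


def write_current(v_drive, r_mtj_ohm, r_series_ohm):
    """Ohmic estimate of the current through an MTJ and a series transistor."""
    return v_drive / (r_mtj_ohm + r_series_ohm)


R_P = 4e3        # ohms, the STT-MTJ R_P listed in Table 2
V_DRIVE = 1.2    # volts, assumed equal to the VDD quoted for Table 1
R_SERIES = 2e3   # ohms, hypothetical on-resistance of the write path

for tmr in (1.0, 1.25, 2.0):
    i_ap = write_current(V_DRIVE, r_ap(R_P, tmr), R_SERIES)
    print(f"TMR = {tmr:.2f}: R_AP = {r_ap(R_P, tmr) / 1e3:.1f} kOhm, "
          f"I = {i_ap * 1e3:.3f} mA")
```

With RP fixed, a larger TMR ratio raises RAP and therefore lowers the current available in the anti-parallel state, which is why the write path has to be sized for the highest TMR ratio of interest.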

B. Design of a compact 6-input logic element using spintronics-based NV-LIM architecture

Fig. 6 shows a schematic diagram of the 2-input nonvolatile logic element (NV-LE); it can be easily extended to a multi-input NV-LE by replacing the 2-input NMOS multiplexer with a multi-input one and inserting the corresponding nonvolatile configuration cells. Any 2-input logic function is stored in four configuration cells (Cell1, Cell2, Cell3, Cell4) in the “domain-wall motion” (DWM) configuration array, where the DWM device is a spintronic device with three terminals. When EN=1, the PMOS pull-up transistor MP and the read-control transistor MSR are turned on, and the combinational logic operation is performed on the external logic inputs X = (X1, X2) by the use of voltage division [9,10]. In the write operation, word lines (WLs) and bit lines (BLs) are activated and bi-directional write currents are applied to the corresponding DWM devices. Although an additional terminal is required for the DWM device, there is no area penalty because the DWM devices are stacked over the CMOS plane and share one write-control transistor (MSW).
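At the logic level, the NV-LE behaves like a lookup table: the configuration cells hold the truth table and the multiplexer selects one cell according to the inputs. The sketch below is a purely behavioral model under stated assumptions; the mapping of Cell1..Cell4 to input combinations and the function names are hypothetical, and the voltage-division read mechanism of the actual circuit is not modeled.

```python
from itertools import product


def program_lut(func):
    """Return the four configuration bits (Cell1..Cell4) for a 2-input function.

    The ordering Cell1..Cell4 <-> (X1, X2) = (0,0), (0,1), (1,0), (1,1)
    is an assumption made for this sketch.
    """
    return [func(x1, x2) & 1 for x1, x2 in product((0, 1), repeat=2)]


def evaluate(cells, x1, x2, en=1):
    """Select one configuration cell with the inputs, as a multiplexer does.

    Returns None when EN = 0 to model the disabled read path.
    """
    if not en:
        return None
    return cells[2 * x1 + x2]


xor_cells = program_lut(lambda a, b: a ^ b)
nand_cells = program_lut(lambda a, b: 1 - (a & b))
print(xor_cells, nand_cells)
print(evaluate(xor_cells, 1, 0))
```

The same model scales to k inputs with 2**k configuration cells, so a 6-input element needs 64 cells, which is consistent with the 64-bit reconfiguration width discussed for Table 1.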

Table 1 compares four 6-input NV-LEs: a CMOS-based one using STT-MTJ-based nonvolatile SRAM (NV-SRAM) cells [19], a differential-pair-based one using STT-MTJ devices [20], a differential-pair-based one using a DW shift register [21], and the proposed one. The gate lengths of all MOS transistors are set to the minimum feature size. The gate width of the NMOS transistors, except for the write-control transistors, is set to WMIN, and that of the PMOS transistors is set to 2WMIN. The MTJ device parameters used for evaluation, which are estimated from measurement results, are summarized in Table 2. Since each NV-SRAM cell has two large write-control transistors and two inverters, the effective area of the CMOS-based NV-LE is the largest. The differential-pair-based NV-LEs are more compact than the CMOS-based one. The effective area of the proposed NV-LE is the smallest owing to its simple circuitry and the small gate width of the write-control transistor in each configuration cell. The number of configuration cycles of each STT-MTJ-based NV-LE is large because write current must be applied via the high MTJ resistance and the configuration cells must be programmed serially. In the DW-shift-register-based NV-LE, no write-control transistor is included in each configuration cell, but a large number of cycles is required for shifting data because of its serial configuration scheme. In contrast, the proposed NV-LE offers the fastest, bit-parallel reconfiguration. As a result, 2-ns 64-bit-parallel circuit reconfiguration is achieved with 67% less area than a conventional CMOS-based alternative.
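The headline figures in this comparison can be checked directly from the Table 1 entries. The short script below only re-does that arithmetic; the dictionary keys are illustrative names, and the area unit is taken to be square micrometres on the assumption that the unit prefix was lost in the extracted table.

```python
# Values copied from Table 1 (90 nm CMOS/MTJ technologies, VDD = 1.2 V).
# Area is assumed to be in square micrometres.
table1 = {
    "CMOS-based [19]":              {"area": 1410, "transistors": 667, "cycles": 66},
    "Differential-pair-based [20]": {"area": 589, "transistors": 270, "cycles": 64},
    "Differential-pair-based [21]": {"area": 760, "transistors": 312, "cycles": 64},
    "Proposed":                     {"area": 471, "transistors": 226, "cycles": 2},
}

baseline = table1["CMOS-based [19]"]
proposed = table1["Proposed"]

# Relative area saving of the proposed NV-LE against the CMOS-based one.
area_reduction = 1.0 - proposed["area"] / baseline["area"]
# Ratio of configuration cycles (serial versus bit-parallel programming).
cycle_ratio = baseline["cycles"] / proposed["cycles"]
# Design with the smallest area among the four.
smallest = min(table1, key=lambda name: table1[name]["area"])

print(f"area reduction vs. CMOS-based: {area_reduction:.1%}")
print(f"configuration-cycle ratio:     {cycle_ratio:.0f}x")
print(f"smallest area:                 {smallest}")
```

The computed area reduction is about 66.6%, which rounds to the 67% quoted in the text.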

III. DESIGN OF HIGHLY RELIABLE SPINTRONICS-BASED NONVOLATILE TCAMS

In this section, highly reliable TCAMs that retain non-volatility are implemented using the spintronics-based NV-LIM architecture.

A. Highly reliable spintronics-based nonvolatile TCAM cell circuit with complementary 5T-4MTJ structure

Fig. 7(a) shows the proposed spintronics-based nonvolatile TCAM cell circuit with five MOS transistors and four MTJ devices (5T-4MTJ). With the complementary cell-circuit structure, the output voltages in the HIT (MISS) case are determined by the cross points between the current-characteristic curve IHIT (IMISS) and its complementary load current-characteristic curve I'HIT (I'MISS), as shown in Fig. 7(b), which enlarges Vco and the ML-voltage swing compared to a conventional 6T-2MTJ cell circuit [22]. In order to realize the complementary cell circuit compactly, the MOS transistors and wires inside a cell, except the diode-connected transistor M3 and ML, are shared. Vertical lines are used as bit lines (BL1/BL2) in write mode and as search lines (SL-bar/SL) in search mode. Horizontal lines are used as plate lines (PL1/PL2) in write mode and as power/ground lines (VDD/GND) in search mode.


2015 Design, Automation & Test in Europe Conference & Exhibition (DATE) 1007


Two PL drivers are used to generate the voltage signals PL1 and PL2, or VDD and GND. As a result, the proposed cell structure is realized compactly, with an area reduction of 30% compared with the conventional 6T-2MTJ cell circuit, as shown in Fig. 8.

Fig. 9(a) shows simulated waveforms of a 72-bit parallel search operation in the conventional 6T-2MTJ-cell-based TCAM word circuit. Because of the small sensing margin of 39 mV, one error occurs in detecting “HIT” and four errors occur in detecting “MISS,” which results in an error rate of 0.25%. In contrast, these errors are completely removed in the proposed 5T-4MTJ-cell-based TCAM word circuit, whose sensing margin is enlarged to 204 mV by the complementary cell-circuit structure, as shown in Fig. 9(b). Because the ML-voltage swing is more than five times larger, the matched delay during the evaluation phase is reduced from 2.1 nsec to 1.3 nsec in comparison with the conventional circuit.
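The improvement ratios quoted for the 5T-4MTJ cell follow from the numbers given in the text and in Fig. 8. The script below is only a consistency check of that arithmetic; the variable names are illustrative, and the cell sizes are assumed to be in square micrometres.

```python
# Values quoted in the text and in Fig. 8.
margin_conventional_mv = 39.0    # 6T-2MTJ sensing margin
margin_proposed_mv = 204.0       # 5T-4MTJ sensing margin
delay_conventional_ns = 2.1      # matched delay, conventional
delay_proposed_ns = 1.3          # matched delay, proposed
cell_area_conventional = 4.536   # 6T-2MTJ twin-cell size (assumed um^2)
cell_area_proposed = 3.186       # 5T-4MTJ cell size (assumed um^2)

margin_ratio = margin_proposed_mv / margin_conventional_mv
delay_reduction = 1.0 - delay_proposed_ns / delay_conventional_ns
area_reduction = 1.0 - cell_area_proposed / cell_area_conventional

print(f"sensing-margin ratio: {margin_ratio:.2f}x")
print(f"delay reduction:      {delay_reduction:.1%}")
print(f"cell-area reduction:  {area_reduction:.1%}")
```

The sensing margin grows by a factor of about 5.2, in line with the "more than five times" statement, and the cell-area reduction of about 29.8% matches the quoted 30%.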

B. Design of a soft-error tolerant asynchronous TCAM using the spintronics-based nonvolatile logic circuitry

Fig. 10 shows the proposed asynchronous i-th word circuit that contains three dual-rail TCAM cells (1 < i < w) [14]. The upper part is a NAND-type word circuit and the lower part is a NOR-type word circuit. In the NAND-type word circuit, the TCAM cells are connected in series, while they are connected in parallel in the NOR-type word circuit. Each TCAM cell stores one of three states (B): “0”, “1”, and “X”, which are represented by dual-rail signals.
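The function that a TCAM word computes can be stated compactly in software. The following is a behavioral sketch only, with hypothetical function names: a cell storing "X" matches any search bit, a series (NAND-type) path is modeled as conducting only when every cell matches, and a parallel (NOR-type) path is modeled as conducting when at least one cell mismatches. The mapping of these conditions to actual signal polarities in Fig. 10 is an assumption and is not taken from the paper.

```python
def cell_matches(stored, bit):
    """A ternary cell matches if it stores 'X' (don't care) or the same bit."""
    return stored == "X" or stored == bit


def nand_path_conducts(word, key):
    """Series connection: the path is closed only if every cell matches."""
    return all(cell_matches(s, k) for s, k in zip(word, key))


def nor_path_conducts(word, key):
    """Parallel connection: one mismatching cell is enough to close a path."""
    return any(not cell_matches(s, k) for s, k in zip(word, key))


def search(table, key):
    """Compare the key against all stored words and return matching indices."""
    return [i for i, word in enumerate(table) if nand_path_conducts(word, key)]


table = ["10X", "1X0", "011"]   # three 3-bit words, as in the 3-bit word circuit
print(search(table, "100"))
print(search(table, "011"))
```

In this model the two conditions are logical complements for every word and key, which is the property a dual-rail word circuit relies on: exactly one of the two rails should be asserted for a valid result.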

Fig. 11 shows the proposed dual-rail TCAM cell circuit, where output feedback transistors are added to two complementary TCAM cells in order to cut off the steady current. The gate of the transistor in the NAND-type cell connects to the output of the NOR-type cell (MOUTNOR) and vice versa. As the dual-rail TCAM cell stores complementary data, one of two outputs (MOUTNOR and MOUTNAND) becomes low during searching. The low output cuts off the steady current in the other TCAM cell whose output is high. Hence, there is no steady current path, which greatly reduces the power dissipation while maintaining the functionality of the TCAM cell.

Table 3 compares the performance of the synchronous and the proposed TCAMs. The cycle time is the sum of the search delay time and the precharge delay time. The search delay time of the proposed TCAM is larger than that of the synchronous TCAM because it includes the delay of the completion detector and the C-elements. In contrast, the precharge delay time is smaller than the search delay time in the proposed TCAM, while the two are fixed to be the same in the synchronous TCAM. As a result, the proposed TCAM operates at almost the same speed as the synchronous TCAM with an 18% energy overhead. In terms of area, as the MTJ devices are stacked on a CMOS layer, the area of the proposed TCAM cell is 83% of that of the synchronous TCAM cell. The conventional TCAM cell uses 10T soft-error tolerant SRAMs, but the stored data can still be affected by particle strikes. The proposed TCAM cell uses MTJ devices, whose stored data are not affected by particle strikes. In addition, the proposed TCAM is robust against timing variations due to single-event transients (SETs) and can detect soft errors, if they occur, using the dual-rail signals. Another advantage is the non-volatility of the data stored in the MTJ devices: the power supply can be cut off during standby mode in order to reduce leakage currents, whereas the SRAM cell used in the synchronous TCAM is volatile.
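The overhead figures quoted from Table 3 can be reproduced with a few lines of arithmetic. The script below is a consistency check only; the dictionary keys are illustrative names, and the transistor counts exclude the four stacked MTJ devices of the proposed cell.

```python
# Values copied from Table 3 (256-word x 64-bit TCAMs, 90 nm CMOS).
sync = {"cycle_ns": 3.398, "energy_fj": 0.580, "transistors": 24}
prop = {"cycle_ns": 3.410, "energy_fj": 0.686, "transistors": 20}

energy_overhead = prop["energy_fj"] / sync["energy_fj"] - 1.0
cycle_overhead = prop["cycle_ns"] / sync["cycle_ns"] - 1.0
transistor_ratio = prop["transistors"] / sync["transistors"]

print(f"energy overhead:  {energy_overhead:.1%}")
print(f"cycle overhead:   {cycle_overhead:.2%}")
print(f"transistor ratio: {transistor_ratio:.1%}")
```

The energy overhead comes out at about 18.3% and the cycle-time difference at well under 1%, matching the text. The ratio of MOS transistor counts (20 to 24) is about 83%, which coincides with the quoted cell-area ratio if cell area is assumed to scale with the number of MOS transistors.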

IV. CONCLUSION

Some concrete design examples have been demonstrated, and some important possibilities of the spintronics-based NV-LIM architecture have been pointed out in terms of power reduction, design-margin improvement while keeping compactness, and soft-error tolerance. As a future prospect, it is also important to establish a design automation environment for the spintronics-based NV-LIM architecture [23,24]. Moreover, the proposed spintronics-based NV-LIM architecture can be utilized in implementing not only digital circuits but also analog circuits. Since analog-circuit design is sensitive to process, voltage, and temperature (PVT) variations, a post-design approach based on the proposed technique has a potential advantage in future sub-nanometer VLSI realization [25].

ACKNOWLEDGMENT

Part of this research was supported by the JSPS FIRST Program and by R&D for Next-Generation Information Technology of MEXT in Japan.

REFERENCES

[1] T. Hanyu, D. Suzuki, A. Mochizuki, M. Natsui, N. Onizawa, T. Sugibayashi, S. Ikeda, T. Endoh, and Hi. Ohno, IEEE IEDM, Dec. 2014 (in press).

[2] S. Ikeda, J. Hayakawa, Y. M. Lee, F. Matsukura, Y. Ohno, T. Hanyu, and H. Ohno, "Magnetic Tunnel Junctions for Spintronic Memories and Beyond," IEEE Trans. Electron Devices, vol. 54, no. 5, pp. 991-1002, May 2007.

[3] R. Courtland, "Spin memory shows its might," IEEE Spectrum, pp. 11-12, Aug. 2014.

[4] N. Sakimura, Y. Tsuji, R. Nebashi, H. Honjo, A. Morioka, K. Ishihara, K. Kinoshita, S. Fukami, S. Miura, N. Kasai, T. Endoh, H. Ohno, T. Hanyu, and T. Sugibayashi, IEEE ISSCC, pp. 184-185, Feb. 2014.

[5] H. Koike, T. Ohsawa, N. Sakimura, R. Nebashi, Y. Tsuji, A. Morioka, K. Miura, H. Honjo, T. Sugibayashi, S. Ikeda, T. Hanyu, H. Ohno, and T. Endoh, IEEE ASSCC 2013, pp. 317-320, Nov. 2013.

[6] D. Suzuki, et al., J. Appl. Phys., vol. 115, 17B742, 2014.

[7] D. Suzuki, et al., IEICE ELEX, vol. 10, no. 23, 20130772, Dec. 2013.

[8] D. Suzuki, et al., IEEE Trans. Magn., vol. 50, no. 11, 2014 (in press).

[9] D. Suzuki, Y. Lin, M. Natsui, and T. Hanyu, JJAP, vol. 52, no. 4, pp. 04CM04, Mar. 2013.

[10] D. Suzuki, M. Natsui, A. Mochizuki, and T. Hanyu, Japanese Journal of Applied Physics (JJAP), vol. 53, no. 4S, pp. 04EM03, Feb. 2014.

[11] D. Suzuki, N. Sakimura, M. Natsui, A. Mochizuki, T. Sugibayashi, T. Endoh, H. Ohno, and T. Hanyu, IEICE ELEX, vol. 11, no. 13, 20140296, June 2014.

[12] S. Matsunaga, N. Sakimura, R. Nebashi, Y. Tsuji, A. Morioka, T. Sugibayashi, S. Miura, H. Honjo, K. Kinoshita, H. Sato, S. Fukami, M. Natsui, A. Mochizuki, S. Ikeda, T. Endoh, H. Ohno, and T. Hanyu, IEEE Symp. VLSI Circuits, C106-C107, pp. 106-107, June 2013.

[13] S. Matsunaga, A. Mochizuki, T. Endoh, H. Ohno, and T. Hanyu, IEICE ELEX, vol. 11, no. 3, 20131006, March 2014.


[14] N. Onizawa, S. Matsunaga, and T. Hanyu, IEEE ASYNC 2014, pp. 1-8, May 2014.

[15] W. Zhao, E. Belhaire, C. Chappert, F. Jacquet, and P. Mazoyer, Phys. Status Solidi A 205, 1373, 2008.

[16] Y. Shuto, S. Yamamoto, and S. Sugahara, J. Appl. Phys. 105, 07C933, 2009.

[17] T. Min, J. Z. Sun, R. Beach, D. Tang, and P. Wang, J. Appl. Phys. 105, 07D126, 2009.

[18] S.-C. Oh, S.-Y. Park, A. Manchon, M. Chshiev, J.-H. Han, H.-W. Lee, J.-E. Lee, K.-T. Nam, Y. Jo, Y.-C. Kong, B. Dieny, and K.-J. Lee, Nat. Phys. 5, 898, 2009.

[19] S. Yamamoto, Y. Shuto, and S. Sugahara, Jpn. J. Appl. Phys. 51, 11PB02, 2012.

[20] D. Suzuki, M. Natsui, T. Endoh, H. Ohno, and T. Hanyu, Jpn. J. Appl. Phys. 51, 04DM02, 2012.

[21] W. Zhao, D. Ravelosona, J.-O. Klein, and C. Chappert, IEEE Trans. Magn. 47, 2966, 2011.

[22] S. Matsunaga, S. Miura, H. Honjou, K. Kinoshita, S. Ikeda, T. Endoh, H. Ohno, and T. Hanyu, IEEE Symp. VLSI Circuits, pp. 44-45, June 2012.

[23] N. Sakimura, R. Nebashi, Y. Tsuji, H. Honjo, T. Sugibayashi, H. Koike, T. Ohsawa, S. Fukami, T. Hanyu, H. Ohno and T. Endoh, IEEE ISCAS, pp. 1971-1974, May 2012.

[24] M. Natsui, D. Suzuki, N. Sakimura, R. Nebashi, Y. Tsuji, A. Morioka, T. Sugibayashi, S. Miura, H. Honjo, K. Kinoshita, S. Ikeda, T. Endoh, H. Ohno, and T. Hanyu, IEEE JSSC, 2015 (in press).

[25] M. Natsui and T. Hanyu, J. Multiple-Valued Logic and Soft Computing, Vol.21, No.5-6, pp.597-608, 2013.

Figure 1: Comparison of logic-LSI architectures; (a) conventional, (b) nonvolatile logic-in-memory.
[Figure labels: (a) Logic, Memory, Wire, Leakage current, Transfer bottleneck; (b) Silicon, Logic, CMOS layer, MTJ layer, Magnetic Tunnel Junction (MTJ) device.]

Figure 2: Combination of power-gating and nonvolatile logic techniques; (a) Conventional CPU without power gating, (b) Conventional CPU with power gating, (c) NV-LIM CPU with power gating.
[Axes: Power versus Time. Figure labels: (a) Active, Standby; (b) Active, No standby, Back-up, Boost-up; (c) Active, No standby, Overhead using NV storages.]

Figure 3: Configuration of nonvolatile logic LSIs; (a) 1st-generation nonvolatile logic-LSI architecture, (b) 2nd-generation nonvolatile logic-LSI architecture.
[Figure labels: (a) 1st-generation Nonvolatile Processor: DRAM, Flash, Spin RAM, NV SRAM, NV FF, GP-Logic, SP-Logic, Si; (b) 2nd-generation Nonvolatile Processor: Spin RAM, NV SRAM, NV-LIM, GP-Logic, SP-Logic, Si. GP-Logic: General-purpose logic; SP-Logic: Special-purpose logic. Examples: Nonvolatile Field-Programmable Gate Array (FPGA), Nonvolatile Random Logic LSI, Nonvolatile Ternary Content-Addressable Memory (TCAM).]


Figure 4: Schematic diagram of the proposed NVPS.
[Figure labels: Routing Track, NMOS Switch, 7T-2MTJ NV-Latch, Sense Amp., Write-Control Transistors, Shared Control Transistors, Write Current (Large), Read Current (Small), VDD, GND, BL, BLC, STR, EN, RC, Q, M, M', RM, RM', MW.]

Figure 5: Comparison of areas in nonvolatile programmable switches.
[Bar chart of the area of the NVPS in arbitrary units for Conventional *1), Conventional *2), and Proposed *3). The proposed NVPS is 0.60 (-40%); the values extracted for the two conventional designs are 1.00 and 2.00.]
*1) W. Zhao, et al., Phys. Status Solidi A 205, 1373 (2008).
*2) Y. Shuto, et al., J. Appl. Phys. 105, 07C933 (2009).
*3) Voltage applied to a p-MTJ device is lower than 0.63 V. Gate width of shared write-control transistor is 40 WMIN, where WMIN is the minimum gate width of NMOS transistor.

Figure 6: Circuit diagram of the proposed two-input NV-LE.
[Figure labels: PMOS pull-up transistor (MP), NMOS multiplexer, DWM configuration array (Cell1 to Cell4 with outputs Y1 to Y4), Shared-control transistors, MSW (for write), MSR (for read), inputs X1 and X2, WL1 to WL4, BL, WLS, EN, CLK, D, SEL, Z, VDD, VGND, NV-SL, DWM device w/ controller; insets: Schematic view and Cross-sectional view (MTJ3, Metal layer, MW4, WL4, Y4, BL).]

Table 1: Comparison of 6-input NV-LEs.

                                        CMOS-based    Differential-     Differential-     Proposed
                                        *6)           pair-based *7)    pair-based *8)
Storage                                 STT-MTJ       STT-MTJ           DWM               DWM
# of transistors                        667           270               312               226
# of write-control transistors / bit    2 (3WMIN)     1 (6WMIN)         0 *1) (0)         1 (2WMIN)
  (gate width)
Area [µm2] *2)                          1,410         589               760               471
Delay [ps] *2,*3)                       173           177               153               198
Active power at 1 GHz [µW] *2,*4)       36            28                31                24
# of cycles for configuration           66 *5)        64                64 *1)            2

*1) Data are serially configured using DW shift register.
*2) 90 nm CMOS/MTJ technologies (VDD = 1.2 V).
*3) Worst delay to hold output data of the NV-LUT circuit.
*4) Average power for logic operations (SEL = 1).
*5) Two cycles for the SRAM configuration and 64 cycles for the MTJ one.
*6) S. Yamamoto, et al., JJAP, 51, 11PB02, 2012.
*7) D. Suzuki, et al., JJAP, 51, 04DM02, 2012.
*8) W. Zhao, et al., IEEE Trans. Magn., 47, 2966, 2011.

Table 2: Device parameters of STT-MTJ and DWM devices.

                             STT-MTJ device    DWM device
                             (2-terminal)      (3-terminal)
RP [kΩ]                      4                 6
RAP [kΩ] (TMR ratio *1))     9 (1.25)          24 (3.00)
IP-AP [mA] *2)               0.10              0.10
IAP-P [mA] *2)               0.06              0.10

*1) TMR ratio = (RAP - RP)/RP.
*2) Required current to program within 2 nsec.

Figure 7: Proposed complementary 5T-4MTJ TCAM cell; (a) circuit diagram, (b) Icell-Vco characteristics.
[Figure labels: (a) BL1/SL, BL2/SL, PL1/VDD, PL2/GND, ML, WL, SE (Search enable), PL driver 1, PL driver 2, M1, M2, M3, M1', M2', MTJ1, MTJ2, MTJ1', MTJ2', Comparison logic (IMISS, IHIT), Complementary logic (IHIT', IMISS'), Vco; (b) axes: Icell versus Vco, with curves IMISS, IHIT, IMISS', IHIT' and markers Vco_M, Vco_H, VDD.]


Figure 8: Nonvolatile TCAM cell layouts; (a) 6T-2MTJ twin-cell, (b) complementary 5T-4MTJ cell.
[Figure labels: (a) Cell size: 4.536 µm2; (b) Cell size: 3.186 µm2.]

Figure 9: Simulated waveforms of 72-bit NV-TCAM word circuits; (a) 6T-2MTJ-based, (b) 5T-4MTJ-based.

Figure 10: Dual-rail word circuit whose length is 3 bits. It consists of complementary NAND-type and NOR-type word circuits.

Figure 11: Spintronics-based TCAM cell using dual-rail structure.

Table 3: Comparison of 256-word x 64-bit TCAMs under a 90nm CMOS technology.

                                   Synchronous    Extension of        Proposed
                                   (MOS)          Async’13 (MOS)      (MOS/MTJ)
Cycle time [nsec]                  3.398          N/A                 3.410
Search delay [nsec]                1.699          N/A                 2.330 (data)
Precharge delay [nsec]             1.699          N/A                 1.060 (spacer)
Energy metric [fJ/bit/search]      0.580          N/A                 0.686
# of transistors in TCAM cell      24             48 (2x24)           20 + 4 MTJs
SEU tolerant in cell               Yes            Yes                 Yes
SEU free in cell                   No             No                  Yes
Delay-variation tolerant           No             Yes                 Yes
Soft-error detection               No             No                  Yes
Non-volatility                     No             No                  Yes
