shashi seminar
8/2/2019 Shashi Seminar
http://slidepdf.com/reader/full/shashi-seminar 1/25
TECHNICAL SEMINAR REPORT
ON
Three-dimensional Image Processing VLSI System with Network-
on-chip System and Reconfigurable Memory Architecture
Submitted in partial fulfillment of the Technical Seminar
VIII Semester, ECE
Under the guidance of
Mrs. Bhagirathi N.M.
(Asst. Professor, Dept. of E & C)
Prescribed By
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Submitted by:
SHASHIKIRAN K 1BI08EC090
2012
Department of Electronics and Communication Engineering
Bangalore Institute of Technology
K R Road, V V Puram, Bengaluru-560004
BANGALORE INSTITUTE OF TECHNOLOGY
(Affiliated to Visvesvaraya Technological University)
K R ROAD, V V PURAM, BANGALORE-560004
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
CERTIFICATE
This is to certify that the seminar report entitled “Three-dimensional Image
Processing VLSI System with Network-on-chip System and Reconfigurable Memory
Architecture” is presented by Shashikiran.K, bearing the USN 1BI08EC090, a final-year
student, in partial fulfillment of the course of Bachelor of Engineering in Electronics
and Communication Engineering of the Visvesvaraya Technological University during
the academic year 2011-2012.
Signature of the H.O.D Evaluated By
Mrs. Bhagirathi N.M.
Asst. Professor
Name of the student: Shashikiran.K
USN: 1BI08EC090
Date:50
CONTENTS
Abstract
Introduction
Three-dimensional layer architecture
3D RAM/ROM synthesis design system
Reconfigurable memory system
Processor control system design
Network-on-chip design
Self-repairable VLSI and dependable reconfigurable system
Chip simulation and experimental results
Conclusion
References
Appendix
ABSTRACT
This report introduces a new RAM/ROM module system with reconfigurable memory
architecture for three-dimensional (3D) image processing VLSI systems.
To enable flexible image data processing, suitable input/output data control is a
critical feature of a high-performance image processing system. A fast 3D VLSI
system also requires efficient pipelined data operation. The new RAM/ROM synthesis design
system is realized by a specific arrangement of RAM, ROM, pins, and interconnections.
The pipeline Flip-Flop control, clock buffer insertion, and critical signal routes have been
improved to raise the operating speed of the whole system. A network-on-chip system is also
proposed to enable fast signal transmission and correct control operation. The 3D image
processing VLSI system can be further improved by suitable data storage and pipeline
control flow. Chip simulation experiments show accurate results with
247.728 mW power consumption and 50 MHz processing frequency. Practical chip tests
confirm that the new RAM/ROM synthesis design successfully realizes the inner-chip
write/read function and efficient data flow control, improving the efficiency of the 3D
reconfigurable system. A better image VLSI system can be realized by an elaborate
network-on-chip system and precise 3D stacking layer design.
I. INTRODUCTION
Recently, image processing technology has been widely used in vision systems,
multimedia processors, and consumer electronics. Rapidly developing technology requires
high-performance image processors with fast computation speed, small chip size, and low
power consumption. In addition, flexible data flow, robust signal control, and inner
write/read operation are also important for an image processing system.
To improve image chip performance, three-dimensional (3D) technology has been
used to realize effective image processing VLSI systems. Typical 3D technology separates
the whole image chip into several function layers. The layers are stacked vertically and
connected by Through-Silicon Vias (TSVs) between layers [4]. In Fig. 1, the
function layers include the CMOS image sensor layer and the analog-to-digital (A/D) converter
layer, which converts the analog image signal into digital input image data. The
following stacked layers, such as the frame memory layer, reconfigurable
memory layer, and Processing Element (PE) module layer, process the digital input
data and realize fast image processing. To improve system operation
efficiency and avoid multi-layer pipeline delay, reconfigurable memory technology has
been introduced to accelerate 3D image processing. In addition, recent network-on-chip
research has been developed for 3D architecture construction and inter-layer
data transmission. Data synchronization can be improved by a single instruction multiple
data (SIMD) stream, and the related pipeline operation stream of multiple instruction
multiple data (MIMD) can also be used to improve image VLSI system performance.
Global data control is important for a parallel image processing system. To
realize suitable data control, several RAM modules are inserted into the PE layer, as
shown in the 3D image chip layer architecture in Fig. 1. Some unused ROM memory parts have been
replaced by additional RAM modules. A special Flip-Flop design and clock buffer
adjustment are also used to enable the inner data write/read flow. Consequently, image data
and control instructions can be inserted or monitored by outside controller parts. The
3D image system can also be realized easily, and the control pipeline thread can be improved
by direct data/instruction operation.
The rest of this paper is organized as follows. Section II describes the whole
system configuration and layer architecture of the 3D image processing VLSI system. In
Section III, new design methods for the Flip-Flop and clock buffer are proposed to solve
RAM/ROM co-design operation. Section IV and Section V describe the reconfigurable
memory system and the processor layer system design. Section VI proposes the 3D network-on-chip
system architecture, and Section VII introduces the self-repairable image system for
dependable reconfigurable VLSI design. Section VIII presents 3D system simulation
results and image chip experiments. Finally, we draw our conclusions and discuss future work in
Section IX.
II. THREE-DIMENSIONAL LAYER ARCHITECTURE
The three-dimensional (3D) architecture of the parallel image processing system is
shown in Fig. 1. Many different function layers are stacked vertically, and
Through-Silicon Vias (TSVs) connect the chip layers in a specific stacking sequence. With
effective function layer design and precise inter-layer connection, the 3D architecture can
reduce chip size, lower power consumption, and accelerate system speed. In addition,
image data transmission, system signal bandwidth, and analog-to-digital converter
efficiency can also be significantly improved.
As shown in Fig. 1, input image data flow from the top layer to the bottom layer for
reconfigurable system operation. The input image signal is sampled by the image sensor
layer and converted to digital image data by the A/D converter layer. The frame memory
layer, reconfigurable memory layer, and processing element layer process the
digital image data. Reconfigurable operation requires careful thread pipelining and
intricate state control. Frequent data write/read and direct instruction control are
critical characteristics of a 3D pipelined image system. Thus a combined RAM and ROM
system has been proposed to realize effective data control and high-performance
image processing.
III. 3D RAM/ROM SYNTHESIS DESIGN SYSTEM
A. Synchronous System Architecture
The RAM and ROM modules are used together to realize better data write/read
and inner-chip signal control in the 3D image system. A new 3D processing technique with
Flip-Flops and a clock buffer is proposed to generate input image signals, as in Fig. 2. To
enable system control and data pipelining, a synchronous signal system is used in the 3D
RAM/ROM co-design system. As in Fig. 2, a synchronous clock buffer is used to drive and
delay the input clock signal. Serial Flip-Flops are also used to create synchronous signals
under input clock control. The signal phase can be adjusted, and the synchronous output
keeps all system signals in the same operation sequence as the input clock. With the
proposed synchronous architecture, the RAM/ROM synthesis design method can realize
global synchronous control, and the image processing system can be pipelined to
accelerate chip operation speed.
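
The serial Flip-Flop arrangement described above behaves like a shift register: each stage re-times its input to the clock edge, so every stage output is aligned to the clock. The following Python sketch illustrates this under simplifying assumptions (ideal edge-triggered D flip-flops, no clock skew or buffer delay). The names `DFlipFlop`, `SyncChain`, and `run` are illustrative and do not come from the report.

```python
class DFlipFlop:
    """Ideal rising-edge D flip-flop: q takes the value of d at each clock edge."""

    def __init__(self):
        self.q = 0

    def clock(self, d):
        self.q = d
        return self.q


class SyncChain:
    """Serial chain of D flip-flops (hypothetical model of the idea in Fig. 2)."""

    def __init__(self, stages):
        self.ffs = [DFlipFlop() for _ in range(stages)]

    def clock(self, d):
        # Sample every stage input before updating, as flip-flops driven
        # by the same clock edge would.
        inputs = [d] + [ff.q for ff in self.ffs[:-1]]
        for ff, value in zip(self.ffs, inputs):
            ff.clock(value)
        return [ff.q for ff in self.ffs]


def run(chain, samples):
    """Feed one input sample per clock edge; return the last-stage output."""
    return [chain.clock(s)[-1] for s in samples]


samples = [1, 0, 1, 1, 0, 0, 0]
outputs = run(SyncChain(stages=3), samples)
```

With three stages, the input pattern reappears at the last stage starting on the third clock edge, so all downstream logic sees data that changes only on clock edges.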
B. Pipeline Latch System
In a 3D image processing system, pipeline thread mismatch can happen frequently
and causes system processing faults. To keep the 3D system pipeline running correctly, a
synchronous system with precise instruction control is recommended. The proposed
method of replacing the common latch with a pipeline Flip-Flop is described in Fig. 3. As
the waveform illustration shows, input signals do not always stay synchronous with the clock
signal. The output signal then cannot easily deliver synchronous data and causes
system mismatch. The Pipeline Flip-Flop (PFF) method is proposed, in which a data switch module
controls the output signal according to the input signal combination. The related Karnaugh
table is also given in Fig. 3 to show the detailed switch selection. In addition, a new
D-FF module replaces the common RS-FF latch to enable signal synchronization
and the 3D system pipeline process.
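
The mismatch described above can be illustrated by comparing a level-sensitive latch with an edge-triggered flip-flop when the input changes in the middle of a clock phase. The Python sketch below is a simplified model only: it uses a transparent D latch as a stand-in for the latch being replaced (the report mentions an RS-FF latch), and the function names and waveforms are assumptions chosen for illustration.

```python
def d_latch(clk, d):
    """Level-sensitive latch: the output follows d while clk is high."""
    q, out = 0, []
    for c, x in zip(clk, d):
        if c == 1:
            q = x
        out.append(q)
    return out


def d_flip_flop(clk, d):
    """Edge-triggered flip-flop: the output updates only on a 0 -> 1 clock edge."""
    q, prev, out = 0, 0, []
    for c, x in zip(clk, d):
        if prev == 0 and c == 1:
            q = x
        prev = c
        out.append(q)
    return out


# Two time steps per clock phase; the input changes while the clock is high.
clk = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 1, 1, 0, 0, 1, 1, 0]

latch_out = d_latch(clk, d)
ff_out = d_flip_flop(clk, d)
```

In this sketch the latch output follows the mid-phase input change, while the flip-flop output stays constant between rising edges, which is the property a pipelined system relies on.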
C. 3D RAM/ROM Reconfigurable Memory System
The whole RAM/ROM system configuration for the 3D image processing system is
illustrated in Fig. 4. Input image data are stored in the frame memory and the inner-chip data
memory. Through the interconnection network between adjacent layers, image data are
sent to four Processing Elements (PEs) for 3D pipelined operation. The output image
signal is sent out through the system output interface. To control inner-chip data, a control unit
and a RISC processor realize the signal pipeline and data flow. The configuration
memory is used to insert the reconfiguration signal and enable 3D reconfigurable
image processing. The RAM/ROM synthesis design system can also write and read input
image data to and from the inner memory modules directly, and direct control instructions
through the 3D layers further improve image chip performance.
D. Whole Chip Architecture for 3D Image System
The VLSI chip architecture of the 3D image processing system is given in Fig. 5.
Input data, address information, and control signals enter the input switch of the VLSI
processor chip. Through the control module and SRAM module, input image data undergo
pipelined image processing. Through the output switch module, image data are sent out to
construct the new output image. The control module in Fig. 5 consists of several
modules, including the frame memory, four PE modules, MAIN memory, INST memory for
instant data processing, and CONFIG memory for reconfigurable data processing. Inner
image data are pipelined through the frame memory and PE modules. The image
data are fetched from the MAIN memory module. The neighboring INST memory and CONFIG
memory are used together to control the pipeline thread and the reconfiguration sequence. The
additional SRAM modules store image data and control instructions so that the
outside system can directly control the inner-chip modules. With the proposed chip
architecture and memory modules, the RAM and ROM synthesis architecture in the inner
control module can realize system control and a precise data pipeline.
IV. RECONFIGURABLE MEMORY SYSTEM
In the proposed 3D image processing system, RAM and ROM are used together to
realize image data write/read and inner signal control. The synchronous clock buffer and
pipeline Flip-Flop element are applied to image data operation for system
instruction insertion and memory data fetch. To realize synchronous system control in the 3D
image processing system, a new 3D reconfigurable memory system is proposed to enable
image data reconfiguration and self-repairable system operation. Fig. 6 illustrates a typical
3D stacking architecture for the sensor and reconfigurable memory. A common sensor network
captures input image data, including static pictures and dynamic moving
images. The sensor image data are transferred through the A/D converter layer and the
interconnect network to the next function layer, as shown in Fig. 1.
The next image processing layer is divided into several frame memory blocks, as in Fig.
6. Different target image data are assembled into the related reconfigurable memory
blocks. The separated memory blocks differ from each other and are suited to the particular input
image data. If image data operation runs into problems, such as image data loss or
picture damage, neighboring memory blocks are recombined to remove the erroneous
image blocks and enable re-healing, or self-repairable, image processing. The
image data being processed and the reconfiguration instructions are controlled by the processing
element layer. Thus the 3D reconfigurable memory system can realize precise image processing and
raise the robustness of the whole system.
V. PROCESSOR CONTROL SYSTEM DESIGN
The Processing Element (PE) layer in the 3D image processor system controls
image memory configuration and pipeline data flow, as shown in Fig. 7. To realize direct
system control and reduce inter-layer transmission loss, processor modules and their related
memory blocks are mostly stacked in the same vertical column. Input image data from
the outside sensor layer are converted by the following A/D converter layer. The analog
image signal is transferred as pipelined digital image data to the following column
memory block. Image data flow vertically from the top layer to the bottom layer, whereas
system control instructions travel in the opposite direction, from the bottom layer to the top
layer. The 3D reconfigurable RAM/ROM memory layer can thus be controlled easily, and input
image data can be processed with fast pipeline threads and flexible inner instructions.
Many related reconfigurable processor systems are used in recent VLSI processing
systems. Similarly, the processor layer here has many processing elements, and a reconfigurable
system is applied to realize reconfigurable image operation. As in Fig. 7, the vertical
data flow is controlled by the adjacent processor layer and frame memory layer.
The Processing Element (PE) layer enables the related memory combination and data block
partition. Depending on image operation requirements, the frame memory layer can be
divided into several blocks to store pipelined image data. ROM and RAM modules can also
be combined to form a whole memory block and realize self-repairable image
operation. Image data accuracy and system processing speed are improved by
precise processor control and the 3D reconfigurable architecture.
VI. NETWORK-ON-CHIP DESIGN
In advanced VLSI system research, system processing speed, whole chip area,
and power consumption have become the critical design challenges. Recently, 3D stacking
layer architecture and network-on-chip design have been considered as ways to improve system
efficiency. In addition, further research focuses on the combined design of 3D
architecture and network-on-chip systems. Thus a new 3D interconnect network architecture
is proposed in this paper to improve layer stacking flexibility and whole system
performance.
In the practical 3D image processing chip, we improve the 3D network-on-chip
design based on the layer architecture in Fig. 8. In a complex stacking layer system, many
different function layers are connected with a specific Through-Silicon Via (TSV)
architecture. Several stacking layer types are used with related TSV connection structures,
including inter-layer TSV, trans-layer TSV, and multi-layer TSV. As in Fig. 8,
a connection between neighboring layers uses the inter-layer TSV network, which is designed to
connect adjacent layers by specific silicon vias and an interconnect network. Another TSV
type is the trans-layer TSV, which passes through a neighboring function layer and connects
the corresponding layers by a trans-connection silicon via, such as the reconfigurable memory
layer and the processing element layer in Fig. 8. In addition, a further TSV tunnel design can
connect several layers together and realize the multi-layer TSV type with an assembled
stacking layer connection. The three-layer assembled connection of the A/D converter, power
network, and reconfigurable memory layers in Fig. 8 shows a typical multi-layer TSV
architecture. The four-layer TSV network from the power network layer to the processing
element layer illustrates a more complex multi-layer structure in the seven-layer image
processing system of Fig. 8.
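
One way to read the three TSV types above is in terms of the stack positions of the layers a via connects. The Python sketch below encodes that reading. It is an interpretation rather than the report's definition: the function name `classify_tsv` and the use of integer stack positions (0 for the top layer) are assumptions made for illustration.

```python
def classify_tsv(layers):
    """Classify a TSV by the stack positions (0 = top) of the layers it connects."""
    positions = sorted(set(layers))
    if len(positions) < 2:
        raise ValueError("a TSV must connect at least two distinct layers")
    if len(positions) >= 3:
        # Three or more layers tied together by one via tunnel.
        return "multi-layer"
    # Exactly two layers: adjacent positions are inter-layer,
    # positions with a layer in between are trans-layer.
    return "inter-layer" if positions[1] - positions[0] == 1 else "trans-layer"


adjacent = classify_tsv([1, 2])
skipping = classify_tsv([3, 5])
tunnel = classify_tsv([2, 3, 4, 5])
```

Under this reading, a via between two adjacent layers is inter-layer, a via that skips an intermediate layer is trans-layer, and a via shared by three or four layers is multi-layer.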
VII. SELF-REPAIRABLE VLSI AND DEPENDABLE RECONFIGURABLE SYSTEM
Dependable reconfigurable VLSI systems are a recent research hotspot for
high-performance processor systems. In a practical image VLSI chip, data operation errors can
happen frequently and cause serious problems that degrade whole system
performance. To solve image data mismatch and processing errors, self-repairable
methods and re-healing design techniques are applied in our practical chip
design.
A common robust design method for repairing VLSI system errors is reconfigurable
re-healing technology. In Fig. 9, the image data being processed are damaged in the center of
the whole set of image blocks. The reconfigurable self-repair method checks the vertical image blocks
to find the detailed error address. The horizontal image blocks are also identified by
memory data scanning to obtain the required image data, which are used to repair the erroneous
image blocks with suitable re-healing methods.
After the damaged image data and related address blocks are identified by image block
sweeping, the specific error image blocks are reconfigured, and neighboring memory
blocks are used to replace them and repair the damaged image data according to the system
design target and related image information. When the erroneous image data have been corrected in
the corresponding image memory blocks, the whole VLSI system enters reconfigurable
operation again to recover the original image block architecture.
The repaired image blocks are assembled and used to reconstruct the center
image block. The border separation is removed, and four corrected image
sub-blocks are composited to complete the image re-healing operation.
The critical points in VLSI self-repairable design are the reconfigurable block area
and the repair control sequence. VLSI re-healing performance is determined by
design requirements and system robustness. If a large memory area is available for image
repair, correction efficiency increases, but block searching time also grows, at the cost of
high power consumption and large chip area. Conversely, if new
dependable design techniques such as a compact reconfigurable
border and small image sub-blocks are used, operating power and chip area can be reduced greatly.
However, such reconfigurable processing cannot always repair the image successfully,
and system robustness drops rapidly. Thus, in the common repair
method, the selected image border ranges from three to five pixels around the detected
error image block.
The specific repair sequence also determines whole system operation and
processing efficiency for the 3D reconfigurable image VLSI system. In the
detailed control sequence, the first operation scans the image range and searches for the
addresses of the image blocks to be repaired. The second operation is reconfiguration,
which separates the detected damaged image blocks and sets up the neighboring
memory blocks. The third step is re-healing and new image block
reconfiguration using the related memory interconnection and image repair method. Finally,
memory blocks and processing elements are recombined to recover the original
image VLSI system. The re-healed sub-image blocks are assembled, and
improved image results are produced by the repairable processing VLSI and reconfigurable
image system.
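
The four-step sequence above (scan, isolate, re-heal, recombine) can be sketched in software. The following Python example is a minimal illustration, not the chip's hardware method: it assumes damaged pixels are already marked as `None`, and it swaps in a simple neighborhood average as the re-healing step. The function names and the `border` parameter (defaulting to three pixels, the lower end of the range quoted in the text) are hypothetical.

```python
def find_damaged(image):
    """Step 1: scan the image and return coordinates of damaged pixels (None)."""
    return [(r, c) for r, row in enumerate(image)
            for c, v in enumerate(row) if v is None]


def heal_pixel(image, r, c, border):
    """Step 3: estimate one pixel from valid pixels within `border` pixels."""
    rows, cols = len(image), len(image[0])
    neighbours = [
        image[i][j]
        for i in range(max(0, r - border), min(rows, r + border + 1))
        for j in range(max(0, c - border), min(cols, c + border + 1))
        if image[i][j] is not None
    ]
    if not neighbours:
        return None  # repair is not always possible
    return round(sum(neighbours) / len(neighbours))


def repair(image, border=3):
    """Steps 1-4: scan, isolate, re-heal, and recombine into a new image."""
    damaged = find_damaged(image)         # step 1: scan and locate
    healed = [row[:] for row in image]    # step 2: work on a separate copy
    for r, c in damaged:                  # step 3: re-heal from neighbours
        healed[r][c] = heal_pixel(image, r, c, border)
    return healed                         # step 4: recombined image


picture = [[10] * 8 for _ in range(8)]
for r in (3, 4):
    for c in (3, 4):
        picture[r][c] = None              # a 2x2 damaged block in the centre

restored = repair(picture, border=3)
```

A wider `border` draws on more neighboring data per repaired pixel at the cost of more search work, which mirrors the area and power trade-off discussed above.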
VIII. CHIP SIMULATION AND EXPERIMENTAL RESULTS
A. Image Processor Chip Design and Simulation
Based on the synchronous improvements to the clock buffer and Flip-Flop latch in the 3D
stacking layers, we designed a new image processing chip in 0.13 um technology. Fig. 10
shows the layout micrograph of the manufactured VLSI chip. The chip has
208 pins and measures 5000 um by 5000 um. It contains about 980,000
gates, and chip utilization is 20.655%. The chip clock frequency is 50 MHz, and the practical
processing frequency is 25 MHz. The interconnect distribution uses 8 metal layers,
including the power mesh network, clock tree, and other function signal wires.
For IR-drop verification with zero EM violations, the practical switching rate is 20%
under 50 MHz clock control. The power consumption is 247.728 mW with a 1.2 V power
source. In the VDD-drop simulation in Fig. 11, the worst drop value is 13.802 mV, a
1.15% drop rate. Similarly, the worst rise value is 9.919 mV, a rise rate of 0.827%,
in the VSS-rise simulation in Fig. 12. Based on the VDD-drop and
VSS-rise simulations, the image processor chip can carry out image operation without excessive
disturbance from signal floating and drop/rise variance. 3D layer stacking in the practical
processor chip can also be realized easily and assembled successfully with precise image
data adaptation and smooth inter-layer signal transmission.
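
The drop and rise rates quoted above are consistent with expressing each worst-case deviation as a percentage of the 1.2 V supply. The short Python check below reproduces them from the values in the text; the function name `percent_of_supply` is illustrative, and the choice of the supply voltage as the reference is an assumption that the numbers happen to confirm.

```python
SUPPLY_MV = 1200.0      # 1.2 V supply, from the text
VDD_DROP_MV = 13.802    # worst VDD drop, from the text
VSS_RISE_MV = 9.919     # worst VSS rise, from the text


def percent_of_supply(delta_mv, supply_mv=SUPPLY_MV):
    """Return a voltage deviation as a percentage of the supply voltage."""
    return 100.0 * delta_mv / supply_mv


vdd_drop_pct = percent_of_supply(VDD_DROP_MV)
vss_rise_pct = percent_of_supply(VSS_RISE_MV)
```

Rounded to the precision used in the text, the two values come out at 1.15% and 0.827%.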
B. Test Board Experiments and Simulation
We also designed a test board to obtain practical experimental results after the 3D image
processing chip was manufactured. A photograph of the implemented test board is shown in
Fig. 13. The image test board system consists of a base board, socket, input/output ports,
interface, and the image processing chip. The base board is a 4-layer
experimental board measuring 180 mm by 180 mm. The socket is embedded
in the center of the base board, and the chip under test is inserted in the socket with tight
pin contact. Around the socket, we placed four columns of input/output ports, which
are used to write and read image data, memory addresses, and control instruction signals. By
an outside computer interface port, we can also control the test board and access board
signals for 3D image processing system simulation.
Practical test results for the 3D image processing chip, obtained with a computer monitor system,
are given in Fig. 14 and Fig. 15. We drive the inner image data bus with a 1-bit step
increment input through the inserted SRAM modules shown in Fig. 5. The SRAM
output likewise advances step by step with a 1-bit data change. The ladder increment results
demonstrate a correct write and read procedure directly from outside into the chip, as in
Fig. 14. Image system control is greatly improved, and global pipelined operation can
be realized easily with immediate outside instruction control. Inner memories in the processor
module, such as the Main memory and instant memory (INST memory) in Fig. 5, can also
access outside data signals directly. In addition, Fig. 15 shows the processing waveform for
input data read and output data write. Main memory data are the main image
processing data, and INST memory data are the adjustment instructions for image
system reconfiguration. The experimental waveforms show that outside data and control
instructions can be inserted into the inner RAM modules and fetched by the outside
system. Thus we can realize better data control and faster pipeline operation to increase
the performance of the whole 3D image chip.
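
The ladder test described above amounts to writing a value that grows by one per step and checking that the same staircase is read back. The Python sketch below shows the idea with a toy memory model. The class `Sram` and function `ladder_test` are hypothetical, and only the 8192x32-bit size is taken from the report's description of the additional SRAM module.

```python
class Sram:
    """Toy word-addressable memory standing in for the inserted SRAM module."""

    def __init__(self, words, width):
        self.mask = (1 << width) - 1
        self.cells = [0] * words

    def write(self, addr, value):
        self.cells[addr] = value & self.mask

    def read(self, addr):
        return self.cells[addr]


def ladder_test(sram, steps):
    """Write values that grow by 1 per step, then read them back in order."""
    for step in range(steps):
        sram.write(step, step)
    return [sram.read(step) for step in range(steps)]


readback = ladder_test(Sram(words=8192, width=32), steps=16)
increments = [b - a for a, b in zip(readback, readback[1:])]
```

If every increment in the read-back sequence equals one, the write and read paths agree for the tested addresses.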
C. Image Simulation and Conversion Results
Image simulation results from the 3D reconfigurable image chip are given in Fig. 16.
Many images were tested and can be used for conversion simulation with the practical image
chip and a computer simulation program. For the image conversion experiments in Fig. 16,
we tested the typical test image named “Cameraman”, which shows a
man using a camera to photograph his surroundings. Based on the
specific picture conversion and related internal image processing, the “Cameraman” image can
be compressed rapidly. The picture can then be used in the next image operation to enhance
display precision and system processing performance.
To realize data transformation and image compression, we use specific image
MPEG algorithms to extract the image edges and obtain the corresponding thresholded image for
final image processing. Fig. 16(b) and Fig. 16(c) illustrate the detailed image
conversion and data transmission, respectively. The image data can be processed
quickly, enabling fast image processing for super-high-speed camera design. The 3D
network-on-chip architecture ensures fast system speed and improves data transmission
efficiency. Other MPEG/JPEG image processing methods, such as the DCT/IDCT algorithm,
pipelined image operation, multi-layer stacking, and reconfigurable
self-repairable memory, can also be applied in the new 3D image processing system.
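
The text does not specify how edges and the thresholded image are computed, so the following Python sketch only illustrates the general idea with a plain intensity-difference detector followed by a threshold. It is not the report's MPEG-based method; the function name `edge_map` and the threshold value are assumptions for illustration.

```python
def edge_map(image, threshold):
    """Mark pixels whose horizontal or vertical intensity step exceeds threshold."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    # The last row and column are left as 0 because they have no
    # right/lower neighbour to compare against.
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = abs(image[r][c + 1] - image[r][c])
            dy = abs(image[r + 1][c] - image[r][c])
            edges[r][c] = 1 if max(dx, dy) > threshold else 0
    return edges


# 4x6 test image: dark left half, bright right half.
image = [[20, 20, 20, 200, 200, 200] for _ in range(4)]
edges = edge_map(image, threshold=50)
```

In this toy image the only marked pixels lie along the vertical boundary between the dark and bright halves.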
D. Reconfigurable Image Self-repairable Processing
In practical image chip tests, image precision problems and picture distortion
happen frequently. Common image errors are generated during signal processing and data
transformation in the 3D image processing system. Robust image self-repair technology
is therefore necessary for high-performance image chip design. Fig. 17 shows typical image
repair methods in our 3D image processing system. The six pictures from Fig. 17(a) to
Fig. 17(f) explain the detailed processing sequence for picture data recovery and
image re-healing.
The first test picture, the plane image in Fig. 17(a), is the original experimental picture. In
Fig. 17(b), two square blocks mark image errors in the test picture. Data scanning is
necessary to locate them precisely within the picture. Next, reconfigurable
technology is used to replace an error image block with neighboring image parts, as in Fig.
17(c). Through this block repair, the error image block is removed and new
image blocks are assembled in a specific sequence to recover the previous image part.
A similar operation is then carried out for the other error block in Fig. 17(d). The following
step, with the reconfigurable memory block and image re-healing operation, is used to
repair the image data in Fig. 17(e). Finally, the whole test picture is recovered to the correct
image data, as shown in Fig. 17(f).
If there are numerous image errors and a large picture area, the reconfigurable
self-repairable operation in the 3D image VLSI chip proceeds step by step in a
sequence similar to Fig. 17. The self-repairable technology can enhance whole image system
performance. It can also heal erroneous picture parts after 3D image system processing and
inter-layer picture data transmission. Image size and the number of errors influence
data recovery efficiency and output image quality. Thus the combined image operation,
including picture data processing and the reconfigurable repairable system, is our main
design contribution in the practical 3D image VLSI system.
E. Memory Allocation and Chip Size
Practical image operation and the whole VLSI system are realized by the related
RAM/ROM memory blocks in our 3D reconfigurable image chip. The chip size is also
determined by the memory allocation area and the number of RAM/ROM blocks. Table I
shows the memory information and allocation sequence in detail for our image
processing chip.
Based on the 3D chip architecture in Fig. 5, frame memory blocks (FMem) are
used to store image frame data. The FMem module has a total capacity of 8 KB, comprising
16 blocks with a unit size of 256x8 bit each. The area of each FMem
block is 46600 um^2. The whole FMem area allocation is 745600 um^2, as shown in
Table I. Similarly, data memory (DMem) and I/O memory (IOMem) have the same size and
area allocation in our 3D image chip. The important configuration memory (CMem) has a
4096x40 bit block size and 587600 um^2 of chip area. The related processing memories, such
as main memory (MMem), pipeline instruction memory (PMem), and table memory
(TMem) in Table I, also use SRAM modules to handle image data directly in the whole
VLSI system. Pipeline instructions are handled in the PMem module, and the TMem block
provides the system table memory that stores intermediate image data for the next reconfigurable
image processing step. Furthermore, the additional SRAM module in Fig. 5 uses a large 8192x32 bit
size and two 32-bit fast memory blocks. The inserted SRAM blocks occupy 212800 um^2
and are used to realize direct image data and instruction fetch operations in our
3D image processing system.
In summary, the memory comprises nine module types with a total size of 106 KB across
70 memory blocks. The whole memory area is about 4250600 um^2 by size accumulation and
system allocation. If the chip peripheral ring area is also considered in the practical 3D image
system, the additional allocated memory area is about 1000000 um^2,
which commonly occupies about 25% of the area of the entire memory system. Thus the whole image
chip area is more than 5000000 um^2 with detailed memory size allocation. If additional
core processing elements are also allocated, the whole chip area increases further, and
global synthesis design between memory and processor becomes a future research
challenge. Robust image processing, such as self-repairable operation and re-healing
methods, must also take memory allocation and module area into account across the whole 3D image chip.
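
The area figures quoted above can be cross-checked with simple arithmetic. The Python snippet below uses only values stated in the text; the variable names are illustrative, and the peripheral ring area is the approximate figure given in the report.

```python
FMEM_BLOCKS = 16
FMEM_BLOCK_AREA_UM2 = 46_600
TOTAL_MEMORY_AREA_UM2 = 4_250_600
PERIPHERAL_RING_AREA_UM2 = 1_000_000   # "about", per the text

# Total frame memory area: 16 blocks of 46600 um^2 each.
fmem_area = FMEM_BLOCKS * FMEM_BLOCK_AREA_UM2

# Memory area including the peripheral ring.
chip_memory_area = TOTAL_MEMORY_AREA_UM2 + PERIPHERAL_RING_AREA_UM2

# Peripheral ring area relative to the accumulated memory area.
ring_share_pct = 100.0 * PERIPHERAL_RING_AREA_UM2 / TOTAL_MEMORY_AREA_UM2
```

The frame memory total matches the 745600 um^2 stated for Table I, the combined area exceeds 5000000 um^2 as claimed, and the ring share works out to roughly 23.5%, in line with the "about 25%" quoted.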
F. Discussion and Future Challenge
Direct memory insertion in the 3D image processing system can improve system
robustness and control inner data operation. As in Table II, a common 2D planar image
system has a large chip size and slow processing speed. The image data and operation
instructions are handled in a conventional sequence, and system robustness is insufficient
without suitable self-repairable capability and a re-healing feature. Compared with the
common 2D image system, our proposed 3D architecture reduces chip size and
increases processing speed. The 3D system can also realize fast parallel processing
and robust image operation for consumer imaging electronics products.
In this paper, we propose a new 3D RAM/ROM image system with reconfigurable
operation memory and a 3D network-on-chip architecture. The designed chip can insert
image data or control instructions into inner-chip modules directly. The processed image
data can also be sent out immediately, and inner control instructions can be monitored for
reconfigurable system processing. The speed of the whole 3D VLSI system can be increased
substantially by fast parallel and direct data control. The 3D image system also has a
self-repairable feature and re-healing capability, realizing a dependable reconfigurable VLSI system
and highly robust image operation, as in Table II.
One drawback of the 3D network-on-chip architecture is the complex critical data
path from input ports to output ports. The extended signal routes cause data
transmission loss and can affect the performance of the whole VLSI system. Additional clock
buffer insertion and pipeline flip-flop latch replacement introduce inner signal delay
and waste inner processing time. The frequency of the whole pipeline system is also
reduced, and image data processing speed decreases because of the critical data
path delay. In addition, complex network connections can also affect system data
transmission and introduce inter-chip data mismatching.
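The relation between critical data path delay and pipeline frequency described above can be illustrated with a minimal sketch. It assumes a simple synchronous pipeline whose clock period must cover the slowest stage plus the flip-flop overhead; the function name and every delay value below are hypothetical and are not measurements from the proposed chip.

```python
def max_clock_mhz(stage_delays_ns, register_overhead_ns=0.0):
    """Highest clock frequency (MHz) of a synchronous pipeline.

    The clock period must cover the slowest stage (the critical path)
    plus the flip-flop overhead (clock-to-Q delay plus setup time).
    """
    if not stage_delays_ns:
        raise ValueError("need at least one stage delay")
    period_ns = max(stage_delays_ns) + register_overhead_ns
    return 1000.0 / period_ns


# Hypothetical stage delays in nanoseconds (illustrative only).
baseline = max_clock_mhz([4.0, 5.0, 8.0], register_overhead_ns=2.0)

# Assumption: a longer inter-layer route and an extra clock buffer
# add 2.5 ns to the slowest stage.
with_long_route = max_clock_mhz([4.0, 5.0, 10.5], register_overhead_ns=2.0)

print(baseline, with_long_route)  # 100.0 80.0
```

Under these assumed numbers, only the slowest stage matters: lengthening it from 8.0 ns to 10.5 ns lowers the clock of the whole pipeline, while the faster stages have no effect on the result.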
Another weak point of our 3D chip system is layer stacking efficiency. Different
image function layers have their own layer features and connection methods. The
stacking sequence and the relation between neighboring layers also require precise design
and system consideration. More stacking layers can realize smaller chip size, faster
operation speed, and more compact image operation. Global system synthesis with different
stacking layers and related function combinations is a current design hotspot. In addition,
low-power systems and highly robust chips will also become important research targets in
the future.
Thus our future design challenges for the next 3D image system will focus on several
significant targets, such as reducing the critical data path, decreasing inter-layer
connection complexity, and accelerating image processing speed. The pipeline thread of the
inner image elements will be studied, and the delay path will be divided for inner pipeline
processing. Multi-layer stacking technology is also a future research topic, with detailed
layer combination and specific TSV tunnel design. Consequently, the design of processing
element modules and reconfigurable data sequences will be adjusted precisely to
satisfy the new data path flow and highly pipelined image operation in our future complex 3D
system research.
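Dividing a delay path for pipeline processing can be sketched as a partitioning problem: a chain of elements is cut into contiguous stages so that the slowest stage is as short as possible. The sketch below is one standard way to do this (binary search over the stage limit with a greedy feasibility check); it is not the authors' method, and the function name and the delays in picoseconds are hypothetical.

```python
def min_worst_stage_delay(element_delays_ps, num_stages):
    """Smallest possible worst-stage delay when a chain of elements is cut
    into at most `num_stages` contiguous pipeline stages.

    Delays are non-negative integers (picoseconds here) so that the
    binary search terminates on an exact value.
    """
    if num_stages < 1 or not element_delays_ps:
        raise ValueError("need at least one stage and one element")

    def stages_needed(limit):
        # Greedily pack elements into a stage until the limit would be exceeded.
        count, current = 1, 0
        for delay in element_delays_ps:
            if current + delay > limit:
                count += 1
                current = delay
            else:
                current += delay
        return count

    low, high = max(element_delays_ps), sum(element_delays_ps)
    while low < high:
        mid = (low + high) // 2
        if stages_needed(mid) <= num_stages:
            high = mid      # limit is feasible, try a tighter one
        else:
            low = mid + 1   # limit is too tight
    return low


# Hypothetical element delays along one data path, in picoseconds.
path = [300, 500, 200, 400, 600]
for stages in (1, 2, 3):
    print(stages, min_worst_stage_delay(path, stages))
# 1 2000
# 2 1000
# 3 800
```

With these assumed delays, going from one stage to three cuts the worst stage from 2000 ps to 800 ps, but the gain flattens as the stage count approaches the number of elements, since no stage can be shorter than the slowest single element.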
IX. CONCLUSION
In this paper, a new reconfigurable system with RAM/ROM memory modules and
3D layer architecture is proposed for a highly pipelined image processing chip. Flexible data
flow and direct system control can be realized by precise data fetch in RAM and ROM
memory. The synchronous clock buffer and pipeline flip-flop module are used to adjust the
3D system processing flow. The new 3D stacking layer architecture can also be applied to
reduce image chip size and increase system pipeline speed. An additional 3D network-on-chip
connection system can satisfy 3D chip stacking requirements and enable global
pipeline operation for a multi-layer VLSI image system. Experimental results in this paper
illustrate that the new 3D reconfigurable memory system can deal with inner data and control
instruction signals directly for a dependable VLSI chip. Further robust image methods,
including self-repairable operation and a re-healing system, are also used in the proposed 3D
image processing system. Future challenges will focus on critical path reduction,
fast pipeline thread construction, complex multi-layer stacking methods, and highly
robust self-dependable VLSI system research.
APPENDIX
IEEE paper on Three-dimensional Image Processing VLSI
System with Network-on-chip System and Reconfigurable
Memory Architecture
By
Yun Yang, Member, IEEE