Network-on-chip
Prof. Jun-Dong Cho, Sungkyunkwan University
Contents
- Introduction to NoC
- On-chip network structure
- On-chip network design examples
- NoC design examples
- A Dynamic Routing Mechanism for Network on Chip
Technology Evolution
NoC definition
A flexible and scalable packet-based on-chip micro-network designed according to a layered methodology
Los Angeles: reducing commute time by 15 min -> $15B economic impact
On-chip communication will dominate performance and power efficiency.
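Need for NoC
- Wireless processing systems require heavy computation at high throughput, but operate under strict power constraints.
- Reconfigurable SoC implementations seek higher performance through parallelism and rely on IP reuse.
- Algorithm partitioning based on performance prediction of hot-spot bottlenecks (or traffic).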
Network-on-chip Architecture
Design challenges for On-Chip Communication Architectures
Three System-on-chip design issues
- Technology issues
- Performance issues
- Design productivity issues
Technology issues: The on-chip distance travelled by critical signals must be limited because of global wire delay. Self-synchronous cores that communicate with one another through a network-centric architecture avoid deep-submicron effects (clock skew, power associated with clock distribution trees).
Signal integrity issues can be solved by designing regular structures, which allow the electrical parameters of wires to be optimized and well controlled.
Performance issues: Network congestion can cause large latency fluctuations in packet delivery. There are two ways to address this problem: network overdimensioning (for NoCs that support best-effort traffic), and dedicated mechanisms that provide guarantees for timing-constrained traffic (e.g., loss-less data transport, minimal bandwidth, bounded latency, throughput).
Design productivity issues: Reusing complex pre-verified design blocks is an efficient means of increasing productivity. Using processing elements in different platforms in a plug-and-play style requires a scalable and modular on-chip network.
Using processing elements is facilitated by a standard network, which makes the modularity property of NoCs effective.
Some standards for networks-on-chip have been proposed, such as those of the Virtual Socket Interface Alliance (VSIA) and OCP (Open Core Protocol).
Network-on-chip Architecture
Network Interface (NI): hides the details of the network communication protocol from the cores, so that they can be developed independently of the communication infrastructure.
Communication protocol conversion (from the end-to-end protocol to the network protocol).
Data packetization (packet assembly, delivery and disassembly). This is a critical task. Messages that have to be transmitted across the network are partitioned into fixed-length packets. Packets are broken into flits (flow control units), which represent logical units of information. A phit (physical unit) is the unit of information that can be transferred across a physical channel.
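A minimal Python sketch of the packetization steps above, assuming byte-granular sizes, zero padding of the last packet, and hypothetical names (`Flit`, `packetize`, `depacketize`, `phits_per_flit`). A real NI would also place routing information (e.g. the destination address) in the head flit; that is omitted here.

```python
import math
from dataclasses import dataclass

@dataclass
class Flit:
    kind: str        # "head", "body", "tail", or "head_tail" for a one-flit packet
    packet_id: int
    payload: bytes

def packetize(message: bytes, packet_bytes: int, flit_bytes: int) -> list[list[Flit]]:
    """Split a message into fixed-length packets, then each packet into flits."""
    if packet_bytes % flit_bytes != 0:
        raise ValueError("packet size must be a multiple of the flit size")
    packets = []
    for pid, start in enumerate(range(0, len(message), packet_bytes)):
        # Assumption: the last packet is zero-padded to the fixed packet length.
        chunk = message[start:start + packet_bytes].ljust(packet_bytes, b"\x00")
        pieces = [chunk[i:i + flit_bytes] for i in range(0, packet_bytes, flit_bytes)]
        flits = []
        for i, piece in enumerate(pieces):
            if len(pieces) == 1:
                kind = "head_tail"
            elif i == 0:
                kind = "head"
            elif i == len(pieces) - 1:
                kind = "tail"
            else:
                kind = "body"
            flits.append(Flit(kind, pid, piece))
        packets.append(flits)
    return packets

def depacketize(packets: list[list[Flit]], message_len: int) -> bytes:
    """Reassemble the payloads in order and strip the padding."""
    data = b"".join(f.payload for packet in packets for f in packet)
    return data[:message_len]

def phits_per_flit(flit_bits: int, channel_bits: int) -> int:
    """Number of phits needed to move one flit across a channel of channel_bits wires."""
    return math.ceil(flit_bits / channel_bits)
```

For example, `packetize(bytes(range(10)), packet_bytes=8, flit_bytes=2)` yields two packets of four flits each (head, body, body, tail), and a 32-bit flit on a 12-bit channel needs 3 phits.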
Network Switch: carries packets injected into the network to their final destination, following a statically defined or dynamically determined routing path.
A switch may have both input and output buffers, or only one type of buffer.
Network flow control (routing mode) deals with the limited amount of buffering resources.
The three network flow-control policies are:
Store-and-forward routing: an entire packet is received and stored before being forwarded to the next switch.
Virtual cut-through routing: also requires buffer space for an entire packet, but allows lower-latency communication, because a packet can be forwarded as soon as its header has been received.
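To make the latency difference between the first two policies concrete, here is a zero-load sketch (no contention). It assumes H hops, a packet of L bits, a channel that moves W bits per cycle and a fixed routing delay t_r per switch; all names and numbers are illustrative. In this model store-and-forward pays the serialization time ⌈L/W⌉ at every hop, H(t_r + ⌈L/W⌉), while cut-through pays it only once, H·t_r + ⌈L/W⌉.

```python
import math

def serialization_cycles(packet_bits: int, channel_bits: int) -> int:
    # Cycles needed to push a whole packet through one channel.
    return math.ceil(packet_bits / channel_bits)

def store_and_forward_latency(hops: int, packet_bits: int, channel_bits: int, router_delay: int) -> int:
    # Each switch receives the entire packet before forwarding it,
    # so the serialization time is paid again at every hop.
    return hops * (router_delay + serialization_cycles(packet_bits, channel_bits))

def cut_through_latency(hops: int, packet_bits: int, channel_bits: int, router_delay: int) -> int:
    # The header is forwarded as soon as it has been routed and the rest of the
    # packet streams behind it, so serialization is paid only once (zero load).
    return hops * router_delay + serialization_cycles(packet_bits, channel_bits)

print(store_and_forward_latency(4, 512, 32, 2))  # 72
print(cut_through_latency(4, 512, 32, 2))        # 24
```

With 4 hops, 512-bit packets, a 32-bit channel and a 2-cycle routing delay, the model gives 72 cycles for store-and-forward against 24 for cut-through. Under contention, a virtual cut-through switch must still be able to absorb the whole blocked packet, which is why it needs packet-sized buffers.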
Network Switch: three policies of network flow control (cont.):
Wormhole routing: reduces switch memory requirements and permits low-latency communication. The first flit (the header) is decoded and the switch sets up a path for the following flits. A flit is passed to the next switch as soon as there is enough space to store it, even if there is not enough space to store the whole packet.
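The following cycle-level sketch illustrates wormhole routing for a single packet on a linear chain of switches with no competing traffic. Assumptions: one flit crosses a link per cycle, each switch buffers `buffer_depth` flits (1 by default), and a flit may move into a buffer slot freed in the same cycle. The function and the model are hypothetical.

```python
from collections import deque

def simulate_wormhole(num_flits: int, num_switches: int, buffer_depth: int = 1) -> tuple[int, int]:
    """Move one packet through a linear chain of switches, one flit per link per cycle.

    Returns (cycles until the tail flit is ejected,
             largest number of switches holding flits of the packet at the same time).
    """
    buffers = [deque() for _ in range(num_switches)]
    pending = deque(range(num_flits))   # flit 0 is the header
    ejected = 0
    cycle = 0
    max_span = 0
    while ejected < num_flits:
        cycle += 1
        # The last switch ejects one flit to the destination core.
        if buffers[-1]:
            buffers[-1].popleft()
            ejected += 1
        # Advance flits from the front of the worm to the back, so a slot freed
        # downstream in this cycle can be reused by the flit behind it.
        for i in range(num_switches - 2, -1, -1):
            if buffers[i] and len(buffers[i + 1]) < buffer_depth:
                buffers[i + 1].append(buffers[i].popleft())
        # The source injects one flit whenever the first buffer has room.
        if pending and len(buffers[0]) < buffer_depth:
            buffers[0].append(pending.popleft())
        max_span = max(max_span, sum(1 for b in buffers if b))
    return cycle, max_span

print(simulate_wormhole(num_flits=4, num_switches=3))  # (7, 3)
```

For a 4-flit packet over 3 switches with 1-flit buffers it returns `(7, 3)`: the tail is ejected N + H cycles after injection starts, even though no switch can hold the whole packet, and the packet is spread over all three switches at once. This is why a blocked header in wormhole routing holds every link along the worm.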
Supporting time-constrained traffic requires guaranteeing quality of service (QoS) in switch operation.
Contention-related delays are responsible for large fluctuations in performance metrics.
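One common way to favour time-constrained traffic inside a switch is a priority arbiter at each output port. The sketch below is an assumption, not a mechanism taken from the slides: two hypothetical traffic classes, `"GT"` (guaranteed, time-constrained) and `"BE"` (best effort), with strict priority for GT and round-robin within each class.

```python
from typing import Optional

class PriorityRoundRobinArbiter:
    """Hypothetical output-port arbiter with two traffic classes.

    "GT" (time-constrained) requests always beat "BE" (best-effort) requests;
    within a class, inputs are served round-robin.
    """

    CLASSES = ("GT", "BE")  # highest priority first

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        # Per class: the input index that gets first look at the next arbitration.
        self.next_start = {cls: 0 for cls in self.CLASSES}

    def grant(self, requests: dict[int, str]) -> Optional[int]:
        """requests maps input index -> traffic class; returns the winning input or None."""
        for cls in self.CLASSES:
            start = self.next_start[cls]
            for offset in range(self.num_inputs):
                i = (start + offset) % self.num_inputs
                if requests.get(i) == cls:
                    # Rotate priority so this input goes last next time.
                    self.next_start[cls] = (i + 1) % self.num_inputs
                    return i
        return None

arbiter = PriorityRoundRobinArbiter(num_inputs=4)
print(arbiter.grant({0: "BE", 2: "GT"}))  # 2: GT wins over BE
print(arbiter.grant({0: "BE", 1: "BE"}))  # 0
print(arbiter.grant({0: "BE", 1: "BE"}))  # 1: round-robin moves on
```

Strict priority bounds the contention delay seen by GT traffic, but it can starve BE traffic when the GT load is high, so practical designs usually also limit the GT injection rate.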
From Spaghetti Wires to NoC, Marcello Coppola, MPSoC '05
On-chip communication Infrastructure
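On-chip network architecture
● Router/scheduler algorithm development
● Network model design and verification using SystemC
● Design of core IP for star- and mesh-type on-chip networks
● Design of master/slave network interfaces and high-performance memory management interfaces
On-chip-network-based SoC design platform
● Development of a distributed crossbar switch topology generation and IP mapping tool
● Development of an IP-to-mesh-tile mapping tool
● Development of a network topology generation tool based on inter-IP data flow analysis; construction of an SoC platform
Application areas
- Supports QoS-guaranteeing protocols, making it suitable for real-time applications and for applications that require large data bandwidth
- System-level chips needed to implement multimedia SoCs, portable and communication terminals, Internet set-top boxes, game consoles, and network terminals
- SoC design for high-bandwidth multimedia applications such as high-frame-rate video and 3D graphics
- A platform-based design environment that integrates the on-chip network core IP and design support tools into a single platform, applied to a variety of SoC designs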
On-chip communication
Putting the blocks together poses tough questions:
• Do the hardware interfaces work with one another?
• Does the chip have enough bus and memory bandwidth under worst-case loads?
• Do software tasks communicate without deadlock?
• Do all applications and features of the full system meet functional goals?
• Does the system meet performance goals?
• Are the cost and power acceptable?
IBM CoreConnect
Started at 32 bits and extended the bus width (bandwidth) up to 128 bits
Sonics Smart Interconnect IP
SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
plug-and-play on-chip communications network
- Packet-based
- 50 employees in a year
- Provides IP and a design environment; supports SoC design
- Alliance with Cadence
- SiliconBackplane III targets communications + media
Arteris NoC layered architecture
OCN Configuration
A regular interconnect structure and static scheduling eliminate unnecessary interconnect switching
Balancing the computational load across all cores improves performance
Overhead of the configuration streams: configuration streams must be scheduled periodically along with the data, and they use 4% of the bandwidth
Depending on data content variation and the system operating environment, the core interfaces and the cores themselves are dynamically reconfigured into low-power modes
Scheduled Communication
Each tile is a computational core; the core interface enables the use of heterogeneous processing
Statically scheduled mesh of interconnect
Data moves between neighboring tiles through a communication pipeline. This allows a fast clock rate and time-multiplexing of interconnect resources.
The reconfigurability of the cores and the run-time interconnect enables dynamic power management.
Adaptive System on Chip
Communication Interface
- Stream data passing through a communication interface is scheduled for a specific communication clock cycle based on data link availability.
- The result of scheduling for each interface is a set of instructions for its associated interconnect memory.
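A minimal sketch of what scheduling streams for a specific clock cycle based on link availability, and producing per-interface instruction lists, could look like. It is not the compiler's actual algorithm; it assumes each stream has a fixed simple path, sends one word per schedule period, advances one hop per cycle, and that only link conflicts matter. The names (`schedule_streams`, `period`) are hypothetical.

```python
def schedule_streams(streams: dict[str, list[int]], period: int):
    """Greedy static schedule for periodic streams over fixed paths.

    streams maps a stream name to its path of node ids (source first).
    Returns (start cycle per stream,
             per-node instruction table of (cycle, stream, from, to)),
    where "core" stands for the local core attached to the node.
    """
    busy = set()          # (link, cycle mod period) pairs already reserved
    starts = {}
    instructions = {}
    for name, path in streams.items():
        links = list(zip(path, path[1:]))
        for t0 in range(period):
            # The word crosses the k-th link k cycles after leaving the source.
            slots = [(link, (t0 + k) % period) for k, link in enumerate(links)]
            if not any(slot in busy for slot in slots):
                break
        else:
            raise RuntimeError(f"no conflict-free start cycle for stream {name!r}")
        busy.update(slots)
        starts[name] = t0
        for j, node in enumerate(path):
            src = path[j - 1] if j > 0 else "core"
            dst = path[j + 1] if j + 1 < len(path) else "core"
            instructions.setdefault(node, []).append(((t0 + j) % period, name, src, dst))
    for table in instructions.values():
        table.sort(key=lambda entry: (entry[0], entry[1]))
    return starts, instructions

starts, tables = schedule_streams({"a": [0, 1, 2], "b": [0, 1, 3]}, period=4)
print(starts)     # {'a': 0, 'b': 1}
print(tables[1])  # [(1, 'a', 0, 2), (2, 'b', 0, 3)]
```

Both streams need link 0→1, so `b` is pushed to start one cycle after `a`. Node 1's table is then the sequence of routing instructions its interconnect memory would step through each period.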
9-core and 16-core Mode
Evaluation Methodology
Performance of the Benchmarks
iSOC Compiler
- Divides applications into parts, each of which fits into a specific core.
- Determines data communication between the cores in a space-time fashion.
- Generates interconnect memory contents for each individual interface.
References
Jian Liang, Sriram Swaminathan, and Russell Tessier, "aSOC: A Scalable, Single-Chip Communications Architecture," University of Massachusetts, Amherst, MA.
Krishna Sekar, Kanishka Lahiri, and Sujit Dey, "Configurable Platforms With Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips," Dept. of ECE, UC San Diego, La Jolla, CA, and NEC Laboratories America, Princeton, NJ.
Benchmarks (EE Times, 7/2005)
Xpipes (Bologna and Stanford): compared with an AMBA AHB multilayer bus, 21% faster, but with worse latency
Wehn, Univ. of Kaiserslautern: LDPC decoder: 500 MHz vs. 64 MHz (fixed bus), but 30 W vs. 700 mW and twice the die size
Arteris: better die size, comparable power consumption, 740 MHz (250 MHz)
SonicsMX: targets power-efficient mobile handsets, with power management
STNoC (Spidergon): topology with node degree 2-3
NoC Applications (http://www.eit.uni-kl.de/wehn)
• UMTS-compliant turbo decoder, 100 Mbit: large flexibility with 14 parallel units, area = 16.84 mm² (14 mm² PUs, 2.8 mm² NoC)
• LDPC decoding (T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin, Int. Conference on VLSI Design 2005)
– 1024-bit block size, 1.2 Gb/s, R = 0.75
– NoC: 5x5 2D mesh, dimension-order routing, large flexibility
– 160 nm CMOS technology, 1.8 V, 500 MHz, 110 mm², ~30 W
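The LDPC decoder above uses dimension-order routing on a 5x5 2D mesh. A minimal sketch of XY dimension-order routing, assuming (x, y) tile coordinates and hypothetical port names (with +y called NORTH): a packet first travels along X until the column matches, then along Y. Because packets never turn from Y back to X, XY routing is deterministic and deadlock-free on a mesh.

```python
def xy_output_port(current: tuple[int, int], dst: tuple[int, int]) -> str:
    """Per-switch decision of XY routing: fix X first, then Y."""
    (cx, cy), (dx, dy) = current, dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"   # assumption: +y is called NORTH
    if dy < cy:
        return "SOUTH"
    return "LOCAL"       # arrived: deliver to the attached core

STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}

def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Tiles visited from src to dst (both inclusive), applying the decision hop by hop."""
    path = [src]
    port = xy_output_port(src, dst)
    while port != "LOCAL":
        sx, sy = STEP[port]
        x, y = path[-1]
        path.append((x + sx, y + sy))
        port = xy_output_port(path[-1], dst)
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

On a 5x5 mesh the longest route, corner to corner, is 8 hops (9 tiles).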