Lecture 1 Introduction and Overview · mapl.nctu.edu.tw/course/esl_2008/files/lecture 1.pdf · no...

Lecture 1 Introduction and Overview. Multimedia Architecture and Processing Laboratory (多媒體架構與處理實驗室). Prof. Wen-Hsiao Peng (彭文孝) [email protected] 2007 Spring Term


  • 1

    Lecture 1 Introduction and Overview

    Multimedia Architecture and Processing Laboratory (多媒體架構與處理實驗室)

    Prof. Wen-Hsiao Peng (彭文孝) [email protected]

    2007 Spring Term

  • 2

    Acknowledgements

    This lecture note was partly contributed by Prof. Gwo Giun Lee (李國君) of the Dept. of EE, National Cheng-Kung University, and his team members 王明俊 and 林和源 of the Multimedia System-on-Chip Laboratory (多媒體系統晶片實驗室)

    E-mail: [email protected] Tel: +886-6-275-7575 ext. 62448 Web: http://140.116.216.53

  • 3

    References

    Frank Ghenassia (ed.), “Transaction-Level Modeling with SystemC: TLM Concepts and Applications for Embedded Systems”, Springer, 2005. (ISBN: 0-387-26232-6)

    David C. Black and Jack Donovan, “SystemC: From the Ground Up”, Kluwer Academic, 2004. (ISBN: 1-4020-7988-5)

  • 4

    Silicon Evolution

    Gordon Moore (Intel) predicted that the number of transistors on an integrated circuit would double roughly every two years, with corresponding gains in SPEED and CAPACITY

    [Figure: growth in chip density. Labels: 100 transistors per chip; 200K transistors per chip; 200~1000M transistors per chip]
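The doubling rule on this slide can be turned into a small calculation. The sketch below is only an illustration of that arithmetic, assuming an idealized, perfectly regular doubling every two years; the function names are hypothetical and do not come from the lecture.

```cpp
#include <cassert>
#include <cmath>

// Projected transistor count after `years`, assuming the count doubles
// once every `doubling_period_years` (two years on the slide).
double projected_transistors(double initial_count, double years,
                             double doubling_period_years = 2.0) {
    assert(doubling_period_years > 0.0);
    return initial_count * std::pow(2.0, years / doubling_period_years);
}

// Years needed to grow from `from_count` to `to_count` under the same rule.
double years_to_reach(double from_count, double to_count,
                      double doubling_period_years = 2.0) {
    assert(from_count > 0.0 && to_count > 0.0);
    return doubling_period_years * std::log2(to_count / from_count);
}
```

Under this idealized rule, going from 100 to 200K transistors per chip takes about 11 doublings, so `years_to_reach(100.0, 200000.0)` comes out at roughly 22 years.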

  • 5

    System-on-Chip Era

    System-on-Chip (SoC): Conceiving and integrating distinct electronic components on a single chip to form an entire electronic system

    Fundamental Building Blocks of SoC

    Intellectual Property (IP) Cores: Reusable hardware blocks designed to perform particular tasks; each is either a programmable processor or a hardware entity with fixed behavior

    Embedded Software/Firmware: Software/firmware executed on the programmable processor

    Communication Structures: IP cores are interconnected by dedicated wires, shared buses, or a network-on-chip (NoC)

  • 6

    Design Gap between Foundry and IC Designer

    The productivity of IC designers falls behind that of the foundry

  • 7

    SoC Challenges

    Multiple Capabilities and Explosive Complexity: More functions are incorporated into a system, yet not even the slightest error can be tolerated

    Multifaceted-team Cooperation: A rigorous methodology must be implemented to address reliability, and the reliability reinforcement must span the entire design flow

    Time-to-Market Pressure: The market does not allow superfluous time loss in product development

    Sky-rocketing Cost: SoC design requires a larger workforce and much costlier masks, so re-spins due to errors in functionality or performance cannot be tolerated

  • 8

    VLSI Design Flow

    [Figure: flowchart of the classical VLSI design flow]

    Stages: Requirement Definition; Specification Development; Specification Model; System Architecture Model Development; HW Development (Hardware RTL Development, Synthesis, Placement and Route, Emulator / FPGA Prototype / Test Chip) alongside SW Development (Embedded Software Development); Chip Fabrication; System Integration & Validation

    Annotations: System Functionalities; System Components & Interfaces among Components; Independent SW/HW Development; System Prototype after HW Development; System Integration & Validation after SW/HW Development

  • 9

    Problems of Classical VLSI Design Flow

    System architecture is NOT justified: Architecture study using spreadsheets or point simulations can lead to an over-dimensioned design because of the margins added for uncertainties

    No communication between SW and HW teams: Separate teams work on incoherent models

    SW is validated later than HW: Testing SW requires mapping the RTL code onto an emulator or prototype, which may not precisely reflect the target design

    Co-verification with RTL and SW is time-consuming: Simulation runs at only a few hundred bus cycles per second. Ex: It takes ~60 hours to simulate the decoding of 1 second of video coded with H.264/AVC at 1920x1080@30Hz

    System integration is done after SW/HW development: Any error found in SW or HW requires time-consuming regression
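To get a feel for the co-verification figures above, the arithmetic can be written out. This is a rough sketch only: the slide says "a few hundred" bus cycles per second, so the rate of 300 used in the usage note is an assumed stand-in, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bus cycles covered by a simulation that runs for `hours` of wall-clock
// time at `cycles_per_second` simulated bus cycles per second.
std::int64_t simulated_cycles(std::int64_t hours, std::int64_t cycles_per_second) {
    assert(hours >= 0 && cycles_per_second >= 0);
    return hours * 3600 * cycles_per_second;
}

// Wall-clock hours needed to simulate `cycles` bus cycles at the given rate.
double wall_clock_hours(std::int64_t cycles, double cycles_per_second) {
    assert(cycles_per_second > 0.0);
    return static_cast<double>(cycles) / cycles_per_second / 3600.0;
}
```

With the assumed rate of 300 cycles per second, `simulated_cycles(60, 300)` gives 64,800,000 bus cycles for the 60-hour run. A simulator ten times faster would cover the same cycles in about 6 hours, which is why simulation speed dominates the verification schedule.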

  • 10

    Needs for a Novel SoC Design Flow

    An expanded space that links all the different phases of the design through a centralized methodology

    A fast yet accurate system simulation to explore the design space and to perform architecture analysis and functional verification well

    An efficient verification process for attesting SoC functional behavior and performance resulting from integration of IPs

    A more flexible and efficient design flow to optimize the time management of SoC projects

  • 11

    Solution: Cycle-Accurate C/C++ Modeling

    Extensively used in the late 1990s for faster simulation than RTL

    Pros

    No synthesis-related constraints

    Simulation is at least one order of magnitude faster than RTL: a few kHz, compared with several hundred Hz for RTL

    Cons

    Changing the C model takes almost as long as changing the RTL

    The modeling effort was close to that of creating a synthesizable RTL model

    Tight schedules made it impossible to keep the C model up to date, so the C model would not be usable for the next-generation design

    A cycle-accurate model still captures many design details that may not be necessary

  • 12

    Alternative: Electronic System Level Design

    Electronic System Level (ESL) Design

    Raise the design abstraction above the register transfer level (RTL)

    Create a transaction level model (TLM) as a unique reference throughout the design process

    Requirements of the TLM

    Allow quicker modeling

    Easier to implement and less effort than creating RTL

    Separate communication from computation within a system

    Allow communication and computation to be modeled and refined independently

    Be precise and yet fast enough

    Enable design space exploration

    Enable system architecture analysis and functional verification

    Enable earlier SW development and testing

    Enable concurrent SW and HW developments

  • 13

    Level of Abstraction

    The level of detail at which a system is viewed

    The higher the level, the less detail

    The lower the level, the more detail

    Levels of abstraction in the classical VLSI design flow

    Algorithm Level: Algorithms of the given tasks for realizing the system

    Register Transfer Level: Data transfer from one register to another at the cycle boundary

    Gate Level: The composition of logic gates for realizing the Boolean operations

    Physical Level: The physical placement and routing of the transistors

  • 14

    Transaction Level Model

    TLM is a transaction-based model for coping with system-level design

    A transaction is defined as a data transfer or synchronization between two modules at an instant or event

    TLM reflects the system architecture

    It captures the details between the algorithm and register transfer levels

    It is concerned with the internal interfaces and data flow among the components

    The TLM serves as the unique reference across different teams

    Software team: The TLM serves as the reference for earlier software development

    Hardware team: The TLM serves as the reference for coarse-grain architecture analysis

    Verification team: The TLM serves as the golden model for generating functional verification tests that will be applied to the RTL platform once it becomes available

  • 15

    Triple Abstraction

    Model at Algorithm Level – Functional View

    Executable specification of the system function

    No implementation details

    Model at Transaction Level – Architecture View

    Capture all the necessary information to develop software

    Serve system architects as a means for design space exploration

    Provide a golden model for functional verification

    Model at Register Transfer Level – Micro-architecture View

    Capture all the information for timed and cycle-accurate simulations

    Validate low-level embedded software in real hardware simulation environment

    Validate system micro-architecture

  • 16

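    Design Abstraction Above RTL

    Cycle Accurate Model vs. Transaction Level Model

    [Figure: RTL view showing Clock, Addr and Data waveforms, with addresses A, A+4, A+8, A+12, A+16, A+36 appearing cycle by cycle, compared with the TLM view, where the transfer is a single transaction executed by a memory copy and annotated with bus_mutex.lock(); and "wait 10 cycles"]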
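The contrast in this figure can be sketched in plain C++. The code below is an illustrative assumption rather than the lecture's model: the `Memory` struct, both function names, and the way latency is accounted are hypothetical, and a `std::mutex` merely stands in for bus arbitration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical memory model shared by both modeling styles.
struct Memory {
    std::vector<std::uint32_t> words;
    std::mutex bus_mutex;      // stands in for bus arbitration
    std::uint64_t cycles = 0;  // simulated bus cycles elapsed
};

// Cycle-accurate style: one word moves per simulated clock, so the
// model is evaluated once for every cycle of the burst.
void copy_cycle_accurate(Memory& mem, std::size_t src, std::size_t dst,
                         std::size_t n) {
    assert(src + n <= mem.words.size() && dst + n <= mem.words.size());
    for (std::size_t i = 0; i < n; ++i) {
        mem.words[dst + i] = mem.words[src + i];
        ++mem.cycles;  // advance the clock by one cycle per word
    }
}

// Transaction-level style: the whole burst is one transaction, carried
// out by a memory copy while the bus is locked; its duration is added
// in a single step (the "wait N cycles" of the figure).
void copy_transaction(Memory& mem, std::size_t src, std::size_t dst,
                      std::size_t n, std::uint64_t latency_cycles) {
    assert(src + n <= mem.words.size() && dst + n <= mem.words.size());
    std::lock_guard<std::mutex> lock(mem.bus_mutex);
    std::memmove(mem.words.data() + dst, mem.words.data() + src,
                 n * sizeof(std::uint32_t));
    mem.cycles += latency_cycles;
}
```

Both functions leave the same data in memory for non-overlapping ranges; the transaction-level version simply does far less simulation work per transfer, which is where the speed advantage of a TLM comes from.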

  • 17

    Evolution of Design Flows

    [Figure: evolution of design flows. Surviving label: Test & crash]

  • 18

    Detailed Views

    [Figure: two design flows shown side by side]

    Good Old Way Today: Requirement Definition; Specification Development; Specification Model; System Architecture Model Development; HW Development (Hardware RTL Development, Synthesis, Placement and Route, Emulator / FPGA Prototype / Test Chip) alongside SW Development (Embedded Software Development); Chip Fabrication; System Integration & Validation

    Contemporary Way: Requirement Definition; Specification Development; Specification Model; System Architecture Model Development; Transaction Level Model (TLM) Development for Both SW/HW; HW Development (Hardware RTL Development, Synthesis, Placement and Route) alongside SW Development (Embedded Software Development); SW/HW Integration and Co-verification Based on TLM; Test Chip; Chip Fabrication

    Annotations: Specification; Design Space Exploration; Model Refinement; Unique Reference for SW, HW and Verification Teams

  • 19

    Benefits of TLM in Project Schedule

    [Figure: project schedules compared over a design cycle running from T0 to T15. Tasks: Architecture Design; SW Development; SW Verification; HW Design; HW Functional Verification; HW Implementation; System Verification]

    Annotations: Design cycle increases due to TLM; Earlier software development; Early system verification and integration due to earlier software development; Shorter design cycle; Long design cycle or project may fail

  • 20

    SystemC

    A system description language designed for transaction level modeling and behavioral modeling

    A set of libraries implemented in C++, plus a non-preemptive, event-driven simulator capable of simulating concurrent processes
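What "non-preemptive, event-driven" means can be illustrated with a toy scheduler in plain C++. This is a minimal sketch under stated assumptions and not SystemC's actual kernel: the `EventScheduler` class and its members are hypothetical names, and concurrent processes are reduced to callbacks posted on a time-ordered queue.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

class EventScheduler {
public:
    using Action = std::function<void()>;

    // Schedule `action` to run `delay` time units after the current time.
    void schedule(std::uint64_t delay, Action action) {
        assert(action);  // an empty callback would throw when invoked
        queue_.push(Event{now_ + delay, next_seq_++, std::move(action)});
    }

    // Run events in time order until none remain. Each action runs to
    // completion before the next one starts (non-preemptive).
    void run() {
        while (!queue_.empty()) {
            Event ev = queue_.top();
            queue_.pop();
            now_ = ev.time;  // simulated time jumps to the next event
            ev.action();     // may call schedule() to post further events
        }
    }

    std::uint64_t now() const { return now_; }

private:
    struct Event {
        std::uint64_t time;
        std::uint64_t seq;  // tie-breaker: earlier-scheduled events first
        Action action;
    };
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            return a.time != b.time ? a.time > b.time : a.seq > b.seq;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    std::uint64_t now_ = 0;
    std::uint64_t next_seq_ = 0;
};
```

Simulated time advances only when an event is taken from the queue, so idle periods cost nothing to simulate. Because no action is ever interrupted, the callbacks need no locking against each other.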

  • 21

    History of SystemC

    1999/09/27: Open SystemC™ Initiative announced

    2000/03/01: SystemC 0.9. RTL constructs + channel concepts for more abstract modeling

    2000/03/28: SystemC 1.0. Targeted entirely at RTL

    2003/06/03: SystemC 2.0.1 LRM (language reference manual). Supports modeling driven by system specification events

    2005/06/06: SystemC 2.1 LRM and TLM 1.0. Supports the transaction-level modeling API standard

    2005/12/12: IEEE 1666™-2005 standard for SystemC

  • 22

    Language Comparison

    [Figure: chart of the abstraction levels covered by each language. Levels: Algorithm; Architecture; HW/SW; Behavior; Functional Verification; RTL; Gates; Transistors. Languages: Verilog; VHDL; SystemVerilog; SystemC; C/C++; Matlab]

  • 23

    Transaction Level Modeling with SystemC

    Modules and Processes: System components are modeled as modules with a set of concurrent processes that represent the behavior

    Interfaces: Modules communicate in the form of transactions by accessing the interfaces through the module ports

    Channels: TLM interfaces are implemented within channels to encapsulate the communication protocol
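The module, interface and channel roles described above can be mimicked in plain C++ without the SystemC library. The sketch below is an analogy under stated assumptions, not SystemC code: `BusInterface`, `SimpleBusChannel` and `ProducerModule` are hypothetical names, a reference member plays the part of a port, and an ordinary member function plays the part of a process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Interface: the set of transactions a module may request.
class BusInterface {
public:
    virtual ~BusInterface() = default;
    virtual void write(std::size_t addr, std::uint32_t data) = 0;
    virtual std::uint32_t read(std::size_t addr) = 0;
};

// Channel: implements the interface and hides the protocol
// (here just a lock around a small memory array).
class SimpleBusChannel : public BusInterface {
public:
    explicit SimpleBusChannel(std::size_t size) : mem_(size, 0) {
        assert(size > 0);
    }
    void write(std::size_t addr, std::uint32_t data) override {
        std::lock_guard<std::mutex> lock(mutex_);
        mem_.at(addr) = data;  // at() throws on an out-of-range address
    }
    std::uint32_t read(std::size_t addr) override {
        std::lock_guard<std::mutex> lock(mutex_);
        return mem_.at(addr);
    }

private:
    std::vector<std::uint32_t> mem_;
    std::mutex mutex_;
};

// Module: owns a port (a reference to an interface) and a process
// that describes its behavior in terms of transactions.
class ProducerModule {
public:
    explicit ProducerModule(BusInterface& port) : port_(port) {}

    // Process: writes `count` squares to consecutive addresses from `base`.
    void process(std::size_t base, std::uint32_t count) {
        for (std::uint32_t i = 0; i < count; ++i) {
            port_.write(base + i, i * i);
        }
    }

private:
    BusInterface& port_;
};
```

Because the module only sees `BusInterface`, the channel behind the port can later be swapped for a more detailed bus model without touching the module, which is the separation of communication from computation that TLM aims for.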

  • 24

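    A Glance at SystemC Implementation

    ModuleA.h

    class moduleA : public sc_module {
    public:
        sc_in clk_p;          // clock input port
        sc_port bus_port;     // port through which the bus interface is accessed

        void A_process();     // process describing the module behavior

        SC_CTOR(moduleA) {
            SC_THREAD(A_process);   // register A_process as a thread process
            sensitive ...
        }
    };

    ... channel_write(&data); cout ...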

  • 25

    Design Practice: System-on-Chip Platform

    Group IP cores and communication structures on an SoC platform to create an application-specific SoC template.

    Platform-based design provides users with ample room for product differentiation at reduced design time and effort

    [Figure: block diagram of an H.264/AVC decoder SoC platform]

    Numbered parts: (1) ARM 9 CPU; (2) 32-bit AMBA AHB Control Bus; (3) 64-bit AMBA AHB Data Bus; (4) Video pipe; (5) Memory Sub-system

    Block labels: ARM926EJS; Instruction Memory; Data Memory; External Memory Interface; Mobile DDR SDRAM (256 Mega bits); Address Translator; Hardware Input Interface; HDMI Interface; Synchronization Buffer; H.264/AVC Decoder; Bitstream Synchronization Buffer; Bitstream FIFO; NAL Parsing; CABAC; IQ/IDCT; MB Texture Buffer; MB Motion Buffer; Mode Buffer; Motion Buffer; Data Fetch; Intra/Inter Prediction; Subblock Reconstruct Buffer; DeBlocking; IIP FIFO; DB FIFO; Data; Control

  • 26

    Performance Analysis with ConvergenSC

    Import of SystemC and HDL Blocks

    Block Diagram Editor

    IP Reuse based on XML meta-data

    HW/SW Partitioning and Interface Synthesis

    Export of SystemC and HDL Blocks (Bus Interface)

    Scripting Interface

  • 27

    Performance Analysis with ConvergenSC

    Bus Utilization

    Target Count

    Initiator Count

  • 28

    Design Space Exploration

    Which one is the best?

    Configuration 1

    Configuration 2

    Configuration 3

  • 29

    Design Space Exploration

    [Figure: exploring instances or realizations at each abstraction level. Axes: Abstraction (High, Low) and Cost of Change (High, Low). Levels, from top to bottom: Algorithm Level; Transaction Level; Register Transfer Level; Gate Level; Physical Design]