National Chiao Tung University - Lecture 2: Transaction Level Modeling


  • 1

    Lecture 2: Transaction Level Modeling

    Multimedia Architecture and Processing Laboratory (多媒體架構與處理實驗室)

    Prof. Wen-Hsiao Peng (彭文孝)

    2007 Spring Term

  • 2

    Acknowledgements

    This lecture note was contributed in part by Prof. Gwo Giun Lee (李國君) of the Dept. of EE, National Cheng-Kung University, and his team members 王明俊 and 林和源 of the Multimedia SoC research laboratory (多媒體系統晶片實驗室)

    Phone: +886-6-275-7575 ext. 62448; Web: http://140.116.216.53

  • 3

    References

    Frank Ghenassia (ed.), “Transaction-Level Modeling with SystemC: TLM Concepts and Applications for Embedded Systems”, Springer, 2005 (ISBN: 0-387-26232-6)

    L. Cai and D. Gajski, “Transaction Level Modeling in System Level Design”, CECS Technical Report, 2003, http://www.cecs.uci.edu/technical_report/TR03-10.pdf

  • 4

    Outline

    Raising Abstraction Level: a game of balancing the trade-off between speed and accuracy

    Concepts of Transaction Level Modeling (TLM): TLM at different abstraction levels

    Work with TLM for SoC Development: SoC-based TLM design flow

    TLM Modeling Approaches: timed TLM and untimed TLM

  • 5

    Raising Abstraction Level

  • 6

    Call for Raising Abstraction Level

    Rationales:

    Ever-increasing system complexity, cost, and time-to-market pressure

    Designers cannot cope using the classical VLSI design flow alone

    Motivation:

    To improve productivity through a reliable design methodology within a short time frame

    Purposes:

    Perform system architecture exploration

    Enable earlier SW development and system integration

    Allow HW/SW co-design founded on a unique reference

    Increase simulation speed with little or no loss of accuracy

  • 7

    Raising Abstraction Level

    A game of balancing the trade-off between speed and accuracy

    The two extreme ends:

    Algorithmic level – functional model

    Captures only the algorithm, regardless of implementation details

    No notion of HW or SW components

    Models neither registers nor system synchronization

    Cannot execute embedded SW

    Register Transfer Level (RTL) – pure logic simulation

    Accurate to the real implementation

    Long development phase and lengthy simulation

    Cannot execute embedded SW in a reasonable amount of time

    Embedded SW can only be tested rather late in the design flow

    Any system modification is too costly at this stage

  • 8

    Raising Abstraction Level (c. 1)

    In-between solutions must satisfy the following criteria

    Speed:

    It is not acceptable to wait even a single day for a simulation to complete

    Must simulate millions of cycles within a reasonable time

    Accuracy:

    Sustain enough accuracy to deliver a reliable simulation

    Be detailed enough to run the embedded SW

    Lightweight modeling:

    The effort spent in addition to RTL modeling must be kept small

    Be a quick-to-develop model at considerably low effort

  • 9

    Concepts of Transaction Level Modeling

  • 10

    Transaction Level Modeling (TLM)

    A transaction-based model for coping with system-level design

    Resides between the algorithmic level and the bit-true, cycle-accurate level

    Targets SW development, architecture analysis, functional verification, and, by adding timing annotations, performance analysis

    A system design concept

    Language independent

    Highlights the separation of computation from communication

    Details of computation and communication are refined independently

    Hides unnecessary details of communication and computation

    Trade-off between accuracy and speed

  • 11

    Efficiency of Modeling Settings

    Comparison of RTL, cycle-accurate, and TLM models in terms of simulation speed and modeling effort

    Fine-grained simulation comes at the expense of slower speed and later availability

    Early architecture exploration and SW development are possible at a lightweight development effort

  • 12

    Efficiency of Modeling Settings (c. 2)

    [Figure: an example System containing Module A, Module B, and Module C; the labels Process A-1, Process A-2, Process A-3, CLK, Input, and Output also appear]

  • 13

    RTL Simulation

    [Figure: Threads A-1, A-2, A-3, B-1, B-2, C-1, and C-2 on a timeline marked in steps of 1 cycle over 600 cycles]

  • 14

    Cycle Accurate Simulation

    [Figure: Threads A, B, and C on a timeline marked in steps of 1 cycle over 600 cycles; labels Process [A1, A2, A3] and Process [C1, C2]]

  • 15

    TLM Simulation

    [Figure: Threads A, B, and C on a timeline where 1 event covers 600 cycles; labels Process [A1, A2, A3] x600 and Process [C1, C2] x600]
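The three simulation figures can be compared by counting scheduler activations. The following is only a rough sketch: it assumes every thread is activated once per synchronization point, which the slides do not state explicitly. The thread counts (7, 3, 3) and the 600-cycle window are read from the figures; the function name `activations` is illustrative.

```python
def activations(threads: int, sync_points: int) -> int:
    """Scheduler activations if every thread wakes at every synchronization point."""
    return threads * sync_points


CYCLES = 600

# RTL: threads A-1, A-2, A-3, B-1, B-2, C-1, C-2, each woken every cycle
rtl = activations(threads=7, sync_points=CYCLES)

# Cycle-accurate: one thread per module, still woken every cycle
cycle_accurate = activations(threads=3, sync_points=CYCLES)

# TLM: one thread per module, woken once because 1 event spans 600 cycles
tlm = activations(threads=3, sync_points=1)

print(rtl, cycle_accurate, tlm)
```

Under these assumptions the gain comes from two sources: fewer threads (processes merged per module) and fewer synchronization points (events instead of clock edges).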

  • 16

    Generic SoC TLM Platform

    [Figure: platform blocks labeled CPU, DSP coprocessor, Arbiter, Program Memory, Data Memory, Display I/F, I/O, Hard Disk, and an RTL ASIC with an Adapter]

    TLM modules: model an IP's function, behavior, or architecture

    RTL modules: model an IP's micro-architecture

    Abstract bus: implements the interface functions for data transactions

    Module ports: provide the interface functions used to access the channel for abstract data transactions

  • 17

    SoC TLM Platform (c. 1)

    Module: each system component is modeled as a module

    Behaviors are described by a set of concurrent processes and threads

    Channel: data exchange between modules is established through a channel

    The channel is the realization of the interface functions

    Port: modules and channels are bound by means of ports

    Interface: transactions are requested through the interface functions of the module port

    The interface is the very part that separates computation from communication
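The four roles can be sketched in plain Python. This is not SystemC and not the lecture's code; the class names (`TransferIf`, `FifoChannel`, `Port`, `Producer`, `Consumer`) are made up for illustration, and the processes are run sequentially rather than concurrently.

```python
from abc import ABC, abstractmethod
from collections import deque


class TransferIf(ABC):
    """Interface: the transaction functions a module may call through its port."""

    @abstractmethod
    def write(self, data): ...

    @abstractmethod
    def read(self): ...


class FifoChannel(TransferIf):
    """Channel: the realization of the interface functions."""

    def __init__(self):
        self._queue = deque()

    def write(self, data):
        self._queue.append(data)

    def read(self):
        return self._queue.popleft()


class Port:
    """Port: binds a module to a channel; the module only sees the interface."""

    def __init__(self):
        self._channel = None

    def bind(self, channel: TransferIf):
        self._channel = channel

    def write(self, data):
        self._channel.write(data)

    def read(self):
        return self._channel.read()


class Producer:
    """Module: computes squares and sends them out as transactions."""

    def __init__(self):
        self.out = Port()

    def run(self):
        for value in (1, 2, 3):
            self.out.write(value * value)


class Consumer:
    """Module: collects whatever arrives on its input port."""

    def __init__(self):
        self.inp = Port()
        self.received = []

    def run(self, count):
        for _ in range(count):
            self.received.append(self.inp.read())


channel = FifoChannel()
producer, consumer = Producer(), Consumer()
producer.out.bind(channel)
consumer.inp.bind(channel)
producer.run()
consumer.run(3)
print(consumer.received)
```

Because the modules call only `write` and `read`, the FIFO could be swapped for a more detailed channel without touching the computation code.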

  • 18

    SoC TLM Platform (c. 2)

    Embedded SW is simulated with native or cross compilation

    Native compilation: SW behavior is modeled as a module and executed by the workstation for fast simulation

    Cross compilation: SW is compiled for the target processor architecture and executed by the associated instruction set simulator (ISS) for higher accuracy

  • 19

    System Synchronization

    Definition:

    A mechanism to inform others, or to get informed, about system state changes when these changes influence the execution of other parts of the system

    Synchronization points are re-scheduling points that guarantee the simulation of concurrency

    An important use of system synchronization is to ensure memory or data consistency

    Prevent concurrent processes from reading data in an unknown state

    Prevent concurrent processes from writing data to a temporarily inaccessible area

    Deciding where and when to implement system synchronization:

    Too many synchronization points bring the model too close to RTL

    Too few synchronization points may make the system execution inaccurate

  • 20

    Granularity of System Synchronization

    The granularity of system synchronization in RTL and TLM

    S1 and S2 are two system states and synchronization points

    F_RTL: clocked processes representing the system micro-architecture; a collection of all cycle-accurate computations that bring S1 to S2

    Context switches at the cycle boundary

    More parallel execution of processes

    F_TLM: sequential execution of program code between S1 and S2; an equivalent function that brings S1 to S2 without any clocks

    Context switches at the event boundary

    Less parallel execution of processes
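A toy illustration of the two functions follows. The accumulate-600-samples workload and the names `f_rtl` and `f_tlm` are assumptions chosen only to make the equivalence concrete. Both functions take the state from S1 to the same S2, but they differ in how many re-scheduling points they expose.

```python
def f_rtl(state, samples):
    """Clocked style: one sample per cycle, with a re-scheduling point every cycle."""
    switches = 0
    for sample in samples:      # one iteration models one clock cycle
        state = state + sample
        switches += 1           # context switch at the cycle boundary
    return state, switches


def f_tlm(state, samples):
    """Transaction style: sequential code, one re-scheduling point at S2."""
    return state + sum(samples), 1


s1 = 10
samples = list(range(600))
s2_rtl, rtl_switches = f_rtl(s1, samples)
s2_tlm, tlm_switches = f_tlm(s1, samples)
print(s2_rtl == s2_tlm, rtl_switches, tlm_switches)
```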

  • 21

    Modeling Accuracy

    The precision or correctness of the model in replicating the behavior and activities of a system-under-design

    Two decisive factors:

    Granularity of communication: the fineness of the data carried by the communication structure

    Example: the transfers of a video IP with a frame-based algorithm

    Application packet: frame-by-frame data transfer

    Bus packet: line- or column-based transfer of the video frame

    Bus size: pixel-based transfer of the video frame

    Timing accuracy: the fidelity to the intended timing behavior

    Two extreme ends: untimed and cycle-accurate

    In between: approximate-timed
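To make the three communication granularities concrete, the sketch below counts the transactions needed to move one video frame. The 720x480 frame size, the choice of lines rather than columns for the bus packet, and the assumption that one bus word carries exactly one pixel are all illustrative, not from the lecture.

```python
def transfers_per_frame(width: int, height: int, granularity: str) -> int:
    """Number of transactions needed to move one frame at a given granularity."""
    if granularity == "application_packet":   # the whole frame in one transaction
        return 1
    if granularity == "bus_packet":           # one transaction per line
        return height
    if granularity == "bus_size":             # one transaction per pixel
        return width * height
    raise ValueError(f"unknown granularity: {granularity}")


WIDTH, HEIGHT = 720, 480   # assumed frame size
counts = {
    name: transfers_per_frame(WIDTH, HEIGHT, name)
    for name in ("application_packet", "bus_packet", "bus_size")
}
print(counts)
```

Finer granularity gives the bus model more events to arbitrate and time, at the cost of many more transactions to simulate.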

  • 22

    Modeling Accuracy (c. 1)

    A glimpse at the modeling accuracy

    [Figure: chart with axes labeled Communication Granularity and Timing Accuracy]

  • 23

    Different Abstraction Models

    Several abstraction levels can be defined by considering the different timing accuracy in computation and communication

    [Figure: models A to F placed on a grid whose Computation and Communication axes are each marked un-timed, approximate-timed, and cycle-timed]

    A: Specification model

    B: Component-assembly model

    C: Bus-arbitration model

    D: Bus-functional model

    E: Cycle-accurate computation model

    F: RTL model

    B, C, D, and E are TLMs

  • 24

    Specification Model – Functional View

    The system functionality without implementation details

    Data transfer is modeled as variable access, without any notion of a channel

  • 25

    Component-Assembly Model – Architectural View

    Allocation of concurrent processing elements (PEs) and mapping of processes

    Data transfer is achieved by message-passing channels

    Message-passing channels: an abstract implementation of communication focusing on data transactions

    No cycle-accurate or pin-accurate details, and no specific bus protocol

  • 26

    Bus Arbitration Model – Architectural View

    Channels between PEs are realized by an abstract bus

    Requires design decisions in both computation and communication

    An abstract bus:

    Data transfers are still implemented by message passing

    The bus protocol is simplified to blocking and non-blocking I/O

    An arbiter is required to resolve bus conflicts

    No cycle-accurate or pin-accurate details
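A minimal sketch of an abstract bus with an arbiter is shown below. The fixed-priority policy, the class name `AbstractBus`, and its method names are assumptions for illustration; the slides do not prescribe a particular arbitration policy or API.

```python
class AbstractBus:
    """Abstract bus: message passing plus an arbiter, with no pins and no cycles."""

    def __init__(self, priority):
        self.priority = priority   # master names, highest priority first
        self.memory = {}           # address -> data
        self.pending = []          # queued (master, address, data) write requests

    def request_write(self, master, address, data):
        """Non-blocking I/O: queue the request and return immediately."""
        self.pending.append((master, address, data))

    def arbitrate(self):
        """Resolve conflicts by serving pending requests in priority order."""
        granted = sorted(self.pending, key=lambda req: self.priority.index(req[0]))
        self.pending = []
        for _, address, data in granted:
            self.memory[address] = data
        return [master for master, _, _ in granted]


bus = AbstractBus(priority=["CPU", "DSP"])
bus.request_write("DSP", 0x20, 1)   # the DSP asks first ...
bus.request_write("CPU", 0x10, 2)   # ... but the CPU has higher priority
grant_order = bus.arbitrate()
print(grant_order, bus.memory)
```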

  • 27

    Bus Functional Model – Micro-architectural View

    The abstract bus channel is replaced by (inlined with) a cycle-/pin-accurate protocol channel

    Wires of the bus are instantiated as variables/signals

    Data transfer follows the time-/cycle-accurate sequence

    The protocol channel provides interface functions for all abstract bus transactions

    Wrappers convert data transfers from the higher level of abstraction (PEs) to the lower level of abstraction (protocol channel)
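The wrapper idea can be sketched as a function that expands one abstract write transaction into a per-cycle signal trace. The signal names (`req`, `addr`, `valid`, `data`) and the three-phase protocol (address, data beats, idle) are hypothetical; a real wrapper would follow the chosen bus specification.

```python
def write_transaction_to_cycles(address, words):
    """Expand one abstract burst write into per-cycle signal values."""
    # Cycle 0: address phase
    cycles = [{"req": 1, "addr": address, "valid": 0, "data": None}]
    # One data beat per cycle
    for word in words:
        cycles.append({"req": 0, "addr": None, "valid": 1, "data": word})
    # Final cycle: the bus returns to idle
    cycles.append({"req": 0, "addr": None, "valid": 0, "data": None})
    return cycles


trace = write_transaction_to_cycles(0x100, [0xA, 0xB])
data_beats = [cycle["data"] for cycle in trace if cycle["valid"]]
print(len(trace), data_beats)
```

The PE still issues a single `write`-style call; only the wrapper knows how many cycles the protocol takes.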

  • 28

    Cycle-Accurate Computation – Micro-architectural View

    The PEs are cycle- and pin-accurate

    Dedicated hardware IPs are modeled at register transfer level

    Programmable processors are modeled by an instruction set simulator

    Wrappers convert data transfers from the higher level of abstraction (abstract bus) to the lower level of abstraction (PEs)

  • 29

    RTL Model – Pure Micro-architecture View

    Both computation and communication are pin- and cycle-accurate

    Programmable processors: modeled with an instruction set simulator

    Dedicated hardware: modeled by a pin- and cycle-accurate RTL model

    Interconnect structure: modeled by a pin- and cycle-accurate RTL model

  • 30

    Timing Accuracy of Transaction Level Models

    Model                        Communication       Functionality
    Specification                Un-timed            Un-timed
    Component-assembly           Un-timed            Approximate-timed
    Bus Arbitration              Approximate-timed   Approximate-timed
    Bus Functional               Cycle-timed         Approximate-timed
    Cycle-Accurate Computation   Approximate-timed   Cycle-timed
    RTL                          Cycle-timed         Cycle-timed
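The table can be written down as data. The sketch below also encodes one observation: the models the lecture calls TLMs (B, C, D, E) are exactly those that are neither fully un-timed nor fully cycle-timed. That rule is an observation about this table, not a definition given in the slides, and the helper name `is_tlm` is illustrative.

```python
# (functionality timing, communication timing) for each model, as in the table
TIMING = {
    "Specification":             ("un-timed", "un-timed"),
    "Component-assembly":        ("approximate-timed", "un-timed"),
    "Bus arbitration":           ("approximate-timed", "approximate-timed"),
    "Bus functional":            ("approximate-timed", "cycle-timed"),
    "Cycle-accurate computation": ("cycle-timed", "approximate-timed"),
    "RTL":                       ("cycle-timed", "cycle-timed"),
}


def is_tlm(functionality: str, communication: str) -> bool:
    """True unless both axes sit at the same extreme (all un-timed or all cycle-timed)."""
    extremes = {("un-timed", "un-timed"), ("cycle-timed", "cycle-timed")}
    return (functionality, communication) not in extremes


tlms = [name for name, timing in TIMING.items() if is_tlm(*timing)]
print(tlms)
```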

  • 31

    Work with TLM for SoC Development

  • 32

    TLM-based SoC Design

    [Figure: design flow with the stages Requirement Definition, Specification Development (Specification Model), System Architecture Model Development, and Transaction Level Model (TLM) Development for Both SW/HW; an HW Development branch (Hardware RTL Development, Synthesis, Placement and Route) and an SW Development branch (Embedded Software Development); SW/HW Integration and Co-verification Based on TLM; Test Chip; Chip Fabrication. The flow is annotated with Specification, Design Space Exploration, and Model Refinement.]

  • 33

    Unique Reference for Different Teams

    Algorithm team: algorithm development and verification with TLM

    Software team: functional SW development with untimed TLM; real-time SW development with timed TLM

    Hardware team: architectural analysis using untimed TLM with functional delays; performance analysis using timed TLM with micro-architecture details

    Verification team: golden model to generate the expected results of test scenarios

  • 34

    Benefits of TLM

    Simulation is fast while accuracy remains high, because unnecessary details are ignored

    TLM creates a clear and seamless path from customer requirements to detailed hardware and software specifications

    TLM helps explore the system architecture, including the initial software/hardware partition, CPU selection, and bus architecture exploration

    TLM provides an early software development environment, so that software/hardware co-design and co-verification are possible in the early design stages

    TLM also provides the “golden model” for hardware functional verification

    Hybrid abstraction level modeling and verification are possible, so that the details of each module can be added incrementally (module refinement)

    System integration begins in the early design stages, so that potential problems can be found and solved earlier

  • 35

    From Specification to Micro-architecture

    [Figure: the same grid of models A to F as in “Different Abstraction Models”, with Computation and Communication axes each marked un-timed, approximate-timed, and cycle-timed, annotated with System Specification and System Micro-architecture]

    A: Specification model

    B: Component-assembly model

    C: Bus-arbitration model

    D: Bus-functional model

    E: Cycle-accurate computation model

    F: RTL model

    B, C, D, and E are TLMs

    How to do it?

  • 36

    Component Assembly

    Based on the algorithm analyses, we perform the following tasks

    Partition the algorithm into SW/HW tasks

    Select a general-purpose CPU or DSP based on the SW characteristics

    Choose an RTOS if necessary

    Design IPs or select IPs from a library according to the HW tasks

    Define the functionality of each IP

    Define the interfaces and the data to be exchanged between IPs

    Estimate functional delays in IPs

  • 37

    Communication Exploration

    Decide the interconnect structure: back-door connections or centralized buses

    Assign bus-access properties to each IP (master or slave)

    Decide the bus arbitration policy

    Estimate functional delays in communication

  • 38

    Protocol Refinement

    Replace (inline) the abstract bus with a protocol channel

    Determine the pin- and cycle-accurate bus protocol

    Work out the details of the bus control signals

    Wrappers are used to bridge models at different abstraction levels

    Extract delays in communication from the micro-architecture

  • 39

    IP Refinement

    The IPs are refined to pin- and cycle-accuracy

    Delays in computation are extracted from micro-architecture

    The embedded SW is optimized to achieve real-time performance

  • 40

    IP Replacement

    Some IPs are modeled with pin- or cycle-accuracy

    The IPs are replaced or refined one by one

    Wrappers are used to bridge models at different abstraction levels

  • 41

    Communication Refinement

    Replace (inline) the abstract bus with a protocol channel

    Determine the pin- and cycle-accurate bus protocol

    Work out the details of the bus control signals

    Extract delays in communication from the micro-architecture

    Wrappers are used to bridge models at different abstraction levels

  • 42

    TLM Modeling Approaches

  • 43

    Two Fundamental Classes of TLM

    Untimed TLM (Programmer’s View, PV)

    Serves software programmers and verification engineers in early functional SW development and functional verification

    Captures no information related to the micro-architecture of the component or IP-under-design

    No timing information related to the micro-architecture

    No interconnect topology and no arbitration policy

    May have functional delays/timing taken from the system specification

    Timed TLM (Programmer’s View Plus Timing, PVT)

    Serves software programmers and architects in real-time embedded SW development and architectural analysis

    Contains the essential timing annotations for behavioral and communication specifications

  • 44

    Untimed Transaction Level Modeling

  • 45

    Untimed TLM – Data Flow Modeling

    There is no clock in an untimed TLM

    Absolutely no timing information related to the micro-architecture

    Conceptually, all processes are executed concurrently

    Must ensure correct behavior during the parallel execution

    Must respect a certain degree of process execution order

    An untimed “module” is suggested to have the following characteristics

    Concurrent execution of independent processes

    Causal dependencies between processes expressed by system synchronization

    Bit-true behavior

    Register-accurate or bit-true interface for communication

  • 46

    Modeling of Computation

    What is to be modeled?

    Internal computation at the functional or behavioral level (primary effort)

    Input/output of the block as well as its synchronization

    What is NOT to be modeled?

    Micro-architectural implementation details

    E.g., internal pipelines or structures

    Modules representing hardware blocks or IPs are suggested to have the following characteristics

    Bit-true (bit-accurate) behavior

    Register-accurate interface

    System synchronization managed by the component

  • 47

    System Synchronization

    An untimed TLM must characterize the causal relations between its different processes to assure deterministic system behavior

    Such dependencies are respected through explicit system synchronization

    System synchronization only defines a partial order of execution

    Any execution order among the processes is permitted as long as the causal dependencies are respected

    The TLM shall conform to the functional specification under any legal interleaving of processes

    How can one implement system synchronization in TLM?

    Event, signal, interrupt, polling, mailbox, or an abstract implementation that matches the considered level of abstraction

    If any of these mechanisms causes a call to the simulation kernel, it enables the scheduler to activate the execution of other modules

  • 48

    Example of System Synchronization

    P1, P2, and P3 are the 3 processes in a given system

    Each process denotes a thread of a particular module

    The full execution order within each of the 3 processes: (a) P11->P12; (b) P21->P22; (c) P31->P32

    Two occurrences of system synchronization between P1 and P2: (d) P11->P22; (e) P22->P12

  • 49

    Example of System Synchronization (c. 1)

    The 3 processes can be executed in any order as long as constraints (a) to (e) are respected, for example:

    P21->P11->P22->P12->P31->P32

    P31->P32->P21->P11->P22->P12

    P11->P21->P22->P31->P32->P12
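The partial order can be checked mechanically. The sketch below encodes constraints (a) to (e) as "must come before" pairs, verifies the three orders listed on the slide, and enumerates every legal interleaving by brute force. The function name `is_legal` is illustrative.

```python
from itertools import permutations

STEPS = ["P11", "P12", "P21", "P22", "P31", "P32"]

# (before, after) pairs: constraints (a) to (e) from the slides
CONSTRAINTS = [
    ("P11", "P12"),  # (a) order within P1
    ("P21", "P22"),  # (b) order within P2
    ("P31", "P32"),  # (c) order within P3
    ("P11", "P22"),  # (d) synchronization from P1 to P2
    ("P22", "P12"),  # (e) synchronization from P2 to P1
]


def is_legal(order):
    """True if every 'before' step appears earlier than its 'after' step."""
    position = {step: index for index, step in enumerate(order)}
    return all(position[a] < position[b] for a, b in CONSTRAINTS)


slide_orders = [
    ["P21", "P11", "P22", "P12", "P31", "P32"],
    ["P31", "P32", "P21", "P11", "P22", "P12"],
    ["P11", "P21", "P22", "P31", "P32", "P12"],
]
legal_orders = [order for order in permutations(STEPS) if is_legal(order)]
print(all(is_legal(order) for order in slide_orders), len(legal_orders))
```

Of the 720 possible orderings, 30 respect the constraints; an untimed TLM must produce the specified behavior under every one of them.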

  • 50

    Modeling of System Synchronization

    Emit synchronization: a process sends out a synchronization that may influence the behavior or state of other processes

    Receive synchronization: a process suspends its execution and waits for an incoming event from the system that may influence its behavior or state

    Reducing the number of receive synchronizations minimizes the number of context switches compared with other modeling approaches

  • 51

    Modeling of System Synchronization (c. 1)

    Handle the constraints of process execution order without timing:

    1. Activate or resume a process

    2. Read input data for control flow and data processing

    3. Perform the computation

    4. Write output data, if there is any

    5. Return to step 2 if more computation is required

    6. Synchronize

    6.1. If it is an “emit-synchronization”, return to step 2

    6.2. If it is a “receive-synchronization”, the process is suspended, because it needs an update that might influence its own behavior

    [Figure: sequence of events labeled Activate, Behavioral Simulation, and Sync.]
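The activation loop can be sketched with Python generators standing in for processes. This is a toy cooperative kernel, not SystemC: `Kernel`, `Event`, and the two processes are illustrative names. A `yield` of an event models a receive-synchronization (the process suspends), while `kernel.notify` models an emit-synchronization (the caller keeps running). The consumer is spawned first so that it is already waiting when the producer emits.

```python
from collections import deque


class Event:
    """Minimal event: processes waiting on it become runnable when it is notified."""

    def __init__(self):
        self.waiting = []


class Kernel:
    def __init__(self):
        self.runnable = deque()
        self.log = []

    def spawn(self, process):
        self.runnable.append(process)

    def notify(self, event):
        """Emit-synchronization: wake the waiters; the caller is not suspended."""
        self.runnable.extend(event.waiting)
        event.waiting.clear()

    def run(self):
        while self.runnable:
            process = self.runnable.popleft()   # step 1: activate or resume
            try:
                event = next(process)           # runs until a receive-synchronization
                event.waiting.append(process)   # suspended until the event is notified
            except StopIteration:
                pass                            # the process has finished


def consumer(kernel, buffer, data_ready, done):
    kernel.log.append("consumer: wait")
    yield data_ready                            # receive-synchronization: suspend
    total = sum(buffer)                         # read input and compute
    kernel.log.append(f"consumer: sum={total}")
    kernel.notify(done)                         # emit-synchronization


def producer(kernel, buffer, data_ready, done):
    buffer.extend(v * 10 for v in (1, 2, 3))    # compute and write output
    kernel.log.append("producer: emit")
    kernel.notify(data_ready)                   # emit: the process keeps running
    yield done                                  # receive: wait for the consumer
    kernel.log.append("producer: resumed")


kernel, buffer = Kernel(), []
data_ready, done = Event(), Event()
kernel.spawn(consumer(kernel, buffer, data_ready, done))
kernel.spawn(producer(kernel, buffer, data_ready, done))
kernel.run()
print(kernel.log)
```

Context switches happen only at the two receive-synchronizations, which is why reducing their number reduces scheduling overhead.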

  • 52

    Insertion of Functional Delays

    An untimed TLM IP may insert functional delays that are part of the system specification

    E.g., an LCD controller with a refresh period of 1/30 second

    Functional delays should never cause any system inconsistency

    A functional delay can suspend a process, prompting the simulation kernel to choose another eligible process for execution

    [Figure: timeline labeled Activate, Functional Delay, and Sync.]
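A functional delay can be sketched with a tiny time-ordered kernel in which each process yields the delay until its next activation. The kernel, the `frame_writer` process, and its 1/20-second period are assumptions for illustration; only the 1/30-second LCD refresh comes from the slide. `Fraction` is used to keep the time stamps exact.

```python
import heapq
from fractions import Fraction


class Kernel:
    """Time-ordered scheduler: a process yields the delay until it runs again."""

    def __init__(self):
        self.now = Fraction(0)
        self.log = []
        self._queue = []
        self._count = 0   # tie-breaker so generators are never compared

    def _schedule(self, time, process):
        heapq.heappush(self._queue, (time, self._count, process))
        self._count += 1

    def spawn(self, process):
        self._schedule(self.now, process)

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, process = heapq.heappop(self._queue)
            try:
                delay = next(process)        # run until the process suspends itself
            except StopIteration:
                continue
            self._schedule(self.now + delay, process)


def lcd_controller(kernel):
    while True:
        kernel.log.append(("refresh", kernel.now))
        yield Fraction(1, 30)                # functional delay from the specification


def frame_writer(kernel):
    for frame in range(2):
        kernel.log.append((f"frame {frame} written", kernel.now))
        yield Fraction(1, 20)                # hypothetical production period


kernel = Kernel()
kernel.spawn(lcd_controller(kernel))
kernel.spawn(frame_writer(kernel))
kernel.run(until=Fraction(1, 10))
refresh_times = [time for label, time in kernel.log if label == "refresh"]
write_times = [time for label, time in kernel.log if label.startswith("frame")]
print(refresh_times, write_times)
```

While the LCD controller is suspended for its functional delay, the kernel picks the other eligible process, which is the behavior the slide describes.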

  • 53

    Recommendations of Untimed TLM Modeling

    Model the functional specification

    No micro-architectural or clock-based information

    Determine the data granularity of models (modules) according to the algorithmic accuracy and the expected precision in transfers

    E.g., a video IP expecting frame-level input should be modeled with data granularity at the frame level, regardless of the actual capability of the interconnect in silicon

    The TLM wrapper must generate correct memory addresses in case of a mismatch between the data granularity and the data layout in memory

    Model explicit system synchronization that affects IP behavior

    Employ events within a module for inter-process synchronization

    Utilize synchronization protocols for inter-module synchronization

    Communications between modules
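The address-generation duty of a wrapper can be illustrated as follows. The sketch assumes a raster-ordered frame buffer with a fixed line stride, which the slides do not specify; the function name `block_to_bursts` and all numeric values are hypothetical. A model that works on rectangular blocks hands the wrapper one block, and the wrapper maps it to one burst per line.

```python
def block_to_bursts(base, stride, x, y, width, height, bytes_per_pixel=1):
    """Map a rectangular block of a raster-ordered frame to (address, length) bursts."""
    return [
        (base + (y + row) * stride + x * bytes_per_pixel, width * bytes_per_pixel)
        for row in range(height)
    ]


# A 16x2 block at pixel (16, 2) of a frame stored at 0x1000 with 720 bytes per line
bursts = block_to_bursts(base=0x1000, stride=720, x=16, y=2, width=16, height=2)
print(bursts)
```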

  • 54

    Recommendations of Untimed TLM Modeling (c. 1)

    Model all communication interfaces at the bit-accurate level

    Particularly for register modeling

    Model all behavior at the bit-accurate level

    Avoid activating processes on a regular (periodic) basis

    Process activation must be driven by system activity

    Do not use global variables

    Reuse existing standalone C models as much as possible

    C models should never be duplicated as hard-coded copies

    They should be reused by means of wrappers or external function calls

  • 55

    Timed Transaction Level Modeling

  • 56

    Timed TLM – Micro-architecture Modeling

    A timed TLM can determine a full order of process execution by specifying the delay between each activation and synchronization

    A complete specification of the implementation

    Main objectives:

    Benchmark the performance of the micro-architecture

    Fine-tune the micro-architecture

    Optimize the embedded SW to meet real-time constraints

  • 57

    Modeling Approach

    Development of a timed TLM must account for the time consumed in the following two aspects

    Computational delay: the amount of time used to perform a specific system behavior or function

    Communication delay: the amount of time consumed in accessing and transferring data

    Physical constraints such as bus size, bus throughput, or memory size must be considered in timed TLM development

    Modeling tactics:

    Annotated model

    Standalone timed model

  • 58

    Annotated Model

    Insert annotated timing delays into an untimed model

    The delay for each possible activation-synchronization pair in a process is defined based on the control flow of the component

    The delays are timing values taken from the micro-architecture

    The delays can be best-case, mean, or worst-case values

    Suitable if the structure of the untimed model matches the structure of a micro-architecture model

    Annotations are simply wait statements related to the computation time of a specific functionality

    Protect the timing annotations with preprocessing directives, e.g., #ifdef ANNOTATED_MODEL
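The annotation idea can be sketched as follows. Python has no preprocessor, so a module-level flag stands in for the slide's `#ifdef ANNOTATED_MODEL`; the `Clock` class, the `filter_block` function, and the delay values are hypothetical.

```python
ANNOTATED_MODEL = True   # stands in for the slide's #ifdef ANNOTATED_MODEL


class Clock:
    """Simulated time in nanoseconds; wait() plays the role of a wait statement."""

    def __init__(self):
        self.now_ns = 0

    def wait(self, delay_ns):
        self.now_ns += delay_ns


# Hypothetical best/mean/worst-case delays that would come from the micro-architecture
FILTER_DELAY_NS = {"best": 80, "mean": 100, "worst": 150}


def filter_block(samples, clock, case="mean", annotated=ANNOTATED_MODEL):
    result = [sample // 2 for sample in samples]   # untimed functional behavior
    if annotated:                                  # timing annotation, removable as a whole
        clock.wait(FILTER_DELAY_NS[case])
    return result


timed_clock, untimed_clock = Clock(), Clock()
timed_out = filter_block([2, 4, 6], timed_clock, case="worst")
untimed_out = filter_block([2, 4, 6], untimed_clock, annotated=False)
print(timed_out, timed_clock.now_ns, untimed_clock.now_ns)
```

The functional result is identical in both runs; only the simulated time differs, which is what allows one source to serve as both the untimed and the annotated model.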

  • 59

    Standalone Timed Model

    A detached model that carries the timing information

    High-level analytical timing models without functional information

    Timing behavior is modeled in such a way that delays are computed by executing the standalone timed model

    Applicable to hardware IPs or processor models

    Suitable when the structure of the algorithm is very different from that of the micro-architecture

    Example: consider modeling a video application

    If it is modeled at the frame level, only the delays associated with decoding a whole frame can be annotated. However, the micro-architecture allows computation and communication to be interleaved

    An untimed model may have multiple standalone timed models for investigating several micro-architecture scenarios
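A standalone timed model for the video example might be as simple as an analytical formula. The sketch below assumes a two-stage pipeline in which one block is transferred while the next is computed; the block count and per-block delays are hypothetical, and the function names are illustrative. It contains no functional information, only timing.

```python
def serialized_delay(blocks: int, compute: int, transfer: int) -> int:
    """Frame delay if every block is computed and then transferred, with no overlap."""
    return blocks * (compute + transfer)


def interleaved_delay(blocks: int, compute: int, transfer: int) -> int:
    """Frame delay for a two-stage pipeline: block i is transferred while block i+1 is computed."""
    return compute + (blocks - 1) * max(compute, transfer) + transfer


# Hypothetical numbers: 4 blocks per frame, 10 time units to compute, 6 to transfer
frame_serialized = serialized_delay(blocks=4, compute=10, transfer=6)
frame_interleaved = interleaved_delay(blocks=4, compute=10, transfer=6)
print(frame_serialized, frame_interleaved)
```

Swapping in a different formula is enough to evaluate another micro-architecture scenario against the same untimed functional model.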

  • 60

    Standalone Timed Model – Concepts of Operations

    A standalone timed model can be controlled externally if the timing behavior of a component depends on its functional behavior

    [Figure: timeline labeled Activate, Behavioral simulation, and Micro-architecture timing simulation with delays in both computation and communication]