-
1
Lecture 2 Transaction Level Modeling
Multimedia Architecture and Processing Laboratory (多媒體架構與處理實驗室)
Prof. Wen-Hsiao Peng (彭文孝) [email protected]
2007 Spring Term
-
2
Acknowledgements
Parts of this lecture note were contributed by Prof. Gwo Giun Lee (李國君) of the Dept. of EE, National Cheng-Kung University, and his team members 王明俊 and 林和源 of the Multimedia System-on-Chip Laboratory (多媒體系統晶片實驗室).
E-mail: [email protected] Tel: +886-6-275-7575 ext. 62448 Web: http://140.116.216.53
-
3
References
Frank Ghenassia, “Transaction-Level Modeling with SystemC: TLM Concepts and Applications for Embedded Systems”, Springer, 2005 (ISBN: 0-387-26232-6)
L. Cai and D. Gajski, “Transaction Level Modeling in System Level Design”, CECS Technical Report, 2003
http://www.cecs.uci.edu/technical_report/TR03-10.pdf
-
4
Outline
Raising Abstraction Level
  A game of balancing the trade-off between speed and accuracy
Concepts of Transaction Level Modeling (TLM)
  TLM at different abstraction levels
Work with TLM for SoC Development
  SoC-based TLM design flow
TLM Modeling Approaches
  Timed TLM
  Untimed TLM
-
5
Raising Abstraction Level
-
6
Call for Raising Abstraction Level
Rationales
  Ever-increasing system complexity, cost, and time-to-market pressure
  Designers cannot cope using the classical VLSI design flow
Motivation
  To improve productivity through a reliable design methodology within a short time frame
Purposes
  Perform system architecture exploration
  Enable earlier SW development and system integration
  Allow HW/SW co-design founded on a unique reference
  Increase simulation speed with little or no accuracy degradation
-
7
Raising Abstraction Level
A game of balancing the trade-off between speed and accuracy
The two extreme ends
  Algorithmic level – functional model
    Captures only the algorithm, regardless of implementation details
    No notion of HW or SW components
    Models neither registers nor system synchronization
    Cannot execute embedded SW
  Register Transfer Level (RTL) – pure logic simulation
    Accurate to the real implementation
    Long development phase and lengthy simulation
    Cannot execute embedded SW in a reasonable amount of time
    Embedded SW can only be tested rather late in the design flow
    Any system modification is too costly at this stage
-
8
Raising Abstraction Level (c. 1)
An in-between solution must satisfy the following criteria
Speed
  Waiting even a single day for a simulation to complete is unacceptable
  Must simulate millions of cycles within a reasonable time
Accuracy
  Sustain enough accuracy to deliver reliable simulation results
  Be detailed enough to run the embedded SW
Lightweight modeling
  The effort required on top of RTL modeling must be kept small
  Be quick to develop at a considerably low effort
-
9
Concepts of Transaction Level Modeling
-
10
Transaction Level Modeling (TLM)
A transaction-based model to cope with system-level design
  Resides between the algorithmic and the bit-true, cycle-accurate levels
  Targets SW development, architecture analysis, functional verification, and, by adding timing annotations, performance analysis
A system design concept
  Language independent
Highlights the concept of separating computation from communication
  Details of computation and communication are refined independently
Hides unnecessary details of communication and computation
  Trade-off between accuracy and speed
-
11
Efficiency of Modeling Settings
Comparison of RTL, cycle-accurate, and TLM models in terms of simulation speed and modeling effort
  Fine-grained simulation at the expense of slower speed and later availability
  Early architecture exploration and SW development with lightweight development effort
-
12
Efficiency of Modeling Settings (c. 2)
[Figure: a clocked system (CLK, Input, Output) composed of Module A, Module B, and Module C; Module A contains Process A-1, Process A-2, and Process A-3]
-
13
RTL Simulation
[Figure: over 600 cycles, Threads A-1, A-2, A-3, B-1, B-2, C-1, and C-2 are each scheduled at a granularity of 1 cycle]
-
14
Cycle Accurate Simulation
[Figure: over 600 cycles, Threads A, B, and C are each scheduled at a granularity of 1 cycle; Thread A runs Process [A1, A2, A3] and Thread C runs Process [C1, C2]]
-
15
TLM Simulation
[Figure: Threads A, B, and C are each scheduled once, with 1 event covering 600 cycles; Thread A runs Process [A1, A2, A3] x600 and Thread C runs Process [C1, C2] x600]
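The difference between the three simulation styles can be made concrete with a rough count of thread activations over the 600-cycle window. This is only a back-of-the-envelope sketch in Python: it assumes every thread is woken exactly once per synchronization point, and the helper name `activations` is made up for illustration.

```python
# Rough activation counts for the three simulation styles over one 600-cycle window.
# Assumption: every thread is activated once at every synchronization point.
CYCLES = 600

def activations(threads, sync_points):
    """Number of thread activations the simulation kernel has to schedule."""
    return threads * sync_points

rtl = activations(threads=7, sync_points=CYCLES)             # A-1..A-3, B-1, B-2, C-1, C-2
cycle_accurate = activations(threads=3, sync_points=CYCLES)  # one thread per module
tlm = activations(threads=3, sync_points=1)                  # one event covers all 600 cycles

print(rtl, cycle_accurate, tlm)  # prints: 4200 1800 3
```

Under these assumptions the scheduling work drops by orders of magnitude at the transaction level, which is where the simulation speed-up comes from.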
-
16
Generic SoC TLM Platform
[Figure: platform blocks: RTL ASIC, Adapter, CPU, Arbiter, DSP coprocessor, Program Memory, Data Memory, Display I/F, I/O, Hard Disk]
Legend
  TLM Modules: modeling of IPs’ function, behavior, or architecture
  RTL Modules: modeling of IPs’ micro-architecture
  Abstract Bus: implements interface functions for data transactions
  Module Ports: provide interface functions to access the channel for abstract data transactions
-
17
SoC TLM Platform (c. 1)
Module
  Each system component is modeled as a module
  Behaviors are described by a set of concurrent processes and threads
Channel
  Data exchange between modules is established through channels
  A channel is the realization of the interface functions
Port
  Modules and channels are bound together by means of ports
Interface
  Transactions are requested through the interface functions of the module port
  The very part that separates computation from communication
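The relationship between module, channel, port, and interface can be sketched in a few lines of Python. Since TLM is language independent, this is not SystemC; all class names (`TransferInterface`, `FifoChannel`, `Port`, `Producer`, `Consumer`) are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from collections import deque

class TransferInterface(ABC):
    """Interface: the functions a module may call to request a transaction."""
    @abstractmethod
    def write(self, data): ...
    @abstractmethod
    def read(self): ...

class FifoChannel(TransferInterface):
    """Channel: one possible realization of the interface functions."""
    def __init__(self):
        self._queue = deque()
    def write(self, data):
        self._queue.append(data)
    def read(self):
        return self._queue.popleft()

class Port:
    """Port: binds a module to any channel that implements the interface."""
    def __init__(self):
        self._channel = None
    def bind(self, channel):
        if not isinstance(channel, TransferInterface):
            raise TypeError("channel must implement TransferInterface")
        self._channel = channel
    def write(self, data):
        self._channel.write(data)
    def read(self):
        return self._channel.read()

class Producer:
    """Module: computation only; communication is delegated to the port."""
    def __init__(self):
        self.out = Port()
    def run(self, values):
        for value in values:
            self.out.write(value * value)

class Consumer:
    def __init__(self):
        self.inp = Port()
        self.received = []
    def run(self, count):
        for _ in range(count):
            self.received.append(self.inp.read())

channel = FifoChannel()
producer, consumer = Producer(), Consumer()
producer.out.bind(channel)
consumer.inp.bind(channel)
producer.run([1, 2, 3])
consumer.run(3)
print(consumer.received)  # prints: [1, 4, 9]
```

Because the modules only see the interface, the FIFO could be swapped for a more detailed bus model without touching `Producer` or `Consumer`.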
-
18
SoC TLM Platform (c. 2)
Embedded SW is simulated with native or cross compilation
Native compilation
  SW behavior is modeled as a module and executed by the workstation for fast simulation
Cross compilation
  SW is compiled for the target processor architecture and executed by the associated instruction set simulator (ISS) for better accuracy
-
19
System Synchronization
Definition
  A mechanism to inform others, or to get informed, about system state changes when these changes influence the execution of other parts of the system
  Re-scheduling points to guarantee a simulation of concurrency
An important use of system synchronization is to ensure memory or data consistency
  Prevent concurrent processes from reading data in an unknown state
  Prevent concurrent processes from writing data to a temporarily inaccessible area
Deciding where and when to implement system synchronization
  Too many synchronization points bring the model too close to RTL
  Too few synchronization points may lead to inaccurate system execution
-
20
Granularity of System Synchronization
The granularity of system synchronization in RTL and TLM
S1 and S2 are two system states and synchronization points
F_RTL: clocked processes representing the system micro-architecture
  A collection of all cycle-accurate computations that bring S1 to S2
  Context switches at the cycle boundary
  More parallel execution of processes
F_TLM: sequential execution of program code between S1 and S2
  An equivalent function that brings S1 to S2 without any clocks
  Context switches at the event boundary
  Less parallel execution of processes
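As a minimal sketch of the idea, consider a made-up 16-bit accumulator block: the RTL-style function advances the state one clock cycle at a time, while the TLM-style function brings S1 to S2 in a single step. The function names `f_rtl` and `f_tlm` and the accumulator itself are illustrative assumptions.

```python
def f_rtl(state, samples):
    """Clocked version: one computation and one context switch per cycle."""
    acc = state
    context_switches = 0
    for sample in samples:             # each iteration stands for one clock cycle
        acc = (acc + sample) & 0xFFFF  # 16-bit register update
        context_switches += 1
    return acc, context_switches

def f_tlm(state, samples):
    """Equivalent function without clocks: a single event brings S1 to S2."""
    return (state + sum(samples)) & 0xFFFF, 1

samples = list(range(600))
s2_rtl, switches_rtl = f_rtl(0, samples)
s2_tlm, switches_tlm = f_tlm(0, samples)
assert s2_rtl == s2_tlm   # both reach the same state S2
print(switches_rtl, switches_tlm)  # prints: 600 1
```

Both functions agree on the state at the synchronization point; they differ only in how many times the simulation kernel would have to be involved in between.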
-
21
Modeling Accuracy
The precision or correctness of the model in replicating the behavior and activities of a system-under-design
Two decisive factors
  Granularity of communication
    The fineness of the data carried by the communication structure
    Example: data transfer for a video IP with a frame-based algorithm
      Application packet: frame-by-frame data transfer
      Bus packet: line- or column-based transfer of a video frame
      Bus size: pixel-based transfer of a video frame
  Timing accuracy
    The fidelity to the intended timing behavior
    Two extreme ends: untimed and cycle-accurate
    In between: approximate-timed
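To get a feeling for what communication granularity means for simulation effort, the following sketch counts how many transactions are needed to move one video frame at each granularity. The 720x480 frame size and the helper name `transactions_per_frame` are assumptions made for this example only.

```python
# Assumed frame size for illustration; not taken from the lecture.
FRAME_WIDTH, FRAME_HEIGHT = 720, 480

def transactions_per_frame(granularity):
    """Number of transactions needed to transfer one frame."""
    if granularity == "application packet":   # frame-by-frame transfer
        return 1
    if granularity == "bus packet":           # line-based transfer
        return FRAME_HEIGHT
    if granularity == "bus size":             # pixel-based transfer
        return FRAME_WIDTH * FRAME_HEIGHT
    raise ValueError(f"unknown granularity: {granularity}")

for level in ("application packet", "bus packet", "bus size"):
    print(level, transactions_per_frame(level))
```

Each step toward finer granularity multiplies the number of transactions, and with it the number of scheduling points the simulator has to handle.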
-
22
Modeling Accuracy (c. 1)
A glimpse at the modeling accuracy
[Figure: modeling accuracy plotted with Communication Granularity on the vertical axis and Timing Accuracy on the horizontal axis]
-
23
Different Abstraction Models
Several abstraction levels can be defined by considering the different timing accuracy in computation and communication
[Figure: models A to F placed on two axes, Communication and Computation, each graded as un-timed, approximate-timed, and cycle-timed]
  A: Specification model
  B: Component-assembly model
  C: Bus-arbitration model
  D: Bus-functional model
  E: Cycle-accurate computation model
  F: RTL model
B, C, D, and E are TLMs
-
24
Specification Model – Functional View
The system functionality without implementation details
Data transfer is modeled by variable accessing without any concept of channel
-
25
Component-Assembly Model – Architectural View
Allocation of concurrent processing elements (PEs) and mapping of processes
Data transfer is achieved through message-passing channels
Message-passing channels
  An abstract implementation of communication focusing on data transactions
  No cycle-accurate or pin-accurate details, and no specific bus protocol
-
26
Bus Arbitration Model – Architectural View
Channels between PEs are realized by an abstract bus
Requires design decisions in both computation and communication
An abstract bus
  Data transfers are still implemented by message passing
  The bus protocol is simplified to blocking and non-blocking I/O
  An arbiter is required to resolve bus conflicts
  No cycle-accurate or pin-accurate details
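A minimal sketch of an abstract bus with an arbiter is given below. It assumes a fixed-priority policy, which is only one of many possible arbitration policies, and the names `AbstractBus`, `arbitrate`, and `transfer` are hypothetical.

```python
class AbstractBus:
    """Abstract bus: message-passing transfers plus an arbiter, no pins or cycles."""
    def __init__(self, priorities):
        self.priorities = priorities  # master name -> priority (lower value wins)
        self.memory = {}              # address -> data, stands in for the slaves

    def arbitrate(self, requests):
        """Resolve conflicting requests by fixed priority (an assumed policy)."""
        return sorted(requests, key=lambda request: self.priorities[request[0]])

    def transfer(self, requests):
        """Serve all pending write requests in arbitration order."""
        served = []
        for master, address, data in self.arbitrate(requests):
            self.memory[address] = data
            served.append(master)
        return served

bus = AbstractBus({"CPU": 0, "DSP": 1})
order = bus.transfer([("DSP", 0x10, 1), ("CPU", 0x10, 2)])
print(order)  # prints: ['CPU', 'DSP']
```

The model resolves the conflict and orders the accesses, but it says nothing about bus signals or the number of cycles each transfer takes.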
-
27
Bus Functional Model – Micro-architectural View
The abstract bus channel is inlined with a cycle-/pin-accurate protocol channel
  Wires of the bus are instantiated as variables/signals
  Data transfer follows the time-/cycle-accurate sequence
  Interface functions are provided for all abstract bus transactions
  Wrappers convert data transfers from a higher level of abstraction (PEs) to a lower level of abstraction (protocol channel)
-
28
Cycle-Accurate Computation – Micro-architectural View
The PEs are cycle- and pin-accurate
  Dedicated hardware IPs are modeled at the register transfer level
  Programmable processors are modeled by an instruction set simulator
Wrappers convert data transfers from a higher level of abstraction (abstract bus) to a lower level of abstraction (PEs)
-
29
RTL Model – Pure Micro-architecture View
Both computation and communication are pin- and cycle-accurate
  Programmable processors: modeled with an instruction set simulator
  Dedicated hardware: modeled by a pin- and cycle-accurate RTL model
  Interconnect structure: modeled by a pin- and cycle-accurate RTL model
-
30
Timing Accuracy of Transaction Level Models
Model                        Functionality       Communication
Specification                Un-timed            Un-timed
Component-assembly           Approximate-timed   Un-timed
Bus Arbitration              Approximate-timed   Approximate-timed
Bus Functional               Approximate-timed   Cycle-timed
Cycle-Accurate Computation   Cycle-timed         Approximate-timed
RTL                          Cycle-timed         Cycle-timed
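The table can also be written down as a small lookup structure. The sketch below encodes each model as a (functionality, communication) pair and uses a simple rule of thumb, consistent with the table but not stated in it, to tell the TLMs apart: everything except the fully un-timed and the fully cycle-timed corner counts as a TLM. The names `MODELS` and `is_tlm` are hypothetical.

```python
# (functionality timing, communication timing) for each abstraction model
MODELS = {
    "Specification":              ("un-timed", "un-timed"),
    "Component-assembly":         ("approximate-timed", "un-timed"),
    "Bus Arbitration":            ("approximate-timed", "approximate-timed"),
    "Bus Functional":             ("approximate-timed", "cycle-timed"),
    "Cycle-Accurate Computation": ("cycle-timed", "approximate-timed"),
    "RTL":                        ("cycle-timed", "cycle-timed"),
}

def is_tlm(model):
    """Rule of thumb: a TLM is neither fully un-timed nor fully cycle-timed."""
    functionality, communication = MODELS[model]
    corner_cases = {("un-timed", "un-timed"), ("cycle-timed", "cycle-timed")}
    return (functionality, communication) not in corner_cases

print([name for name in MODELS if is_tlm(name)])
```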
-
31
Work with TLM for SoC Development
-
32
TLM-based SoC Design
[Figure: TLM-based SoC design flow]
  Flow steps: Requirement Definition; Specification Development; Specification Model; System Architecture Model Development; Transaction Level Model (TLM) Development for Both SW/HW; HW Development (Hardware RTL Development, Synthesis, Placement and Route); SW Development (Embedded Software Development); SW/HW Integration and Co-verification Based on TLM; Chip Fabrication; Test Chip
  Phase labels: Specification, Design Space Exploration, Model Refinement
-
33
Unique Reference for Different Teams
Algorithm team
  Algorithm development and verification with TLM
Software team
  Functional SW development with untimed TLM
  Real-time SW development with timed TLM
Hardware team
  Architectural analysis using untimed TLM with functional delays
  Performance analysis using timed TLM with micro-architecture details
Verification team
  Golden model to generate the expected results of test scenarios
-
34
Benefits of TLM
Simulation is fast while accuracy remains high, because unnecessary details are ignored
TLM creates a clear and seamless path from customer requirements to detailed hardware and software specifications
TLM helps explore the system architecture, including the initial software/hardware partition, CPU selection, and bus architecture exploration
TLM provides an early software development environment, so that software/hardware co-design and co-verification become possible in the early design stages
TLM also provides the “golden model” for hardware functional verification
Hybrid abstraction-level modeling and verification are possible, so that the details of each module can be added incrementally (module refinement)
System integration begins at the early design stages, so that potential problems can be found and solved earlier
-
35
From Specification to Micro-architecture
[Figure: models A to F on the Communication and Computation timing axes (un-timed, approximate-timed, cycle-timed), with the System Specification at one end and the System Micro-architecture at the other]
  A: Specification model
  B: Component-assembly model
  C: Bus-arbitration model
  D: Bus-functional model
  E: Cycle-accurate computation model
  F: RTL model
B, C, D, and E are TLMs
How to do it?
-
36
Component Assembly
Based on the algorithm analyses, we perform the following tasks
Partition the algorithm into SW/HW tasks
Select a general-purpose CPU or DSP based on the SW characteristics
  Choose an RTOS if necessary
Design IPs or select IPs from a library according to the HW tasks
  Define the functionality of each IP
  Define the interfaces and the data to be exchanged between IPs
  Estimate functional delays in IPs
-
37
Communication Exploration
Decide the interconnect structure
  Back-door connections or centralized buses
Assign bus-accessing properties to each IP (master or slave)
Decide the bus arbitration policy
Estimate functional delays in communication
-
38
Protocol Refinement
Inline the abstract bus with a protocol channel
  Determine the pin- and cycle-accurate bus protocol
  Work out the details of the bus control signals
Wrappers are used to bridge the models of different abstractions
Extract delays in communication from the micro-architecture
-
39
IP Refinement
The IPs are refined to pin- and cycle-accuracy
Delays in computation are extracted from micro-architecture
The embedded SW is optimized to achieve real-time performance
-
40
IP Replacement
Some IPs are modeled with pin- or cycle-accuracy
  The IPs are replaced or refined one by one
Wrappers are used to bridge the models of different abstractions
-
41
Communication Refinement
Inline the abstract bus with a protocol channel
  Determine the pin- and cycle-accurate bus protocol
  Work out the details of the bus control signals
Extract delays in communication from the micro-architecture
Wrappers are used to bridge the models of different abstractions
-
42
TLM Modeling Approaches
-
43
Two Fundamental Classes of TLM
Untimed TLM (Programmer’s View, PV)
  Serves software programmers and verification engineers in early functional SW development and functional verification
  Captures no information related to the micro-architecture of the component or IP-under-design
    No timing information related to the micro-architecture
    No interconnect topology or arbitration policy
  May have functional delay/timing from the system specification
Timed TLM (Programmer’s View Plus Timing, PVT)
  Serves software programmers and architects in real-time embedded SW development and architectural analysis
  Contains essential timing annotations for behavioral and communication specifications
-
44
Untimed Transaction Level Modeling
-
45
Untimed TLM – Data Flow Modeling
There is no clock in an untimed TLM
  Absolutely no timing information related to the micro-architecture
Conceptually, all processes are executed concurrently
  Must ensure correct behavior during the parallel executions
  Must respect a certain degree of process execution order
An untimed “module” is suggested to have the following characteristics
  Concurrent execution of independent processes
  Causal dependencies between processes expressed by system synchronization
  Bit-true behavior
  Register-accurate or bit-true interface for communication
-
46
Modeling of Computation
What is to be modeled?
  Internal computation at the functional or behavioral level (primary effort)
  Input/output of the block as well as its synchronization
What is NOT to be modeled?
  Micro-architectural implementation details
  E.g., internal pipelines or structures
Modules representing hardware blocks or IPs are suggested to have the following characteristics
  Bit-true (bit-accurate) behavior
  Register-accurate interface
  System synchronization managed by the component
-
47
System Synchronization
An untimed TLM must characterize the causal relations between its different processes to ensure deterministic system behavior
  Such dependencies are enforced by explicit system synchronization
System synchronization only defines a partial order of execution
  Any execution order among the processes is permitted as long as the causal dependencies are respected
  The TLM shall conform to the functional specification under any legal interleaving of processes
How can one implement system synchronization in TLM?
  Event, signal, interrupt, polling, mailbox, or an abstract implementation that matches the considered level of abstraction
  If any of these mechanisms causes a call to the simulation kernel, it enables the scheduler to activate the execution of other modules
-
48
Example of System Synchronization
P1, P2, and P3 are the 3 processes in a given system
Each process denotes a thread for a particular module
The full execution order within each of the 3 processes
  (a) P11->P12; (b) P21->P22; (c) P31->P32
Two occurrences of system synchronization between P1 and P2
  (d) P11->P22
  (e) P22->P12
-
49
Example of System Synchronization (c. 1)
The 3 processes can be executed in any order as long as the constraints (a) to (e) are followed
P21->P11->P22->P12->P31->P32
P31->P32->P21->P11->P22->P12
P11->P21->P22->P31->P32->P12
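The partial order defined by constraints (a) to (e) can be checked mechanically. The sketch below, with hypothetical names `CONSTRAINTS` and `is_legal`, verifies the three orders listed above and enumerates every legal interleaving of the six process segments.

```python
from itertools import permutations

# Constraints (a) to (e): each pair (x, y) means x must execute before y.
CONSTRAINTS = [("P11", "P12"), ("P21", "P22"), ("P31", "P32"),
               ("P11", "P22"), ("P22", "P12")]
SEGMENTS = ["P11", "P12", "P21", "P22", "P31", "P32"]

def is_legal(order):
    """True if the execution order respects every causal dependency."""
    position = {segment: index for index, segment in enumerate(order)}
    return all(position[before] < position[after] for before, after in CONSTRAINTS)

listed_orders = [
    ["P21", "P11", "P22", "P12", "P31", "P32"],
    ["P31", "P32", "P21", "P11", "P22", "P12"],
    ["P11", "P21", "P22", "P31", "P32", "P12"],
]
print([is_legal(order) for order in listed_orders])  # prints: [True, True, True]

legal_orders = [order for order in permutations(SEGMENTS) if is_legal(order)]
print(len(legal_orders))  # prints: 30
```

Of the 720 possible orderings, 30 are legal; an untimed TLM must produce the specified functional result under each of them.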
-
50
Modeling of System Synchronization
Emit synchronization
  A process sends out a synchronization that may influence the behavior or state of other processes
Receive synchronization
  A process suspends its execution and waits for an incoming event from the system that may influence its behavior or state
Reducing the number of receive synchronizations minimizes the number of context switches compared with other modeling approaches
-
51
Modeling of System Synchronization (c. 1)
Handle the constraints of process execution order without timing
1. Activate or resume a process
2. Read input data for control flow and data processing
3. Computation
4. Write output data, if any
5. Return to Step 2 if more computation is required
6. Synchronization
  1. If it is an “emit synchronization”, return to Step 2
  2. If it is a “receive synchronization”, the process is suspended
     The process needs an update that might influence its own behavior
[Figure: sequence of events of a process: Activate, Behavioral Simulation, Sync.]
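The activate / compute / synchronize loop can be sketched with Python generators standing in for processes and a tiny round-robin loop standing in for the simulation kernel. This is an illustration only: the names `producer`, `consumer`, and `run` are made up, and letting the producer yield after each emit is a choice of this sketch, not a requirement.

```python
from collections import deque

def producer(mailbox, log):
    for value in (1, 2, 3):
        result = value * 10              # computation
        mailbox.append(result)           # write output data
        log.append(("producer", result))
        yield "emit"                     # emit synchronization; kernel may run others

def consumer(mailbox, log, expected):
    for _ in range(expected):
        while not mailbox:
            yield "receive"              # receive synchronization: suspend until data arrives
        log.append(("consumer", mailbox.popleft()))  # read input and process it

def run(processes):
    """Minimal untimed kernel: resume ready processes in turn until all finish."""
    ready = deque(processes)
    while ready:
        process = ready.popleft()
        try:
            next(process)                # activate or resume the process
        except StopIteration:
            continue                     # process finished, drop it
        ready.append(process)

mailbox, log = deque(), []
run([consumer(mailbox, log, expected=3), producer(mailbox, log)])
print(log)
```

No notion of time appears anywhere; the order of execution is determined purely by the synchronization points.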
-
52
Insertion of Functional Delays
An untimed TLM IP may insert functional delays that are part of the system specification
  E.g., an LCD controller with a 1/30-second refresh period
Functional delays should never cause any system inconsistency
  A functional delay can suspend a process, inducing the simulation kernel to choose other eligible processes for execution
[Figure: timeline of a process: Activate, Functional Delay, Sync.]
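A functional delay can be sketched with a small event-driven kernel: a process yields the delay it wants to wait for, and the kernel resumes whichever process is due next. The kernel `simulate`, the generic process `periodic`, and the second process named "dma" with its 1/45-second period are all assumptions made for this illustration.

```python
import heapq
from fractions import Fraction

def periodic(name, period, count, log):
    """A process that waits for a functional delay, then performs its action."""
    for _ in range(count):
        now = yield period       # suspend; the kernel resumes us after `period`
        log.append((name, now))  # e.g. the LCD controller refreshes the display

def simulate(processes):
    """Minimal event-driven kernel: always resume the process with the earliest wake-up time."""
    queue = []
    for pid, process in enumerate(processes):
        delay = next(process)                      # run up to the first delay
        heapq.heappush(queue, (delay, pid, process))
    while queue:
        now, pid, process = heapq.heappop(queue)
        try:
            delay = process.send(now)              # resume and get the next delay
        except StopIteration:
            continue
        heapq.heappush(queue, (now + delay, pid, process))

log = []
simulate([
    periodic("lcd", Fraction(1, 30), 3, log),  # 1/30-second refresh period
    periodic("dma", Fraction(1, 45), 2, log),  # hypothetical second process
])
print([name for name, _ in log])  # prints: ['dma', 'lcd', 'dma', 'lcd', 'lcd']
```

While the LCD process is suspended in its functional delay, the kernel is free to run the other process, which is exactly the behavior described above.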
-
53
Recommendations of Untimed TLM Modeling
Model the functional specification
  No micro-architectural or clock-based information
Determine the data granularity of models (modules) according to the algorithmic accuracy and the expected precision in transfers
  E.g., a video IP expecting frame-level input should be modeled with data granularity at the frame level, regardless of the actual capability of the interconnect in the silicon
  The TLM wrapper must generate correct memory addresses in case of a mismatch between data granularity and data layout in memory
Model explicit system synchronization that affects IP behavior
  Employ events within a module for inter-process synchronization
  Utilize synchronization protocols for inter-module synchronization
    Communications between modules
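As an illustration of the address-generation recommendation, suppose a frame-level transaction must be mapped onto a memory in which each line starts at a fixed stride that may be larger than the line itself. The function name `frame_line_addresses` and all numeric values are assumptions for this sketch.

```python
def frame_line_addresses(base, line_bytes, height, stride):
    """Split one frame-level transfer into per-line (address, length) accesses.

    `stride` is the distance in bytes between the starts of consecutive lines
    in memory; it may exceed `line_bytes` when lines are padded.
    """
    if stride < line_bytes:
        raise ValueError("stride must be at least one line long")
    return [(base + row * stride, line_bytes) for row in range(height)]

# A 3-line frame of 720 bytes per line stored with a 1024-byte stride.
accesses = frame_line_addresses(base=0x1000, line_bytes=720, height=3, stride=1024)
print([(hex(address), length) for address, length in accesses])
# prints: [('0x1000', 720), ('0x1400', 720), ('0x1800', 720)]
```

The IP model itself keeps working on whole frames; only the wrapper knows about the memory layout.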
-
54
Recommendations of Untimed TLM Modeling (c. 1)
Model all communication interfaces at the bit-accurate level
  Particularly for register modeling
Model all behavior at the bit-accurate level
Avoid activating processes on a regular (periodic) basis
  Process activation must be driven by system activity
Ban the use of global variables
Reuse existing standalone C models as much as possible
  C models should never be duplicated as hard-coded copies
  They should be reused by means of wrappers or external function calls
-
55
Timed Transaction Level Modeling
-
56
Timed TLM – Micro-architecture Modeling
A timed TLM can determine a full order of process execution by specifying the delay between each activation and synchronization
  A complete specification of the implementation
Main objectives
  Benchmark the performance of the micro-architecture
  Fine-tune the micro-architecture
  Optimize the embedded SW to meet real-time constraints
-
57
Modeling Approach
Development of a timed TLM must account for the time consumed in the following two aspects
  Computational delay
    The amount of time used to perform a specific system behavior or function
  Communication delay
    The amount of time consumed in accessing and transferring data
Physical constraints such as bus size, bus throughput, or memory size must be considered in timed TLM development
Modeling tactics
  Annotated model
  Standalone timed model
-
58
Annotated Model
Insert annotated timing delays into an untimed model
  The delay for each possible activation-synchronization pair in a process is defined based on the control flow of the component
  The delays are timing values taken from the micro-architecture
  The delays can be best-case, mean, or worst-case values
Suitable if the structure of the untimed model matches the structure of the micro-architecture model
Annotations are simply wait statements related to the computation time of a specific functionality
Protect the timing annotations with preprocessing directives
  #ifdef ANNOTATED_MODEL
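The annotation idea can be sketched as follows. Python has no preprocessor, so a module-level flag named `ANNOTATED_MODEL` plays the role of the `#ifdef` guard here; the `Clock` class, the `scale_block` function, and the delay values are invented for illustration and do not come from any real micro-architecture.

```python
ANNOTATED_MODEL = True  # stands in for the #ifdef ANNOTATED_MODEL guard

# Made-up delays (in ns) for one activation-to-synchronization section.
DELAY_NS = {"best": 80, "mean": 100, "worst": 150}

class Clock:
    """Minimal stand-in for simulation time."""
    def __init__(self):
        self.now_ns = 0
    def wait(self, delay_ns):
        self.now_ns += delay_ns

def scale_block(pixels, clock, case="mean"):
    """Untimed functional behavior with an optional timing annotation."""
    result = [min(255, pixel * 2) for pixel in pixels]  # functional part, bit-true
    if ANNOTATED_MODEL:
        clock.wait(DELAY_NS[case])                      # timing annotation: a wait statement
    return result

clock = Clock()
output = scale_block([100, 200], clock, case="worst")
print(output, clock.now_ns)  # prints: [200, 255] 150
```

With the flag switched off, the same code is again a purely untimed model, so both views share one functional description.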
-
59
Standalone Timed Model
A detached model that incorporates the timing information
  High-level analytical timing models without functional information
Timing behavior is modeled in such a way that delays are computed by executing the standalone timed model
Applicable to hardware IPs or processor models
Suitable when the structure of the algorithm is very different from that of the micro-architecture
  Example: modeling a video application
  If modeled at the frame level, only the delays associated with decoding a whole frame can be annotated. However, the micro-architecture allows computation and communication to be interleaved
An untimed model may have multiple standalone timed models for investigating several micro-architecture scenarios
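A small sketch may help to see why a separate timing model is useful in the video example. The functional model below works on a whole frame, while two alternative analytical timing models estimate the frame delay: one assumes fetch and compute happen strictly one after another, the other assumes a two-stage pipeline in which the fetch of the next row overlaps the computation of the current one. All function names and numbers are assumptions for illustration.

```python
def decode_frame(frame):
    """Untimed functional model: operates on a whole frame, carries no timing."""
    return [255 - pixel for pixel in frame]

def delay_sequential(rows, fetch, compute):
    """Timing model 1: each row is fetched, then computed, with no overlap."""
    return rows * (fetch + compute)

def delay_interleaved(rows, fetch, compute):
    """Timing model 2: two-stage pipeline, fetch of row i+1 overlaps compute of row i."""
    return fetch + (rows - 1) * max(fetch, compute) + compute

# The same functional model can be paired with either timing scenario.
print(decode_frame([0, 255]))                              # prints: [255, 0]
print(delay_sequential(rows=4, fetch=10, compute=30))      # prints: 160
print(delay_interleaved(rows=4, fetch=10, compute=30))     # prints: 130
```

The functional result is identical in both scenarios; only the timing model changes, which is what allows several micro-architecture alternatives to be compared against one untimed model.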
-
60
Standalone Timed Model – Concepts of Operations
A standalone timed model can be controlled externally if the timing behavior of a component depends on its functional behavior
[Figure: timeline in which, after activation, a behavioral simulation runs alongside a micro-architecture timing simulation with delays in both computation and communication]