sistemi embedded - i parte (2011-2012).pdf

313
Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 1: Introduction Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 2 Outline Embedded systems overview What are they? Design challenge – optimizing design metrics • Technologies Processor technologies IC technologies Design technologies

Upload: balu-v

Post on 03-Sep-2015

18 views

Category:

Documents


9 download

TRANSCRIPT

  • Embedded Systems Design: A Unified Hardware/Software Introduction

    1

    Chapter 1: Introduction

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    2

    Outline

    Embedded systems overview What are they?

    Design challenge optimizing design metrics Technologies

    Processor technologies IC technologies Design technologies

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    3

    Embedded systems overview

    Computing systems are everywhere Most of us think of desktop computers

    PCs Laptops Mainframes Servers

    But theres another type of computing system Far more common...

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    4

    Embedded systems overview

    Embedded computing systems Computing systems embedded within

    electronic devices Hard to define. Nearly any computing

    system other than a desktop computer Billions of units produced yearly, versus

    millions of desktop units Perhaps 50 per household and per

    automobile

    Computers are in here...

    and here...

    and even here...

    Lots more of these, though they cost a lot

    less each.

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    5

    A short list of embedded systems

    And the list goes on and on

    Anti-lock brakesAuto-focus camerasAutomatic teller machinesAutomatic toll systemsAutomatic transmissionAvionic systemsBattery chargersCamcordersCell phonesCell-phone base stationsCordless phonesCruise controlCurbside check-in systemsDigital camerasDisk drivesElectronic card readersElectronic instrumentsElectronic toys/gamesFactory controlFax machinesFingerprint identifiersHome security systemsLife-support systemsMedical testing systems

    ModemsMPEG decodersNetwork cardsNetwork switches/routersOn-board navigationPagersPhotocopiersPoint-of-sale systemsPortable video gamesPrintersSatellite phonesScannersSmart ovens/dishwashersSpeech recognizersStereo systemsTeleconferencing systemsTelevisionsTemperature controllersTheft tracking systemsTV set-top boxesVCRs, DVD playersVideo game consolesVideo phonesWashers and dryers

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    6

    Some common characteristics of embedded systems

    Single-functioned Executes a single program, repeatedly

    Tightly-constrained Low cost, low power, small, fast, etc.

    Reactive and real-time Continually reacts to changes in the systems environment Must compute certain results in real-time without delay

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    7

    An embedded system example -- a digital camera

    Microcontroller

    CCD preprocessor Pixel coprocessorA2D

    D2A

    JPEG codec

    DMA controller

    Memory controller ISA bus interface UART LCD ctrl

    Display ctrl

    Multiplier/Accum

    Digital camera chip

    lens

    CCD

    Single-functioned -- always a digital camera Tightly-constrained -- Low cost, low power, small, fast Reactive and real-time -- only to a small extent

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    8

    Design challenge optimizing design metrics

    Obvious design goal: Construct an implementation with desired functionality

    Key design challenge: Simultaneously optimize numerous design metrics

    Design metric A measurable feature of a systems implementation Optimizing design metrics is a key challenge

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    9

    Design challenge optimizing design metrics

    Common metrics Unit cost: the monetary cost of manufacturing each copy of the system,

    excluding NRE cost

    NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of designing the system

    Size: the physical space required by the system Performance: the execution time or throughput of the system Power: the amount of power consumed by the system Flexibility: the ability to change the functionality of the system without

    incurring heavy NRE cost

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    10

    Design challenge optimizing design metrics

    Common metrics (continued) Time-to-prototype: the time needed to build a working version of the

    system

    Time-to-market: the time required to develop a system to the point that it can be released and sold to customers

    Maintainability: the ability to modify the system after its initial release Correctness, safety, many more

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    11

    Design metric competition -- improving one may worsen others

    Expertise with both software and hardware is needed to optimize design metrics Not just a hardware or

    software expert, as is common A designer must be

    comfortable with various technologies in order to choose the best for a given application and constraints

    SizePerformance

    Power

    NRE cost

    Microcontroller

    CCD preprocessor Pixel coprocessorA2D D2A

    JPEG codec

    DMA controller

    Memory controller ISA bus interface UART LCD ctrl

    Display ctrl

    Multiplier/Accum

    Digital camera chip

    lens

    CCD

    Hardware

    Software

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    12

    Time-to-market: a demanding design metric

    Time required to develop a product to the point it can be sold to customers

    Market window Period during which the

    product would have highest sales

    Average time-to-market constraint is about 8 months

    Delays can be costly

    Rev

    enue

    s ($)

    Time (months)

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    13

    Losses due to delayed market entry

    Simplified revenue model Product life = 2W, peak at W Time of market entry defines a

    triangle, representing market penetration

    Triangle area equals revenue

    Loss The difference between the on-

    time and delayed triangle areasOn-time Delayedentry entry

    Peak revenue

    Peak revenue from delayed entry

    Market rise Market fall

    W 2W

    Time

    D

    On-time

    Delayed

    Rev

    enue

    s ($)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    14

    Losses due to delayed market entry (cont.)

    Area = 1/2 * base * height On-time = 1/2 * 2W * W Delayed = 1/2 * (W-D+W)*(W-D)

    Percentage revenue loss = (D(3W-D)/2W2)*100%

    Try some examples

    On-time Delayedentry entry

    Peak revenue

    Peak revenue from delayed entry

    Market rise Market fall

    W 2W

    Time

    D

    On-time

    Delayed

    Rev

    enue

    s ($)

    Lifetime 2W=52 wks, delay D=4 wks (4*(3*26 4)/2*26^2) = 22% Lifetime 2W=52 wks, delay D=10 wks (10*(3*26 10)/2*26^2) = 50% Delays are costly!

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    15

    NRE and unit cost metrics

    Costs: Unit cost: the monetary cost of manufacturing each copy of the system,

    excluding NRE cost NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of

    designing the system total cost = NRE cost + unit cost * # of units per-product cost = total cost / # of units

    = (NRE cost / # of units) + unit cost Example

    NRE=$2000, unit=$100 For 10 units

    total cost = $2000 + 10*$100 = $3000 per-product cost = $2000/10 + $100 = $300

    Amortizing NRE cost over the units results in an additional $200 per unit

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    16

    NRE and unit cost metrics

    $0

    $40,000

    $80,000

    $120,000

    $160,000

    $200,000

    0 800 1600 2400

    ABC

    $0

    $40

    $80

    $120

    $160

    $200

    0 800 1600 2400

    Number of units (volume)

    ABC

    Number of units (volume)

    tota

    l cos

    t (x1

    000)

    per

    pro

    duc

    t cos

    t

    Compare technologies by costs -- best depends on quantity Technology A: NRE=$2,000, unit=$100 Technology B: NRE=$30,000, unit=$30 Technology C: NRE=$100,000, unit=$2

    But, must also consider time-to-market

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    17

    The performance design metric

    Widely-used measure of system, widely-abused Clock frequency, instructions per second not good measures Digital camera example a user cares about how fast it processes images, not

    clock speed or instructions per second

    Latency (response time) Time between task start and end e.g., Cameras A and B process images in 0.25 seconds

    Throughput Tasks per second, e.g. Camera A processes 4 images per second Throughput can be more than latency seems to imply due to concurrency, e.g.

    Camera B may process 8 images per second (by capturing a new image while previous image is being stored).

    Speedup of B over A = Bs performance / As performance Throughput speedup = 8/4 = 2

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    18

    Three key embedded system technologies

    Technology A manner of accomplishing a task, especially using technical

    processes, methods, or knowledge Three key technologies for embedded systems

    Processor technology IC technology Design technology

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    19

    Processor technology

    The architecture of the computation engine used to implement a systems desired functionality

    Processor does not have to be programmable Processor not equal to general-purpose processor

    Application-specific

    Registers

    CustomALU

    DatapathController

    Program memory

    Assembly code for:

    total = 0for i =1 to

    Control logic and State register

    Datamemory

    IR PC

    Single-purpose (hardware)

    DatapathController

    Controllogic

    State register

    Datamemory

    index

    total

    +

    IR PC

    Registerfile

    GeneralALU

    DatapathController

    Program memory

    Assembly code for:

    total = 0for i =1 to

    Control logic and

    State register

    Datamemory

    General-purpose

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    20

    Processor technology

    Processors vary in their customization for the problem at hand

    total = 0for i = 1 to N loop total += M[i]end loop

    General-purpose processor

    Single-purpose processor

    Application-specific processor

    Desired functionality

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    21

    General-purpose processors

    Programmable device used in a variety of applications Also known as microprocessor

    Features Program memory General datapath with large register file and

    general ALU User benefits

    Low time-to-market and NRE costs High flexibility

    Pentium the most well-known, but there are hundreds of others

    IR PC

    Registerfile

    GeneralALU

    DatapathController

    Program memory

    Assembly code for:

    total = 0for i =1 to

    Control logic and

    State register

    Datamemory

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    22

    Single-purpose processors

    Digital circuit designed to execute exactly one program a.k.a. coprocessor, accelerator or peripheral

    Features Contains only the components needed to

    execute a single program No program memory

    Benefits Fast Low power Small size

    DatapathController

    Control logic

    State register

    Datamemory

    index

    total

    +

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    23

    Application-specific processors

    Programmable processor optimized for a particular class of applications having common characteristics Compromise between general-purpose and

    single-purpose processors

    Features Program memory Optimized datapath Special functional units

    Benefits Some flexibility, good performance, size and

    power

    IR PC

    Registers

    CustomALU

    DatapathController

    Program memory

    Assembly code for:

    total = 0for i =1 to

    Control logic and

    State register

    Datamemory

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    24

    IC technology

    The manner in which a digital (gate-level) implementation is mapped onto an IC IC: Integrated circuit, or chip IC technologies differ in their customization to a design ICs consist of numerous layers (perhaps 10 or more)

    IC technologies differ with respect to who builds each layer and when

    source drainchanneloxidegate

    Silicon substrate

    IC package IC

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    25

    IC technology

    Three types of IC technologies Full-custom/VLSI Semi-custom ASIC (gate array and standard cell) PLD (Programmable Logic Device)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    26

    Full-custom/VLSI

    All layers are optimized for an embedded systems particular digital implementation Placing transistors Sizing transistors Routing wires

    Benefits Excellent performance, small size, low power

    Drawbacks High NRE cost (e.g., $300k), long time-to-market

    Har

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    27

    Semi-custom

    Lower layers are fully or partially built Designers are left with routing of wires and maybe placing

    some blocks Benefits

    Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)

    Drawbacks Still require weeks to months to develop

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    28

    PLD (Programmable Logic Device)

    All layers already exist Designers can purchase an IC Connections on the IC are either created or destroyed to

    implement desired functionality Field-Programmable Gate Array (FPGA) very popular

    Benefits Low NRE costs, almost instant IC availability

    Drawbacks Bigger, expensive (perhaps $30 per unit), power hungry,

    slower

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    29

    Moores law

    The most important trend in embedded systems Predicted in 1965 by Intel co-founder Gordon MooreIC transistor capacity has doubled roughly every 18 months

    for the past several decades10,000

    1,000

    100

    10

    1

    0.1

    0.01

    0.001

    Logic transistors per chip

    (in millions)

    Note: logarithmic scale

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    30

    Moores law

    Wow This growth rate is hard to imagine, most people

    underestimate How many ancestors do you have from 20 generations ago

    i.e., roughly how many people alive in the 1500s did it take to make you?

    220 = more than 1 million people (This underestimation is the key to pyramid schemes!)

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    31

    Graphical illustration of Moores law

    1981 1984 1987 1990 1993 1996 1999 2002

    Leading edgechip in 1981

    10,000transistors

    Leading edgechip in 2002

    150,000,000transistors

    Something that doubles frequently grows more quickly than most people realize! A 2002 chip can hold about 15,000 1981 chips inside itself

    b

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    32

    Design Technology

    The manner in which we convert our concept of desired system functionality into an implementation

    Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level.

    Systemspecification

    Behavioralspecification

    RTspecification

    Logicspecification

    To final implementation

    Compilation/Synthesis:Automates exploration and insertion of implementation details for lower level.

    Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.

    Compilation/Synthesis

    Libraries/IP

    Test/Verification

    Systemsynthesis

    Behaviorsynthesis

    RTsynthesis

    Logicsynthesis

    Hw/Sw/OS

    Cores

    RTcomponents

    Gates/Cells

    Model simulat./checkers

    Hw-Swcosimulators

    HDL simulators

    Gatesimulators

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    33

    Design productivity exponential increase

    Exponential increase over the past few decades

    100,000

    10,000

    1,000

    100

    10

    1

    0.1

    0.01

    1983

    1987

    1989

    1991

    1993

    1985

    1995

    1997

    1999

    2001

    2003

    2005

    2007

    2009

    Prod

    uctiv

    ity(K

    ) Tra

    ns./S

    taff

    M

    o.

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    34

    The co-design ladder

    In the past: Hardware and software

    design technologies were very different

    Recent maturation of synthesis enables a unified view of hardware and software

    Hardware/software codesign Implementation

    Assembly instructions

    Machine instructions

    Register transfers

    Compilers(1960's,1970's)

    Assemblers, linkers(1950's, 1960's)

    Behavioral synthesis(1990's)

    RT synthesis(1980's, 1990's)

    Logic synthesis(1970's, 1980's)

    Microprocessor plus program bits: software

    VLSI, ASIC, or PLD implementation: hardware

    Logic gates

    Logic equations / FSM's

    Sequential program code (e.g., C, VHDL)

    The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no

    fundamental difference between what hardware or software can implement.

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    35

    Independence of processor and IC technologies

    Basic tradeoff General vs. custom With respect to processor technology or IC technology The two technologies are independent

    General-purpose

    processorASIP

    Single-purpose

    processor

    Semi-customPLD Full-custom

    General,providing improved:

    Customized, providing improved:

    Power efficiencyPerformance

    SizeCost (high volume)

    FlexibilityMaintainability

    NRE costTime- to-prototype

    Time-to-marketCost (low volume)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    36

    Design productivity gap

    While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity

    10,000

    1,000

    100

    10

    1

    0.1

    0.01

    0.001

    Logic transistors per chip

    (in millions)

    100,000

    10,000

    1000

    100

    10

    1

    0.1

    0.01

    Productivity(K) Trans./Staff-Mo.IC capacity

    productivity

    Gap

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    37

    Design productivity gap

    1981 leading edge chip required 100 designer months 10,000 transistors / 100 transistors/month

    2002 leading edge chip requires 30,000 designer months 150,000,000 / 5000 transistors/month

    Designer cost increase from $1M to $300M

    10,0001,000

    10010

    10.1

    0.010.001

    Logic transistors per chip

    (in millions)

    100,00010,00010001001010.10.01

    Productivity(K) Trans./Staff-Mo.IC capacity

    productivity

    Gap

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    38

    The mythical man-month

    The situation is even worse than the productivity gap indicates In theory, adding designers to team reduces project completion time In reality, productivity per designer decreases due to complexities of team management

    and communication In the software community, known as the mythical man-month (Brooks 1975) At some point, can actually lengthen project completion time! (Too many cooks)

    10 20 30 400

    100002000030000400005000060000

    43

    24

    1916 15 16

    18

    23

    Team

    Individual

    Months until completion

    Number of designers

    1M transistors, 1designer=5000 trans/month

    Each additional designer reduces for 100 trans/month

    So 2 designers produce 4900 trans/month each

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    39

    Summary

    Embedded systems are everywhere Key challenge: optimization of design metrics

    Design metrics compete with one another

    A unified view of hardware and software is necessary to improve productivity

    Three key technologies Processor: general-purpose, application-specific, single-purpose IC: Full-custom, semi-custom, PLD Design: Compilation/synthesis, libraries/IP, test/verification

    Embedded Systems Design: A Unified Hardware/Software Introduction

    1

    Chapter 10: IC Technology

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    2

    Outline

    Anatomy of integrated circuits

    Full-Custom (VLSI) IC Technology

    Semi-Custom (ASIC) IC Technology

    Programmable Logic Device (PLD) IC Technology

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    3

    CMOS transistor

    Source, Drain Diffusion area where electrons can flow Can be connected to metal contacts (vias)

    Gate Polysilicon area where control voltage is applied

    Oxide Si O2 Insulator so the gate voltage cant leak

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    4

    End of the Moores Law?

    Every dimension of the MOSFET has to scale (PMOS) Gate oxide has to scale down to

    Increase gate capacitance Reduce leakage current from S to D Pinch off current from source to drain

    Current gate oxide thickness is about 2.5-3nm Thats about 25 atoms!!!

    source drainoxidegate

    IC package IC channel

    Silicon substrate

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    5

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    6

    20Ghz +

    FinFET has been manufactured to 18nm Still acts as a very good transistor

    Simulation shown that it can be scaled to 10nm Quantum effect start to kick in

    Reduce mobility by ~10% Ballistic transport become significant

    Increase current by about ~20%

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    7

    NAND

    Metal layers for routing (~10) PMOS dont like 0 NMOS dont like 1 A stick diagram form the basis for mask sets

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    8

    Silicon manufacturing steps

    Tape out Send design to manufacturing

    Spin One time through the manufacturing process

    Photolithography Drawing patterns by using photoresist to form barriers for deposition

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    9

    Full Custom

    Very Large Scale Integration (VLSI) Placement

    Place and orient transistors Routing

    Connect transistors Sizing

    Make fat, fast wires or thin, slow wires May also need to size buffer

    Design Rules simple rules for correct circuit function

    Metal/metal spacing, min poly width

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    10

    Full Custom

    Best size, power, performance Hand design

    Horrible time-to-market/flexibility/NRE cost Reserve for the most important units in a processor

    ALU, Instruction fetch

    Physical design tools Less optimal, but faster

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    11

    Semi-Custom

    Gate Array Array of prefabricated gates place and route Higher density, faster time-to-market Does not integrate as well with full-custom

    Standard Cell A library of pre-designed cell Place and route Lower density, higher complexity Integrate great with full-custom

    d

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    12

    Semi-Custom

    Most popular design style

    Jack of all trade Good

    Power, time-to-market, performance, NRE cost, per-unit cost, area

    Master of none Integrate with full custom for

    critical regions of design

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    13

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    14

    Programmable Logic Device

    Programmable Logic Device Programmable Logic Array, Programmable Array Logic, Field Programmable

    Gate Array All layers already exist

    Designers can purchase an IC To implement desired functionality

    Connections on the IC are either created or destroyed to implement Benefits

    Very low NRE costs Great time to market

    Drawback High unit cost, bad for large volume Power

    Except special PLA slower

    800-6400 usable gates5-15 ns delay, up to 125 MHzFew $s price (2004)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    15

    Xilinx FPGA

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    16

    Configurable Logic Block (CLB)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    17

    I/O Block

  • Embedded Systems Design: A Unified Hardware/Software Introduction

    1

    Chapter 8: State Machine and Concurrent Process Model

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 2

    Outline

    Models vs. Languages State Machine Model

    FSM/FSMD HCFSM and Statecharts Language Program-State Machine (PSM) Model

    Concurrent Process Model Communication Synchronization Implementation

    Dataflow Model Real-Time Systems

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 3

    Describing embedded systems processing behavior Can be extremely difficult

    Complexity increasing with increasing IC capacity Past: washing machines, small games, etc.

    Hundreds of lines of code Today: TV set-top boxes, Cell phone, etc.

    Hundreds of thousands of lines of code Desired behavior often not fully understood in beginning

    Many implementation bugs due to description mistakes/omissions

    English (or other natural language) common starting point Precise description difficult to impossible Example: Motor Vehicle Code thousands of pages long...

    Introduction

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 4

    An example of trying to be precise in English

    California Vehicle Code Right-of-way of crosswalks

    21950. (a) The driver of a vehicle shall yield the right-of-way to a pedestrian crossing the roadway within any marked crosswalk or within any unmarked crosswalk at an intersection, except as otherwise provided in this chapter.

    (b) The provisions of this section shall not relieve a pedestrian from the duty of using due care for his or her safety. No pedestrian shall suddenly leave a curb or other place of safety and walk or run into the path of a vehicle which is so close as to constitute an immediate hazard. No pedestrian shall unnecessarily stop or delay traffic while in a marked or unmarked crosswalk.

    (c) The provisions of subdivision (b) shall not relieve a driver of a vehicle from the duty of exercising due care for the safety of any pedestrian within any marked crosswalk or within any unmarked crosswalk at an intersection.

    All that just for crossing the street (and theres much more)!

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 5

    Models and languages

    How can we (precisely) capture behavior? We may think of languages (C, C++), but computation model is the key

    Common computation models: Sequential program model

    Statements, rules for composing statements, semantics for executing them Communicating process model

    Multiple sequential programs running concurrently State machine model

    For control dominated systems, monitors control inputs, sets control outputs Dataflow model

    For data dominated systems, transforms input data streams into output streams Object-oriented model

    For breaking complex software into simpler, well-defined pieces

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 6

    Models vs. languages

    Computation models describe system behavior Conceptual notion, e.g., recipe, sequential program

    Languages capture models Concrete form, e.g., English, C

    Variety of languages can capture one model E.g., sequential program model C,C++, Java

    One language can capture variety of models E.g., C++ -oriented model, state machine model

    Certain languages better at capturing certain computation models

    Models

    Languages

    Recipe

    SpanishEnglish Japanese

    Poetry Story Sequent.program

    C++C Java

    Statemachine

    Data-flow

    Recipes vs. English Sequential programs vs. C

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 7

    Text versus Graphics

    Models versus languages not to be confused with text versus graphics Text and graphics are just two types of languages

    Text: letters, numbers Graphics: circles, arrows (plus some letters, numbers)

    X = 1;

    Y = X + 1;

    X = 1

    Y = X + 1

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 8

    Introductory example: An elevator controller

    Simple elevator controller Request Resolver

    resolves various floor requests into single requested floor

    Unit Control moves elevator to this requested floor

    Try capturing in C...

    Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Dont change directions unless there are no higher requests when moving up or no lower requests when moving down

    Partial English description

    buttonsinside

    elevator

    UnitControl

    b1

    down

    open

    floor

    ...

    RequestResolver

    ...

    up/downbuttons on

    eachfloor

    b2bN

    up1up2dn2

    dnN

    req

    up

    System interface

    up3dn3

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 9

    Elevator controller using a sequential program model

    Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Dont change directions unless there are no higher requests when moving up or no lower requests when moving down

    Partial English description

    buttonsinside

    elevator

    UnitControl

    b1

    down

    open

    floor

    ...

    RequestResolver

    ...

    up/downbuttons on

    eachfloor

    b2bN

    up1up2dn2

    dnN

    req

    up

    System interface

    up3dn3

    Sequential program model

    void UnitControl() { up = down = 0; open = 1; while (1) { while (req == floor); open = 0; if (req > floor) { up = 1;} else {down = 1;} while (req != floor); up = down = 0; open = 1; delay(10); }}

    void RequestResolver() { while (1) ... req = ... ...}void main() { Call concurrently: UnitControl() and RequestResolver()}

    Inputs: int floor; bit b1..bN; up1..upN-1; dn2..dnN;Outputs: bit up, down, open;Global variables: int req;

    You might have come up with something having even more if statements.

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 10

    Finite-state machine (FSM) model

    Trying to capture this behavior as sequential program is a bit awkward

    Instead, we might consider an FSM model, describing the system as: Possible states

    E.g., Idle, GoingUp, GoingDn, DoorOpen Possible transitions from one state to another based on input

    E.g., req > floor Actions that occur in each state

    E.g., In the GoingUp state, u,d,o,t = 1,0,0,0 (up = 1, down, open, and timer_start = 0)

    Try it...

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 11

    Finite-state machine (FSM) model

    Idle

    GoingUp

    req > floor

    req < floor

    !(req > floor)

    !(timer < 10)

    req < floor

    DoorOpen

    GoingDn

    req > floor

    u,d,o, t = 1,0,0,0

    u,d,o,t = 0,0,1,0

    u,d,o,t = 0,1,0,0

    u,d,o,t = 0,0,1,1

    u is up, d is down, o is open

    req == floor

    !(req

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 13

    Finite-state machine with datapath model (FSMD)

    FSMD extends FSM: complex data types and variables for storing data FSMs use only Boolean data types and operations, no variables

    FSMD: 7-tuple S is a set of states {s0, s1, , sl} I is a set of inputs {i0, i1, , im} O is a set of outputs {o0, o1, , on} V is a set of variables {v0, v1, , vn} F is a next-state function (S x I x V S) H is an action function (S O + V) s0 is an initial state

    I,O,V may represent complex data types (i.e., integers, floating point, etc.) F,H may include arithmetic operations H is an action function, not just an output function

    Describes variable updates as well as outputs Complete system state now consists of current state, si, and values of all variables

    Idle

    GoingUp

    req > floor

    req < floor

    !(req > floor)

    !(timer < 10)

    req < floor

    DoorOpen

    GoingDn

    req > floor

    u,d,o, t = 1,0,0,0

    u,d,o,t = 0,0,1,0

    u,d,o,t = 0,1,0,0

    u,d,o,t = 0,0,1,1

    u is up, d is down, o is open

    req == floor

    !(req floor

    !(req > floor) u,d,o, t = 1,0,0,0

    u,d,o,t = 0,0,1,0

    u,d,o,t = 0,1,0,0

    u,d,o,t = 0,0,1,1

    u is up, d is down, o is open

    req < floor

    req > floor

    req == floor

    req < floor

    !(req

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 15

    State machine vs. sequential program model

    Different thought process used with each model State machine:

    Encourages designer to think of all possible states and transitions among states based on all possible input conditions

    Sequential program model: Designed to transform data through series of instructions that may be iterated and

    conditionally executed State machine description excels in many cases

    More natural means of computing in those cases Not due to graphical representation (state diagram)

    Would still have same benefits if textual language used (i.e., state table) Besides, sequential program model could use graphical representation (i.e., flowchart)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 16

    Try Capturing Other Behaviors with an FSM

    E.g., Answering machine blinking light when there are messages

    E.g., A simple telephone answering machine that answers after 4 rings when activated

    E.g., A simple crosswalk traffic control light Others

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 17

    Capturing state machines in sequential programming language

    Despite benefits of state machine model, most popular development tools use sequential programming language

    C, C++, Java, Ada, VHDL, Verilog, etc. Development tools are complex and expensive, therefore not easy to adapt or replace

    Must protect investment

    Two approaches to capturing state machine model with sequential programming language

    Front-end tool approach Additional tool installed to support state machine language

    Graphical and/or textual state machine languages May support graphical simulation Automatically generate code in sequential programming language that is input to main development tool

    Drawback: must support additional tool (licensing costs, upgrades, training, etc.) Language subset approach

    Most common approach...

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 18

    Language subset approach

    Follow rules (template) for capturing state machine constructs in equivalent sequential language constructs

    Used with software (e.g.,C) and hardware languages (e.g.,VHDL)

    Capturing UnitControl state machine in C

    Enumerate all states (#define) Declare state variable initialized to

    initial state (IDLE) Single switch statement branches to

    current states case Each case has actions

    up, down, open, timer_start Each case checks transition conditions

    to determine next state if() {state = ;}

    #define IDLE0#define GOINGUP1#define GOINGDN2#define DOOROPEN3void UnitControl() {

    int state = IDLE;while (1) {

    switch (state) { IDLE: up=0; down=0; open=1; timer_start=0; if (req==floor) {state = IDLE;} if (req > floor) {state = GOINGUP;} if (req < floor) {state = GOINGDN;} break; GOINGUP: up=1; down=0; open=0; timer_start=0; if (req > floor) {state = GOINGUP;} if (!(req>floor)) {state = DOOROPEN;} break; GOINGDN: up=1; down=0; open=0; timer_start=0; if (req < floor) {state = GOINGDN;} if (!(req

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 19

    General template

    #define S0 0#define S1 1...#define SN Nvoid StateMachine() {

    int state = S0; // or whatever is the initial state.while (1) {

    switch (state) {S0:

    // Insert S0s actions here & Insert transitions Ti leaving S0:if( T0s condition is true ) {state = T0s next state; /*actions*/ }if( T1s condition is true ) {state = T1s next state; /*actions*/ }...if( Tms condition is true ) {state = Tms next state; /*actions*/ }break;

    S1:// Insert S1s actions here// Insert transitions Ti leaving S1break;

    ...SN:

    // Insert SNs actions here// Insert transitions Ti leaving SNbreak;

    }}

    }

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 20

    HCFSM and the Statecharts language

    Hierarchical/concurrent state machine model (HCFSM)

    Extension to state machine model to support hierarchy and concurrency

    States can be decomposed into another state machine

    With hierarchy has identical functionality as Without hierarchy, but has one less transition (z)

    Known as OR-decomposition States can execute concurrently

    Known as AND-decomposition

    Statecharts Graphical language to capture HCFSM timeout: transition with time limit as condition history: remember last substate OR-decomposed

    state A was in before transitioning to another state B Return to saved substate of A when returning from B

    instead of initial state

    A1 z

    B

    A2 z

    x y w

    Without hierarchy

    A1 z

    B

    A2

    x y

    A

    w

    With hierarchy

    C1

    C2

    x y

    CB

    D1

    D2

    u v

    D

    Concurrency

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 21

    UnitControl with FireMode

    FireMode When fire is true, move elevator

    to 1st floor and open door

    Without hierarchy

    Idle

    GoingUp

    req>floor

    reqfloor)

    timeout(10)

    reqfloor

    u,d,o = 1,0,0

    u,d,o = 0,0,1

    u,d,o = 0,1,0

    req==floor!(req1

    u,d,o = 0,1,0

    u,d,o = 0,0,1

    !fire

    FireDrOpenfloor==1

    fire

    u,d,o = 0,0,1

    UnitControl

    fire!fire FireGoingDn

    floor>1

    u,d,o = 0,1,0

    FireDrOpenfloor==1

    fire

    FireMode

    u,d,o = 0,0,1

    With hierarchy

    Idle

    GoingUp

    req>floor

    reqfloor)

    timeout(10)

    reqfloor

    u,d,o = 1,0,0

    u,d,o = 0,0,1

    u,d,o = 0,1,0

    req==floor!(req>floor)

    u,d,o = 0,0,1

    NormalModeUnitControl

    NormalMode

    FireMode

    fire!fire

    UnitControl

    ElevatorController

    RequestResolver

    ...

    With concurrent RequestResolver

    w/o hierarchy: Getting messy! w/ hierarchy: Simple!

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 22

    Program-state machine model (PSM): HCFSM plus sequential program model

    Program-states actions can be FSM or sequential program

    Designer can choose most appropriate Stricter hierarchy than HCFSM used in

    Statecharts transition between sibling states only, single entry Program-state may complete

    Reaches end of sequential program code, OR FSM transition to special complete substate PSM has 2 types of transitions

    Transition-immediately (TI): taken regardless of source program-state

    Transition-on-completion (TOC): taken only if condition is true AND source program-state is complete

    SpecCharts: extension of VHDL to capture PSM model

    SpecC: extension of C to capture PSM model

    up = down = 0; open = 1; while (1) { while (req == floor); open = 0; if (req > floor) { up = 1;} else {down = 1;} while (req != floor); open = 1; delay(10); } }

    NormalMode

    FireModeup = 0; down = 1; open = 0;while (floor > 1);up = 0; down = 0; open = 1;

    fire!fire

    UnitControl

    ElevatorController

    RequestResolver

    ...req = ...

    ...

    int req;

    NormalMode and FireMode described as sequential programs

    Black square originating within FireModeindicates !fire is a TOC transition

    Transition from FireMode to NormalModeonly after FireMode completed

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 23

    Role of appropriate model and language

    Finding appropriate model to capture embedded system is an important step Model shapes the way we think of the system

    Originally thought of sequence of actions, wrote sequential program First wait for requested floor to differ from target floor Then, we close the door Then, we move up or down to the desired floor Then, we open the door Then, we repeat this sequence

    To create state machine, we thought in terms of states and transitions among states When system must react to changing inputs, state machine might be best model

    HCFSM described FireMode easily, clearly

    Language should capture model easily Ideally should have features that directly capture constructs of model FireMode would be very complex in sequential program

    Checks inserted throughout code Other factors may force choice of different model

    Structured techniques can be used instead E.g., Template for state machine capture in sequential program language

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 24

    Concurrent process model

    Describes functionality of system in terms of two or more concurrently executing subtasks

    Many systems easier to describe with concurrent process model because inherently multitasking

    E.g., simple example: Read two numbers X and Y Display Hello world. every X seconds Display How are you? every Y seconds

    More effort would be required with sequential program or state machine model

    Subroutine execution over time

    time

    ReadX ReadYPrintHelloWorld

    PrintHowAreYou

    Simple concurrent process example

    ConcurrentProcessExample() {x = ReadX()y = ReadY()Call concurrently:PrintHelloWorld(x) and PrintHowAreYou(y)

    }PrintHelloWorld(x) {

    while( 1 ) {print "Hello world."delay(x);

    }}PrintHowAreYou(x) {

    while( 1 ) {print "How are you?" delay(y);

    }}

    Sample input and output

    Enter X: 1Enter Y: 2Hello world. (Time = 1 s)Hello world. (Time = 2 s)How are you? (Time = 2 s)Hello world. (Time = 3 s)How are you? (Time = 4 s)Hello world. (Time = 4 s)...

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 25

    Dataflow model

    Derivative of concurrent process model Nodes represent transformations

    May execute concurrently Edges represent flow of tokens (data) from one node to another

    May or may not have token at any given time When all of nodes input edges have at least one token, node may

    fire When node fires, it consumes input tokens processes

    transformation and generates output token Nodes may fire simultaneously Several commercial tools support graphical languages for capture

    of dataflow model Can automatically translate to concurrent process model for

    implementation Each node becomes a process

    modulate convolve

    transform

    A B C D

    Z

    Nodes with more complex transformations

    t1 t2

    +

    *

    A B C D

    Z

    Nodes with arithmetic transformations

    t1 t2

    Z = (A + B) * (C - D)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 26

    Synchronous dataflow

    With digital signal-processors (DSPs), data flows at fixed rate Multiple tokens consumed and produced per firing Synchronous dataflow model takes advantage of this

    Each edge labeled with number of tokens consumed/produced each firing

    Can statically schedule nodes, so can easily use sequential program model

    Dont need real-time operating system and its overhead How would you map this model to a sequential programming

    language? Try it... Algorithms developed for scheduling nodes into single-

    appearance schedules Only one statement needed to call each nodes associated

    procedure Allows procedure inlining without code explosion, thus reducing

    overhead even more

    modulate convolve

    transform

    A B C D

    Z

    Synchronous dataflow

    mt1 ct2

    mA mB mC mD

    tZ

    tt1 tt2

    t1 t2

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 27

    Concurrent processes and real-time systems

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 28

    Concurrent processes

    Consider two examples having separate tasks running independently but sharing data

    Difficult to write system using sequential program model

    Concurrent process model easier

    Separate sequential programs (processes) for each task

    Programs communicate with each other

    Heartbeat Monitoring System

    B[1..4]

    Heart-beat pulse

    Task 1:Read pulseIf pulse < Lo then Activate SirenIf pulse > Hi then Activate SirenSleep 1 secondRepeat

    Task 2:If B1/B2 pressed then Lo = Lo +/ 1If B3/B4 pressed then Hi = Hi +/ 1Sleep 500 msRepeat

    Set-top Box

    Input Signal

    Task 1:Read SignalSeparate Audio/VideoSend Audio to Task 2Send Video to Task 3Repeat

    Task 2:Wait on Task 1Decode/output AudioRepeat

    Task 3:Wait on Task 1Decode/output VideoRepeat

    Video

    Audio

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 29

    Process

    A sequential program, typically an infinite loop Executes concurrently with other processes We are about to enter the world of concurrent programming

    Basic operations on processes Create and terminate

    Create is like a procedure call but caller doesnt wait Created process can itself create new processes

    Terminate kills a process, destroying all data In HelloWord/HowAreYou example, we only created processes

    Suspend and resume Suspend puts a process on hold, saving state for later execution Resume starts the process again where it left off

    Join A process suspends until a particular child process finishes execution

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 30

    Communication among processes

    Processes need to communicate data and signals to solve their computation problem Processes that dont communicate are just

    independent programs solving separate problems Basic example: producer/consumer

    Process A produces data items, Process B consumes them

    E.g., A decodes video packets, B display decoded packets on a screen

    How do we achieve this communication? Two basic methods

    Shared memory Message passing

    processA() {// Decode packet// Communicate packet

    to B}

    }

    void processB() {// Get packet from A// Display packet

    }

    Encoded video packets

    Decoded video packets

    To display

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 31

    Shared Memory

    Processes read and write shared variables No time overhead, easy to implement But, hard to use mistakes are common

    Example: Producer/consumer with a mistake Share buffer[N], count

    count = # of valid data items in buffer

    processA produces data items and stores in buffer If buffer is full, must wait

    processB consumes data items from buffer If buffer is empty, must wait

    Error when both processes try to update count concurrently (lines 10 and 19) and the following execution sequence occurs. Say count is 3.

    A loads count (count = 3) from memory into register R1 (R1 = 3) A increments R1 (R1 = 4) B loads count (count = 3) from memory into register R2 (R2 = 3) B decrements R2 (R2 = 2) A stores R1 back to count in memory (count = 4) B stores R2 back to count in memory (count = 2)

    count now has incorrect value of 2

    mb

    01: data_type buffer[N];02: int count = 0;03: void processA() {04: int i;05: while( 1 ) {06: produce(&data);07: while( count == N );/*loop*/08: buffer[i] = data;09: i = (i + 1) % N;10: count = count + 1;11: }12: }13: void processB() {14: int i;15: while( 1 ) {16: while( count == 0 );/*loop*/17: data = buffer[i];18: i = (i + 1) % N;19: count = count - 1;20: consume(&data);21: }22: }23: void main() {24: create_process(processA); 25: create_process(processB);26: }

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 32

    Message Passing

    Message passing Data explicitly sent from one process to

    another Sending process performs special operation,

    send Receiving process must perform special

    operation, receive, to receive the data Both operations must explicitly specify which

    process it is sending to or receiving from Receive is blocking, send may or may not be

    blocking Safer model, but less flexible

    void processA() {while( 1 ) {

    produce(&data)send(B, &data);/* region 1 */receive(B, &data);consume(&data);

    }}

    void processB() {while( 1 ) {receive(A, &data);transform(&data)send(A, &data);/* region 2 */

    }}

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 33

    Back to Shared Memory: Mutual Exclusion

    Certain sections of code should not be performed concurrently Critical section

    Possibly noncontiguous section of code where simultaneous updates, by multiple processes to a shared memory location, can occur

    When a process enters the critical section, all other processes must be locked out until it leaves the critical section

    Mutex A shared object used for locking and unlocking segment of shared data Disallows read/write access to memory it guards Multiple processes can perform lock operation simultaneously, but only one process

    will acquire lock All other processes trying to obtain lock will be put in blocked state until unlock

    operation performed by acquiring process when it exits critical section These processes will then be placed in runnable state and will compete for lock again

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 34

    Correct Shared Memory Solution to the Consumer-Producer Problem

    The primitive mutex is used to ensure critical sections are executed in mutual exclusion of each other

    Following the same execution sequence as before: A/B execute lock operation on count_mutex Either A or B will acquire lock

    Say B acquires it A will be put in blocked state

    B loads count (count = 3) from memory into register R2 (R2 = 3)

    B decrements R2 (R2 = 2) B stores R2 back to count in memory (count = 2) B executes unlock operation

    A is placed in runnable state again A loads count (count = 2) from memory into register R1 (R1

    = 2) A increments R1 (R1 = 3) A stores R1 back to count in memory (count = 3)

    Count now has correct value of 3

    01: data_type buffer[N];02: int count = 0;03: mutex count_mutex;04: void processA() {05: int i;06: while( 1 ) {07: produce(&data);08: while( count == N );/*loop*/09: buffer[i] = data;10: i = (i + 1) % N;11: count_mutex.lock();12: count = count + 1;13: count_mutex.unlock();14: }15: }16: void processB() {17: int i;18: while( 1 ) {19: while( count == 0 );/*loop*/20: data = buffer[i];21: i = (i + 1) % N;22: count_mutex.lock();23: count = count - 1;24: count_mutex.unlock();25: consume(&data);26: }27: }28: void main() {29: create_process(processA); 30: create_process(processB);31: }

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 35

    Process Communication

    Try modeling req value of our elevator controller Using shared memory Using shared memory and mutexes Using message passing buttonsinside

    elevator

    UnitControl

    b1

    down

    open

    floor

    ...

    RequestResolver

    ...

    up/downbuttons on

    eachfloor

    b2bN

    up1up2dn2

    dnN

    req

    up

    System interface

    up3dn3

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 36

    A Common Problem in Concurrent Programming: Deadlock

    Deadlock: A condition where 2 or more processes are blocked waiting for the other to unlock critical sections of code

    Both processes are then in blocked state Cannot execute unlock operation so will wait forever

    Example code has 2 different critical sections of code that can be accessed simultaneously

    2 locks needed (mutex1, mutex2) Following execution sequence produces deadlock

    A executes lock operation on mutex1 (and acquires it) B executes lock operation on mutex2( and acquires it) A/B both execute in critical sections 1 and 2, respectively A executes lock operation on mutex2

    A blocked until B unlocks mutex2 B executes lock operation on mutex1

    B blocked until A unlocks mutex1 DEADLOCK!

    One deadlock elimination protocol requires locking of numbered mutexes in increasing order and two-phase locking (2PL)

    Acquire locks in 1st phase only, release locks in 2nd phase

    01: mutex mutex1, mutex2;02: void processA() {03: while( 1 ) {04: 05: mutex1.lock();06: /* critical section 1 */07: mutex2.lock();08: /* critical section 2 */09: mutex2.unlock();10: /* critical section 1 */11: mutex1.unlock();12: }13: }14: void processB() {15: while( 1 ) {16: 17: mutex2.lock();18: /* critical section 2 */19: mutex1.lock();20: /* critical section 1 */ 21: mutex1.unlock();22: /* critical section 2 */23: mutex2.unlock();24: }25: }

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 37

    Synchronization among processes

    Sometimes concurrently running processes must synchronize their execution When a process must wait for:

    another process to compute some value reach a known point in their execution signal some condition

    Recall producer-consumer problem processA must wait if buffer is full processB must wait if buffer is empty This is called busy-waiting

    Process executing loops instead of being blocked CPU time wasted

    More efficient methods Join operation, and blocking send and receive discussed earlier

    Both block the process so it doesnt waste CPU time Condition variables and monitors

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 38

    Condition variables

    Condition variable is an object that has 2 operations, signal and wait When process performs a wait on a condition variable, the process is blocked

    until another process performs a signal on the same condition variable How is this done?

    Process A acquires lock on a mutex Process A performs wait, passing this mutex

    Causes mutex to be unlocked Process B can now acquire lock on same mutex Process B enters critical section

    Computes some value and/or make condition true Process B performs signal when condition true

    Causes process A to implicitly reacquire mutex lock Process A becomes runnable

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 39

    Condition variable example:consumer-producer

    2 condition variables buffer_empty

    Signals at least 1 free location available in buffer buffer_full

    Signals at least 1 valid data item in buffer processA:

    produces data item acquires lock (cs_mutex) for critical section checks value of count if count = N, buffer is full

    performs wait operation on buffer_empty this releases the lock on cs_mutex allowing

    processB to enter critical section, consume data item and free location in buffer

    processB then performs signal if count < N, buffer is not full

    processA inserts data into buffer increments count signals processB making it runnable if it has

    performed a wait operation on buffer_full

    01: data_type buffer[N];02: int count = 0;03: mutex cs_mutex;04: condition buffer_empty, buffer_full;06: void processA() {07: int i;08: while( 1 ) {09: produce(&data);10: cs_mutex.lock();11: if( count == N ) buffer_empty.wait(cs_mutex);13: buffer[i] = data;14: i = (i + 1) % N;15: count = count + 1;16: cs_mutex.unlock();17: buffer_full.signal();18: }19: }20: void processB() {21: int i;22: while( 1 ) {23: cs_mutex.lock();24: if( count == 0 ) buffer_full.wait(cs_mutex);26: data = buffer[i];27: i = (i + 1) % N;28: count = count - 1;29: cs_mutex.unlock();30: buffer_empty.signal();31: consume(&data);32: }33: }34: void main() {35: create_process(processA); create_process(processB);37: }

    Consumer-producer using condition variables

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 40

    Monitors

    Collection of data and methods or subroutines that operate on data similar to an object-oriented paradigm

    Monitor guarantees only 1 process can execute inside monitor at a time

    (a) Process X executes while Process Y has to wait

    (b) Process X performs wait on a condition Process Y allowed to enter and execute

    (c) Process Y signals condition Process X waiting on Process Y blocked Process X allowed to continue executing

    (d) Process X finishes executing in monitor or waits on a condition again

    Process Y made runnable again

    Process X

    Monitor

    DATA

    CODE

    (a)

    Process Y

    Process X

    Monitor

    DATA

    CODE

    (b)

    Process Y

    Process X

    Monitor

    DATA

    CODE

    (c)

    Process Y

    Process X

    Monitor

    DATA

    CODE

    (d)

    Process Y

    Waiting

    Waiting

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 41

    Monitor example: consumer-producer

    Single monitor encapsulates both processes along with buffer and count

    One process will be allowed to begin executing first

    If processB allowed to execute first Will execute until it finds count = 0 Will perform wait on buffer_full condition

    variable processA now allowed to enter monitor and

    execute processA produces data item finds count < N so writes to buffer and

    increments count processA performs signal on buffer_full

    condition variable processA blocked processB reenters monitor and continues

    execution, consumes data, etc.

    01: Monitor {02: data_type buffer[N];03: int count = 0;04: condition buffer_full, condition buffer_empty;06: void processA() {07: int i;08: while( 1 ) {09: produce(&data);10: if( count == N ) buffer_empty.wait();12: buffer[i] = data;13: i = (i + 1) % N;14: count = count + 1;15: buffer_full.signal();16: }17: }18: void processB() {19: int i;20: while( 1 ) {21: if( count == 0 ) buffer_full.wait();23: data = buffer[i];24: i = (i + 1) % N;25: count = count - 1;26: buffer_empty.signal();27: consume(&data);28: buffer_full.signal();29: }30: }31: } /* end monitor */32: void main() {33: create_process(processA); create_process(processB);35: }

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 42

    Implementation

    Mapping of systems functionality onto hardware processors:

    captured using computational model(s)

    written in some language(s) Implementation choice independent

    from language(s) choice Implementation choice based on

    power, size, performance, timing and cost requirements

    Final implementation tested for feasibility

    Also serves as blueprint/prototype for mass manufacturing of final product

    The choice of computational

    model(s) is based on whether it

    allows the designer to describe the

    system.

    The choice of language(s) is

    based on whether it captures the computational

    model(s) used by the designer.

    The choice of implementation is

    based on whether it meets power, size, performance and

    cost requirements.

    Sequent.program

    Statemachine

    Data-flow

    Concurrent processes

    C/C++Pascal Java VHDL

    Implementation A Implementation B

    Implementation C

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 43

    Concurrent process model: implementation

    Can use single and/or general-purpose processors (a) Multiple processors, each executing one process

    True multitasking (parallel processing) General-purpose processors

    Use programming language like C and compile to instructions of processor

    Expensive and in most cases not necessary Custom single-purpose processors

    More common

    (b) One general-purpose processor running all processes

    Most processes dont use 100% of processor time Can share processor time and still achieve necessary

    execution rates (c) Combination of (a) and (b)

    Multiple processes run on one general-purpose processor while one or more processes run on own single_purpose processor

    Process1Process2

    Process3Process4

    Processor A

    Processor B

    Processor C

    Processor D Com

    mun

    icat

    ion

    Bus

    (a)

    (b)

    Process1Process2

    Process3Process4

    General Purpose Processor

    Process1Process2

    Process3Process4

    Processor A

    General Purpose

    Processor

    Com

    mun

    icat

    ion

    Bus

    (c)

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 44

    Implementation: multiple processes sharing single processor

    Can manually rewrite processes as a single sequential program Ok for simple examples, but extremely difficult for complex examples Automated techniques have evolved but not common E.g., simple Hello World concurrent program from before would look like:

    I = 1; T = 0;while (1) {

    Delay(I); T = T + 1;if X modulo T is 0 then call PrintHelloWorldif Y modulo T is 0 then call PrintHowAreYou

    }

    Can use multitasking operating system Much more common Operating system schedules processes, allocates storage, and interfaces to peripherals, etc. Real-time operating system (RTOS) can guarantee execution rate constraints are met Describe concurrent processes with languages having built-in processes (Java, Ada, etc.) or a sequential

    programming language with library support for concurrent processes (C, C++, etc. using POSIX threads for example)

    Can convert processes to sequential program with process scheduling right in code Less overhead (no operating system) More complex/harder to maintain

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 45

    Processes vs. threads

    Different meanings when operating system terminology Regular processes

    Heavyweight process Own virtual address space (stack, data, code) System resources (e.g., open files)

    Threads Lightweight process Subprocess within process Only program counter, stack, and registers Shares address space, system resources with other threads

    Allows quicker communication between threads Small compared to heavyweight processes

    Can be created quickly Low cost switching between threads

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 46

    Implementation:suspending, resuming, and joining

    Multiple processes mapped to single-purpose processors Built into processors implementation Could be extra input signal that is asserted when process suspended Additional logic needed for determining process completion

    Extra output signals indicating process done

    Multiple processes mapped to single general-purpose processor Built into programming language or special multitasking library like POSIX Language or library may rely on operating system to handle

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 47

    Implementation: process scheduling

    Must meet timing requirements when multiple concurrent processes implemented on single general-purpose processor

    Not true multitasking Scheduler

    Special process that decides when and for how long each process is executed Implemented as preemptive or nonpreemptive scheduler Preemptive

    Determines how long a process executes before preempting to allow another process to execute

    Time quantum: predetermined amount of execution time preemptive scheduler allows each process (may be 10 to 100s of milliseconds long)

    Determines which process will be next to run Nonpreemptive

    Only determines which process is next after current process finishes execution

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 48

    Scheduling: priority

    Process with highest priority always selected first by scheduler Typically determined statically during creation and dynamically during

    execution FIFO

    Runnable processes added to end of FIFO as created or become runnable Front process removed from FIFO when time quantum of current process is up

    or process is blocked Priority queue

    Runnable processes again added as created or become runnable Process with highest priority chosen when new process needed If multiple processes with same highest priority value then selects from them

    using first-come first-served Called priority scheduling when nonpreemptive Called round-robin when preemptive

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 49

    Priority assignment

    Period of process Repeating time interval the process must complete one execution within

    E.g., period = 100 ms Process must execute once every 100 ms

    Usually determined by the description of the system E.g., refresh rate of display is 27 times/sec Period = 37 ms

    Execution deadline Amount of time process must be completed by after it has started

    E.g., execution time = 5 ms, deadline = 20 ms, period = 100 ms Process must complete execution within 20 ms after it has begun regardless of its period Process begins at start of period, runs for 4 ms then is preempted Process suspended for 14 ms, then runs for the remaining 1 ms Completed within 4 + 14 + 1 = 19 ms which meets deadline of 20 ms Without deadline process could be suspended for much longer

    Rate monotonic scheduling Processes with shorter periods have higher priority Typically used when execution deadline = period

    Deadline monotonic scheduling Processes with shorter deadlines have higher priority Typically used when execution deadline < period

    Process

    ABCDEF

    Period

    25 ms50 ms12 ms100 ms40 ms75 ms

    Priority

    536142

    Process

    GHIJKL

    Deadline

    17 ms50 ms32 ms10 ms140 ms32 ms

    Priority

    523614

    Rate monotonic

    Deadline monotonic

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 50

    Real-time systems

    Systems composed of 2 or more cooperating, concurrent processes with stringent execution time constraints

    E.g., set-top boxes have separate processes that read or decode video and/or sound concurrently and must decode 20 frames/sec for output to appear continuous

    Other examples with stringent time constraints are: digital cell phones navigation and process control systems assembly line monitoring systems multimedia and networking systems etc.

    Communication and synchronization between processes for these systems is critical

    Therefore, concurrent process model best suited for describing these systems

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 51

    Real-time operating systems (RTOS)

    Provide mechanisms, primitives, and guidelines for building real-time embedded systems Windows CE

    Built specifically for embedded systems and appliance market Scalable real-time 32-bit platform Supports Windows API Perfect for systems designed to interface with Internet Preemptive priority scheduling with 256 priority levels per process Kernel is 400 Kbytes

    QNX Real-time microkernel surrounded by optional processes (resource managers) that provide POSIX and

    UNIX compatibility Microkernels typically support only the most basic services Optional resource managers allow scalability from small ROM-based systems to huge multiprocessor systems

    connected by various networking and communication technologies Preemptive process scheduling using FIFO, round-robin, adaptive, or priority-driven scheduling 32 priority levels per process Microkernel < 10 Kbytes and complies with POSIX real-time standard

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis 52

    Summary

    Computation models are distinct from languages Sequential program model is popular

    Most common languages like C support it directly State machine models good for control

    Extensions like HCFSM provide additional power PSM combines state machines and sequential programs

    Concurrent process model for multi-task systems Communication and synchronization methods exist Scheduling is critical

    Dataflow model good for signal processing

  • Embedded Systems Design: A Unified Hardware/Software Introduction

    1

    Chapter 2: Custom single-purpose processors

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    2

    Outline

    Introduction Combinational logic Sequential logic Custom single-purpose processor design RT-level custom single-purpose processor design

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    3

    Introduction

    Processor Digital circuit that performs a

    computation tasks Controller and datapath General-purpose: variety of computation

    tasks Single-purpose: one particular

    computation task Custom single-purpose: non-standard

    task

    A custom single-purpose processor may be Fast, small, low power But, high NRE, longer time-to-market,

    less flexible

    Microcontroller

    CCD preprocessor

    Pixel coprocessorA2D

    D2A

    JPEG codec

    DMA controller

    Memory controller ISA bus interface UART LCD ctrl

    Display ctrl

    Multiplier/Accum

    Digital camera chip

    lens

    CCD

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    4

    CMOS transistor on silicon

    Transistor The basic electrical component in digital systems Acts as an on/off switch Voltage at gate controls whether current flows from

    source to drain Dont confuse this gate with a logic gate

    source drainoxidegate

    IC package IC channel

    Silicon substrate

    gate

    drain

    source

    Conductsif gate at 1

    1

    nMOS transistor

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    5

    CMOS transistor implementations

    Complementary Metal Oxide Semiconductor

    We refer to logic levels Typically 0 : 0V, 1 : 5V or less

    Two basic CMOS types nMOS conducts if gate at 1 pMOS conducts if gate at 0 Hence complementary

    Basic gates Inverter, NAND, NOR

    x F = x'

    1

    inverter

    0

    F = (xy)'

    x1

    x

    y

    y

    NAND gate

    0

    1

    F = (x+y)'

    x y

    x

    y

    NOR gate0

    gate

    drain

    source

    nMOS

    Conductsif gate at 1

    gate

    source

    drain

    pMOS

    Conductsif gate at 0

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    6

    Basic logic gates

    F = x yAND

    F = (x y)NAND

    F = x yXOR

    F = xDriver

    F = xInverter

    x F

    F = x + yOR

    F = (x+y)NOR

    x F

    x

    yF

    Fx

    y

    x

    yF

    xy

    Fxy F

    Fyxx

    0y0

    F0

    0 1 01 0 01 1 1

    x0

    y0

    F0

    0 1 11 0 11 1 1

    x0

    y0

    F0

    0 1 11 0 11 1 0

    x0

    y0

    F1

    0 1 01 0 01 1 1

    x0

    y0

    F1

    0 1 11 0 11 1 0

    x0

    y0

    F1

    0 1 01 0 01 1 0

    x F0 01 1

    x F0 11 0

    F = (x y)XNOR

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    7

    Combinational logic design

    A) Problem description

    y is 1 if a is to 1, or b and c are 1. z is 1 if b or c is to 1, but not both, or if all are 1.

    D) Minimized output equations

    000

    1

    01 11 100

    1

    0 1 0

    1 1 1

    abcy

    y = a + bc

    000

    1

    01 11 100

    0

    1 0 1

    1 1 1

    z

    z = ab + bc + bc

    abc

    C) Output equations

    y = a'bc + ab'c' + ab'c + abc' + abc

    z = a'b'c + a'bc' + ab'c + abc' + abc

    B) Truth table

    1 0 1 1 11 1 0 1 11 1 1 1 1

    0 0 1 0 10 1 0 0 10 1 1 1 01 0 0 1 0

    00 0 0 0

    Inputsa b c

    Outputsy z

    E) Logic Gates(random logic)

    abc

    y

    z

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    8

    Combinational components

    With enable input e all Os are 0 if e=0

    With carry-in input Ci

    sum = A + B + Ci

    May have status outputs carry, zero, etc.

    Multiplexor

    O =I0 if S=0..00I1 if S=0..01I(m-1) if S=1..11

    Decoder

    O0 =1 if I=0..00O1 =1 if I=0..01O(n-1) =1 if I=1..11

    Adder

    sum = A+B(first n bits)

    carry = (n+1)thbit of A+B

    Comparator

    less = 1 if AB

    ALU

    O = A op Bop determinedby S.

    n-bit, m x 1 Multiplexor

    O

    S0

    S(log m)

    n

    n

    I(m-1) I1 I0

    log n x n Decoder

    O1 O0O(n-1)

    I0I(log n -1)

    n-bitAdder

    nA B

    n

    sumcarry

    n-bitComparator

    n nA B

    less equal greater

    n bit, m function

    ALU

    n nA B

    S0

    S(log m)n

    O

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    9

    Sequential components

    (storage) Register

    Q = 0 if clear=1,I if load=1 and clock=1,Q(previous) otherwise.

    Counter

    Q = 0 if clear=1,Q(prev)+1 if count=1 and clock=1.

    clear

    n-bitRegister

    n

    n

    load

    I

    Q

    shift

    I Q

    n-bitShift register

    n-bitCountern

    Q

    Shift register

    Q = lsb- Content shifted- I stored in msb

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    10

    Sequential logic design

    A) Problem Description

    You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles

    0

    1 2

    3

    x=0

    x=1x=0

    x=0

    a=1 a=1

    a=1

    a=1

    a=0

    a=0

    a=0

    a=0

    B) State Diagram

    C) Implementation Model

    Combinational logic

    State register

    a x

    I0

    I0

    I1

    I1

    Q1 Q0

    D) State Table (Moore-type)

    1 0 1 1 11 1 0 1 11 1 1 0 0

    0 0 1 0 10 1 0 0 10 1 1 1 01 0 0 1 0

    00 0 0 0

    InputsQ1 Q0 a

    OutputsI1 I0

    1

    0

    0

    0

    x

    Given this implementation model Sequential logic design quickly reduces to

    combinational logic design

    gis

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    11

    Sequential logic design (cont.)

    00

    1

    Q1Q0I1

    I1 = Q1Q0a + Q1a + Q1Q0

    0 1

    1

    1

    010

    00 11 10a 01

    0

    0

    0

    1 0 1

    1

    00 01 11a1

    10I0 Q1Q0

    I0 = Q0a + Q0a0

    1

    0 0

    0

    1

    1

    0

    0

    00 01 11 10

    x = Q1Q0

    x

    0

    1

    0

    a

    Q1Q0

    E) Minimized Output Equations F) Combinational Logic

    (random logic)

    a

    Q1 Q0

    I0

    I1

    x

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    12

    Custom single-purpose processor basic model

    controller and datapath

    controller datapath

    externalcontrolinputs

    externalcontrol outputs

    externaldata

    inputs

    externaldata

    outputs

    datapathcontrolinputs

    datapathcontroloutputs

    a view inside the controller and datapath

    controller datapath

    stateregister

    next-stateand

    controllogic

    registers

    functionalunits

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    13

    Example: greatest common divisor

    GCD

    (a) black-box view

    x_i y_i

    d_o

    go_i

    0: int x, y;1: while (1) {2: while (!go_i) ;3: x = x_i; 4: y = y_i;5: while (x != y) {6: if (x < y) 7: y = y - x; else 8: x = x - y; }9: d_o = x; }

    (b) alg. specification

    y = y -x7: x = x - y8:

    6-J:

    x!=y

    5: !(x!=y)

    x

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    15

    Creating the datapath

    Create a register for any declared variable

    Create a functional unit for each arithmetic operation

    Connect the ports, registers and functional units Based on reads and writes Use multiplexors for

    multiple sources

    Create unique identifier for each datapath component

    control input and output

    y = y -x7: x = x - y8:

    6-J:

    x!=y

    5: !(x!=y)

    x

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    17

    Splitting into a controller and datapath

    y_sel = 1y_ld = 1

    7: x_sel = 1x_ld = 1

    8:

    6-J:

    x_neq_y=1

    5:x_neq_y=0

    x_lt_y=1 x_lt_y=0

    6:

    5-J:

    d_ld = 1

    1-J:

    9:

    x_sel = 0x_ld = 13:

    y_sel = 0y_ld = 14:

    1:1

    !1

    2:

    2-J:

    !go_i

    !(!go_i)

    go_i

    0000

    0001

    0010

    0011

    0100

    0101

    0110

    0111 10001001

    1010

    1011

    1100

    ControllerController implementation model

    y_selx_sel

    Combinational logic

    Q3 Q0

    State register

    go_i

    x_neq_yx_lt_y

    x_ldy_ld

    d_ld

    Q2 Q1

    I3 I0I2 I1

    subtractor subtractor7: y-x8: x-y5: x!=y 6: x

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    19

    Completing the GCD custom single-purpose processor design

    We finished the datapath We have a state table for

    the next state and control logic All thats left is

    combinational logic design

    This is not an optimized design, but we see the basic steps

    ard

    a view inside the controller and datapath

    controller datapath

    stateregister

    next-stateand

    controllogic

    registers

    functionalunits

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    20

    We often start with a state machine Rather than algorithm Cycle timing often too central

    to functionality

    Example Bus bridge that converts 4-bit

    bus to 8-bit bus Start with FSMD Known as register-transfer

    (RT) level Exercise: complete the design

    HH

    RT-level custom single-purpose processor design

    Prob

    lem

    Spe

    cific

    atio

    n

    BridgeA single-purpose processor that

    converts two 4-bit inputs, arriving one at a time over data_in along with a

    rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

    Sender

    data_in(4)

    rdy_in rdy_out

    data_out(8)

    Receiver

    clock

    FSM

    D

    WaitFirst4 RecFirst4Startdata_lo=data_in

    WaitSecond4

    rdy_in=1rdy_in=0

    RecFirst4End

    rdy_in=1

    RecSecond4Startdata_hi=data_in

    RecSecond4End

    rdy_in=1rdy_in=0rdy_in=1

    rdy_in=0

    Send8Startdata_out=data_hi

    & data_lordy_out=1

    Send8Endrdy_out=0

    Bridge

    rdy_in=0Inputsrdy_in: bit; data_in: bit[4];

    Outputsrdy_out: bit; data_out:bit[8]

    Variablesdata_lo, data_hi: bit[4];

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    21

    RT-level custom single-purpose processor design (cont)

    WaitFirst4 RecFirst4Startdata_lo_ld=1

    WaitSecond4

    rdy_in=1rdy_in=0

    RecFirst4End

    rdy_in=1

    RecSecond4Startdata_hi_ld=1

    RecSecond4End

    rdy_in=1rdy_in=0rdy_in=1

    rdy_in=0

    Send8Startdata_out_ld=1

    rdy_out=1

    Send8Endrdy_out=0

    (a) Controller

    rdy_in rdy_out

    data_lodata_hi

    data_in(4)

    (b) Datapathdata_outd

    ata_

    out_

    ldda

    ta_h

    i_ld

    data

    _lo_

    ld

    clk

    to a

    ll re

    gist

    ers

    data_out

    Bridge

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    22

    Optimizing custom single-purpose processors

    Optimization is the task of making design metric values the best possible

    Optimization opportunities original program FSMD datapath FSM

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    23

    Optimizing the original program

    Analyze program attributes and look for areas of possible improvement number of computations size of variable time and space complexity operations used

    multiplication and division very expensive

    Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    24

    Optimizing the original program (cont)

    0: int x, y;1: while (1) {2: while (!go_i) ;3: x = x_i; 4: y = y_i;5: while (x != y) {6: if (x < y) 7: y = y - x; else 8: x = x - y; }9: d_o = x; }

    0: int x, y, r;1: while (1) {2: while (!go_i) ;

    // x must be the larger number3: if (x_i >= y_i) {4: x=x_i; 5: y=y_i;

    }6: else {7: x=y_i; 8: y=x_i;

    }9: while (y != 0) {

    10: r = x % y;11: x = y; 12: y = r; }13: d_o = x; }

    original program optimized program

    replace the subtraction operation(s) with modulo

    operation in order to speed up program

    GCD(42, 8) - 9 iterations to complete the loop

    x and y values evaluated as follows : (42, 8), (43, 8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2).

    GCD(42,8) - 3 iterations to complete the loop

    x and y values evaluated as follows: (42, 8), (8,2), (2,0)

  • Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

    25

    Optimizing t