EE249 1 Scheduling

Post on 18-Dec-2015



What’s the problem?

Have some work to do

– know subtasks

Have limited resources

Have some constraints to meet

Want to optimize quality

Outline

Overview
– shop scheduling
– data-flow scheduling
– real-time scheduling
– OS scheduling

Real-time scheduling / RTOS generation
– scheduling
– communication

Data-flow scheduling
– pure
– Petri nets


Shop scheduling

Single job, executed once

finite and known amount of work

multiple resources of different kind

often minimize lateness

– could add release, precedence, deadlines, ...

SOLUTION: compute the schedule

APPLICATION: manufacturing


Data-flow scheduling

Single job, executed repeatedly

known amount of work

– simple subtasks

multi-processor

max. throughput, min. latency

SOLUTION: code generation

APPLICATION: signal processing


Data-flow scheduling variants

Work

– data dependent (BDF, FCPN)

Resources

– many different execution units (HLS)

Goal

– min. code, min. buffers, min. resources


Real-time scheduling

Fixed number of repeating jobs

each job has fixed work

– job is a sub-task

processor(s)

meet individual deadlines

SOLUTION: choose policy, let RTOS implement it

APPLICATION: real-time control


RT scheduling variants

Work

– sporadic or event-driven tasks,

– variable (data dependent) work

– coordination between tasks:

– mutual exclusion, precedence, …

Goal

– event loss, input or output correlation, freshness, soft deadlines, ...


OS scheduling

Variable number of random tasks

know nothing about sub-tasks

processor + other computer resources

progress of all tasks, average service time

SOLUTION: OS implements time-slicing

APPLICATION: computer systems

Outline

Overview
– shop scheduling
– data-flow scheduling
– real-time scheduling
– OS scheduling

Real-time scheduling / RTOS generation
– scheduling
– communication

Data-flow scheduling
– pure
– Petri nets


RTOS functions

Enable communication between software tasks, hardware and other system resources

Coordinate software tasks

– keep track which tasks are ready to execute

– decide which one to execute: scheduling


Outline

Implementing communication through events

Coordination:

– classic scheduling results

– reactive model of real-time systems

– conservative scheduling analysis

– priority assignment


The scheduling problem

Given:

– estimates on execution times of each task

– timing constraints

Find:

– an execution ordering of tasks that satisfies constraints

A schedule needs to be:

– constructed

– validated


Off-line vs. on-line scheduling

Plus side (of off-line):

– simpler

– lower overhead

– highly predictable

Minus side:

– bad service to urgent tasks

– independent of actual requests


Scheduling Algorithms

off-line (pre-run-time, static)

– round-robin, e.g.

– C1 C2 C3 C4 C1 C2 C3 C4 C1 C2 C3 C4 …

– static cyclic, e.g.

– C1 C2 C3 C2 C4 C1 C2 C3 C2 C4 C1 C2 …

on-line (run-time, dynamic)

– static priority

– dynamic priority

– preemptive or not

Static priority scheduling

Synthesis:

– priority assignment

– RMS [LL73]

Analysis:

– Audsley [91]


Rate Monotonic Scheduling

Liu-Layland [73] consider systems consisting of tasks:

– enabled periodically

– with fixed run time

– that should be executed before enabled again

– scheduled preemptively with statically assigned priorities


Rate Monotonic Scheduling

giving higher priority to tasks with shorter period (RMS) is optimal

– if any other static priority assignment can schedule the task set, then RMS can too

define utilization as the sum of Ei/Ti

any set of n tasks with utilization less than n(2^(1/n) − 1) is schedulable

for n = 2, 3, …: n(2^(1/n) − 1) = 0.83, 0.78, …, approaching ln(2) ≈ 0.69
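The Liu-Layland bound is easy to check mechanically; a minimal sketch (function names are illustrative):

```python
def rms_bound(n):
    # Liu-Layland utilization bound for n tasks: n * (2^(1/n) - 1)
    return n * (2 ** (1 / n) - 1)

def rms_schedulable(tasks):
    # tasks: list of (run time Ei, period Ti) pairs.
    # A sufficient (but not necessary) schedulability test under RMS.
    utilization = sum(e / t for e, t in tasks)
    return utilization <= rms_bound(len(tasks))
```

For example, rms_bound(2) ≈ 0.83, and the bound decreases toward ln(2) ≈ 0.69 as n grows.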


Static Priority Schedule Validation

Audsley [91]:

for a task in Liu-Layland’s model find its worst case execution time

(figure: timeline of task i within its period; preemptions by higher-priority tasks stretch its run time to its worst-case completion time, WCETi)


Audsley’s algorithm

let Ei's be run-times, Ti's periods

how much can task i be delayed by a higher-priority task k:

– each execution of k delays it by Ek

– while i is executing, k will be executed ceil(WCETi / Tk) times

WCETi = Ei + SUMk>i ceil(WCETi / Tk) * Ek


Solving implicit equation

iteration:

– WCETi,0 = Ei

– WCETi,n+1 = Ei + SUMk>i ceil(WCETi,n / Tk) * Ek

will converge if processor utilization is less than 1
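The iteration can be coded directly; a sketch assuming tasks are indexed with 0 as the highest priority (so the slide's "k > i" becomes k < i here):

```python
import math

def worst_case_completion(i, E, T):
    # E[k], T[k]: run time and period of task k; index 0 = highest priority.
    # Iterate WCET_{i,n+1} = Ei + sum over higher-priority k of
    # ceil(WCET_{i,n} / Tk) * Ek until it converges.
    w = E[i]
    while True:
        w_next = E[i] + sum(math.ceil(w / T[k]) * E[k] for k in range(i))
        if w_next == w:
            return w            # converged: worst-case completion time
        if w_next > T[i]:
            return None         # exceeds its period: not schedulable
        w = w_next
```

For the (illustrative) task set E = [1, 2, 3], T = [4, 6, 10], the lowest-priority task converges to 10, exactly its period.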


Dynamic priority

Earliest deadline first:

– at each moment schedule a task with the least time before next occurrence

LL have shown that for their model, EDF schedules any feasible set of tasks
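EDF is easy to exercise with a small discrete-time simulation (a sketch; the task parameters below are made up for illustration):

```python
def edf_simulate(tasks, horizon):
    # tasks: list of (period, run time); deadline = period (Liu-Layland model).
    # Returns True iff no job misses its deadline within the horizon.
    jobs = []  # each job: [absolute deadline, remaining work]
    for t in range(horizon):
        for period, wcet in tasks:
            if t % period == 0:
                jobs.append([t + period, wcet])
        if any(rem > 0 and d <= t for d, rem in jobs):
            return False        # a job reached its deadline unfinished
        pending = [j for j in jobs if j[1] > 0]
        if pending:
            min(pending, key=lambda j: j[0])[1] -= 1  # run the EDF job one unit
    return all(rem == 0 or d > horizon for d, rem in jobs)
```

A set with utilization 2/5 + 4/7 ≈ 0.97 ≤ 1 is EDF-schedulable; one with 1/2 + 2/3 > 1 is not.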


What’s wrong with LL model?

Liu-Layland model yields strong results but does not model reactivity well

Our model:

– models reactivity directly

– abstracts functionality

– allows efficient conservative schedule validation


Computation Model

System is a network of internal and external tasks

External tasks have minimum times between execution

Internal tasks have priorities and run times

(figure: task network; external tasks with minimum inter-execution times 20 and 10, internal tasks labelled with priority and run time: 1,2 5,2 3,2 2,1 4,1)


Computation Model

(figure: the same task network as on the previous slide)

External tasks execute at random, respecting the lower bound between executions

Execution of a task enables all its successors

Correct if no events are lost


Schedule Validation

To check correctness:

– check whether internal events can be lost: priority analysis

– check whether external events can be lost: bound WCETs


Validation for Internal Events

Simple: if priority of i is less than k, then (i,k) cannot be lost


More general: if fan-ins of i form a tree such that leaves have lower priority than non-leaves and k, then (i,k) cannot be lost


Validation for External Events

Compute a bound on the period of time a processor executes task of priority i or higher (i-busy period)

(figure: timeline alternating executions at priority ≥ i and priority < i; a maximal interval at priority ≥ i is an i-busy period)

( i-busy period ) > ( WCET i )


Bounding i-busy Period

i-busy period is bounded by:

– initial workload at priority level i or higher caused by execution of some task < i

– workload at priority level i or higher caused by execution of external tasks during the i-busy period

can find (by simulation) workload at priority level i or higher caused by execution of a single task

can bound the number of occurrences of external tasks in a given period

need to solve a fixed-point equation


System: Network of CFSMs

(figure: network of three CFSMs; transition labels include C=>G, C=>F, B=>C, F^(G==1), (A==0)=>B, C=>A and C=>B, over events A, B, C, F, G)


Implementations

CFSMs can be implemented:

– in hardware: HW-CFSMs

– in software: SW-CFSMs

– by built-in peripherals (e.g. timer): MP-CFSMs


Events: SW to SW

for every event, RTOS maintains

– global values

– local flags

(figure: CFSM1 executes emit x(3); the RTOS stores the value 3 globally and sets a local flag x for CFSM2 and CFSM3, each of which can later detect x)


Events: atomicity problems

TASK 1 can detect y AND NOT x, a combination that should never occur

to avoid this, detects must be atomic

TASK 1: detect x; detect y

TASK 2: emit x

TASK 3: if detect x then emit y


Events: SW to SW

for atomicity:

– the CFSM always reads from the frozen buffer

– others always write to the live buffer

– at the beginning of the CFSM's execution, the buffers are switched

(figure: CFSM with a frozen buffer and a live buffer)
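A minimal sketch of the frozen/live scheme (class and method names are illustrative, not the slides' RTOS API):

```python
class EventBuffer:
    # Writers always emit into the live set; the owning CFSM swaps the
    # buffers once when it starts executing and then detects only on the
    # frozen copy, so all detects within one run see a consistent snapshot.
    def __init__(self):
        self.live = set()
        self.frozen = set()

    def emit(self, event):
        self.live.add(event)            # other tasks write to live

    def start_execution(self):
        self.frozen = self.live         # freeze the pending events
        self.live = set()

    def detect(self, event):
        return event in self.frozen     # reads never race with writers
```

An emit arriving while the CFSM runs is seen only at its next execution, which is exactly the atomic-detect behavior required on the previous slide.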


Events: HW to SW

an event can be polled or can drive an interrupt

for polled events:

– allocate I/O port bits for value, occurrence and acknowledge flags

– generate the polling task that acknowledges and emits all polled events that have occurred

for events driving an interrupt:

– allocate I/O port bits for value,

– allocate an interrupt vector,

– create an interrupt service routine that emits an event


Events: interrupts

interrupt service routine:
{ emit x }

optional interrupt service routine:
{ emit x; execute SW-CFSM }

(figure: HW-CFSM raises an IRQ; the RTOS delivers event x to the SW-CFSM)


Events: SW to HW

allocate I/O port bits for value and occurrence flag

use existing ports or memory-mapped ports

write value to I/O port

create a pulse on occurrence flag


Events: SW to/from MP

every peripheral must have a library with

– init function (to be called at initialization time)

– deliver function for each input (to be called by emit)

– detect function for each output (to be called by the polling task)

– interrupt service routine (containing emit)


Coordination

consider a SW-CFSM ready to run whenever it has unconsumed input events

choose the next ready SW-CFSM to run:

– scheduling problem


Experiments

dashboard

– 6 tasks, 13 events

– 0.1s (8.6s to estimate run times)

shock absorber controller

– 48 tasks, 11 events

– 0.3s (880s to estimate run times)

PATHO RTOS

– orders of magnitude faster than timed automata

– scales linearly


Open Problems

Propagation of constraints from external I/O behavior to each CFSM

– probabilistic: Markov chains

– exact: FSM state traversal

Satisfaction of constraints within a single transition

(e.g., software-driven bus interface protocol)

Automatic choice of scheduling algorithm, based on performance estimation and constraints

Scheduling for verifiability

Outline

Overview
– shop scheduling
– data-flow scheduling
– real-time scheduling
– OS scheduling

Real-time scheduling / RTOS generation
– scheduling
– communication

Data-flow scheduling
– pure
– Petri nets


Data-flow scheduling

Functionality usually represented with a data-flow graph

Kahn's conditions allow scheduling freedom:

– if a computation is specified with actors (operators) and data dependencies, and

– every actor waits for data on all inputs before firing, and

– no data is lost,

– then the firing order doesn't matter


Data-flow graphs

Schedule: a firing order that respects data-flow constraints and returns the graph to initial state

(data-flow graph: actors A, B, C, D with execution times 1, 2, 3, 1)


Schedule implementation

Static scheduling (cyclic executive, round robin)

A, B, C, D are processes

RTOS schedules them repeatedly in order A D B C

simple, but context-switching overhead large

A schedule: A D B C

(data-flow graph: actors A, B, C, D with execution times 1, 2, 3, 1)


Schedule implementation

Code synthesis (OS generation)

A, B, C, D are subroutines

generate: forever{ call A; call D; call B; call C; }

less robust, better overhead

A schedule: A D B C

(data-flow graph: actors A, B, C, D with execution times 1, 2, 3, 1)


Schedule implementation

In-lined code synthesis

A, B, C, D are code fragments

generate: forever{A; D; B; C; }

even less robust, even better overhead

A schedule: A D B C

(data-flow graph: actors A, B, C, D with execution times 1, 2, 3, 1)
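The in-lined variant for the schedule A D B C might be sketched as follows (names are illustrative; a bounded loop stands in for "forever"):

```python
trace = []

# Code fragments for the four actors (stand-ins for real computations).
def A(): trace.append('A')
def B(): trace.append('B')
def C(): trace.append('C')
def D(): trace.append('D')

def run_inlined(iterations):
    # generate: forever { A; D; B; C; }  (bounded here for demonstration)
    for _ in range(iterations):
        A(); D(); B(); C()

run_inlined(2)
# trace is now A D B C A D B C: one schedule period per loop iteration
```

No RTOS and no call overhead remain, which is why this variant has the best overhead and the least robustness.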


Data-flow scheduling

Resources

fixed or arbitrary number of processors

Goal:

max. throughput given a fixed number of processors

min. processors to achieve required throughput


Data-flow scheduling goals

Max. throughput given a fixed number of processors

it is NP-hard to determine max. achievable throughput

Min. processors to achieve required throughput

if there are loops, then there is a fundamental upper bound on throughput

easy to compute


Throughput bound

throughput ≤ 1 / maxloops(Time/Delay)

(data-flow graph: A/1, B/2, C/3, D/1; the loop carries 2 delays)

the (N+2)-nd output of A can be computed at least 7 time units after the N-th
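The bound can be evaluated per loop; a sketch for the example graph (the loop's 2 delay tokens are inferred from the "N+2" statement above):

```python
def min_iteration_period(loops):
    # loops: list of (total execution time, total delay tokens) per cycle.
    # Throughput is bounded by 1 / max(time/delay); equivalently the
    # minimum achievable period per iteration is max(time/delay).
    return max(time / delay for time, delay in loops)

# Loop A -> B -> C -> D -> A: time 1 + 2 + 3 + 1 = 7, with 2 delays,
# so no schedule can average better than 3.5 time units per iteration.
bound = min_iteration_period([(7, 2)])
```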


Scheduling heuristics

Non-overlapped scheduling

Look at one iteration

Use list scheduling algorithm (developed for shop scheduling)

Overlapped scheduling

less developed


Inter-iteration constraints

Remove delayed edges

List scheduling:

– maintain list of tasks that could be scheduled

– schedule one with longest path

(data-flow graph: A/1, B/2, C/3, D/1, with delayed edges removed)


List scheduling

Assume two processors

(data-flow graph: A/1, B/2, C/3, D/1, annotated with longest-path lengths 3, 3)


List scheduling

(data-flow graph: A/1, B/2, C/3, D/1; Gantt chart: C and A scheduled first)


List scheduling

(data-flow graph: A/1, B/2, C/3, D/1; Gantt chart: C, A and D scheduled)


List scheduling

(data-flow graph: A/1, B/2, C/3, D/1; Gantt chart: P1 runs C then A, P2 runs D then B)
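A compact list-scheduling sketch; the dependency edge below (B after D) is an assumption chosen so the example reproduces a two-processor schedule like the one in the slides, since the slides' actual graph is only partly recoverable:

```python
from functools import lru_cache

def list_schedule(times, preds, num_procs):
    # times: {task: execution time}; preds: {task: set of predecessors}.
    # Greedy list scheduling: at each time step, start the ready task
    # with the longest path to a sink on any free processor.
    @lru_cache(maxsize=None)
    def longest_path(t):
        succs = [s for s in times if t in preds.get(s, ())]
        return times[t] + max((longest_path(s) for s in succs), default=0)

    clock, free = 0, list(range(num_procs))
    remaining, running = set(times), []      # running: (finish, task, proc)
    done_at, start = {}, {}
    while remaining or running:
        for f, t, p in [r for r in running if r[0] <= clock]:
            running.remove((f, t, p))        # retire finished tasks
            done_at[t] = f
            free.append(p)
        ready = sorted(
            (t for t in remaining
             if all(q in done_at for q in preds.get(t, ()))),
            key=longest_path, reverse=True)
        for t in ready[:len(free)]:          # fill the free processors
            p = free.pop(0)
            start[t] = clock
            running.append((clock + times[t], t, p))
            remaining.remove(t)
        clock += 1
    return start
```

On the hypothetical instance {A: 1, B: 2, C: 3, D: 1} with B depending on D and two processors, this starts C and D at time 0, B at time 1, and A at time 3.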


Inter-iteration constraints

Unfold k iterations (e.g. k = 2)

Do list scheduling

(data-flow graph: two unfolded iterations, actors A1, B1, C1, D1 and A2, B2, C2, D2 with times 1, 2, 3, 1)


List scheduling

The resulting schedule is rate-optimal here (not true in general)

(Gantt chart: both unfolded iterations scheduled across P1 and P2)


Static data flow

Loop scheduling

Code size

Buffer size

(SDF graph: A produces 20 tokens per firing, B consumes 10 and produces 20, C consumes 10)


Loop scheduling

A B C B C C C

A (2 B (2 C))

A (2 B) (4 C)

A (2 B C) (2 C)

(SDF graph: A produces 20 tokens per firing, B consumes 10 and produces 20, C consumes 10)


Loop scheduling and code size

A (2 B (2 C)) generates:

A;
for i = 1 … 2 {
  B;
  for i = 1 … 2 {
    C;
  }
}

single appearance schedules minimize in-lined code size

(SDF graph: A produces 20 tokens per firing, B consumes 10 and produces 20, C consumes 10)


Buffer size

Schedule             A→B buffer   B→C buffer
A B C B C C C            20           30
A (2 B (2 C))            20           20
A (2 B) (4 C)            20           40
A (2 B C) (2 C)          20           30

(SDF graph: A produces 20 tokens per firing, B consumes 10 and produces 20, C consumes 10)
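The buffer numbers can be reproduced by simulating token counts; the rates are reconstructed from the table itself (A produces 20 per firing, B consumes 10 and produces 20, C consumes 10):

```python
def max_buffer_occupancy(firings):
    # firings: a flat firing sequence, e.g. "ABCBCCC".
    # Returns the peak occupancy of the A->B and B->C buffers.
    produce = {'A': ('ab', 20), 'B': ('bc', 20)}
    consume = {'B': ('ab', 10), 'C': ('bc', 10)}
    level = {'ab': 0, 'bc': 0}
    peak = {'ab': 0, 'bc': 0}
    for actor in firings:
        if actor in consume:
            buf, n = consume[actor]
            assert level[buf] >= n, "fired an actor without enough input tokens"
            level[buf] -= n
        if actor in produce:
            buf, n = produce[actor]
            level[buf] += n
            peak[buf] = max(peak[buf], level[buf])
    return peak['ab'], peak['bc']
```

"ABCBCCC" gives (20, 30) and "ABCCBCC" (the flat form of A (2 B (2 C))) gives (20, 20), matching the table.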


Data-flow scheduling

Perfect design-time information

Fixed amount of repeating work

– data-independent

Input streams from the environment always available

Simple global constraints

data dependency => Petri nets

timing constraints => real-time scheduling


Other scheduling models

Problem: computation result may depend on dynamic schedule

Synchronous languages: no scheduler needed

(but inapplicable to HW/SW heterogeneous systems)

Data Flow networks: deterministic computation

(but blocking read is unsuitable for reactive systems)

Can we obtain determinism without losing efficiency?