
Instruction Cache Memory Issues in Real-Time Systems

Licentiate dissertation

Filip Sebek October 11th, 2002

Opponent: Axel Jantsch (KTH)

Examiner: Lars Wanhammar (LiTH)

2

Outline of this dissertation seminar

About this thesis (Lennart Lindh)

Thesis presentation (Filip Sebek)

Comments and questions (Axel Jantsch and Filip Sebek)

Questions from the audience

Consideration (Lars Wanhammar, Axel Jantsch, and Lennart Lindh)

Festivity (?) at the department

3

Organisation

RT Systems Design Lab

Comp. Architecture Lab

Computer Science Lab

Graduate Education

Lic school

Int’l MSc school

Undergraduate Education

4

Mohammed El Shobaki: System Monitoring/Debugging of S/Multiprocessor Systems

Tommy Klevin: Bus analyzer (RealFast)

Stefan Sjöberg: Design ASIC/FPGA with Top Down Design Flow and VHDL (RealFast ABB)

Joakim Persson: Redundant System (ProTang, KK)

Johan Stärner: Multiprocessor Architecture (KK)

Leif Enblom (ABB APR): Multiprocessor system for (ABB KK)

Filip Sebek: Instruction Cache Memory Issues in RTS

Stefan Stjernen: IP Design (RealFast, Industrial Research School: Electronic Design)

Raimo Haukilahti KTH/MDH: Low-Power Techniques for HW-RTOS (KTH)

5

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:

How do I measure the cache-related preemption delay in a real-time system?

Is a cache memory really a problem in real-time systems?

6

Automatic control – Real-time system

1. Get input – sample…

2. Compute – execute instructions

3. Actuate – control the process…

4. = Action!

A real-time system must produce correct results in time

Examples: an air bag in action, an armored tank firing while in motion, a supertanker turning, a toaster

7

Real-time system implementation

Often as many “small” cyclic programs – tasks or processes – that communicate with each other

Alarm task

Sample task

Computation

Actuate

8

What Real-Time research is about:

Predicting execution time (of a task)

Difficult – many parameters:

– Input data sensitive

– Program design

– Hardware dependent

– Compiler dependent

Several methods

Scheduling tasks

– Static or dynamic

– May allow preemption

9

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:

How do I measure the cache-related preemption delay in a real-time system?

Is a cache memory really a problem in real-time systems?

10

What is a cache memory?

Cache memories are faster than primary memory and keep pace with CPU speed

Reduce bus traffic and congestion

Save energy

With caches, instruction fetch time becomes variable: hit time versus miss penalty

[Figure: CPU and I/O, CACHE and MEM; fast accesses (~95%) are served by the cache, slow accesses (~5%) go to memory]

11

How does a cache memory work?

Cache hit and cache miss

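Locality

Temporal locality:

– memory references close in time

– loops and functions

Spatial locality:

– memory references close in space

– cache block and wide data bus

#define SIZE 16                        /* assumed value; SIZE is not given on the slide */

int funk(int term)
{
    int vector[SIZE] = {0};            /* initialized: the original read indeterminate values */
    int i, sum = 0;

    for (i = 0; i < SIZE; i++) {       /* loop body re-executed: temporal locality */
        vector[i] += term;             /* adjacent elements: spatial locality */
        sum += vector[i];
    }
    return sum;
}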

12

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:

How do I measure the cache-related preemption delay in a real-time system?

Is a cache memory really a problem in real-time systems?

13

Cache memories and real-time

Cache memories make execution time variable

Sample, execute, actuate – action! Sample, execute, actuate – action! Sample, execute, actuate – action!

Analysis is non-trivial: cache contents depend on the execution path, and the execution path depends on cache contents

Missed deadline?

14

Predicting cache behavior

Avoidance and simplifications: disable the cache! Specially designed processors and caches

Static analysis: + no probe effects, + safe overestimation, - difficult for modern hardware (Paper C)

Simulation: + simple, - the simulator must model the hardware correctly

Real measurement: + can measure complex systems, - probe effect (Papers A, B, D)

15

The title and the questions

Title: Instruction Cache Memory Issues in Real-Time Systems

Initial questions:

How do I measure the cache-related preemption delay in a real-time system?

Is a cache memory really a problem in real-time systems?

16

Measurement and probe effect

Most measurements affect the measured object when the probe is included in or removed from the measured environment. Examples:

A warm thermometer measures a glass of cold water

A computer monitoring system measures CPU load

Reduce the intrusion (probe effect) to a minimum!

17

Facts and Problems Solutions

18

The Built-in Performance Monitor

Exploit the performance monitor built into the CPU: 4 counter registers on the MPC750 that count events such as

– L1 instruction fetch misses

– branch misses

– processor clocks

– completed instructions

– completed loads/stores …

NON-INTRUSIVE!

19

SARA CPU Card

20

SARA MP-system and MAMon

21

My questions revised

Initial questions:

How do I measure the cache-related preemption delay in a real-time system?

Is a cache memory really a problem in real-time systems?

Modified questions:

Is there a simple(r) way to predict or measure cache misses in a real-time system?

Can an enabled instruction cache cause a missed deadline?

How large is the cache-related preemption delay, in absolute and relative terms?

22

Outline of this presentation

Introduction

The cache memory and real-time

Measurement and probe effect

CPX2000 – “SARA system”

My own questions

Synthetic code generation

Analysis

Determine worst-case cache miss-ratio of a program

Measure instruction execution time w/wo cache

Measure cache-related preemption delay

Conclusion and future work

23

Current state in presentation:

We have 3 questions!

We have an experimental system!

We can measure on it with a small intrusion!

Q: Measure on what program?

24

Code generation: size

Workbench: a standard benchmark (Rhealstone, EEMBC etc.)? We want to measure worst-case situations.

Synthetic code – size specific: one big loop of

addis r3,r3,0x0000 (= 4 bytes)

Not representative code – no problem! The aim is to swap out the cache contents and find the maximum cost.

– Code size measured in “cache size”

25

Code generation: miss-ratio

One method (out of several): “Play with spatial locality”

– Method: jump instructions break spatial locality

– Requirement: code size 2 × cache size

– Result: from 1/block size up to 100% cache misses

(m) = miss, (h) = hit, n.u. = not used; each row is one 4-word cache block

L1: nop (m) nop (h) nop (h) nop (h)
L2: nop (m) nop (h) nop (h) nop (h)
Miss-ratio: 25%

L1: J L2 (m) n.u. n.u. n.u.
L2: J L3 (m) n.u. n.u. n.u.
Miss-ratio: 100%

L1: nop (m) J L2 (h) n.u. n.u.
L2: nop (m) J L3 (h) n.u. n.u.
Miss-ratio: 50%

L1: nop (m) nop (h) J L2 (h) n.u.
L2: nop (m) nop (h) J L3 (h) n.u.
Miss-ratio: 33%

26

Analysis!

27

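1. Code interpretation: miss-ratio

Block size = 4 words (- = not executed):

i1, i2, i3, i4: miss, hit, hit, hit; local miss-ratio 1/4, 1/4, 1/4, 1/4

i5, beq 10, i7, i8: miss, hit, -, -; local miss-ratio 1/2, 1/2, -, -

i9, i10, i11, i12: -, miss, hit, hit; local miss-ratio -, 1/3, 1/3, 1/3

jmp 18, i14, i15, i16: miss, -, -, -; local miss-ratio 1/1, -, -, -

4/10 = 40% miss-ratio

Block size = 8 words:

i1, i2, i3, i4, i5, beq 10, i7, i8: miss, hit, hit, hit, hit, hit, -, -; local miss-ratio 1/6 for each executed instruction

i9, i10, i11, i12, jmp 18, i14, i15, i16: -, miss, hit, hit, hit, -, -, -; local miss-ratio 1/4 for each executed instruction

2/10 = 20% miss-ratio

(reversed process to generate code with a fixed miss-ratio)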

28

1. Code interpretation: miss-ratio

Line size = 4 words (- = not executed):

i1, i2, i3, i4: miss, hit, hit, hit; 1/4, 1/4, 1/4, 1/4

i5, beq 10, i7, i8: miss, hit, -, -; 1/2, 1/2, -, -

i9, i10, i11, i12: -, miss, hit, hit; -, 1/3, 1/3, 1/3

jmp 18, i14, i15, i16: miss, -, -, -; 1/1, -, -, -

miss, hit, hit, hit: 1/4, 1/4, 1/4, 1/4

miss, hit, hit, hit: 1/2, 1/2, 1/4, 1/4

miss, miss, hit, hit: 1/4, 1/3, 1/3, 1/3

miss, hit, hit, hit: 1/1, 1/4, 1/4, 1/4

(reversed process to generate code with a fixed miss-ratio)

29

1. Code interpretation: miss-ratio

Determine the worst-case cache miss-ratio (WCCMR): the highest frequency of misses possible for a program!

It depends on the execution path (and thus on the input data)

[Figure: execution paths with higher (> Miss%) and lower (< Miss%) miss-ratios]

The WCCMR path is the most energy-consuming path! Optimize for

– Speed or size

– Energy consumption

30

1. Key concepts in bounding the WCCMR

Spatial locality analysis: determine each instruction’s “local miss-ratio”

Search: find the execution path with the highest cache miss-ratio

Execution path analysis: determine the weight of each basic block (loop dependent)

31

1. Result (finding WCCMR)

Path  Miss ratio  # Instr  Execution time
1     20.6%       43       132
2     18.9%       37       107
3     19.7%       40       119
4     17.6%       34       94
5     21.6%       87       275   (max!!)
6     18.0%       43       121

Program fragment with paths (1) to (6):

...
if(a>b) {
    ...
    ...
    do{
        ...
    }while(c>d);
}
else {
    ...
    ...
    while(e<3){
        ...
    }
}
...

32

Outline of this presentation

Introduction

The cache memory and real-time

Measurement and probe effect

CPX2000 – “SARA system”

My own questions

Synthetic code generation

Analysis

Determine worst-case cache miss-ratio of a program

Measure instruction execution time w/wo cache

Measure cache-related preemption delay

Conclusion and future work

33

2. When is a cache memory beneficial?

On a cache miss, the complete cache block is loaded

If the cache block is larger than an instruction, loading it adds a miss-penalty

A cache can reduce system performance! This requires a high miss-ratio AND a long miss-penalty

Experiment: generate code with a fixed miss-ratio, measure the time, and plot the average execution time

34

2. Threshold miss-ratio level (@CPX2000)

[Figure: execution time (ns/instruction) versus cache miss-ratio (%) with the cache disabled and enabled; threshold level at 84%]

35

2. When is a cache memory beneficial?

Concluding question: “When is instruction caching beneficial?”

Answer: “Always” (!!): “No code is so jumpy”, “No missed deadlines”, “Safe!”

New Q&As: “Why 84% miss?” “Low refill penalty.” “Why?” “Burst refill!”

[Figure: CPU/I/O, CACHE and MEM. A request that misses (MISS!) makes the whole block refill from memory; a later request hits (HIT) in the cache]

36

Outline of this presentation

Introduction

The cache memory and real-time

Measurement and probe effect

CPX2000 – “SARA system”

My own questions

Synthetic code generation

Analysis

Determine worst-case cache miss-ratio of a program

Measure instruction execution time w/wo cache

Measure cache-related preemption delay

Conclusion and future work

37

3. Cache-Related Preemption Delay

Extrinsic cache behavior – task interference

Non-preemptive systems

Preemptive systems

– Cache-Related Preemption Delay (CRPD)

[Figure: miss-ratio over time. Non-preemptive: T1, then T2. Preemptive: T1, T2 preempts T1, T1 resumes]

38

3. CRPDmax measurement

[Figure: miss-ratio over time for T1, T2, T1 (T2 preempts T1, T1 resumes); T1 iterations labeled iteration 1, iteration 2, i3, i4 and i4 (cont.), divided into non-preempted and preempted]

39

3. CRPDmax measurement

CRPD = ((e - d) + (c - b)) - (b - a) = 195 500 ns = 195.5 µs

[Figure: measured timestamps for the non-preempted and the preempted iteration]

OS: 43-87 µs

40

3. CRPD (@CPX2000)

[Figure: CRPD (microseconds) versus T1 task size (% of cache size); annotated value 195.5 µs]

41

Conclusions and summary of results

1. The worst-case cache miss-ratio of a program can be identified and used to quantify the energy usage of the memory system.

2. The CPX2000 system cannot miss any deadline because of an enabled instruction cache.

3. Synthetic workbenches can force a system into a worst-case state.

– The cache-related preemption delay has been measured as a function of task size.

42

Future Work

None!

Develop the analysis method for worst-case cache miss-ratio levels to include temporal locality

Data caches: generate synthetic code, measure CRPD, measure the threshold miss-ratio level

43

Acknowledgements

Research was funded by

KK-stiftelsen

Department of Computer Science and Engineering (Mälardalen University)

Thank you…

Supervisor Professor Dr. Ing. Lennart Lindh

All people at the Computer Architecture Lab

My family