1
Chapter 10 Virtual Memory
2
Contents
• Background
• Demand Paging
• Performance of Demand Paging
• Page Replacement
• Page-Replacement Algorithms
• Allocation of Frames
• Thrashing
• Other Considerations
• Demand Segmentation
3
Background
• Virtual memory – separation of user logical memory from physical memory.
  – Only part of the program needs to be in memory for execution.
  – Logical address space can therefore be much larger than physical address space.
  – Need to allow pages to be swapped in and out.
• Virtual memory can be implemented via:
  – Demand paging
  – Demand segmentation
§10.1
4
Demand Paging
• Similar to a paging system with swapping. (Fig. 10.2)
• A lazy swapper never swaps a page into memory unless that page will be needed.

(Fig. 10.2: the swapper swaps program A out and program B in between main memory and backing store)

§10.2
5
Valid-Invalid Bit
• Valid: the page is both legal and in memory.
• Invalid: the page is either not valid (not in the process's logical address space) or valid but currently on the disk.
6
Page-Fault Procedure
• Access to a page marked invalid causes a page-fault trap. The procedure for handling it:
1. Check an internal table for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, terminate the process. If it was valid but the page has not yet been brought in, page it in.
3. Find a free frame.
4. Schedule a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, modify the internal table kept with the process and the page table to indicate that the page is now in memory.
6. Restart the instruction that was interrupted by the illegal-address trap. The process can now access the page as though it had always been in memory.
7
Fig. 10.4 Steps in handling a page fault: (1) the reference ("load M") traps to the operating system, (2) the page is found on the backing store, (3) the missing page is brought into a free frame in physical memory, (4) the page table is reset, (5) the instruction is restarted.
8
Software Support
• We must be able to restart any instruction after a page fault.
  – If the page fault occurs on the instruction fetch, we can restart by fetching the instruction again.
  – If a page fault occurs while fetching an operand, we must fetch and decode the instruction again, and then fetch the operand.
9
Example – ADD
• ADD the contents of A and B, placing the result in C:
  1. Fetch and decode the instruction (ADD).
  2. Fetch A.
  3. Fetch B.
  4. Add A and B.
  5. Store the sum in C.
• If a fault occurs at step 5, the whole instruction must be repeated.
Not much! Only one instruction.
10
Major Difficulties
• A major difficulty occurs when one instruction may modify several different locations.
  – The IBM System 360/370 MVC (move character) instruction can move 256 bytes at a time. A page fault might occur after the move is partially done.
• Solution:
  – Microcode computes and attempts to access both ends of both blocks. If a page fault is going to occur, it will happen at this step, before anything is modified.
  – Temporary registers hold the values of overwritten locations.
11
Performance of Demand Paging
• Memory access time (ma) now ranges from 10 to 200 nanoseconds.
• Page-fault rate p, with 0 ≤ p ≤ 1.0:
  – if p = 0, no page faults (effective access time = ma)
  – if p = 1, every reference is a fault
• Effective Access Time (EAT):
  EAT = (1 – p) × ma + p × page-fault time
• Example, with ma = 100 ns and a page-fault time of 25 milliseconds:
  EAT = (1 – p) × 100 + p × 25,000,000 ns = 100 + 24,999,900 × p
  The EAT is directly proportional to the page-fault rate.
• If one access out of 1,000 causes a page fault, the effective access time is about 25 microseconds – a slowdown by a factor of 250.
§10.2.2
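The arithmetic above can be checked with a small sketch (an illustration, not from the slides; the function name `effective_access_time` is chosen here):

```python
def effective_access_time(ma_ns, fault_time_ns, p):
    """EAT = (1 - p) * ma + p * page-fault time, all in nanoseconds."""
    return (1 - p) * ma_ns + p * fault_time_ns

ma = 100                  # memory access time: 100 ns
fault = 25_000_000        # page-fault service time: 25 ms = 25,000,000 ns

# One fault per 1000 accesses -> EAT is roughly 25 microseconds,
# about 250 times slower than the 100 ns base access time.
eat = effective_access_time(ma, fault, 1 / 1000)
```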
12
Performance of Demand Paging
• If we want less than 10-percent degradation:
  110 > 100 + 25,000,000 × p
  10 > 25,000,000 × p
  p < 0.0000004
• It is important to keep the page-fault rate low in a demand-paging system. Otherwise, the effective access time increases, slowing process execution dramatically.
We can allow fewer than 1 memory access out of 2,500,000 to page-fault.
13
Performance of Demand Paging
Multiple-Choice Question:
( ) It is important to keep the page-fault rate _____ in a demand-paging system. (A) low (B) high (C) large (D) legal
Answer: A
14
Process Creation
• Two techniques made available by virtual memory enhance creating and running processes:
  – Copy-on-Write
  – Memory-Mapped Files
§10.3
15
Copy-on-Write
• Process creation using the fork() system call may initially bypass the need for demand paging by using page sharing.
• Traditionally, fork() worked by creating a copy of the parent's address space for the child. However, if the child invokes the exec() system call immediately after creation, that copying may not be necessary.
§10.3.1
16
Copy-on-Write
• Copy-on-write works by allowing the parent and child processes to initially share the same pages.
• If either process writes to a shared page, a copy of that page is created, so the writing process modifies its own copy without affecting the other process.
• Used by Windows 2000, Linux, and Solaris.
Copy-on-write pages
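Copy-on-write can be sketched at the page-table level (a toy model, not a real kernel implementation; the class and method names are invented for illustration):

```python
class CowPage:
    """A physical page, shared by reference count; copied on first write."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1

class AddressSpace:
    def __init__(self, pages):
        self.pages = pages              # list of references to CowPage objects

    def fork(self):
        # The child initially shares every physical page with the parent.
        for p in self.pages:
            p.refs += 1
        return AddressSpace(list(self.pages))

    def write(self, i, offset, value):
        page = self.pages[i]
        if page.refs > 1:               # shared: make a private copy first
            page.refs -= 1
            page = CowPage(page.data)
            self.pages[i] = page
        page.data[offset] = value

parent = AddressSpace([CowPage(b"hello"), CowPage(b"world")])
child = parent.fork()
child.write(0, 0, ord("H"))             # triggers a private copy of page 0
# Page 0 has diverged; page 1 is still physically shared by both.
```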
17
Virtual Memory Fork
• vfork() differs from fork() with copy-on-write.
• With vfork(), the parent process is suspended and the child process uses the address space of the parent.
• If the child process changes any of the parent's pages, the altered pages will be visible to the parent once it resumes.
• Intended to be used when the child calls exec() immediately after creation: no copying of pages takes place, so it is very efficient.
Should be used with caution.
18
Memory-Mapped Files
• System calls (open(), read(), write()) are ordinarily used when accessing a file.
• Memory-mapping a file lets file I/O be treated as routine memory accesses by mapping a disk block to a page (or pages) in memory.
• Initial access to the file proceeds through ordinary demand paging; subsequent accesses are handled as routine memory accesses, which simplifies file manipulation.
§10.3.2
19
Memory-Mapped Files
• Multiple processes may be allowed to map the same file into their virtual memories, to allow sharing of data.
• Writes by any of the processes modify the data in virtual memory and can be seen by all others that map the same section of the file.
• The memory-mapping system calls can also support copy-on-write functionality, allowing processes to share a file in read-only mode but to have their own copies of any data they modify.
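Python's standard `mmap` module can illustrate the idea (an illustrative sketch; a temporary file stands in for a real data file):

```python
import mmap
import os
import tempfile

# Create a small file, then map it so file I/O looks like memory access.
fd, path = tempfile.mkstemp()
os.write(fd, b"memory-mapped files demo")
os.close(fd)

with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)   # map the whole file
    first_word = mem[:6]             # read via ordinary slicing
    mem[:6] = b"MEMORY"              # write goes to the mapped pages
    mem.flush()                      # push modified pages back to the file
    mem.close()

with open(path, "rb") as f:
    contents = f.read()
os.unlink(path)
```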
20
Memory-Mapped Files
21
Page Replacement
• Over-allocation of memory occurs when increasing the degree of multiprogramming. (Fig. 10.5)
• The OS may:
  1. terminate the user process (not the best choice);
  2. swap out a process (Section 10.5);
  3. perform page replacement.
§10.4
22
Fig. 10.5 Need for page replacement: the logical memories and page tables (with frame numbers and valid-invalid bits) of two users share physical memory with the monitor; user 1 is about to execute "load M", but M's page is marked invalid and no frame is free.
23
Basic Scheme
• Modify the page-fault service routine to include page replacement:
1. Find the location of the desired page on the disk.
2. Find a free frame:
   a. If there is a free frame, use it.
   b. If there is no free frame, use a page-replacement algorithm to select a victim frame.
   c. Write the victim page to the disk; change the page and frame tables accordingly.
3. Read the desired page into the (newly) free frame; change the page and frame tables.
4. Restart the user process.
• Two page transfers (one out, one in) are required when no frame is free.
§10.4.1
24
Fig. 10.6 Page replacement: (1) swap out the victim page, (2) change its page-table entry to invalid, (3) swap the desired page in, (4) reset the page table for the new page.
25
Reduce Overhead
• Use a modify (dirty) bit to reduce the overhead of page transfers – only modified pages are written back to disk.
• This applies to read-only pages as well – they are never modified, so they may simply be discarded when desired.
• Page replacement completes the separation between logical memory and physical memory – a large virtual memory can be provided on a smaller physical memory.
26
Short-Answer Question:
The modify (dirty) bit can be used to reduce the overhead of page transfers. Please explain how.
27
Page-Replacement Algorithms
• We want to select the page-replacement algorithm with the lowest page-fault rate.
• Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string.
• A reference string can be generated artificially, or by tracing and recording the memory references of a real system. Tracing produces a large amount of data; to reduce it:
  1. Consider only the page number instead of the entire address.
  2. Immediately following references to the same page will not fault, so they can be dropped.
28
Reference String
• 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
• If we had 3 or more frames: 3 page faults.
• If only one frame is available: 11 page faults.
• In general, we expect the number of page faults to decrease as the number of frames increases.

(graph: number of page faults versus number of frames)
29
FIFO Page Replacement
• Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1
• Associate with each page the time when that page was brought into memory, or maintain a FIFO queue holding all pages in memory.
• When a page must be replaced, the oldest page is chosen.
• 3 frames: 15 faults

§10.4.2

reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
page frames:      7 7 7 2 2 2 2 4 4 4 0 0 0 0 0 0 0 7 7 7
                    0 0 0 0 3 3 3 2 2 2 2 2 1 1 1 1 1 0 0
                      1 1 1 1 0 0 0 3 3 3 3 3 2 2 2 2 2 1
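The FIFO fault count above can be reproduced with a short simulation (an illustrative sketch; `fifo_faults` is a name chosen here, not from the slides):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue                         # hit: nothing to do
        faults += 1
        if len(frames) == nframes:
            frames.discard(order.popleft())  # evict the oldest page
        frames.add(page)
        order.append(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
```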
30
Multiple-Choice Question:
( ) In order to implement FIFO page replacement, a timer can be associated with each page, or we can maintain a _______ to hold all pages in memory. (A) binary tree (B) linked list (C) FIFO queue (D) pop-up stack
Answer: C
31
FIFO Page Replacement
• Easy to understand and program.
• Performance is not always good.
• Example: a page that contains a heavily used variable, initialized early and in constant use, may nevertheless be chosen for replacement.
• Even if an actively used page is chosen, everything still works correctly – a fault occurs immediately to bring the active page back.

A bad replacement choice increases the page-fault rate and slows process execution, but does not cause incorrect execution.
32
True-False Question:
( ) A bad page-replacement choice increases the page-fault rate and slows process execution, and the process will eventually reach an execution error.
Answer: X
33
Belady's Anomaly
• Reference string: 1,2,3,4,1,2,5,1,2,3,4,5
• The number of faults with 4 frames (10) is greater than the number of faults with 3 frames (9)!
• For some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases.

(graph: number of page faults versus number of frames)
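The anomaly can be demonstrated by simulating FIFO at the two frame counts (an illustrative sketch, not from the slides):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.discard(order.popleft())  # evict the oldest page
            frames.add(page)
            order.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
three = fifo_faults(refs, 3)   # 9 faults
four = fifo_faults(refs, 4)    # 10 faults: more frames, yet more faults
```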
34
Optimal Page Replacement
• OPT has the lowest page-fault rate of all algorithms. It never suffers from Belady's anomaly.
• Replace the page that will not be used for the longest period of time.
• Only 9 page faults.
• Unfortunately, OPT is difficult to implement – it requires future knowledge of the reference string.
• Used for measuring how well other algorithms perform (comparison studies).

§10.4.3

reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
page frames:      7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 7 7 7
                    0 0 0 0 0 0 4 4 4 0 0 0 0 0 0 0 0 0 0
                      1 1 1 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1
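OPT can be simulated offline because the whole reference string is known in advance (an illustrative sketch; in a real system this future knowledge is unavailable):

```python
def opt_faults(refs, nframes):
    """Count page faults under optimal (OPT) replacement."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            # Evict the resident page whose next use is farthest in the
            # future (or that is never used again).
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return len(refs)
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
```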
35
Multiple-Choice Question:
( ) OPT is difficult to implement since it requires ______ knowledge of the reference string. (A) past (B) future (C) human (D) intelligent
Answer: B
36
LRU Page Replacement
• If we use the recent past as an approximation of the near future, then we will replace the page that has not been used for the longest period of time.
• This is OPT looking backward in time.
• 12 faults – still much better than the 15 of FIFO.
• LRU is often used and is considered to be good.

reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
page frames:      7 7 7 2 2 2 2 4 4 4 0 0 0 1 1 1 1 1 1 1
                    0 0 0 0 0 0 0 0 3 3 3 3 3 3 0 0 0 0 0
                      1 1 1 3 3 3 2 2 2 2 2 2 2 2 2 7 7 7
§10.4.4
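The LRU fault count can be reproduced with an ordered dictionary standing in for the hardware support (an illustrative sketch, not from the slides):

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count page faults under LRU replacement."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)          # mark as most recently used
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)    # evict the least recently used
            frames[page] = True
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
```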
37
Implementing LRU
• Counters: associate with each page-table entry a time-of-use field, and add to the CPU a logical clock or counter. Whenever a reference to a page is made, the contents of the clock register are copied to the time-of-use field. Replace the page with the smallest time value.
• Stack: whenever a page is referenced, it is removed from the stack and put on the top. The top of the stack is always the most recently used page, and the bottom is the LRU page.
(figure: the stack of page numbers for reference string 4 7 0 7 1 0 1 2 1 2 7 1 2; referencing page 7 moves it from the middle of the stack to the top)
38
LRU-Approximation Page Replacement
• Reference bit
  – With each page associate a bit, initially = 0.
  – When the page is referenced, the bit is set to 1.
  – Replace a page whose bit is 0 (if one exists). We do not know the order of use, however.
• Additional-reference-bits algorithm
  – Gain additional ordering information by recording the reference bits at regular intervals.
  – A shift register containing 00000000 means the page has not been used for eight time periods; 11001011 has been used more recently than 01110111.
  – The page with the lowest number is the LRU page.
§10.4.5
§10.4.5.1
39
• Second-chance algorithm (clock algorithm)
  – FIFO with a reference bit.
  – If the page to be replaced has reference bit = 0, proceed to replace the page.
  – If the page to be replaced has reference bit = 1, then:
    • set the reference bit to 0,
    • leave the page in memory,
    • replace the next page (in clock order), subject to the same rules.
  – If a page is used often enough to keep its reference bit set, it will never be replaced.
  – Degenerates to FIFO if all bits are set.
(figure: a circular queue of pages with reference bits; the hand sweeps past pages whose reference bit is 1, clearing each bit, and stops at the next victim whose bit is 0)

§10.4.5.2
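A single victim-selection sweep of the clock algorithm might look like this (a simplified sketch; a real implementation ties the hand and bits to the frame table, and the names here are invented for illustration):

```python
def second_chance_victim(pages, refbits, hand):
    """Advance the clock hand; pages with reference bit 1 get a second
    chance (their bit is cleared) and the hand moves on."""
    while refbits[hand]:
        refbits[hand] = 0                 # clear the bit: second chance used
        hand = (hand + 1) % len(pages)
    return hand                           # first page found with bit 0

pages   = ["A", "B", "C", "D"]
refbits = [1, 1, 0, 1]
victim = second_chance_victim(pages, refbits, 0)   # A and B are spared; C is chosen
```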
40
• Enhanced second-chance algorithm
  – Consider the reference bit and the modify bit as an ordered pair. Four classes:
    1. (0,0) neither recently used nor modified – best page to replace.
    2. (0,1) not recently used but modified – not quite as good, because the page will need to be written out before replacement.
    3. (1,0) recently used but clean – it probably will be used again soon.
    4. (1,1) recently used and modified – it probably will be used again soon, and the page will need to be written out to disk before it can be replaced.
  – Replace the first page encountered in the lowest nonempty class.
  – Gives preference to keeping pages that have been modified, to reduce the number of I/Os required.
§10.4.5.3
41
Counting-Based Page Replacement
• Keep a counter of the number of references made to each page.
• Least frequently used (LFU) page replacement: replace the page with the smallest count.
  – Suffers when a page is used heavily during the initial phase of a process but then is never used again.
  – One solution: shift the counts right by 1 bit at regular intervals, forming an exponentially decaying average count.
• Most frequently used (MFU): based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
§10.4.6
42
Page-Buffering Algorithm
1. Maintain a list of modified pages. Whenever the paging device is idle, a modified page is selected, written to the disk, and its modify bit reset.
  – This increases the probability that a page will be clean when it is selected for replacement.
2. Keep a pool of free frames, but remember which page was in each frame. Since the frame contents are not modified when a frame's page is written to the disk, the old page can be reused directly from the free-frame pool if it is needed before that frame is reused.
§10.4.7
43
True-False Question:
( ) With a free-frame pool, the frame contents are not modified when a frame is written to the disk; therefore the old page can be reused directly and no I/O is needed.
Answer: O
44
Allocation of Frames
• Besides the frame in which the current instruction is located, a memory reference may require another frame. More frames are needed if multiple levels of indirection are used.
• Each process needs a minimum number of frames, which is defined by the computer architecture.
• The MOVE instruction of the PDP-11 and the IBM 370 SS MOVE instruction illustrate the worst case:
  – 6 pages may be needed to handle an SS MOVE instruction:
  – the instruction itself may span 2 pages,
  – and 2 pages are needed for each of its two operands.
• Worst case: multiple levels of indirection could require the entire virtual memory to be in physical memory.
• Therefore a limit must be placed on the levels of indirection.
§10.5
45
Allocation Algorithms
• Equal allocation – e.g., with 100 frames and 5 processes, give each process 20 frames.
• Proportional allocation – allocate according to the size of the process:

  s_i = size of process p_i
  S = Σ s_i
  m = total number of frames available
  a_i = allocation for p_i = (s_i / S) × m

• Example: m = 62, s_1 = 10, s_2 = 127:

  a_1 = (10 / 137) × 62 ≈ 4
  a_2 = (127 / 137) × 62 ≈ 57

Both processes share the available frames according to their "needs," rather than equally.
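The example allocation can be computed directly (an illustrative sketch using integer truncation, matching the ≈ values above):

```python
def proportional_allocation(sizes, m):
    """a_i = (s_i / S) * m, truncated to an integer; S = sum of sizes."""
    S = sum(sizes)
    return [s * m // S for s in sizes]

# Example from the slide: s_1 = 10, s_2 = 127, m = 62 free frames.
alloc = proportional_allocation([10, 127], 62)
```

Truncation can leave a few frames unallocated (here 62 − 4 − 57 = 1); a real allocator would hand out the remainder, e.g. to the free-frame pool.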
46
Global vs. Local Allocation
• Global replacement – process selects a replacement frame from the set of all frames; one process can take a frame from another.
• Local replacement – each process selects from only its own set of allocated frames.
• Problem with global replacement: a process cannot control its own page-fault rate. The same process may perform quite differently at different times because of external circumstances.
47
Thrashing
• If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
  – low CPU utilization;
  – the operating system thinking it must increase the degree of multiprogramming;
  – another process being added to the system.
• Thrashing: a process is busy swapping pages in and out.
§10.6
48
Thrashing Diagram
• The effect of thrashing can be limited by using a local replacement algorithm: if one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash also.
At this point, to increase CPU utilization and stop thrashing, we must decrease the degree of multiprogramming.
49
Locality Model
• To prevent thrashing, we must provide a process with as many frames as it needs.
• Locality model:
  – As a process executes, it moves from locality to locality.
  – A locality is a set of pages that are actively used together. (Fig. 10.15)
  – A program is generally composed of several different localities, which may overlap.
• Localities are defined by the program structure and its data structures. The locality model states that all programs will exhibit this basic memory-reference structure.
50
51
Working-Set Model
• Δ ≡ working-set window ≡ a fixed number of the most recent page references.
• The set of pages in the most recent Δ page references is the working set.
• The accuracy of the working set depends on the selection of Δ:
  – Δ too small: will not encompass the entire locality.
  – Δ too large: may overlap several localities.
  – Δ infinite: the working set is the set of pages touched during the entire process execution.

§10.6.2

Example (Δ = 10):
reference string: … 2 6 1 5 7 7 7 7 5 1 6 2 3 4 1 2 3 4 4 4 3 4 3 4 4 4 1 3 2 3 4 4 3 4 4 4 …
(t1 = after the 10th reference, t2 = after the 26th reference)
WS(t1) = {1,2,5,6,7}   WS(t2) = {3,4}
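The two working sets can be computed from the reference string (an illustrative sketch; the window end positions t1 = 10 and t2 = 26 follow the example, with Δ = 10):

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the `delta` most recent
    references ending at position t (1-indexed)."""
    return set(refs[max(0, t - delta):t])

# Reference string from the example (surrounding context elided).
refs = [2, 6, 1, 5, 7, 7, 7, 7, 5, 1,
        6, 2, 3, 4, 1, 2, 3, 4, 4, 4,
        3, 4, 3, 4, 4, 4, 1, 3, 2, 3,
        4, 4, 3, 4, 4, 4]

ws_t1 = working_set(refs, 10, 10)   # window = references 1..10
ws_t2 = working_set(refs, 26, 10)   # window = references 17..26
```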
52
Working-Set Model
• WSS_i = working-set size of process P_i.
• D = Σ WSS_i, where D is the total demand for frames.
• m = total number of available frames.
• If D > m, thrashing will occur, because some processes will not have enough frames; the OS then suspends one of the processes.
• The working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible, optimizing CPU utilization.
53
Keeping Track of the Working Set
• The working-set window is a moving window (it evolves with time).
• Approximate it with a fixed-interval timer plus a reference bit.
• Example: Δ = 10,000 references and a timer that interrupts every 5,000 time units.
  – Keep in memory 2 bits for each page.
  – Whenever the timer interrupts, copy the reference bits into the in-memory bits and set all reference bits to 0.
  – If one of the in-memory bits = 1, the page is in the working set.
• Not completely accurate (we don't know where within the 5,000 references a page was used).
• Improvement: 10 bits and an interrupt every 1,000 time units.
54
Page-Fault Frequency
• A more direct way than the working-set model to control thrashing.
• PFF establishes an "acceptable" page-fault rate:
  – If the actual rate is too low, the process loses a frame.
  – If the actual rate is too high, the process gains a frame.
55
Operating-System Examples
• Windows NT
  – Clustering: bring in not only the faulted page but also the pages surrounding it.
  – Working-set minimum and working-set maximum.
  – Automatic working-set trimming.
• Solaris 2
  – The kernel maintains sufficient free memory. If free memory is not enough, the pageout procedure is used.
§10.7
56
Other Considerations
• Prepaging: prevent the initial large number of page faults by bringing into memory at one time all the pages that will be needed.
  – Does prepaging cost less than servicing the corresponding page faults?
• Page-size selection considerations:
  – Because each active process must have its own copy of the page table, a large page size is desirable (smaller page tables).
  – To minimize internal fragmentation, we need a small page size.
  – A desire to minimize I/O time argues for a larger page size.
  – A smaller page size should result in less I/O and less total allocated memory.
  – To minimize the number of page faults, we need a large page size.

§10.8

The problem has no best answer. However, the trend is toward larger page sizes.
57
Other Considerations (Cont.)
• Program structure: system performance can be improved if the user (or compiler) has an awareness of the underlying demand paging.
  – int[][] A = new int[128][128];
  – Each row is stored in one page.
  – Only one frame is allocated.
  – Program 1 (column-by-column traversal):

    for (int j = 0; j < A.length; j++)
        for (int i = 0; i < A.length; i++)
            A[i][j] = 0;

    128 × 128 = 16,384 page faults
  – Program 2 (row-by-row traversal):

    for (int i = 0; i < A.length; i++)
        for (int j = 0; j < A.length; j++)
            A[i][j] = 0;

    128 page faults
58
Other Considerations (Cont.)
• Careful selection of data structures and programming structures can increase locality and hence lower the page-fault rate.
  – Stack: good locality. Hash table: bad (scattered references).
• The compiler and loader can have an effect on paging.
  – Separate code and data, and generate reentrant code: read-only code pages never need to be written out before being replaced.
  – The loader can avoid placing routines across page boundaries, keeping each routine completely in one page.
• The choice of programming language:
  – C uses pointers, which tend to randomize access to memory.
  – Java has no pointers, so it tends to have better locality of reference than C or C++.
59
Other Considerations (Cont.)
• I/O interlock
  – When demand paging is used, we sometimes need to allow some pages to be locked in memory.
  – One such situation: I/O to or from user memory. (Fig. 10.20)
  – A lock bit is associated with every frame. If the frame is locked, it cannot be selected for replacement.
  – Frequently, some or all of the OS kernel is locked into memory.
  – A frame just read in by a low-priority process could be selected for replacement by a high-priority process while the low-priority process waits in the ready queue for the CPU scheduler. The lock bit can prevent this situation.
60
61
Other Consideration (Cont.)
• Virtual memory is the antithesis of real-time computing, because it can introduce unexpected, long delays in the execution of a process while pages are brought into memory.
• Real-time systems almost never have virtual memory.