1
Chapter 10 Virtual Memory
2
Contents
• Background
• Demand Paging
• Performance of Demand Paging
• Page Replacement
• Page-Replacement Algorithms
• Allocation of Frames
• Thrashing
• Other Considerations
• Demand Segmentation
3
Background
• Virtual memory – separation of user logical memory from physical memory.
  – Only part of the program needs to be in memory for execution.
  – Logical address space can therefore be much larger than physical address space.
  – Need to allow pages to be swapped in and out.
• Virtual memory can be implemented via:
  – Demand paging
  – Demand segmentation
§10.1
4
Demand Paging
• Similar to a paging system with swapping. (Fig. 10.2)
• A lazy swapper never swaps a page into memory unless that page will be needed.

(Fig. 10.2: the swapper swaps program A out and program B in between main memory and backing store)

§10.2
5
Valid-Invalid Bit
• Valid: the page is both legal and in memory.
• Invalid: the page is either not valid (not in the process's logical address space) or valid but currently on the disk.
6
Page-Fault Procedure
• Access to a page marked invalid causes a page-fault trap. The procedure for handling it:
1. Check an internal table for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, terminate the process. If it was valid but the page has not yet been brought in, page it in.
3. Find a free frame.
4. Schedule a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, modify the internal table kept with the process and the page table to indicate that the page is now in memory.
6. Restart the instruction that was interrupted by the illegal-address trap. The process can now access the page as though it had always been in memory.
7
Fig. 10.4 Steps in handling a page fault: (1) the reference ("load M") traps to the operating system, (2) the page is found on the backing store, (3) the missing page is brought into a free frame in physical memory, (4) the page table is reset, (5) the instruction is restarted.
8
Software Support
• We must be able to restart any instruction after a page fault.
  – If the page fault occurs on the instruction fetch, we can restart by fetching the instruction again.
  – If a page fault occurs while fetching an operand, we must fetch and decode the instruction again, and then fetch the operand.
9
Example – ADD
• ADD the contents of A and B, placing the result in C:
  1. Fetch and decode the instruction (ADD).
  2. Fetch A.
  3. Fetch B.
  4. Add A and B.
  5. Store the sum in C.
• If a fault occurs at step 5, the whole instruction must be repeated.
Not much! Only one instruction.
10
Major Difficulties
• A major difficulty occurs when one instruction may modify several different locations.
  – The IBM System 360/370 MVC (move character) instruction can move 256 bytes at a time. A page fault might occur after the move is partially done.
• Solution:
  – Microcode computes and attempts to access both ends of both blocks. If a page fault is going to occur, it will happen at this step, before anything is modified.
  – Temporary registers hold the values of overwritten locations.
11
Performance of Demand Paging
• Memory access time (ma) now ranges from 10 to 200 nanoseconds.
• Page-fault rate p, with 0 ≤ p ≤ 1.0:
  – if p = 0, no page faults (effective access time = ma)
  – if p = 1, every reference is a fault
• Effective Access Time (EAT):
  EAT = (1 – p) × ma + p × page-fault time
• Example, with ma = 100 ns and a page-fault time of 25 milliseconds:
  EAT = (1 – p) × 100 + p × 25,000,000 ns = 100 + 24,999,900 × p
  The EAT is directly proportional to the page-fault rate.
• If one access out of 1,000 causes a page fault, the effective access time is about 25 microseconds – a slowdown by a factor of 250.
§10.2.2
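The arithmetic above can be checked with a small sketch (an illustration, not from the slides; the function name `effective_access_time` is chosen here):

```python
def effective_access_time(ma_ns, fault_time_ns, p):
    """EAT = (1 - p) * ma + p * page-fault time, all in nanoseconds."""
    return (1 - p) * ma_ns + p * fault_time_ns

ma = 100                  # memory access time: 100 ns
fault = 25_000_000        # page-fault service time: 25 ms = 25,000,000 ns

# One fault per 1000 accesses -> EAT is roughly 25 microseconds,
# about 250 times slower than the 100 ns base access time.
eat = effective_access_time(ma, fault, 1 / 1000)
```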
12
Performance of Demand Paging
• If we want less than 10-percent degradation:
  110 > 100 + 25,000,000 × p
  10 > 25,000,000 × p
  p < 0.0000004
• It is important to keep the page-fault rate low in a demand-paging system. Otherwise, the effective access time increases, slowing process execution dramatically.
We can allow fewer than 1 memory access out of 2,500,000 to page-fault.
13
Performance of Demand Paging
Multiple-Choice Question:
( ) It is important to keep the page-fault rate _____ in a demand-paging system. (A) low (B) high (C) large (D) legal
Answer: A
14
Process Creation
• Two techniques made available by virtual memory enhance creating and running processes:
  – Copy-on-Write
  – Memory-Mapped Files
§10.3
15
Copy-on-Write
• Process creation using the fork() system call may initially bypass the need for demand paging by using page sharing.
• Traditionally, fork() worked by creating a copy of the parent's address space for the child. However, if the child invokes the exec() system call immediately after creation, that copying may not be necessary.
§10.3.1
16
Copy-on-Write
• Copy-on-write works by allowing the parent and child processes to initially share the same pages.
• If either process writes to a shared page, a copy of that page is created, so the writing process modifies its own copy without affecting the other process.
• Used by Windows 2000, Linux, and Solaris.
Copy-on-write pages
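Copy-on-write can be sketched at the page-table level (a toy model, not a real kernel implementation; the class and method names are invented for illustration):

```python
class CowPage:
    """A physical page, shared by reference count; copied on first write."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1

class AddressSpace:
    def __init__(self, pages):
        self.pages = pages              # list of references to CowPage objects

    def fork(self):
        # The child initially shares every physical page with the parent.
        for p in self.pages:
            p.refs += 1
        return AddressSpace(list(self.pages))

    def write(self, i, offset, value):
        page = self.pages[i]
        if page.refs > 1:               # shared: make a private copy first
            page.refs -= 1
            page = CowPage(page.data)
            self.pages[i] = page
        page.data[offset] = value

parent = AddressSpace([CowPage(b"hello"), CowPage(b"world")])
child = parent.fork()
child.write(0, 0, ord("H"))             # triggers a private copy of page 0
# Page 0 has diverged; page 1 is still physically shared by both.
```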
17
Virtual Memory Fork
• vfork() differs from fork() with copy-on-write.
• With vfork(), the parent process is suspended and the child process uses the address space of the parent.
• If the child process changes any of the parent's pages, the altered pages will be visible to the parent once it resumes.
• Intended to be used when the child calls exec() immediately after creation: no copying of pages takes place, so it is very efficient.
Should be used with caution.
18
Memory-Mapped Files
• System calls (open(), read(), write()) are ordinarily used when accessing a file.
• Memory-mapping a file lets file I/O be treated as routine memory accesses by mapping a disk block to a page (or pages) in memory.
• Initial access to the file proceeds through ordinary demand paging; subsequent accesses are handled as routine memory accesses, which simplifies file manipulation.
§10.3.2
19
Memory-Mapped Files
• Multiple processes may be allowed to map the same file into their virtual memories, to allow sharing of data.
• Writes by any of the processes modify the data in virtual memory and can be seen by all others that map the same section of the file.
• The memory-mapping system calls can also support copy-on-write functionality, allowing processes to share a file in read-only mode but to have their own copies of any data they modify.
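Python's standard `mmap` module can illustrate the idea (an illustrative sketch; a temporary file stands in for a real data file):

```python
import mmap
import os
import tempfile

# Create a small file, then map it so file I/O looks like memory access.
fd, path = tempfile.mkstemp()
os.write(fd, b"memory-mapped files demo")
os.close(fd)

with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)   # map the whole file
    first_word = mem[:6]             # read via ordinary slicing
    mem[:6] = b"MEMORY"              # write goes to the mapped pages
    mem.flush()                      # push modified pages back to the file
    mem.close()

with open(path, "rb") as f:
    contents = f.read()
os.unlink(path)
```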
20
Memory-Mapped Files
21
Page Replacement
• Over-allocation of memory occurs when increasing the degree of multiprogramming. (Fig. 10.5)
• The OS may:
  1. terminate the user process (not the best choice);
  2. swap out a process (Section 10.5);
  3. perform page replacement.
§10.4
22
Fig. 10.5 Need for page replacement: the logical memories and page tables (with frame numbers and valid-invalid bits) of two users share physical memory with the monitor; user 1 is about to execute "load M", but M's page is marked invalid and no frame is free.
23
Basic Scheme
• Modify the page-fault service routine to include page replacement:
1. Find the location of the desired page on the disk.
2. Find a free frame:
   a. If there is a free frame, use it.
   b. If there is no free frame, use a page-replacement algorithm to select a victim frame.
   c. Write the victim page to the disk; change the page and frame tables accordingly.
3. Read the desired page into the (newly) free frame; change the page and frame tables.
4. Restart the user process.
• Two page transfers (one out, one in) are required when no frame is free.
§10.4.1
24
Fig. 10.6 Page replacement: (1) swap out the victim page, (2) change its page-table entry to invalid, (3) swap the desired page in, (4) reset the page table for the new page.
25
Reduce Overhead
• Use a modify (dirty) bit to reduce the overhead of page transfers – only modified pages are written back to disk.
• This applies to read-only pages as well – they are never modified, so they may simply be discarded when desired.
• Page replacement completes the separation between logical memory and physical memory – a large virtual memory can be provided on a smaller physical memory.
26
Short-Answer Question:
The modify (dirty) bit can be used to reduce the overhead of page transfers. Please explain how.
27
Page-Replacement Algorithms
• We want to select the page-replacement algorithm with the lowest page-fault rate.
• Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string.
• A reference string can be generated artificially, or by tracing and recording the memory references of a real system. Tracing produces a large amount of data; to reduce it:
  1. Consider only the page number instead of the entire address.
  2. Immediately following references to the same page will not fault, so they can be dropped.
28
Reference String
• 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
• If we had 3 or more frames: 3 page faults.
• If only one frame is available: 11 page faults.
• In general, we expect the number of page faults to decrease as the number of frames increases.

(graph: number of page faults versus number of frames)
29
FIFO Page Replacement
• Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1
• Associate with each page the time when that page was brought into memory, or maintain a FIFO queue holding all pages in memory.
• When a page must be replaced, the oldest page is chosen.
• 3 frames: 15 faults

§10.4.2

reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
page frames:      7 7 7 2 2 2 2 4 4 4 0 0 0 0 0 0 0 7 7 7
                    0 0 0 0 3 3 3 2 2 2 2 2 1 1 1 1 1 0 0
                      1 1 1 1 0 0 0 3 3 3 3 3 2 2 2 2 2 1
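The FIFO fault count above can be reproduced with a short simulation (an illustrative sketch; `fifo_faults` is a name chosen here, not from the slides):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue                         # hit: nothing to do
        faults += 1
        if len(frames) == nframes:
            frames.discard(order.popleft())  # evict the oldest page
        frames.add(page)
        order.append(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
```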
30
Multiple-Choice Question:
( ) In order to implement FIFO page replacement, a timer can be associated with each page, or we can maintain a _______ to hold all pages in memory. (A) binary tree (B) linked list (C) FIFO queue (D) pop-up stack
Answer: C
31
FIFO Page Replacement
• Easy to understand and program.
• Performance is not always good.
• Example: a page that contains a heavily used variable, initialized early and in constant use, may nevertheless be chosen for replacement.
• Even if an actively used page is chosen, everything still works correctly – a fault occurs immediately to bring the active page back.

A bad replacement choice increases the page-fault rate and slows process execution, but does not cause incorrect execution.
32
True-False Question:
( ) A bad page-replacement choice increases the page-fault rate and slows process execution, and the process will eventually reach an execution error.
Answer: X
33
Belady's Anomaly
• Reference string: 1,2,3,4,1,2,5,1,2,3,4,5
• The number of faults with 4 frames (10) is greater than the number of faults with 3 frames (9)!
• For some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases.

(graph: number of page faults versus number of frames)
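The anomaly can be demonstrated by simulating FIFO at the two frame counts (an illustrative sketch, not from the slides):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.discard(order.popleft())  # evict the oldest page
            frames.add(page)
            order.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
three = fifo_faults(refs, 3)   # 9 faults
four = fifo_faults(refs, 4)    # 10 faults: more frames, yet more faults
```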
34
Optimal Page Replacement
• OPT has the lowest page-fault rate of all algorithms. It never suffers from Belady's anomaly.
• Replace the page that will not be used for the longest period of time.
• Only 9 page faults.
• Unfortunately, OPT is difficult to implement – it requires future knowledge of the reference string.
• Used for measuring how well other algorithms perform (comparison studies).

§10.4.3

reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
page frames:      7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 7 7 7
                    0 0 0 0 0 0 4 4 4 0 0 0 0 0 0 0 0 0 0
                      1 1 1 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1
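OPT can be simulated offline because the whole reference string is known in advance (an illustrative sketch; in a real system this future knowledge is unavailable):

```python
def opt_faults(refs, nframes):
    """Count page faults under optimal (OPT) replacement."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            # Evict the resident page whose next use is farthest in the
            # future (or that is never used again).
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return len(refs)
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
```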
35
Multiple-Choice Question:
( ) OPT is difficult to implement since it requires ______ knowledge of the reference string. (A) past (B) future (C) human (D) intelligent
Answer: B
36
LRU Page Replacement
• If we use the recent past as an approximation of the near future, then we will replace the page that has not been used for the longest period of time.
• This is OPT looking backward in time.
• 12 faults – still much better than the 15 of FIFO.
• LRU is often used and is considered to be good.

reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
page frames:      7 7 7 2 2 2 2 4 4 4 0 0 0 1 1 1 1 1 1 1
                    0 0 0 0 0 0 0 0 3 3 3 3 3 3 0 0 0 0 0
                      1 1 1 3 3 3 2 2 2 2 2 2 2 2 2 7 7 7
§10.4.4
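The LRU fault count can be reproduced with an ordered dictionary standing in for the hardware support (an illustrative sketch, not from the slides):

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count page faults under LRU replacement."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)          # mark as most recently used
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)    # evict the least recently used
            frames[page] = True
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
```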
37
Implementing LRU
• Counters: associate with each page-table entry a time-of-use field, and add to the CPU a logical clock or counter. Whenever a reference to a page is made, the contents of the clock register are copied to the time-of-use field. Replace the page with the smallest time value.
• Stack: whenever a page is referenced, it is removed from the stack and put on the top. The top of the stack is always the most recently used page, and the bottom is the LRU page.
(figure: the stack of page numbers for reference string 4 7 0 7 1 0 1 2 1 2 7 1 2; referencing page 7 moves it from the middle of the stack to the top)
38
LRU-Approximation Page Replacement
• Reference bit
  – With each page associate a bit, initially = 0.
  – When the page is referenced, the bit is set to 1.
  – Replace a page whose bit is 0 (if one exists). We do not know the order of use, however.
• Additional-reference-bits algorithm
  – Gain additional ordering information by recording the reference bits at regular intervals.
  – A shift register containing 00000000 means the page has not been used for eight time periods; 11001011 has been used more recently than 01110111.
  – The page with the lowest number is the LRU page.
§10.4.5
§10.4.5.1
39
• Second-chance algorithm (clock algorithm)
  – FIFO with a reference bit.
  – If the page to be replaced has reference bit = 0, proceed to replace the page.
  – If the page to be replaced has reference bit = 1, then:
    • set the reference bit to 0,
    • leave the page in memory,
    • replace the next page (in clock order), subject to the same rules.
  – If a page is used often enough to keep its reference bit set, it will never be replaced.
  – Degenerates to FIFO if all bits are set.
(figure: a circular queue of pages with reference bits; the hand sweeps past pages whose reference bit is 1, clearing each bit, and stops at the next victim whose bit is 0)

§10.4.5.2
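A single victim-selection sweep of the clock algorithm might look like this (a simplified sketch; a real implementation ties the hand and bits to the frame table, and the names here are invented for illustration):

```python
def second_chance_victim(pages, refbits, hand):
    """Advance the clock hand; pages with reference bit 1 get a second
    chance (their bit is cleared) and the hand moves on."""
    while refbits[hand]:
        refbits[hand] = 0                 # clear the bit: second chance used
        hand = (hand + 1) % len(pages)
    return hand                           # first page found with bit 0

pages   = ["A", "B", "C", "D"]
refbits = [1, 1, 0, 1]
victim = second_chance_victim(pages, refbits, 0)   # A and B are spared; C is chosen
```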
40
• Enhanced second-chance algorithm
  – Consider the reference bit and the modify bit as an ordered pair. Four classes:
    1. (0,0) neither recently used nor modified – best page to replace.
    2. (0,1) not recently used but modified – not quite as good, because the page will need to be written out before replacement.
    3. (1,0) recently used but clean – it probably will be used again soon.
    4. (1,1) recently used and modified – it probably will be used again soon, and the page will need to be written out to disk before it can be replaced.
  – Replace the first page encountered in the lowest nonempty class.
  – Gives preference to keeping pages that have been modified, to reduce the number of I/Os required.
§10.4.5.3
41
Counting-Based Page Replacement
• Keep a counter of the number of references made to each page.
• Least frequently used (LFU) page replacement: replace the page with the smallest count.
  – Suffers when a page is used heavily during the initial phase of a process but then is never used again.
  – One solution: shift the counts right by 1 bit at regular intervals, forming an exponentially decaying average count.
• Most frequently used (MFU): based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
§10.4.6
42
Page-Buffering Algorithm
1. Maintain a list of modified pages. Whenever the paging device is idle, a modified page is selected, written to the disk, and its modify bit reset.
  – This increases the probability that a page will be clean when it is selected for replacement.
2. Keep a pool of free frames, but remember which page was in each frame. Since the frame contents are not modified when a frame's page is written to the disk, the old page can be reused directly from the free-frame pool if it is needed before that frame is reused.
§10.4.7
43
True-False Question:
( ) With a free-frame pool, the frame contents are not modified when a frame is written to the disk; therefore the old page can be reused directly and no I/O is needed.
Answer: O
44
Allocation of Frames
• Besides the frame in which the current instruction is located, a memory reference may require another frame. More frames are needed if multiple levels of indirection are used.
• Each process needs a minimum number of frames, which is defined by the computer architecture.
• The MOVE instruction of the PDP-11 and the IBM 370 SS MOVE instruction illustrate the worst case:
  – 6 pages may be needed to handle an SS MOVE instruction:
  – the instruction itself may span 2 pages,
  – and 2 pages are needed for each of its two operands.
• Worst case: multiple levels of indirection could require the entire virtual memory to be in physical memory.
• Therefore a limit must be placed on the levels of indirection.
§10.5
45
Allocation Algorithms
• Equal allocation – e.g., with 100 frames and 5 processes, give each process 20 frames.
• Proportional allocation – allocate according to the size of the process:

  s_i = size of process p_i
  S = Σ s_i
  m = total number of frames available
  a_i = allocation for p_i = (s_i / S) × m

• Example: m = 62, s_1 = 10, s_2 = 127:

  a_1 = (10 / 137) × 62 ≈ 4
  a_2 = (127 / 137) × 62 ≈ 57

Both processes share the available frames according to their "needs," rather than equally.
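The example allocation can be computed directly (an illustrative sketch using integer truncation, matching the ≈ values above):

```python
def proportional_allocation(sizes, m):
    """a_i = (s_i / S) * m, truncated to an integer; S = sum of sizes."""
    S = sum(sizes)
    return [s * m // S for s in sizes]

# Example from the slide: s_1 = 10, s_2 = 127, m = 62 free frames.
alloc = proportional_allocation([10, 127], 62)
```

Truncation can leave a few frames unallocated (here 62 − 4 − 57 = 1); a real allocator would hand out the remainder, e.g. to the free-frame pool.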
46
Global vs. Local Allocation
• Global replacement – process selects a replacement frame from the set of all frames; one process can take a frame from another.
• Local replacement – each process selects from only its own set of allocated frames.
• Problem with global replacement: a process cannot control its own page-fault rate. The same process may perform quite differently at different times because of external circumstances.
47
Thrashing
• If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
  – low CPU utilization;
  – the operating system thinking it must increase the degree of multiprogramming;
  – another process being added to the system.
• Thrashing: a process is busy swapping pages in and out.
§10.6
48
Thrashing Diagram
• The effect of thrashing can be limited by using a local replacement algorithm: if one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash also.
At this point, to increase CPU utilization and stop thrashing, we must decrease the degree of multiprogramming.
49
Locality Model
• To prevent thrashing, we must provide a process with as many frames as it needs.
• Locality model:
  – As a process executes, it moves from locality to locality.
  – A locality is a set of pages that are actively used together. (Fig. 10.15)
  – A program is generally composed of several different localities, which may overlap.
• Localities are defined by the program structure and its data structures. The locality model states that all programs will exhibit this basic memory-reference structure.
50
51
Working-Set Model
• Δ ≡ working-set window ≡ a fixed number of the most recent page references.
• The set of pages in the most recent Δ page references is the working set.
• The accuracy of the working set depends on the selection of Δ:
  – Δ too small: will not encompass the entire locality.
  – Δ too large: may overlap several localities.
  – Δ infinite: the working set is the set of pages touched during the entire process execution.

§10.6.2

Example (Δ = 10):
reference string: … 2 6 1 5 7 7 7 7 5 1 6 2 3 4 1 2 3 4 4 4 3 4 3 4 4 4 1 3 2 3 4 4 3 4 4 4 …
(t1 = after the 10th reference, t2 = after the 26th reference)
WS(t1) = {1,2,5,6,7}   WS(t2) = {3,4}
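The two working sets can be computed from the reference string (an illustrative sketch; the window end positions t1 = 10 and t2 = 26 follow the example, with Δ = 10):

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the `delta` most recent
    references ending at position t (1-indexed)."""
    return set(refs[max(0, t - delta):t])

# Reference string from the example (surrounding context elided).
refs = [2, 6, 1, 5, 7, 7, 7, 7, 5, 1,
        6, 2, 3, 4, 1, 2, 3, 4, 4, 4,
        3, 4, 3, 4, 4, 4, 1, 3, 2, 3,
        4, 4, 3, 4, 4, 4]

ws_t1 = working_set(refs, 10, 10)   # window = references 1..10
ws_t2 = working_set(refs, 26, 10)   # window = references 17..26
```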
52
Working-Set Model
• WSS_i = working-set size of process P_i.
• D = Σ WSS_i, where D is the total demand for frames.
• m = total number of available frames.
• If D > m, thrashing will occur, because some processes will not have enough frames; the OS then suspends one of the processes.
• The working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible, optimizing CPU utilization.
53
Keeping Track of the Working Set
• The working-set window is a moving window (it evolves with time).
• Approximate it with a fixed-interval timer plus a reference bit.
• Example: Δ = 10,000 references and a timer that interrupts every 5,000 time units.
  – Keep in memory 2 bits for each page.
  – Whenever the timer interrupts, copy the reference bits into the in-memory bits and set all reference bits to 0.
  – If one of the in-memory bits = 1, the page is in the working set.
• Not completely accurate (we don't know where within the 5,000 references a page was used).
• Improvement: 10 bits and an interrupt every 1,000 time units.
54
Page-Fault Frequency
• A more direct way than the working-set model to control thrashing.
• PFF establishes an "acceptable" page-fault rate:
  – If the actual rate is too low, the process loses a frame.
  – If the actual rate is too high, the process gains a frame.
55
Operating-System Examples
• Windows NT
  – Clustering: bring in not only the faulted page but also the pages surrounding it.
  – Working-set minimum and working-set maximum.
  – Automatic working-set trimming.
• Solaris 2
  – The kernel maintains sufficient free memory. If free memory is not enough, the pageout procedure is used.
§10.7
56
Other Considerations
• Prepaging: prevent the initial large number of page faults by bringing into memory at one time all the pages that will be needed.
  – Does prepaging cost less than servicing the corresponding page faults?
• Page-size selection considerations:
  – Because each active process must have its own copy of the page table, a large page size is desirable (smaller page tables).
  – To minimize internal fragmentation, we need a small page size.
  – A desire to minimize I/O time argues for a larger page size.
  – A smaller page size should result in less I/O and less total allocated memory.
  – To minimize the number of page faults, we need a large page size.

§10.8

The problem has no best answer. However, the trend is toward larger page sizes.
57
Other Considerations (Cont.)
• Program structure: system performance can be improved if the user (or compiler) has an awareness of the underlying demand paging.
  – int[][] A = new int[128][128];
  – Each row is stored in one page.
  – Only one frame is allocated.
  – Program 1 (column-by-column traversal):

    for (int j = 0; j < A.length; j++)
        for (int i = 0; i < A.length; i++)
            A[i][j] = 0;

    128 × 128 = 16,384 page faults
  – Program 2 (row-by-row traversal):

    for (int i = 0; i < A.length; i++)
        for (int j = 0; j < A.length; j++)
            A[i][j] = 0;

    128 page faults
58
Other Considerations (Cont.)
• Careful selection of data structures and programming structures can increase locality and hence lower the page-fault rate.
  – Stack: good locality. Hash table: bad (scattered references).
• The compiler and loader can have an effect on paging.
  – Separate code and data, and generate reentrant code: read-only code pages never need to be written out before being replaced.
  – The loader can avoid placing routines across page boundaries, keeping each routine completely in one page.
• The choice of programming language:
  – C uses pointers, which tend to randomize access to memory.
  – Java has no pointers, so it tends to have better locality of reference than C or C++.
59
Other Considerations (Cont.)
• I/O interlock
  – When demand paging is used, we sometimes need to allow some pages to be locked in memory.
  – One such situation: I/O to or from user memory. (Fig. 10.20)
  – A lock bit is associated with every frame. If the frame is locked, it cannot be selected for replacement.
  – Frequently, some or all of the OS kernel is locked into memory.
  – A frame just read in by a low-priority process could be selected for replacement by a high-priority process while the low-priority process waits in the ready queue for the CPU scheduler. The lock bit can prevent this situation.
60
61
Other Consideration (Cont.)
• Virtual memory is the antithesis of real-time computing, because it can introduce unexpected, long delays in the execution of a process while pages are brought into memory.
• Real-time systems almost never have virtual memory.