How Linux Works with Memory - Vyacheslav Biryukov
Post on 15-Jul-2015
TRANSCRIPT
-
Linux
-
?
3
-
?
?
?
?
MySQL MongoDB?
?
4
-
x86_64
Linux Kernel 2.6.32
5
-
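Resident memory: memory that currently sits in physical memory (RAM).
Anonymous memory: memory with no file behind it (without backing store).
Page fault: a trap raised when a process touches a page that is not currently mapped to physical memory.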
6
-
, .
The default page size is 4KB.
Huge Pages are 2MB.
7
[Figure: an address space from 0x0 to 0xFFFFFFFF divided into 4KB pages]
-
8
RAM
Swap
;
;
;
.
Paging/swapping
-
Overcommit
:
sysctl vm.overcommit_memory 0 (default), 1, 2
sysctl vm.overcommit_ratio / vm.overcommit_kbytes
overcommit:
# cat /proc/meminfo
CommitLimit: 32973320 kB
Committed_AS: 5510988 kB
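The two fields above can also be compared in code. This is a minimal sketch, not part of the original slides. The helper names `parse_meminfo` and `commit_headroom_kb` are hypothetical, and it assumes the usual `Name: value kB` layout of /proc/meminfo. Keep in mind that the kernel enforces CommitLimit only when vm.overcommit_memory is 2.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """kB that can still be committed before CommitLimit is reached."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Values copied from the slide; on a live system read /proc/meminfo instead.
SAMPLE = """CommitLimit:    32973320 kB
Committed_AS:    5510988 kB"""

headroom_kb = commit_headroom_kb(parse_meminfo(SAMPLE))
```

With the slide's numbers, the headroom is 27462332 kB, so the system is far from its commit limit.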
9
-
NUMA SMP(UMA)
10
[Figure: SMP (UMA) compared with NUMA. SMP: CPU 1 and CPU 2 reach RAM 1 and RAM 2 over a shared System Bus. NUMA: CPU 1 and CPU 2 each have their own RAM on a mem bus and are linked by an interconnect.]
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 32735 MB
node 0 free: 434 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 32768 MB
node 1 free: 101 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
-
NUMA
:
# numactl --interleave all command
11
[Figure: Memory Nodes, Node 1 and Node 2, with the labels 56GB and 30GB]
-
Memory Zones
- , .
ZONE_DMA
ZONE_DMA32
ZONE_NORMAL
:
# grep zone /proc/zoneinfo
Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
12
-
Page Cache
.
Page Cache.
:
# free -m
             total       used       free     shared    buffers     cached
Mem:         64401      64101        299          0        161      60339
-/+ buffers/cache:       3600      60800
Swap:            0          0          0
# grep Cached /proc/meminfo
Cached: 61638200 kB
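The "-/+ buffers/cache" line of the free output above is simple arithmetic on the "Mem:" line. A small sketch with a hypothetical helper name, using the megabyte values from the slide:

```python
def buffers_cache_line(used, free, buffers, cached):
    """Return (used, free) the way the '-/+ buffers/cache' line reports them."""
    reclaimable = buffers + cached       # memory the kernel can give back
    return used - reclaimable, free + reclaimable


# "Mem:" line from the slide, in MB.
used_mb, free_mb = buffers_cache_line(used=64101, free=299, buffers=161, cached=60339)
```

This gives 3601 and 60799 MB. The slide shows 3600 and 60800; the 1 MB difference is presumably rounding, since free works from kilobyte values.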
13
-
Read Page Cache
14
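[Figure: a read() syscall looks the data up in the Page Cache. On a hit the data comes straight from the Page Cache; on a miss it is first read from Disk Storage.]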
Page Cache.
.
mincore() reports which pages are resident in the Page Cache.
vmtouch shows how much of a file is in the Page Cache:
# vmtouch /var/lib/db/index
Files: 1
Directories: 0
Resident Pages: 21365/21365 83M/83M 100%
Elapsed: 0.004477 seconds
-
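Writing to the Page Cache
Writes go to the Page Cache first (unless the file was opened with open() and O_SYNC).
Modified pages are marked dirty.
Dirty pages are written back to disk (writeback):
after vm.dirty_expire_centisecs (fsflush/pdflush);
under memory pressure (kswapd);
on fsync() or msync();
when too many pages are dirty (vm.dirty_ratio).
# grep Dirty /proc/meminfo
Dirty: 9604 kB
15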
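An explicit fsync() is the one writeback trigger an application controls directly. The sketch below is illustrative only: `durable_write` and `demo` are hypothetical names, and the temporary file stands in for real data.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes, then ask the kernel to flush the dirty pages to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])  # lands in the Page Cache
        os.fsync(fd)                                 # forces writeback now
    finally:
        os.close(fd)
    return written


def demo():
    """Write 8192 bytes to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.dat")
        n = durable_write(path, b"x" * 8192)
        with open(path, "rb") as f:
            return n, len(f.read())
```

Without the fsync() call the data would still be readable right away, because reads are served from the same Page Cache; it just might not be on disk yet.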
-
:
stack; mmap; heap; bss; init data; text.
16
[Figure: process address space, from high to low addresses: Stack (grows downwards, top of stack, limited by RLIMIT_STACK); mmap region; unallocated memory; program break (brk); Heap (grows upwards); Uninitialized data (bss); Initialized data; Text (program code)]
-
ps
top
cat /proc//status
VmPeak: 8908 kB
VmSize: 8908 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 356 kB
VmRSS: 356 kB
VmData: 180 kB
VmStk: 136 kB
VmExe: 44 kB
VmLib: 1884 kB
VmPTE: 36 kB
VmSwap: 0 kB
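The Vm* fields are easy to pick out of the status file. A minimal sketch, assuming the `Name: value kB` layout shown above; `parse_vm_fields` is a hypothetical helper and the sample holds a few of the values from the slide. On a live system the text could come from /proc/self/status.

```python
def parse_vm_fields(status_text):
    """Extract the Vm* fields (in kB) from status-file text."""
    result = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.startswith("Vm"):
            result[name] = int(rest.split()[0])
    return result


SAMPLE = """VmPeak: 8908 kB
VmSize: 8908 kB
VmHWM: 356 kB
VmRSS: 356 kB
VmSwap: 0 kB"""

vm = parse_vm_fields(SAMPLE)
# Share of the virtual size that is actually resident in RAM.
resident_share = vm["VmRSS"] / vm["VmSize"]
```

For the process on the slide only about 4% of the virtual size is resident.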
17
-
Virtual Memory Area (VMA)
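A virtual memory area (VMA) is a contiguous range of virtual addresses (for example 08048000-0804c000).
Access rights:
read (r);
write (w);
execute (x).
Type:
private (p);
shared (s).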
18
-
VMA
:
# pmap -x
Address             RSS  Dirty Mode  Mapping
00007f0356b23000     76     76 rwx-- [ anon ]
00007f0356b38000    392    392 rwx-- [ anon ]
00007f0356bb9000  34708      0 r-xs- some_mapped_file
00007f0359272000  21876      0 r-xs- some_mapped_file2
VMA :
# cat /proc//maps
:
# cat /proc//smaps
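Each line of the maps file describes one VMA. The sketch below parses such a line. The sample line is hypothetical: only the address range 08048000-0804c000 comes from the slides, while the permissions, device, inode and path are made up for illustration. `Vma`, `parse_maps_line` and `describe` are hypothetical names as well.

```python
from collections import namedtuple

Vma = namedtuple("Vma", "start end perms path")


def parse_maps_line(line):
    """Parse one maps line: 'start-end perms offset dev inode [path]'."""
    parts = line.split(None, 5)
    start_s, end_s = parts[0].split("-")
    path = parts[5].strip() if len(parts) > 5 else ""
    return Vma(int(start_s, 16), int(end_s, 16), parts[1], path)


def describe(vma, page_size=4096):
    """Decode the size in pages and the permission letters of a VMA."""
    return {
        "pages": (vma.end - vma.start) // page_size,
        "readable": vma.perms[0] == "r",
        "writable": vma.perms[1] == "w",
        "executable": vma.perms[2] == "x",
        "shared": vma.perms[3] == "s",
    }


# Hypothetical line; only the address range is taken from the slide.
SAMPLE_LINE = "08048000-0804c000 r-xp 00000000 08:01 123456 /bin/example"
info = describe(parse_maps_line(SAMPLE_LINE))
```

The range 08048000-0804c000 is 0x4000 bytes, which is four 4KB pages.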
19
-
20
              Private                                              Shared
Anonymous     stack, malloc(), mmap(ANON, PRIVATE), brk()/sbrk()   mmap(ANON, SHARED)
File-backed   mmap(fd, PRIVATE), binary/shared libraries           mmap(fd, SHARED)
-
malloc() free()
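glibc malloc() gets memory in two ways:
from the heap, for small requests (below 128KB);
with mmap(), for larger requests.
free() releases the memory.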
-
malloc() brk()
22
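[Figure: 1. The Heap (grows upwards) ends at the program break (brk), with unallocated memory above it. 2. The heap has grown, and the new program break (brk) sits higher in the unallocated memory. The figure carries the labels 110 KB and 100 KB.]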
heap brk(), heap.
-
mmap() munmap()
23
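[Figure: mmap(fd, ) maps the file /var/lib/db/index into the mmap area]
mmap() maps a file into the address space of the process.
munmap() removes the mapping.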
-
mmap()
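Visibility of changes:
MAP_PRIVATE: changes stay private to the process;
MAP_SHARED: changes are shared with other processes that map the same file.
Access rights:
PROT_READ;
PROT_WRITE.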
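The difference between the two flags shows up as soon as a mapping is written to. A minimal sketch using Python's mmap module on a temporary file; `write_through_mapping` is a hypothetical helper, and the file contents are made up for the example.

```python
import mmap
import os
import tempfile


def write_through_mapping(flags):
    """Map a one-page file with the given flags, modify the mapping,
    and return the first four bytes as stored in the file afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"old!" + b"\0" * (mmap.PAGESIZE - 4))
        with open(path, "r+b") as f:
            m = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=flags,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE)
            m[0:4] = b"new!"
            if flags == mmap.MAP_SHARED:
                m.flush()  # msync(): write the dirty page back to the file
            m.close()
        with open(path, "rb") as f:
            return f.read(4)


private_result = write_through_mapping(mmap.MAP_PRIVATE)
shared_result = write_through_mapping(mmap.MAP_SHARED)
```

With MAP_PRIVATE the file still holds the old bytes, because the write went to a private copy of the page. With MAP_SHARED the file holds the new bytes.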
24
-
Linux .
25
-
Page fault (demand paging)
26
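[Figure: the address space of a process has three kinds of regions: allocated and mapped memory, only allocated, and unallocated. A write to a page goes through the MMU, which uses the TLB and the Page Table to translate the address to a physical one in RAM. If the page has no mapping yet, a page fault creates the page mapping.]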
Minor Page Fault .
-
Page Fault
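Minor: the page is already in memory and only needs to be mapped, no disk access;
Major: the page has to be read from disk;
Invalid: the address is not valid for the process (segmentation fault).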
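A process can watch its own fault counters through getrusage(). The sketch below touches every page of a fresh anonymous mapping and reports how the counters moved. `touch_anonymous_pages` is a hypothetical name, and the exact counts depend on the kernel and its settings, so none are promised here.

```python
import mmap
import resource


def touch_anonymous_pages(n_pages):
    """Touch every page of a new anonymous mapping.

    Returns (pages touched, minor fault delta, major fault delta)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    region = mmap.mmap(-1, n_pages * mmap.PAGESIZE)  # allocated, not yet faulted in
    touched = 0
    for offset in range(0, len(region), mmap.PAGESIZE):
        region[offset] = 1   # first access to a page typically causes a minor fault
        touched += 1
    after = resource.getrusage(resource.RUSAGE_SELF)
    region.close()
    return (touched,
            after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)
```

On a typical system the minor fault delta is close to the number of pages touched and the major fault delta stays at zero, since anonymous memory needs no read from disk.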
27
-
Page fault
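A page of a process's memory can be in one of these states:
1. Unallocated;
2. Allocated, but unmapped (not yet faulted);
3. Allocated, and mapped to main memory (RAM);
4. Allocated, and mapped to the physical swap device (disk).
From these states:
RSS is the size of the pages in state 3;
Virtual Memory Size is the sum of states 2 + 3 + 4.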
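The two formulas can be written out directly. A small sketch with made-up page counts; the state numbers follow the list above, and `rss_and_vsz` is a hypothetical helper.

```python
from collections import Counter

UNALLOCATED, UNMAPPED, IN_RAM, IN_SWAP = 1, 2, 3, 4


def rss_and_vsz(page_states, page_kb=4):
    """Return (RSS, virtual memory size) in kB for a list of page states."""
    counts = Counter(page_states)
    rss = counts[IN_RAM] * page_kb
    vsz = (counts[UNMAPPED] + counts[IN_RAM] + counts[IN_SWAP]) * page_kb
    return rss, vsz


# Illustrative numbers: 10 pages not yet faulted, 5 in RAM, 2 in swap.
states = [UNMAPPED] * 10 + [IN_RAM] * 5 + [IN_SWAP] * 2 + [UNALLOCATED] * 100
rss_kb, vsz_kb = rss_and_vsz(states)
```

Here RSS is 20 kB and the virtual size is 68 kB; unallocated pages count toward neither.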
28
-
Copy On Write (COW)
29
[Figure: Copy On Write, with the Parent and Child page tables (#0 to #4) drawn over Real Memory. 1. After fork() both processes reference the same pages. 2. A changed page (#3 in the figure) gets a separate copy in real memory, taken from a free page.]
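The effect of copy-on-write can be observed from user space: after fork() a write in the child does not show up in the parent. The sketch below shows only this isolation; the copying of the page itself happens inside the kernel and is not visible here. `fork_and_modify` is a hypothetical name.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite its buffer, return (parent view, child view)."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write goes to the child's own copy of the page.
        try:
            os.close(read_fd)
            data[:] = b"child!"
            os.write(write_fd, bytes(data))
        finally:
            os._exit(0)
    # Parent: read what the child saw, then look at our own buffer.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view
```

The parent still sees b"parent" while the child reported b"child!", even though both started from the same physical pages.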
-
30
-
malloc()
31
[Figure: read(fd, buf, 8192) on /var/m.log, with a Page Cache that already holds /bin/ls, find and libc.so. Steps 1 and 2: the request for /var/m.log misses in the Page Cache. Step 3: the kernel reads m.log#0 and m.log#1 from Disk Storage into free Page Cache pages. Step 4: the data is copied to the Heap pages in user space.]
-
malloc()
.
user space CPU .
32
-
mmap
33
[Figure: mmap() maps pages #0 and #1 of the mmap area directly onto m.log#0 and m.log#1 in the Page Cache, which also holds /bin/ls and libc.so; page #2 has no page behind it yet.]
-
mmap minor page fault
34
[Figure: page #2 of the mmap area is accessed. m.log#2 is already in the Page Cache, so the kernel only has to map it: a minor page fault.]
-
mmap major page fault (1)
35
[Figure: page #2 of the mmap area is accessed, but m.log#2 is not in the Page Cache. 1. The page missing from the Page Cache causes a major page fault. 2. m.log#2 is read from Disk Storage into a free Page Cache page.]
-
mmap major page fault (2)
36
[Figure: 3. Page #2 of the mmap area is mapped onto m.log#2 in the Page Cache.]
-
mmap()
37
.
Lazy loading.
.
.
.
-
sar
-B: paging statistics:
02:46:04  pgpgin/s pgpgout/s  fault/s majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s  %vmeff
02:46:05      0,00    134,00  1743,00     0,00   5978,00      0,00      0,00      0,00    0,00
02:46:06      0,00    108,00  9094,00     0,00  11801,00      0,00      0,00      0,00    0,00
-r: memory utilization:
02:41:50 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive  kbinact
02:41:51    346644  65599996    99,47    191340 61669768  5410704    8,20 34115072 29384464
02:41:52    345900  65600740    99,48    191340 61669956  5410596    8,20 34114568 29384568
-R: memory statistics:
02:44:50  frmpg/s bufpg/s campg/s
02:44:51   393,00    4,00   45,00
02:44:52  -200,00    1,00   35,00
38
-
Managing the Page Cache
1. Bypass the Page Cache:
open(path, O_DIRECT) (as MySQL InnoDB does).
2. Tell the kernel that cached data is no longer needed:
posix_fadvise(fd, POSIX_FADV_DONTNEED);
madvise(addr, MADV_DONTNEED);
mincore().
3. Use vmtouch (it relies on posix_fadvise):
vmtouch -e /var/lib/db/index
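Dropping a file from the Page Cache with posix_fadvise() takes only a few lines. A minimal sketch: `drop_from_page_cache` and `demo` are hypothetical names, the temporary file stands in for a real data file, and the call is advisory, so the kernel is free to keep some pages.

```python
import os
import tempfile


def drop_from_page_cache(path):
    """Advise the kernel that the cached pages of a file are no longer needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages are not dropped, so write them back first
        # Offset 0 with length 0 covers the file from start to end.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def demo():
    """Create a 64KB temporary file, drop it from the cache, return its size."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 65536)
        f.flush()
        drop_from_page_cache(f.name)
        return os.path.getsize(f.name)
```

The file content is untouched; only its cached copy is released, so the next read goes to the disk again.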
39
-
readahead
readahead can be controlled with:
readahead();
madvise();
posix_fadvise();
blockdev --report and blockdev --setra.
40
-
Reclaiming memory (page reclaiming)
Page types:
unreclaimable;
swappable;
syncable;
discardable.
41
-
free list
42
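[Figure: a memory request is served from the free page list. The list is refilled from the Page Cache, from Swap (kswapd) and from Kernel memory (slab allocator); the OOM Killer is the last resort. vm.swappiness ranges from 0 (swap only to avoid an OOM) to 100 (swap aggressively).]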
-
Page Scanning (kswapd)
43
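[Figure: the size of available free memory over time, plotted against three watermarks: high pages, low pages and min pages. Below low pages kswapd reclaims in the background; below min pages reclaim becomes synchronous. The min watermark is set with vm.min_free_kbytes.]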
-
LRU/2
44
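[Figure: an Active List and an Inactive List, each with a head and a tail, plus a Free List. Page allocation takes a free page from the Free List; referenced pages move toward the head of the Active List; pages at the tail of the Inactive List are reclaimed and become free pages.]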
-
LRU
45
LRU lists are kept per memory Node, per Zone and, since kernel 3.3, per cgroup:
Active anon;
Inactive anon;
Active file;
Inactive file;
Unevictable.
File backend LRU .
# cat /proc/meminfo
Active: 32714084 kB
Inactive: 30755444 kB
Active(anon): 1612548 kB
Inactive(anon): 264 kB
Active(file): 31101536 kB
Inactive(file): 30755180 kB
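The totals in this output are the sums of the anon and file lists, which is easy to check. A short sketch using the values from the slide; `parse_meminfo` and `lru_totals_consistent` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lru_totals_consistent(f):
    """True if Active and Inactive equal the sums of their anon and file lists."""
    return (f["Active"] == f["Active(anon)"] + f["Active(file)"]
            and f["Inactive"] == f["Inactive(anon)"] + f["Inactive(file)"])


# Values copied from the slide.
SAMPLE = """Active: 32714084 kB
Inactive: 30755444 kB
Active(anon): 1612548 kB
Inactive(anon): 264 kB
Active(file): 31101536 kB
Inactive(file): 30755180 kB"""

fields = parse_meminfo(SAMPLE)
consistent = lru_totals_consistent(fields)
```

On this machine almost all of both lists is file-backed, which matches the roughly 60 GB Page Cache shown earlier.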
-
Out Of Memory Killer (OOM)
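Find OOM Killer events in the logs:
grep -i kill /var/log/messages*
Adjust how likely a process is to be killed (from -16 to 15; -17 disables the OOM Killer for the process):
echo -17 > /proc//oom_adj
Current score of a pid:
cat /proc//oom_score
0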
46
-
Memory cgroup can control:
the memory limit;
the memory + swap limit;
OOM behavior;
swappiness.
:
# cat memory.stat
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
47
-
Cgroup page reclaiming
Global reclaiming.
Target reclaiming.
48
-
49
Systems Performance: Enterprise and the Cloud
Linux Kernel Development
Linux System Programming: Talking Directly to the Kernel and C Library
-
!