SANtopia Design Features. Computer and Software Research Laboratory. Data Storage Systems Workshop. Background...
Post on 16-Jan-2016
SANtopia Design Features
Computer and Software Research Laboratory, ETRI
Data Storage Systems Workshop
Background
• The spread of the Internet has caused explosive growth in data
• Demand for mass storage is rising, so scalable storage media are needed
[Figure: Forecast of demand for storage capacity, in petabytes, 1998 to 2002. Source: IDC]

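Background: problems of the existing server-centric environment
• Performance problems (performance bottleneck)
• Limited scalability (storage devices, computing power)
[Diagram: The conventional server-centric storage environment. Clients reach a Web Server, an Application Server, and a DB Server over the Internet]
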
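Background: SAN-based storage
A new storage concept that connects a large number of storage devices to a dedicated high-speed network (Fibre Channel) to provide large-capacity shared storage.
[Diagram: Clients reach the Web Server, Appl. Server, and DB Server over the Internet; the servers connect through FC Switches to the Storage Area Network, which holds RAID arrays, disks, and a tape drive]
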
Background: growing demand for SAN
• Average annual growth rate: 85%
• Similar to the growth rate of storage demand (87%)
[Figure: SAN (Storage Area Network) market forecast, in millions of dollars (axis 0 to 15,000), 1996 to 2002. Source: IDC]

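Background
SAN is a new storage hardware technology for supporting mass storage.
SAN hardware technology
• Data sharing
• Removal of performance bottlenecks
• Recovery after failures
• Integrated management
To raise the value of a SAN further, system software that supports SAN Virtualization must be provided.
Additional requirements
• Support for a large-capacity shared file system
• Support for hardware-independent logical storage
• Support for centralized system management
• Support for large-capacity storage media
• Support for storage scalability
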
Background: SAN Virtualization market forecast
• Growth rate is higher than that of SAN hardware (over 100%)
• The SAN Virtualization market is about 10% of the size of the SAN hardware market
[Figure: SAN Virtualization market forecast, in millions of dollars (axis 0 to 1,400), 1997 to 2003. Source: IDC]

What is SANtopia?
Software that provides SAN Virtualization
High Performance
• Fast Accessible Directory Structure
• Load Balancing
• Global Buffer Sharing
High Availability
• Fast recovery
• Online backup
• Snapshot
High Scalability
• Dynamic Inode: no preallocated inode table
• Dynamic Reconfiguration: online resizing
[Diagram: SANtopia, made up of the Shared File System, Logical Volume Driver, and System Management, sits on the SAN Infrastructure]

Features of SANtopia
• 64-bit File and File System
• Global File Sharing
  - Provides a global buffer
• Open SAN File System Storages
• Cluster File System
• Centralized Lock Manager with Load Balancing
  - Does not use device locks
• Integration of Buffer Manager and Lock Manager
• Software RAID (0, 1, 0+1, 5, Concatenate)
• Comprised of three parts
  - Logical Volume Manager
  - Global Shared File System
  - Lock and Buffer Manager

SANtopia Architecture
[Diagram: SANtopia components]
• Interfaces: VNODE Interface, System Call Interface
• File Manager
• Global Lock & Buffer Manager
• Logical Volume Manager
• System Management: Performance Monitor, Online Backup, Scalability Management
• Transports: IP over SAN, SCSI over SAN
• Disks
Management functions labeled in the diagram: Mapping Management, Configuration Management, I/O Management, Inode Management, Log Management, Recovery Management, BitMap Management, Transaction Management, File Operation Management

SANtopia Logical Volume Manager
Features of LVM
• Volume Create/Remove
• On-line Volume Resize
• Dynamic Reconfiguration
• Software RAID (0, 1, 0+1, 5, Concatenation)
[Diagram: disk1 to disk8 grouped into four volumes]
Volume 1: Striping (RAID 0)
Volume 2: Concatenation
Volume 3: Striped parity (RAID 5)
Volume 4: Striped Mirroring (RAID 0+1)

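The difference between a striped volume and a concatenated volume comes down to how a logical block number is translated to a disk and an offset. The sketch below illustrates the two textbook mappings; it is not taken from the SANtopia sources, and the function names, the `stripe_unit` parameter, and the block-granular addressing are all assumptions made for illustration.

```python
from typing import List, Tuple

def stripe_map(block: int, n_disks: int, stripe_unit: int) -> Tuple[int, int]:
    """RAID 0: map a logical block to (disk index, block offset on that disk)."""
    stripe_no, within = divmod(block, stripe_unit)  # which stripe unit, offset inside it
    row, disk = divmod(stripe_no, n_disks)          # units rotate across the disks
    return disk, row * stripe_unit + within

def concat_map(block: int, disk_sizes: List[int]) -> Tuple[int, int]:
    """Concatenation: fill disk 0 completely, then disk 1, and so on."""
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size
    raise ValueError("block beyond end of volume")
```

With four disks and a stripe unit of two blocks, consecutive units land on consecutive disks, which is what lets sequential I/O use all disks in parallel; concatenation only adds capacity.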
A Disk Layout
[Diagram: on-disk layout]
• label
• Private partition (physical partition)
• Public partition (physical partition)
• Logical partitions
• Metadata shown: Logical Partition Information, Disk Identifier, Information about Logical Volume, Allocation Bitmap, Mapping Info.

Volume Resize
• Extend/Shrink Unit = Logical Partition
• When a Volume is Striped
  - Add Row
  - Add Column: Data Relocation Needed

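Why adding a column to a striped volume forces data relocation, while adding a row does not, can be seen by counting how many blocks change position. The sketch below assumes plain round-robin striping with a stripe unit of one block; `moved_fraction` is a hypothetical helper, not part of SANtopia.

```python
def position(block: int, n_cols: int) -> tuple:
    """(column, row) of a block under round-robin striping, stripe unit = 1 block."""
    return block % n_cols, block // n_cols

def moved_fraction(n_blocks: int, old_cols: int, new_cols: int) -> float:
    """Fraction of existing blocks whose (column, row) changes after a resize."""
    moved = sum(1 for b in range(n_blocks)
                if position(b, old_cols) != position(b, new_cols))
    return moved / n_blocks
```

Adding a row keeps the column count unchanged, so every existing block stays where it is; going from three to four columns moves most of the existing blocks under this simple layout.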
Free Space Manager
• Physical Allocation Bitmap
  - Divided into fixed-size units
  - Each unit is controlled by a separate lock
  - The entire bitmap is duplicated
• Effects
  - Increases parallelism
  - Gains scalability
  - Avoids bottlenecks
  - Reduces metadata search time
[Diagram: the physical allocation bitmap covers the logical partitions]

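The idea of splitting the allocation bitmap into fixed-size units with one lock each can be sketched as follows. This is a single-process illustration only: `threading.Lock` stands in for whatever cluster-wide lock SANtopia uses, the duplicated copy of the bitmap is not modeled, and the class and method names are hypothetical.

```python
import threading
from typing import Optional

class FreeSpaceBitmap:
    def __init__(self, n_blocks: int, unit_size: int) -> None:
        self.unit_size = unit_size
        self.bits = [False] * n_blocks                      # True = allocated
        n_units = (n_blocks + unit_size - 1) // unit_size
        self.locks = [threading.Lock() for _ in range(n_units)]  # one lock per unit

    def allocate(self, start_unit: int = 0) -> Optional[int]:
        """Scan units starting at start_unit, locking only one unit at a time."""
        n_units = len(self.locks)
        for i in range(n_units):
            unit = (start_unit + i) % n_units
            lo = unit * self.unit_size
            hi = min(lo + self.unit_size, len(self.bits))
            with self.locks[unit]:
                for block in range(lo, hi):
                    if not self.bits[block]:
                        self.bits[block] = True
                        return block
        return None                                          # volume is full

    def free(self, block: int) -> None:
        with self.locks[block // self.unit_size]:
            self.bits[block] = False
```

Callers that start their scan in different units rarely contend for the same lock, which is the parallelism effect listed on the slide.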
Mapping Manager
• Virtualization of Physical Storage
  - Provides flexibility
  - Enables data movement between Logical Partitions
  - Enables snapshots
• Each Mapping Information
  - Covered by one host
  - Chained declustered for safety
  - Same effects as Free Space Manager
  - Flexible to fail-over
[Diagram: Hosts A, B, C, and D, each covering one logical partition]

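Chained declustering, in its usual form, places the primary copy of item i on host i and the backup copy on the next host in the chain, so a single host failure shifts work to a neighbor instead of losing the data. The sketch below applies that scheme to mapping information; the host numbering and both function names are assumptions for illustration, and the slides do not say how SANtopia numbers its hosts.

```python
from typing import List, Set

def replicas(partition: int, n_hosts: int) -> List[int]:
    """Primary on host (partition mod n), backup on the next host in the chain."""
    primary = partition % n_hosts
    backup = (primary + 1) % n_hosts
    return [primary, backup]

def serving_host(partition: int, n_hosts: int, failed: Set[int]) -> int:
    """Pick the first surviving copy holder for this partition's mapping info."""
    for host in replicas(partition, n_hosts):
        if host not in failed:
            return host
    raise RuntimeError("both copies of the mapping are unavailable")
```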
I/O Manager
• Load Balancing of I/O
• Read Policy
  - Round-Robin Policy: used when the plexes have the same capability
  - Preferred-Plex Policy: used when the plexes have different capabilities

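The two read policies can be pictured as two ways of choosing which mirror copy (plex) serves the next read. The following is a minimal sketch with hypothetical function names; how SANtopia actually measures plex capability is not stated in the slides.

```python
import itertools
from typing import Iterator, List

def round_robin(plexes: List[str]) -> Iterator[str]:
    """Equal-capability plexes: rotate reads across all of them."""
    return itertools.cycle(plexes)

def preferred_plex(plexes: List[str], preferred: str) -> Iterator[str]:
    """Unequal plexes: always read from the one designated as preferred."""
    if preferred not in plexes:
        raise ValueError("preferred plex is not part of the volume")
    return itertools.repeat(preferred)
```

Each call to `next()` on the returned iterator yields the plex to read from.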
SANtopia File Manager
Features of SANtopia File Mgr
• Extent Based 64-bit File System
  - 64-bit Address
  - Support Large File
• Dynamic inode allocation
  - Multi-Level inode
• Support Large Directory
  - Extendible Hash based directory management
• Fast Recovery
  - Metadata Journaling
• Inode Stuffing

SANtopia File System Layout
[Diagram: address space from 0 to 2^64-1, holding the Boot Block, Super Block, Allocation Blocks (inode, directory, data block), and Extent Bitmap]
• Extent based allocation
• Super Block: SANtopia file system information
• Allocation Block
  - No preallocated area for inode, directory entry, data block
  - Extent based allocation (4KB ~ 64KB)
• Extent bitmap
  - Located at the end of the address space (file system size)
  - Object types must be distinguished in the Extent Allocation Bitmap
  - Uses 2 bits: 00 = not used, 01 = inode, 10 = dir entry, 11 = data block

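A bitmap with 2 bits per extent packs four entries into each byte. The sketch below shows one way to read and write such entries using the codes from the slide; the class name, the in-memory `bytearray`, and the bit order inside a byte are assumptions, not the on-disk format.

```python
NOT_USED, INODE, DIR_ENTRY, DATA_BLOCK = 0b00, 0b01, 0b10, 0b11

class ExtentBitmap:
    def __init__(self, n_extents: int) -> None:
        self.n_extents = n_extents
        self.raw = bytearray((n_extents + 3) // 4)   # four 2-bit entries per byte

    def get(self, extent: int) -> int:
        byte, slot = divmod(extent, 4)
        return (self.raw[byte] >> (slot * 2)) & 0b11

    def set(self, extent: int, kind: int) -> None:
        byte, slot = divmod(extent, 4)
        shift = slot * 2
        cleared = self.raw[byte] & ~(0b11 << shift) & 0xFF   # zero the old entry
        self.raw[byte] = cleared | (kind << shift)
```

Because the type is stored in the bitmap itself, a scan can tell inodes, directory entries, and data blocks apart without reading the extents.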
inode
• Dynamic allocation inode
  - No limitation of inode number
  - No preallocated inode area
  - Cf) ext2 file system: 1 inode per 4KB
• Each inode size is 1 extent
  - Fragmentation
  - Stuffed inode for space efficiency
• 64-bit inode number
  - Using unique ID in SANtopia
[Diagram: an inode occupies one extent and holds the inode number (inode information), the file or directory information, and either Data Block Pointers or Stuffed Data]

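Inode stuffing means that a file small enough to fit in the unused part of its inode extent is stored there directly, with no separate data extent. The sketch below illustrates the decision; `EXTENT_SIZE`, `HEADER_SIZE`, the `Inode` fields, and the `allocate_extent` callback are hypothetical values and names chosen for the example, not SANtopia's real layout.

```python
from dataclasses import dataclass, field
from typing import Callable, List

EXTENT_SIZE = 4096   # assumed: smallest extent size mentioned in the slides
HEADER_SIZE = 256    # hypothetical space taken by inode and file information

@dataclass
class Inode:
    number: int
    stuffed: bytes = b""                                # file data kept inside the inode
    pointers: List[int] = field(default_factory=list)  # data extents otherwise

def write_file(number: int, data: bytes, allocate_extent: Callable[[], int]) -> Inode:
    """Stuff small files into the inode; point to data extents for larger ones."""
    if len(data) <= EXTENT_SIZE - HEADER_SIZE:
        return Inode(number, stuffed=data)
    n_extents = (len(data) + EXTENT_SIZE - 1) // EXTENT_SIZE
    return Inode(number, pointers=[allocate_extent() for _ in range(n_extents)])
```

A stuffed file costs one extent and one disk read in total, which offsets the space overhead of giving every inode a full extent.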
inode structure
[Diagram: Dynamic Multi-Level Inode Allocation. The Dinode Info. points to Single Indirect blocks and Double Indirect blocks; each block is an extent]

Directory (Extendible Hash)
[Diagram: the Dir Info. points to a root hash table with 2-bit entries (00 to 11) and an indirect hash table with 4-bit entries (0000 to 1111); the entries point to Directory Nodes, each of which is one extent]

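Extendible hashing lets a directory grow without rehashing every entry: a table indexed by some bits of the name's hash points to buckets, and only an overflowing bucket is split. The sketch below is a generic in-memory version of the technique, not SANtopia's on-disk structure. It indexes by the low-order hash bits, uses a simple made-up hash function, and treats a bucket as the analogue of a directory node; all names are hypothetical.

```python
from typing import Dict, List, Optional

class Bucket:
    def __init__(self, depth: int) -> None:
        self.depth = depth                  # local depth: hash bits this bucket uses
        self.items: Dict[str, int] = {}     # file name -> inode number

class ExtendibleDirectory:
    def __init__(self, bucket_capacity: int = 4) -> None:
        self.capacity = bucket_capacity
        self.global_depth = 0
        self.table: List[Bucket] = [Bucket(0)]   # always 2**global_depth slots

    @staticmethod
    def _hash(name: str) -> int:
        h = 0
        for byte in name.encode():
            h = (h * 31 + byte) & 0xFFFFFFFF     # illustrative hash only
        return h

    def _slot(self, name: str) -> int:
        return self._hash(name) & ((1 << self.global_depth) - 1)

    def lookup(self, name: str) -> Optional[int]:
        return self.table[self._slot(name)].items.get(name)

    def insert(self, name: str, inode: int) -> None:
        while True:
            bucket = self.table[self._slot(name)]
            if name in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[name] = inode
                return
            self._split(bucket)                  # make room, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.depth == self.global_depth:
            self.table = self.table + self.table   # double the hash table
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.table):
            if b is bucket and i & high_bit:
                self.table[i] = sibling          # half of the slots move to the sibling
        old, bucket.items = bucket.items, {}
        for key, value in old.items():
            self.table[self._slot(key)].items[key] = value
```

Lookups touch the table and one bucket regardless of directory size, which is the property that makes the scheme attractive for large directories.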
Recovery
• Uses journaling
  - Writes the in-core log buffer to the log disk when metadata is updated
  - The log disk is a circular buffer
• Metadata modification operations (transactions)
  - create, remove, unlink, link, allocation, truncate, rename, …
[Diagram: components involved in recovery: File Operation (transaction), Transaction Manager, Log Manager, Metadata Manager, Recovery Manager, System Manager (system recovery), Log, and Metadata]

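A circular metadata log can be sketched as a fixed array with a head and a tail: records are appended at the tail, space is reclaimed once the metadata they describe is safely on disk, and recovery replays whatever lies between head and tail. The code below is a simplified in-memory illustration under those assumptions; the record format and the names `CircularLog`, `checkpoint`, and `replay` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, str]   # (transaction id, metadata key, new value)

class CircularLog:
    def __init__(self, n_slots: int) -> None:
        self.slots: List[Optional[Record]] = [None] * n_slots
        self.head = 0     # oldest record that has not been checkpointed
        self.tail = 0     # next slot to write
        self.count = 0    # live records between head and tail

    def append(self, record: Record) -> None:
        if self.count == len(self.slots):
            raise RuntimeError("log full: checkpoint metadata first")
        self.slots[self.tail] = record
        self.tail = (self.tail + 1) % len(self.slots)   # wrap around
        self.count += 1

    def checkpoint(self) -> None:
        """Metadata is on disk, so all logged records can be reclaimed."""
        self.head = self.tail
        self.count = 0

    def replay(self, metadata: Dict[str, str]) -> None:
        """Recovery: reapply live records in the order they were logged."""
        i = self.head
        for _ in range(self.count):
            _, key, value = self.slots[i]
            metadata[key] = value
            i = (i + 1) % len(self.slots)
```

Because only metadata changes are logged, replay after a crash is bounded by the log size rather than by the size of the file system.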
SANtopia Buffer and Lock Manager
Features of Buffer Manager (I)
• Support Global File Sharing
  - Reduce disk I/O
  - Sharing each buffer
• Split distributed BM
  - GBMs are distributed (partitioned) on several nodes
• Manage Global Buffer List and Local Buffer List
  - Communication vs. space overhead
• Manage the logical global buffer
  - Weak correctness of global buffer list
  - Safe but not up-to-date

Features of Buffer Mgr (II)
• Integration of buffer and lock messages
  - Overlapped with global lock manager
  - Piggyback the buffer lists over lock messages
  - Reduce the number of messages
• Adopt write invalidation scheme
  - For the sake of simplicity
• Support buffer forwarding scheme
  - Improves performance by reducing disk I/O

Structure of Buffer Manager
• Local and Global Buffer Manager
• Decision of GBM: Inode hash
[Diagram: SANtopia hosts connected by the SAN (Storage Area Network). Every host runs an LBM (Local Buffer Manager); hosts acting as Global Buffer Servers also run a GBM (Global Buffer Manager)]

Operations between GBM and LBM
• Buffer list information
  [Diagram: the Local Buffer Manager sends the buffer list for the GBM in a lock message; the Global Buffer Manager returns the buffer list for the LBM in a lock message]
• GBM Server Failure
  [Diagram: the Local Buffer Manager sends its buffer list to the new GBM server, and the buffer server table for the LBM is modified]

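Choosing a GBM by inode hash and repointing the buffer server table after a GBM failure can be sketched together. The slides name the inode hash and the buffer server table but give no details, so the slot count, the reassignment rule, and every name below are assumptions made for illustration.

```python
from typing import Dict, List

class BufferServerTable:
    def __init__(self, servers: List[str], n_slots: int = 8) -> None:
        # Hash slots are dealt out to the GBM servers in rotation.
        self.slots = [servers[i % len(servers)] for i in range(n_slots)]

    def gbm_for(self, inode: int) -> str:
        """The GBM responsible for an inode is found from the inode number."""
        return self.slots[inode % len(self.slots)]

    def fail_over(self, failed: str, survivors: List[str]) -> None:
        """Reassign the failed server's slots to the surviving servers."""
        j = 0
        for i, server in enumerate(self.slots):
            if server == failed:
                self.slots[i] = survivors[j % len(survivors)]
                j += 1

def rebuild_global_lists(table: BufferServerTable,
                         local_lists: Dict[str, List[int]]) -> Dict[str, Dict[int, List[str]]]:
    """Each LBM reports its cached inodes to whichever GBM now owns them."""
    result: Dict[str, Dict[int, List[str]]] = {}
    for host, inodes in local_lists.items():
        for inode in inodes:
            result.setdefault(table.gbm_for(inode), {}).setdefault(inode, []).append(host)
    return result
```

Since every LBM still knows what it caches, the new GBM can rebuild its global buffer list purely from the lists the LBMs send, which fits the slide's description of a global list that is safe but not necessarily up to date.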
Features of Lock Manager
• Lock Mode
  - Shared lock and Exclusive lock
• Lock Object
  - 64-bit inode: File Lock
• Distributed (partitioned) on several nodes
  - Host-based locking
  - Overlapped with global buffer manager
  - Global Lock Manager (GLM) vs. Local Lock Manager (LLM)
• Delayed Lock Free
  - Callback scheme for lock free
  - Callback by lock server
  - No lock entrance after receiving a callback message
• Recovery on host failure
  - I/O Fencing
  - Rebuild lock table: take locks from the failed host

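Host-based locking with shared and exclusive modes can be illustrated with a small global lock table keyed by inode number. In this sketch a conflicting request is not queued; the table simply reports which hosts would have to be called back. I/O fencing is not modeled, and the class, its methods, and the upgrade rule for a sole holder are assumptions rather than documented SANtopia behavior.

```python
from typing import Dict, Set, Tuple

SHARED, EXCLUSIVE = "S", "X"

class GlobalLockTable:
    def __init__(self) -> None:
        self.holders: Dict[int, Tuple[str, Set[str]]] = {}   # inode -> (mode, hosts)

    def request(self, inode: int, host: str, mode: str) -> Set[str]:
        """Grant if compatible (empty set); otherwise return the hosts to call back."""
        held = self.holders.get(inode)
        if held is None:
            self.holders[inode] = (mode, {host})
            return set()
        held_mode, hosts = held
        if hosts == {host}:                       # sole holder may re-request or upgrade
            new_mode = EXCLUSIVE if EXCLUSIVE in (mode, held_mode) else SHARED
            self.holders[inode] = (new_mode, hosts)
            return set()
        if mode == SHARED and held_mode == SHARED:
            hosts.add(host)                       # shared locks are compatible
            return set()
        return hosts - {host}                     # conflict: these hosts hold the lock

    def release(self, inode: int, host: str) -> None:
        _, hosts = self.holders[inode]
        hosts.discard(host)
        if not hosts:
            del self.holders[inode]

    def release_failed_host(self, failed: str) -> None:
        """Recovery: take back every lock held by the failed host."""
        for inode in list(self.holders):
            if failed in self.holders[inode][1]:
                self.release(inode, failed)
```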
Integration of Lock Mgr and Buffer Mgr
[Diagram: SANtopia hosts connected by the SAN (Storage Area Network). Every host runs an LBM (Local Buffer) and an LLM (Local Lock Table); hosts acting as Global Buffer & Lock Servers also run a GBM (Global Buffer) and a GLM (Global Lock Table)]

Operational Design (I)
[Diagram: each host has a local buffer manager (buffers) and a local lock manager (local lock 1, local lock 2, …); the server has a global buffer manager (buffers) and a global lock manager (global lock 1 with host 1, host 2, …)]
Messages
• Host to server: lock(lock_id, mode, local_buffer_list), unlock(lock_id, local_buffer_list)
• Server to host: lock_grant(lock_id, mode, host_related_global_buffer_list)
• Server to host: call_back(lock_id, hrgbl), where hrgbl is the host-related global buffer list
• Between hosts: buffer forwarding
• Buffers are invalidated at unlock

Operational Design (II)
Global Lock Manager
• Upon receiving lock request
  - Update global buffer list using the local_buffer_list
• Upon receiving unlock request
  - Grant lock before processing the unlock request
  - Update global buffer list using the local_buffer_list
• Upon granting lock
  - Piggyback the part of the global buffer list concerned with the host
• Upon sending callback
  - Piggyback the part of the global buffer list concerned with the host

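The piggybacking on the server side can be sketched as a lock handler that first folds the sender's local buffer list into the global buffer list and then attaches forwarding hints to the grant. The slides do not define exactly which part of the global list is "concerned with the host"; this sketch assumes it means the blocks covered by the lock that other hosts are believed to cache. `blocks_of_lock` and all other names are hypothetical.

```python
from typing import Dict, List, Set

class GlobalBufferList:
    """Which hosts are believed to cache which blocks (may be out of date)."""
    def __init__(self) -> None:
        self.by_host: Dict[str, Set[int]] = {}

    def update(self, host: str, local_buffer_list: List[int]) -> None:
        self.by_host[host] = set(local_buffer_list)

    def host_related(self, host: str, wanted: Set[int]) -> Dict[int, str]:
        """For each wanted block, name another host that may be able to forward it."""
        hints: Dict[int, str] = {}
        for other, blocks in self.by_host.items():
            if other != host:
                for block in blocks & wanted:
                    hints.setdefault(block, other)
        return hints

def handle_lock(gbl: GlobalBufferList, host: str, lock_id: int, mode: str,
                local_buffer_list: List[int], blocks_of_lock: Set[int]) -> dict:
    """Process lock(lock_id, mode, local_buffer_list) and build the grant message."""
    gbl.update(host, local_buffer_list)          # piggybacked list refreshes the GBM
    return {
        "type": "lock_grant",
        "lock_id": lock_id,
        "mode": mode,
        "host_related_global_buffer_list": gbl.host_related(host, blocks_of_lock),
    }
```

No separate buffer-list messages are exchanged: the lists ride on lock traffic that has to be sent anyway.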
Operational Design (III)
Local Lock Manager
• Upon sending lock request
  - Piggyback the local buffer list
• Upon sending unlock request
  - Invalidate buffers related to the lock
  - Piggyback the local buffer list
• Upon receiving lock grant
  - Save the piggybacked global buffer list
• Upon receiving callback
  - Prohibit the lock counter from being increased
  - Unlock as soon as possible

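The callback rules for the local lock manager amount to a small state machine around a use counter: the host keeps a granted lock after local users leave, and once a callback arrives it admits no new users and gives the lock back when the counter reaches zero. The sketch below models only that counter logic, with hypothetical names; sending the actual unlock message is left out.

```python
class LocalLock:
    def __init__(self, lock_id: int) -> None:
        self.lock_id = lock_id
        self.counter = 0            # local users currently inside the lock
        self.called_back = False    # set when the lock server asks for the lock

    def enter(self) -> bool:
        """A local user tries to use the cached lock; refused after a callback."""
        if self.called_back:
            return False            # caller must request the lock from the server again
        self.counter += 1
        return True

    def leave(self) -> bool:
        """Return True when the lock must now be released to the lock server."""
        self.counter -= 1
        return self.called_back and self.counter == 0

    def on_callback(self) -> bool:
        """Return True if the lock is idle and can be released immediately."""
        self.called_back = True
        return self.counter == 0
```

Without a callback, `leave()` never triggers a release, so repeated local use of the same file costs no messages to the lock server.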
Operational Design (IV)
Local Buffer Manager
• Upon receiving forward request
  - Send the requested buffer without a validity check
  - Of course, check whether the requested block is still cached
  - If the buffer has already been flushed, send an acknowledgement signal
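Handling a forward request reduces to a cache lookup with two outcomes. The following is a minimal sketch; the message tags and the dictionary used as a buffer cache are assumptions for illustration.

```python
from typing import Dict, Tuple

def handle_forward_request(cache: Dict[int, bytes], block: int) -> Tuple[str, bytes]:
    """Answer another host's request for a cached block."""
    data = cache.get(block)
    if data is None:
        # Already flushed: acknowledge so the requester can read from disk instead.
        return ("ack_not_cached", b"")
    # Still cached: forward as is, with no validity check on this side.
    return ("buffer", data)
```

Skipping the validity check is safe in this design because the requester holds the lock and buffers are invalidated at unlock.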