Post on 16-Jan-2016

SANtopia Design Features

Computer & Software Research Laboratory, ETRI

Data Storage System Workshop

Background

• Explosive growth of data due to the spread of the Internet
• Growing requirements for large-capacity storage
• Need for scalable storage media

[Figure: Forecast of storage capacity demand, 1998 to 2002, in petabytes (y-axis 0 to 1,600). Source: IDC]

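Background

Problems with the existing server-centric environment
• Performance bottleneck
• Limited scalability (storage devices, computing power)

[Figure: The server-centric storage environment used so far. Diagram labels: Client, Internet, Web Server, Application Server, DB Server]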

Background

SAN-based storage: a new storage concept that connects a large number of storage devices to a dedicated high-speed network (Fibre Channel) to provide large-capacity shared storage.

[Figure: Storage Area Network. Diagram labels: Client, Internet, Web Server, Appl. Server, DB Server, FC Switch, RAID, Disk, Tape Drive]

Background: growing demand for SAN

• Average annual growth rate: 85%
• Similar to the growth rate of storage demand (87%)

[Figure: SAN (Storage Area Network) market forecast, 1996 to 2002, in millions of dollars (y-axis 0 to 15,000). Source: IDC]

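Background

SAN is a new storage hardware technology concept for supporting large-capacity storage.

SAN hardware technology
• Data sharing
• Resolving performance bottlenecks
• Recovery after failures
• Integrated management
• Support for large-capacity storage media
• Support for storage scalability

To raise the value of SAN further, system software that supports SAN Virtualization must be provided.

Additional requirements
• Support for a large-capacity shared file system
• Support for H/W-independent logical storage
• Support for centralized system management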

Background: SAN Virtualization market forecast

• Growth rate higher than that of SAN H/W (over 100%)
• The SAN Virtualization market is about 10% of the SAN H/W market

[Figure: SAN Virtualization market forecast, 1997 to 2003, in millions of dollars (y-axis 0 to 1,400). Source: IDC]

What is SANtopia?

S/W to provide SAN Virtualization

High Performance
• Fast Accessible Directory Structure
• Load Balancing
• Global Buffer Sharing

High Availability
• Fast recovery
• Online backup
• Snapshot

High Scalability
• Dynamic Inode: no preallocated inode table
• Dynamic Reconfiguration: online resizing

[Figure: Diagram labels: SANtopia, Shared File System, Logical Volume Driver, System Management, SAN Infrastructure]

Features of SANtopia

• 64-bit File and File System
• Global File Sharing
  - Provides a global buffer
• Open SAN File System
• Storage Cluster File System
• Centralized Lock Manager with Load Balancing
  - Does not use device locks
  - Integration of Buffer Manager and Lock Manager
• Software RAID (0, 1, 0+1, 5, Concatenation)
• Comprised of three parts
  - Logical Volume Manager
  - Global Shared File System
  - Lock and Buffer Manager

SANtopia Architecture

[Figure: SANtopia architecture. Components shown: System Call Interface, VNODE Interface, File Manager, Global Lock & Buffer Manager, Logical Volume Manager, System Management, IP over SAN, SCSI over SAN, Disks]

Functions listed in the diagram:
• System Management: Performance Monitor, Online Backup, Scalability Management
• I/O Management, Mapping Management, Configuration Management
• Inode Management, Log Management, Recovery Management, BitMap Management, Transaction Management, File Operation Management

SANtopia Logical Volume Manager

Features of LVM

• Volume Create/Remove
• On-line Volume Resize
• Dynamic Reconfiguration
• Software RAID (0, 1, 0+1, 5, Concatenation)

[Figure: example volumes built from disk1 to disk8]
• Volume 1: Striping (RAID 0)
• Volume 2: Concatenation
• Volume 3: Striped parity (RAID 5)
• Volume 4: Striped Mirroring (RAID 0+1)
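The difference between a striped volume and a concatenated volume comes down to how a logical block number is translated to a disk and an offset. The following is a minimal sketch of that translation, not SANtopia's actual code; the function names, the `stripe_unit` parameter, and the block-granular addressing are all assumptions made for illustration.

```python
def stripe_map(block: int, n_disks: int, stripe_unit: int) -> tuple[int, int]:
    """RAID 0: map a logical block to (disk index, block offset on that disk)."""
    stripe_no, within = divmod(block, stripe_unit)
    disk = stripe_no % n_disks        # stripe units rotate across the disks
    row = stripe_no // n_disks        # full rows already laid out below this one
    return disk, row * stripe_unit + within


def concat_map(block: int, disk_sizes: list[int]) -> tuple[int, int]:
    """Concatenation: fill disk 0 completely, then disk 1, and so on."""
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size
    raise ValueError("block is beyond the end of the volume")
```

With four disks and a stripe unit of two blocks, consecutive stripe units land on different disks, whereas concatenation only moves to the next disk once the previous one is full.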

A Disk Layout

• Label
• Private partition (physical partition)
• Public partition (physical partition)

Contents shown in the diagram:
• Disk Identifier
• Logical Partition Information
• Information about Logical Volume
• Allocation Bitmap
• Mapping Info.
• Logical partitions

Volume Resize

• Extend/Shrink Unit = Logical Partition
• When a Volume is Striped
  - Add Row
  - Add Column
    • Data Relocation Needed
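Why resizing a striped volume can force data relocation is easiest to see by counting the blocks whose location changes. The sketch below assumes a simple round-robin striping with a stripe unit of one block and reads "Add Column" as adding a disk to the stripe; both are assumptions, and the helper names are hypothetical.

```python
def stripe_map(block: int, n_columns: int) -> tuple[int, int]:
    """Round-robin striping with a one-block stripe unit: (column, row)."""
    return block % n_columns, block // n_columns


def relocated_blocks(n_blocks: int, old_columns: int, new_columns: int) -> list[int]:
    """Blocks whose (column, row) changes when the column count changes."""
    return [
        b for b in range(n_blocks)
        if stripe_map(b, old_columns) != stripe_map(b, new_columns)
    ]
```

Under this mapping, adding a row only appends new locations and leaves existing blocks in place, while widening the stripe from two to three columns moves most of the existing blocks.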

Free Space Manager

Physical Allocation Bitmap
• Divided into fixed-size units
• Each unit is controlled by a separate lock
• The entire bitmap is duplicated

Effects
• Increases parallelism
• Provides scalability
• Avoids bottlenecks
• Reduces metadata search time

[Figure: physical allocation bitmap covering the logical partitions]
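The idea of splitting the allocation bitmap into fixed-size units with one lock each can be sketched as follows. This is an illustrative single-process model using `threading.Lock`; in SANtopia the locks would be cluster-wide and the bitmap is also duplicated, neither of which is modelled here. The class and method names are hypothetical.

```python
import threading


class PartitionedBitmap:
    """Allocation bitmap split into fixed-size units, one lock per unit."""

    def __init__(self, n_blocks: int, unit_size: int):
        self.unit_size = unit_size
        self.bits = [False] * n_blocks
        n_units = (n_blocks + unit_size - 1) // unit_size
        self.locks = [threading.Lock() for _ in range(n_units)]

    def allocate(self, start_unit: int = 0):
        """Search unit by unit, holding only the lock of the unit being scanned."""
        n_units = len(self.locks)
        for i in range(n_units):
            unit = (start_unit + i) % n_units
            lo = unit * self.unit_size
            hi = min(lo + self.unit_size, len(self.bits))
            with self.locks[unit]:
                for block in range(lo, hi):
                    if not self.bits[block]:
                        self.bits[block] = True
                        return block
        return None  # no free block anywhere

    def free(self, block: int) -> None:
        with self.locks[block // self.unit_size]:
            self.bits[block] = False
```

Callers that start their search in different units rarely contend for the same lock, which is the parallelism effect the slide lists.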

Mapping Manager

Virtualization of Physical Storage
• Provides flexibility
• Enables data movement between Logical Partitions
• Enables snapshot

Each Mapping Information
• Covered by one host
• Chained declustered for safety
• Same effects as Free Space Manager
• Flexible to fail-over

[Figure: Hosts A, B, C, and D, each shown with a logical partition]
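Chained declustering, in its usual form, keeps a backup copy of each host's data on the next host in a ring, so a single failure shifts work to one neighbour. The sketch below applies that idea to mapping information; the numbering of hosts and partitions and the function name are assumptions, not details taken from the slides.

```python
def mapping_owner(partition: int, n_hosts: int, failed: frozenset = frozenset()) -> int:
    """Host that serves the mapping information of a logical partition.

    The primary copy lives on host (partition mod n_hosts) and the backup
    copy on the next host in the ring (chained declustering).
    """
    primary = partition % n_hosts
    backup = (partition + 1) % n_hosts
    if primary not in failed:
        return primary
    if backup not in failed:
        return backup
    raise RuntimeError("both copies of the mapping information are unavailable")
```

For example, with four hosts, the mapping of partition 2 is normally served by host 2 and fails over to host 3.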

I/O Manager

Load Balancing of I/O

Read Policy
• Round-Robin Policy: in case of same capability
• Preferred-Plex Policy: in case of different capability
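The two read policies can be sketched as small selector classes for a mirrored volume. This is only an illustration: the plex names and the numeric "capability" score are hypothetical, and the slides do not say how capability is measured.

```python
import itertools


class RoundRobinPolicy:
    """Rotate reads across plexes of equal capability."""

    def __init__(self, plexes: list[str]):
        self._cycle = itertools.cycle(plexes)

    def pick(self) -> str:
        return next(self._cycle)


class PreferredPlexPolicy:
    """Always read from the most capable plex when capabilities differ."""

    def __init__(self, capability: dict[str, float]):
        self._preferred = max(capability, key=capability.get)

    def pick(self) -> str:
        return self._preferred
```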

SANtopia File Manager

Features of SANtopia File Mgr

• Extent-Based 64-bit File System
  - 64-bit Address
  - Support Large File
• Dynamic inode allocation
  - Multi-Level inode
• Support Large Directory
  - Extendible Hash based directory management
• Fast Recovery
  - Metadata Journaling
• Inode Stuffing

SANtopia File System Layout

[Figure: address space from 0 to 2^64-1, containing the Boot Block, Super Block, Allocation Blocks (inode, directory, data block), and Extent Bitmap]

• Extent-based allocation
• Super Block: SANtopia file system information
• Allocation Block
  - No preallocated area for inode, directory entry, data block
  - Extent-based allocation (4KB ~ 64KB)
• Extent bitmap
  - Located at the end of the address space (file system size)
  - Needs to distinguish object types in the Extent Allocation Bitmap
  - Uses 2 bits: 00 = not used, 01 = inode, 10 = dir entry, 11 = data block
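The 2-bit encoding of the extent bitmap can be sketched as a packed array with four extents per byte. The bit values follow the slide; the class name, the bit ordering within a byte, and the in-memory `bytearray` are assumptions for illustration.

```python
FREE, INODE, DIR_ENTRY, DATA = 0b00, 0b01, 0b10, 0b11


class ExtentBitmap:
    """Two bits per extent, four extents packed into each byte."""

    def __init__(self, n_extents: int):
        self.buf = bytearray((n_extents * 2 + 7) // 8)

    def set(self, extent: int, kind: int) -> None:
        byte, shift = divmod(extent * 2, 8)
        cleared = self.buf[byte] & ~(0b11 << shift) & 0xFF  # wipe the old 2 bits
        self.buf[byte] = cleared | (kind << shift)

    def get(self, extent: int) -> int:
        byte, shift = divmod(extent * 2, 8)
        return (self.buf[byte] >> shift) & 0b11
```

Because the type is recorded in the bitmap, a scan can tell inode, directory, and data extents apart without reading the extents themselves.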

inode

Dynamic inode allocation
• No limitation on the number of inodes
• No preallocated inode area
• Cf.) ext2 file system: 1 inode per 4KB

Each inode occupies one extent
• Fragmentation
• Stuffed inode for space efficiency

64-bit inode number
• Uses a unique ID in SANtopia

[Figure: an inode extent holding the inode number (inode information), file or directory information, and either data block pointers or stuffed data]
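Since each inode occupies a whole extent, a small file can be "stuffed" into the unused part of that extent instead of taking a separate data extent. The decision can be sketched as below; the 4KB extent size is the smallest one named on the slides, while the 256-byte header size and the function name are purely hypothetical.

```python
EXTENT_SIZE = 4096    # smallest extent size mentioned on the slides (4KB)
INODE_HEADER = 256    # hypothetical space used by the inode's own fields


def plan_file(size: int) -> dict:
    """Decide whether a file of `size` bytes fits inside its inode extent."""
    if size <= EXTENT_SIZE - INODE_HEADER:
        return {"stuffed": True, "data_extents": 0}
    data_extents = -(-size // EXTENT_SIZE)  # ceiling division
    return {"stuffed": False, "data_extents": data_extents}
```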

inode structure

[Figure: Dynamic Multi-Level Inode Allocation. The dinode info. leads to single indirect blocks and double indirect blocks; each block is an extent.]

Directory (Extendible Hash)

[Figure: the Dir Info. leads to hash tables labelled root hash and indirect hash, with 2-bit entries (00 to 11) and 4-bit entries (0000 to 1111), which point to directory nodes (extents).]
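Extendible hashing lets a directory grow by splitting one overflowing bucket at a time and doubling the hash table only when needed. The following is a generic single-level sketch of the technique, not SANtopia's on-disk format: the root/indirect two-level table from the figure is not modelled, and the SHA-1 based hash, the bucket capacity, and all names are assumptions.

```python
import hashlib


def name_hash(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")


class Bucket:
    def __init__(self, depth: int):
        self.depth = depth      # number of low hash bits this bucket covers
        self.entries = {}       # file name -> inode number


class ExtendibleDirectory:
    def __init__(self, bucket_capacity: int = 4):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.table = [Bucket(1), Bucket(1)]

    def _slot(self, name: str) -> int:
        return name_hash(name) & ((1 << self.global_depth) - 1)

    def lookup(self, name: str):
        return self.table[self._slot(name)].entries.get(name)

    def insert(self, name: str, inode_no: int) -> None:
        while True:
            bucket = self.table[self._slot(name)]
            if name in bucket.entries or len(bucket.entries) < self.capacity:
                bucket.entries[name] = inode_no
                return
            self._split(bucket)  # make room, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.depth == self.global_depth:
            self.table = self.table + self.table  # double the hash table
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        new_bit = 1 << (bucket.depth - 1)
        for slot, b in enumerate(self.table):
            if b is bucket and slot & new_bit:
                self.table[slot] = sibling
        old, bucket.entries = bucket.entries, {}
        for name, inode_no in old.items():
            self.table[self._slot(name)].entries[name] = inode_no
```

A lookup costs one hash computation and one bucket read regardless of directory size, which is why the scheme suits large directories.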

Recovery

• Uses journaling
  - Writes the in-core log buffer to the log disk when metadata is updated
  - The log disk is a circular buffer
• Metadata modification operations (transactions)
  - create, remove, unlink, link, allocation, truncate, rename, …

[Figure: Diagram labels: File Operation (transaction), Transaction Manager, Log Manager, Metadata Manager, Recovery Manager, System Manager (system recovery), Log, Metadata]
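A log disk used as a circular buffer can be modelled with two monotonically increasing sequence numbers: one for the oldest record still needed and one for the next record to write. The sketch below is a generic in-memory illustration of that structure; the class, its methods, and the checkpoint operation are assumptions, not SANtopia's log format.

```python
class CircularLog:
    """Fixed-size log; slots are reused once their records are checkpointed."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.head = 0   # sequence number of the oldest record still needed
        self.tail = 0   # sequence number the next record will get

    def append(self, record: str) -> int:
        if self.tail - self.head == len(self.slots):
            raise RuntimeError("log full: checkpoint metadata first")
        self.slots[self.tail % len(self.slots)] = record
        self.tail += 1
        return self.tail - 1

    def checkpoint(self, upto: int) -> None:
        """Records with sequence number below `upto` are safely on disk."""
        self.head = max(self.head, min(upto, self.tail))

    def replay(self) -> list:
        """Records a recovery pass would have to reapply, oldest first."""
        return [self.slots[i % len(self.slots)] for i in range(self.head, self.tail)]
```

Recovery then only needs to scan the records between the two sequence numbers rather than checking the whole file system.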

SANtopia Buffer and Lock Manager

Features of Buffer Manager (I)

• Support Global File Sharing
  - Reduce disk I/O
  - Sharing each buffer
• Split distributed BM
  - GBMs are distributed (partitioned) on several nodes
• Manage Global Buffer List and Local Buffer List
  - Communication vs. space overhead
• Manage the logical global buffer
  - Weak correctness of global buffer list
  - Safe but not up-to-date

Features of Buffer Manager (II)

• Integration of buffer and lock messages
  - Overlapped with global lock manager
  - Piggyback the buffer lists over lock messages
  - Reduce the number of messages
• Adopt write invalidation scheme
  - For the sake of simplicity
• Support buffer forwarding scheme
  - Improves performance by reducing disk I/O

Structure of Buffer Manager

• Local and Global Buffer Manager
• Decision of GBM: inode hash

[Figure: SANtopia hosts connected by the SAN (Storage Area Network). Every host runs an LBM (Local Buffer Manager); hosts acting as Global Buffer Servers also run a GBM (Global Buffer Manager).]
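Choosing the responsible GBM by hashing the inode means every host can work out, without a lookup message, which Global Buffer Server tracks a given file. A minimal sketch follows; the slides do not specify the hash function, so the modulo over a server list and the function name are assumptions.

```python
def gbm_for(inode_no: int, buffer_servers: list[str]) -> str:
    """Pick the Global Buffer Server responsible for an inode (assumed: modulo hash)."""
    return buffer_servers[inode_no % len(buffer_servers)]
```

Any host evaluating `gbm_for` with the same server table reaches the same answer, so the table is the only state that must be kept consistent across hosts.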

Operations between GBM and LBM

• Buffer list information
  - Local Buffer Manager to Global Buffer Manager: Buffer List for GBM (Lock Message)
  - Global Buffer Manager to Local Buffer Manager: Buffer List for LBM (Lock Message)
• GBM Server Failure
  - Buffer List for new GBM Server
  - Modifies Buffer Server Table for LBM

Features of Lock Manager

• Lock Mode
  - Shared lock and Exclusive lock
• Lock Object
  - 64-bit inode: File Lock
• Distributed (partitioned) on several nodes
  - Host-based locking
  - Overlapped with global buffer manager
  - Global Lock Manager (GLM) vs. Local Lock Manager (LLM)
• Delayed Lock Free
  - Callback scheme for lock free
  - Callback by lock server
  - No lock entrance after receiving a callback message
• Recovery on host failure
  - I/O Fencing
  - Rebuild lock table: take locks from the failed host
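Two pieces of the lock manager lend themselves to a short sketch: the compatibility rule for shared and exclusive modes, and rebuilding the lock table after a host fails. The table layout (inode number mapped to a host-to-mode dictionary) and the function names are assumptions; I/O fencing is not modelled.

```python
SHARED, EXCLUSIVE = "S", "X"


def compatible(requested: str, held: list[str]) -> bool:
    """A request is grantable only if it and every held lock are shared."""
    return all(requested == SHARED and mode == SHARED for mode in held)


def rebuild_lock_table(table: dict, failed_host: str) -> dict:
    """Drop every lock held by the failed host; keep the survivors' locks."""
    rebuilt = {}
    for inode_no, holders in table.items():
        survivors = {h: m for h, m in holders.items() if h != failed_host}
        if survivors:
            rebuilt[inode_no] = survivors
    return rebuilt
```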

Integration of Lock Mgr and Buffer Mgr

[Figure: SANtopia hosts connected by the SAN (Storage Area Network). Every host runs an LBM (Local Buffer) and an LLM (Local Lock Table); hosts acting as Global Buffer & Lock Servers also run a GBM (Global Buffer) and a GLM (Global Lock Table).]

Operational Design (I)

[Figure: a local buffer manager and local lock manager (local lock 1, local lock 2, …) exchange messages with a global buffer manager and global lock manager (global lock 1; host 1, host 2, …).]

Messages shown:
• lock(lock_id, mode, local_buffer_list), unlock(lock_id, local_buffer_list)
• lock_grant(lock_id, mode, host_related_global_buffer_list)
• call_back(lock_id, hrgbl)
• buffer forwarding
• invalidate at unlock

Operational Design (II)

Global Lock Manager
• Upon receiving lock request
  - Update global buffer list using the local_buffer_list
• Upon receiving unlock request
  - Grant lock before processing the unlock request
  - Update global buffer list using the local_buffer_list
• Upon granting lock
  - Piggyback a part of global buffer list concerned with the host
• Upon sending callback
  - Piggyback a part of global buffer list concerned with the host
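The four rules for the Global Lock Manager can be sketched as a small message handler. This is a simplified model under several assumptions: only exclusive locks are handled, "the part of the global buffer list concerned with the host" is read as the blocks cached on other hosts, and the class, method, and tuple formats are hypothetical.

```python
from collections import defaultdict, deque


class GlobalLockManager:
    def __init__(self):
        self.buffer_list = defaultdict(set)   # block number -> hosts caching it
        self.holder = {}                      # lock_id -> host holding the lock
        self.waiters = defaultdict(deque)     # lock_id -> hosts waiting for it

    def _update(self, host, local_buffer_list):
        # Replace what is known about this host with the list it reported.
        for hosts in self.buffer_list.values():
            hosts.discard(host)
        for block in local_buffer_list:
            self.buffer_list[block].add(host)

    def _related(self, host):
        # Assumed meaning of "host-related": blocks cached on other hosts.
        return {b: sorted(h - {host}) for b, h in self.buffer_list.items() if h - {host}}

    def on_lock(self, host, lock_id, local_buffer_list):
        self._update(host, local_buffer_list)
        if lock_id not in self.holder:
            self.holder[lock_id] = host
            return [("lock_grant", host, lock_id, self._related(host))]
        self.waiters[lock_id].append(host)
        owner = self.holder[lock_id]
        return [("call_back", owner, lock_id, self._related(owner))]

    def on_unlock(self, host, lock_id, local_buffer_list):
        out = []
        del self.holder[lock_id]
        if self.waiters[lock_id]:              # grant to the next waiter first ...
            nxt = self.waiters[lock_id].popleft()
            self.holder[lock_id] = nxt
            out.append(("lock_grant", nxt, lock_id, self._related(nxt)))
        self._update(host, local_buffer_list)  # ... then process the buffer list
        return out
```

Because the grant is sent before the unlocking host's list is processed, the piggybacked list may mention buffers that host has just invalidated. That matches the "safe but not up-to-date" property stated for the global buffer list: a stale entry costs one failed forward request, never wrong data.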

Operational Design (III)

Local Lock Manager
• Upon sending lock request
  - Piggyback the local buffer list
• Upon sending unlock request
  - Invalidate buffers related to the lock
  - Piggyback the local buffer list
• Upon receiving lock grant
  - Save the piggybacked global buffer list
• Upon receiving callback
  - Prohibit the lock counter from being increased
  - Unlock as soon as possible
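The callback handling on the local side can be sketched as a small state machine for one lock: the lock stays cached locally after its users leave (delayed lock free), a callback blocks new entrants, and the lock goes back to the server once the user count reaches zero. The class, its fields, and the string return values are hypothetical; buffer invalidation and the piggybacked lists are left out.

```python
class LocalLock:
    """Local Lock Manager state for a single lock."""

    def __init__(self):
        self.cached = False            # lock currently granted to this host
        self.users = 0                 # lock counter: local users inside the lock
        self.callback_pending = False  # server asked for the lock back

    def acquire(self) -> str:
        if self.cached and not self.callback_pending:
            self.users += 1
            return "local"             # served without contacting the server
        return "ask_server"            # must send a lock request

    def granted(self) -> None:
        self.cached = True
        self.callback_pending = False
        self.users += 1

    def release(self) -> bool:
        """Return True if the lock must be sent back to the server now."""
        self.users -= 1
        return self._maybe_return()

    def callback(self) -> bool:
        """Server callback: stop new entrants, return the lock when idle."""
        self.callback_pending = True
        return self._maybe_return()

    def _maybe_return(self) -> bool:
        if self.cached and self.callback_pending and self.users == 0:
            self.cached = False
            self.callback_pending = False
            return True
        return False
```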

Operational Design (IV)

Local Buffer Manager
• Upon receiving forward request
  - Send the requested buffer without validity check
  - Of course, check whether the requested block is still cached
  - If the buffer is already flushed, send an acknowledge signal
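The forward-request rule reduces to a cache lookup with two possible replies. A minimal sketch, assuming the local cache is a dictionary from block number to bytes and using hypothetical reply tags:

```python
def handle_forward_request(cache: dict, block_no: int) -> tuple:
    """Reply to a buffer forward request from another host.

    The buffer is sent as-is if it is still cached (no validity check);
    otherwise an acknowledgement tells the requester to read from disk.
    """
    data = cache.get(block_no)
    if data is None:
        return ("ack_not_cached", None)
    return ("buffer", data)
```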
