Source: cs.brown.edu/courses/csci1380/s20/lectures/l22_2020.pdf

Distributed Systems

L22: Distributed File Systems
Theophilus Benson, CS1380 Spring '20

Today's Agenda

• General Distributed File Systems

• Industry Use Cases
• Google File System (GFS)

• Next Class
  • MongoDB (guest lecture)
  • Kafka (LinkedIn's queue processing)

What is a File?

• A blob of binary?

• A set of blobs? Think of a book: Table of Contents + chapters
  • Index → inode (maps ranges to data blocks)
  • Chapters → data blocks

• How about directories?
• How about file permissions?

[Figure: File1 points to data blocks Data1 and Data2, each holding raw bits]

What is a Directory?

• Directory → maps names to file IDs
• A directory can also contain directories
• In Linux, a directory is also a file

Root Directory
• File1 → ID X
• File2 → ID Y
• Dir1 → ID Z

Dir1
• File3 → ID M
• File4 → ID C
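To make the name → ID → data-block chain concrete, here is a minimal Python sketch. The dictionary-based directory and inode tables, the IDs, and the path-walking loop are all illustrative assumptions, not any real file system's layout.

```python
# Minimal sketch: directories map names to IDs; an "inode" table maps a
# file ID to its data blocks. All names and structures are illustrative.
inodes = {
    "X": [b"1010101", b"0100100"],   # File1's data blocks
    "M": [b"1110001"],               # File3's data block
}

directories = {
    "/":     {"File1": ("file", "X"), "Dir1": ("dir", "/Dir1")},
    "/Dir1": {"File3": ("file", "M")},
}

def lookup(path: str) -> str:
    """Walk the directory tree one path component at a time."""
    cur = "/"
    for part in filter(None, path.split("/")):
        kind, ref = directories[cur][part]
        if kind == "dir":
            cur = ref        # descend into the subdirectory
        else:
            return ref       # reached a file: return its ID
    raise IsADirectoryError(path)

file_id = lookup("/Dir1/File3")     # -> "M"
print(b"".join(inodes[file_id]))    # reassemble the file from its blocks
```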

What is a File System?

• File system → the system that manages files

• Provides:
  • An API for applications to interact with files
  • Algorithms for securing files (access control)
  • Maintenance of metadata about each file

[Figure: applications call through an API into the file system, which manages File1 … FileN]

File Metadata
• File length (size)
• Timestamp
• Location
• Reference count
• Type
• Access control
• Owner

(Some fields are modifiable by the application; others only by the file system.)
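As a rough Python rendering of this slide (the field names, and which fields fall on the application side versus the file-system side, are assumptions; the slide only labels the two groups):

```python
from dataclasses import dataclass

# Sketch of per-file metadata. Which field belongs in which group is an
# assumption; the slide only says some are app-modifiable, some are not.
@dataclass
class FileMetadata:
    # Plausibly modifiable (indirectly) by the application via the API:
    length: int           # file size in bytes; grows/shrinks with writes
    timestamp: float      # last-modified time; updated on each write
    # Plausibly modifiable only by the file system itself:
    location: str         # where the data blocks live (device/server)
    ref_count: int        # how many directory entries point at this file
    file_type: str        # regular file, directory, symlink, ...
    access_control: str   # e.g. "rw-r--r--"
    owner: str

meta = FileMetadata(4096, 1586900000.0, "disk0", 1, "regular", "rw-r--r--", "alice")
```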

Distributed File Systems (DFS)

Local Versus Distributed File System

• Failure implications:
  • Local: when the machine fails, all components are down
  • Distributed: only some components are down; the others keep operating

• Performance implications:
  • Local: interactions are function calls → very fast
  • Distributed: interactions are RPC calls → variable speed

[Figure: a client attached to local storage, versus a client reaching several storage servers over 50ms-100ms links]

Semantics

• At-least-once (1 or more calls)
• At-most-once (0 or 1 calls)
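These are the RPC invocation semantics a DFS client/server pair can offer. As a hedged sketch of one standard way to get at-most-once behavior (the request-ID deduplication table below is a generic technique; the names are made up):

```python
# At-least-once: the client retries until it sees a reply, so the server
# may execute a call more than once. At-most-once: the server remembers
# request IDs and replays the saved reply instead of re-executing.
seen: dict[str, str] = {}   # request_id -> cached reply

def handle(request_id: str, op) -> str:
    if request_id in seen:        # duplicate retry: do NOT re-execute
        return seen[request_id]
    reply = op()                  # execute the operation exactly once
    seen[request_id] = reply
    return reply

print(handle("req-1", lambda: "wrote block 7"))   # executes
print(handle("req-1", lambda: "re-executed!"))    # replays "wrote block 7"
```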

Transparency Properties of a Distributed File System (DFS)

Client program/API:
• Access → same API for remote and local files
• Location → same "name" for remote and local files
• Mobility → the client should be unaware of files moving

System-level performance:
• Performance → as the workload grows, performance stays OK
• Scalability → as the number of files grows, performance stays OK


Performance Optimizations

• Caching: client versus server side
  • Client: minimizes load on the server and improves read latency
  • Server: improves performance

Server-side Caching: Write Issues

• Write-through caching
  • Write-through: on every write, write to memory → disk → report OK
  • Every write persists to disk before the ack, which gives poor performance but good consistency

[Figure: client writing blocks straight through to storage]

Server-side Caching: Write Issues

• Commits
  • Commit: on file close, commit/flush all writes to disk
  • Writes go only to memory until the commit → good performance, but consistency issues

[Figure: client buffering blocks, flushed to storage on Commit]
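A minimal sketch of both server-side write policies, assuming in-memory lists standing in for the cache and the disk (a real server does actual I/O):

```python
# Two write policies from the slides, with lists standing in for real I/O.
memory: list[bytes] = []   # the server's in-memory cache
disk:   list[bytes] = []   # stable storage

def write_through(block: bytes) -> str:
    memory.append(block)
    disk.append(block)     # persist BEFORE acking: slow, but consistent
    return "OK"

def write_buffered(block: bytes) -> str:
    memory.append(block)   # ack immediately: fast, but data is volatile
    return "OK"

def commit_on_close() -> None:
    disk.extend(memory)    # on file close, flush everything to disk
    memory.clear()
```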

Performance Optimizations

• Caching: client versus server side
  • Client: minimizes load on the server and improves read latency
  • Server: improves performance

• Server side:
  • Write caching: potential consistency issues
    • Commit: on file close, commit/flush all writes to disk; writes go to memory until commit → good performance, but consistency issues
    • Write-through: on every write, write to memory → disk → report OK; every write persists before the ack → poor performance, but good consistency
  • Read caching: store recently read blocks in memory for fast access

Client-Caching: Issues

• Locks/leases
  • Writes/reads are local provided you hold a lock/lease

• Two types of locks/leases
  • Write: only one client may hold this lock
  • Read: multiple clients can hold a read lock; when a write lock is granted, read locks are revoked (sketched in code after the figure below)

[Figure: client cache holding copies of blocks that live on the storage server]
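A sketch of those lock rules, assuming a per-block lock table on the server and a stand-in revoke() callback (names and structures are illustrative):

```python
# Lock rules: many readers OR one writer per block. Granting a write lock
# revokes all outstanding read locks so cached copies can't go stale.
read_locks: dict[str, set[str]] = {}   # block -> clients holding read locks
write_locks: dict[str, str] = {}       # block -> the single writer, if any

def grant_read(block: str, client: str) -> bool:
    if block in write_locks:
        return False                        # a writer holds the block
    read_locks.setdefault(block, set()).add(client)
    return True

def grant_write(block: str, client: str) -> bool:
    if block in write_locks:
        return False                        # only one writer at a time
    for reader in read_locks.pop(block, set()):
        revoke(reader, block)               # recall readers' cached copies
    write_locks[block] = client
    return True

def revoke(client: str, block: str) -> None:
    print(f"revoke({client}, {block})")     # stand-in for a callback RPC
```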

Client-Caching Tradeoffs

[Figure: the same client-cache diagram, framing the consistency-versus-performance tradeoff]

Locks Versus Leases

• Locks: the client requests, the server grants
  • The client explicitly revokes/gives up the lock
  • Failure recovery requires tracking locks
    • The server must track all clients (heartbeats)
    • On client failure, a complicated procedure is needed to recover (revoke) the locks

• Leases: a time limit on how long you can hold a resource (see the sketch after the figures below)
  • The client must periodically renew the lease
  • If the client does not renew, the lease is lost
  • Failure recovery is easy
    • The server doesn't need to track clients, just leases
    • On client failure, the server only needs to wait until the lease times out before handing the resource to someone else

Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency

[Figure: lease protocol — Client → Storage: getLease(K); Storage → Client: lease(K, 60s); Client → Storage: renewLease(K). Periodically renew or lose access.]

[Figure: lock protocol — Client → Storage: getLock(K); Storage → Client: OK; later, Storage → Client: revokeLock(K).]
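A minimal client-side rendering of the lease flow above, assuming getLease/renewLease map onto a Lease object with a monotonic-clock expiry (illustrative, not the protocol from the Gray & Cheriton paper):

```python
import time

# Client-side lease discipline: operate on the cached copy only while the
# lease is valid; renew before expiry or lose access automatically.
class Lease:
    def __init__(self, key: str, duration_s: float):
        self.key = key
        self.expires = time.monotonic() + duration_s

    def valid(self) -> bool:
        return time.monotonic() < self.expires

    def renew(self, duration_s: float) -> None:
        # Stands in for the renewLease(K) RPC; if the client crashes and
        # stops calling this, the server just waits out the timeout.
        self.expires = time.monotonic() + duration_s

lease = Lease("K", 60.0)          # lease(K, 60s) from the server
if lease.valid():
    pass                          # safe to read/write the cached copy of K
lease.renew(60.0)                 # periodically renew or lose access
```

Server-side failure recovery then reduces to a clock comparison: once the expiry has passed, the resource can be handed to someone else without ever tracking the client.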

Microsoft’s Opportunistic Lock (not to be confused with optimistic locking)

https://blogs.msdn.microsoft.com/openspecification/2009/05/22/client-caching-features-oplock-vs-lease/

Opportunistic Locking

"Opportunistic" because the server only grants the lock if/when convenient.

Performance Optimizations

• Caching: client versus server side
  • Client: minimizes load on the server and improves read latency
  • Server: improves performance

• Server side:
  • Write caching: potential consistency issues
    • Commit: on file close, commit/flush all writes to disk; writes go to memory until commit → good performance, but consistency issues
    • Write-through: on every write, write to memory → disk → report OK; every write persists before the ack → poor performance, but good consistency
  • Read caching: store recently read blocks in memory for fast access

• Client side:
  • Locks/leases are used to balance consistency versus performance

Security and Access Control

• Approaches
  • Capabilities: the client is provided a security token that encodes the client's permissions. The server validates that the token is correct and uses the permissions in the token to control access to resources.
  • Access lists: the server maintains a list of permissions; on every access, the server consults this list to verify that the client has permission.

• Approaches in a DFS
  • Capabilities: on open, validate and give the client a 'capability'; the client uses the 'capability' with all future requests
  • Access lists: for every request, the client includes identity information

[Figure: capability approach — Client → Storage: RPC(API + capability); the capability itself includes the access information.]

[Figure: ACL approach — Client → Storage: RPC(API + credentials); the server consults its ACL list to validate that the credentials grant access to the API.]
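One common way to implement a capability is for the server to sign (file, permissions) with a key only it knows; a hedged sketch (HMAC-based tokens are a generic technique, not what any particular DFS uses):

```python
import hmac, hashlib

# The server signs (file_id, perms) with a secret only it holds; the token
# itself carries the access information, so no per-request ACL lookup.
SECRET = b"server-only-secret"

def issue_capability(file_id: str, perms: str) -> str:
    msg = f"{file_id}:{perms}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{file_id}:{perms}:{sig}"

def check_capability(token: str, file_id: str, needed: str) -> bool:
    fid, perms, sig = token.rsplit(":", 2)
    expected = hmac.new(SECRET, f"{fid}:{perms}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and fid == file_id and needed in perms

cap = issue_capability("File1", "rw")       # granted once, on open()
assert check_capability(cap, "File1", "r")  # every later request reuses it
```

Note that this illustrates the tradeoff on the next slide: the server can verify such a token forever without any lookup, which is exactly why revoking it is hard.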

Potential Security Trade-offs: Capabilities versus ACL

• Capabilities: hard to revoke/change permissions
  • Permissions are only checked at the beginning
  • Must send a revocation list and force reissue

• ACL: since it is a centralized list, it is easy to change and adapt
  • Every API call consults the list, so changes are reflected on the next call

[Figure: the same capability vs. ACL diagram, with a revocation list added on the capability side]

GFS: Google File System

GFS

• Two types of nodes!
  • Master
  • Chunk servers

• Master (few API calls)
  • Metadata operations

• Chunk servers (most client API calls)
  • Store the actual data

GFS Master

• Single/centralized master
  • Never stores file contents
  • Only stores metadata/attributes

• Benefits of centralization
  • Easy to write code
  • Can implement sophisticated algorithms
  • Stores all metadata in memory → performance boost

• Issues with centralization
  • Single point of failure → keep 2 backups
    • Replicate to backups before responding to the client (sketched after the figure below)
  • Not enough memory → buy more!!!

[Figure: a client application (e.g., Gmail) talks to the GFS master (backed by shadow/backup masters) and to the chunk servers]
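A sketch of the "replicate to backups before responding" rule, assuming an in-memory metadata dict and a list of backup masters (the class and method names are made up):

```python
# A metadata operation is acknowledged only after every backup master has
# a copy, so a master crash never loses an acknowledged operation.
class Master:
    def __init__(self, backups=None):
        self.metadata = {}            # ALL metadata kept in memory
        self.backups = backups or []

    def apply(self, key, value):
        self.metadata[key] = value

    def metadata_op(self, key, value) -> str:
        for b in self.backups:        # replicate to the backups FIRST...
            b.apply(key, value)
        self.apply(key, value)        # ...then apply locally...
        return "OK"                   # ...and only then ack the client

primary = Master(backups=[Master(), Master()])
primary.metadata_op("/logs/file1", "chunks: [c1, c2]")
```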

GFS Attributes, Data, Metadata

[Figure: master metadata tables — Dir → Files, Files → Chunks, Chunks → Servers; the chunk servers hold the DATA (i.e., the chunks)]

GFS Attributes, Data, Metadata

• Chunk → server mappings are not stored (persistently) at the master
  • A chunk server can die
  • An operator can manually change a chunk server
  • The chunk server is the authoritative voice on what it stores

• Each chunk server includes its list of chunks in heartbeat messages

• The master rebuilds the chunk → server map after receiving heartbeat messages (sketched after the figure below)

[Figure: the same metadata diagram — the chunk → server table at the master is rebuilt from chunk-server heartbeats]
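A sketch of the heartbeat-driven rebuild, assuming a chunk → servers dict at the master and a simple full-report heartbeat (names are illustrative):

```python
# The master never persists chunk -> server; each chunk server is the
# authoritative voice, reporting the chunks it holds in its heartbeats.
chunk_locations: dict[str, set[str]] = {}   # chunk id -> servers holding it

def on_heartbeat(server: str, chunk_list: list[str]) -> None:
    for servers in chunk_locations.values():
        servers.discard(server)             # drop this server's stale entries
    for chunk in chunk_list:
        chunk_locations.setdefault(chunk, set()).add(server)

on_heartbeat("chunkserver-1", ["c1", "c2"])
on_heartbeat("chunkserver-2", ["c2", "c3"])
print(chunk_locations["c2"])   # {'chunkserver-1', 'chunkserver-2'}
```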

GFS Consistency Semantics

• Types of API calls
  • Metadata operations: create/delete/rename
  • Data operations: reads/writes

• All metadata → master: linearizable, because the master gives a global ordering

• Reads/writes → chunk servers → potential consistency issues

• Use heartbeats to detect failures
• Maintain three replicas of each chunk
  • On a failed server, create a new replica

• Monitor the load on each server
  • Periodically move replicas/chunks around to balance load

• The single master provides a global total ordering on metadata operations

• The master gives out leases to coordinate writes on data
  • One replica is designated the leader (primary) for the other replicas (see the write-path sketch after the figure below)


[Figure: write path — the client (e.g., Gmail) calls Open() on the GFS master and receives a list of chunk servers; the master grants a leader lease to one chunk server, making it the leader for that chunk; the client's writes go to the replicas via the leader; chunk servers send HeartBeats (chunk lists) to the master; shadow/backup masters stand by behind the master.]
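To tie the figure together, a simplified sketch of the write path: the client asks the master for the chunk's servers, the lease designates one replica the leader, and writes flow through the leader so all replicas apply them in the same order. Everything here (the class names, master_open, the push loop) is a simplification, not the real GFS protocol:

```python
# Simplified write path: the leader (lease holder) applies each write and
# pushes it to the other replicas in a single, leader-chosen order.
class ChunkServer:
    def __init__(self, name: str):
        self.name, self.data = name, []

    def write(self, block: bytes, replicas) -> str:
        self.data.append(block)        # the leader applies the write first...
        for r in replicas:
            r.data.append(block)       # ...then pushes it, in order, to replicas
        return "OK"

replicas = [ChunkServer("cs1"), ChunkServer("cs2"), ChunkServer("cs3")]

def master_open(chunk: str):
    leader, *others = replicas         # master grants the lease to one server
    return leader, others              # the client gets back the server list

leader, others = master_open("c1")           # Open(): ask the master once
print(leader.write(b"10101010", others))     # writes go via the leader
```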

Today

• Distributed file systems
  • Caching: performance versus consistency
  • Locks vs. leases: opportunistic locking
  • Server- vs. client-side caches

• GFS: Google File System
  • Centralized master
  • Consistency semantics
