DISTRIBUTED FILE SYSTEM. Presented by: Lê Tuấn Anh, Nguyễn Hải Duy, Đặng Thanh Linh, Trần Trung Hiếu 50500892, Nguyễn Hoàng Nam. Computer Science: Distributed file system.

Upload: marcus-harrison

Post on 24-Dec-2015


TRANSCRIPT


Contents:

I. Distributed file system design

II. Distributed file system implementation

III. Network File System (NFS)

IV. Trends in distributed file systems


What Is a Distributed File System?

A Distributed File System (DFS) is a mechanism for sharing files.

A DFS makes files distributed across multiple servers appear to users as if they reside in one place on the network.

A DFS provides logical views of folders and files regardless of where those files are physically located on the network.


File Service

Specifies what the file system offers its clients for manipulating shared files, e.g. read and write operations.

Implemented by a user or kernel process called a file server.

A system may have one or several file servers running at the same time.


File Service (cont.)

Two models for file service:

Upload/download: files move between server and clients; few operations (read file and write file); simple; requires storage at the client; good if the whole file is accessed.

Remote access: files stay at the server; rich interface with many operations; less space needed at the client; efficient for small accesses.
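The two models can be contrasted with a small sketch. It assumes a toy in-memory server; the class `FileServer` and its method names are hypothetical and do not come from any real system.

```python
class FileServer:
    """Toy in-memory file server (hypothetical, for illustration only)."""

    def __init__(self):
        self.files = {}  # file name -> bytes

    # Upload/download model: whole files cross the network.
    def download(self, name):
        return self.files[name]

    def upload(self, name, data):
        self.files[name] = bytes(data)

    # Remote access model: the file stays on the server.
    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]

    def write(self, name, offset, data):
        buf = bytearray(self.files.get(name, b""))
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad up to the offset
        buf[offset:offset + len(data)] = data
        self.files[name] = bytes(buf)


server = FileServer()
server.upload("notes.txt", b"hello world")

# Upload/download: fetch everything, edit locally, send everything back.
local = bytearray(server.download("notes.txt"))
local[0:5] = b"HELLO"
server.upload("notes.txt", local)

# Remote access: touch only the bytes of interest.
server.write("notes.txt", 6, b"W")
tail = server.read("notes.txt", 6, 5)
```

The upload/download path moves the whole file twice for a five-byte change, while the remote access path sends only the affected bytes but needs a richer server interface.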


Directory Service

Provides operations for:

creating and deleting directories

naming and renaming files

moving files from one directory to another

adding, removing, and looking up entries in a directory


Naming Transparency

Naming is the mapping between logical and physical objects. Ex: a user filename maps to <cylinder,sector>.

In a conventional file system, it is understood where the file actually resides; the system and disk are known.

In a transparent DFS, the location of a file, somewhere in the network, is hidden.

File replication means multiple copies of a file; the mapping returns a SET of locations for the replicas.


Naming Transparency (cont.)

Location transparency: the path name gives no hint as to where the file (or other object) is located. Ex: /server1/dir1/x specifies that x is located on server1, but it does not tell where server1 itself is located -> the server can move anywhere in the network without the path changing.

Location independence: files can be moved from one server to another without changing their path names.


Naming Schemes

Machine + path naming, such as /machine/path

Mounting remote file system onto the local file hierarchy

A single name space that looks the same on all machines


Two-Level Naming

Symbolic name (external), e.g. prog.c; binary name (internal), e.g. the local i-node number as in UNIX.

Directories provide the translation from symbolic to binary names.

Binary name formats:

i-node: no cross references among servers

(server, i-node): a directory on one server can refer to a file on a different server

{binary_name}: a set of binary names; a lookup returns the original file and all of its backups
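As a minimal sketch of the translation step, a directory can be modelled as a mapping from symbolic names to sets of (server, i-node) pairs. The directory contents, server names, and helper functions below are made up for illustration.

```python
# Hypothetical directory: symbolic name -> set of binary names.
# A binary name here is a (server, i-node) pair; a replicated file
# maps to several of them.
directory = {
    "prog.c": {("server1", 117)},
    "data.db": {("server1", 42), ("server2", 907)},  # replicated file
}


def lookup(symbolic_name):
    """Translate a symbolic name into its set of binary names."""
    return directory[symbolic_name]


def pick_replica(symbolic_name, reachable):
    """Return one binary name whose server is currently reachable."""
    candidates = sorted(b for b in lookup(symbolic_name) if b[0] in reachable)
    if not candidates:
        raise LookupError(f"no reachable replica of {symbolic_name}")
    return candidates[0]
```

With the set-valued format, the client (or the directory server) can fall back to a backup copy when the server holding the original is unreachable.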


File Sharing Semantics

UNIX semantics: every read sees the effect of all previous writes, which requires a total ordering of read/write events. This is easy to achieve in a non-distributed system.

In a distributed system with one server, multiple clients, and no caching at the clients, total ordering is also easily achieved, since reads and writes are performed immediately at the server.

Session semantics: writes are guaranteed to become visible to other clients only when the file is closed.

If two or more clients write the same file simultaneously, one version (the last one closed, or a nondeterministically chosen one) replaces the other.
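Session semantics can be illustrated with a toy model in which each open file is a private copy that is published on close. The classes `SessionServer` and `Session` are hypothetical, and "last close wins" is only one of the outcomes the slide allows.

```python
class SessionServer:
    """Toy server: just a dictionary of file contents."""

    def __init__(self):
        self.files = {}


class Session:
    """One client's open file: a private copy until close()."""

    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = bytes(server.files.get(name, b""))

    def write(self, data):
        self.copy = bytes(data)  # visible only to this client for now

    def read(self):
        return self.copy

    def close(self):
        self.server.files[self.name] = self.copy  # now visible to others


server = SessionServer()
server.files["f"] = b"v0"
client_a = Session(server, "f")
client_b = Session(server, "f")

client_a.write(b"from A")
seen_by_b = client_b.read()  # B still sees the old contents
client_b.write(b"from B")
client_a.close()
client_b.close()             # the last close replaces A's version
```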


File Sharing Semantics (cont.)

Immutable files: only create and read operations (no write).

Writing a file means creating a new one and entering it into the directory, atomically replacing the previous file of the same name.

If two processes try to replace the same file at the same time, either the last copy wins or the outcome is nondeterministic.

Open question: what happens if a file is replaced while another process is busy reading it?

Transaction semantics: mutual exclusion on file accesses; either all file operations are completed or none is. Good for banking systems.


II. DFS Implementation

File usage

- Measurements.

- File usage patterns (observed in a study by Satyanarayanan).

System structure

- File-server and directory-server organization.

- Special attention to alternative approaches.


File Usage: Measurements

- Static measurements:

* Represent a snapshot of the system at a certain instant.

* Made by examining the disk to see what is on it.

- Dynamic measurements:

* Made by modifying the file system to record all operations in a log for subsequent analysis.


File Usage: Measurements (cont.)

- Static measurements include:

The distribution of file sizes.

The distribution of file types.

The amount of storage occupied by files of various types and sizes.

- Dynamic measurements include:

The relative frequency of various operations.

The number of files open at any moment.

The amount of sharing that takes place.
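A static measurement such as the file size distribution can be taken by walking a directory tree. The sketch below is one possible way to do it; the function name `size_histogram` and the bucket boundaries are arbitrary choices, not taken from the study cited in these slides.

```python
import os
from collections import Counter


def size_histogram(root, bucket_edges=(1024, 10 * 1024, 100 * 1024)):
    """Count the files under `root` per size bucket (a static snapshot)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            for edge in bucket_edges:
                if size < edge:
                    counts[f"< {edge} B"] += 1
                    break
            else:
                # No bucket matched: the file is at least as large as the last edge.
                counts[f">= {bucket_edges[-1]} B"] += 1
    return counts
```

Calling `size_histogram(".")` returns a `Counter` keyed by bucket label. A dynamic measurement would instead require logging every operation as it happens, which this snapshot cannot capture.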


File Usage: Measurement Problems

- How typical is the observed user population?

Satyanarayanan's measurements were made at a university -> do they also apply to an industrial research lab, an office automation project, or a banking system?

- Watch out for artifacts of the system being measured.

Ex: the distribution of file names in an MS-DOS system: file names are never more than 8 characters (plus an optional three-character extension).

- The measurements were made on more-or-less traditional UNIX systems. It is unclear whether they can be transferred or extrapolated to distributed systems.


File Usage: File Usage Patterns

Observed in a study by Satyanarayanan (1981):

- Most files are small (< 10K)

- Reading is much more frequent than writing

- Most read and write accesses are sequential (random access is rare)

- Most files have a short lifetime -> create the file on the client

- File sharing is unusual -> caching at the client

- The average process uses only a few files


Server System Structure

Are client and server different?

- In some systems, all machines run the same basic software -> any machine can offer file service to the public by exporting the names of selected directories so that other machines can access them.

- In other systems, the file server and directory server are just user programs -> a machine can be configured to run client and server software together, or not.


Server System Structure

Are client and server different?

- At the other extreme, clients and servers run on different machines.


Server System Structure

File and directory service: combined or not?

- Combined: a single server handles all the directory and file calls.

- Separate: the directory server maps a symbolic name onto its binary name; the file server then uses the binary name to read or write the file.


Server System Structure

Separating file and directory service

- Advantage: produces simpler software.

- Disadvantage: requires more communication.


Server System Structure

Separating file and directory service. Example: look-up of a/b/c

The client sends a symbolic name to the directory server, which returns the binary name that the file server understands.

The directory hierarchy may be partitioned among multiple servers:

- The 1st directory, on server 1, contains an entry a for another directory on server 2.

- The 2nd directory, on server 2, contains an entry b for another directory on server 3.

- The 3rd directory, on server 3, contains an entry c for a file, along with its binary name.


Server System Structure

Separating file and directory service. Example: look-up of a/b/c

- The client sends a message to server 1.

- Server 1 finds a and sees that the binary name refers to another server -> option (1): tell the client which server holds b.

• Requires the client to know which server holds which directory -> requires more messages.


Server System Structure

Separating file and directory service. Example: look-up of a/b/c

- The client sends a message to server 1.

- Server 1 finds a and sees that the binary name refers to another server -> option (2): forward the remainder of the request to server 2.

• More efficient.

• Cannot use normal RPC (Remote Procedure Call), because the process to which the client sends the message is not the one that sends the reply.
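The message counts for the two look-up options can be compared with a small simulation. The table `SERVERS`, the i-node number 77, and both function names are hypothetical; each request or reply is counted as one message.

```python
# Hypothetical directory servers: each maps a path component to either
# ("dir", next_server) or ("file", binary_name).
SERVERS = {
    "server1": {"a": ("dir", "server2")},
    "server2": {"b": ("dir", "server3")},
    "server3": {"c": ("file", ("server3", 77))},
}


def iterative_lookup(path, start="server1"):
    """Option (1): the client contacts every server itself."""
    server, messages = start, 0
    for component in path.split("/"):
        kind, target = SERVERS[server][component]
        messages += 2                    # one request and one reply per hop
        if kind == "file":
            return target, messages
        server = target                  # the client now asks the next server
    raise FileNotFoundError(path)


def forwarded_lookup(path, start="server1"):
    """Option (2): each server forwards the rest of the request."""
    server, messages = start, 1          # client -> first server
    for component in path.split("/"):
        kind, target = SERVERS[server][component]
        if kind == "file":
            return target, messages + 1  # the last server replies to the client
        server = target
        messages += 1                    # server -> next server
    raise FileNotFoundError(path)
```

For a/b/c this sketch counts 6 messages for option (1) and 4 for option (2), which matches the slides' point that forwarding is more efficient. The reply in option (2) comes from server 3 rather than server 1, which is why plain request/reply RPC does not fit.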


Server System Structure

Separating file and directory service

Problem:

Path name look-up, especially with multiple directory servers, can be expensive.

Solution: cache directory hints at the client to accelerate path name look-up; the directory and the hints must be kept coherent.


Server System Structure

Another question:

Should file, directory, and other servers keep state information about clients?

- Yes: stateful server.

- No: stateless server.


Server System Structure

Stateless vs. Stateful

Advantages of stateless servers:

- Requests are self-contained

- Better fault tolerance

- Open/close handled at the client (fewer messages)

- No space reserved for tables, thus no limit on open files

- No problem if a client crashes

Advantages of stateful servers:

- Shorter messages

- Better performance (information kept in memory until close)

- Open/close handled at the server

- File locking possible

- Read-ahead possible
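The difference shows up in what each request must carry. The sketch below assumes a single in-memory file; `StatelessServer`, `StatefulServer`, and the `DATA` dictionary are illustrative names only.

```python
DATA = {"f": b"abcdefghij"}


class StatelessServer:
    """Every request is self-contained: file name, offset, and count."""

    def read(self, name, offset, count):
        return DATA[name][offset:offset + count]


class StatefulServer:
    """Keeps an open-file table with the current position per handle."""

    def __init__(self):
        self.open_files = {}  # handle -> [name, position]
        self.next_handle = 0

    def open(self, name):
        handle, self.next_handle = self.next_handle, self.next_handle + 1
        self.open_files[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        name, pos = self.open_files[handle]  # KeyError if the table was lost
        self.open_files[handle][1] = pos + count
        return DATA[name][pos:pos + count]

    def close(self, handle):
        del self.open_files[handle]


stateless = StatelessServer()
first = stateless.read("f", 0, 4)
second = stateless.read("f", 4, 4)   # the client tracks the offset itself

stateful = StatefulServer()
h = stateful.open("f")
s_first = stateful.read(h, 4)        # shorter message: handle and count only
s_second = stateful.read(h, 4)
stateful.open_files.clear()          # simulate a server crash and reboot
```

After the simulated crash, the handle `h` is no longer valid on the stateful server, while the stateless client can simply repeat its last request.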


Caching

Definition: A cache is a block of memory for temporary storage of data likely to be used again.

Main memory

Index  Data
0      xyz
1      pdq
2      abc
3      ght

Cache memory

Index  Tag  Data
0      2    abc
1      0    xyz


Caching

There are four potential places to store files, or parts of files:

- The server's disk.

- The server's main memory.

- The client's disk.

- The client's main memory.

These storage locations all have different properties.


Caching: store all files on the server's disk.

Advantages:

- Plenty of space.

- The files are accessible to all clients.

- There is only one copy of each file -> no consistency problems arise.

Problem:

- Performance: the file must be transferred from the server's disk to the server's main memory, and then again over the network to the client's main memory.


Caching files in the server's main memory.

Advantages:

- Eliminates the disk transfer.

- The server can keep its memory and disk copies synchronized, so clients still see a single copy.

Problems:

- The network transfer still has to be done.

- What unit does the cache manage (whole files or disk blocks)?

- What to do when the cache fills up and something must be evicted (one possible algorithm: LRU, least recently used).
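LRU eviction for a block cache can be sketched with the standard library's `OrderedDict`. The class name `LRUBlockCache` and the choice of (file, block number) keys are assumptions made for this example.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Server-side block cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # (file, block_no) -> bytes

    def get(self, key):
        if key not in self.blocks:
            return None               # miss: the caller must read from disk
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, data):
        self.blocks[key] = data
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
```

Caching whole files instead of blocks would only change the key; the eviction logic stays the same, although large files would then consume the capacity much faster.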


Caching on the client's disk (if available):

- The disk holds more but is slower.

- If large amounts of data are being used, a client disk cache may be better.

- This method is rarely used in practice.

- In any event, most systems that do client caching do it in the client's main memory.


Caching in the client's main memory:

There are three options for where to put the cache:

- Inside each process's address space: no sharing among processes at the client; effective only if individual processes open and close files repeatedly.

- In the kernel: the kernel is involved even on hits, so a kernel call is needed in all cases.

- In a separate user-level cache manager: flexible, and efficient if paging can be controlled from user level.


Cache Consistency

- Two clients simultaneously read the same file and then both modify it.

- When the two files are written back to the server, the one written last overwrites the other.

- Client caching therefore has to be thought out fairly carefully.

- There are several ways to solve the consistency problem: write-through, delayed write, write-on-close, and centralized control.


Cache Consistency: Write-Through Algorithm

- When a cache entry (file or block) is modified, the new value is kept in the cache but is also sent immediately to the server.

-> High write traffic; cache managers must also check with the server (e.g. by comparing modification times) before providing cached content to any client.


Cache Consistency: Delayed Write

- Delayed write coalesces multiple writes: better performance but ambiguous semantics.

* The client just makes a note that a file has been updated. Once every 30 seconds or so, all the file updates are gathered together and sent to the server at once.

* If an entire sequence (e.g. a temporary file being created, written, and deleted) happens before it is time to send the modified files back, the server never needs to be told about it.
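The traffic difference between write-through and delayed write can be seen in a toy client cache. Everything here is hypothetical: `CachingClient` is not a real API, a plain dictionary stands in for the server, and `flush()` represents the periodic (roughly 30-second) write-back.

```python
class CachingClient:
    """Toy client cache with a selectable write policy."""

    def __init__(self, server_files, policy):
        self.server_files = server_files  # dict standing in for the server
        self.policy = policy              # "write-through" or "delayed-write"
        self.cache = {}
        self.dirty = set()
        self.messages_sent = 0

    def write(self, name, data):
        self.cache[name] = data
        if self.policy == "write-through":
            self._send(name)              # every write goes to the server
        else:
            self.dirty.add(name)          # just note that the file changed

    def delete(self, name):
        self.cache.pop(name, None)
        self.dirty.discard(name)
        if name in self.server_files:     # toy shortcut: tell the server only
            del self.server_files[name]   # about files it has actually seen
            self.messages_sent += 1

    def flush(self):
        """Send all pending updates at once (delayed write)."""
        for name in sorted(self.dirty):
            self._send(name)
        self.dirty.clear()

    def _send(self, name):
        self.server_files[name] = self.cache[name]
        self.messages_sent += 1


write_through = CachingClient({}, "write-through")
delayed = CachingClient({}, "delayed-write")
for client in (write_through, delayed):
    for i in range(3):
        client.write("report.txt", f"draft {i}".encode())
    client.write("tmp", b"scratch")       # short-lived temporary file
    client.delete("tmp")
    client.flush()
```

In this run the write-through client sends five messages and the delayed-write client sends one, and the server never hears about `tmp` under delayed write. The cost is that other clients may read stale data until the flush happens.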


Cache Consistency: Write-on-Close

- Write-on-close implements session semantics: a file is written back to the server only after it has been closed.


Cache Consistency: Central Control

- The file server keeps track of which clients have which files open or cached -> UNIX semantics are possible, but there are problems with robustness and scalability. Invalidation messages are also a problem, because they are unsolicited: the server must contact clients that did not ask for them.


Replication

- Multiple copies of selected files are maintained, for three reasons:

1. To increase reliability by having independent backups of each file.

2. To allow file access even if one file server is down. A server crash should not bring the entire system down until the server can be rebooted.

3. To split the workload over multiple servers. With files replicated on two or more servers, the least heavily loaded one can be used.


Replication Transparency

Three ways of creating replicas:

- Explicit file replication: the programmer controls replication.

- Lazy file replication: copies are made by the server in the background.

- Group communication: all copies are made at the same time, in the foreground.


Replication: Update Protocols

Updating all replicas through a coordinator works but is not robust (if the coordinator is down, no updates can be performed) => Voting: updates (and reads) can be performed if a specified number of servers agree.

Voting protocol:

A version number (incremented at each write) is associated with each file.

To perform a read, a client has to assemble a read quorum of Nr servers; similarly, a write needs a write quorum of Nw servers.

If Nr + Nw > N, where N is the number of replicas, then any read quorum contains at least one copy of the most recent version.

For reading, the client contacts Nr active servers and chooses the copy with the largest version number.

For writing, the client contacts Nw active servers and asks them to write. The write succeeds if they all agree.
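A minimal sketch of the voting rules, under simplifying assumptions: replicas are in-memory objects, a quorum is simply the first live servers found, and Nw > N/2 is assumed in addition to Nr + Nw > N so that any write quorum already holds the latest version number. The names `Replica`, `read`, and `write` are hypothetical.

```python
class Replica:
    """One server's copy of a file, with its version number."""

    def __init__(self):
        self.version, self.data, self.up = 0, b"", True


def read(replicas, n_r):
    """Contact n_r live servers and return the newest copy among them."""
    quorum = [r for r in replicas if r.up][:n_r]
    if len(quorum) < n_r:
        raise RuntimeError("no read quorum")
    newest = max(quorum, key=lambda r: r.version)
    return newest.version, newest.data


def write(replicas, n_w, data):
    """Contact n_w live servers and install the next version on all of them."""
    quorum = [r for r in replicas if r.up][:n_w]
    if len(quorum) < n_w:
        raise RuntimeError("no write quorum")
    new_version = max(r.version for r in quorum) + 1
    for r in quorum:
        r.version, r.data = new_version, data
    return new_version


# N = 5, Nr = 2, Nw = 4, so Nr + Nw > N.
replicas = [Replica() for _ in range(5)]
write(replicas, 4, b"v1")                 # updates replicas 0..3
for r in replicas[:3]:
    r.up = False                          # three servers go down
result = read(replicas, 2)                # quorum: replica 3 (new), 4 (old)
```

Even though replica 4 never saw the write, the read quorum overlaps the write quorum in replica 3, so the read still returns version 1. With three servers down, a further write fails because only two live servers remain.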


Replication: Update Protocols (cont.)

Nr is usually small (reads are frequent), while Nw is usually close to N (to make sure nearly all replicas are updated). The problem is achieving a write quorum in the presence of server failures.

Voting with ghosts: allows a write quorum to be established when several servers are down, by temporarily creating a dummy (ghost) server for each one (at least one server in the quorum must be real).

Ghost servers are not permitted in a read quorum (they do not hold any files).

When a failed server comes back up, it must first restore its copy by obtaining a read quorum.


III. Network File System (NFS)

Three aspects of NFS:

The architecture

The protocol

The implementation


NFS Architecture

Basic idea of NFS: an arbitrary collection of clients and servers shares a common file system.

Each server exports one or more directories for access by remote clients.

The list of exported directories is maintained in /etc/exports.


NFS Architecture

Clients access exported directories by mounting them.

A diskless client can mount a remote file system as its root directory.

To programs running on a client, there is no difference between a file located on a remote file server and a file located on the local disk.

So the basic architectural characteristic of NFS is that servers export directories and clients mount them remotely.


NFS Protocol

A goal of NFS is to support heterogeneous systems.

To accomplish this, NFS defines two client-server protocols.

The first NFS protocol handles mounting.

The second NFS protocol is for directory and file access.


NFS Protocol: Mounting

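A client sends a path name to a server and requests permission to mount that directory.

If the path name is legal and the directory has been exported, the server returns a file handle to the client.

The file handle contains the information the server needs to identify the file or directory.

Many clients mount remote directories automatically at boot time from a script such as /etc/rc, so no manual intervention is needed.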


NFS Protocol: Automounting

Allows a set of remote directories to be associated with a local directory.

The first time a remote file is opened, the client sends a message to each of the servers; the first one to reply wins, and its directory is mounted.

Advantages:

- If one of the servers is down, it is still possible to bring the client up.

- The client can try a set of servers in parallel.

Automounting is most often used for read-only file systems that rarely change.


NFS Protocol: Accessing

Clients send messages to servers to manipulate directories and to read and write files.

NFS supports most UNIX system calls, with the exception of OPEN and CLOSE.

To read a file, the client first sends the server a message with the file name and receives a file handle in return.

Each subsequent READ or WRITE needs only the file handle, the offset, and the number of bytes desired.


NFS Protocol: Accessing

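Advantage

The server does not have to remember anything about open connections between calls.

Such a server is stateless: if it crashes and recovers, no information about open files is lost, because there is none.

In contrast, a stateful server keeps this information and loses it in a crash.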


NFS Protocol: Security

Problem: with a stateless server, locks cannot be associated with open files.

NFS uses the UNIX protection mechanism with the "rwx" bits.

Alternatively, public key cryptography can be used.

Information about all of the keys is maintained by NIS (Network Information Service).

NIS stores (key, value) pairs, providing mappings such as user name to password and machine name to network address.


NFS Implementation

System call layer: handles calls such as OPEN, READ, and CLOSE.

Virtual file system (VFS) layer: maintains a table with one entry for each open file.

Each entry is called a v-node (virtual i-node).


NFS Implementation: Use of v-nodes

MOUNT:

The system administrator calls the mount program.

The mount program makes a MOUNT system call.

The kernel asks the NFS client to create an r-node (remote i-node) in its internal tables to hold the file handle.

The v-node points to the r-node.


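NFS Implementation: Use of v-nodes

OPEN:

At some point during the parsing of the path name, the kernel reaches the directory on which the remote file system is mounted.

The kernel asks the NFS client code to open the file.

The NFS client looks up the remaining part of the path name on the remote server and reports back to the VFS layer.

The VFS layer puts in its table a v-node that points to the r-node.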


NFS Implementation: Use of v-nodes

READ:

The caller is given a file descriptor for the remote file.

VFS locates the corresponding v-node.

Transfers between client and server:

Made in large chunks, normally 8192 bytes.

Caching.


IV. Trends in Distributed File Systems

Some developments that are driving changes in file systems:

New hardware

Scalability

WAN

Mobile users

Fault tolerance

Multimedia


New Hardware

Well-designed hardware can help solve problems.


Scalability

Distributed file systems are growing larger. Old algorithms may not scale and may cause bottlenecks.

A general way to solve this problem is to partition the system into smaller units that are relatively independent of one another.


WAN

Most current work on distributed systems focuses on LAN-based systems, but these will be interconnected to form transparent distributed systems covering countries and continents. So what kind of file system would be needed to serve the whole world?

A larger system also encounters a larger variety of machines and conventions: for example, what format should be used for files containing floating-point numbers?


Mobile Users

Laptops, pocket PCs, and smartphones can be found everywhere these days, and they are multiplying like rabbits. However, their network connection may be poor or absent.

The solution is based on caching.

Remote control


Fault Tolerance

If a system goes down for an hour, many serious problems follow, so the demand for systems that essentially never fail will grow.

File replication becomes an essential requirement.


Multimedia

Real-time conferencing, video on demand, and other multimedia applications will need a completely different kind of file system.