sistemes distribuïts en xarxa (sdx) facultat d’informàtica de barcelona (fib) universitat...

76
Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

Upload: alden-haskins

Post on 22-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

Sistemes Distribuïts en Xarxa (SDX)Facultat d’Informàtica de Barcelona (FIB)Universitat Politècnica de Catalunya (UPC)2013/2014 Q2

9. Peer-to-Peer Systems

Page 2: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (2) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems

• Structured P2P systems

Page 3: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (3) Peer-to-peer systems

Introduction

• Client-server model– Traditional model in distributed computing

Page 4: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (4) Peer-to-peer systems

Introduction

• Client-server model– Successful architecture until now

• New need for global scale computation– Modern Internet is characterized by the

instability of nodes and links• Sudden spikes in demand for services

– Scalability and robustness problems with client-server model• Centralized resources are potential performance

bottlenecks and points of failure

Page 5: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (5) Peer-to-peer systems

Introduction

• Peer-to-Peer (P2P) model

Page 6: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (6) Peer-to-peer systems

Introduction

• Peer-to-Peer (P2P) model– A P2P computer network refers to any

network that does not have fixed clients and servers, but a number of peer nodes that function as both clients and servers to other nodes on the network• Wikipedia.org

– Nodes are symmetric in function– No centralized control– Typically many nodes, but unreliable,

dynamic and heterogeneous• Machines can come and go any time

Page 7: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (7) Peer-to-peer systems

Introduction

• P2P advantages– Aggregate tremendous amount of

computation and storage resources– High availability & scalability

• Hundred of thousands or millions of machines• No bottlenecks, no single points of vulnerability

– Low latency• Exploit locality whenever possible

– Resilience to failures, spikes in demand, surges in traffic or intentional attacks

Page 8: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (8) Peer-to-peer systems

Introduction

• Basic interface in P2P systemsA. Insert / publish a resource / object / file /

serviceB. Find / lookup a resource / object / file /

service• Content location is the main challenge in P2P

systems

AB

C

D

E

F

E?

Page 9: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (9) Peer-to-peer systems

Introduction

• Two approaches for content location1. Unstructured P2P systems

• No (or little) control over the P2P network topology

• No (or little) control over resource placement• Insert is easy, search is hard or imprecise• Examples: Napster, Gnutella, KaZaA

2. Structured P2P systems• P2P network topology is tightly controlled • Resources stored at specified locations • Insert is hard, search is easy• Examples: DHT-based systems

– CAN, Chord, Pastry, Tapestry

Page 10: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (10) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems

• Structured P2P systems

Page 11: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (11) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems– Centralized model– Decentralized model– Hierarchical model

• Structured P2P systems

Page 12: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (12) Peer-to-peer systems

Centralized model: Napster

A. Query a centralized index system that returns the machine/s that stores the required file

B. Transfer the file from given machine/s

AB

C

D

E

F

m1m2

m3

m4

m5

m6

m1 Am2 Bm3 Cm4 Dm5 Em6 F

E?m5

E? E

Insert(“E”, m5)

Lookup(“E”)

Page 13: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (13) Peer-to-peer systems

Centralized model: Napster

• Advantages– Simple and locates files quickly and

efficiently– Easy to implement sophisticated search

engines on top of the index system• Disadvantages

– Lack of robustness: single point of failure– Scalability / performance bottleneck– Resources are distributed, but index service

is not• Easy to detect copyright infringements• For this reason, Napster could be shut down

Page 14: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (14) Peer-to-peer systems

READING REPORT

• Cohen03. Incentives Build Robustness in BitTorrent

Page 15: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (15) Peer-to-peer systems

Centralized model: BitTorrent

• Collaborative system providing efficient content distribution using file swarming– Each file split into smaller pieces– Nodes request desired pieces from

neighbors– Encourages contribution by all nodes

• Combines a client-server model for locating the pieces with a P2P download protocol– Usually does not perform all the functions of

a typical P2P system, like searching• Written by Bram Cohen (in Python) in

2001

Page 16: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (16) Peer-to-peer systems

Centralized model: BitTorrent

1. Peers obtain a .torrent file, hosted in a web server

2. Peers connect to the tracker specified there3. Tracker responds with contact information

about the peers that are downloading the same file

4. Peers then use this information to connect to each other and distribute file pieces among them

Page 17: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (17) Peer-to-peer systems

Centralized model: BitTorrent

• .torrent file contains:– Metadata about the file to be shared

• Name, length, piece length, SHA-1 hash of each piece

– URL of tracker• Tracker

– Keeps track of all the peers who have the file (seeds/leechers) and which chunks each peer has

– When asked for a list of peers, the tracker returns a random subset of the peers, typically 50 peers

– Can receive statistics about what peers are doing

Page 18: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (18) Peer-to-peer systems

Centralized model: BitTorrent

• Seeds– Peers that have a complete copy of the file– They are usually selfish and do not want to

wait after they get the file– Initial seed: a peer that provides the initial

copy– At least the initial seed has to stay to serve

one complete copy of the file• Leechers

– Any peer who does not have a complete file– He stays a leecher and eventually becomes

a seed

Page 19: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (19) Peer-to-peer systems

BitTorrent: Piece selection

• Each file split into smaller pieces– This is the unit for uploading, typically

256Kb• Pieces are further divided in sub-pieces

– This is the unit for downloading, typically 16Kb

• The order in which pieces are selected is critical for good performance– Avoid ending with peers having all the same

set of available pieces, and none of the missing ones

– If the original seed is prematurely taken down, then the file cannot be completely downloaded

Page 20: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (20) Peer-to-peer systems

BitTorrent: Piece selection

• Rarest Piece First– This is the general rule: download first the

pieces that are most rare among your peers• Most commonly available pieces are left until the

end to download– Increases diversity in the pieces

downloaded• You are going to be useful to others• Avoids the case where all peers have exactly the

same pieces– Increases likelihood that all pieces are still

available even if the original seed leaves

Page 21: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (21) Peer-to-peer systems

BitTorrent: Piece selection

• The behavior of the ‘Rarest Piece First’ policy can be modified by three additional policies:– Strict Priority, Random First Piece, Endgame

Mode

• Strict Priority– Once a single sub-piece has been

requested, the other sub-pieces from that piece are requested before sub-pieces from any other piece• As only complete pieces can be sent, it is

important to minimize the number of partially-received pieces

Page 22: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (22) Peer-to-peer systems

BitTorrent: Piece selection

• Random First Piece– Special case at the beginning of the

download– A peer has nothing to trade so it is

important to get a complete piece as soon as possible

– ‘Rarest Piece First’ is not a good option• Rare pieces are present on few peers, so they

would be downloaded slower– Select a random piece of the file and

download it– When first complete piece is assembled,

switch to ‘Rarest Piece First’

Page 23: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (23) Peer-to-peer systems

BitTorrent: Piece selection

• End Game Mode– Special case at the end of the download– The completion of a download can be

delayed due to a single peer with a slow transfer rate

– When all missing sub-pieces have been requested, those not received yet are requested from every peer containing them

– When a sub-piece arrives, the pending requests for that sub-piece are cancelled

– Some bandwidth is wasted, but in practice, this is not too much

Page 24: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (24) Peer-to-peer systems

BitTorrent: Choking

• Want to encourage all peers to contribute– Guarantee a reasonable level of upload and

download reciprocation– Avoid ‘free riders’

• Choking is a temporary refusal to upload– A chokes B if A decides not to upload to B

• A given peer can unchoke only 4 remote peers at a given time– Not because you are connected to

somebody you can download from him

Page 25: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (25) Peer-to-peer systems

BitTorrent: Choking

• Decision about unchoking is reconsidered periodically:– Every 10 seconds, the interested remote

peers are ordered according to their upload rate to the local peer and the 3 fastest peers are unchoked• i.e. initially they should have unchoked the local

peer– Every 30 seconds, one additional interested

remote peer from the remaining connections is unchoked at random • This is called optimistic unchoke• Allows to replace peers with better upload

capacity and bootstrap new peers that do not have any piece to share

Page 26: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (26) Peer-to-peer systems

Centralized model: BitTorrent

• Advantages– Very good download performance

• Slow nodes do not make slow other nodes• Proficient in utilizing partially downloaded files

– Discourages ‘freeloading’ by rewarding fastest uploaders

– ‘Rarest Piece First’ extends the lifetime of swarm, increasing the likelihood all pieces are available

– Users use well-known, trusted sources to locate content, as there is not search feature• Avoids the pollution problem, where garbage is

passed off as authentic content

Page 27: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (27) Peer-to-peer systems

Centralized model: BitTorrent

• Disadvantages– Assumes all interested peers active at same

time: performance deteriorates if swarm “cools off”

– The centralized tracker is a single point of failure • New nodes cannot enter swarm if tracker goes

down• BitTorrent variants without a centralized tracker

– e.g. Vuze (former Azureus)– Now, official client also supports distributed

tracking» Mainline DHT (based on Kademlia)

– Lack of search feature• Users need to resort to out-of-band search

Page 28: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (28) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems– Centralized model– Decentralized model– Hierarchical model

• Structured P2P systems

Page 29: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (29) Peer-to-peer systems

Decentralized model: Gnutella

• Based on query flooding

• Hot to find a file:– Node sends Query

descriptor to all neighbors

• All nodes connected to it

– Neighbors recursively multicast the descriptor

– Eventually a peer that has the file receives the descriptor, and sends back a QueryHit (retracing the path of Query)

– Node requests file to this peer (direct connection)

Page 30: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (30) Peer-to-peer systems

Decentralized model: Gnutella

• Example– Assumption: m1’s neighbors are m2 and

m3; m3’s neighbors are m4 and m5;…

AB

C

D

E

F

m1m2

m3

m4

m5

m6

E?

E?

E?E?

E

Lookup(“E”)

Page 31: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (31) Peer-to-peer systems

Decentralized model: Gnutella

• Advantages– Highly decentralized– Location service distributed over peers

• Peers have similar responsibilities– No centralized directory

• System harder to shut down

• Disadvantages– Slow information discovery– Scalability problem due to excessive query

traffic• Queries for content that are not widely replicated

must be sent to a large fraction of peers• Worst case: O(N) messages per lookup

Page 32: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (32) Peer-to-peer systems

Decentralized model: Gnutella

• Need some mechanisms to control flooding

A. Add TTL (Time-to-Live) to each message– Number of hops that message may travel

before giving up (default in Gnutella is 7)– How to adjust properly TTL value?

• For popular objects, small TTLs suffice• For rare objects, large TTLs are necessary

– Number of messages grow exponentially with TTL

B. Expanding ring– Start querying with TTL=1– Keep increasing TTL until search succeeds

Page 33: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (33) Peer-to-peer systems

Decentralized model: Gnutella

• Need some mechanisms to control flooding

C. Random walk: Ask one neighbor randomly, who asks one neighbor, etc.– Reduces the number of flooded messages

• Tradeoff: longer delay, fewer messages– Multiple-walker random walk

• Initiate several random walks in parallel

D. State keeping: Keep track of recently routed messages– Avoids re-broadcasting query messages– Can choose another neighbor if random

walk used

Page 34: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (34) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems– Centralized model– Decentralized model– Hierarchical model

• Structured P2P systems

Page 35: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (35) Peer-to-peer systems

Hierarchical model: FastTrack

• Special nodes (group leader or super-peer)

• Each peer is either a super-peer or assigned to a super-peer

• Super-peer knows the files in all its peers

• Peer queries super-peer – Super-peer may query

other super-peer using flooding

• Download among peers

Page 36: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (36) Peer-to-peer systems

Hierarchical model: FastTrack

• Advantages– Combines advantages of centralized and

decentralized models– Directory is decentralized

• System harder to shut down– Less flooding than decentralized model

• Disadvantages– Super-peers can suffer the same problems

that centralized directory– Scalability problems by using flooding

• Used in KaZaA, Skype

Page 37: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (37) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems

• Structured P2P systems

Page 38: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (38) Peer-to-peer systems

Distributed Hash Tables (DHT)

• Goals– Make sure that an item identified is always

found• Directed search: Assign particular nodes to hold

particular content (or know where it is)– Decentralization and scalability– Self-organization and fault-tolerance

• Handle rapidly arrival and failure of nodes– Good for exact matches, not keyword search

• Examples– CAN, Chord, Kademlia, Pastry, Viceroy,

Tapestry

Page 39: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (39) Peer-to-peer systems

Distributed Hash Tables (DHT)

• A hash table associates data with keys– Key identifies data uniquely

• Key is hashed to find bucket in hash table

• In a DHT, nodes are the hash buckets– Key is hashed to find responsible peer node

key pos

00

hash functionhash function

11

22

N-1N-1

33...

xx

yy zz

lookup (key) → datainsert (key, data)lookup (key) → datainsert (key, data)

“Beatles” 2

hash table

hash bucket

h(key)%N

0

1

2

...

node

key poshash functionhash function

lookup (key) → datainsert (key, data)lookup (key) → datainsert (key, data)

“Beatles” 2h(key)%N

N-1

Page 40: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (40) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems

• Structured P2P systems– Chord– CAN

Page 41: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (41) Peer-to-peer systems

Chord

• Associate to each node and key a unique ID in an uni-dimensional space

• Properties – Efficient: O(Log N) messages per lookup,

where N is the total number of servers – Scalable: O(Log N) state per node– Robust: survives massive changes in

membership • Challenges

– How to locate an item?– How to maintain routing tables? – How to deal with changes in membership?

Page 42: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (42) Peer-to-peer systems

Chord

• m bit identifier space for both keys and nodes– Identifiers go from 0 to 2m-1– Map nodes & keys to identifiers with hash

function• Key identifier = SHA-1(key)• Node identifier = SHA-1(IP address)

• How to map key IDs to node IDs? – Use consistent hashing

– Map key IDs to nodes with ‘closest’ IDs– A key identified by ID is stored on the

successor node of ID (node with next higher ID)

• Each node is responsible for O(K/N) keys

Page 43: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (43) Peer-to-peer systems

6

1

2

6

0

4

26

5

1

3

7

2

identifier

node

X key

successor(1) = 1

successor(2) = 3successor(6) = 0

Chord: Item insertion

• Example:– Identifier space with m=3

Page 44: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (44) Peer-to-peer systems

Chord: Item insertion

6

1

2

0

4

26

5

1

3

7successor(6) = 7

6

1

successor(1) = 3

• Item management with node joins and departures

Page 45: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (45) Peer-to-peer systems

Chord: Item lookup

A. Basic Chord– Each node knows

only two other nodes• Successor• Predecessor (for ring

management)– Lookup by

forwarding requests around the ring through successor pointers

– Requires O(N) hops

N1

N8

N14

N32

N21

N38

N42

N48

N51

N56

m=6m=6

K54

2m-1 0

lookup(K54)

Page 46: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (46) Peer-to-peer systems

N8080 + 20

N112

N96

N16

80 + 21

80 + 22

80 + 23

80 + 24

80 + 25 80 + 26

Chord: Item lookup

B. Finger tables for accelerating lookup– Every node knows

m other nodes– Entry i in the finger

table of node n points to node n + 2i (if any) or to its successor

– Increase distance exponentially

– Lookups take O(Log N) hops

Page 47: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (47) Peer-to-peer systems

Chord: Managing finger tables

01

2

34

5

6

7

i id+2i succ0 2 11 3 12 5 1

finger table

• Example:– Node n1:(1)

joins• All entries in its

finger table are initialized to itself

Page 48: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (48) Peer-to-peer systems

Chord: Managing finger tables

• Example:– Node n2:(2) joins

01

2

34

5

6

7

i id+2i succ0 2 21 3 12 5 1

finger table

i id+2i succ0 3 11 4 12 6 1

finger table

Page 49: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (49) Peer-to-peer systems

Chord: Managing finger tables

• Example:– Nodes n3:(0) & n4:(6) join

01

2

34

5

6

7

i id+2i succ0 2 21 3 62 5 6

finger table

i id+2i succ0 3 61 4 62 6 6

finger table

i id+2i succ0 1 11 2 22 4 6

finger table

i id+2i succ0 7 01 0 02 2 2

finger table

Page 50: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (50) Peer-to-peer systems

Chord: Managing finger tables

• Example:– Insert keys f1:(7) &

f2:(1)– A key identified by

ID is stored on the successor node of ID

01

2

34

5

6

7 i id+2i succ0 2 21 3 62 5 6

finger table

i id+2i succ0 3 61 4 62 6 6

finger table

i id+2i succ0 1 11 2 22 4 6

finger table

7

keys 1

keys

i id+2i succ0 7 01 0 02 2 2

finger table

Page 51: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (51) Peer-to-peer systems

Chord: Managing finger tables

0

4

26

5

1

3

7

012

124

130

finger tablei id+2i succ

keys1

012

235

330

finger tablei id+2i succ

keys2

012

457

000

finger tablei id+2i succ

keys

finger tablei id+2i succ

keys

012

702

003

6

6

66

6

• Example 2:– Item

management when node 6 joins

Page 52: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (52) Peer-to-peer systems

Chord: Managing finger tables

0

4

26

5

1

3

7

012

124

130

finger tablei id+2i succ

keys1

012

235

330

finger tablei id+2i succ

keys2

012

457

660

finger tablei id+2i succ

keys

finger tablei id+2i succ.

keys

012

702

003

6

6

6

0

3

• Example 2:– Item

management when node 1 departures

Page 53: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (53) Peer-to-peer systems

Chord: Item lookup with finger tables

01

2

34

5

6

7i id+2i succ0 2 21 3 62 5 6

finger table

i id+2i succ0 3 61 4 62 6 6

finger table

i id+2i succ0 1 11 2 22 4 6

finger table

7

keys 1

keys

i id+2i succ0 7 01 0 02 2 2

finger table

query(7)

1. Check whether the key is stored locally

2. If not, forward the query to the largest node in the finger table not exceeding key ID

Page 54: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (54) Peer-to-peer systems

Chord: Item lookup with finger tables

• Example 2:

N1

N8

N14

N32

N21

N38

N42

N48

N51

N56

m=6m=6

K54

2m-1 0

N8+1

N8+2

N8+4

N8+8

N8+16

N8+32

N14

N14

N14

N21

N32

N42

Finger table

+32

+16 +8

+4

+2

+1lookup(K54)

N42+1

N42+2

N42+4

N42+8

N42+16

N42+32

N48

N48

N48

N51

N1

N14

Finger table

N51+1

N51+2

N51+4

N51+8

N51+16

N51+32

N56

N56

N56

N1

N8

N21

Finger table

Page 55: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (55) Peer-to-peer systems

N36

Lookup(37,38,40,…,100,164)

N60

N40

N5

N20N99

N80

Chord: Node joining

A. Basic process for joining the ring (3 steps):1. Initialize fingers and predecessor of new

node j• Locate any node n in the ring• Ask n to lookup the peers at j+20, j+21, j+22… • Use results to populate finger table of j• j.pred = j.finger[0].pred

Page 56: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (56) Peer-to-peer systems

Chord: Node joining

2. Update finger tables of existing nodes

– For each entry i in the finger table, new node j calls update function on existing nodes that must point to j• Nodes in the

ranges[pred(j)-2i+1, j-2i]

– O(log N) nodes must be updated

N1

N8

N14

N32

N21

N38

N42

N48

N51

N56

m=6m=6 2m-1 0

N8+1

N8+2

N8+4

N8+8

N8+16

N8+32

N14

N14

N14

N21

N32

N42

+16

N28

N28

-16

N12

N6

Page 57: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (57) Peer-to-peer systems

3. Transfer keys responsibility– Connect to successor and transfer keys in

the range from successor to new node

K38K30

N36

N60

N40

N5

N20N99

N80

Chord: Node joining

Copy keys 21..36from N40 to N36

K30

K38

Page 58: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (58) Peer-to-peer systems

Chord: Node joining

B. Stabilization: Less aggressive mechanism to deal with concurrent joins– The goal is to keep successor pointers up to

date • This sufficient to guarantee correctness of lookups

– Finger entries are updated in a lazy fashion

1. Join only initializes the finger to successor node

2. All nodes run a stabilization procedure that periodically verifies successor and predecessor

3. All nodes also refresh periodically finger table entries

Page 59: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (59) Peer-to-peer systems

Chord: Stabilization

np

succ

(np)

= n

s

ns

n

pre

d(n

s) =

np

n joins predecessor = nil n acquires ns as successor via some node in the ring

n runs stabilization procedure n asks ns for its predecessor

ns responds (pred(ns)= np) to n

n confirms ns as successor and notifies ns that it is its new predecessor otherwise, n updates its successor and

stabilizes again ns checks and acquires n as its predecessor

ns transfers keys in the range to n

np also runs stabilization procedure

np asks ns for its predecessor (now n)

np acquires n as its successor

np notifies n that it is its new predecessor

n acquires np as its predecessor

nil

pre

d(n

s) =

n

succ

(np)

= n

succ

(n)

= n

s

pre

d(n

) =

np

Page 60: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (60) Peer-to-peer systems

Chord: Dealing with node failures• Failure of nodes might cause incorrect lookup

– Lookup(K19) fails: N8 returns the first alive node it knows (N32)

• Solution: successor list– Each node knows r immediate successors– When successor fails, replace it with the first live

entry in the list– Stabilize will correct finger table entries pointing to

failed node

N1

N8

N32

N21

N38

N42

N48

N51

N56

m=6m=6

2m-1 0

+16 +8

+4

+2

+1

N14

N18

K19

lookup(K19) ?

N8+1

N8+2

N8+4

N8+8

N8+16

N8+32

N14

N14

N14

N18

N32

N42

Page 61: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (61) Peer-to-peer systems

Chord

• Advantages– Availability

• Guaranteed to find an item if in the network– Decentralized: improves robustness– Scalability: logarithmic growth of lookup

costs• Disadvantages

– Member joining is complicated– Does not provide anonymity

Page 62: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (62) Peer-to-peer systems

SEMINAR PREPARATION – Chordy

• Stoica03. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications

Page 63: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (63) Peer-to-peer systems

Contents

• Introduction

• Unstructured P2P systems

• Structured P2P systems– Chord– CAN

Page 64: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (64) Peer-to-peer systems

Content Addressable Network (CAN)• Virtual Cartesian coordinate space

– Associate to each node and item a unique ID in an d-dimensional Cartesian space

• Properties – Routing table size: O(d)– O(d*N1/d) messages per lookup, where N is

the total number of servers • Challenges

– How to divide hash space evenly among nodes?

– How to locate an item?

Page 65: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (65) Peer-to-peer systems

CAN: Virtual space construction

• The entire space is divided among all nodes– Every node “owns” a zone in the overall space

• Node coordinates are assigned in the joining process

• Example: – Node n1:(1, 2) first node that joins– n1 covers the entire space

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1

d=2

Page 66: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (66) Peer-to-peer systems

CAN: Virtual space construction

• Incrementally split space between nodes that join– Each node covers either a square or a rectangular

area of ratios 1:2 or 2:1

• Example:– Node n2:(4, 2) joins– Space is divided between n1 and n2

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

d=2

Page 67: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (67) Peer-to-peer systems

CAN: Virtual space construction

• Example:– Node n3:(3, 5) joins – n1’s zone is divided between n1 and n3

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3

d=2

Page 68: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (68) Peer-to-peer systems

CAN: Virtual space construction

• Example:– Nodes n4:(5, 5) and n5:(6, 6) join– n2’s zone is divided between n2 and n4– n4’s zone is divided between n4 and n5

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

d=2

Page 69: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (69) Peer-to-peer systems

CAN: Item insertion

• Node responsible for key K is determined by hashing K for each dimension

• Example:– node I::insert(K,V)– a = hx(K)

– b = hy(K)

x = a

y = b

I

d=2

Page 70: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (70) Peer-to-peer systems

CAN: Item insertion

• Each item is stored by the node who owns its mapping in the space

• Example:– Items f1:(2,3) & f3:(2,1) stored by n1– Item f2:(5,1) stored by n2– Item f4:(7,5) stored by n5

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

d=2

Page 71: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (71) Peer-to-peer systems

CAN: Item lookup

• A node only maintains state for its immediate neighbors– 2d neighbors per node

if zones are equal• Messages are routed

to the neighbor with coordinates closest to the destination– Min. Cartesian distance

• Multiple choices– Can route around

failures

d=2

Page 72: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (72) Peer-to-peer systems

CAN: Item lookup

• Example:– n1 lookups f4

• Path through n3 & n4

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

d=2

Page 73: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (73) Peer-to-peer systems

CAN: Item lookup

• Example:– n1 lookups f4

• n3 & n4 fail

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

d=2

Page 74: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (74) Peer-to-peer systems

CAN: Item lookup

• Example:– n1 lookups f4

• Alternate route through n2

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

d=2

Page 75: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (75) Peer-to-peer systems

Content Addressable Network (CAN)• Advantages

– Availability• Guaranteed to find an item if in the network

– Efficient at locating information• O(d.N1/d)

– Decentralized: no single point of failure– Fault tolerant routing

• Disadvantages– Impossible to perform a fuzzy search– Low security

Page 76: Sistemes Distribuïts en Xarxa (SDX) Facultat d’Informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) 2013/2014 Q2 9. Peer-to-Peer Systems

SDX (76) Peer-to-peer systems

Summary

• P2P: new model for global scale computation– Decentralized: Nodes are symmetric in

function– High availability & scalability– Typically many nodes, but unreliable,

dynamic and heterogeneous– Two approaches for content location

• Unstructured P2P systems– Examples: Napster, Gnutella, KaZaA,

BitTorrent• Structured P2P systems

– Examples: DHT-based systems: CAN, Chord