TRANSCRIPT
Last time
● First deadline – 02.10.2006 (but the earlier, the better!)
● Servent – an entity that can both issue a query and respond to one
● Pure vs. hybrid P2P systems
● Layers: Communication -> Group Management -> Robustness -> Class-specific -> Application-specific
This time
● P2P properties
● P2P models
  – Centralized Directory Model
  – Flooded Requests Model
  – Document Routing Model
● Chord, CAN, Tapestry, Pastry
● Projects (the earlier you start, the easier the exam session!)
Decentralization
● Pros: price, scalability, performance
● Cons: security, joining the system
Scalability
● Synchronization of central services
● Maintenance of state
● Programming model of computation
Anonymity
Anonymity forms
● Author – the author of a document cannot be determined
● Publisher – the publisher of a document cannot be determined
● Reader – the user who downloads a document cannot be determined
● Server – the servers hosting a document cannot be determined from the document
● Document – servers do not know which files they store
● Query – a server does not know which document it is serving when it answers a query
Techniques
Self-organization
● OceanStore – routing
● Pastry – file replicas
● FastTrack, Skype – super-nodes
Cost of Ownership
Very low compared to client-server applications
Ad-hoc Connectivity
● The pool of resources in a P2P system is unstable
● Access to files is unstable
  – Even with an SLA, part of the service provider's infrastructure may be down
● Collaborative systems
  – Mobile devices
  – Transparent communication with offline systems (proxies, sender relays, ...)
Performance
● Processing
● Storage
● Networking
Performance
● Centrally coordinated systems
  – DNS
● Distributed systems
  – Message forwarding
  – Network traffic grows
Performance
● Replication
  – Copies are created closer to the requesters
  – Updates must be propagated (consistency)
● Caching
  – In FreeNet, once a file has been found and returned to the requester, every intermediate node caches the returned data
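This caching scheme can be sketched in a few lines (node names and data structures here are invented for illustration, not FreeNet's actual design):

```python
# Path caching sketch: when a file is found, it is returned along the
# query path and every intermediate node caches a copy, so later
# queries can be answered closer to the requester.
caches = {node: {} for node in ("A", "B", "C", "D")}

def return_file(path, key, data):
    # path lists the nodes from the holder back to the requester
    for node in path:
        caches[node][key] = data

def first_hit(path, key):
    # a later query is answered by the first node on its path
    # that holds a cached copy
    for node in path:
        if key in caches[node]:
            return node
    return None

return_file(["C", "B", "A"], "doc", b"data")
```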
Performance
● Intelligent routing
  – We need to understand how nodes communicate with each other (from a sociological point of view)
  – "Small-world phenomenon" (Milgram 1967)
  – Nodes with similar interests should be linked directly
  – Network costs drop, search speed increases
Security
● Multi-key encryption
  – Public key, multiple private keys
● Sandboxing
  – Running foreign code on a node is insecure
  – We must ensure the code does nothing harmful
  – Virtual machines, proof-carrying code, certifying compilers
Security
● Digital Rights Management
  – We must ensure the author can always be identified
  – Watermarking (steganography): a signature is embedded in the file
● Reputation and Accountability
  – We need to determine how "good" a node is
  – Share lots of music -> you are good
  – Freeloader -> you are bad
Security
● Firewalls
  – P2P needs direct connections between nodes (duh)
  – Inbound TCP is very often blocked
  – NAT: if both nodes are hidden behind a NAT/firewall, a third node can be used as a relay
Transparency
Fault-Resilience
● Central design point
  – Avoid a central point of failure!
● Special nodes – relays
  – Groove
● Message ordering
Interoperability
● Peer-to-Peer Working Group (Internet2)
  – Not too active
● JXTA
  – An attempt to create a de facto standard
  – The topic of the next lecture
  – A good basis for your project (a C/C++ implementation also exists)!
P2P Properties
Decentralization, Scalability, Anonymity, Self-organization, Cost of ownership, Ad-hoc connectivity, Performance, Security, Transparency, Fault-resilience, Interoperability
:)
P2P Models
● Centralized Directory Model
● Flooded Requests Model
● Document Routing Model
Centralized Directory
● Nodes publish information about themselves on a central server
● When a query arrives, the server picks the best peer from the set
● Some scalability concerns
  – Though Napster's example shows this is not that big a problem
Flooded Requests
● The Gnutella model
● Network load is very high
  – Super-peers can help
Document Routing
● FreeNet's approach
● Every peer gets an ID_P
● Every peer knows a certain set of other peers
● When a document is published, it also gets an ID: ID_D = h(content, name)
● The document is then forwarded until it reaches the peer whose ID_P is most similar to ID_D
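The ID assignment and closest-ID forwarding can be sketched as follows (the hash function, ID-space size, and the numeric notion of "most similar" are simplifications chosen for illustration):

```python
import hashlib

ID_BITS = 16  # tiny ID space, for illustration only

def doc_id(content: bytes, name: str) -> int:
    # ID_D = h(content, name): hash the content and name into an ID
    digest = hashlib.sha1(content + name.encode()).hexdigest()
    return int(digest, 16) % (1 << ID_BITS)

def closest_peer(peer_ids, target):
    # forward toward the peer whose ID_P is most similar to ID_D;
    # here "most similar" is plain numeric distance
    return min(peer_ids, key=lambda p: abs(p - target))
```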
Document Routing
● Searching
  – The query goes to the peer with the most similar ID until the document is found
  – The document is transferred back, and every peer participating in the transaction stores its own copy
● Problems
  – The ID must be known before searching
  – The islanding problem (segmentation)
Document Routing
● Chord, CAN, Tapestry, and Pastry
● Main goal – reduce the number of hops during lookup
● These algorithms either guarantee, or claim with high probability, that a lookup has O(log n) complexity

The following slides are taken from:
http://www.cs.bgu.ac.il/~ccsh032/
CAN
● CAN is a Content-Addressable Network
● Interface
  – insert(key, value)
  – value = retrieve(key)
● Properties
  – Scalable
  – Operationally simple
  – Good performance
CAN: basic idea
[Figure: a hash table of (K,V) pairs distributed across the nodes; insert(K1,V1) is routed to the responsible node, and retrieve(K1) is later routed to that same node]
CAN: solution
• virtual Cartesian coordinate space
• entire space is partitioned amongst all the nodes
  – every node "owns" a zone in the overall space
• abstraction
  – can store data at "points" in the space
  – can route from one "point" to another
  – point = node that owns the enclosing zone
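A minimal sketch of this mapping, assuming an invented set of zones and hash choices (not the paper's actual parameters): two hashes map a key to a point in the unit square, and the node owning the enclosing zone stores the pair.

```python
import hashlib

# Illustrative zones: axis-aligned rectangles (x0, y0, x1, y1)
ZONES = {
    "n1": (0.0, 0.0, 0.5, 1.0),   # left half
    "n2": (0.5, 0.0, 1.0, 0.5),   # bottom-right quarter
    "n3": (0.5, 0.5, 1.0, 1.0),   # top-right quarter
}
STORE = {}  # (node, key) -> value

def key_to_point(key):
    # a = hx(K), b = hy(K): independent hashes for each coordinate
    m = 2 ** 160  # SHA-1 produces a 160-bit value
    a = int(hashlib.sha1(b"x" + key.encode()).hexdigest(), 16) / m
    b = int(hashlib.sha1(b"y" + key.encode()).hexdigest(), 16) / m
    return a, b

def owner(point):
    # point = node that owns the enclosing zone
    x, y = point
    for node, (x0, y0, x1, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node

def insert(key, value):
    STORE[(owner(key_to_point(key)), key)] = value

def retrieve(key):
    return STORE.get((owner(key_to_point(key)), key))
```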
CAN: simple example
[Figure: nodes 1–4 join one after another, each taking ownership of a zone of the 2-d space]
CAN: simple example
node I::insert(K,V)
(1) a = hx(K); b = hy(K)
(2) route(K,V) -> (a,b), following the straight-line path from the source to the destination
(3) the node owning (a,b) stores (K,V)

node J::retrieve(K)
(1) a = hx(K); b = hy(K)
(2) route "retrieve(K)" to (a,b)

Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)
CAN: routing table
2d neighbors

CAN: routing
[Figure: a message routed from the zone containing (a,b) toward the point (x,y)]
A node only maintains state for its immediate neighboring nodes
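Greedy routing over such a neighbor table can be sketched like this (the 2x2 grid of zones, their names, centers, and adjacency are invented for the sketch, and the real CAN space wraps around like a torus, which is ignored here):

```python
centers = {"a": (0.25, 0.25), "b": (0.75, 0.25),
           "c": (0.25, 0.75), "d": (0.75, 0.75)}
neighbors = {"a": ["b", "c"], "b": ["a", "d"],
             "c": ["a", "d"], "d": ["b", "c"]}

def dist(p, q):
    # Euclidean distance in the coordinate space
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def route(start, dest):
    # at each hop, move to the neighbor whose zone center is closest
    # to the destination point; stop once no neighbor improves
    path = [start]
    while True:
        cur = path[-1]
        best = min(neighbors[cur], key=lambda n: dist(centers[n], dest))
        if dist(centers[best], dest) >= dist(centers[cur], dest):
            return path
        path.append(best)
```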
CAN: node insertion
1) discover some node "I" already in the CAN (via a bootstrap node)
2) the new node picks a random point (p,q) in the space
3) I routes to (p,q) and discovers node J, the owner of that zone
4) split J's zone in half; the new node owns one half
The new node obtains its routing table from "J"
Periodic updates: each node sends its zone id to its neighbors
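Step 4 can be sketched as follows, assuming an illustrative zone representation (an axis-aligned rectangle (x0, y0, x1, y1)); splitting along the longer side keeps zones close to square:

```python
def split_zone(zone):
    # split J's zone in half along its longer side;
    # J keeps the first half, the new node owns the second
    x0, y0, x1, y1 = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)  # vertical cut
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)      # horizontal cut
```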
CAN: node failures
• need to repair the space
  – explicit hand-over
  – recover the database
    • soft-state updates
    • use replication, rebuild the database from replicas
  – repair routing
    • takeover algorithm
CAN: takeover algorithm
• simple failures
  – know your neighbor's neighbors
  – when a node fails, one of its neighbors takes over its zone
  – periodic updates include: zone id + neighbors
  – absence of updates signals failure
  – the detecting node sends a TAKEOVER message to all of the failed node's neighbors and sets a takeover timer
  – on receipt of a TAKEOVER: compare zone volumes and either cancel or reissue the TAKEOVER message (the node with the smaller zone takes over)
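The arbitration step can be sketched like this (the data shapes are invented; each claimant reports its current zone, and the one with the smallest volume wins, which keeps zones balanced):

```python
def zone_volume(zone):
    # zone: an illustrative axis-aligned rectangle (x0, y0, x1, y1)
    x0, y0, x1, y1 = zone
    return (x1 - x0) * (y1 - y0)

def takeover_winner(claims):
    # claims: list of (node, zone) pairs received before the timer fires;
    # the claimant with the smallest zone volume takes over
    return min(claims, key=lambda c: zone_volume(c[1]))[0]
```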
CAN: takeover algorithm
• more complex failure modes
  – simultaneous failure of multiple adjacent nodes
  – scoped flooding to discover neighbors
  – hopefully, a rare event
Only the failed node's immediate neighbors are required for recovery
Design recap
• basic CAN
  – completely distributed
  – self-organizing
  – nodes only maintain state for their immediate neighbors
• additional design features
  – multiple, independent spaces (realities)
  – background load-balancing algorithm
  – simple heuristics to improve performance
Multi-Dimensioned Spaces
• increase the number of dimensions
• result: reduced path length
• a node now has more neighbors
Realities
• multiple coordinate spaces
• a node is assigned r coordinate zones, one per reality
• content is replicated to every reality
• result: can route toward (x,y,z) on any reality and, at each hop, can switch to a different reality
• each value is kept at r nodes and each node has r neighbor sets
Outline
• Introduction
• Design
• Evaluation
• Ongoing Work
Evaluation
• Scalability
• Low-latency
• Load balancing
• Robustness
CAN: scalability
• for a uniformly partitioned space with n nodes and d dimensions
  – per node, the number of neighbors is 2d
  – the average routing path is (d/4)·n^(1/d) hops
  – simulations show that the above results hold in practice
• can scale the network without increasing per-node state
• Chord/Plaxton/Tapestry/Buzz
  – log(n) neighbors with log(n) hops
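A quick numeric check of these formulas (n and d picked here for round numbers, not taken from the slides): a 3-d CAN with 4096 nodes keeps only 2d = 6 neighbors per node, yet its average path of (3/4)·4096^(1/3) = 12 hops matches the hop count of a log(n)-style overlay, which would keep log2(4096) = 12 neighbors.

```python
def can_neighbors(d):
    # per node, the number of neighbors is 2d
    return 2 * d

def can_avg_path(n, d):
    # average routing path length: (d/4) * n**(1/d) hops
    return (d / 4) * n ** (1 / d)
```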
CAN: low latency
• problem
  – latency stretch = (CAN routing delay) / (IP routing delay)
  – application-level routing may lead to high stretch
• solution
  – increase dimensions
  – heuristics
    • RTT-weighted routing
    • multiple nodes per zone (peer nodes)
    • deterministically replicate entries
Overloading Zones
• multiple nodes per zone, up to MAXPEERS
• split a zone only if it is over MAXPEERS
• each peer in a zone knows all the others in the zone, but still keeps only one neighbor per neighboring zone
• periodically: request the list of peers from a neighbor and select a new neighbor with the best RTT
• content: divided or replicated
CAN: load balancing
• two pieces
  – dealing with hot-spots
    • popular (key,value) pairs
    • nodes cache recently requested entries
    • an overloaded node replicates popular entries at its neighbors
  – uniform coordinate space partitioning
    • uniformly spreads (key,value) entries
    • uniformly spreads out the routing load
Uniform Partitioning
• added check
  – at join time, pick a zone
  – check the neighboring zones
  – pick the largest zone and split that one
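The added check can be sketched as follows (using an illustrative zone representation, an axis-aligned rectangle (x0, y0, x1, y1)):

```python
def zone_volume(zone):
    x0, y0, x1, y1 = zone
    return (x1 - x0) * (y1 - y0)

def zone_to_split(landing_zone, neighbor_zones):
    # instead of always splitting the zone the random join point landed
    # in, compare it with its neighbors and split the largest of them
    return max([landing_zone] + neighbor_zones, key=zone_volume)
```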
Uniform Partitioning
[Figure: histogram of the percentage of nodes vs. zone volume (from V/16 to 8V, where V = total volume / n), with and without the added check; 65,000 nodes, 3 dimensions]
CAN: robustness
• completely distributed
  – no single point of failure
• not exploring database recovery
• resilience of routing
  – can route around trouble
Routing resilience
[Figure: a message routed from source to destination around failed zones]
• node X::route(D)
  – if X cannot make progress toward D, check if any neighbor of X can make progress
  – if yes, forward the message to one such neighbor
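The route-around rule can be sketched like this (the topology, names, and distance function are invented for the sketch): if the usual greedy neighbor has failed, forward to any live neighbor that still makes progress toward the destination.

```python
centers = {"a": (0.25, 0.25), "b": (0.75, 0.25),
           "c": (0.25, 0.75), "d": (0.75, 0.75)}
neighbors = {"a": ["b", "c"], "b": ["a", "d"],
             "c": ["a", "d"], "d": ["b", "c"]}

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def next_hop(cur, dest, failed=()):
    # node X::route(D): consider only live neighbors that are closer
    # to dest than we are; among those, pick the closest one
    live = [n for n in neighbors[cur] if n not in failed]
    closer = [n for n in live if dist(centers[n], dest) < dist(centers[cur], dest)]
    return min(closer, key=lambda n: dist(centers[n], dest)) if closer else None
```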
CAN: node insertion
Inserting a new node affects only a single other node and its immediate neighbors, i.e. O(d) nodes
The End
● Next time
  – JXTA
?