TRANSCRIPT
Last time
● First deadline – 02.10.2006 (but the earlier, the better!)
● Servent – an entity that can both issue a query and respond to one
● Pure vs. hybrid P2P systems
● Layers: Communication -> Group Management -> Robustness -> Class-specific -> Application-specific
This time
● P2P properties
● P2P models
  – Centralized Directory Model
  – Flooded Requests Model
  – Document Routing Model
● Chord, CAN, Tapestry, Pastry
● Projects (the earlier you start, the easier the exam session!)
Decentralization
● Pros: price, scalability, performance
● Cons: security, joining the system
Scalability
● Synchronization of central services
● Maintenance of state
● Programming model of computation
Anonymity
Anonymity forms
● Author – the author of a document cannot be determined
● Publisher – the publisher of a document cannot be determined
● Reader – the user who downloads a document cannot be determined
● Server – the servers hosting a document cannot be determined from the document
● Document – servers do not know which files they store
● Query – a server does not know which document it is serving when it answers a query
Techniques
Self-organization
● OceanStore – routing
● Pastry – file replicas
● FastTrack, Skype – super-nodes
Cost of Ownership
Very low compared to client-server applications
Ad-hoc Connectivity
● The pool of resources in a P2P system is unstable
● Access to files is unstable
  – Even with an SLA, part of the service provider's infrastructure may be down
● Collaborative systems
  – Mobile devices
  – Transparent communication with offline systems (proxies, sender relays, ...)
Performance
● Processing
● Storage
● Networking
Performance
● Centrally coordinated systems
  – DNS
● Distributed systems
  – Message forwarding
  – Network traffic grows
Performance
● Replication
  – Copies are created closer to the requesters
  – Updates must be propagated (consistency)
● Caching
  – In FreeNet, once a file has been found and returned to the requester, every intermediate node caches the returned data
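This caching scheme can be sketched in a few lines (node names and data structures here are invented for illustration, not FreeNet's actual design):

```python
# Path caching sketch: when a file is found, it is returned along the
# query path and every intermediate node caches a copy, so later
# queries can be answered closer to the requester.
caches = {node: {} for node in ("A", "B", "C", "D")}

def return_file(path, key, data):
    # path lists the nodes from the holder back to the requester
    for node in path:
        caches[node][key] = data

def first_hit(path, key):
    # a later query is answered by the first node on its path
    # that holds a cached copy
    for node in path:
        if key in caches[node]:
            return node
    return None

return_file(["C", "B", "A"], "doc", b"data")
```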
Performance
● Intelligent routing
  – We need to understand how nodes communicate with each other (from a sociological point of view)
  – "Small-world phenomenon" (Milgram 1967)
  – Nodes with similar interests should be linked directly
  – Network costs drop, search speed increases
Security
● Multi-key encryption
  – Public key, multiple private keys
● Sandboxing
  – Running foreign code on a node is insecure
  – We must ensure the code does nothing harmful
  – Virtual machines, proof-carrying code, certifying compilers
Security
● Digital Rights Management
  – We must ensure the author can always be identified
  – Watermarking (steganography): a signature is embedded in the file
● Reputation and Accountability
  – We need to determine how "good" a node is
  – Share lots of music -> you are good
  – Freeloader -> you are bad
Security
● Firewalls
  – P2P needs direct connections between nodes (duh)
  – Inbound TCP is very often blocked
  – NAT: if both nodes are hidden behind a NAT/firewall, a third node can be used as a relay
Transparency
Fault-Resilience
● Central design point
  – Avoid a central point of failure!
● Special nodes – relays
  – Groove
● Message ordering
Interoperability
● Peer-to-Peer Working Group (Internet2)
  – Not too active
● JXTA
  – An attempt to create a de facto standard
  – The topic of the next lecture
  – A good basis for your project (a C/C++ implementation also exists)!
P2P Properties
Decentralization, Scalability, Anonymity, Self-organization, Cost of ownership, Ad-hoc connectivity, Performance, Security, Transparency, Fault-resilience, Interoperability
:)
P2P Models
● Centralized Directory Model
● Flooded Requests Model
● Document Routing Model
Centralized Directory
● Nodes publish information about themselves on a central server
● When a query arrives, the server picks the best peer from the set
● Some scalability concerns
  – Though Napster's example shows this is not that big a problem
Flooded Requests
● The Gnutella model
● Network load is very high
  – Super-peers can help
Document Routing
● FreeNet's approach
● Every peer gets an ID_P
● Every peer knows a certain set of other peers
● When a document is published, it also gets an ID: ID_D = h(content, name)
● The document is then forwarded until it reaches the peer whose ID_P is most similar to ID_D
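The ID assignment and closest-ID forwarding can be sketched as follows (the hash function, ID-space size, and the numeric notion of "most similar" are simplifications chosen for illustration):

```python
import hashlib

ID_BITS = 16  # tiny ID space, for illustration only

def doc_id(content: bytes, name: str) -> int:
    # ID_D = h(content, name): hash the content and name into an ID
    digest = hashlib.sha1(content + name.encode()).hexdigest()
    return int(digest, 16) % (1 << ID_BITS)

def closest_peer(peer_ids, target):
    # forward toward the peer whose ID_P is most similar to ID_D;
    # here "most similar" is plain numeric distance
    return min(peer_ids, key=lambda p: abs(p - target))
```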
Document Routing
● Searching
  – The query goes to the peer with the most similar ID until the document is found
  – The document is transferred back, and every peer participating in the transaction stores its own copy
● Problems
  – The ID must be known before searching
  – The islanding problem (segmentation)
Document Routing
● Chord, CAN, Tapestry, and Pastry
● Main goal – reduce the number of hops during lookup
● These algorithms either guarantee, or claim with high probability, that a lookup has O(log n) complexity

The following slides are taken from:
http://www.cs.bgu.ac.il/~ccsh032/
CAN
● CAN is a Content-Addressable Network
● Interface
  – insert(key, value)
  – value = retrieve(key)
● Properties
  – Scalable
  – Operationally simple
  – Good performance
CAN: basic idea
[Figure: a hash table of (K,V) pairs distributed across the nodes; insert(K1,V1) is routed to the responsible node, and retrieve(K1) is later routed to that same node]
CAN: solution
• virtual Cartesian coordinate space
• entire space is partitioned amongst all the nodes
  – every node "owns" a zone in the overall space
• abstraction
  – can store data at "points" in the space
  – can route from one "point" to another
  – point = node that owns the enclosing zone
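A minimal sketch of this mapping, assuming an invented set of zones and hash choices (not the paper's actual parameters): two hashes map a key to a point in the unit square, and the node owning the enclosing zone stores the pair.

```python
import hashlib

# Illustrative zones: axis-aligned rectangles (x0, y0, x1, y1)
ZONES = {
    "n1": (0.0, 0.0, 0.5, 1.0),   # left half
    "n2": (0.5, 0.0, 1.0, 0.5),   # bottom-right quarter
    "n3": (0.5, 0.5, 1.0, 1.0),   # top-right quarter
}
STORE = {}  # (node, key) -> value

def key_to_point(key):
    # a = hx(K), b = hy(K): independent hashes for each coordinate
    m = 2 ** 160  # SHA-1 produces a 160-bit value
    a = int(hashlib.sha1(b"x" + key.encode()).hexdigest(), 16) / m
    b = int(hashlib.sha1(b"y" + key.encode()).hexdigest(), 16) / m
    return a, b

def owner(point):
    # point = node that owns the enclosing zone
    x, y = point
    for node, (x0, y0, x1, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node

def insert(key, value):
    STORE[(owner(key_to_point(key)), key)] = value

def retrieve(key):
    return STORE.get((owner(key_to_point(key)), key))
```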
CAN: simple example
[Figure: nodes 1–4 join one after another, each taking ownership of a zone of the 2-d space]
CAN: simple example
node I::insert(K,V)
(1) a = hx(K); b = hy(K)
(2) route(K,V) -> (a,b), following the straight-line path from the source to the destination
(3) the node owning (a,b) stores (K,V)

node J::retrieve(K)
(1) a = hx(K); b = hy(K)
(2) route "retrieve(K)" to (a,b)

Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)
CAN: routing table
2d neighbors

CAN: routing
[Figure: a message routed from the zone containing (a,b) toward the point (x,y)]
A node only maintains state for its immediate neighboring nodes
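Greedy routing over such a neighbor table can be sketched like this (the 2x2 grid of zones, their names, centers, and adjacency are invented for the sketch, and the real CAN space wraps around like a torus, which is ignored here):

```python
centers = {"a": (0.25, 0.25), "b": (0.75, 0.25),
           "c": (0.25, 0.75), "d": (0.75, 0.75)}
neighbors = {"a": ["b", "c"], "b": ["a", "d"],
             "c": ["a", "d"], "d": ["b", "c"]}

def dist(p, q):
    # Euclidean distance in the coordinate space
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def route(start, dest):
    # at each hop, move to the neighbor whose zone center is closest
    # to the destination point; stop once no neighbor improves
    path = [start]
    while True:
        cur = path[-1]
        best = min(neighbors[cur], key=lambda n: dist(centers[n], dest))
        if dist(centers[best], dest) >= dist(centers[cur], dest):
            return path
        path.append(best)
```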
CAN: node insertion
1) discover some node "I" already in the CAN (via a bootstrap node)
2) the new node picks a random point (p,q) in the space
3) I routes to (p,q) and discovers node J, the owner of that zone
4) split J's zone in half; the new node owns one half
The new node obtains its routing table from "J"
Periodic updates: each node sends its zone id to its neighbors
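Step 4 can be sketched as follows, assuming an illustrative zone representation (an axis-aligned rectangle (x0, y0, x1, y1)); splitting along the longer side keeps zones close to square:

```python
def split_zone(zone):
    # split J's zone in half along its longer side;
    # J keeps the first half, the new node owns the second
    x0, y0, x1, y1 = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)  # vertical cut
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)      # horizontal cut
```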
CAN: node failures
• need to repair the space
  – explicit hand-over
  – recover the database
    • soft-state updates
    • use replication, rebuild the database from replicas
  – repair routing
    • takeover algorithm
CAN: takeover algorithm
• simple failures
  – know your neighbor's neighbors
  – when a node fails, one of its neighbors takes over its zone
  – periodic updates include: zone id + neighbors
  – absence of updates signals failure
  – the detecting node sends a TAKEOVER message to all of the failed node's neighbors and sets a takeover timer
  – on receipt of a TAKEOVER: compare zone volumes and either cancel or reissue the TAKEOVER message (the node with the smaller zone takes over)
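The arbitration step can be sketched like this (the data shapes are invented; each claimant reports its current zone, and the one with the smallest volume wins, which keeps zones balanced):

```python
def zone_volume(zone):
    # zone: an illustrative axis-aligned rectangle (x0, y0, x1, y1)
    x0, y0, x1, y1 = zone
    return (x1 - x0) * (y1 - y0)

def takeover_winner(claims):
    # claims: list of (node, zone) pairs received before the timer fires;
    # the claimant with the smallest zone volume takes over
    return min(claims, key=lambda c: zone_volume(c[1]))[0]
```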
CAN: takeover algorithm
• more complex failure modes
  – simultaneous failure of multiple adjacent nodes
  – scoped flooding to discover neighbors
  – hopefully, a rare event
Only the failed node's immediate neighbors are required for recovery
Design recap
• basic CAN
  – completely distributed
  – self-organizing
  – nodes only maintain state for their immediate neighbors
• additional design features
  – multiple, independent spaces (realities)
  – background load-balancing algorithm
  – simple heuristics to improve performance
Multi-Dimensioned Spaces
• increase the number of dimensions
• result: reduced path length
• a node now has more neighbors
Realities
• multiple coordinate spaces
• a node is assigned r coordinate zones, one per reality
• content is replicated to every reality
• result: can route toward (x,y,z) on any reality and, at each hop, can switch to a different reality
• each value is kept at r nodes and each node has r neighbor sets
Outline
• Introduction
• Design
• Evaluation
• Ongoing Work
Evaluation
• Scalability
• Low-latency
• Load balancing
• Robustness
CAN: scalability
• for a uniformly partitioned space with n nodes and d dimensions
  – per node, the number of neighbors is 2d
  – the average routing path is (d/4)·n^(1/d) hops
  – simulations show that the above results hold in practice
• can scale the network without increasing per-node state
• Chord/Plaxton/Tapestry/Buzz
  – log(n) neighbors with log(n) hops
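A quick numeric check of these formulas (n and d picked here for round numbers, not taken from the slides): a 3-d CAN with 4096 nodes keeps only 2d = 6 neighbors per node, yet its average path of (3/4)·4096^(1/3) = 12 hops matches the hop count of a log(n)-style overlay, which would keep log2(4096) = 12 neighbors.

```python
def can_neighbors(d):
    # per node, the number of neighbors is 2d
    return 2 * d

def can_avg_path(n, d):
    # average routing path length: (d/4) * n**(1/d) hops
    return (d / 4) * n ** (1 / d)
```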
CAN: low latency
• problem
  – latency stretch = (CAN routing delay) / (IP routing delay)
  – application-level routing may lead to high stretch
• solution
  – increase dimensions
  – heuristics
    • RTT-weighted routing
    • multiple nodes per zone (peer nodes)
    • deterministically replicate entries
Overloading Zones
• multiple nodes per zone, up to MAXPEERS
• split a zone only if it is over MAXPEERS
• each peer in a zone knows all the others in the zone, but still keeps only one neighbor per neighboring zone
• periodically: request the list of peers from a neighbor and select a new neighbor with the best RTT
• content: divided or replicated
CAN: load balancing
• two pieces
  – dealing with hot-spots
    • popular (key,value) pairs
    • nodes cache recently requested entries
    • an overloaded node replicates popular entries at its neighbors
  – uniform coordinate space partitioning
    • uniformly spreads (key,value) entries
    • uniformly spreads out the routing load
Uniform Partitioning
• added check
  – at join time, pick a zone
  – check the neighboring zones
  – pick the largest zone and split that one
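The added check can be sketched as follows (using an illustrative zone representation, an axis-aligned rectangle (x0, y0, x1, y1)):

```python
def zone_volume(zone):
    x0, y0, x1, y1 = zone
    return (x1 - x0) * (y1 - y0)

def zone_to_split(landing_zone, neighbor_zones):
    # instead of always splitting the zone the random join point landed
    # in, compare it with its neighbors and split the largest of them
    return max([landing_zone] + neighbor_zones, key=zone_volume)
```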
Uniform Partitioning
[Figure: histogram of the percentage of nodes vs. zone volume (from V/16 to 8V, where V = total volume / n), with and without the added check; 65,000 nodes, 3 dimensions]
CAN: robustness
• completely distributed
  – no single point of failure
• not exploring database recovery
• resilience of routing
  – can route around trouble
Routing resilience
[Figure: a message routed from source to destination around failed zones]
• node X::route(D)
  – if X cannot make progress toward D, check if any neighbor of X can make progress
  – if yes, forward the message to one such neighbor
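The route-around rule can be sketched like this (the topology, names, and distance function are invented for the sketch): if the usual greedy neighbor has failed, forward to any live neighbor that still makes progress toward the destination.

```python
centers = {"a": (0.25, 0.25), "b": (0.75, 0.25),
           "c": (0.25, 0.75), "d": (0.75, 0.75)}
neighbors = {"a": ["b", "c"], "b": ["a", "d"],
             "c": ["a", "d"], "d": ["b", "c"]}

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def next_hop(cur, dest, failed=()):
    # node X::route(D): consider only live neighbors that are closer
    # to dest than we are; among those, pick the closest one
    live = [n for n in neighbors[cur] if n not in failed]
    closer = [n for n in live if dist(centers[n], dest) < dist(centers[cur], dest)]
    return min(closer, key=lambda n: dist(centers[n], dest)) if closer else None
```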
CAN: node insertion
Inserting a new node affects only a single other node and its immediate neighbors, i.e. O(d) nodes
The End
● Next time
  – JXTA
?