m1 gp_oneswarm

Privacy-preserving P2P data sharing with OneSwarm

Asami-Kawahara Lab, M1: 明村 大登

Tomas Isdal, Michael Piatek, Arvind Krishnamurthy, Thomas Anderson (University of Washington)

Sigcomm ’10


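Table of Contents

• Lack of privacy in P2P
• Research goals
• The network OneSwarm provides
• Node identification and how to join the network
• Possible security attacks
• Forwarding rules and strategies
• Throughput and traffic evaluation
• Summary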


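Lack of privacy in P2P

• Client-server architectures are costly
  – P2P applications are attracting attention

• In P2P applications such as BitTorrent, the content an individual holds and the contents of their communication can be monitored
  – Users do not want the location of their content to be exposed

• Tor and Freenet (the existing approaches) sacrifice too much performance for privacy
  – Onion Routing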


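Research goals

• Lower the "cost" of privacy
  – Achieve highly efficient file sharing while still prioritizing privacy

• A system in which each user, without relying on a server, can designate trusted and untrusted links and thereby adjust their level of privacy

• Be practical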

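The network structure OneSwarm realizes

• Can also be used as a BitTorrent client (Public sharing)

• File sharing among friends (With permissions), and

• Anonymous communication through random intermediaries (Without attribution)

• The first P2P application to combine all three in a single piece of software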


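How to join the network

The Public key → {IP, Port} mapping is maintained by all users in a DHT

512-bit Public Key, 512-bit Private Key

The public key serves as the ID (persistent, unlike an ID based on an IP address)

RSA encryption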

[Figure: Bob, Alice, and the DHT. DHT entries: Bob’s Public key → Bob’s IP; Alice’s Public key → Alice’s IP.]
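The key-to-address lookup on this slide can be sketched as follows. This is only a minimal sketch: an in-memory dictionary stands in for the real DHT, the names `ToyDHT`, `publish`, and `lookup` are hypothetical, and hashing the public key with SHA-1 to form the DHT key is an assumption that the slides do not specify.

```python
import hashlib
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP, port)


class ToyDHT:
    """In-memory stand-in for a DHT that maps public keys to addresses."""

    def __init__(self) -> None:
        self._table: Dict[str, Address] = {}

    @staticmethod
    def _key_id(public_key: bytes) -> str:
        # Assumption: the DHT key is a hash of the public key.
        return hashlib.sha1(public_key).hexdigest()

    def publish(self, public_key: bytes, address: Address) -> None:
        # A later publish for the same key overwrites the old address.
        self._table[self._key_id(public_key)] = address

    def lookup(self, public_key: bytes) -> Optional[Address]:
        return self._table.get(self._key_id(public_key))


dht = ToyDHT()
dht.publish(b"alice-public-key", ("192.0.2.10", 6881))
dht.publish(b"bob-public-key", ("192.0.2.20", 6881))

# Bob moves to a new IP address; his identity (the public key) is unchanged.
dht.publish(b"bob-public-key", ("198.51.100.7", 6881))
bob_address = dht.lookup(b"bob-public-key")
```

The point of the sketch is the persistence property: peers keep finding Bob under the same key even after his address changes.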

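How to obtain Public Keys

• Exchange them in advance via e-mail, social networks, etc.
• Use a community server that collects Public keys
  – Obtain the Public keys of about 20 specific peers from the community server

[Figure: a new user tells the Community Server, "I want to join OneSwarm, so please give me Public Keys, anyone's will do." The server holds a large number of Public Keys provided by users and returns the Public Keys of about 20 users.]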

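Examples of security attacks

• Timing attack
  – Estimate from the Round Trip Time how many hops away the node supplying the content is
  – If the Search Response comes back immediately, a directly connected peer very likely holds the target content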


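Examples of security attacks (2)

• Collusion attack
  – When several adversaries are directly connected to the content owner T, they check whether a Search sent to T by one adversary was forwarded to the other adversaries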

[Figure 5 diagram: nodes A, C1, C2, ..., Ck, and T; label: "forwarded?"]

Figure 5: An attacker, A, with C1, ..., Ck colluders tests if a target T is sharing a file by sending a targeted search and observing a lack of forwarding.

... search, we record the delay of the first response, and then inspect the topology and link delays to compute the number of possible data sources associated with a given delay and vantage point. Figure 4 summarizes the results. Even with complete topology and latency information as well as 250,000 vantage points, search response latencies do not localize a single data source.

4.4 Collusion attack

Next, we analyze the case of multiple peers colluding to infer whether a directly connected user is sharing a particular file. In this case, an attacker $A$ sends a targeted search to target $T$, receives a search response, and observes whether the search was forwarded to colluders $C_1, \ldots, C_k$ who are also peers of $T$. (This attack is illustrated in Figure 5.) Recall that forwarding search messages is probabilistic. Each search message has a configurable probability, $p_f$, of being forwarded to a particular peer. As a result, a lack of forwarding does not definitively identify a data source; missing search messages may arise from random chance. But, a lack of forwarding observed by many colluding peers is highly suggestive of $T$ sourcing the object. Assuming a fixed forwarding probability of $p_f$ and $k$ colluding attackers, $\Pr[\text{Not source} \mid \text{response received}] = (1 - p_f)^k$. With just a few colluders, an attacker can gain high confidence.

This attack requires both the attacker and colluders to be directly connected to the target. When matched randomly by a public community server, the likelihood of an individual attacker being assigned a specific target for a community server with $N$ members is $n_c / N$, where $n_c$ is the number of peers returned for a single request. As a specific example, consider achieving greater than 95% confidence in the identification of a data source given $p_f = 0.5$ for peers received from a community server (see footnote 6). Achieving 95% confidence in identification requires at least six directly connected peers (an attacker and five colluders). For a community server with $N$ users, the likelihood of achieving a particular number of direct connections is given by the complement of a binomial CDF with success probability $n_c / N$.

In practice, the effectiveness of systematic monitoring depends on the resources of an attacker relative to the population of a public community server. Privacy depends on this ratio being small, and privacy-conscious users are free to decrease their forwarding probability ($p_f$), avoid public community servers completely, or request fewer peers than $n_c$. Figure 6 provides several concrete examples of the relationship between exposure, forwarding probability, topology, and the number of untrusted peers. In these examples, $p_f = 0.5$, and we vary $n_c$. Decreasing the maximum number of peers provided by a community server makes compromising its users more difficult. But, we find in our evaluation that increasing peers improves performance (Section 5).

Figure 6: The cumulative fraction of nodes whose behavior can be inferred with 95% confidence (x-axis) by a given fraction of colluding attackers (y-axis). Even assuming widespread use of public community servers, a significant fraction of colluding attackers is required to infer user behavior.

Figure 6 also shows the privacy benefits associated with a mix of trusted and untrusted peers. For this case (Untrusted, 26 peers), we considered the vulnerability of clients in our last.fm trace when adopting a policy of peering with untrusted clients only when they did not have $n_c$ or more contacts from their social network. Users with a large number of trusted friends are completely isolated from colluding attackers, shifting risk to others that are forced to more heavily rely on untrusted peers.

Footnote 6: Low values of $p_f$ for community server peers are offset by the high amount of path diversity among them.

5. EVALUATION

To evaluate OneSwarm, we measure its performance and robustness both in the wild and synthetically using trace replay. OneSwarm has been downloaded hundreds of thousands of times to date, and we use a combination of both voluntarily reported user data as well as instrumented clients to quantify OneSwarm’s real-world effectiveness at the scale of thousands of users. To examine OneSwarm’s operation at even larger scale, we replay traces of the social graph and usage behavior of more than one million last.fm users. In both cases, our main result is that OneSwarm provides high throughput and availability in spite of the overhead arising from preserving privacy. In support of this conclusion, we also measure the effectiveness of OneSwarm’s protocol mechanisms and report usage and workload statistics.

5.1 Real-world deployment

Methodology: Although many aspects of user behavior are (deliberately) obscured by designing for privacy, we draw on two sources of data to profile OneSwarm’s structure, performance, and utilization in the wild. The first of these is voluntarily reported summary statistics from more than 100,000 distinct users collected over a ten-month period since the public release of our software. These include the total number of peers, the method used for key exchange, and aggregate data transfer volumes.

Our second source of data is instrumented OneSwarm clients running on hundreds of PlanetLab [27] machines. Subscribing to several public community servers bootstraps connectivity for these clients, providing each with dozens of OneSwarm peers drawn randomly from the user population. Our PlanetLab nodes act as passive vantage points, measuring the background traffic generated by users. (This includes both data forwarding and control traffic.) On average, these nodes relay more than one terabyte of data per day.


Figure annotations: attacker; colluding attackers; content owner

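Data management and forwarding rules (1)

• No TTL (Time To Live = maximum number of times a query is forwarded)
  – The remaining TTL of a query would leak significant information
  – Instead, a Search Cancel message is propagated outward from the query originator

• Every Search is artificially delayed by 150 ms
  – Makes RTT-based timing attacks difficult
  – Buys time for the Search Cancel to arrive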
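To illustrate why the 150 ms delay lets a Search Cancel catch up with a search that has no TTL, here is a rough timing sketch. It rests on assumptions that are not in the slides: the nodes form a simple chain, every link has the same one-way latency, every node (including the originator) holds a search for the artificial delay before forwarding it, and responses and cancels are relayed without that delay. The function name `hops_until_cancelled` is hypothetical.

```python
SEARCH_DELAY_MS = 150.0  # artificial per-hop search delay from the slide


def hops_until_cancelled(source_hop: int, link_ms: float,
                         delay_ms: float = SEARCH_DELAY_MS) -> int:
    """Return the hop at which a Search Cancel catches the search frontier.

    Assumed model (illustration only): a chain of nodes with uniform link
    latency; searches wait `delay_ms` at each node before being forwarded;
    responses and cancels are forwarded immediately.
    """
    if source_hop < 1 or link_ms < 0 or delay_ms <= 0:
        raise ValueError("need source_hop >= 1, link_ms >= 0, delay_ms > 0")

    # The search reaches hop i at time i * (delay + link).
    found_at = source_hop * (delay_ms + link_ms)
    # The response travels back, then the originator emits the cancel.
    cancel_sent = found_at + source_hop * link_ms

    hop = source_hop
    while True:
        search_leaves = hop * (delay_ms + link_ms) + delay_ms
        cancel_arrives = cancel_sent + hop * link_ms
        if cancel_arrives <= search_leaves:
            return hop  # the cancel arrives before this node forwards
        hop += 1


near = hops_until_cancelled(source_hop=1, link_ms=50)
far = hops_until_cancelled(source_hop=2, link_ms=100)
```

Under this model the cancel gains `delay_ms` on the search at every hop, so the flood is always bounded even though no TTL is carried in the query. With no artificial delay the cancel would never catch up, which is why the sketch rejects `delay_ms <= 0`.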


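Data management and forwarding rules (2)

• Search queries are probabilistically not forwarded
  – By default, with probability 0.5 a query is not sent to a given one of the node's untrusted peers
  – A countermeasure against collusion attacks

• Search queries are not forwarded to peers whose bandwidth is congested
  – Speeds up transfers
  – Because the forwarding targets change in real time, collusion attacks become harder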
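The two rules above can be combined into one forwarding decision, sketched below. This is an illustration under stated assumptions, not OneSwarm's actual code: the `Peer` type and `choose_forward_targets` are hypothetical names, and treating trusted peers as always receiving the query is an assumption (the slide only states the 0.5 default for untrusted peers).

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    name: str
    trusted: bool
    congested: bool = False


def choose_forward_targets(peers: List[Peer], p_forward: float,
                           rng: random.Random) -> List[Peer]:
    """Pick the peers that receive a Search query.

    Assumptions for illustration: congested peers are skipped, trusted
    peers always receive the query, and each untrusted peer receives it
    with probability `p_forward`.
    """
    targets = []
    for peer in peers:
        if peer.congested:
            continue  # rule 2: never forward to a congested peer
        if peer.trusted or rng.random() < p_forward:
            targets.append(peer)  # rule 1: probabilistic for untrusted peers
    return targets


peers = [
    Peer("friend", trusted=True),
    Peer("busy-friend", trusted=True, congested=True),
    Peer("stranger-1", trusted=False),
    Peer("stranger-2", trusted=False),
]
targets = choose_forward_targets(peers, p_forward=0.5, rng=random.Random(0))
names = [p.name for p in targets]
```

Because an untrusted peer may simply not have been chosen, a colluder that sees no forwarded query cannot conclude with certainty that the node is the content source.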


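Throughput and traffic evaluation

• Average of 457 KBps (Tor: 20 KBps)
  – Flooding avoids congestion and uses multiple paths
  – Tor cannot avoid congestion and uses a single path

• The average bandwidth limit is 49%
  – Not a high value, but Search queries are not delivered during congestion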


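Summary

• Achieves speeds close to BitTorrent while providing the same level of security as Tor

• Personal impressions
  – A combination of original ideas: no TTL, a deliberate choice of flooding, staggered timing, and so on
  – It is impressive that the authors did not stop at designing and simulating an algorithm but took a pragmatic approach all the way to an implementation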


(Appendix) Community server using Consistent Hashing

[Figure legend] 1 to 4: users' Public Keys; A to C: IP addresses of peers requesting Public Keys
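One way to picture this slide is a hash ring that holds the users' Public Keys, with each requesting IP address mapped to a fixed position on the ring. The sketch below is an assumption about how such a server could work, not a description of the real community server: `ToyCommunityServer`, `keys_for`, and the use of SHA-256 are all hypothetical choices.

```python
import bisect
import hashlib
from typing import List


def ring_position(value: str) -> int:
    """Map a string to a position on the hash ring (assumed: SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ToyCommunityServer:
    """Hands each requester the keys that follow its IP on the ring."""

    def __init__(self, public_keys: List[str]) -> None:
        self._ring = sorted((ring_position(k), k) for k in public_keys)
        self._positions = [pos for pos, _ in self._ring]

    def keys_for(self, requester_ip: str, count: int) -> List[str]:
        # Start just after the requester's position and walk clockwise.
        start = bisect.bisect_right(self._positions, ring_position(requester_ip))
        count = min(count, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]


server = ToyCommunityServer([f"pubkey-{i}" for i in range(1000)])
first = server.keys_for("203.0.113.5", count=20)
again = server.keys_for("203.0.113.5", count=20)
```

In this sketch a given IP address always receives the same subset, so repeating the request reveals no additional keys.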

(Appendix) Resistance to collusion attacks


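Figure annotations: attacker; colluding attackers

Attacker A and some number of the colluders C must be directly connected to T

$\Pr[\text{Not source} \mid \text{response received}] = (1 - p_f)^k$

$p_f$: probability that T sends a Search to a given peer
$k$: number of colluders with a direct link to T

Probabilistically difficult

Search query behavior changes with congestion and strategy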

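(Appendix) Probability of being defeated by a collusion attack

• $\Pr[\text{Not source} \mid \text{response received}] = (1 - p_f)^k$

• $p_f$: probability that T sends a Search to a given peer
• $k$: number of colluders with a direct link to T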


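For example, if five colluders are directly connected to the victim T and a Search query goes unforwarded with probability $p_f = 0.5$, then

$(1 - 0.5)^5 = 0.03125$

so the probability that T is not the content holder is 3.1%. That is far too low: T is completely exposed.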
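The arithmetic in this example can be checked with a few lines of Python. This is a sketch of the formula $(1 - p_f)^k$ only; the helper names `prob_not_source` and `colluders_needed` are hypothetical.

```python
def prob_not_source(p_forward: float, colluders: int) -> float:
    """Probability that T is not the source when no colluder sees a forward."""
    return (1.0 - p_forward) ** colluders


def colluders_needed(p_forward: float, confidence: float) -> int:
    """Smallest number of colluders giving at least `confidence` that T is the source."""
    if not 0.0 < p_forward <= 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < p_forward <= 1 and 0 < confidence < 1")
    k = 1
    while 1.0 - prob_not_source(p_forward, k) < confidence:
        k += 1
    return k


five = prob_not_source(0.5, 5)          # the slide's example
needed = colluders_needed(0.5, 0.95)    # colluders for 95% confidence
```

With $p_f = 0.5$, four colluders leave a 6.25% chance that T is not the source, so five is the smallest number that clears the 95% threshold quoted from the paper (an attacker plus five colluders).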

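(Appendix) Probability of being defeated by a collusion attack (continued)

• Adversaries become connected to T through the community server
  – Assuming a server with 1000 registered users hands out 26 peers, the probability that T is directly connected to one given adversary is 26/1000
  – Even if 30 adversaries try, the probability that T ends up directly connected to 5 of them is below 1%

• The community server is built using consistent hashing
  – It is difficult for an adversary with a single IP address to obtain the Public keys of all 1000 of the 1000 users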
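The "below 1%" claim can be sanity-checked with a binomial tail, under the assumption (made here for illustration) that each of the 30 adversaries is independently connected to T with probability 26/1000. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i)
               for i in range(successes, trials + 1))


p_link = 26 / 1000  # chance that one adversary is handed T as a peer
p_five = prob_at_least(5, trials=30, p=p_link)
```

Under this independence assumption `p_five` comes out at roughly one in a thousand, which is consistent with the slide's bound. This is the same "complement of a binomial CDF" that the paper excerpt describes with success probability $n_c / N$.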
