m1 gp_oneswarm

Privacy-preserving P2P data sharing with OneSwarm. Presenter: Daito Akimura (明村 大登), M1, Asami-Kawahara Lab. Paper authors: Tomas Isdal, Michael Piatek, Arvind Krishnamurthy, Thomas Anderson (University of Washington), SIGCOMM '10.

Upload: daito-akimura

Post on 14-Jul-2015


TRANSCRIPT

Page 1: M1 gp_OneSwarm

Privacy-preserving P2P data sharing with OneSwarm

Asami-Kawahara Lab, M1, Daito Akimura (明村 大登)

Tomas Isdal, Michael Piatek, Arvind Krishnamurthy, Thomas Anderson
University of Washington

Sigcomm ’10

1

Page 2: M1 gp_OneSwarm

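Contents

• Lack of privacy in P2P
• Research objective
• The network OneSwarm provides
• Node identification and how to join the network
• Possible security attacks
• Forwarding rules and strategy
• Speed and traffic evaluation
• Summary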

2

Page 3: M1 gp_OneSwarm

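Lack of privacy in P2P

• Client-server architectures are costly
– P2P applications are attracting attention

• In P2P applications such as BitTorrent, the content an individual holds and the traffic they exchange can be monitored
– Users do not want the location of their content to be revealed

• Tor and Freenet (the existing approaches) sacrifice too much performance for privacy
– Onion Routing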

3

Page 4: M1 gp_OneSwarm

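Research objective

• Lower the "cost" of privacy
– Achieve highly efficient file sharing while still putting privacy first

• A system in which each user, without relying on a server, marks links as trusted or untrusted to adjust their own level of privacy

• Be practical

4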

Page 5: M1 gp_OneSwarm

5

Page 6: M1 gp_OneSwarm

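The network structure OneSwarm realizes

• Can also be used as a BitTorrent client (Public sharing)

• File sharing among friends (With permissions), and

• Anonymous communication through random intermediaries (Without attribution)

• The first P2P application to combine these three modes in a single piece of software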

6

Page 7: M1 gp_OneSwarm

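How to join the network

• The mapping Public key → {IP, Port} is maintained by all users in a DHT

• Each user holds a 512-bit public key and a 512-bit private key (RSA)

• The public key serves as the ID (persistent, unlike an ID based on an IP address)

7

[Figure: Alice and Bob each publish their entry in the DHT. Bob's public key maps to Bob's IP; Alice's public key maps to Alice's IP.]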
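The key-to-address lookup described on this slide can be sketched as follows. This is a minimal in-memory stand-in, not OneSwarm's actual DHT: the class name `ToyDHT`, the use of SHA-1 to derive the lookup key, and the placeholder key bytes are all assumptions made for illustration.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for the DHT: hash(public key) -> (ip, port)."""

    def __init__(self):
        self._table = {}

    @staticmethod
    def _key_id(public_key: bytes) -> str:
        # Assumption: the lookup key is a SHA-1 digest of the public key.
        return hashlib.sha1(public_key).hexdigest()

    def publish(self, public_key: bytes, ip: str, port: int) -> None:
        # Re-publishing overwrites the old address; the identity stays the same.
        self._table[self._key_id(public_key)] = (ip, port)

    def lookup(self, public_key: bytes):
        # Returns None when no entry has been published for this key.
        return self._table.get(self._key_id(public_key))


dht = ToyDHT()
alice_key = b"alice-public-key-bytes"  # placeholder, not a real RSA key
dht.publish(alice_key, "192.0.2.10", 6881)
dht.publish(alice_key, "198.51.100.7", 6881)  # Alice's IP changed
```

The point of the sketch is the persistence property: Alice's address changes, but peers who hold her public key still find her under the same identity.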

Page 8: M1 gp_OneSwarm

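How to obtain public keys

• Exchange them in advance via e-mail, social networks, and so on
• Use a community server that collects public keys
– Obtain the public keys of a specific set of about 20 peers from the community server

8

[Figure: a new user asks the Community Server, "I want to join OneSwarm, so please give me public keys; anyone's will do." The server holds a large number of public keys contributed by users and returns the public keys of about 20 of them.]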

Page 9: M1 gp_OneSwarm
Page 10: M1 gp_OneSwarm

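Example security attacks

• Timing attack
– From the round-trip time, estimate how many hops away the node supplying the content is

– If a Search Response comes back immediately, a directly connected peer most likely holds the requested content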
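As a rough illustration of the timing attack, an observer could convert a measured response time into a hop-count guess. The function below is a hypothetical sketch that assumes every hop adds the same round-trip latency, which real networks do not guarantee.

```python
def estimate_hops(rtt_ms: float, per_hop_rtt_ms: float) -> int:
    """Guess how many hops away a responder is from the observed RTT.

    Assumption: each hop contributes the same round-trip latency.
    """
    if per_hop_rtt_ms <= 0:
        raise ValueError("per_hop_rtt_ms must be positive")
    # A response can never come from fewer than one hop away.
    return max(1, round(rtt_ms / per_hop_rtt_ms))
```

An estimate of one hop is the dangerous case: it suggests the directly connected peer itself holds the content.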

10

Page 11: M1 gp_OneSwarm

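Example security attacks (2)

• Collusion attack
– When several adversaries are directly connected to the content owner T, they check whether a Search sent to T by one adversary is forwarded to the other adversaries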

[Figure 5: An attacker, A, with C1, ..., Ck colluders tests if a target T is sharing a file by sending a targeted search and observing a lack of forwarding.]

search, we record the delay of the first response, and then inspect the topology and link delays to compute the number of possible data sources associated with a given delay and vantage point. Figure 4 summarizes the results. Even with complete topology and latency information as well as 250,000 vantage points, search response latencies do not localize a single data source.

4.4 Collusion attack

Next, we analyze the case of multiple peers colluding to infer whether a directly connected user is sharing a particular file. In this case, an attacker A sends a targeted search to target T, receives a search response, and observes whether the search was forwarded to colluders $C_1, \ldots, C_k$ who are also peers of T. (This attack is illustrated in Figure 5.) Recall that forwarding search messages is probabilistic. Each search message has a configurable probability, $p_f$, of being forwarded to a particular peer. As a result, a lack of forwarding does not definitively identify a data source; missing search messages may arise from random chance. But, a lack of forwarding observed by many colluding peers is highly suggestive of T sourcing the object. Assuming a fixed forwarding probability of $p_f$ and $k$ colluding attackers, $\Pr[\text{Not source} \mid \text{response received}] = (1 - p_f)^k$. With just a few colluders, an attacker can gain high confidence.

This attack requires both the attacker and colluders to be directly connected to the target. When matched randomly by a public community server, the likelihood of an individual attacker being assigned a specific target for a community server with $N$ members is $n_c/N$, where $n_c$ is the number of peers returned for a single request. As a specific example, consider achieving greater than 95% confidence in the identification of a data source given $p_f = 0.5$ for peers received from a community server (footnote 6). Achieving 95% confidence in identification requires at least six directly connected peers (an attacker and five colluders). For a community server with $N$ users, the likelihood of achieving a particular number of direct connections is given by the complement of a binomial CDF with success probability $n_c/N$.

In practice, the effectiveness of systematic monitoring depends on the resources of an attacker relative to the population of a public community server. Privacy depends on this ratio being small, and privacy-conscious users are free to decrease their forwarding probability ($p_f$), avoid public community servers completely, or request fewer peers than $n_c$. Figure 6 provides several concrete examples of the relationship between exposure, forwarding probability, topology, and the number of untrusted peers. In these examples, $p_f = 0.5$, and we vary $n_c$. Decreasing the maximum number of peers provided by a community server makes compromising its users more difficult. But, we find in our evaluation that increasing peers improves performance (Section 5).

Footnote 6: Low values of $p_f$ for community server peers are offset by the high amount of path diversity among them.

[Figure 6: The cumulative fraction of nodes whose behavior can be inferred with 95% confidence (x-axis) by a given fraction of colluding attackers (y-axis). Even assuming widespread use of public community servers, a significant fraction of colluding attackers is required to infer user behavior.]

Figure 6 also shows the privacy benefits associated with a mix of trusted and untrusted peers. For this case (Untrusted, 26 peers), we considered the vulnerability of clients in our last.fm trace when adopting a policy of peering with untrusted clients only when they did not have $n_c$ or more contacts from their social network. Users with a large number of trusted friends are completely isolated from colluding attackers, shifting risk to others that are forced to more heavily rely on untrusted peers.

5. EVALUATION

To evaluate OneSwarm, we measure its performance and robustness both in the wild and synthetically using trace replay. OneSwarm has been downloaded hundreds of thousands of times to date, and we use a combination of both voluntarily reported user data as well as instrumented clients to quantify OneSwarm's real-world effectiveness at the scale of thousands of users. To examine OneSwarm's operation at even larger scale, we replay traces of the social graph and usage behavior of more than one million last.fm users. In both cases, our main result is that OneSwarm provides high throughput and availability in spite of the overhead arising from preserving privacy. In support of this conclusion, we also measure the effectiveness of OneSwarm's protocol mechanisms and report usage and workload statistics.

5.1 Real-world deployment

Methodology: Although many aspects of user behavior are (deliberately) obscured by designing for privacy, we draw on two sources of data to profile OneSwarm's structure, performance, and utilization in the wild. The first of these is voluntarily reported summary statistics from more than 100,000 distinct users collected over a ten month period since the public release of our software. These include the total number of peers, the method used for key exchange, and aggregate data transfer volumes.

Our second source of data is instrumented OneSwarm clients running on hundreds of PlanetLab [27] machines. Subscribing to several public community servers bootstraps connectivity for these clients, providing each with dozens of OneSwarm peers drawn randomly from the user population. Our PlanetLab nodes act as passive vantage points, measuring the background traffic generated by users. (This includes both data forwarding and control traffic.) On average, these nodes relay more than one terabyte of data per day.

Figure labels: attacker (A); colluding attackers (C1, ..., Ck); content owner (T)

Page 12: M1 gp_OneSwarm

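Data management and forwarding rules (1)

• No TTL (Time To Live = the maximum number of times a query may be forwarded)
– The remaining TTL of a query would leak significant information
– Instead, a Search Cancel message spreads outward from the query originator

• Every Search is artificially delayed by 150 ms
– Makes RTT-based timing attacks difficult
– Buys time for a Search Cancel to arrive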
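The interaction between the 150 ms delay and Search Cancel can be sketched as below. This is a simplified single-node model, not the OneSwarm implementation: the `Node` class, its method names, and the choice to process cancels immediately are assumptions for illustration.

```python
import heapq

SEARCH_DELAY_MS = 150  # artificial delay applied to every Search, per the slide


class Node:
    def __init__(self):
        self._pending = []        # min-heap of (forward_time_ms, search_id)
        self._cancelled = set()   # search ids for which a cancel arrived

    def on_search(self, search_id: str, now_ms: int) -> None:
        # Hold the search for 150 ms instead of forwarding it right away.
        heapq.heappush(self._pending, (now_ms + SEARCH_DELAY_MS, search_id))

    def on_cancel(self, search_id: str) -> None:
        # Assumption: cancels are handled without delay, so they can
        # catch up with a search that is still being held.
        self._cancelled.add(search_id)

    def due(self, now_ms: int) -> list:
        """Return searches whose delay has elapsed and that were not cancelled."""
        ready = []
        while self._pending and self._pending[0][0] <= now_ms:
            _, search_id = heapq.heappop(self._pending)
            if search_id not in self._cancelled:
                ready.append(search_id)
        return ready
```

Because a held search can be dropped before it is ever forwarded, flooding stops without any hop counter travelling with the query.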

12

Page 13: M1 gp_OneSwarm

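Data management and forwarding rules (2)

• Search queries are probabilistically withheld
– By default, a query is not sent to a given untrusted peer with probability 0.5

– A countermeasure against collusion attacks

• Search queries are not forwarded to peers whose bandwidth is congested
– Speeds up transfers
– The set of forwarding targets changes in real time, which makes collusion attacks difficult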
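The two forwarding rules above can be combined in a short sketch. The tuple layout for peers and the decision to always forward to trusted peers are assumptions; the slide only states the 0.5 default for untrusted peers and the congestion rule.

```python
import random


def choose_forward_targets(peers, p_forward=0.5, rng=random):
    """Pick which peers receive a Search query.

    peers: iterable of (name, trusted, congested) tuples (hypothetical layout).
    p_forward: probability of forwarding to each untrusted peer.
    """
    targets = []
    for name, trusted, congested in peers:
        if congested:
            continue  # never forward to a congested peer
        if not trusted and rng.random() >= p_forward:
            continue  # probabilistically withhold from untrusted peers
        targets.append(name)
    return targets


peers = [
    ("friend", True, False),
    ("stranger-1", False, False),
    ("stranger-2", False, True),   # congested, always skipped
]
```

Since both the coin flip and the congestion state vary from query to query, an observer cannot treat a single missing forward as proof of anything.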

13

Page 14: M1 gp_OneSwarm

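Speed and traffic evaluation

• Average 457 KBps (Tor: 20 KBps)
– Flooding enables congestion avoidance and the use of multiple paths
– Tor cannot avoid congestion and uses a single path

• The average bandwidth limit is 49%
– Not a high value, but Search queries are not delivered during congestion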

14

Page 15: M1 gp_OneSwarm

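Summary

• Achieves the same level of security as Tor while delivering speeds close to BitTorrent

• Personal impressions
– An inventive combination of ideas: no TTL, a deliberate choice of flooding, shifted timing, and so on

– It is impressive that the authors did not stop at designing and simulating an algorithm but pragmatically carried it through to a working implementation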

15

Page 16: M1 gp_OneSwarm

(Appendix) Community server using Consistent Hashing

16

1 to 4: users' public keys
A to C: IP addresses of peers requesting public keys
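One way to read the figure labels is that public keys and requester IPs are placed on the same hash ring, and a requester receives the keys that follow its own position. The sketch below works under that assumption; the function names, SHA-1 as the ring hash, and the "next keys clockwise" rule are illustrative choices, not details taken from the slides.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Assumption: SHA-1 places both keys and requester IPs on one ring.
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)


def keys_for_requester(requester_ip: str, public_keys: list, count: int) -> list:
    """Return the `count` public keys that follow the requester on the ring."""
    ring = sorted((ring_position(key), key) for key in public_keys)
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, ring_position(requester_ip))
    count = min(count, len(ring))
    # Walk clockwise from the requester's position, wrapping around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


keys = ["key1", "key2", "key3", "key4"]
subset = keys_for_requester("192.0.2.1", keys, 2)
```

Because the result depends only on the requester's IP and the registered keys, repeating the request from the same IP returns the same subset rather than a fresh sample.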

Page 17: M1 gp_OneSwarm

(Appendix) Resistance to collusion attacks

17

[Figure 5: An attacker, A, with C1, ..., Ck colluders tests if a target T is sharing a file by sending a targeted search and observing a lack of forwarding.]


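Figure labels: attacker (A); colluding attackers (C1, ..., Ck)

• The attacker A and several of the colluders C must be directly connected to T

• $\Pr[\text{Not source} \mid \text{response received}] = (1 - p_f)^k$

• $p_f$: the probability that T sends a Search to a given peer
• $k$: the number of colluders with a direct link to T

• Probabilistically difficult

• Search query behavior changes with congestion and forwarding strategy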

Page 18: M1 gp_OneSwarm

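(Appendix) Probability of succumbing to a collusion attack

• $\Pr[\text{Not source} \mid \text{response received}] = (1 - p_f)^k$

• $p_f$: the probability that T sends a Search to a given peer
• $k$: the number of colluders with a direct link to T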

18

[Figure 5: An attacker, A, with C1, ..., Ck colluders tests if a target T is sharing a file by sending a targeted search and observing a lack of forwarding.]


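For example, suppose five colluders are directly connected to the victim T, and T forwards a Search query to each peer with probability $p_f = 0.5$. Then:

$(1 - 0.5)^5 = 0.03125$

The probability that T is not the content holder is only 3.1%. That is far too revealing: T is all but exposed.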
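The arithmetic on this slide can be checked with a few lines of code. The helper names are hypothetical; the formula is the one quoted above.

```python
def prob_not_source(p_forward: float, colluders: int) -> float:
    """Pr[Not source | response received] = (1 - p_f)^k."""
    return (1.0 - p_forward) ** colluders


def colluders_needed(p_forward: float, confidence: float) -> int:
    """Smallest k for which the attacker's confidence exceeds `confidence`."""
    if not (0.0 < p_forward <= 1.0 and 0.0 <= confidence < 1.0):
        raise ValueError("need 0 < p_forward <= 1 and 0 <= confidence < 1")
    k = 1
    # Confidence that T is the source is 1 - (1 - p_f)^k.
    while 1.0 - prob_not_source(p_forward, k) <= confidence:
        k += 1
    return k
```

With the default $p_f = 0.5$, five colluders are the smallest group that pushes the attacker's confidence above 95%, which matches the "attacker and five colluders" figure in the paper excerpt.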

Page 19: M1 gp_OneSwarm

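(Appendix) Probability of succumbing to a collusion attack

• Adversaries become connected to T through the community server
– If a server with 1000 registered users returns 26 peers, the probability that T is directly connected to one particular adversary is 26/1000

– Even if 30 adversaries try, the probability that T ends up directly connected to five of them is below 1%

• The community server is built on consistent hashing
– It is difficult for an adversary with a single IP to obtain all 1000 of the 1000 public keys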
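The "below 1%" claim can be checked with a binomial tail, treating each of the 30 adversaries as an independent trial that connects to T with probability 26/1000. Independence is an assumption of this sketch, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(n_attackers: int, p_connect: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n_attackers, p_connect)."""
    return sum(
        comb(n_attackers, i) * p_connect**i * (1 - p_connect) ** (n_attackers - i)
        for i in range(k, n_attackers + 1)
    )


# 30 adversaries, each directly connected to T with probability 26/1000
p_five_or_more = prob_at_least(30, 26 / 1000, 5)
```

Under these assumptions the tail probability comes out well under 1%, consistent with the slide.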

19