11 class 2012 socialnetworks
-
7/31/2019 11 CLASS 2012 SocialNetworks
1/40
Systems challenges in online social media
Alan Mislove
College of Computer and Information Science, Northeastern University
February 22nd, 2012, Networks Class
-
22.02.12 Networks Class Alan Mislove
Social networking sites (Web 2.0)
Popular way to connect and share content
Photos, videos, blogs, profiles, news, status...
MySpace (275 M), Facebook (500 M)
Growing exponentially
Incredible amounts of content being shared
Facebook (7.5 B photos/month)
YouTube (48 hours of video/min)
[Figure: Facebook users (millions, 0 to 500) by year, 2004 to 2010]
-
What's new in Web 2.0?
[Figure panels: Web 2.0, Web 1.0]
-
My group's research
Thesis: OSNs (Web 2.0) fundamentally different from Web 1.0
Introducing new and unforeseen challenges
Need new approaches to address these challenges
Leveraging social networks results in better systems
Due to the increasing integration of systems and social networks
But, must be backed by measurement and analysis
My group's research is motivated by effects of this change
Will give three examples today
-
Alan Mislove12.06.10 University of Massachusetts, Boston
Effect 1:
Changing patterns of content creation + exchange
submitted to usenix12
-
Pre-2005 Web (a.k.a. Web 1.0)
[Figure labels: Telecom Italia, Telvia, Fastweb, NGI]
-
Difference 1: Content popularity
[Figure: fraction of requests vs. fraction of documents (ranked from most to least popular), both axes 0 to 1. Curves: classic web [1], facebook photos [2], and an even-distribution reference line]
[1] Breslau et al., INFOCOM, 1999, [2] Mislove et al., WSDM, 2010
-
Implication: Caches less effective
Popularity distribution much more even
Objects have more narrow scope
In classic Web:
Caching top 10% serves between 55% [1] and 95% [2] of requests
Success of CDNs, web caches, ...
In online social media:
Caching top 10% would only serve 27% [3] of requests
[1] Breslau et al., INFOCOM, 1999, [2] Arlitt et al. IEEE Network, 2000, [3] Mislove et al., WSDM, 2010
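To make the caching implication concrete, here is a minimal sketch (not from the talk) of how the share of requests served by a "top 10%" cache depends on how skewed the popularity distribution is. The Zipf-style popularity model, the exponents, and the function names are illustrative assumptions; the 55%, 95%, and 27% figures above come from the cited measurements, not from this code.

```python
def top_fraction_hit_rate(request_counts, cache_fraction=0.10):
    """Share of requests served if the most popular objects are cached.

    request_counts: one request count (or weight) per object.
    cache_fraction: fraction of objects kept in the cache.
    """
    ranked = sorted(request_counts, reverse=True)
    n_cached = max(1, round(len(ranked) * cache_fraction))
    return sum(ranked[:n_cached]) / sum(ranked)


def zipf_counts(n_objects, alpha):
    """Synthetic popularity (assumption): rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]


# A strongly skewed workload versus a much more even one (both synthetic).
skewed = top_fraction_hit_rate(zipf_counts(10_000, alpha=1.0))
flatter = top_fraction_hit_rate(zipf_counts(10_000, alpha=0.3))
print(f"skewed: {skewed:.2f}, flatter: {flatter:.2f}")
```

The flatter the distribution, the closer the hit rate falls toward the cached fraction itself, which is the effect the slide attributes to social media content.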
-
Difference 2: Content generation
[Figure labels: Telecom Italia, Telvia, Fastweb, NGI]
-
Implication: Workload change
Significant content creation at network's edge
Ease of digital content creation (photos, video)
Ubiquity of Internet access (cell phone, iPad)
In classic Web:
Workload was center-to-edge
Caching, CDNs take load off origin server
In online social media:
Workload is edge-to-edge
Significant geographic locality
-
How is OSN content being delivered?
Web 1.0 centralized architectures dominate
Akamai, Limelight, Clearway, ...
Facebook serves much of its own content
Mismatch between infrastructure, workload
Workload is naturally decentralized
Every Facebook upload goes via CA
Can we build a workload-matching distribution system?
Avoid unnecessary, expensive transfers
-
WebCloud: Decentralized delivery
First step towards decentralized Web content delivery
Challenge: Web doesn't support decentralization
Browsers distinct from Web servers
Use novel techniques to allow browser to serve content
No client-side changes
Users help serve content they upload
Result: Scalable, workload-matching architecture
Next: Brief technical discussion
-
WebCloud design overview
Goal: Move towards more decentralized content exchange
Keep content exchange at the edge
Want to make it work with today's sites, browsers
Reason: Users won't install anything
Can't require users to do anything different
Idea: Introduce a middlebox to allow browsers to communicate
To build WebCloud, need to:
Make client-side changes
Deploy middleboxes
-
Client-side changes
Want to turn web browser into web server
Implement WebCloud in JavaScript
Add it to the site's pages
Use LocalStorage to store browsed content
Persistent cache, up to 5MB/site
Easily programmatically accessed
Treated like LRU cache
Use WebSockets/XHR to communicate with middlebox
Allows bi-directional communication
Online client is always connected to middlebox
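As a rough illustration of "treated like LRU cache", the sketch below models a size-bounded store that evicts the least recently used objects once a byte budget is exceeded. It is written in Python for illustration only; it is not WebCloud's JavaScript, and the class name, method names, and the exact 5 MB budget are assumptions.

```python
from collections import OrderedDict


class LruByteCache:
    """LRU cache bounded by total payload size (a stand-in for a per-site quota)."""

    def __init__(self, capacity_bytes=5 * 1024 * 1024):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if len(value) > self.capacity_bytes:
            return False  # object can never fit, so it is not cached
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        # Evict least recently used entries until the new object fits.
        while self.used_bytes + len(value) > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._items[key] = value
        self.used_bytes += len(value)
        return True

    def keys(self):
        """Cached keys, least recently used first."""
        return list(self._items)
```

A client built this way could report `keys()` to the middlebox so that it knows which content the browser can currently serve.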
-
Middleboxes
Add redirector proxies in each ISP
Like Akamai proxy, but doesn't store any content
Maintains open connections to online web visitors
Run by OSN provider
Clients connect to proxy
Inform proxy of locally stored content
Clients request content from proxy
Proxy checks for other local clients
Found: fetches content, forwards to requestor
Not found: fetches content from origin site
[Figure label: Redirector proxy]
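The request flow on this slide can be sketched as follows. This is a simplified model under stated assumptions, not the WebCloud implementation: the class, the two fetch callbacks, and the "peer"/"origin" labels are hypothetical names introduced for illustration.

```python
class RedirectorProxy:
    """Tracks which connected clients hold which content; stores no content itself."""

    def __init__(self, fetch_from_client, fetch_from_origin):
        # Hypothetical callbacks standing in for the real network paths:
        #   fetch_from_client(client_id, content_id) -> bytes or None
        #   fetch_from_origin(content_id) -> bytes
        self._fetch_from_client = fetch_from_client
        self._fetch_from_origin = fetch_from_origin
        self._holders = {}  # content_id -> set of client_ids

    def announce(self, client_id, content_ids):
        """A client informs the proxy of its locally stored content."""
        for content_id in content_ids:
            self._holders.setdefault(content_id, set()).add(client_id)

    def disconnect(self, client_id):
        """Forget a client once its connection closes."""
        for holders in self._holders.values():
            holders.discard(client_id)

    def request(self, requester_id, content_id):
        """Serve from another local client if possible, else from the origin."""
        candidates = self._holders.get(content_id, set()) - {requester_id}
        for holder in sorted(candidates):
            data = self._fetch_from_client(holder, content_id)
            if data is not None:
                return data, "peer"
        return self._fetch_from_origin(content_id), "origin"
```

Because the proxy only keeps an index of who holds what, the content itself always travels from a client or from the origin site.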
-
Putting it all together
Overall, WebCloud serves as a distributed cache
Use content-hashes to ensure integrity
Privacy implications
k-anonymity for viewers
[Figure labels: Redirector proxy, Client A, Client B, Internet]
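A minimal sketch of the content-hash idea: if content is named by a hash of its bytes, a requester can check that what another client served is what the site intended. The talk does not say which hash function is used; SHA-256 and the function names here are assumptions.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Name content by the hash of its bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify(expected_id: str, data: bytes) -> bool:
    """Accept peer-served data only if it hashes to the expected name."""
    return content_id(data) == expected_id


photo = b"example photo bytes"
expected = content_id(photo)
print(verify(expected, photo))             # True: untouched content
print(verify(expected, photo + b"extra"))  # False: modified content
```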
-
WebCloud applied to real-world site
Top-50 U.S. web site
Simulation based on Akamai logs
Would dramatically reduce bandwidth required
Savings for both site and ISP
[Figure: bandwidth (MB/s, 20 to 120) over time (one week, Friday to Friday)]
76% reduction in 95th percentile bandwidth
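For readers unfamiliar with the metric, the sketch below shows one way a "reduction in 95th percentile bandwidth" can be computed from two bandwidth time series. The nearest-rank percentile definition and the sample values are assumptions; the 76% figure comes from the talk's simulation, not from this code.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile (one common definition, assumed here)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def p95_reduction(before, after):
    """Fractional drop in 95th percentile bandwidth."""
    return 1.0 - percentile_95(after) / percentile_95(before)


# Synthetic bandwidth samples in MB/s (not data from the talk).
before = [float(x) for x in range(1, 101)]
after = [x * 0.25 for x in before]
print(f"{p95_reduction(before, after):.0%}")
```

Billing and provisioning are often based on near-peak usage rather than the average, which is presumably why the slide reports the 95th percentile.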
-
Summary
Beginnings of shift in patterns of content creation + exchange
Patterns changing from center-to-edge to edge-to-edge
Less biased popularity distribution
But, still using centralized delivery architectures
WebCloud: Step towards decentralized Web content delivery
Users help serve content they create
Implemented using existing browser features; no client changes
Evaluation demonstrated practicality, efficacy
-
Effect 2:
Changing meaning of accounts/identity
nsdi11
-
Sybils
Free accounts with privileges leading to Sybil attacks [IPTPS 2002]
Single person creates many accounts
Why?
Natural: Gain extra privileges
Incentives set up to encourage this
Examples in the wild
Maze [ICDCS 2007]
Digg [NSDI 2009]
TripAdvisor [NYT, 10/2011]
Facebook, Gmail [me, others]
-
Example: Online marketplaces
Among most successful Web sites
eBay alone: $62 B in 2010
But, known to suffer from fraud
[Figure labels: Auctions, Marketplace]
-
Identities and reputations
[Figure: feedback profile, with transaction amounts from $1 to $300]
Significant monetary losses
Recent arrest of user who stole $717 k from 5,000 users
Used >250 accounts
-
Bazaar: A new approach
New approach to strengthening user reputations
Leverages an (existing) risk network
Focuses on protecting buyers from malicious sellers
Works in conjunction with existing marketplace
Assumes same feedback system as today
No additional monetary cost
No strong identities
Insight: Successful transactions represent shared risk
Buyer and seller more likely to enter into future transactions
-
Bazaar's risk network
Successful transaction links two identities
Weighted by amount of transaction
Risk network automatically generated
Users need not even know about it
[Figure: example risk network with links weighted from $1 to $50]
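A rough sketch of how such a risk network could be derived from a transaction log. The record layout, the function name, and the choice to sum the amounts of repeated transactions between the same pair are assumptions made for illustration; the talk only states that successful transactions link two identities, weighted by amount.

```python
from collections import defaultdict


def build_risk_network(transactions):
    """Build undirected weighted links from successful transactions.

    transactions: iterable of (buyer, seller, amount, successful) tuples
    (a hypothetical record layout).
    Returns {(id_a, id_b): total_amount} with each pair in sorted order.
    """
    links = defaultdict(float)
    for buyer, seller, amount, successful in transactions:
        if successful:
            # Assumption: repeated transactions between a pair add up.
            links[tuple(sorted((buyer, seller)))] += amount
    return dict(links)


log = [
    ("ann", "bob", 5.0, True),
    ("bob", "ann", 20.0, True),   # same pair, roles swapped
    ("ann", "cy", 7.0, False),    # unsuccessful: creates no link
]
print(build_risk_network(log))
```

No user action is needed to produce the links, which matches the point that the network is generated automatically.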
-
Estimating risk
Bazaar calculates max-flow between buyer and seller
If max-flow lower than potential transaction, flag as fraudulent
[Figure: example risk network between Buyer and Seller with link weights $5, $300, $4000, $50, $200, and $100; max-flow: $5]
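The check described here can be sketched with a textbook max-flow routine. The code below uses Edmonds-Karp on an undirected weighted graph; the talk does not say which max-flow algorithm Bazaar uses, and the graph topology in the example is hypothetical (only the link weights and the resulting max-flow of $5 are borrowed from the slide's figure).

```python
from collections import defaultdict, deque


def max_flow(links, source, sink):
    """Edmonds-Karp max-flow over undirected links {(a, b): weight}."""
    if source == sink:
        return float("inf")
    residual = defaultdict(lambda: defaultdict(float))
    for (a, b), weight in links.items():
        residual[a][b] += weight
        residual[b][a] += weight  # undirected: capacity in both directions
    total = 0.0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            node = queue.popleft()
            for neighbor, capacity in residual[node].items():
                if capacity > 0 and neighbor not in parent:
                    parent[neighbor] = node
                    queue.append(neighbor)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        node = sink
        while parent[node] is not None:
            bottleneck = min(bottleneck, residual[parent[node]][node])
            node = parent[node]
        node = sink
        while parent[node] is not None:
            residual[parent[node]][node] -= bottleneck
            residual[node][parent[node]] += bottleneck
            node = parent[node]
        total += bottleneck


def is_flagged(links, buyer, seller, amount):
    """Flag the transaction if the max-flow is below its amount."""
    return max_flow(links, buyer, seller) < amount


# Hypothetical topology: the buyer's only link is worth $5.
links = {
    ("buyer", "u1"): 5, ("u1", "u2"): 300, ("u1", "u3"): 50,
    ("u2", "seller"): 4000, ("u3", "u4"): 200, ("u4", "seller"): 100,
}
print(max_flow(links, "buyer", "seller"), is_flagged(links, "buyer", "seller", 100))
```

In this example the large weights near the seller do not help, because all flow from the buyer must cross the single $5 link.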
-
Summary
Increasing trend of online services with free accounts
Opens new vector for attack
Focused on reputation manipulation in online marketplaces
Bazaar: A new approach to strengthening reputations
Evaluated on 10 M auctions from eBay UK
Would have prevented 164 k of negative feedback
Only in five categories over 90 days
Currently looking to apply techniques to other domains
-
Effect 3:
Changing requirements of end users
imc11
-
16.10.2009 CCIS/COE Retreat Alan Mislove 29
Privacy on OSNs
Privacy is a significant issue on OSNs
Received recent press, research attention
What is underlying privacy debate?
1. Sites control personal information of millions of users
2. Users are expected to manage their privacy
5,830 word privacy policy
Over 100 different settings
Default is open-to-the-world (over 800 million users)
-
A fundamental shift for users
Prior to OSNs
Users were largely content consumers
Now, with sites like Facebook
Users expected to be content creators and managers
Must enumerate who is able to access every piece of uploaded content
Avg. 130 friends, 90 pieces of content/month...
What's the extent of the privacy problem?
So far, most studies anecdotal
Can we quantify the extent of the privacy problem on Facebook?
-
Facebook privacy model
Consider Facebook-supported content:
Photos, Videos, Statuses, Links and Notes
Five sharing granularities:
Only Me (Me)
Some Friends (SF)
All Friends (AF)
Friends of Friends (FoF)
Everyone (All)
-
Measuring desired and actual settings
Design a Facebook survey application
Collects actual setting for all content
Selects up to 10 photos
Asks user about desired privacy setting
Recruit using Amazon Mechanical Turk
Total of 200 Facebook users
Pay them each $1
116,553 actual settings
1,675 desired settings
Study was conducted under Northeastern IRB protocol #10-10-04
-
What are the existing privacy settings?
36% of all content shared with the default (visible to all users)
Photos have the most privacy-conscious settings
[Figure: fraction of content (0 to 0.6) at each privacy setting (Only Me, Some Friends, All Friends, Friends of Friends, Everyone) for Photo, Video, Status, Link, and Note; the default setting is marked]
-
What about photos with modified settings?
Settings match only for 39% of privacy-modified photos
Even when user has explicitly changed setting
Take-away: Not just poor defaults
Users have significant trouble managing their privacy

Actual setting (rows) vs. desired setting (columns), number of photos:

         Me   SF   AF  FoF  All
Me        2    6    4    0    4
SF        2   12   29    8   11
AF       40    8  237   40   69
FoF      39   17  148   45   47
All       0    0    0    0    0

Totals: 254 (33%) with actual setting more open than desired, 296 (39%) matching, 218 (28%) with actual setting more restrictive than desired
Additional 768 photos with non-default privacy settings
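The tallies for this slide can be rechecked from the table's cell counts. In the sketch below the counts are transcribed from the slide; the variable and function names, and the reading of the five settings as ordered from most to least restrictive, are assumptions.

```python
# Ordered from most to least restrictive (assumed ordering).
LEVELS = ["Me", "SF", "AF", "FoF", "All"]

# Rows: actual setting; columns: desired setting; both in LEVELS order.
COUNTS = [
    [2, 6, 4, 0, 4],
    [2, 12, 29, 8, 11],
    [40, 8, 237, 40, 69],
    [39, 17, 148, 45, 47],
    [0, 0, 0, 0, 0],
]


def tally(counts):
    """Return (matching, actual more open, actual more restrictive)."""
    match = more_open = more_restrictive = 0
    for actual, row in enumerate(counts):
        for desired, n in enumerate(row):
            if actual == desired:
                match += n
            elif actual > desired:
                more_open += n          # shared more widely than desired
            else:
                more_restrictive += n   # shared more narrowly than desired
    return match, more_open, more_restrictive


total = sum(sum(row) for row in COUNTS)
for label, n in zip(("match", "more open", "more restrictive"), tally(COUNTS)):
    print(f"{label}: {n} ({n / total:.0%})")
```

The diagonal sum reproduces the 39% match rate quoted on the slide, and the grand total equals the 768 photos with non-default settings.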
-
Can we improve sharing mechanisms?
Can we provide better management tools?
Ease user's role as content manager
Idea: Leverage the structure of the social network
Create privacy groups from user's friends
Update the groups as the user forms or breaks friendships
-
Automatically detecting friendlists
Friendlists: Facebook feature similar to Google+ Circles
Ground truth: Meaningful groupings of users for privacy
Collected 233 friendlists from our 200 AMT users
Do friendlists correspond with the social network?
Normalized conductance [WSDM10] rates the quality of community
Strongly positive values indicate significant community structure
Results on 233 friendlists:
Over 48% of friendlists correspond to strong communities
May be able to be inferred from social network
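For intuition only, the sketch below computes plain (unnormalized) conductance of a group of friends within a friendship graph. This is a swapped-in, simpler measure: the normalized conductance of [WSDM10] is not reproduced here, and its scale differs (for plain conductance, lower values mean fewer links leave the group, whereas the slide's metric is stronger when more positive). The function name and graph representation are assumptions.

```python
def conductance(adjacency, group):
    """Plain conductance of `group` in an undirected graph.

    adjacency: {node: set of neighbors}, symmetric.
    Returns cut edges divided by the smaller of the two sides' total degree.
    """
    group = set(group)
    cut = sum(1 for u in group for v in adjacency[u] if v not in group)
    volume_in = sum(len(adjacency[u]) for u in group)
    volume_out = sum(len(adjacency[u]) for u in adjacency if u not in group)
    smaller = min(volume_in, volume_out)
    if smaller == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / smaller


# Two tightly knit triangles joined by a single friendship (c, d).
friends = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(conductance(friends, {"a", "b", "c"}))
```

A friendlist that coincides with one of the triangles scores low here, which is the kind of structure that would let groups be inferred from the network alone.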
-
Summary
Privacy an important issue on OSNs
But, to date, no quantification of privacy problem
Develop methodology to measure actual, desired privacy settings
Deployed to 200 Facebook users from AMT
Findings:
36% of all content shared with the default settings
Privacy settings match expectations less than 40% of the time
Even when user has already modified setting
But, potential to aid users by providing better mechanisms
-
Conclusion
Social networks and computer systems increasingly integrated
New way of organizing information
Leading to new opportunities, challenges
My group's goal: Leverage social networks in systems design
WebCloud: Addresses challenges with emerging workloads
Bazaar: Addresses challenges with free accounts
Privacy: Addresses difference between privacy perception and reality
-
Questions?
Work done in collaboration with
Ben Adams (MPI-I), Bobby Bhattacharjee (University of Maryland), Meeyoung Cha (KAIST),
Peter Druschel (MPI-SWS), Krishna P. Gummadi (MPI-SWS),
Andreas Haeberlen (University of Pennsylvania), Ancsa Hannák (Northeastern University),
Jonathan Katz (University of Maryland), Hema Swetha Koppula (Yahoo Research India),
Sune Lehmann (TU Copenhagen), Yabing Liu (Northeastern University),
Arash Molavi (Northeastern University), Jukka-Pekka Onnela (Harvard University),
Ansley Post (Google), J. Niels Rosenquist (Harvard Medical School),
Neil Spring (University of Maryland), Ravi Sundaram (Northeastern University),
Malveeka Tewari (University of California, San Diego), Bimal Viswanath (MPI-SWS),
Liang Zhang (Northeastern University), Fangfei Zhou (Northeastern University)