the velocity of censorship: high-fidelity detection of microblog post deletions

The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions. Tao Zhu (1); David Phipps (2); Adam Pridgen (3); Jedidiah R. Crandall (4); Dan S. Wallach (3). (1) Independent Researcher; (2) Bowdoin College; (3) Rice University; (4) University of New Mexico. 22nd USENIX Security Symposium (USENIX Security '13). 左昌國, 2013/09/10, Seminar @ ADLab, CSIE, NCU.



TRANSCRIPT

Page 1: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions

Tao Zhu (1); David Phipps (2); Adam Pridgen (3); Jedidiah R. Crandall (4); Dan S. Wallach (3)

(1) Independent Researcher; (2) Bowdoin College; (3) Rice University; (4) University of New Mexico

22nd USENIX Security Symposium (USENIX Security '13)

左昌國, 2013/09/10, Seminar @ ADLab, CSIE, NCU

Page 2: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Outline
• Introduction
• Methodology
• Hypotheses
• Topic Extraction
• Discussion
• Conclusion

Page 3: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Introduction
• Microblogs in China: Weibo
  • Sina Weibo (http://weibo.com)
  • 503 million registered users (Dec. 2012)
  • 100 million messages sent daily
  • Promoting visibility of social issues
• China employs both backbone-level filtering of IP packets and higher-level filtering implemented in software
• Many works focus on how and what to filter
• This paper focuses on how quickly microblog posts are removed

Page 4: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Introduction
• Contributions:
  • An implementation of a method that detects a censorship event within 1-2 minutes of its occurrence
  • An understanding of how Weibo can react so quickly in deleting posts with sensitive content
    • 4 hypotheses
  • A way to overcome the use of neologisms, named entities, and informal language in Chinese for topical analysis

Page 5: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Methodology
• Identifying the sensitive user group
• Crawling posts of the sensitive user group
• Detecting deletions

Page 6: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Methodology – Identifying the Sensitive User Group

• Search with outdated sensitive keywords from China Digital Times (http://chinadigitaltimes.net/2013/06/two-years-of-sensitive-words-grass-mud-horse-list/)
  • Using keywords such as “党产共” (the characters of 共产党, “Communist Party”, in reverse order); 2011-04 to 2012-10
• Starting with 25 sensitive users (manually selected)

(Diagram labels: 25 sensitive users; > 5 reposts for each user; > 5 deletions)
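The diagram labels on this slide ("> 5 reposts for each user", "> 5 deletion") suggest a snowball-style expansion from the 25 seed users. The following is a minimal sketch under one reading of those labels, not the authors' implementation: an account becomes a candidate when a sensitive user has reposted it more than 5 times, and a candidate joins the group once more than 5 of its posts have been observed deleted. The function name, the data layout, and the sample counts are all hypothetical.

```python
from collections import deque

REPOST_THRESHOLD = 5    # "> 5 reposts", as read from the diagram labels
DELETION_THRESHOLD = 5  # "> 5 deletions", as read from the diagram labels


def expand_sensitive_group(seeds, repost_counts, deletion_counts):
    """Grow the sensitive group outward from manually chosen seed users.

    repost_counts[u][v] is how often user u reposted user v (hypothetical layout).
    deletion_counts[v] is how many of v's posts were observed deleted.
    """
    group = set(seeds)
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        for candidate, reposts in repost_counts.get(user, {}).items():
            # Skip known members and accounts that were not reposted often enough.
            if candidate in group or reposts <= REPOST_THRESHOLD:
                continue
            # Admit the candidate only if enough of its posts were deleted.
            if deletion_counts.get(candidate, 0) > DELETION_THRESHOLD:
                group.add(candidate)
                queue.append(candidate)
    return group


# Made-up counts for illustration only.
reposts = {"seed": {"a": 9, "b": 2}, "a": {"c": 7, "d": 8}}
deletions = {"a": 6, "b": 40, "c": 12, "d": 1}
group = expand_sensitive_group(["seed"], reposts, deletions)
print(sorted(group))
```

In this toy data, "b" is excluded because it was reposted too rarely and "d" because too few of its posts were deleted, even though each passes the other test.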

Page 7: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Methodology - Identifying the Sensitive User Group

• Sensitive group reaches 3,567 users after 15 days
• More than 4,500 post deletions daily
  • 1,500 “permission denied” posts
  • 12% of the total posts from the group were eventually deleted
• This methodology cannot produce a representative sample of Weibo as a whole

Page 8: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Methodology - Crawling
• User timeline:
  • The Weibo user timeline API returns the most recent 50 posts of the specified user.
• Querying the 3,567 sensitive users once per minute
  • 100 accounts for API calls
  • 300 concurrent Tor circuits
• Four-node cluster running Hadoop and HBase
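The crawling figures on this slide imply a request budget that is easy to work out. The sketch below does that arithmetic and then shows one possible way to spread users over accounts and circuits. The round-robin assignment is purely an assumption for illustration, since the slide does not say how accounts or Tor circuits were allocated, and the function name is hypothetical.

```python
from itertools import cycle

USERS = 3567         # sensitive users polled once per minute (from the slide)
API_ACCOUNTS = 100   # accounts used for API calls (from the slide)
TOR_CIRCUITS = 300   # concurrent Tor circuits (from the slide)

# One timeline request per user per minute.
requests_per_second = USERS / 60
requests_per_account_per_minute = USERS / API_ACCOUNTS
requests_per_circuit_per_minute = USERS / TOR_CIRCUITS

print(f"{requests_per_second:.2f} requests per second overall")
print(f"{requests_per_account_per_minute:.2f} requests per minute per account")
print(f"{requests_per_circuit_per_minute:.2f} requests per minute per circuit")


def assign_round_robin(user_ids, n_accounts, n_circuits):
    """Pair each user with an account index and a circuit index in rotation.

    This allocation policy is an assumption, not something the slide states.
    """
    accounts = cycle(range(n_accounts))
    circuits = cycle(range(n_circuits))
    return {uid: (next(accounts), next(circuits)) for uid in user_ids}


plan = assign_round_robin(range(USERS), API_ACCOUNTS, TOR_CIRCUITS)
```

Under these numbers each account issues about 36 timeline requests per minute and each circuit carries about 12.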

Page 9: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Methodology – Detecting Deletions
• If a post is in the database but is not returned from Weibo, issue a secondary query for that post to determine what error message is returned
• Permission-denied or system deletion
  • “Permission denied” error
  • Caused by a censorship event
  • The post still exists but cannot be accessed by users
• General deletion
  • “Post does not exist” error
  • May be caused by user self-deletion or by a censorship event
  • The post does not exist.
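The detection step on this slide amounts to a set difference followed by a classification of the secondary query's error. The sketch below is a minimal illustration of that logic, not the authors' code. `query_post` is a hypothetical stand-in for the secondary API query, and the error strings are taken from the slide's wording rather than from the real API responses.

```python
from enum import Enum


class DeletionKind(Enum):
    SYSTEM = "permission denied"     # post still exists but is inaccessible
    GENERAL = "post does not exist"  # removed by the user or by a censor


def find_missing(stored_ids, timeline_ids):
    """Posts held in the database that the latest timeline no longer returns."""
    return set(stored_ids) - set(timeline_ids)


def classify(error_message):
    """Map the secondary query's error text to a deletion kind, or None."""
    if error_message is None:
        return None  # the post is still retrievable, so it was not deleted
    text = error_message.lower()
    if "permission denied" in text:
        return DeletionKind.SYSTEM
    if "does not exist" in text:
        return DeletionKind.GENERAL
    return None


def detect_deletions(stored_ids, timeline_ids, query_post):
    """query_post(post_id) returns an error string, or None if the post loads."""
    deletions = {}
    for post_id in find_missing(stored_ids, timeline_ids):
        kind = classify(query_post(post_id))
        if kind is not None:
            deletions[post_id] = kind
    return deletions


# Made-up responses for illustration only.
fake_responses = {1: None, 2: "Permission denied", 3: "Post does not exist"}
found = detect_deletions([1, 2, 3, 4], [4], fake_responses.get)
print(found)
```

The secondary query also filters out false alarms: because the timeline call returns only the 50 most recent posts, an older post can drop out of the result without having been deleted, as post 1 does here.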

Page 10: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Methodology – Detecting Deletions
• This paper focuses on system deletions
  • Apparently not by users
• From July 2012 to September 2012, 2.38 million posts were collected, with a 12.8% total deletion rate (4.5% for system deletions and 8.3% for general deletions).
• The lifetime of a post is the difference between the time the system detected the deletion and the post's creation time.
  • The measurement fidelity is on the order of minutes
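The percentages on this slide can be turned into approximate post counts, and the lifetime definition into a small calculation. This is a sketch only: the counts are approximate because "2.38 million" is itself rounded, and the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime

TOTAL_POSTS = 2_380_000  # "2.38 million", rounded on the slide

# Integer arithmetic in tenths of a percent avoids float rounding noise.
system_deleted = TOTAL_POSTS * 45 // 1000    # 4.5% system deletions
general_deleted = TOTAL_POSTS * 83 // 1000   # 8.3% general deletions
all_deleted = TOTAL_POSTS * 128 // 1000      # 12.8% total


def lifetime_minutes(created_at, detected_deleted_at):
    """Lifetime as defined on the slide: detection time minus creation time."""
    return (detected_deleted_at - created_at).total_seconds() / 60


# Made-up timestamps for illustration only.
created = datetime(2012, 8, 1, 12, 0)
detected = datetime(2012, 8, 1, 12, 7)
example_lifetime = lifetime_minutes(created, detected)

print(system_deleted, general_deleted, all_deleted)
print(example_lifetime)
```

The two partial rates add up to the total (4.5% + 8.3% = 12.8%), which corresponds to roughly 107,100 system deletions and 197,540 general deletions. Since the lifetime uses the detection time, it can overstate the true lifetime by up to about one polling interval, which is consistent with fidelity on the order of minutes.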

Page 11: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Distribution of Deleted Posts


Page 12: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Hypotheses
• How can the Weibo system find sensitive posts and remove them so quickly?
• How do the moderators locate those sensitive posts in the huge database after a month?
• Weibo has different strategies to target sensitive content

Page 13: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Hypotheses
• Hypothesis 1:
  • Weibo has filtering mechanisms as a proactive, automated defense
    • Explicit filtering
    • Implicit filtering
      • “shishikanfalunhowle”
    • Camouflaged posts

Page 14: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Hypotheses
• Hypothesis 2:
  • Weibo targets specific users, such as those who frequently post sensitive content

Page 15: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Hypotheses
• Hypothesis 3:
  • When a sensitive post is found, a moderator will use automated searching tools to find all of its related reposts (parent, child, etc.) and delete them all at once
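The kind of automated search that Hypothesis 3 describes can be pictured as a traversal of the repost tree. The sketch below is an illustration of that idea, not a description of Weibo's actual tooling, which the slides do not detail. The function name and the `parent_of` layout are hypothetical, and the repost relation is assumed to be acyclic.

```python
from collections import defaultdict, deque


def related_reposts(post_id, parent_of):
    """Return every post in the repost tree that contains post_id.

    parent_of maps a repost's id to the id of the post it reposted.
    Original posts do not appear as keys (hypothetical data layout).
    """
    # Walk up to the original post.
    root = post_id
    while root in parent_of:
        root = parent_of[root]

    # Index children, then walk down breadth-first from the root.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    related = {root}
    queue = deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in related:
                related.add(child)
                queue.append(child)
    return related


# Made-up repost relations for illustration only.
parent_of = {"r1": "orig", "r2": "orig", "r3": "r1", "x1": "other"}
tree = related_reposts("r3", parent_of)
print(sorted(tree))
```

Starting from the repost "r3", the traversal reaches its parent, the original post, and the sibling branch "r2", while the unrelated chain under "other" is left untouched.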

Page 16: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Hypotheses
• Hypothesis 4:
  • Deletion speed is related to the topic. That is, particular topics are targeted for deletion based on how sensitive they are.
  • Main 5 topics:
    • Qidong
    • Qian Yunhui
    • Beijing Rainstorm
    • Diaoyu Island
    • Group Sex

Page 17: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Topic Extraction
• Automatic methods are needed to classify the posts
• TF*IDF (https://zh.wikipedia.org/wiki/TF-IDF)
  • Assigns weights to the terms (n-grams) of a document
• Pointillism approach [27]
  • Reconstructs words and phrases from grams using external information
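A minimal sketch of TF*IDF weighting over character n-grams follows. It uses the common textbook form, term frequency times log(N / document frequency), which is an assumption because the slide does not state which variant the paper uses. The function names and the three sample strings are made up for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tf_idf(documents, n=2):
    """Return one {gram: weight} dict per document, weight = tf * log(N / df)."""
    grams_per_doc = [Counter(char_ngrams(doc, n)) for doc in documents]
    # Document frequency: in how many documents each gram occurs.
    doc_freq = Counter(gram for grams in grams_per_doc for gram in grams)
    total_docs = len(documents)
    weights = []
    for grams in grams_per_doc:
        length = sum(grams.values())
        weights.append({
            gram: (count / length) * math.log(total_docs / doc_freq[gram])
            for gram, count in grams.items()
        })
    return weights


# Made-up short posts for illustration only.
docs = ["启东游行", "启东抗议", "北京暴雨"]
weights = tf_idf(docs)
for gram, weight in sorted(weights[0].items(), key=lambda item: -item[1]):
    print(gram, round(weight, 3))
```

Working on character grams rather than words sidesteps word segmentation, which is fragile for Chinese text full of neologisms. A gram shared by several posts, such as "启东" here, receives a lower weight than a gram unique to one post.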

Page 18: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Topic Extraction
• 李 W 阳 (Li Wangyang, from 李旺阳)
• 六圌四 (June Fourth, from 六四)
• 胡 () 涛 (Hu Jintao, from 胡锦涛)
• 启 - 东, 启 \ 东 and 启 / 东 (Qidong, from 启东)

Page 19: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Topic Extraction
• Which of these topics have been discussed for the longest period of time?
• Independent Component Analysis (ICA)
  • Beijing, government, China, country, policeman, and people
  • These 6 terms appear in almost every individual topic

Page 20: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Discussion – Filtering Mechanisms
• Proactive mechanisms
  • Hypothesis 1
• Backwards reposts search
  • Hypothesis 3: deletion of repost chains
• Backwards keyword search
  • Similar to hypothesis 3: deletion by related keywords
  • 兲朝 (a satirical variant of 天朝, “Celestial Dynasty”)
  • 37 人 (“37 people”) (http://news.now.com/home/international/player?newsId=40857)
• Monitoring specific users
  • Hypothesis 2

Page 21: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Discussion – Filtering Mechanisms
• Account closures
  • 300 user accounts closed
• Search filtering
• Public timeline filtering
• User credit points
  • Users can report sensitive or rumor-based posts to earn points

Page 22: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Discussion – Time-of-day Behavior


Page 23: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Discussion – Time-of-day Behavior


Page 24: The Velocity of Censorship: High-Fidelity Detection of  Microblog  Post Deletions

Conclusion
• Deletions happen most heavily in the first hour
  • 90% of the deletions happen within the first 24 hours
• The 4 hypotheses
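Statements such as "90% of the deletions happen within the first 24 hours" are cumulative shares of the post lifetime distribution. The sketch below shows how such a share could be computed from measured lifetimes. The function name is hypothetical and the lifetimes are made-up values chosen for illustration, not data from the paper.

```python
def share_deleted_within(lifetimes_minutes, limit_minutes):
    """Fraction of deleted posts whose lifetime is at most limit_minutes."""
    if not lifetimes_minutes:
        return 0.0
    within = sum(1 for t in lifetimes_minutes if t <= limit_minutes)
    return within / len(lifetimes_minutes)


# Made-up lifetimes in minutes, for illustration only.
lifetimes = [3, 8, 15, 42, 55, 90, 300, 700, 1200, 4000]
first_hour = share_deleted_within(lifetimes, 60)
first_day = share_deleted_within(lifetimes, 24 * 60)
print(first_hour, first_day)
```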