Page 1: Suspicious News Detection Using Micro Blog Text

Tsubasa Tagami (1), Hiroki Ouchi (2,1), Hiroki Asano (1,2), Kazuaki Hanawa (1), Kaori Uchiyama (1), Kaito Suzuki (1), Kentaro Inui (1,2), Atsushi Komiya (3), Atsuo Fujimura (3), Hitofumi Yanai (4), Ryo Yamashita (5), Akinori Machino (6)

(1) Graduate School of Information Sciences, Tohoku University, Japan
(2) RIKEN
(3) SmartNews, Inc.
(4) FactCheck Initiative Japan
(5) Watchdog for Accuracy in News-reporting, Japan
(6) Hi-Ether Japan

PACLIC 2018

Page 2: Outline

- Proposed a new task: suspicious news detection using micro blog text
- This task aims to detect suspicious news articles that need to be fact-checked
- Developed human-machine hybrid fact-checking
- Applied it to a real-world situation, the Okinawa governor election, and detected 21 pieces of fake news

[Figure labels: predict; suspicious; Fact-checker]

http://www.news1~ I suspect it is fake news. Read WSJ...
http://www.news2~ This is completely misinformation ...

Page 3: The Post-Truth Era

- “Fake News” is considered to be a significant problem
  - Researchers said fake news on social media influenced US election voters [Bovet+, 2018]
  - Fake news led a young man to murder nine people at a historic African-American church in Charleston
  - A drama featuring fake news was produced in Japan by the national broadcasting company

https://www.nhk.or.jp/dodra/fakenews/
https://www.eurweb.com/2018/01/trump-reveals-winners-controversial-fake-news-awards/
https://theundefeated.com/features/how-fake-news-led-to-dylann-roof-to-murder-nine-people/

Page 4: What is Fake News

- Definition
  - News articles that are intentionally false and could mislead readers [Shu et al., 2017]
- Problematic issue
  - The spreading of fake news has a negative impact on our society and the news industry

[Figure labels: FAKE; negatively affect an election; cause a conflict]

Page 5: Difficulty of Fact-Checking

- Fact-checking is a time-consuming task; it sometimes takes a whole day to research and write an article
- Fact-checkers cannot keep up with the amount of misinformation generated every day
- Human fact-checking is an intellectually demanding and laborious process

Narrowing down the number of articles that require human fact-checking is necessary.

Page 6: Difficulty of Narrowing Down Articles

- Simply filtering with specific keywords such as ‘misinformation’ and ‘fake’ cannot find suspicious articles efficiently
  - Some posts only state a personal impression of the article
  - In some posts, the target of the mention is not the content of the news

Example posts:
http://www.news1~ I really can not believe it. I wish it were a misinformation. I’m lost for words, but I’ll send my prayers!
http://www.news1~ Does anybody feel she is trying to talk around false teeth because of all those implants in her cheeks and chin?
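To make the limitation concrete, here is a minimal sketch of a naive keyword filter applied to the two example posts above. The keyword list and the name `keyword_hit` are assumptions for illustration only (the slides name 'misinformation' and 'fake'; 'false' is added here as a hypothetical extra keyword), and this is not the authors' implementation.

```python
# Hypothetical keyword list: 'misinformation' and 'fake' come from the slides,
# 'false' is an assumed extra keyword added for this illustration.
KEYWORDS = ("misinformation", "fake", "false")


def keyword_hit(post: str) -> bool:
    """Return True if the post contains any keyword (case-insensitive substring match)."""
    text = post.lower()
    return any(keyword in text for keyword in KEYWORDS)


posts = [
    # Personal impression: the author wishes the news were untrue.
    "I really can not believe it. I wish it were a misinformation. "
    "I'm lost for words, but I'll send my prayers!",
    # 'false' refers to teeth, not to the content of the news.
    "Does anybody feel she is trying to talk around false teeth "
    "because of all those implants in her cheeks and chin?",
]

# Both posts pass the filter although neither casts suspicion on the article.
flags = [keyword_hit(p) for p in posts]
print(flags)
```

Both posts are flagged, so a keyword filter alone would send both articles to a fact-checker.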

Page 7: Our Goal

- Automating suspicious news detection using posts on SNS that cast suspicion on news articles

1. Collect posts on SNS (collect → database)
2. Predict suspicious or not using the posts (predict → suspicious)

Page 8: Definitions of Terms

- Suspicion casting posts (SCP)
  - Posts on SNS that refer to and cast suspicion on certain news articles
- Suspicious articles (SA)
  - News articles to be verified by a human fact-checker
  - We define SA as news articles mentioned by at least one SCP

[Figure labels: citizen; Suspicion casting post (SCP); Suspicious article (SA); Fact-checking; fact-checker]

http://www.news.~ I suspect it is fake news. Read WSJ article - says ...

Page 9: Proposed Task

- Propose and formalize two tasks
- Task 1: Suspicion Casting Post Detection
  - Input: a post on SNS that refers to a news article
  - Output: a judgement of whether it is an SCP or just states a personal impression of the article
- Task 2: Suspicious Article Detection
  - Given a set of posts that refer to the same article, judge whether the set includes an SCP or not

Examples:
http://www.news1.~ This article denotes misinformation, doesn’t it?
  Suspicion casting post (SCP)
http://www.news2.~ I really can not believe it. I wish it were a lie. I’ll send my prayers!
  Not suspicion casting post

Page 10: Dataset

- Created two datasets for our tasks
- Dataset 1: Suspicion Casting Post Dataset
  1. Collected posts on SNS that include the URL of an article and specific keywords, such as misinformation and fake
  2. Removed noise such as article titles, URLs, mentions and hashtags from the posts
  3. Annotated each collected post with 1 if the post casts suspicion and -1 otherwise

Example post:
http:www.news.~ #pleaserepost This article is completely misinformation because …

Page 12: Dataset

- Created two datasets for our tasks
- Dataset 1: Suspicion Casting Post Dataset
  1. Collected posts on SNS that include the URL of an article and specific keywords, such as misinformation and fake
  2. Removed noise such as article titles, URLs, mentions and hashtags from the posts
  3. Annotated each collected post with 1 if the post casts suspicion and 0 otherwise

Example:
Original post: http:www.news.~ #pleaserepost This article is completely misinformation because …
After preprocessing: This article is completely misinformation because …
Label: Suspicion casting post (SCP)
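The noise-removal step above can be sketched with a few regular expressions. This is an assumed, simplified version: the patterns, the `preprocess` name and the optional `title` argument are hypothetical, and the slides do not say how the authors actually implemented the step.

```python
import re

# Assumed patterns; real posts may need more careful handling.
URL_RE = re.compile(r"https?:\S+")   # URLs, including the 'http:www...' form in the example
MENTION_RE = re.compile(r"@\w+")     # user mentions such as @someone
HASHTAG_RE = re.compile(r"#\w+")     # hashtags such as #pleaserepost


def preprocess(post: str, title: str = "") -> str:
    """Strip the article title, URLs, mentions and hashtags, then tidy whitespace."""
    if title:
        post = post.replace(title, " ")
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        post = pattern.sub(" ", post)
    return " ".join(post.split())


raw = "http:www.news.~ #pleaserepost This article is completely misinformation because ..."
cleaned = preprocess(raw)
print(cleaned)
```

Note that `\w+` stops at whitespace or punctuation, so a hashtag written directly against the following word with no separator would swallow that word as well.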

Page 13: Dataset

- We created two datasets for our tasks
- Dataset 2: Suspicious Article Dataset
  1. Collected sets of posts that refer to the same news article and preprocessed these posts in the same way
  2. Annotated each set with 1 if the posts referring to the same article include at least one SCP and 0 otherwise

Example posts:
This is completely false …
This fiscal policy is wrong …

[Figure labels: Annotate; Suspicion casting post (SCP); Suspicious article]
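The article-level annotation rule above (1 if at least one post about the article is an SCP, 0 otherwise) can be written as a short aggregation. The function name `label_articles` and the toy input are assumptions for illustration.

```python
from collections import defaultdict


def label_articles(post_labels):
    """Map (article_url, is_scp) pairs to {article_url: 1 or 0}.

    An article gets label 1 if at least one of its posts is an SCP.
    """
    has_scp = defaultdict(bool)
    for url, is_scp in post_labels:
        has_scp[url] = has_scp[url] or bool(is_scp)
    return {url: int(flag) for url, flag in has_scp.items()}


# Toy data: news1 has one SCP among two posts, news2 has none.
example = [
    ("http://www.news1", True),
    ("http://www.news1", False),
    ("http://www.news2", False),
]
labels = label_articles(example)
print(labels)
```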

Page 14: Dataset

- Statistics of the datasets
- Dataset 1: Suspicion Casting Post Dataset
  - Number of samples: 7,775 posts (pos: 1,036 / neg: 6,739)
  - Average length of posts: 56.6 characters
- Dataset 2: Suspicious Article Dataset
  - Number of samples: 1,836 articles (pos: 564 / neg: 1,272)
  - Average length of posts: 60.4 characters
  - Average number of posts per article: 2.75
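As a quick check of the figures above, the following sketch verifies that the positive and negative counts add up to the reported totals and computes the share of positive samples in each dataset. Only the counts come from the slides; the helper name `positive_rate` is an assumption.

```python
def positive_rate(pos: int, neg: int) -> float:
    """Fraction of positive samples in a dataset."""
    return pos / (pos + neg)


# Counts as reported on the slide.
scp = {"pos": 1036, "neg": 6739, "total": 7775}
sa = {"pos": 564, "neg": 1272, "total": 1836}

for name, d in (("SCP dataset", scp), ("SA dataset", sa)):
    # The class counts should sum to the reported total.
    assert d["pos"] + d["neg"] == d["total"]
    print(f"{name}: {positive_rate(d['pos'], d['neg']):.1%} positive")
```

Both datasets are imbalanced: roughly 13% of posts are SCPs and roughly 31% of articles are suspicious.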

Page 15: Experimental Setup

- Models
  - Logistic Regression (LR)
  - SVM
  - Decision Tree (DT)
  - Random Forest (RF)
  - LSTM
- Settings
  - Word embeddings: 300 dimensions (learned from 4.5M tweets)
  - Vocabulary size: 80K
- Evaluation
  - Precision, Recall, Micro-F1, Recall@K (SA detection only)
  - Stratified 5-fold cross-validation
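Stratified k-fold cross-validation keeps the class ratio roughly equal across folds, which matters when positives are rare. Below is a minimal standard-library sketch of the fold assignment, using the SCP dataset's reported class counts as input. The function `stratified_folds` and the seed are assumptions, and the slides do not say which tooling the authors used (scikit-learn's `StratifiedKFold` offers the same idea).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# 1,036 positive and 6,739 negative posts, as reported for the SCP dataset.
labels = [1] * 1036 + [0] * 6739
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

Each fold in turn serves as the test set while the remaining four are used for training.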

Page 16: Results

- Results for SCP detection
  - Overall, the LR, SVM and LSTM models yielded higher Micro-F1 scores than the DT and RF models
- Results for SA detection
  - Similarly, the LR, SVM and LSTM models achieved higher scores than the other two models

Page 17: Error Analysis

- Analyzed posts that were judged incorrectly by all models
  - It is difficult for the basic models to properly capture sentence-level meanings, since the models mainly used word-level features
- Error types
  - Answer: SCP, Prediction: not SCP
  - Answer: not SCP, Prediction: SCP

Example posts:
http://www.news1~ At last, the news source has got clear... I wished it had been misinformation
http://www.news1~ The description that a part ... is not wrong, but since the level of ~ , this title can mislead readers.
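One way to see why word-level features can miss sentence-level meaning is that a bag-of-words representation discards word order. The sketch below uses two hypothetical sentences (not taken from the dataset) with opposite meanings but identical word counts. It illustrates the general limitation and is not a reconstruction of the authors' feature extraction.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Word-count features: word order is discarded."""
    return Counter(text.lower().split())


# Hypothetical posts, invented for this illustration.
doubting = "this report is false not true"
defending = "this report is true not false"

# The sentences differ, yet their word-count features are identical.
same_features = bag_of_words(doubting) == bag_of_words(defending)
print(same_features)
```

A classifier that sees only these counts cannot separate the two posts, whatever its learned weights.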

Page 18: Results

- Recall@K curve of the SA detection task

Most of the models achieved 80% recall within the top 40% of ranked articles.
We can collect 80% of the suspicious articles by checking only the top 40% of ranked articles.
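Recall@K can be read as: rank all articles by the model's score, inspect only the top K, and measure what share of the truly suspicious articles falls inside that slice. The sketch below assumes K is given as a fraction of the ranked list, matching the "top 40%" wording above. The exact definition used by the authors is not given on the slides, and the scores and labels here are toy values.

```python
def recall_at_k(scores, labels, fraction):
    """Recall among the top `fraction` of items when ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = int(len(ranked) * fraction)
    found = sum(label for _, label in ranked[:k])
    total = sum(labels)
    return found / total if total else 0.0


# Toy model scores and gold labels (1 = suspicious article).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

# The top 4 of 10 articles contain 3 of the 4 suspicious ones.
r = recall_at_k(scores, labels, 0.4)
print(r)
```

At the scale of the Suspicious Article Dataset, 80% recall at the top 40% would correspond to finding about 451 of the 564 suspicious articles while reading about 734 of the 1,836 articles.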

Page 19: Application

- Created an application to support manual fact-checking, named Fact-checking Console

[Figure labels: Suspicious article; Suspicion casting post]

Page 20: Application

Page 21: Fact-check Project

- Our project used the Fact-checking Console for the Okinawa governor election held in Sep. 2018
  (2018.9.1~10.3) http://fij.info/project/okinawa2018
- 6 media outlets and 26 volunteers participated in this project as fact-checkers

Page 22: Fact-check Project Outline

1. Collect posts on SNS (collect → database)
2. Predict suspicious or not using the posts (predict → suspicious)
3. Check suspicious news articles (fact-checkers check them in the Fact-checking Console)
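The collect, predict and check steps can be sketched as a small ranking routine that groups collected posts by article and orders the articles for a fact-checker. Everything here is assumed for illustration: `rank_articles`, the `toy_scorer` stand-in for a trained SCP classifier, and the choice to score an article by its highest-scoring post (scores are assumed to lie in [0, 1]). The slides do not describe how the real system combines post scores.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple


def rank_articles(
    posts: Iterable[Tuple[str, str]],
    score_post: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Group (url, text) posts by article and rank articles by their best post score."""
    best = defaultdict(float)  # assumed: scores are in [0, 1], so 0.0 is a safe floor
    for url, text in posts:
        best[url] = max(best[url], score_post(text))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for a trained SCP classifier."""
    return 1.0 if "misinformation" in text.lower() else 0.0


collected = [
    ("http://www.news1", "I suspect it is fake news. Read WSJ..."),
    ("http://www.news2", "This is completely misinformation ..."),
]
ranking = rank_articles(collected, toy_scorer)
print(ranking)
```

The toy scorer misses the first post, which is the kind of case a learned classifier is meant to handle better than keyword matching.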

Page 23: Example of Detected Fake News

- Some media reported that a famous female singer, Namie Amuro, supported a candidate, Denny Tamaki

Suspicion casting post (SCP):
Misinformation claiming that Namie Amuro is supporting Denny Tamaki is spreading.

[Figure labels: Namie Amuro; Denny Tamaki; FAKE]

Page 24: Conclusion

- Summary
  - Formalized and tackled a task: suspicious news detection using microblog text
  - Applied our system to fact-checking activities in a real-world situation and succeeded in detecting fake news
- Future work
  - Create more sophisticated models for suspicious news detection
  - Evaluate the difference between fact-checking with and without our application
  - Consider information from the news articles themselves when predicting