Information Credibility on the Web - Renmin University of...



Page 1: Information Credibility on the Web - Renmin University of China, idke.ruc.edu.cn/reports/report2008/Technology seminar... · 2018-03-21

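Information Credibility on the Web

Ai Jing

With the rapid growth of the Web, the amount of information on it keeps increasing. How to ensure and assess the credibility of that information has become an important research problem. This survey systematically introduces:
- the importance and the concept of information credibility;
- the credibility problems that arise in several typical application scenarios, and the corresponding solutions;
- the criteria and methods used to assess and evaluate the credibility of Web information.

Information credibility on the Web emerged as an important research topic in the 1990s. The continued development of the Web has since given rise to many related new problems. This survey starts from three typical examples. They show how strongly the credibility of Web information affects people's lives and how much Web users need credible information. From these examples, the survey introduces the definition of credibility and related concepts. From an extensive reading of the literature, we identify six Web application scenarios. Papers on Web credibility generally study the credibility problem, and its solutions, within one of these six scenarios.

The first scenario is the individual website or web page. Website builders want to know which features increase users' trust in a site and which features decrease it. Users, as consumers of information, need to judge the credibility of the sites and pages they encounter while browsing: is what those pages describe actually true?

The next three scenarios are the three most popular network structures today: P2P networks, the Semantic Web, and social networks. What they have in common is a graph structure. The main questions are therefore how credible a given node is within the whole network, and how to automatically identify untrustworthy nodes and remove them from the network. The common approach, designed specifically for network structures, is trust propagation, often over a web of trust.

The fifth and sixth scenarios are the currently very popular online discussion forums and collaborative knowledge repositories (mainly Wikipedia). What they have in common is that users contribute their own information to the Web. Both scenarios aggregate users' collective opinions and collective intelligence, so related work relies on user comments and ratings to judge the credibility of information.

Finally, this survey summarizes several typical methods that verify information credibility from different angles, together with evaluation criteria for information credibility. These evaluation methods are general and can be applied in all six Web application scenarios.

This survey is a preliminary summary of the information credibility problem on the Web. It consists mainly of collecting, summarizing and introducing related papers. Details are given in the slides that follow.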


Information Credibility on the Web

Presenter: Ai Jing

2008-12-20

Outline

A Brief Introduction to Information Credibility

Credibility in Different Web Scenarios

Information Credibility Assessment and Evaluation

Summary & Our Ideas


Information Credibility on the Web


Example 1: False News

Citizen news and blog news vs. news sites

At 9:00 on October 3, 2008: "Steve Jobs, CEO of Apple, rushed to ER following severe heart attack"

An hour later: 9,000,000,000 dollars instantly vaporized!

10:00: the spokesman denied the message. 10:20: iReport deleted it.

Example 2: E-commerce

Internet shopping: much lower prices

www.dangdang.com, www.taobao.com, www.amazon.cn

"I want to buy a notebook. Which site provides certified products?"

Fraud, inferior merchandise

Misleading and biased comments

…


Example 3: Search Engine Ranking

Baidu's bid-for-ranking

Google: Web spam, cloaking…

Overview

Credibility: the objective and subjective components of the believability of a source or message

Two key components: trustworthiness and expertise (authority of the data source)

Credibility on the web has become an important topic since the mid-1990s


Outline

A Brief Introduction to Information Credibility

Credibility in Different Web Scenarios

Information Credibility Assessment and Evaluation

Summary & Our Ideas

Credibility in Different Web Scenarios

(Diagram: "Credibility on the Web" at the center, linked to six scenarios: Web Page & Web Site, P2P Network, Semantic Web, Social Network, Online Discussion Forums, and Collaborative Repositories (Wikipedia).)


Credibility for Web Pages & Websites

Two perspectives:

From human browsers:
- How to tell true information from false?
- "I feel this website is more reliable than that one."
- Which features make a website more reliable?

From search engines: there is too much "web spam" on the Internet, i.e., web pages that exist only to mislead search engines into (mis)leading users to certain web sites. How can it be detected automatically and efficiently? (Detailed introduction by Hu Xiangmei in Report 2.)

Related References

B. J. Fogg, J. Marshall, O. Laraki, et al. What Makes Web Sites Credible? A Report on a Large Quantitative Study. Stanford University, SIGCHI 2001.

R. Lee, D. Kitayama, and K. Sumiya. Web-based Evidence Excavation to Explore the Authenticity of Local Events. University of Hyogo, Japan. WICOW 2008.

Y. Kawai, Y. Fujita, and T. Kumamoto. Using a Sentiment Map for Visualizing Credibility of News Sites on the Web. Kyoto Sangyo University, Japan. WICOW 2008.


Which features make web sites more credible?

What Makes Web Sites Credible? A Report on a Large Quantitative Study, Stanford University, SIGCHI'01

Evaluating 51 different Web site elements

Sample online questionnaire

Over 1,400 participants

Identify Credibility of News Sites on the Web (1)

Web-based Evidence Excavation to Explore the Authenticity of Local Events, University of Hyogo, WICOW08

Search the Web for evidence that makes an event believable. An event is modeled as event = (time, space, vestige), a bag-of-words characteristic.

Construct a database of real-world events from the Web.

(Figure: User Interface for Credible Search, backed by the credible event database.)
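The (time, space, vestige) model can be made concrete with a small sketch. This is not the paper's method. It assumes that each event component is a set of keywords, and that a document counts as evidence when it mentions at least one keyword from every component. The function names, the keywords and this matching rule are all hypothetical.

```python
def supports(doc_words, event):
    """True if the document mentions at least one term of every event component."""
    return all(doc_words & terms for terms in event.values())


def evidence(docs, event):
    """Indices of documents that can serve as evidence for the event."""
    return [i for i, words in enumerate(docs) if supports(words, event)]


# Hypothetical local event, each component given as a bag of words.
event = {"time": {"october", "2008"},
         "space": {"kyoto"},
         "vestige": {"festival", "parade"}}
docs = [set("the kyoto festival in october 2008".split()),
        set("kyoto weather today".split()),
        set("a parade in kyoto last october".split())]
print(evidence(docs, event))  # [0, 2]
```

The second document names the place but neither a time nor a vestige, so it is not counted. A real system would also need to extract these components from text, which the sketch skips.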



Peer-to-Peer Networks

P2P architecture: its open and anonymous nature offers an almost ideal environment for the spread of inauthentic files.

(Figure: a client-server network compared with a P2P file-sharing network.)


Related References

F. Cornelli, E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, and P. Samarati. Choosing Reputable Servents in a P2P Network. In Proceedings of the 11th World Wide Web Conference, Hawaii, USA, May 2002.

K. Aberer and Z. Despotovic. Managing Trust in a Peer-2-Peer Information System. In Proceedings of the 10th International Conference on Information and Knowledge Management (ACM CIKM), New York, USA, 2001.

S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The EigenTrust Algorithm for Reputation Management in P2P Networks. In Proceedings of the 12th International Conference on World Wide Web, 2003.

E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, P. Samarati, and F. Violante. A Reputation-Based Approach for Choosing Reliable Resources in Peer-to-Peer Networks. In 9th ACM Conference on Computer and Communications Security, 2002.

S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. Incentives for Combatting Freeriding on P2P Networks. Technical report, Stanford University, 2003.

Problem

Attacks by anonymous malicious peers, e.g., introducing viruses…

Goal: identify malicious peers that provide inauthentic files, based on each peer's previous behavior (its history of uploads).

The EigenTrust Algorithm for Reputation Management in P2P Networks, Stanford University, WWW03
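As we understand the WWW03 paper, EigenTrust turns each peer's upload history into local trust values:
- s_ij is the number of satisfactory minus unsatisfactory downloads that peer i made from peer j;
- c_ij = max(s_ij, 0) / Σ_j max(s_ij, 0).

The sketch below assumes the history arrives as (downloader, uploader, authentic) tuples. That input format and the function name `local_trust` are hypothetical.

```python
from collections import defaultdict


def local_trust(transactions, n_peers):
    """Row-normalized local trust matrix c[i][j] built from upload history.

    transactions: iterable of (i, j, authentic): peer i downloaded a file
    from peer j and judged it authentic (True) or inauthentic (False).
    This tuple format is a hypothetical choice for illustration.
    """
    # s[(i, j)] = sat(i, j) - unsat(i, j)
    s = defaultdict(int)
    for i, j, authentic in transactions:
        s[(i, j)] += 1 if authentic else -1

    c = [[0.0] * n_peers for _ in range(n_peers)]
    for i in range(n_peers):
        # Negative opinions are clipped to zero before normalizing.
        row = [max(s[(i, j)], 0) for j in range(n_peers)]
        total = sum(row)
        if total > 0:
            c[i] = [v / total for v in row]
        # Otherwise the row stays all zeros here; EigenTrust instead
        # falls back to a distribution over pre-trusted peers.
    return c


# Peer 0 got two good files from peer 1, one bad file from peer 2,
# and one good file from peer 3.
history = [(0, 1, True), (0, 1, True), (0, 2, False), (0, 3, True)]
c = local_trust(history, 4)
print(c[0])  # [0.0, 0.6666666666666666, 0.0, 0.3333333333333333]
```

Normalizing each row keeps a malicious peer from inflating its influence by assigning huge raw scores. The clipping means a bad download can only lower a peer's score to zero, never below it.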


Friends of Friends

Problem: each peer has limited past experience and knows few other peers.

Solution: ask for the opinions of the people you trust, and compute a global trust value $t_i$ for each peer.

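(Figure: local trust matrices with entries $c_{ij}$, most of them 0, for example peers 1, 2, 4, 6, and 8.)

Peer 1 weighs each friend's opinion of peer k by how much she trusts that friend. Here $c_{14}$ is how much peer 1 trusts peer 4, and $c_{42}$ is what peer 4 thinks of peer 2. Summing over all friends j gives:

$t_{ik} = \sum_j c_{ij} c_{jk}$, i.e., $\vec{t}_i = C^{T} \vec{c}_i$

Repeating this with friends of friends gives $\vec{t}_i = (C^{T})^{n} \vec{c}_i$. For large n, she will have a complete view of the network.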
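The friends-of-friends step can be repeated until the trust vector stops changing. This is the power iteration behind EigenTrust. The sketch below makes these assumptions:
- the local trust matrix is row-normalized;
- at each step, a pre-trusted distribution p is blended in with weight a, which the full algorithm uses to help convergence;
- the example matrix, p, and a = 0.1 are illustrative values, not taken from the paper.

```python
def eigentrust(c, p, a=0.1, eps=1e-10, max_iter=1000):
    """Global trust vector by power iteration: t <- (1 - a) C^T t + a p.

    c: local trust matrix whose rows each sum to 1 (c[i][j] = i's trust in j).
    p: distribution over pre-trusted peers (sums to 1).
    a: weight kept on the pre-trusted peers at every step.
    """
    n = len(c)
    t = list(p)  # start from the pre-trusted distribution
    for _ in range(max_iter):
        # (C^T t)[k] = sum_i c[i][k] * t[i]: ask everyone what they think of k,
        # weighting each answer by how much the asker is trusted.
        ct = [sum(c[i][k] * t[i] for i in range(n)) for k in range(n)]
        new_t = [(1 - a) * ct[k] + a * p[k] for k in range(n)]
        if sum(abs(x - y) for x, y in zip(new_t, t)) < eps:
            return new_t
        t = new_t
    return t


# Peers 1 and 2 both trust only peer 0; peer 0 splits its trust between them.
c = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
p = [1 / 3, 1 / 3, 1 / 3]
t = eigentrust(c, p)
print([round(x, 3) for x in t])  # [0.491, 0.254, 0.254]
```

Without the a * p term, this small example would oscillate forever, because peer 0 and peers {1, 2} only point at each other. The blend damps that oscillation.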



What is Semantic Web?

The Semantic Web: an extension of the current Web in which information is given well-defined meaning, enabling computers and humans to work in cooperation. It is also known as Web 3.0 (Tim Berners-Lee, 1999).

The Semantic Web gives computers the ability to process information automatically.

(Figure: "User Centric & Human Understanding" vs. "Computer Understanding!!!", with an agent.)

Related References

D. L. McGuinness and P. Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Journal of Web Semantics, 1:397–413, 2004.

M. Richardson, R. Agrawal, and P. Domingos. Trust Management for the Semantic Web. In Proceedings of the Second International Semantic Web Conference, pages 351–368, 2003.

M. Ceglowski, A. Coburn, and J. Cuadrado. Semantic Search of Unstructured Data Using Contextual Network Graphs, June 2003.

G. Gans, M. Jarke, S. Kethers, and G. Lakemeyer. Modeling the Impact of Trust and Distrust in Agent Networks. In Proceedings of the Third International Bi-Conference Workshop on Agent-Oriented Information Systems, Montreal, Canada, May 2001.

J. Golbeck, B. Parsia, and J. Hendler. Trust Networks on the Semantic Web. In Proceedings of Cooperative Intelligent Agents, Helsinki, Finland, August 2003.


Trust Management for the Semantic Web

Trust Management for the Semantic Web, University of Washington, International Semantic Web Conference 2003

The Semantic Web is a large, uncensored system to which anyone may contribute.

Assume that all the information on the Semantic Web consists of logical assertions. To establish the degree of belief in a statement, combine each source's belief in the statement with the user's trust in each source.

Question: how can a user decide how much to trust a source she does not know directly?

Answer: employ a web of trust, in which each user maintains a small set of users he/she trusts.

Web of Trust

(Figure: people/agents acting as producers/consumers. Each user specifies a small set of users she/he trusts, and local neighbors help in determining the trust of distant neighbors.)
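One way to read the web of trust computationally follows the path-algebra idea in Richardson et al., with our own choice of operators:
- trust is multiplied along a path;
- across paths, the best path wins;
- a statement's belief is the trust-weighted average of the sources' beliefs.

Multiplication, maximum and weighted averaging are illustrative assumptions here, as are the user names and numbers.

```python
import heapq


def propagate_trust(web, user):
    """Trust from `user` to every reachable user over a web of trust.

    web: dict mapping each user to {trusted_user: trust in [0, 1]}.
    Along a path, trust values are multiplied (concatenation); across
    paths, the maximum is kept (aggregation). Because all weights are
    <= 1, a best-first (Dijkstra-style) search finds every maximum.
    """
    best = {user: 1.0}
    heap = [(-1.0, user)]
    while heap:
        neg_t, u = heapq.heappop(heap)
        t = -neg_t
        if t < best[u]:
            continue  # stale entry; a better path to u was already found
        for v, w in web.get(u, {}).items():
            cand = t * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best


def belief(trust, source_beliefs):
    """Trust-weighted average of the sources' beliefs in one statement."""
    weights = {s: trust.get(s, 0.0) for s in source_beliefs}
    total = sum(weights.values())
    if total == 0:
        return None  # no trusted source has an opinion
    return sum(weights[s] * b for s, b in source_beliefs.items()) / total


web = {"alice": {"bob": 0.9, "carol": 0.5},
       "bob": {"dave": 0.8},
       "carol": {"dave": 0.9}}
trust = propagate_trust(web, "alice")
# dave is reached via bob (0.9 * 0.8) or carol (0.5 * 0.9); the max wins.
print(round(trust["dave"], 2))  # 0.72
print(round(belief(trust, {"dave": 1.0, "carol": 0.0}), 3))  # 0.59
```

Alice never rated Dave directly, yet she obtains a trust value for him from her local neighbors. That is the role the web of trust plays on the slide.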



Social Network

A social network is a social structure made of nodes (generally individuals or organizations) tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friendship, or kinship…

Graph-based structures: a (directed) network of people.


Related References

L. Mui. Computational Models of Trust and Reputation: Agents, Evolutionary Games, and Social Networks. PhD thesis, MIT, 2002.

C.-N. Ziegler and G. Lausen. Propagation Models for Trust and Distrust in Social Networks. Information Systems Frontiers, 7(4/5):337–358, 2005.

R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of Trust and Distrust. In Proceedings of the Thirteenth International World Wide Web Conference, 2004.

Propagation of Trust and Distrust

Real-world experience suggests that distrust is at least as important as trust.

Trust: useful, authentic information.

Distrust: disinformation (useless, inauthentic, fraudulent information).

the eigenvector of the matrix of distrust values


Solution

n users are represented by two n × n matrices, T (trust) and D (distrust). Here $t_{ij}$ is i's trust in j, with $0 \le t_{ij} \le 1$, and likewise $d_{ij}$ for distrust.

Goal: predict the unknown values from T and D.

M: generic belief matrix
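A simplified sketch of how Guha et al. predict unknown values, as we recall their framework:
- form a belief matrix B from T and D;
- combine four "atomic propagations" into one operator: direct propagation B, co-citation BᵀB, transpose trust Bᵀ, and trust coupling BBᵀ;
- sum discounted powers of that operator.

Taking B = T - D (their propagated-distrust option), the weights alpha, the discount gamma and the horizon K are illustrative choices, not values from the paper. Pure Python is used to avoid dependencies.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def lincomb(a, b, wa, wb):
    """Entry-wise wa * a + wb * b."""
    return [[wa * x + wb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def propagate(T, D, alpha=(0.4, 0.4, 0.1, 0.1), gamma=0.9, K=3):
    """Propagate trust and distrust with a combined atomic-propagation operator.

    B = T - D is the belief matrix. The operator mixes direct propagation (B),
    co-citation (B^T B), transpose trust (B^T) and trust coupling (B B^T)
    with weights alpha; its powers are summed with discount gamma.
    """
    B = lincomb(T, D, 1.0, -1.0)
    Bt = transpose(B)
    a1, a2, a3, a4 = alpha
    C = lincomb(lincomb(B, matmul(Bt, B), a1, a2),
                lincomb(Bt, matmul(B, Bt), a3, a4), 1.0, 1.0)
    n = len(B)
    F = [[0.0] * n for _ in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(1, K + 1):
        P = matmul(P, C)                    # P = C^k
        F = lincomb(F, P, 1.0, gamma ** k)  # F += gamma^k C^k
    return F


T = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # user 0 trusts 1, user 1 trusts 2
no_distrust = [[0] * 3 for _ in range(3)]
print(propagate(T, no_distrust)[0][2] > 0)  # True: trust flows 0 -> 1 -> 2

T2 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
D2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # user 1 distrusts user 2
print(propagate(T2, D2)[0][2] < 0)      # True: 0 inherits 1's distrust of 2
```

Entry F[i][j] is the predicted belief of user i in user j: positive values suggest trust, negative values suggest distrust.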



E-commerce and Recommendation Systems

……

Related References

J. Staddon and R. Chow. Detecting Reviewer Bias Through Web-Based Association Mining. PARC, WICOW08.

P. Kollock. The Production of Trust in Online Markets. In E. J. Lawler, M. Macy, S. Thyne, and H. A. Walker, editors, Advances in Group Processes, volume 16, pages 99–123. JAI Press, 1999.

S. Nakamura, M. Shimizu, and K. Tanaka. Can Social Annotation Support Users in Evaluating the Trustworthiness of Video Clips? Graduate School of Informatics, Kyoto University, WICOW08.

N. Wanas, M. El-Saban, H. Ashour, and W. Ammar. Automatic Scoring of Online Discussion Posts. Cairo Microsoft Innovation Center, WICOW08.

S. Ba, A. B. Whinston, and H. Zhang. Building Trust in Online Auction Markets Through an Economic Incentive Mechanism. Decision Support Systems, 35(3):273–286, 2002.


Users Scoring

Automatic Scoring of Online Discussion Posts, Cairo Microsoft Innovation Center, WICOW08

Collaborative intelligence: find the posts that are worth attending to.

Post rating: a five-point scale.

Goal: automatically assess online discussion posts, enabling automatic content filtering in online discussion forums.

Method: a Support Vector Machine (SVM) classifier.
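The SVM step can be illustrated with a small linear SVM. This is only a sketch:
- the two features (relevance to the thread topic, normalized length) are hypothetical;
- the five-point scale is collapsed into a binary worth-attending label;
- training uses Pegasos-style stochastic subgradient descent instead of whatever solver the authors used.

Pure Python keeps it dependency-free.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos-style
    stochastic subgradient descent. Labels in y must be +1 or -1."""
    rng = random.Random(seed)
    # Append a constant 1.0 feature so the last weight acts as a bias.
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (L2 term)
            if margin < 1:  # hinge loss is active for this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical features per post: (relevance to thread topic, normalized length).
# Label +1 = "worth attending to" (e.g., rated 4-5), -1 = not (rated 1-2).
X = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

A production system would use a tested SVM library and richer features. The sketch only shows how a margin-based classifier separates rated posts.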

Solution

Aim: detect potential bias and assess the validity of online reviews.

Bring the broader context of the reviewer into the online community by mining association rules between book reviewers and the authors of the books they review.

Consider an association rule between A and B, the reviewer and the author of the same book. Bias is suspected when both $\Pr(B \mid A)$ and $\Pr(A \wedge B)$ are large.

Evidence: frequent co-occurrence of their names on the Web.
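Support and confidence make the two conditions concrete:
- support is Pr(A ∧ B), the fraction of documents mentioning both names;
- confidence is Pr(B | A), the fraction of documents mentioning A that also mention B.

The document sets, names and thresholds below are hypothetical. The slide does not give the paper's actual thresholds or its name-extraction step.

```python
def rule_strength(docs, a, b):
    """Support Pr(A and B) and confidence Pr(B | A) of the rule A => B.

    docs: list of documents, each given as the set of names found in it.
    """
    n = len(docs)
    with_a = sum(1 for d in docs if a in d)
    with_both = sum(1 for d in docs if a in d and b in d)
    support = with_both / n if n else 0.0
    confidence = with_both / with_a if with_a else 0.0
    return support, confidence


def possibly_biased(docs, reviewer, author, min_support=0.2, min_confidence=0.5):
    """Flag a reviewer/author pair whose names co-occur unusually often."""
    support, confidence = rule_strength(docs, reviewer, author)
    return support >= min_support and confidence >= min_confidence


# Each document is the set of person names extracted from one web page.
docs = [{"Reviewer R", "Author A"},
        {"Reviewer R", "Author A", "Editor X"},
        {"Reviewer R"},
        {"Editor X"},
        {"Author A", "Editor Y"}]
print(rule_strength(docs, "Reviewer R", "Author A"))    # (0.4, 0.6666666666666666)
print(possibly_biased(docs, "Reviewer R", "Author A"))  # True
print(possibly_biased(docs, "Editor Y", "Reviewer R"))  # False
```

Requiring both measures avoids flagging pairs that co-occur often only because one name is very common (high support, low confidence). It likewise avoids pairs supported by a single page (high confidence, low support).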



Wikipedia

The emerging pattern for building large information repositories: encourage many people to collaborate in a distributed manner to create and maintain a repository of shared content.

Open editing: allows users to freely create and edit web pages.


Related References

Blaze, M., Feigenbaum, J., Lacy, J.: Decentralized trust management. In: Proceedings of the 1996 IEEE Symposium on Security and Privacy. (1996) 164–173

Deborah L. McGuinness, Honglei Zeng, Paulo Pinheiro da Silva. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. WWW2006

Rui Lopes, Luís Carriço. On the Credibility of Wikipedia: an Accessibility Perspective. WICOW2008

B. Thomas Adler, Luca de Alfaro. A Content-Driven Reputation System for the Wikipedia, WWW2007

M. Hu, E.-P. Lim, A. Sun, H. W. Lauw, and B.-Q. Vuong. Measuring article quality in wikipedia: models and evaluation. In CIKM ’07.

Trust in Social Collaborative Information Spaces

Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study, Stanford University, WWW06

Concepts: Article, Version (of an article), Fragment, Author

Relations:
- An article has multiple versions (1:n).
- A version has multiple fragments (1:n).
- A fragment has one author (1:1).
- A version has multiple authors (1:n).
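The four relations map directly onto a small data model. Below is a minimal sketch with Python dataclasses. The class and field names are hypothetical, and a version's authors are derived from its fragments rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    text: str
    author: str  # 1:1, each fragment has exactly one author


@dataclass
class Version:
    number: int
    fragments: List[Fragment] = field(default_factory=list)  # 1:n

    def authors(self) -> List[str]:
        # 1:n, a version's authors are the distinct authors of its fragments
        return sorted({f.author for f in self.fragments})


@dataclass
class Article:
    title: str
    versions: List[Version] = field(default_factory=list)  # 1:n

    def latest(self) -> Version:
        return max(self.versions, key=lambda v: v.number)


article = Article("Example", [
    Version(1, [Fragment("Intro text.", "alice")]),
    Version(2, [Fragment("Intro text.", "alice"), Fragment("New section.", "bob")]),
])
print(article.latest().authors())  # ['alice', 'bob']
```

Deriving version authors from fragments keeps the 1:1 fragment-author link as the single source of truth. The 1:n version-author relation then follows from it automatically.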


Deriving Trust from Revision History

Revision operations (insertion, deletion, modification) imply trust.

The trustworthiness of a revised article depends on:
- the trustworthiness of the previous version;
- the author of the last revision;
- the modified content involved in the fragment.

Revision history is widely available in cooperative information systems.
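The slide names the inputs but not a formula. The following is therefore one plausible scheme, not the paper's method:
- an unchanged fragment keeps its trust from the previous version;
- an inserted or modified fragment takes the trust of the author who wrote it;
- a version's trust is the length-weighted average over its fragments.

All names and values are hypothetical.

```python
def revise(prev_trust, new_fragments, author_trust):
    """Assign trust to each fragment of a revised version.

    prev_trust: dict fragment text -> trust it had in the previous version.
    new_fragments: list of (text, author) pairs making up the new version.
    author_trust: dict author -> trust in [0, 1].
    """
    result = []
    for text, author in new_fragments:
        if text in prev_trust:
            # Unchanged fragment: keep the trust of the previous version.
            result.append((text, prev_trust[text]))
        else:
            # Inserted or modified fragment: trusted as much as its author.
            result.append((text, author_trust.get(author, 0.0)))
    # Deleted fragments simply do not appear in the new version.
    return result


def version_trust(fragments):
    """Length-weighted average trust of a version's fragments."""
    total = sum(len(text) for text, _ in fragments)
    if total == 0:
        return 0.0
    return sum(len(text) * t for text, t in fragments) / total


prev = {"aaaa": 0.9, "cccc": 0.8}
new = [("aaaa", "alice"), ("bbbb", "bob")]  # "cccc" deleted, "bbbb" inserted
frags = revise(prev, new, {"alice": 0.7, "bob": 0.3})
print(frags)                           # [('aaaa', 0.9), ('bbbb', 0.3)]
print(round(version_trust(frags), 2))  # 0.6
```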

Wikipedia Article with Citation Trust View

(Figure: citation and revision views of an article. Fragments are colored according to their trust values computed from citation trust.)


Outline

A Brief Introduction to Information Credibility

Credibility in Different Web Scenarios

Information Credibility Assessment and Evaluation

Summary & Our Ideas

Related References

Irit Askira Gelman, Anthony L. Barletta. A “Quick and Dirty” Website Data Quality Indicator. University of Arizona. WICOW2008

Llewellyn C.M. Tang, Yuyang Zhao, Simon Austin. A Characteristic Based Information Evaluation Model. Loughborough University. WICOW2008


The spelling error rate as an indicator of the quality of the document

Application: social forum exchanges, personal websites, Wikipedia, etc.

Method: take a minimal set E of spelling errors (10 common English spelling errors) and use the hit counts of search engine queries on E.

positively related

Literature on web data credibility assessment

A "Quick and Dirty" Website Data Quality Indicator, University of Arizona, WICOW08, short paper

Spelling errors in E: Recieve, Accomodate, Accross, Truely, Acheive, Affraid, Agressive, Appearence, Tomorow, Arguement

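$\mathrm{ErrorIndex}(d) = \sum_{j} \frac{\mathrm{HitCount}(e_j, d)}{\mathrm{HitCount}(e_j, d) + \mathrm{HitCount}(c_j, d) + 1}$

where $e_j \in E$ is a misspelling, $c_j$ is its correct spelling, and $d$ is the website (domain) whose pages are searched.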
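This sketch reads the indicator as a sum over the ten pairs of HitCount(e_j, d) / (HitCount(e_j, d) + HitCount(c_j, d) + 1), where:
- e_j is a misspelling;
- c_j is its correct form;
- d is the website.

With that reading, the index can be computed as below. No real search API is called: `hit_count` is a hypothetical caller-supplied function, and the hit counts in the example are made up.

```python
# The ten misspellings on the slide, each paired with its correct spelling.
ERROR_PAIRS = [
    ("recieve", "receive"), ("accomodate", "accommodate"),
    ("accross", "across"), ("truely", "truly"), ("acheive", "achieve"),
    ("affraid", "afraid"), ("agressive", "aggressive"),
    ("appearence", "appearance"), ("tomorow", "tomorrow"),
    ("arguement", "argument"),
]


def error_index(hit_count, site, pairs=ERROR_PAIRS):
    """Sum over (e_j, c_j) of HitCount(e_j, d) / (HitCount(e_j, d) + HitCount(c_j, d) + 1).

    hit_count(term, site) -> int must be supplied by the caller, e.g. the
    number of search-engine hits for `term` restricted to `site`.
    """
    index = 0.0
    for wrong, right in pairs:
        e = hit_count(wrong, site)
        c = hit_count(right, site)
        index += e / (e + c + 1)  # the +1 keeps the ratio defined when both are 0
    return index


# Hypothetical hit counts standing in for real search-engine queries.
FAKE_HITS = {("recieve", "forum.example"): 9, ("receive", "forum.example"): 90,
             ("receive", "news.example"): 500}


def lookup(term, site):
    return FAKE_HITS.get((term, site), 0)


print(error_index(lookup, "forum.example"))  # 0.09
print(error_index(lookup, "news.example"))   # 0.0
```

Only ten search queries per site (plus ten for the correct forms) are needed. That is what makes the indicator "quick and dirty" compared with analyzing page content directly.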

Characteristic Based Information Evaluation

A Characteristic Based Information Evaluation Model, Department of Civil and Building Engineering, Loughborough University, WICOW08

Information has many characteristics; quantify them to obtain the Value of Information (VOI).


Outline

A Brief Introduction to Information Credibility

Credibility in Different Web Scenarios

Information Credibility Assessment and Evaluation

Summary & Our Ideas

Summary

Semantic Web, P2P network, social network: network structure; web of trust; trust value matrix

Online discussion forums and collaborative repositories (Wikipedia): scoring by users' comments; ranking by trust value; machine learning classifier

(The diagram also lists: influence propagation, graph mining, rating model, scoring mechanism.)
