Information Credibility on the Web - Renmin University of...
Information Credibility on the Web
Ai Jing
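With the rapid development of the Web, more and more information is available online, and how to guarantee and judge the credibility of this information has become an important research problem. This survey systematically introduces the importance and the concept of information credibility, the credibility problems that arise in several typical application scenarios, and the corresponding solutions. It also introduces assessment criteria and evaluation methods for the credibility of information on the Web.
Information credibility on the Web is an important research topic that has been emerging since the 1990s, and the growth of the Web has given rise to many related new problems. Starting from three typical examples, this survey first shows the great impact that the credibility of Web information has on people's lives and how much Web users want credible information, which leads to the definition of credibility and related concepts. From an extensive reading of the literature, we identify six different Web application scenarios. Papers on Web credibility generally study the credibility problem, and its solutions, within one of these six scenarios.
The first scenario is the individual website or web page. Website builders want to know which features increase users' trust in a site and which decrease it. Users, as consumers of information, need to judge the credibility of the sites and pages they encounter while browsing, that is, whether what is described there is true. The next three scenarios are the three most popular network structures today: P2P networks, the Semantic Web, and social networks. All three have a graph (mesh) structure, so the main questions are how credible a node is within the whole network and how to automatically identify untrustworthy nodes and remove them from the network. The common method, designed specifically for network structures, is trust-value propagation, often over a trust network (web of trust). The fifth and sixth scenarios are the now very popular online forums and collaborative knowledge repositories (mainly Wikipedia). What they have in common is that users contribute their own information to the Web. Because both scenarios aggregate group opinion and collective intelligence, related work uses users' comments and ratings to judge the credibility of information.
Finally, this survey summarizes several typical methods for verifying information credibility from different angles, as well as evaluation criteria for information credibility. These evaluation methods are general and can be used in all six Web application scenarios above.
This survey is a preliminary summary of the problem of information credibility on the Web. It consists mainly of collecting, organizing, and introducing related papers. Details are given in the slides that follow.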
Information Credibility on the Web
Presenter: Ai Jing
2008-12-20
Outline
A Brief Introduction to Information Credibility
Credibility in Different Web Scenarios
Information Credibility Assessment and Evaluation
Summary & Our Ideas
Example 1: False News
Sources: citizen news, blog news, news sites
At 9:00 on October 3, 2008, a report claimed: "Steve Jobs, CEO of Apple, rushed to ER following severe heart attack"
An hour later: 9,000,000,000 dollars of market value instantly vaporized!
10:00: Apple's spokesperson denied the report
10:20: iReport deleted it
Example 2: E-commerce
Internet shopping: much lower prices
www.dangdang.com, www.taobao.com, www.amazon.cn
"I want to buy a notebook. Which site provides certified products?"
Risks: fraud, inferior merchandise, misleading and biased comments, ……
Example 3: Search Engine Ranking
Baidu: bid-for-ranking
Google: Web spam, cloaking, ……
Overview
Credibility: the objective and subjective components of the believability of a source or message
Two key components: trustworthiness and expertise (authority of the data source)
Credibility on the web has become an important topic since the mid-1990s
Credibility in Different Web Scenarios
Credibility on the Web, in six scenarios:
Web Page & Web Site
P2P Network
Semantic Web
Social Network
Online Discussion Forums
Collaborative Repositories (Wikipedia)
Credibility of Web Pages & Web Sites
Two perspectives:
From human browsers:
How can we tell true information from false?
"I feel this website is more reliable than that one."
Which features make a website more reliable?
From search engines:
There is too much "web spam" on the Internet: web pages that exist only to mislead search engines into (mis)leading users to certain web sites.
How can web spam be detected automatically and efficiently? (Detailed introduction by Hu Xiangmei in Report 2.)
Related References
B. J. Fogg, J. Marshall, O. Laraki, et al. What Makes Web Sites Credible? A Report on a Large Quantitative Study. Stanford University. SIGCHI 2001.
R. Lee, D. Kitayama, and K. Sumiya. Web-based Evidence Excavation to Explore the Authenticity of Local Events. University of Hyogo, Japan. WICOW 2008.
Y. Kawai, Y. Fujita, T. Kumamoto. Using a Sentiment Map for Visualizing Credibility of News Sites on the Web. Kyoto Sangyo University, Japan. WICOW2008
Which features make web sites more credible?
What Makes Web Sites Credible? A Report on a Large Quantitative Study, Stanford University, SIGCHI'01
Evaluating 51 different Web site elements
Sample online questionnaire
Over 1,400 participants
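To make the questionnaire analysis concrete, here is a minimal sketch that ranks site elements by their mean credibility rating. The rating scale from -3 (much less believable) to +3 (much more believable) is an assumption, and the element names and ratings are hypothetical. They are not data from the study.

```python
from statistics import mean

def rank_elements(responses):
    """Rank site elements by mean credibility impact, highest first.

    responses: dict mapping an element name to the ratings it received,
    each assumed to lie on a -3..+3 scale.
    """
    for element, ratings in responses.items():
        if not ratings or any(r < -3 or r > 3 for r in ratings):
            raise ValueError(f"invalid ratings for {element!r}")
    scored = [(element, mean(ratings)) for element, ratings in responses.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical questionnaire data (not the study's results)
responses = {
    "lists a physical address": [2, 3, 1],
    "shows pop-up ads": [-2, -1, -3],
    "recently updated": [1, 2, 2],
}
for element, score in rank_elements(responses):
    print(f"{score:+.2f}  {element}")
```

Positive means push an element up the list and negative means push it down, so the two ends of the ranking separate the credibility boosters from the detractors.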
Identifying the Credibility of News Sites on the Web (1)
Web-based Evidence Excavation to Explore the Authenticity of Local Events, University of Hyogo, WICOW08
Search the Web for evidence before believing an event; an event = (time, space, vestige)
Characterize events with a bag of words
Construct a credible event database of real-world events from the Web
User interface for credible search
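The event model above can be illustrated with a toy evidence check. This is a minimal sketch, not the paper's method. It represents an event as the (time, space, vestige) triple, turns a web page into a bag of words, and scores the page by the fraction of event components it mentions. The scoring rule, the `Event` class and the example data are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: str           # e.g. "2008-10-03"
    space: str          # a place name
    vestige: frozenset  # lower-case words for traces the event left (assumed form)

def bag_of_words(text):
    # Lower-cased word tokens; hyphens kept so dates like 2008-10-03 stay whole
    return set(re.findall(r"[\w-]+", text.lower()))

def evidence_score(event, page_text):
    """Toy measure in [0, 1]: how much of the event a page mentions."""
    words = bag_of_words(page_text)
    time_hit = 1.0 if event.time.lower() in words else 0.0
    space_hit = 1.0 if event.space.lower() in words else 0.0
    vestige_hit = (len(event.vestige & words) / len(event.vestige)
                   if event.vestige else 0.0)
    return (time_hit + space_hit + vestige_hit) / 3

# Hypothetical event and pages
ev = Event("2008-10-03", "kobe", frozenset({"festival", "fireworks"}))
print(evidence_score(ev, "On 2008-10-03 the Kobe festival drew crowds."))
print(evidence_score(ev, "Nothing related here."))
```

Pages with a high score could serve as supporting evidence when the event is added to a credible event database.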
Peer-to-Peer Networks
[Figure: a client-server network vs. a P2P file-sharing network]
P2P architecture: its open and anonymous nature offers an almost ideal environment for the spread of inauthentic files
Related References
F. Cornelli, E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, and P. Samarati. Choosing Reputable Servents in a P2P Network. In Proceedings of the 11th World Wide Web Conference, Hawaii, USA, May 2002.
K. Aberer and Z. Despotovic. Managing Trust in a Peer-2-Peer Information System. In Proceedings of the 10th International Conference on Information and Knowledge Management (ACM CIKM), New York, USA, 2001.
S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The EigenTrust Algorithm for Reputation Management in P2P Networks. In Proceedings of the 12th International Conference on World Wide Web, 2003.
E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, P. Samarati, and F. Violante. A Reputation-Based Approach for Choosing Reliable Resources in Peer-to-Peer Networks. In Proceedings of the 9th ACM Conference on Computer and Communications Security, 2002.
S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. Incentives for Combatting Freeriding on P2P Networks. Technical report, Stanford University, 2003.
Problem: attacks by anonymous malicious peers, e.g. introducing viruses…
Goal: identify malicious peers that provide inauthentic files
Based on each peer's previous behavior: its history of uploads
The EigenTrust Algorithm for Reputation Management in P2P Networks, Stanford University, WWW03
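In EigenTrust, a peer's upload history becomes local trust values. For peers i and j, s_ij = sat(i, j) - unsat(i, j) counts satisfactory minus unsatisfactory downloads, and it is normalized as c_ij = max(s_ij, 0) / sum_j max(s_ij, 0). This is a minimal sketch of that step. The matrix input format and the uniform fallback for a peer with no positive experience are assumptions. The paper falls back to a distribution over pre-trusted peers instead.

```python
def local_trust(sat, unsat):
    """Normalized local trust matrix c from per-peer transaction counts.

    sat[i][j] / unsat[i][j]: satisfactory / unsatisfactory downloads
    peer i has made from peer j.
    """
    n = len(sat)
    c = []
    for i in range(n):
        # s_ij = sat(i, j) - unsat(i, j), clipped at 0 before normalizing
        s = [max(sat[i][j] - unsat[i][j], 0) for j in range(n)]
        total = sum(s)
        if total == 0:
            # No positive experience yet: spread trust uniformly over all
            # peers (assumption; EigenTrust uses pre-trusted peers here).
            c.append([1.0 / n] * n)
        else:
            c.append([x / total for x in s])
    return c

sat = [[0, 5, 1], [2, 0, 0], [3, 3, 0]]
unsat = [[0, 1, 4], [0, 0, 0], [0, 1, 0]]
print(local_trust(sat, unsat))  # peer 0 gives peer 2 no trust (1 good vs 4 bad)
```

Clipping at zero means a peer that served mostly bad files gets no local trust at all. It does not get a negative score.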
Friends of Friends
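Problem: each peer has limited past experience and knows only a few other peers.
Idea: ask for the opinions of the peers you trust.
Compute a global trust value for a peer k by weighting each acquaintance's opinion of k by how much you trust that acquaintance:
$$ t_{ik} = \sum_{j} c_{ij}\, c_{jk} $$
where $c_{ij}$ is peer i's normalized local trust in peer j. For example, $c_{14}$ is how much peer 1 trusts peer 4, and $c_{42}$ is what peer 4 thinks of peer 2.
[Figure: local trust vectors of peers 1, 2, 4, 6 and 8]
By repeatedly asking friends of friends, $t_i = (C^T)^n c_i$. For large n, a peer will have a complete view of the network.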
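Repeatedly asking friends of friends is a power iteration t ← Cᵀt. For large n, it approaches the principal left eigenvector of C. This is a minimal sketch. It adds a small uniform mixing weight `a`, which is an assumption here. The EigenTrust paper mixes in a distribution over pre-trusted peers instead. Without `a`, the example matrix below makes the plain iteration oscillate between two states instead of converging.

```python
def eigentrust(c, a=0.15, eps=1e-10, max_iter=1000):
    """Global trust vector from a row-normalized local trust matrix c.

    Iterates t <- (1 - a) * C^T t + a * p, where p is uniform (assumed).
    With a = 0 this is exactly the friends-of-friends iteration t <- C^T t.
    """
    n = len(c)
    t = [1.0 / n] * n  # start from the uniform vector
    for _ in range(max_iter):
        # (C^T t)_k = sum_i c[i][k] * t[i]: everyone's opinion of k, weighted
        new_t = [(1 - a) * sum(c[i][k] * t[i] for i in range(n)) + a / n
                 for k in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new_t, t))
        t = new_t
        if delta < eps:
            break
    return t

C = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.6, 0.4, 0.0]]  # nobody trusts peer 2
print(eigentrust(C))
```

Peer 2 receives only the uniform share a/n because no peer places local trust in it. The entries of t still sum to 1 because every row of C sums to 1.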
What is the Semantic Web?
The Semantic Web: an extension of the current Web in which information is given well-defined meaning, enabling computers and humans to work in cooperation (Tim Berners-Lee, 1999); also known as Web 3.0
The Semantic Web gives computers the ability to process information automatically
Today's Web: user-centric, relies on human understanding; Semantic Web: computer understanding, by agents
Related References
D. L. McGuinness and P. Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Journal of Web Semantics, volume 1, pages 397–413, 2004.
M. Richardson, R. Agrawal, and P. Domingos. Trust Management for the Semantic Web. In Proceedings of the Second International Semantic Web Conference, pages 351–368, 2003.
M. Ceglowski, A. Coburn, and J. Cuadrado. Semantic Search of Unstructured Data Using Contextual Network Graphs. June 2003.
G. Gans, M. Jarke, S. Kethers, and G. Lakemeyer. Modeling the Impact of Trust and Distrust in Agent Networks. In Proceedings of the Third International Bi-Conference Workshop on Agent-Oriented Information Systems, Montreal, Canada, May 2001.
J. Golbeck, B. Parsia, and J. Hendler. Trust Networks on the Semantic Web. In Proceedings of Cooperative Intelligent Agents, Helsinki, Finland, August 2003.
Trust Management for the Semantic Web
The Semantic Web is a large, uncensored system to which anyone may contribute
Assume all information on the Semantic Web is expressed as logical assertions
Goal: establish the degree of belief in a statement, based on each source's belief in the statement and the user's trust in each source
Question: how can a user decide how much to trust a source she does not know directly?
Answer: employ a web of trust, in which each user maintains a small set of users he/she trusts
Trust Management for the Semantic Web, University of Washington, International Semantic Web Conference 2003
Web of Trust
[Figure: user i in a web of people/agents acting as information producers and consumers]
Local neighbors help in determining the trust of distant neighbors
Each user specifies a small set of users she/he trusts
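Richardson et al. derive trust in distant users by combining trust along paths of the web of trust. This is a minimal sketch of one possible choice of combination functions, which is an assumption: trust along a path is the product of its edge values, and trust across paths is the maximum. With edge values in [0, 1], products only shrink along a path, so a Dijkstra-style search finds the best path. The dictionary input format and the user names are hypothetical.

```python
import heapq

def propagate_trust(trust, user):
    """Derived trust of `user` in every user reachable in the web of trust.

    trust: dict mapping a user to {neighbor: direct trust in [0, 1]}.
    Path trust = product of edge values; combined trust = max over paths.
    """
    best = {user: 1.0}
    heap = [(-1.0, user)]  # max-heap via negated trust
    while heap:
        neg, u = heapq.heappop(heap)
        cur = -neg
        if cur < best.get(u, 0.0):
            continue  # stale entry: a better path to u was already found
        for v, w in trust.get(u, {}).items():
            cand = cur * w  # trust along the path extended by edge u -> v
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    best.pop(user)
    return best

web = {"alice": {"bob": 0.9, "carol": 0.5},
       "bob": {"dave": 0.8},
       "carol": {"dave": 0.9}}
print(propagate_trust(web, "alice"))  # dave reached via bob: 0.9 * 0.8
```

Each user specifies trust only in a few direct neighbors, yet alice still obtains a trust value for dave, whom she does not know directly.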
Social Network
Social network: a social structure made of nodes (generally individuals or organizations) tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friendship, kinship, ……
Social networks are graph-based structures: a (directed) network of people
Related References
L. Mui. Computational Models of Trust and Reputation: Agents, Evolutionary Games, and Social Networks. PhD thesis, MIT, 2002.
C.-N. Ziegler and G. Lausen. Propagation Models for Trust and Distrust in Social Networks. Information Systems Frontiers, 7(4/5):337–358, 2005.
R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of Trust and Distrust. In Proceedings of the Thirteenth International World Wide Web Conference, 2004.
Propagation of Trust and Distrust
Real-world experience suggests that distrust is at least as important as trust
Trust: useful, authentic information
Distrust: disinformation (useless, inauthentic, fraudulent information)
the eigenvector of the matrix of distrust values
Solution
n users
Two n × n matrices, T (trust) and D (distrust): t_ij is i's trust in j, with 0 ≤ t_ij ≤ 1; d_ij is defined the same way for distrust
Goal: predict the unknown values from T and D
M: generic belief matrix
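The prediction step can be sketched as repeated multiplication by a combined propagation matrix. The four atomic propagations are modelled on those described by Guha et al.: direct M, co-citation MᵀM, transpose Mᵀ, and coupling MMᵀ. This is a minimal sketch under stated assumptions. The belief matrix is taken to be M = T − D, and the weights `alphas` and the number of `steps` are arbitrary choices for illustration.

```python
def matmul(a, b):
    # Plain-Python product of an n x m matrix and an m x p matrix
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def propagate_beliefs(T, D, alphas=(0.4, 0.4, 0.1, 0.1), steps=2):
    """Propagated belief matrix; sign of entry (i, j) ~ i's predicted view of j.

    Assumes M = T - D and one step C = a1*M + a2*M^T M + a3*M^T + a4*M M^T.
    """
    n = len(T)
    M = [[T[i][j] - D[i][j] for j in range(n)] for i in range(n)]
    Mt = transpose(M)
    atoms = (M, matmul(Mt, M), Mt, matmul(M, Mt))
    C = [[sum(w * atom[i][j] for w, atom in zip(alphas, atoms))
          for j in range(n)] for i in range(n)]
    P = C
    for _ in range(steps - 1):
        P = matmul(P, C)  # one more step of propagation
    return P

T = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # user 1 trusts user 2
D = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]  # user 0 distrusts user 1
P = propagate_beliefs(T, D, alphas=(1, 0, 0, 0), steps=2)
print(P[0][2])  # user 0 is predicted to distrust user 2
```

With only direct propagation, distrusting someone who trusts user 2 yields distrust of user 2. The same sign rule also means that two distrust edges multiply into trust ("the enemy of my enemy"), which is one reason the choice of M matters.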
E-commerce and Recommendation Systems
……
Related References
J. Staddon and R. Chow. Detecting Reviewer Bias Through Web-Based Association Mining. PARC. WICOW08
P. Kollock. The production of trust in online markets. In E. J. Lawler and M. Macy, S. Thyne, and H. A. Walker, editors, Advances in Group Processes, volume 16, pages 99–123. JAI Press, 1999.
S. Nakamura, M. Shimizu, and K. Tanaka. Can Social Annotation Support Users in Evaluating the Trustworthiness of Video Clips? Graduate School of Informatics, Kyoto University. WICOW08
N. Wanas, M. El-Saban, H. Ashour, and W. Ammar. Automatic Scoring of Online Discussion Posts. Cairo Microsoft Innovation Center. WICOW08
S. Ba, A. B. Whinston, and H. Zhang. Building trust in online auction markets through an economic incentive mechanism. Decision Support Systems, 35(3):273–286, 2002.
User Scoring
Collaborative intelligence: find the posts that are worth attending to
Post rating: five-point scale
Goal: automatically assess online discussion posts, enabling automatic content filtering in online discussion forums
Support Vector Machine (SVM) classifier
Automatic Scoring of Online Discussion Posts, Cairo Microsoft Innovation Center, WICOW08
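The user-rating side of post scoring can be illustrated without the classifier. This is a minimal sketch that does not come from the paper. It ranks posts by their five-point user ratings, smoothed toward the global mean (a Bayesian average), so that a post with a single 5-star vote does not outrank a well-rated post with many votes. The `prior_weight` parameter and the example ratings are hypothetical.

```python
def rank_posts(ratings, prior_weight=2):
    """Rank posts (best first) by a smoothed mean of their 1-5 ratings.

    ratings: dict mapping a post id to its list of user ratings.
    """
    all_ratings = [r for rs in ratings.values() for r in rs]
    if any(r not in (1, 2, 3, 4, 5) for r in all_ratings):
        raise ValueError("ratings must be on the five-point scale 1..5")
    if not all_ratings:
        return list(ratings)
    global_mean = sum(all_ratings) / len(all_ratings)

    def score(rs):
        # Bayesian average: posts with few ratings are pulled toward the mean
        return (prior_weight * global_mean + sum(rs)) / (prior_weight + len(rs))

    return sorted(ratings, key=lambda post: score(ratings[post]), reverse=True)

ratings = {"p1": [5], "p2": [4, 5, 5, 4, 5], "p3": [2, 1, 3]}
print(rank_posts(ratings))
```

Ranked by raw mean, p1 would come first on the strength of one vote. The smoothing places the consistently well-rated p2 ahead of it.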
Solution
Aim: detect potential bias and assess the validity of online reviews
Bring the broader context of the reviewer into the online community: mine association rules between book reviewers and the authors of the books they review
An association rule over A and B, the reviewer and the author of the same book, is strong when Pr(B|A) is large and Pr(A∧B) is large
Evidence: their names frequently co-occur
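The two probabilities are the confidence Pr(B|A) and the support Pr(A∧B) of an association rule. This is a minimal sketch that assumes a small hypothetical collection of documents, each reduced to the set of names it mentions, in place of the Web. The thresholds 0.2 and 0.6 are arbitrary choices for illustration.

```python
def rule_strength(docs, a, b):
    """Support Pr(A and B) and confidence Pr(B | A) over name co-occurrences."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    support = n_ab / n if n else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence

def flag_bias(docs, reviewer, author, min_support=0.2, min_confidence=0.6):
    # Flag a review as potentially biased when both measures are large
    support, confidence = rule_strength(docs, reviewer, author)
    return support >= min_support and confidence >= min_confidence

# Hypothetical documents, each reduced to the names it mentions
docs = [{"reviewer_x", "author_y"}, {"reviewer_x", "author_y", "other"},
        {"reviewer_x"}, {"author_y"}, {"someone_else"}]
print(rule_strength(docs, "reviewer_x", "author_y"))
print(flag_bias(docs, "reviewer_x", "author_y"))
```

Requiring both measures avoids two kinds of false alarm: a high confidence that rests on a single co-occurrence, and a high support that only reflects two very common names.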
Wikipedia
An emerging pattern for building large information repositories:
encourage many people to collaborate in a distributed manner to create and maintain a repository of shared content
Open editing: allows users to freely create and edit web pages
Related References
M. Blaze, J. Feigenbaum, and J. Lacy. Decentralized Trust Management. In Proceedings of the 1996 IEEE Symposium on Security and Privacy, pages 164–173, 1996.
D. L. McGuinness, H. Zeng, P. Pinheiro da Silva, et al. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. WWW 2006.
R. Lopes and L. Carriço. On the Credibility of Wikipedia: An Accessibility Perspective. WICOW 2008.
B. T. Adler and L. de Alfaro. A Content-Driven Reputation System for the Wikipedia. WWW 2007.
M. Hu, E.-P. Lim, A. Sun, H. W. Lauw, and B.-Q. Vuong. Measuring article quality in wikipedia: models and evaluation. In CIKM ’07.
Trust in Social Collaborative Information Spaces
Concepts: Article, Version (of an article), Fragment, Author
Relations:
An article has multiple versions
A version has multiple fragments
A fragment has one author
A version has multiple authors
[Figure: entity diagram Article, Version, Fragment, Author, annotated with cardinalities 1:n, 1:n, 1:n, 1:1]
Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study, Stanford University, WWW06
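The four concepts and their relations map naturally onto a small data model. This is a minimal sketch using Python dataclasses. The class and field names are hypothetical and are not taken from the paper. The relation "a version has multiple authors" is derived from the authors of its fragments instead of being stored separately.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Author:
    name: str

@dataclass
class Fragment:
    text: str
    author: Author  # each fragment has exactly one author

@dataclass
class Version:
    fragments: List[Fragment] = field(default_factory=list)  # many fragments

    def authors(self) -> List[str]:
        # A version has many authors: collect them from its fragments, in order
        names: List[str] = []
        for fragment in self.fragments:
            if fragment.author.name not in names:
                names.append(fragment.author.name)
        return names

@dataclass
class Article:
    title: str
    versions: List[Version] = field(default_factory=list)  # many versions

alice, bob = Author("alice"), Author("bob")
v1 = Version([Fragment("intro", alice), Fragment("history", bob),
              Fragment("summary", alice)])
article = Article("Example", [v1])
print(article.versions[0].authors())  # ['alice', 'bob']
```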
Deriving Trust from Revision History
Revision history is widely available in cooperative information systems
Revision operations (insertion, deletion, modification) imply trust
The trustworthiness of the revised article depends on:
the trustworthiness of the previous version
the author of the last revision
the modified content involved in the fragment
Wikipedia Article with Citation Trust View
[Figure: a citation revision; fragments are colored according to their trust values computed from citation trust]
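The three dependencies above can be illustrated with a toy revision model. This is a minimal sketch and not the paper's formula. Fragments kept from the previous version keep their trust, fragments inserted by the reviser take the reviser's trust, and deleted fragments disappear. A modification is treated as a deletion plus an insertion. The version's trust is the length-weighted mean of its fragments' trust. All names and values are hypothetical.

```python
def revise(prev, kept_texts, inserted_texts, author_trust):
    """Fragments of the new version as (text, trust) pairs.

    prev: (text, trust) pairs of the previous version.
    Kept fragments inherit their previous trust; inserted fragments get
    the trust of the author of this revision; deleted ones are dropped.
    """
    prev_trust = dict(prev)
    fragments = [(text, prev_trust[text]) for text in kept_texts]
    fragments += [(text, author_trust) for text in inserted_texts]
    return fragments

def version_trust(fragments):
    """Length-weighted mean of fragment trust values."""
    total = sum(len(text) for text, _ in fragments)
    if total == 0:
        return 0.0
    return sum(len(text) * trust for text, trust in fragments) / total

v1 = [("Fragment A", 0.9), ("Fragment B", 0.2)]
# The reviser (trust 0.6) keeps A, deletes B, and inserts C
v2 = revise(v1, kept_texts=["Fragment A"], inserted_texts=["Fragment C"],
            author_trust=0.6)
print(v2, version_trust(v2))
```

Per-fragment trust values like these are what a trust view can use to color each fragment.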
Information Credibility Assessment and Evaluation
Related References
I. Askira Gelman and A. L. Barletta. A “Quick and Dirty” Website Data Quality Indicator. University of Arizona. WICOW 2008.
L. C. M. Tang, Y. Zhao, and S. Austin. A Characteristic Based Information Evaluation Model. Loughborough University. WICOW 2008.
Idea: the spelling error rate indicates the quality of a document
Applications: social forum exchanges, personal websites, Wikipedia, etc.
Method: a minimal set E of spelling errors (10 common English misspellings) and the hit counts of search engine queries on E
positively related
literature on web data credibility assessment
A “Quick and Dirty” Website Data Quality Indicator, University of Arizona, WICOW08, short paper
Spelling errors in E: Recieve, Accomodate, Accross, Truely, Acheive, Affraid, Agressive, Appearence, Tomorow, Arguement
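$$ \mathrm{ErrorIndex}(d) = \sum_{j} \frac{\mathrm{HitCount}(e_j, d)}{\mathrm{HitCount}(e_j, d) + \mathrm{HitCount}(c_j, d) + 1} $$
where e_j is the j-th misspelling in E, c_j is its correct spelling, and HitCount(w, d) is the number of search-engine hits for w on website d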
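This is a minimal sketch of the ErrorIndex on this slide. It reads e_j as the j-th misspelling in E, c_j as its correct spelling, and d as the website being assessed, which is our reading of the slide formula. No real search-engine API is called. The caller supplies a hypothetical `hit_count` function, backed here by made-up counts.

```python
# The 10 misspellings e_j from the slide, mapped to their correct forms c_j
MISSPELLINGS = {
    "recieve": "receive", "accomodate": "accommodate", "accross": "across",
    "truely": "truly", "acheive": "achieve", "affraid": "afraid",
    "agressive": "aggressive", "appearence": "appearance",
    "tomorow": "tomorrow", "arguement": "argument",
}

def error_index(hit_count):
    """ErrorIndex(d) for one site d.

    hit_count(word) -> number of search hits for `word` restricted to d
    (supplied by the caller). The +1 keeps the ratio defined when a
    word pair never occurs on the site.
    """
    return sum(hit_count(e) / (hit_count(e) + hit_count(c) + 1)
               for e, c in MISSPELLINGS.items())

# Made-up hit counts for two hypothetical sites
sloppy_site = {"recieve": 9, "receive": 90}
clean_site = {"receive": 120, "across": 300}
print(error_index(lambda w: sloppy_site.get(w, 0)))  # 9 / (9 + 90 + 1)
print(error_index(lambda w: clean_site.get(w, 0)))
```

A higher index means a larger share of misspelled forms on the site, which the slide links to lower document quality.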
Characteristic Based Information Evaluation
A Characteristic Based Information Evaluation Model, Department of Civil and Building Engineering, Loughborough University, WICOW08
Information has many characteristics; quantify them to obtain a Value of Information (VOI)
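One simple way to turn characteristic scores into a single VOI is a weighted average. This is a minimal sketch and an assumption. It is not taken from the paper. The characteristic names, scores in [0, 1], and weights are hypothetical.

```python
def value_of_information(scores, weights):
    """Weighted average of characteristic scores in [0, 1] (hypothetical model)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same characteristics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[k] * scores[k] for k in scores) / total_weight

# Hypothetical characteristics of one piece of information
scores = {"accuracy": 0.9, "timeliness": 0.5}
weights = {"accuracy": 3, "timeliness": 1}
print(value_of_information(scores, weights))  # (3*0.9 + 1*0.5) / 4
```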
Summary
Network structures (Semantic Web, P2P network, social network): web of trust, trust value matrix
User-contributed content (online discussion forums, collaborative repositories such as Wikipedia)
Methods: scoring by users' comments; ranking by trust value; machine learning classifiers; influence propagation; graph mining; rating models; scoring mechanisms