FeedWiz: Using Automated Document Clustering to “Map the Blogosphere”

David Schuff, Temple University

Ozgur Turetken, Ryerson University

The role of weblogs

Increasingly important mode of discourse
Is this really the “new media”?

The consequences

Proliferation of information
  Easy self-publishing
  Proliferation of content

Leads to a “silo effect”
  Limited information diet of only a few blogs
  Will tend to seek out confirmatory points of view

Our area of interest is news and political blogs. Not a blog about Paris Hilton (yes, there is one).

The consequences

(Strict) filtering is seen as a threat to public discourse and democracy (Sunstein 2004)

At least, the true potential of the blogosphere is not being realized

The power law distribution

A relationship between two variables in which one varies as a power of the other

Used to explain website popularity

Figure (right): number of inbound links by weblog (2002)

Source: http://www.shirky.com/writings/powerlaw_weblog.html (Shirky 2003)

The top 3% of the political blog sites accounted for 20% of the inbound links
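
To make this kind of concentration concrete, here is a minimal sketch, assuming inbound links follow an idealized power law in which the blog at rank r receives links proportional to r^(-alpha). The link counts are synthetic, and `alpha`, `scale`, `n_blogs`, and both function names are illustrative assumptions rather than values taken from Shirky's data.

```python
def power_law_links(n_blogs: int, alpha: float = 1.0, scale: float = 1000.0) -> list[float]:
    """Synthetic inbound-link counts: the blog at rank r gets scale * r**(-alpha)."""
    return [scale * rank ** (-alpha) for rank in range(1, n_blogs + 1)]


def top_share(links: list[float], top_fraction: float) -> float:
    """Fraction of all inbound links held by the top `top_fraction` of blogs."""
    ranked = sorted(links, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # always count at least one blog
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # 400 blogs and alpha=1.0 are arbitrary choices for illustration only.
    links = power_law_links(n_blogs=400, alpha=1.0)
    print(f"Top 3% share of inbound links: {top_share(links, 0.03):.0%}")
```

With a uniform distribution the top 3% of blogs would hold 3% of the links; under a power law the share is far larger, which is the point of the slide.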

The decision support and information systems context

A key challenge is to create tools that help “filter, sort, and navigate” the blogosphere (Cayzer 2004)

Blogging is essentially a form of computer-mediated communication (CMC) (Tan et al. 2005)
  Can facilitate “common understanding”

The formation of an opinion is essentially a decision-making issue

Research question

How can information presentation techniques be used to improve information consumption on the blogosphere?

Our proposition: This can be done by presenting information organized by content, not by author (or site)

What we’re drawing from

Chunking and semantic networks (Miller 1956, Mandler 1967, Quillian 1968, Collins and Quillian 1969)

Clustering of text-based documents (Chen et al. 1996, Chen et al. 1998, Pirolli et al. 1996, Spangler et al. 2003, Roussinov and Chen 2001, Turetken and Sharda 2004)

Information visualization
  “Preattentive” extraction of information (Bray 1996)
  Size and color (Shneiderman 1994)

FeedWiz (demo)

Live demo…

How it works…

Select/create a list of weblogs

Navigate clusters of blog entries

Browse the individual clusters

Study 1 design

Quasi-experiment (semi-controlled)

Two groups of subjects
  Both given a list of weblogs
  Group A: Given an ordered list of URLs
  Group B: Given FeedWiz

Measuring effectiveness

Study how attitudes change (OXO design)
  O: Ask subjects’ opinion on an issue (e.g., hybrid cars)
  X: Give subjects an hour to read the list of blogs
  O: Ask subjects again for their opinion on that issue

Measuring…
  Opinion (agree/disagree and supporting rationale)
  Level of conviction
  Sources (blogs) used to form the opinion

Hypotheses

H1: In forming their opinions, FeedWiz users will use more sources than those who use an ordered list

H2: FeedWiz users will be more likely to change their opinions than those who use an ordered list

H3: FeedWiz users will be less likely to form strong opinions than those who use an ordered list

Study 2 design

Intensive data collection with small sample
  Tracking of eye movements
  Recording verbal comments

Protocol analysis
  For further insights on usability of tool

Expected contributions

Investigate how opinions are formed from blogs

Understand how information presentation techniques can influence information consumption
  Implications for public discourse on the web

Creation of a highly usable tool which demonstrates those techniques

References

Bray, T. (1996). Measuring the Web. In Proceedings of the Fifth International World Wide Web Conference, Paris, France.

Cayzer, S. (2004). Semantic blogging and decentralized knowledge management. Communications of the ACM, 47(12), 47-52.

Chen, H., Nunamaker, J., Orwig, R.E., & Titkova, O. (1998). Information visualization for collaborative computing. IEEE Computer, 31(8), 75-82.

Chen, H., Schuffels, C., & Orwig, R.E. (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88-102.

Collins, A.M. & Quillian, M.R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 240-247.

Mandler, G. (1967). Organization in memory. In K. W. Spence, & J. T. Spence (Eds.), The Psychology of Learning and Motivation (pp. 327-372). New York, NY: Academic Press.

Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/gather browsing communicates the topic structure of a very large text collection. In Proceedings of the Conference on Human Factors in Computing Systems, New York, NY: ACM Press, 213-220.

Quillian, M.R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic Information Processing (pp. 227-270), Cambridge, MA: The MIT Press.

Roussinov, D.G. & Chen, H. (2001). Information navigation on the web by clustering and summarizing query results. Information Processing and Management, 37(6), 789-817.

Shirky, C. (2003). Power laws, weblogs, and inequality. Accessed September 26, 2006 from http://www.shirky.com/writings/powerlaw_weblog.html.

Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11(6), 70.

Spangler, S., Kreulen, J.T., & Lessler, J. (2003). Generating and browsing multiple taxonomies over a document collection. Journal of Management Information Systems, 19(4), 191-212.

Sunstein, C.R. (2004). Democracy and filtering. Communications of the ACM, 47(12), 57-59.

Tan, C., Goswami, S., Chan, Y., & Zhong, Y. (2005). Conceptual evaluation of weblog as a computer-mediated communication application. In Proceedings from the 11th Americas Conference on Information Systems, Omaha, NE, 2361-2367.

Turetken, O. & Sharda, R. (2004). Development of a fisheye-based information search processing aid (FISPA) for managing information overload in the web environment. Decision Support Systems, 37(3), 415-434.

Appendix: How FeedWiz Works

FeedWiz Application Architecture

FeedWiz Application Server
  Hierarchical Clustering Module: Intelligent Miner for Text
  Feed Aggregation Module: .NET Web Service (C#)

Weblog sites (RSS feeds)

FeedWiz Client: Flash application
  List of blog URLs
  Hierarchy (XML) and individual posts
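
The feed aggregation step can be sketched as follows. This is a minimal Python illustration, not the FeedWiz implementation, which the architecture slide lists as a .NET Web Service (C#). It assumes the weblogs publish RSS 2.0 feeds; the function name `parse_rss_items` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for a real weblog feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hybrid cars and fuel prices.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Election coverage roundup.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(feed_xml: str) -> list[dict[str, str]]:
    """Extract blog title, post title, link, and text from an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:  # not an RSS 2.0 feed, so there is nothing to aggregate
        return []
    blog = channel.findtext("title", default="")
    posts = []
    for item in channel.findall("item"):
        posts.append({
            "blog": blog,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in parse_rss_items(SAMPLE_FEED):
        print(post["blog"], "|", post["title"], "|", post["link"])
```

In a real aggregator the feed text would be downloaded from each URL in the user's list (for example with `urllib.request.urlopen`) before being passed to the parser, and the extracted posts would then be handed to the clustering module.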

Appendix: How the documents are clustered

Blog posts are saved as text files on the FeedWiz server

Those files are grouped into clusters based on similarity

An output file is generated that describes the hierarchy

[Figure: Hierarchical Clustering Module, showing the original collection followed by the 1st, 2nd, 3rd, … nth iterations]
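
The three steps listed on this slide (saved post texts, grouping by similarity, an output describing the hierarchy) can be illustrated with a small sketch. It is not the algorithm used by Intelligent Miner for Text, whose internals the slides do not describe. It assumes a simple agglomerative scheme with bag-of-words cosine similarity, and the XML element names `cluster` and `post`, the function names, and the sample documents are all hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs: dict[str, str]) -> ET.Element:
    """Agglomerative clustering: repeatedly merge the two most similar nodes.

    Returns an XML element whose leaves are <post name="..."> elements and
    whose internal nodes are <cluster> elements (an assumed output format).
    """
    if not docs:
        raise ValueError("need at least one document")
    # Each node pairs an XML element with the summed term vector of its posts.
    nodes = [(ET.Element("post", name=name), vectorize(text)) for name, text in docs.items()]
    while len(nodes) > 1:
        # Pick the pair of nodes with the highest cosine similarity.
        i, j = max(
            ((a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))),
            key=lambda pair: cosine(nodes[pair[0]][1], nodes[pair[1]][1]),
        )
        (elem_i, vec_i), (elem_j, vec_j) = nodes[i], nodes[j]
        parent = ET.Element("cluster")
        parent.extend([elem_i, elem_j])
        # Replace the two merged nodes with their parent.
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [(parent, vec_i + vec_j)]
    return nodes[0][0]


if __name__ == "__main__":
    sample = {
        "a.txt": "hybrid cars save fuel",
        "b.txt": "hybrid cars cost more fuel",
        "c.txt": "election results and polls",
        "d.txt": "polls before the election",
    }
    print(ET.tostring(cluster(sample), encoding="unicode"))
```

Each pass of the loop merges one pair, so the hierarchy is built bottom-up. A production system would typically weight terms (for example with TF-IDF) and remove stop words before comparing documents.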