SURFACING REAL-WORLD EVENT CONTENT ON TWITTER

Hila Becker, Luis Gravano (Columbia University)

Mor Naaman (Rutgers University)

Upload: hila-becker

Post on 06-May-2015

DESCRIPTION

Talk given at Google NYC on October 15th, 2010.

TRANSCRIPT

Page 3: Surfacing Real-World Event Content on Twitter

Event Content in Social Media

Smaller events, without traditional news coverage

Popular, widely known events

Page 4: Surfacing Real-World Event Content on Twitter

Event Content in Social Media

Discovery

Detecting events using features of social media content (e.g., term statistics)

Mining content from known event sources (e.g., user-contributed event databases)

Organization

Associating social media content with events

Identifying similar content within and across sites

Presentation

Selecting what content to display to a user

Providing interfaces that summarize and aggregate the content along different dimensions

Page 7: Surfacing Real-World Event Content on Twitter

Identifying Events in Social Media

Timeliness

Real-time

Retrospective

(Prospective)

Content discovery

Known properties

Event databases (e.g., Upcoming, Eventful)

Keyword triggers (e.g., “earthquake”)

Shared calendars

Unknown properties

Page 9: Surfacing Real-World Event Content on Twitter

Identifying Events in Social Media

[Figure: 2x2 matrix of approaches. Horizontal axis (Timeliness): Real-time, Retrospective. Vertical axis (Content Discovery): Known, Unknown.]

Related work placed in the matrix:

Twitter new event detection [Petrović et al. NAACL’10]

Event detection on Flickr [Chen and Roy CIKM’09]

Earthquake prediction using Twitter [Sakaki et al. WWW’10]

Organization of YouTube concert videos [Kennedy and Naaman WWW’09]

Page 14: Surfacing Real-World Event Content on Twitter

Identifying Events in Social Media

[Figure: 2x2 matrix of approaches. Horizontal axis (Timeliness): Real-time, Retrospective. Vertical axis (Content Discovery): Known, Unknown.]

Work placed in the matrix:

Learning similarity metrics for event identification on Flickr [Becker et al. WSDM’10]

Surfacing events on Twitter

Identifying Twitter content for planned events

Connecting events across sites (e.g., YouTube, Picasa)

Page 19: Surfacing Real-World Event Content on Twitter

Twitter Content

Streams of textual messages

Brief content (140 characters)

Communicated to network of followers

Page 20: Surfacing Real-World Event Content on Twitter

Twitter Trending Topics

Twitter trending topics, September 24, 2010 7:00am

Page 21: Surfacing Real-World Event Content on Twitter

Twitter Trending Topics

Twitter trending topics, September 24, 2010 7:00am

[Annotations on the trending topics list: Recurring; Twitter-centric; Confusing; Real-World Events?]

Page 26: Surfacing Real-World Event Content on Twitter

Identifying Events on Twitter

Challenges:

Wide variety of topics, not all related to events (e.g., morning greetings, “thank you” messages)

Low-quality text: abbreviations, unconventional language, typos, and grammatical errors

Opportunities:

Content generated in real-time as events happen

Time and location information

Page 28: Surfacing Real-World Event Content on Twitter

Events on Twitter

Types of events on Twitter

Exogenous: Real-world occurrences (e.g., Super Bowl, “Lost” finale)

Endogenous: Specific to the Twitter-verse (e.g., #thingsyoushouldntsay meme, RT statement by Lady Gaga)

Event:

One or more terms and a time period

Volume of messages posted for the terms in the time period exceeds some expected level of activity
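
The event definition above can be read as a simple burst test. The sketch below is only an illustration: the slides do not say how the "expected level of activity" is modeled, so it assumes the expected level is the mean of past hourly counts plus a multiple of their standard deviation. The function name, the factor 3.0, and the sample counts are all hypothetical.

```python
from statistics import mean, pstdev


def exceeds_expected(history, current, num_std=3.0):
    """Return True if `current` volume exceeds the expected level.

    Assumption (not from the talk): expected level is
    mean(history) + num_std * standard deviation of history.
    """
    if not history:
        return False
    expected = mean(history) + num_std * pstdev(history)
    return current > expected


# Hypothetical hourly message counts for one set of terms.
past_hours = [12, 9, 11, 10, 13, 8]
burst = exceeds_expected(past_hours, 60)    # a sharp spike
normal = exceeds_expected(past_hours, 12)   # ordinary activity
```

With these sample counts the spike of 60 messages is flagged and the count of 12 is not; a real system would need a baseline that accounts for daily and weekly cycles.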

Page 30: Surfacing Real-World Event Content on Twitter

Real-Time Unsupervised Event Identification on Twitter

Organization

Content representation: text, time, location

Group similar content via clustering

Discovery

Extract discriminating features of clusters

Build an event classifier

Presentation

Select content for each event

Evaluate the quality, relevance, and usefulness

Page 34: Surfacing Real-World Event Content on Twitter

Surfacing Event Content on Twitter

[Figure: processing pipeline, built up over several slides: Tweets -> Tweet Clusters -> Event Clusters -> Selected Tweets]

Page 42: Surfacing Real-World Event Content on Twitter

Organizing Tweets in Real-Time

Order tweets by post time

Use TF-IDF vector representation of textual content

Stop word elimination

Stemming

Enhanced weight for hashtags (#tag)

IDF computed over past data

Separate tweets by location

Focus on tweets from NYC

Different locations can be processed in parallel
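
The representation step above could look roughly like the following. This is a hedged sketch, not the speakers' implementation: the stop-word list is a tiny placeholder, the hashtag boost of 2.0 and the smoothed IDF formula are assumptions (the slides give no values), and stemming is left out to keep the example dependency-free.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "at", "to", "of", "in", "and", "on"}  # placeholder list
HASHTAG_BOOST = 2.0  # assumed value; the slides only say hashtags get enhanced weight


def tokenize(text):
    """Lowercase, keep hashtags as single tokens, drop stop words (no stemming)."""
    tokens = re.findall(r"#?\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def compute_idf(past_tweets):
    """IDF computed over past data, with add-one smoothing (an assumption)."""
    n_docs = len(past_tweets)
    df = Counter()
    for tweet in past_tweets:
        df.update(set(tokenize(tweet)))
    idf = {term: math.log((n_docs + 1) / (count + 1)) + 1.0 for term, count in df.items()}
    return idf, n_docs


def tfidf_vector(text, idf, n_docs):
    """Sparse TF-IDF vector as a dict; terms unseen in past data get the maximum IDF."""
    default_idf = math.log(n_docs + 1) + 1.0
    vec = {}
    for term, tf in Counter(tokenize(text)).items():
        weight = tf * idf.get(term, default_idf)
        if term.startswith("#"):
            weight *= HASHTAG_BOOST
        vec[term] = weight
    return vec


# Hypothetical past data and incoming tweet.
past = ["snow in the city today", "the game is on tonight", "snow day #nyc"]
idf, n = compute_idf(past)
vec = tfidf_vector("Snow at the #superbowl party", idf, n)
```

Because the IDF table is built from past data only, each incoming tweet can be vectorized as it arrives, which fits the time-ordered, per-location processing described on the slide.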

Page 45: Surfacing Real-World Event Content on Twitter

Clustering Algorithm

Many alternatives possible! [Berkhin 2002]

Single-pass incremental clustering algorithm

Scalable, online solution

Used effectively for

Event identification in textual news [Allan et al. 1998]

News event detection on Twitter [Sankaranarayanan et al. 2009]

Does not require a priori knowledge of number of clusters

Known fragmentation issue, often solved with a periodic second pass
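
A single-pass incremental clustering loop can be sketched as follows. It is a generic illustration under stated assumptions: vectors are sparse dicts, similarity is cosine against the cluster centroid, and the threshold of 0.5 is arbitrary. The periodic second pass mentioned above is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(vectors, threshold=0.5):
    """Assign each vector, in arrival order, to the most similar cluster,
    or start a new cluster when no similarity reaches `threshold`."""
    clusters = []  # each cluster: {"sum": summed vector, "members": [indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            # Cosine is scale-invariant, so the summed vector stands in for the mean.
            sim = cosine(vec, cluster["sum"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            for term, weight in vec.items():
                best["sum"][term] = best["sum"].get(term, 0.0) + weight
            best["members"].append(i)
        else:
            clusters.append({"sum": dict(vec), "members": [i]})
    return clusters


# Hypothetical stream of four tweet vectors (unit term weights for brevity).
stream = [
    {"earthquake": 1.0, "haiti": 1.0},
    {"earthquake": 1.0, "haiti": 1.0, "relief": 1.0},
    {"katie": 1.0, "couric": 1.0},
    {"couric": 1.0, "obama": 1.0, "katie": 1.0},
]
groups = [c["members"] for c in single_pass_cluster(stream)]
```

Each tweet is compared once against the existing centroids, so the number of clusters never has to be fixed in advance; the cost is that one event may be split across several clusters, which is the fragmentation issue noted above.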

Page 47: Surfacing Real-World Event Content on Twitter

Overview of Cluster-based Approach

Group similar tweets via online clustering

Compute statistics of cluster content

Top terms (e.g., [earthquake, haiti])

Number of documents per hour

Use cluster-level features to identify event clusters

Single feature with threshold (e.g., increase in volume over time-window)

Trained classification model
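
The cluster statistics and the single-feature rule above might be computed along these lines. This is a sketch under assumptions: tweets are already tokenized, timestamps are given in hours since the start of the stream, and the window of 3 hours and ratio of 2.0 are made-up parameters.

```python
from collections import Counter


def top_terms(cluster_tokens, k=2):
    """Most frequent terms across the tokenized tweets of one cluster."""
    counts = Counter(term for tokens in cluster_tokens for term in tokens)
    return [term for term, _ in counts.most_common(k)]


def docs_per_hour(timestamps_hours):
    """Map hour index -> number of tweets posted in that hour."""
    return Counter(int(h) for h in timestamps_hours)


def is_event_by_volume(hourly, current_hour, window=3, min_ratio=2.0):
    """Single-feature rule: flag the cluster when the current hour's volume is
    at least `min_ratio` times the average of the preceding `window` hours."""
    previous = [hourly.get(current_hour - i, 0) for i in range(1, window + 1)]
    baseline = max(sum(previous) / window, 1.0)  # floor avoids flagging tiny clusters
    return hourly.get(current_hour, 0) >= min_ratio * baseline


# Hypothetical cluster content and post times.
tokens = [["earthquake", "haiti", "relief"], ["earthquake", "haiti"], ["haiti", "earthquake", "news"]]
terms = top_terms(tokens, k=2)
hourly = docs_per_hour([0.2, 1.5, 2.1, 3.0, 3.2, 3.4, 3.9, 3.95, 3.99])
flag_now = is_event_by_volume(hourly, current_hour=3)
flag_before = is_event_by_volume(hourly, current_hour=2)
```

A trained classification model would take several such statistics together as a feature vector instead of thresholding one of them.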

Page 51: Surfacing Real-World Event Content on Twitter

Social Interaction Features

Retweets

RT @username

Often characterize Twitter-specific events

Replies

Tweet starts with @username

Possible indication of non-event content

Mentions

@username anywhere in the tweet

Reference to Twitter users that might be part of an event
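
The three interaction types above can be counted per cluster with simple patterns. The regular expressions below follow the conventions named on the slide (RT @username, leading @username, @username anywhere); the function name and the sample tweets are hypothetical, and real tweets would need more careful handling.

```python
import re

RETWEET = re.compile(r"\bRT @\w+")   # "RT @username" anywhere in the text
REPLY = re.compile(r"^@\w+")         # tweet starts with @username
MENTION = re.compile(r"@\w+")        # @username anywhere in the tweet


def social_features(tweets):
    """Fraction of tweets in a cluster that are retweets, replies, or contain mentions.

    Note: by this definition retweets and replies also count as mentions.
    """
    n = len(tweets)
    if n == 0:
        return {"retweets": 0.0, "replies": 0.0, "mentions": 0.0}
    return {
        "retweets": sum(bool(RETWEET.search(t)) for t in tweets) / n,
        "replies": sum(bool(REPLY.match(t)) for t in tweets) / n,
        "mentions": sum(bool(MENTION.search(t)) for t in tweets) / n,
    }


# Hypothetical cluster of four tweets.
sample = [
    "RT @user1: big news tonight",
    "@user2 thank you!",
    "Watching the game with @user3",
    "Good morning",
]
features = social_features(sample)
```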

Page 54: Surfacing Real-World Event Content on Twitter

Topic Coherence

Intuition: clusters with strong inter-document similarity may contain event information

Example cluster terms: Class, Today, Early, Work, Sleep, Start

Tweets in this cluster:

I’m gonna do my best to go sleep during all my classes today =)

Starting work early today. Looking fwd to cooking class tonight!

Today starts the rest of my life…

Example cluster terms: Katie, Couric, President, Obama, Interview, CBS

Tweets in this cluster:

Katie Couric Interview With President Obama http://bit.ly/bRsGPo

The Katie Couric-President Obama interview has now begun on CBS

Katie Couric interviews President Obama during CBS' Super Bowl pregame coverage
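
One way to turn this intuition into a number is the average similarity of each tweet to the cluster centroid. The sketch below is an assumption about how coherence could be measured, not necessarily the measure used in the talk; it uses raw term counts and hand-tokenized versions of the two example clusters.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(token_lists):
    """Average cosine similarity of each tweet to the summed cluster vector."""
    vectors = [Counter(tokens) for tokens in token_lists]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    return sum(cosine(vec, centroid) for vec in vectors) / len(vectors)


# Simplified, hand-tokenized versions of the two example clusters.
everyday = [
    ["sleep", "class", "today"],
    ["start", "work", "early", "today", "cook", "class", "tonight"],
    ["today", "start", "rest", "life"],
]
interview = [
    ["katie", "couric", "interview", "president", "obama"],
    ["katie", "couric", "president", "obama", "interview", "cbs"],
    ["katie", "couric", "interview", "president", "obama", "cbs"],
]
everyday_score = coherence(everyday)
interview_score = coherence(interview)
```

On these inputs the interview cluster scores higher than the everyday-chatter cluster, matching the intuition that tweets about one event repeat the same terms.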

Page 55: Surfacing Real-World Event Content on Twitter

Trending Behavior

Trending characteristics of top terms in cluster:

Exponential fit

Deviation from expected volume

[Figure: volume over time for the term “valentine”. x-axis: time (hours); y-axis: documents]
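
An exponential fit to a term's hourly volume can be approximated by a least-squares line through the logarithm of the counts. This is a minimal sketch assuming hourly counts as input and add-one smoothing for empty hours; the slides do not describe the fitting procedure, so the function and its details are illustrative.

```python
import math


def exponential_growth_rate(counts):
    """Fit log(count + 1) = a + b * t by least squares and return the slope b.

    A clearly positive slope suggests exponentially growing volume.
    """
    n = len(counts)
    if n < 2:
        return 0.0
    ys = [math.log(c + 1) for c in counts]
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


# Hypothetical hourly volumes: one doubling each hour, one flat.
growing = exponential_growth_rate([1, 3, 7, 15, 31])
flat = exponential_growth_rate([5, 5, 5, 5])
```

For the doubling series the slope equals ln 2 (about 0.69 per hour), while the flat series gives a slope of zero.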

Page 56: Surfacing Real-World Event Content on Twitter

Twitter-Centric Event Features

Tagging behavior

Multi-word tags (e.g., #myhomelesssignwouldsay)

Percentage of tagged tweets

Top term is a tag

Retweeting

Percentage of messages with RT @

Percentage of messages from top RTed tweet

Page 58: Surfacing Real-World Event Content on Twitter

Event Classifier

Use features to build a classifier

Human-annotated training data

SVM model (selected during training phase)

Alternative classification modes:

RW-Event: real-world event vs. rest

TC-Event: event (real-world or Twitter-centric) vs. non-event

Page 62: Surfacing Real-World Event Content on Twitter

Event Content Selection

Cluster terms: Tiger, Woods, Apology

Tweets in the cluster:

Tiger Woods to make a public apology Friday and talk about his future in golf.

Tiger Woods Returns To Golf - Public Apology http://bit.ly/9Ui5jx

Tiger woods y'all,tiger woods y'all,ah tiger woods y'all

Tiger Woods Hugs: http://tinyurl.com/yhf4uzw

Wedge wars upstage Watson v Woods: BBC Sport (blog)

Page 65: Surfacing Real-World Event Content on Twitter

Event Content Selection

Challenges:

Clusters contain noise

Relevant tweets might have poor-quality text

Relevant, high-quality tweets might not be interesting

For each tweet and a given event, evaluate:

Quality

Relevance

Usefulness

Page 67: Surfacing Real-World Event Content on Twitter

Centrality Based Tweet Selection

Centroid

Cosine similarity of each tweet to cluster centroid

Degree

Tweets are nodes

Tweets are connected if their similarity is above a threshold

Compute degree centrality of each node

LexRank [Erkan and Radev 2004]

Same graph structure as Degree method

Central tweets are similar to other central tweets
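
The Centroid and Degree methods above can be sketched in a few lines. This is an illustration under assumptions: tweets are represented by raw term counts, the similarity threshold of 0.3 is arbitrary, and the sample tweets are simplified from the Tiger Woods example. LexRank, which would run a PageRank-style iteration over the same graph, is not shown.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_centroid(vectors):
    """Indices of tweets ordered by cosine similarity to the cluster centroid."""
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    scores = [cosine(vec, centroid) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)


def rank_by_degree(vectors, threshold=0.3):
    """Indices ordered by degree centrality: the number of other tweets
    whose similarity to this tweet is at or above `threshold`."""
    degree = [
        sum(1 for j, other in enumerate(vectors) if j != i and cosine(vec, other) >= threshold)
        for i, vec in enumerate(vectors)
    ]
    return sorted(range(len(vectors)), key=lambda i: degree[i], reverse=True)


# Simplified token lists loosely based on the example cluster.
tweets = [
    ["tiger", "woods", "public", "apology", "friday", "golf"],
    ["tiger", "woods", "returns", "golf", "public", "apology"],
    ["tiger", "woods", "yall", "tiger", "woods", "yall"],
    ["wedge", "wars", "upstage", "watson", "woods"],
]
vectors = [Counter(tokens) for tokens in tweets]
centroid_ranking = rank_by_centroid(vectors)
degree_ranking = rank_by_degree(vectors)
```

Under both rankings the off-topic "wedge wars" tweet ends up last, and the Centroid method puts one of the two apology tweets first.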

Page 70: Surfacing Real-World Event Content on Twitter

Experimental Setup: Data

>2,600,000 tweets, collected via Twitter API

Location: New York City area

Indicated on user profile

Time: February 2010

First week used to calibrate statistics

Second week used for training/validation

Third and fourth weeks used for testing

Page 73: Surfacing Real-World Event Content on Twitter

Experimental Setup: Training

Data:

504 clusters

Fastest growing clusters/hour in second week of February 2010

Labels:

Real-world event (e.g., [superbowl,colts,saints,sb44])

Twitter-specific event (e.g., [uknowubrokewhen,money,job])

Non-event (e.g., [happy,love,lol])

Ambiguous cluster (e.g., [south,park,west,sxsw,cartman])

Page 75: Surfacing Real-World Event Content on Twitter

Experimental Setup: Testing

Baselines:

Naïve Bayes text classification (NB-Text)

Fastest-growing clusters per hour

Classifiers:

RW-Event

TC-Event

400 clusters

5 hours, with the top 20 clusters per hour according to each of RW-Event, TC-Event, Fastest-growing, and random

Page 78: Surfacing Real-World Event Content on Twitter

Experimental Methodology: Event Classification

Classification accuracy

10-fold cross validation

Separate test set of randomly chosen tweets

Event surfacing

Top events per hour for each technique

Evaluation:

Precision@K

NDCG@K
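
Both ranking metrics are standard and short to compute. The sketch below assumes binary relevance labels listed in rank order (1 if the cluster is a real event, 0 otherwise) and uses the common log2 discount for NDCG, with the ideal ordering taken from the same list; the talk may have used a different gain or normalization.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(relevance[:k]) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevance[:k]))


def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal else 0.0


# Hypothetical labels for the top five clusters returned in one hour.
labels = [1, 0, 1, 1, 0]
p5 = precision_at_k(labels, 5)
n5 = ndcg_at_k(labels, 5)
```

Precision@K treats all positions in the top K alike, whereas NDCG@K rewards rankings that place the relevant clusters nearer the top.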

Page 80: Surfacing Real-World Event Content on Twitter

Identified Events

Description | Keywords

Senator Evan Bayh's Retirement | bayh, evan, senate, congress, retire

Westminster Dog Show | westminster, dog, show, club, kennel

Obama’s Meeting with the Dalai Lama | lama, dalai, meet, obama, china

NYC Toy Fair | toyfairny, starwars, hasbro, lego, toy

Marc Jacobs Fashion Show | jacobs, marc, nyfw, show, fashion

A sample of events identified by our classifiers on the test set

Page 81: Surfacing Real-World Event Content on Twitter

Classification Performance (F-measure)

RW-Event classifier is more effective at discriminating between real-world events and rest of Twitter data

Classifier | Validation | Test

NB-Text | 0.785 | 0.702

RW-Event | 0.849 | 0.837

TC-Event | 0.875 | 0.789

Page 82: Surfacing Real-World Event Content on Twitter

Precision@K Evaluation

[Figure: Precision@K plot. x-axis: Number of Clusters (K), at K = 5, 10, 15, 20; y-axis: Precision, from 0 to 1; series: RW-Event, TC-Event, Fastest, Random]

Page 83: Surfacing Real-World Event Content on Twitter

NDCG@K Evaluation

[Figure: NDCG@K plot. x-axis: Number of Clusters (K), at K = 5, 10, 15, 20; y-axis: NDCG, from 0 to 1; series: RW-Event, TC-Event, Fastest, Random]

Page 84: Surfacing Real-World Event Content on Twitter

Experimental Methodology: Content Selection

50 event clusters

Randomly selected from test set

5 top tweets per event for each: Centroid, Degree, LexRank

Labeled on a 1-4 scale

Quality: excellent (4) to poor (1)

Relevance: clearly relevant (4) to not relevant (1)

Usefulness: clearly useful (4) to not useful (1)

Page 85: Surfacing Real-World Event Content on Twitter

Selected Tweets: Example

Method | Tweet

Centroid | Video: Tiger regretful; unsure about return to golf - Main Line ...: (AP) Tiger Woods publicly apologized Friday... http://bit.ly/dAO41N

Degree | Watson: Woods needs to show humility upon return (AP): Tom Watson says Tiger Woods needs to "show some humility to... http://bit.ly/cHVH7x

LexRank | RT @EricStangel: Tiger Woods statement: And now for Elin's repsonse....

A sample of tweets selected by different centrality methods

Page 86: Surfacing Real-World Event Content on Twitter

Content Selection Results

Average scores over all events

High quality and relevance (>3) for both Degree and Centroid

Centroid is the only method with high usefulness (>3)

Method | Quality | Relevance | Usefulness

LexRank | 3.444 | 2.984 | 2.608

Degree | 3.536 | 3.156 | 2.802

Centroid | 3.636 | 3.694 | 3.474

Page 87: Surfacing Real-World Event Content on Twitter

Preferred Method per Event

Centroid is the preferred method across all metrics

For usefulness, Centroid tweets preferred more than 2:1 compared to Degree, 4:1 compared to LexRank

Method | Quality | Relevance | Usefulness

LexRank | 22.66% | 16.33% | 12%

Degree | 31.66% | 25.33% | 28%

Centroid | 45.66% | 58.33% | 60%

Page 88: Surfacing Real-World Event Content on Twitter

Conclusions

Techniques for discovering, organizing, and presenting social media from real-world events

Event classifiers

Important to capture features of Twitter-specific events in order to reveal the real-world events

Effectively surfaced real-world events in an unsupervised setting

Content selection

The similarity-to-centroid technique is better at selecting event content

There is relevant and useful event content on Twitter!

Page 90: Surfacing Real-World Event Content on Twitter

Learning Similarity Metrics for Event Identification in Social Media (WSDM ’10)

[Figure: diagram, built up over several slides. Elements: Ctitle, Ctags, Ctime; weights Wtitle, Wtags, Wtime (learned in a training step); "Combine similarities" via f(C,W); final clustering solution]

Page 93: Surfacing Real-World Event Content on Twitter

Identifying Tweets for Known Events

Page 96: Surfacing Real-World Event Content on Twitter

Thank you!

Pablo Barrio

David Elson

Dan Iter

Yves Petinot

Sara Rosenthal

Gonçalo Simões

Matt Solomon

Kapil Thadani