Regina Barzilay, "Information Extraction from...
DESCRIPTION
January 31, "MIT Day at Yandex" seminar. Regina Barzilay, "Information Extraction from Social Media"
- Machine learning methods applied to extracting information from online user-generated content.
- A look at a set of information extraction tasks, such as aspect-based analysis of reviews and building an event database from tweets.
- Automatic construction of a document's content structure from a large, very noisy stream of user-generated content.
- Automatic aggregation of review content and extraction of events from the Twitter message stream.
TRANSCRIPT
Information Extraction for Social Media
Regina Barzilay
Christina Sauper, Aria Haghighi, and Ted Benson
1
Selecting a Hotel
2
Selecting a Hotel
3
Selecting a Hotel
4
User-generated Content
• Large amounts of user-generated content
• Increasingly important in decision making
• Time-consuming to read it all
NLP can help!
5
The Power of Word Counts
Simple statistical models are effective for many information extraction tasks
• Bag-of-words approaches for classification
[Figure: example words: stocks, trading, financial, bank, cloudy, cold, plants, storm]
6
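As a rough illustration of the bag-of-words idea on this slide, here is a minimal sketch of a multinomial naive Bayes classifier that looks only at word counts. The training sentences and the labels `finance` and `weather` are toy assumptions built around the example words, and naive Bayes is just one of several classifiers that could sit on top of word counts.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Pick the label with the highest add-one-smoothed log probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in text.lower().split():
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (an assumption for illustration, not from the talk).
TRAIN = [
    ("stocks trading financial bank", "finance"),
    ("bank stocks fell in heavy trading", "finance"),
    ("cloudy cold storm", "weather"),
    ("a cold storm hit the plants", "weather"),
]
MODEL = train_naive_bayes(TRAIN)
print(classify("the bank reported trading losses", MODEL))
```

Word order is ignored entirely here, which is exactly why such models are cheap and also why they miss context.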
The Power of Word Counts
Simple statistical models are effective for many information extraction tasks
• Sequence labeling for semantic role labeling
the    earthquake  injured  three       people
NONE   EVENT       NONE     CASUALTIES  CASUALTIES
7
The Power of Word Counts
8
The Power of Word Counts
9
Every time I fire a linguist, the performance of the speech recognizer goes up (F. Jelinek)
Beyond Wall Street Journal
Moving from formal text…
…to social media
10
Our Approach
• Model document structure as part of the extraction process
• Exploit large amount of raw data to supplement annotations
11
So so so good! One of my favorite restaurants in Boston! I have to take off a single star, because there are a couple dishes I didn't enjoy, but if you go and order well, this is a 6 star experience! You start with bread and the most delicious olive oil. It has a strong olive taste, I had to forcefully stop myself from finishing our basket. My suggestion is to order family style, and based on the number of people you have, order main entrees for half your party size and 3-4 small plates for the other half. (ex. Party of 4 = 2 entrees and 6-8 apps) The entrees really are to die for, every one I've had has been delicious. Such a great combo of flavors and spices, they some seriously artistic creations! The best small plates are the Sultans Delight (fall apart lamb and unreal baba), the spiced carrots (seems simple, but they are amazing!), the falafel....ok, there are a bunch that are great, but those are some favorites! I would skip the deviled eggs, multiple people talked them up to me before I went, really not that exciting. The chick peas in vermicelli, also not great, in fact I really did not like it with the fake orange taste. It reminded me of one of those chocolate oranges you can break apart. The black eyed pea soup was nothing special. For dessert, please get the baked Alaska, it was unbelievable!
A new favorite! #1. The hosts seated us almost immediately (even though we didn't have a reservation on a Friday night). #2. The food was amazing (N.B. the bread in the basket goes really well with the warmed, spiced olive oil from one of our meze plates), We had the olives w/ za'atar (a lot for two people, $5), quail kebabs (delicious, tender, spicy, 2 for $13), monkfish curry (yummy, $26) and a tea-ser dessert with sour cherries that tastes a lot like vermicelli/milk dessert (12). #3. The service was excellent--perfectly timed plates, etc. #4. Ana Sortun was there! She's awesome and she was hands-on--adding things to dishes, etc. I would go back--I still like the hummus at Sofra and wasn't sure that Oleana would be more impressive--but it was excellent. Great place for a special occasion meal or a night out with tourists.
The suspense factor as each surprise dish was delivered added to the whole experience, and the timing was flawless--we'd finish a dish, have ample time to enjoy it, a few minutes for some sips of wine and conversation, and just as there was a breath, the next dish would arrive. The whole meal felt like a well-orchestrated performance rather than just a meal. As for the food--there's not enough to be said. I realize this is a little cliched, but honestly, I haven't tasted food like this since living in the bay area, with creative combinations and well-designed dishes and contrasts that surprise the palette and are a delight to eat; not simply good, but joy-inducing. The highlight of the night was a dish of crab cakes with asparagus--they had a small poached quail egg on each cake, combined with a lemon flavor from a little juice and zest it almost made a meringue that was incredible. This is without a doubt the best single dish I have had at any restaurant in Boston, and I'd go back to Oleana just for this. Or any other of the dishes we had that night, frankly. Some were better than others, but each was unique and in some way surprising and fun. Absolutely fantastic experience all around, and so far the best overall restaurant experience I've had in the Boston area. These people understand that a meal is more than just about the food, it's about the service, the wine, the scenery, and on top of all that, the flavors and combinations of culinary delights. Oleana turned an otherwise ordinary night into an experience I won't forget, and I can't wait to return.
I wandered in here on a whim with a friend a while back, completely underdressed but on the lookout for a good meal. When we went around the side to the entrance, someone called to us from the roof--"watch out, there's glass on the floor, I dropped a light bulb. Can you go inside and grab someone for me and ask them to bring a screwdriver?" Sure, no problem--so I went inside and told the hostess, "Hey your maintenance guy needs a screwdriver up on the roof." She laughed, "Oh... that's not a maintenance guy, that's the owner." And from that moment I knew this place was special. It's a good sign when the owner of a restaurant is up on the roof changing lightbulbs--it's clear what kind of care and attention goes into every detail. If only I had known then how that would translate to the entire experience, and especially the food. Once out on the patio, we waited for only a few minutes by the fountain before being seated. I had a glass of wine and was just enjoying the patio, the bread, and the otherworldly feel of the place. You can't be stressed out here; it's designed perfectly to be almost a Shangri-La of spaces, and all in the middle of Inman square. Unexpected. Impeccable. At this point we were so confident in the holistic quality of the restaurant that we decided to trust the chef and go all out. We ordered a tasting menu, and two other mezos/appetizers that sounded good from the specials menu. If you're here, I *highly* recommend this. You probably won't end up spending any more than if you had ordered individually, but you'll taste some incredible things you might not have thought to get.
Oleana serves inspired, well prepared food from the best possible ingredients. The menu is well priced for the quality. The wine list is very food friendly, includes many organic and bio-dynamic wines, and is also reasonably priced. It's a great place for vegetarians. A vegetarian tasting menu was available the night I went - it was superb and plentiful (I could not finish it). The service is friendly, prompt, and helpful. The space is relaxing and casually elegant. I was there on a Tuesday evening when two men were quietly playing lovely world music. As I was visiting the Boston area with my family, we brought along our children (8&10). While I would not call Oleana family-friendly, they were accommodating of our children. I should note one of my children is a highly selective eater, but they are both used to eating in high-end restaurants and a very well behaved in nice restaurants (or so we are told). Given that I live in the San Francisco Bay Area, I'm spoiled by excellent vegetarian friendly restaurants. Now I have a spot to return to in the Boston area that meets expectations.
This is a fantastic restaurant in Cambridge. The decor, music and smells will make you feel like you are in another world. I was a little skeptical when I heard Turkish Food, but those feelings were quickly squashed when I had the food. We started with the Fried Mussels with Hot Peppers and Turkish Tarator Sauce. The Mussels were fried to perfection and the batter was very light. I could have eaten a thousand of them, but I love fried food. The Vermont Quail was very tasty as well. The quail was very tender, which is hard to do because those little guys are so small. The last starter was the Sultans Delight. The Tamarind Glazed Beef was so tender. The Smokey Eggplant Puree went well with the dish and was even better slathered on the bread. We shared an entree for the evening. I highly suggest the Azuluna Pork, Crispy Pea Paella, Fried Fiddleheads and Paprika Sauce. The pork was just as tender as the beef and the accompaniments went so well with the meat. The meal was very flavorful and seasoned to perfection. I wish I could have gotten dessert, but I was so full. I did see them bring some out and they looked wonderful and decadent. I had the Sangria to drink, it was refreshing because it was humid outside, but it was not the best in the world. I think I might try something new if I ever go back again. The servers are great, so nice and knowledgeable about the menu. It really means a lot to me to see someone get excited about a menu.
Delicious. Delightful. Worth it. I've eaten at Oleana several times and the food is always very good. The service is typically really solid - I've had some dinners where the waitstaff was really top-notch and other times when it's been good but not spectacular. Either way the hummus and falafel small plates (meze) are just SO tasty. Definitely a great place to celebrate a special event or just when you need a particular pick-me-up. Not the fussiest or the fanciest food (this is a compliment from me!) nor the most elegant ambiance (wish it was a little more quiet),, but clearly a very special place for a great meal! Plus you NEED to get the Baked Alaska (YUM). It should be a requirement for going to the place. (I would give the food 5 stars; the overall ambiance - read: it can be pretty loud - and waitstaff variability knocked the overall experience to a 4).
Had dinner here on Friday night and it was superb!! Cute ambiance...great for intimate dinner or date place. Dim lighting, nice decor, not too loud. Service was excellent as well as all the recommendations. Started with the Moroccan-style Octopus and Fatoush. Tasty, light, unique flavors, great presentation. Everything was quickly eaten up with smiles. For entrees, had the Beef Kabob special (which was the hit!, beef with delicious flavors and cooked perfectly med-rare tender), Cod, and Lamb. Everything was tasty yet light, with delicate complimenting flavors. Dessert was the winner-Passion fruit Bisteeya....goodness what is this?? IT was to DIE for and a perfect ending to the meal. I think we literally ate this in 2 seconds and contemplated ordering a second. It's light, tart, fluffy, creamy, and thirst quenching all at the same time. Needless to say Oleana became a favorite in one evening...the food is just very unique. I love have great flavors without feeling like I gained 10 pounds eating a wonderful dinner. I'll definitely be back soon!
http://condensr.com
12
Motivating Example
Aspect      Snippets
atmosphere  “stylish decor”, “awesome art”
food        “loved it!”, “tasty calzones!”
service     “fast and friendly”, “impatient waiters”
Importance of Context:
Ordered chicken parm and loved it! Friend had the veal. The service was ...
... by local artists.
[Slide graphic: a brace labeled "food" marks the context around the snippet]
13
Multi-Aspect Summarization
Sequence Labeling Task
I ordered lunch from them the other day and I was [FOOD pleasantly surprised]. Our waiter dazzled me with his blue eyes and genuine smile, and all the waiters were [SERVICE extremely professional and efficient].
Content Topic Model
I ordered lunch from them the other day and I was pleasantly surprised. Our waiter dazzled me with his blue eyes and genuine smile, and all the waiters were extremely professional and efficient.
14
The Big Disconnect
Discourse Modeling
- Topic Models
- Rhetorical Structure Analysis
Analysis Applications
- Information Extraction
- Sentiment Analysis
15
Approach Overview
Task Labels: Observed
Task Labels: Observed
I had the shrimp salad and was [FOOD pleasantly surprised]. The [ATMOSPHERE decor was tasteful] and staff was [SERVICE extremely professional and efficient].
words
labels
16
Approach Overview
Task Labels: Observed
Goal: Analysis applications sensitive to document structure
Task Labels: Observed
Content Labels: Latent
17
Approach Overview
• Jointly learn structure and task parameters
  – Topics are latent variables shaped by task
• Principled way to incorporate unlabeled data
  – More unlabeled data, better performance
18
Factorization
[Equation not preserved in the transcript. Its labeled parts: product over sentences; topic transitions; bag-of-words; CRF]
19
Multi-Aspect Summarization
Content Model: Sentence-Level HMM
{ ... chicken, parm, ordered, loved, ... }
Task: Token-Level conditional random field
Ordered chicken parm and loved it
20
Augmenting CRF with Topics
...
... Add context features
topic 3
21
Joint Learning Objective
Content and task params.
22
Joint Learning E-Step:
Can be computed using Forward-Backward algorithm
23
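The slide only says that the E-step can be computed with the Forward-Backward algorithm. As a reminder of what that algorithm computes, here is a minimal sketch for a generic HMM that returns posterior state probabilities at each position. The parameter names `init`, `trans`, and `emit_probs` are assumptions for illustration; in the talk's setting the states would be sentence-level topics, and the emission scores would also depend on the task model. The sketch is unscaled, so it suits short sequences only.

```python
def forward_backward(init, trans, emit_probs):
    """Posterior state marginals for an HMM.

    init[k]          -- probability of starting in state k
    trans[j][k]      -- probability of moving from state j to state k
    emit_probs[t][k] -- probability of the observation at position t under state k
    Returns gamma, where gamma[t][k] = P(state_t = k | all observations).
    """
    n, num_states = len(emit_probs), len(init)
    alpha = [[0.0] * num_states for _ in range(n)]
    beta = [[1.0] * num_states for _ in range(n)]

    # Forward pass: joint probability of the prefix and the current state.
    for k in range(num_states):
        alpha[0][k] = init[k] * emit_probs[0][k]
    for t in range(1, n):
        for k in range(num_states):
            incoming = sum(alpha[t - 1][j] * trans[j][k] for j in range(num_states))
            alpha[t][k] = emit_probs[t][k] * incoming

    # Backward pass: probability of the suffix given the current state.
    for t in range(n - 2, -1, -1):
        for j in range(num_states):
            beta[t][j] = sum(
                trans[j][k] * emit_probs[t + 1][k] * beta[t + 1][k]
                for k in range(num_states)
            )

    evidence = sum(alpha[n - 1])
    return [
        [alpha[t][k] * beta[t][k] / evidence for k in range(num_states)]
        for t in range(n)
    ]

# Toy two-state example (values are arbitrary).
GAMMA = forward_backward(
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit_probs=[[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]],
)
```

The expected counts needed in an M-step are sums of these posteriors, which is why the normalization step that follows is cheap.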
Joint Learning M-Step:
For the content parameters: standard normalization of counts from the E-Step.
For the task parameters: weighted conditional likelihood objective
24
Supervised Objective
Labeled data for content and task parameters
25
Semi-Supervised Objective
Labeled data for content and task parameters
Unlabeled data for content parameters
26
Data set
• Amazon TV reviews
  – Train: 35 reviews
  – Test: 24 reviews
  – Unlabeled: 12,600 reviews
• Yelp restaurant reviews
  – Train: 48 reviews
  – Test: 48 reviews
  – Unlabeled: 33,000 reviews
27
Information Extraction
Goal: Extract phrases from review text in pre-specified categories
Input: User-generated review text, labeled training data
Output: Labeled phrases in each category
28
I came here with my husband for the tasting menu, and we were not disappointed. We got to sit at the chef’s table, which overlooked the kitchen. The service was polite and knowledgeable, the atmosphere was elegant and energetic and the food was wonderfully creative and delicious.
FOOD SERVICE ATMOSPHERE PRICE OVERALL
Systems
• NoCM: Just the CRF, no content model
• IndepCM: Estimate content model parameters first, then use them in the CRF.
• JointCM: Estimate content and CRF parameters jointly using EM
29
Results
Token F-measure Evaluation
30
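The slide names the metric but not its exact protocol. A common reading of token-level F-measure is sketched below: precision and recall are computed over tokens that carry a non-empty label. The label names and the `NONE` placeholder for unlabeled tokens are assumptions for illustration, and the paper's evaluation may differ in details.

```python
def token_f_measure(gold, predicted, ignore="NONE"):
    """Token-level precision, recall, and F1 over labels other than `ignore`."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p != ignore and p == g:
            tp += 1
        else:
            if p != ignore:
                fp += 1  # predicted a label that is wrong or spurious
            if g != ignore:
                fn += 1  # missed (or mislabeled) a gold token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

GOLD = ["NONE", "FOOD", "FOOD", "NONE", "SERVICE"]
PRED = ["NONE", "FOOD", "NONE", "SERVICE", "SERVICE"]
PRECISION, RECALL, F1 = token_f_measure(GOLD, PRED)
```

In this toy case two tokens are correct, one is spurious, and one is missed, so precision, recall, and F1 all come out to two thirds.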
Impact of Unlabeled Data
Setup: Using the Amazon corpus, fix the amount of labeled data, vary the amount of unlabeled data
Number of Unlabeled Reviews   Score
0                             41.5
6,300                         47.3
12,600                        47.8
[Bar chart; y-axis ticks at 38, 44, 50]
31
Multi-Aspect Sentiment Ranking
Task: Predict sentiment (1-10) for each aspect
Approach:
• Same objective as summarization
• Different E- and M-Steps [See paper]
Aspect   Rating
picture  9.0
audio    9.5
extra    7.0
32
Multi-Aspect Sentiment Ranking
DVD Review Domain
L2 Error: Lower is better
33
Paper & Code
• Paper: http://groups.csail.mit.edu/rbg/code/content_structure/sauper-emnlp-10.pdf
• Code: http://groups.csail.mit.edu/rbg/code/content_structure/code.tar.gz
• Data: http://groups.csail.mit.edu/rbg/code/content_structure/data.tgz
34
Agree to Disagree
+ The fried oysters were very good
− The catfish tasted dry and bland and boring
+ The star of the plate was the grits
+ The gnocchi with mushrooms was outstanding
+ The catfish approaches perfection
+ The shrimp and grits are nothing less than spectacular
[Slide graphics: star ratings; reviews labeled #1 and #2]
35
Review Aggregation
• Hundreds of reviews for each product
• Opinions vary widely
→ Need to aggregate statistics
• Histograms show sentiment distribution, but it’s not enough
36
Aspect-based Analysis
Prior work: Use a set of predefined domain-‐specific product aspects (e.g., Snyder and Barzilay 2007)
→ Coarse level analysis
37
Informative Aggregation
Useful information:
– What’s the best dish at this restaurant?
– What do people dislike about this restaurant?
– Which dishes do people disagree about?
38
Informative Aggregation
Aggregation of product-specific aspects
Japanese Restaurant
We had a great time last night at this restaurant. The sushi was so incredibly fresh. We had a bad experience at the bar, though. My chocolate martini was absolutely terrible. We will be back, but we’ll skip the drinks.
Wow, I can’t believe how much this place has changed! They used to be mediocre, but now they never fail to amaze. We started off at the bar with awesome sake bombs. When we got to our table, the sushi was fantastic.
I have such mixed things to say about this restaurant. On one hand, their sushi is unquestionably the best in the city. On the other, the atmosphere isn’t that great. Plus, their drinks are completely watered down.
Relevant aspects   User sentiment
Sushi              100% positive
Chicken            33% positive
39
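Once each snippet has an aspect and a sentiment label, the aggregation shown on the slide reduces to counting. The sketch below assumes the labels are already available as `(aspect, sentiment)` pairs; the pairs were assigned by hand from the three example reviews for illustration, and producing such labels automatically is the actual problem the talk addresses.

```python
from collections import defaultdict

def aggregate_sentiment(labeled_snippets):
    """Map each aspect to (percent positive, number of snippets)."""
    counts = defaultdict(lambda: [0, 0])  # aspect -> [positive, total]
    for aspect, sentiment in labeled_snippets:
        if sentiment == "+":
            counts[aspect][0] += 1
        counts[aspect][1] += 1
    return {
        aspect: (round(100 * pos / total), total)
        for aspect, (pos, total) in counts.items()
    }

def contested_aspects(labeled_snippets):
    """Aspects that received both positive and negative snippets."""
    seen = defaultdict(set)
    for aspect, sentiment in labeled_snippets:
        seen[aspect].add(sentiment)
    return sorted(a for a, labels in seen.items() if labels == {"+", "-"})

# Hand-labeled pairs read off the three example reviews (an assumption).
SNIPPETS = [
    ("sushi", "+"), ("drinks", "-"),                          # review 1
    ("drinks", "+"), ("sushi", "+"),                          # review 2
    ("sushi", "+"), ("atmosphere", "-"), ("drinks", "-"),     # review 3
]
SUMMARY = aggregate_sentiment(SNIPPETS)
CONTESTED = contested_aspects(SNIPPETS)
```

Reporting the snippet count next to the percentage helps separate a real consensus from a single opinion.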
Corpus-driven Aspect Definition
Define aspects dynamically based on reviews
[Slide graphics: the three example reviews from the previous slide, shown next to each product]
Bakery
- Cookies
- Cakes
- Pies
Japanese Restaurant
- Sushi
- Sake
- Dessert
→ Aspects specific to each product
40
Corpus-driven Aspect Definition
Allows comparison across multiple reviews
– Consensus (both positive and negative): What’s the best/worst aspect of this product?
Bakery
I buy all of my baked goods from this bakery. Their bread is so delicious! It’s also good for all kinds of baked goods. They also have some truly beautiful cakes on display. Even their cookies are great!
I picked up a birthday cake for my son here yesterday. It was the most amazing cake I’ve ever seen! The decorations were outstanding, and all the kids loved the chocolate icing. I’ll definitely come back!
This place is nice for some baked goods, but some things are really nasty. The loaf of bread I bought was stale! They were happy to take it back and give me another, but I’ll be watching next time.
Consensus example:
…truly beautiful cakes on display.
…most amazing cake I’ve ever seen!
41
Corpus-driven Aspect Definition
Allows comparison across multiple reviews
– Consensus (both positive and negative): What’s the best/worst aspect of this product?
– Conflicts of opinion: What aspects do people disagree about?
Bakery
Conflict example:
Their bread is so delicious!
The loaf of bread I bought was stale!
42
Task: Input
Input:
– Food-related snippets from restaurant reviews
  • Concise description of a user’s opinion
– Automatically extracted from full review text (Sauper et al. 2010)
– Segmented by restaurant, but no additional annotation
We went to the restaurant, and the sushi was incredibly fresh.
Japanese Restaurant
  the sushi was so incredibly fresh
  best chicken katsu in town
  drinks are fun, fresh, and delicious
Bakery
  I’d recommend the apple pie
  the bread was disappointingly stale
  chocolate torte is the stuff of dreams
43
Task: Output
Output:
– Relevant aspects for each restaurant
– Aspect label for each snippet
– Sentiment label for each snippet
Mexican Restaurant
Burrito
  + they had a decent burrito
  − the burrito was mediocre at best
  − the burrito was heavily cilantroed
Salsa
  + the salsa is incredible
  + the mango salsa is perfectly diced
  + hola free chips & salsa
44
Possible Solution
Use clustering based on lexical similarity
Problem: Clusters and aspects are not aligned!
Partial output of state-of-the-art clustering system:
– the martinis were very good / the martinis were tasty
– the wine list was pricey / their wine selection is horrible
– the sushi was the best I’d ever had / best paella I’d ever had
– the fillet was the best steak we’d ever had / it’s the best soup I’ve ever had
45
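The misalignment on this slide is easy to reproduce with any purely lexical similarity. The sketch below uses Jaccard overlap of word sets, which is a simpler stand-in chosen for illustration and not the clustering system used in the talk. Two snippets that share the phrasing "best ... I'd ever had" score as more similar than two snippets about the same dish.

```python
def word_set(snippet):
    """Lowercased set of whitespace-separated tokens."""
    return set(snippet.lower().split())

def jaccard(snippet_a, snippet_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = word_set(snippet_a), word_set(snippet_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

SUSHI_BEST = "the sushi was the best I'd ever had"
PAELLA_BEST = "best paella I'd ever had"
SUSHI_FRESH = "the sushi was so incredibly fresh"

# Same sentiment phrasing, different dish.
SAME_PHRASING = jaccard(SUSHI_BEST, PAELLA_BEST)
# Same dish, different sentiment phrasing.
SAME_ASPECT = jaccard(SUSHI_BEST, SUSHI_FRESH)
print(SAME_PHRASING, SAME_ASPECT)
```

Because sentiment phrases repeat across dishes, a lexical clusterer tends to group by opinion wording, which motivates modeling aspect and sentiment jointly.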
Our Solution
• Jointly model aspect and sentiment
• Leverage data to distinguish sentiment and aspect
46
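[Figure: sentiment and aspect words appearing in three reviews of two products]
          Bakery                                Japanese
          Sentiment words     Aspect words      Sentiment words     Aspect words
Review 1  delicious, fresh    pies, cookies     fantastic, smooth   salmon, sake
Review 2  fantastic, amazing  cakes, pies       beautiful, fresh    maki, salmon
Review 3  beautiful, stale    cakes, bread      delicious, bland    maki, miso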
Model: Overview
• Each snippet has an aspect and a sentiment
• Each word is drawn from a topic distribution:
  – Aspects are specific to a single product (e.g., dessert, pizza, pad thai)
  – Sentiment is global across all products (e.g., great, horrible, amazing)
  – Background distribution is global (e.g., our, was, food)
• Transition distribution encodes word topic transitions
47
They had wonderful appetizers.
Model: Generative Story
1. Global distributions
2. Restaurant-level distributions
3. Snippet-level latent structure
4. Words
48
Model: Generative Story
Globally,
a. Background distribution (B): word distribution for stop words and in-domain white noise
b. Sentiment distributions (+, −): word distributions over positive and negative sentiment words, with a small bias for seed words
c. Transition distribution (Λ): first-order Markov distribution of word topic transitions
49
Model: Generative Story
For each restaurant,
a. Aspect distributions (1, 2, …, K): word distribution for each aspect
b. Aspect-sentiment binomials (φ1, φ2, …, φK): probability of positive vs. negative sentiment for each aspect
c. Aspect multinomial (ψ): probability of each aspect
50
Model: Generative Story
For each snippet from the restaurant,
a. Aspect: chosen from aspect multinomial (ψ)
b. Sentiment: chosen from aspect-sentiment binomial (φ)
c. Sequence of word topics (Background, Aspect, or Sentiment): selected from transition distribution (Λ)
Example: aspect 2, sentiment +, word topic sequence B B A S S
51
Model: Generative Story
For each word,
a. Word: chosen from topic-specific distribution based on word topic sequence
Example: aspect 2, sentiment +, word topic sequence B B A S S, background distribution B
The pizza was really great
52
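The four levels of the generative story can be sketched as a small sampler. Everything numeric below is a toy assumption: the word lists stand in for the background, sentiment, and aspect word distributions (sampled uniformly), the transition table stands in for Λ, and the fixed snippet length is a simplification, since the slides do not say how length is chosen.

```python
import random

# 1. Global distributions (toy word lists, sampled uniformly).
BACKGROUND_WORDS = ["the", "was", "our", "food"]
SENTIMENT_WORDS = {"+": ["great", "amazing"], "-": ["horrible"]}
# First-order Markov transitions over word topics:
# B = background, A = aspect, S = sentiment.
TRANSITIONS = {
    "START": {"B": 0.8, "A": 0.2, "S": 0.0},
    "B": {"B": 0.3, "A": 0.4, "S": 0.3},
    "A": {"B": 0.6, "A": 0.2, "S": 0.2},
    "S": {"B": 0.2, "A": 0.1, "S": 0.7},
}

# 2. Restaurant-level distributions for one hypothetical restaurant.
RESTAURANT = {
    "aspect_words": [["pizza"], ["dessert"], ["pad", "thai"]],
    "aspect_prior": [0.5, 0.3, 0.2],  # aspect multinomial
    "p_positive": [0.9, 0.5, 0.2],    # one sentiment binomial per aspect
}

def sample_snippet(restaurant, rng, length=5):
    """Sample (aspect, sentiment, word topics, words) for one snippet."""
    # 3. Snippet-level latent structure.
    num_aspects = len(restaurant["aspect_prior"])
    aspect = rng.choices(range(num_aspects), weights=restaurant["aspect_prior"])[0]
    sentiment = "+" if rng.random() < restaurant["p_positive"][aspect] else "-"
    topics, words, prev = [], [], "START"
    for _ in range(length):
        dist = TRANSITIONS[prev]
        prev = rng.choices(list(dist), weights=list(dist.values()))[0]
        topics.append(prev)
        # 4. Words: drawn from the distribution selected by the word topic.
        if prev == "B":
            words.append(rng.choice(BACKGROUND_WORDS))
        elif prev == "A":
            words.append(rng.choice(restaurant["aspect_words"][aspect]))
        else:
            words.append(rng.choice(SENTIMENT_WORDS[sentiment]))
    return aspect, sentiment, topics, words

aspect, sentiment, topics, words = sample_snippet(RESTAURANT, random.Random(0))
print(aspect, sentiment, " ".join(topics), "|", " ".join(words))
```

Inference runs this story in reverse: given only the words, it recovers the aspect, the sentiment, and the per-word topics.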
Standard Variational Inference
• Desired posterior: [equation not preserved in the transcript; its terms are annotated as model parameters, latent structure, and observed data]
53
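Standard Variational Inference
• Desired posterior: [equation not preserved in the transcript]
• Optimizing directly is intractable
• Instead, optimize variational objective with mean-field factorization: [equation not preserved in the transcript], s.t. the approximating distribution factorizes
54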
Data Set
Food-related snippets from Yelp restaurant reviews (Sauper et al. 2010)
– 13,879 total snippets
– 328 restaurants
– 42.1 snippets per restaurant (high variance)
– 7.8 words per snippet
Seed words for sentiment distributions
– 42 positive, 33 negative
– Relevant to domain (e.g., “delicious”)
55
Experiments: Aspect Clustering
• Gold standard
  – Clusters over 3,250 snippets
  – Collected via Mechanical Turk
• Baseline
  – CLUTO clustering weighted by TF*IDF
• MUC cluster evaluation metric
  – Based on number of cluster merges and splits required to achieve gold data
• Both systems allowed 10 clusters per restaurant
56
Experiments: Aspect Clustering
System      MUC F1
Baseline    69.3
Our model   75.5
Example clusters:
Our model: the martinis are very good / the martini selection looked delicious / the s’mores martini sounded excellent
Baseline: the martinis are very good / the mozzarella was very fresh / the fish and various meets were well made
Baseline: the carrot cake was delicious / it was rich, creamy, and delicious / the pasta bolognese was rich and robust
Our model: the carrot cake was delicious / the best carrot cake I’ve ever eaten / carrot cake was deliciously moist
57
Error Analysis
Number of sentiment and aspect errors approximately equal
Aspect errors
− Similar aspect words in different contexts
  the cream cheese wasn’t bad
  ice cream was just delicious
Sentiment errors
− Rare sentiment words
  belgian frites are very crave-able
  the blackened chicken was meh
  chicken enchiladas are yummy
− Negation, sometimes
  the cream cheese was n’t bad
58
Paper & Code
• Paper: http://groups.csail.mit.edu/rbg/code/content_attitude/sauper-acl-11.pdf
• Code: http://groups.csail.mit.edu/rbg/code/content_attitude/code.tar.gz
59
The Task
• Goal: Automatic construction of event records from Twitter
• Input: Stream of Twitter messages
• Output: Table of event records
Seated at @carnegiehall waiting for @CraigyFerg’s show
@DJPaulyD absolutely killed it at Terminal 5 last night.
Craig, nice seeing you #noelnight this weekend @becksdavis!
Artist          Venue
Craig Ferguson  Carnegie Hall
DJ Pauly D      Terminal 5
60
Example Output
Artist                Venue
Sunday Gospel Brunch  B.B. King Blues Club
J. Cole               Highline Ballroom
Hall & Oates          Beacon Theater
Jeff Tweedy           Bowery Ballroom
Jim Gaffigan          Best Buy Theater
Amos Lee              Bardavon Opera House
61
IE for Social Media: Challenges
• Messages are short ⇒ Individual message may not contain all event fields.
• Messages are expressed in colloquial language ⇒ Mapping between messages and event record is not obvious
Artist: Craig Ferguson
Venue: Carnegie Hall
Seated at @carnegiehall waiting for @CraigyFerg’s show
RT @leerader : getting REALLY stoked for #CraigyAtCarnegie sat night.
62
IE for Social Media: Opportunity
Significant redundancy in Twitter stream:
Seated at @carnegiehall waiting for @CraigyFerg’s show
@DJPaulyD absolutely killed it at Terminal 5 last night.
Craig, nice seeing you #noelnight this weekend @becksdavis!
Approach: Drive event extraction by modeling agreement in message stream.
63
Model Functionality
• Message level analysis: Tag words in message with event-field labels.
Message (x):  @YonderMountain  rocking  Mercury  Lounge
Label (y):    artist           none     venue    venue
64
Model Functionality
• Message level analysis: Tag words in message with event-field labels.
• Message clustering: Group messages based on events.
• Event records: Induce canonical value for each field.
Record (R): Artist: Craig Ferguson, Venue: Carnegie Hall
Alignment (A):
  Craig Ferguson, what a riot! Carnegie is in stitches
  #CraigAtCarnie is starting now! #iamsoexcited
Record (R): Artist: Radiohead, Venue: Coliseum
Alignment (A):
  Going to see Radiohead at the Coliseum tonight!
  Pumped for R A D I O H E A D !!!
65
Model Overview
Source of supervision: Example event records
- Alignment between records and messages not observed.
- Message level field annotations not observed.
July 16, 5:30pm at American Folk Art Museum
Jun 17, 8:00 PM at Izod Center
Jun 17, 8:00 PM at Tarrytown Music Hall
66
[Figure 3 panels: factor templates $\phi_{SEQ}$ over $(X_i, Y_i)$; $\phi_{UNQ}$ over the $\ell$th field across records ($R^\ell_{k-1}$, $R^\ell_k$, $R^\ell_{k+1}$); $\phi_{POP}$ and $\phi_{CON}$ connecting $X_i$, $Y_i$, $A_i$ with the $k$th record ($R^\ell_k$, $R^{\ell+1}_k$)]
Figure 3: Factor graph representation of our model. Circles represent variables and squares represent factors. For readability, we depict the graph broken out as a set of templates; the full graph is the combination of these factor templates applied to each variable. See Section 4 for further details.
over pairwise cliques:

$$\phi_{SEQ}(x, y) = \exp\{\theta_{SEQ}^T f_{SEQ}(x, y)\} = \exp\Big\{\theta_{SEQ}^T \sum_j f_{SEQ}(x, y_j, y_{j+1})\Big\}$$
This factor is meant to encode the typical message contexts in which fields are evoked (e.g. going to see X tonight). Many of the features characterize how likely a given token label, such as ARTIST, is for a given position in the message sequence conditioning arbitrarily on message text context.
The feature function $f_{SEQ}(x, y)$ for this component encodes each token’s identity; word shape [2]; whether that token matches a set of regular expressions encoding common emoticons, time references, and venue types; and whether the token matches a bag of words observed in artist names (scraped from Wikipedia; 21,475 distinct tokens from 22,833 distinct names) or a bag of words observed in New York City venue names (scraped from NYC.com; 304 distinct tokens from 169 distinct names) [3]. The only edge feature is label-to-label.
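The token-level part of such a feature function might look like the sketch below. The regular expressions and the two tiny word sets are hypothetical stand-ins: the paper's actual patterns and its Wikipedia and NYC.com word lists are not reproduced in this excerpt. The label-to-label edge feature is omitted.

```python
import re

# Hypothetical patterns and word lists, for illustration only.
EMOTICON_RE = re.compile(r"^[:;=]-?[)(DP]$")
TIME_RE = re.compile(r"^(tonight|today|tomorrow|\d{1,2}(:\d{2})?(am|pm))$", re.I)
VENUE_TYPE_RE = re.compile(r"^(hall|theater|theatre|club|lounge|ballroom)$", re.I)
ARTIST_WORDS = {"craig", "ferguson", "radiohead"}
VENUE_WORDS = {"carnegie", "hall", "mercury", "lounge"}

def word_shape(token):
    """Coarse shape class: xxx, XXX, Xxx, or other."""
    if token.islower():
        return "xxx"
    if token.isupper():
        return "XXX"
    if token.istitle():
        return "Xxx"
    return "other"

def token_features(token):
    """Return the set of active feature names for one token."""
    lower = token.lower()
    features = {"word=" + lower, "shape=" + word_shape(token)}
    if EMOTICON_RE.match(token):
        features.add("emoticon")
    if TIME_RE.match(token):
        features.add("time_ref")
    if VENUE_TYPE_RE.match(token):
        features.add("venue_type")
    if lower in ARTIST_WORDS:
        features.add("artist_word")
    if lower in VENUE_WORDS:
        features.add("venue_word")
    return features

for tok in "Going to see Radiohead at Mercury Lounge tonight :)".split():
    print(tok, sorted(token_features(tok)))
```

As the text notes, these are features and not filters: a token outside the word lists can still be labeled as an artist or venue if its context supports it.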
4.2 Record Uniqueness Factor
One challenge with Twitter is the so-called echo chamber effect: when a topic becomes popular, or “trends,” it quickly dominates the conversation online. As a result some events may have only a few referent messages while other more popular events may have thousands or more. In such a circumstance, the messages for a popular event may collect to form multiple identical record clusters. Since we fix the number of records learned, such behavior inhibits the discovery of less talked-about events. Instead, we would rather have just two records: one with two aligned messages and another with thousands. To encourage this outcome, we introduce a potential that rewards fields for being unique across records.

[2] e.g.: xxx, XXX, Xxx, or other
[3] These are just features, not a filter; we are free to extract any artist or venue regardless of their inclusion in this list.
The uniqueness potential $\phi_{UNQ}(R^\ell)$ encodes the preference that the values $R^\ell_1, \ldots, R^\ell_K$ for each field $\ell$ do not overlap textually. This factor factorizes over pairs of records:

$$\phi_{UNQ}(R^\ell) = \prod_{k \neq k'} \phi_{UNQ}(R^\ell_k, R^\ell_{k'})$$

where $R^\ell_k$ and $R^\ell_{k'}$ are the values of field $\ell$ for two records $R_k$ and $R_{k'}$. The potential over this pair of values is given by:

$$\phi_{UNQ}(R^\ell_k, R^\ell_{k'}) = \exp\{-\theta_{SIM}^T f_{SIM}(R^\ell_k, R^\ell_{k'})\}$$

where $f_{SIM}$ computes the likeness of the two values at the token level:

$$f_{SIM}(R^\ell_k, R^\ell_{k'}) = \frac{|R^\ell_k \cap R^\ell_{k'}|}{\max(|R^\ell_k|, |R^\ell_{k'}|)}$$

This uniqueness potential does not encode any preference for record values; it simply encourages each field $\ell$ to be distinct across records.
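A small numeric sketch of the token-level similarity and the resulting pairwise potential follows. It assumes field values are compared as sets of lowercased whitespace tokens and treats the weight as a single scalar `theta_sim`; both are simplifications for illustration, since the text does not spell out the tokenization or the weight's dimensionality.

```python
import math

def f_sim(value_a, value_b):
    """Token overlap divided by the token count of the longer value."""
    a = set(value_a.lower().split())
    b = set(value_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))

def uniqueness_potential(value_a, value_b, theta_sim=1.0):
    """exp(-theta_sim * f_sim): close to 1 for distinct values, smaller for overlap."""
    return math.exp(-theta_sim * f_sim(value_a, value_b))

print(f_sim("Carnegie Hall", "Carnegie Hall"))         # identical values
print(f_sim("Carnegie Hall", "Tarrytown Music Hall"))  # partial overlap
print(f_sim("Carnegie Hall", "Terminal 5"))            # no overlap
```

With a positive weight, two records that carry the same venue value receive a lower potential than two records with distinct venues, which discourages duplicate records for a popular event.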
4.3 Term Popularity Factor
The term popularity factor $\phi_{POP}$ is the first of two factors that guide the clustering of messages. Because speech on Twitter is colloquial, we would like these clusters to be amenable to many variations of the canonical record properties that are ultimately learned. The $\phi_{POP}$ factor accomplishes this by representing a lenient compatibility score between a
[Figure 3 diagram: variables $X_i$, $Y_i$, $A_i$, and $R^\ell_k$; factors $\phi_{SEQ}$, $\phi_{UNQ}$ ($\ell$th field, across records), $\phi_{POP}$, and $\phi_{CON}$ ($k$th record).]
Figure 3: Factor graph representation of our model. Circles represent variables and squares represent factors. For readability, we depict the graph broken out as a set of templates; the full graph is the combination of these factor templates applied to each variable. See Section 4 for further details.
over pairwise cliques:
\[ \phi_{SEQ}(x, y) = \exp\{\theta_{SEQ}^T f_{SEQ}(x, y)\} = \exp\Big\{ \theta_{SEQ}^T \sum_j f_{SEQ}(x, y_j, y_{j+1}) \Big\} \]
This factor is meant to encode the typical message contexts in which fields are evoked (e.g. going to see X tonight). Many of the features characterize how likely a given token label, such as ARTIST, is for a given position in the message sequence conditioning arbitrarily on message text context.
The feature function $f_{SEQ}(x, y)$ for this component encodes each token’s identity; word shape [2]; whether that token matches a set of regular expressions encoding common emoticons, time references, and venue types; and whether the token matches a bag of words observed in artist names (scraped from Wikipedia; 21,475 distinct tokens from 22,833 distinct names) or a bag of words observed in New York City venue names (scraped from NYC.com; 304 distinct tokens from 169 distinct names) [3]. The only edge feature is label-to-label.
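The per-token features described above could be sketched roughly as follows. The regular expressions here are small hypothetical stand-ins (the actual patterns are not given in the text), and the bags of words are passed in as plain Python sets; the function and feature names are likewise assumptions made for illustration.

```python
import re

# Hypothetical stand-in patterns; the real expressions are not listed in the source.
EMOTICON_RE = re.compile(r"^[:;=][\-']?[)(DPp]$")
TIME_RE = re.compile(r"^(\d{1,2}(:\d{2})?(am|pm)|tonight|today|tomorrow)$", re.IGNORECASE)
VENUE_TYPE_RE = re.compile(r"^(lounge|hall|theater|theatre|club|bar|garden)$", re.IGNORECASE)

def word_shape(token):
    """Map a token to one of the four shapes named in the footnote."""
    if token.islower():
        return "xxx"
    if token.isupper():
        return "XXX"
    if token[:1].isupper() and token[1:].islower():
        return "Xxx"
    return "other"

def token_features(token, artist_words, venue_words):
    """Sparse feature dict for one token; bags of words are lowercased sets."""
    lower = token.lower()
    return {
        "identity=" + lower: 1.0,
        "shape=" + word_shape(token): 1.0,
        "is_emoticon": float(bool(EMOTICON_RE.match(token))),
        "is_time_ref": float(bool(TIME_RE.match(token))),
        "is_venue_type": float(bool(VENUE_TYPE_RE.match(token))),
        "in_artist_bag": float(lower in artist_words),
        "in_venue_bag": float(lower in venue_words),
    }
```

Note that the bag-of-words lookups act only as features: a token outside both bags can still be labeled as an artist or venue if other evidence supports it.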
Model Overview
• (y) Message-level analysis
• (A) Message clustering
• (R) Event records
Learn jointly in a factor graph model:
\[ P(R, A, y \mid x) \propto (\phi_{SEQ})\,(\phi_{UNQ})\,(\phi_{POP})\,(\phi_{CON}) \]
Sequence Labeling
Record Uniqueness
Term Popularity
Record Consistency
67
Sequence Labeling Factor
@YonderMountain rocking Mercury Lounge
artist none venue venue
• Similar to chain CRF
• Features on token and label: Wikipedia match, context, etc.
\[ \phi_{SEQ}(x, y) = \exp\{\theta_{SEQ}^T f_{SEQ}(x, y)\} \]
Example features: IsWikipediaMatch, word+1=“rocking”, IsUserMention, ….
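To make the potential on this slide concrete, here is a small sketch that scores one labeled message. It assumes sparse features stored in a `Counter` and a hand-set weight dictionary `theta`; the feature templates (user mention, next word, label-to-label edge) loosely mirror the examples on the slide, but their exact form and all names are hypothetical.

```python
import math
from collections import Counter

def f_seq(tokens, labels):
    """Sum of sparse features over adjacent label pairs (y_j, y_{j+1})."""
    feats = Counter()
    for j, (tok, lab) in enumerate(zip(tokens, labels)):
        if tok.startswith("@"):
            feats[("IsUserMention", lab)] += 1
        if j + 1 < len(tokens):
            feats[("word+1=" + tokens[j + 1].lower(), lab)] += 1
            feats[("edge", lab, labels[j + 1])] += 1  # label-to-label edge feature
    return feats

def phi_seq(tokens, labels, theta):
    """Unnormalized potential exp(theta . f_seq); missing weights count as 0."""
    score = sum(theta.get(name, 0.0) * count
                for name, count in f_seq(tokens, labels).items())
    return math.exp(score)

tokens = ["@YonderMountain", "rocking", "Mercury", "Lounge"]
labels = ["artist", "none", "venue", "venue"]
theta = {("IsUserMention", "artist"): 1.5, ("edge", "venue", "venue"): 0.5}
potential = phi_seq(tokens, labels, theta)  # exp(1.5 + 0.5)
```

In a trained model the weights would be learned rather than set by hand, and the potential would be normalized jointly with the other factors.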
68
Artist: Dave Matthews Band
Venue: Slims
Term Popularity Factor
Dave Matthews at Slims
venue venue artist artist
\[ \phi_{POP}(x, y, R^\ell_A = v) = \sum_j \max_k \mathrm{Sim}(x_j, y_j, v_k) \]
• Match each labeled message token to best record value token
• Token matching is IDF-weighted
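The slide does not define Sim, so the sketch below makes an explicit assumption: a message token matches a record value token only on exact (lowercased) equality and only when its label equals the field in question, and a match is worth the token's IDF weight. The IDF formula log(N / df), the token labels in the example, and all names are illustrative choices, not the authors' definitions.

```python
import math

def idf_weights(messages):
    """IDF per lowercased token: log(N / document frequency)."""
    n = len(messages)
    df = {}
    for tokens in messages:
        for tok in {t.lower() for t in tokens}:
            df[tok] = df.get(tok, 0) + 1
    return {tok: math.log(n / count) for tok, count in df.items()}

def phi_pop(tokens, labels, field, value_tokens, idf):
    """Sum over message tokens of the best match against the record value tokens."""
    value = [v.lower() for v in value_tokens]
    total = 0.0
    for tok, lab in zip(tokens, labels):
        if lab != field:
            continue  # only tokens labeled with this field can match it
        tok = tok.lower()
        total += max((idf.get(tok, 0.0) if tok == v else 0.0 for v in value),
                     default=0.0)
    return total

corpus = [
    ["Dave", "Matthews", "at", "Slims"],
    ["Dave", "Matthews", "Band", "tonight"],
    ["at", "Slims", "tonight"],
    ["see", "you", "at", "Slims"],
]
idf = idf_weights(corpus)
msg = ["Dave", "Matthews", "at", "Slims"]
msg_labels = ["artist", "artist", "none", "venue"]  # illustrative labeling
artist_score = phi_pop(msg, msg_labels, "artist", ["Dave", "Matthews", "Band"], idf)
```

Because the score sums the best match per message token, a message that mentions only part of a record value ("Dave Matthews" for "Dave Matthews Band") still receives a positive score, which is the leniency the factor is meant to provide.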
69
Record Uniqueness Factor
\[ \phi_{UNQ}(R^\ell) = \prod_{k \neq k'} \phi_{UNQ}(R^\ell_k, R^\ell_{k'}) \]
\[ \phi_{UNQ}(R^\ell_k, R^\ell_{k'}) = \exp\{-\mathrm{Sim}(R^\ell_k, R^\ell_{k'})\} \]
• Discourage similar record values
Artist: Yonder Mountain Band
Artist: Yonder Mountain
70
Record Consistency Factor
• Encourage all record values to be in single message
• Active when there is some match for all record fields
\[ \phi_{CON}(x, y, R_A) = \mathbb{I}[\phi_{POP}(x, y, R^\ell_A) > 0, \ \forall \ell] \]
Artist: Dave Matthews Band
Venue: Slims
Dave Matthews at Slims
venue venue artist artist
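The indicator on this slide can be sketched as below. To keep the example self-contained, the popularity score is replaced by a simple count of exact token matches (an assumption standing in for the IDF-weighted score), and the record is a plain dict from field name to value tokens; the names and the labeling of the example message are hypothetical.

```python
def pop_score(tokens, labels, field, value_tokens):
    """Stand-in for phi_POP: number of tokens labeled `field` found in the record value."""
    value = {v.lower() for v in value_tokens}
    return sum(1.0 for tok, lab in zip(tokens, labels)
               if lab == field and tok.lower() in value)

def phi_con(tokens, labels, record):
    """1.0 when every record field has some match in the message, else 0.0."""
    return 1.0 if all(pop_score(tokens, labels, field, value) > 0
                      for field, value in record.items()) else 0.0

record = {"artist": ["Dave", "Matthews", "Band"], "venue": ["Slims"]}
message = ["Dave", "Matthews", "at", "Slims"]
full = phi_con(message, ["artist", "artist", "none", "venue"], record)
partial = phi_con(message, ["artist", "artist", "none", "none"], record)
```

Here `full` is active because both the artist and the venue are matched, while `partial` is not, since no token is labeled as a venue.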
71
Inference
• Variational mean-field inference to approximate posterior
\[ P(R, A, y \mid x) \approx Q(R, A, y) = \left( \prod_{k=1}^{K} \prod_{\ell} q(R^\ell_k) \right) \left( \prod_{i=1}^{n} q(A_i)\, q(y_i) \right) \]
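The factorized form of Q means that the probability of a full assignment is just a product of independent terms. The sketch below evaluates log Q for one assignment, assuming each q is stored as a dictionary from outcome to probability; this data layout and every name in it are illustrative assumptions, and the code does not show how the q distributions are fitted.

```python
import math

def log_q(records, alignments, labelings, q_R, q_A, q_y):
    """log Q(R, A, y) = sum of log q(R_k^l) + sum of log q(A_i) + log q(y_i)."""
    total = 0.0
    for (k, field), value in records.items():
        total += math.log(q_R[(k, field)][value])
    for i, (a, y) in enumerate(zip(alignments, labelings)):
        total += math.log(q_A[i][a]) + math.log(q_y[i][y])
    return total

# One record (k = 0) with two fields and one message (i = 0); toy probabilities.
q_R = {
    (0, "artist"): {"yonder mountain": 0.8, "dave matthews band": 0.2},
    (0, "venue"): {"mercury lounge": 0.5, "slims": 0.5},
}
q_A = [{0: 1.0}]
q_y = [{("artist", "none", "venue", "venue"): 0.25,
        ("none", "none", "none", "none"): 0.75}]
records = {(0, "artist"): "yonder mountain", (0, "venue"): "mercury lounge"}
value = log_q(records, [0], [("artist", "none", "venue", "venue")], q_R, q_A, q_y)
```

Each factor can be updated while the others are held fixed, which is what makes the mean-field approximation tractable compared with the full joint posterior.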
72
Experiments: Dataset
Twitter data: three weekends of filtered messages:
• Authors from New York
• Concert-related messages (MIRA-based classifier)
Resulting dataset: roughly 5,800 messages
• Training: 2,184 messages (one weekend)
• Test: 3,662 messages (two weekends)
Gold event records:
• New York City events from NYC.com
• 11 events in training, 31 events in test
73
Experiment: Baselines
Voting methodology of Mann and Yarowsky (2005):
• Aggregate output of baseline IE predictions of each message
• Select top K events based on number of votes
Baseline IE predictors:
• List baseline: string overlap with given list of artists and venues (Wikipedia)
• CRF Voting baseline: extract record for each labeled pair of fields
• CRF Low-Threshold: CRF voting, but extract records with lower extraction threshold
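The voting step shared by these baselines can be sketched as follows. The sketch assumes each message contributes a list of (artist, venue) pairs from some per-message extractor, that records are compared after lowercasing, and that a message casts at most one vote per distinct record; these details and the function name are assumptions, since the slide gives only the outline.

```python
from collections import Counter

def vote_top_k(per_message_records, k):
    """Aggregate per-message (artist, venue) predictions and keep the k most voted."""
    votes = Counter()
    for records in per_message_records:
        # Normalize, then deduplicate so one message gives one vote per record.
        normalized = {(artist.lower(), venue.lower()) for artist, venue in records}
        for record in normalized:
            votes[record] += 1
    return [record for record, _ in votes.most_common(k)]

predictions = [
    [("Dave Matthews Band", "Slims")],
    [("dave matthews band", "slims")],
    [("Yonder Mountain", "Mercury Lounge")],
    [],  # a message with no extracted record casts no vote
]
top_one = vote_top_k(predictions, 1)
```

Unlike the joint model, this aggregation treats every message independently, so near-duplicate strings such as "Yonder Mountain" and "Yonder Mountain Band" would be counted as separate records.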
74
Precision
[Figure: precision (manual evaluation, y-axis 0.2 to 0.9) versus number of records kept (x-axis 10 to 50). Series: Low Thresh, CRF, List, Our Work, Our Work + Con.]
75
Recall
[Figure: recall against gold event records (y-axis 0.2 to 0.7) versus k, as a multiple of the number of gold records (x-axis 1 to 5). Series: Low Thresh, CRF, List, Our Work.]
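One way to read the x-axis of the recall plot is sketched below: keep the top k ranked records, where k is a multiple of the number of gold records, and measure what fraction of gold records appear among them. The matching rule used here (exact match of lowercased artist and venue) and the rounding of k down to an integer are assumptions; the slides do not state how predicted records were matched to gold records.

```python
import math

def normalize(record):
    """Lowercase and strip an (artist, venue) pair for comparison."""
    artist, venue = record
    return (artist.strip().lower(), venue.strip().lower())

def recall_at_multiple(ranked_records, gold_records, multiple):
    """Recall of the top k = floor(multiple * |gold|) ranked records."""
    gold = {normalize(r) for r in gold_records}
    if not gold:
        return 0.0
    k = int(math.floor(multiple * len(gold)))
    kept = {normalize(r) for r in ranked_records[:k]}
    return len(kept & gold) / len(gold)

gold = [("A", "V1"), ("B", "V2")]
ranked = [("a", "v1"), ("x", "v9"), ("b", "v2"), ("z", "v0")]
recall_1x = recall_at_multiple(ranked, gold, 1)
recall_2x = recall_at_multiple(ranked, gold, 2)
```

In this toy ranking, keeping as many records as there are gold events recovers one of the two, and doubling the budget recovers both.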
76
Paper & Code
• Paper: http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf
• Code: http://groups.csail.mit.edu/rbg/code/twitter
77
Conclusion
• Social media presents unique challenges and opportunities for NLP technologies
• Linguistically rich models can compensate for noise inherent in social media streams
• Joint modeling of rich linguistic relations boosts prediction accuracy
78