
Introduction to Natural Language Processing and Speech Computer Science Research Practicum Fall 2012 Andrew Rosenberg

Artificial Intelligence AI is no longer a single subdiscipline in computer science: Natural Language Processing, Speech/Spoken Language Processing, Robotics, Logic/Planning, Cognitive Radio, Machine Learning 1

Artificial Intelligence What is intelligence? How does computer science make intelligent tools, systems, algorithms? Does computer science theory contribute to the definition of intelligence? 2

Language and Speech What is the relationship between language and intelligence/thought/cognition? 3

Language and Speech Most people consider language to be the most direct access to cognition and thought. Language is core to Artificial Intelligence 4

Natural Language Processing Information Retrieval (search), Information Extraction, Knowledge Base Population, Summarization, Question Answering, Named Entity Recognition, Named Entity Linking, Co-reference resolution, Parsing, Sentiment Analysis 5

Information Retrieval Input: Query Output: Relevant Documents Simplest approach: Identify every document that contains the word or words in the query. What about related words? "run" is related to "running", "runs", and "marathon". How do you rank for relevance? 6
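A minimal sketch of the "simplest approach" on this slide, assuming a three-document toy collection invented for illustration. It matches any document containing a query word and ranks with a basic TF-IDF score, which is one common choice for relevance and not necessarily the one intended in the lecture. The names `DOCS`, `build_index`, and `search` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy collection; the documents and their ids are invented for illustration.
DOCS = {
    "d1": "she runs a marathon every spring",
    "d2": "the marathon route runs along the river",
    "d3": "the store sells running shoes",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no stop words)."""
    return text.lower().split()

def build_index(docs):
    """Map each term to the documents that contain it, with term counts."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def search(query, index, n_docs):
    """Return ids of documents containing any query term, best score first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get a larger weight (smoothed inverse document frequency).
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Ties are broken by document id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

INDEX = build_index(DOCS)
results = search("marathon runs", INDEX, len(DOCS))
print(results)
print(search("run", INDEX, len(DOCS)))
```

The query "run" finds nothing even though "runs" and "running" appear in the collection, which is the related-words problem the slide raises; stemming or query expansion would be the next step.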

Information Extraction Identify specific information from a single document or set of documents. Who works for what organization? Who was born when? Died when? Who did what to whom? This is *very* complex. Domain-specific systems are developed. How many different ways are there to say the same thing? 7

Named Entity Recognition and Linking Bo Obama is Fat. POTUS says so. The President called his dog fat. Mr. Obama, speaking to an interviewer, said that The White House dog needs to go on a diet. How do you recognize that Bo Obama, POTUS, The President, Mr. Obama, and The White House are all ENTITIES? How do you recognize that POTUS, The President, Mr. Obama, and "his" all refer to the same person? 8

Parsing Understanding grammatical structure from text. Important step in some relation extraction, question answering, etc. 9

Sentiment Analysis Can you tell the difference between a positive review and a negative one? Some reviews come with labels Some labels have no reviews Some reviews have no stars 10

Spoken Language Processing Automatic Speech Recognition, Rich Transcription, Speaker Recognition, Speech Synthesis, Text Normalization, Discourse and Dialog, Turn taking, Emotion Recognition 11

Speech Recognition Converting speech to text. Acoustic Modeling Speech to Phoneme Pronunciation Modeling How are words pronounced? Language Modeling What sequences of words are most common? 12
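To make the language-modeling question concrete, here is a small sketch of a bigram model that estimates which word sequences are most common. The four training sentences are invented for illustration, and the function names are hypothetical. A real recognizer would use far more text and would smooth the estimates.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real language model is trained on far more text.
SENTENCES = [
    "catch the orange line",
    "catch the red line",
    "transfer at park street",
    "transfer to the red line",
]

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def bigram_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

COUNTS = train_bigrams(SENTENCES)
p_red = bigram_prob(COUNTS, "the", "red")
p_orange = bigram_prob(COUNTS, "the", "orange")
print(p_red, p_orange)
```

In this toy corpus "the red" is twice as likely as "the orange", so a recognizer unsure between the two acoustically would lean toward "red".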

Rich Transcription ALSO FROM NORTH STATION I THINK THE ORANGE LINE RUNS BY THERE TOO SO YOU CAN ALSO CATCH THE ORANGE LINE AND THEN INSTEAD OF TRANSFERRING UM I YOU KNOW THE MAP IS REALLY OBVIOUS ABOUT THIS BUT INSTEAD OF TRANSFERRING AT PARK STREET YOU CAN TRANSFER AT UH WHATS THE STATION NAME DOWNTOWN CROSSING UM AND THATLL GET YOU BACK TO THE RED LINE JUST AS EASILY 13

Rich Transcription Also, from the North Station... (I think the Orange Line runs by there too so you can also catch the Orange Line... ) And then instead of transferring (um I- you know, the map is really obvious about this but) Instead of transferring at Park Street, you can transfer at (uh what's the station name) Downtown Crossing and (um) that'll get you back to the Red Line just as easily. 14

Speaker/Author Recognition What makes one speaker or author distinguishable from another? Email hacks, Chat transcripts, Anonymous authors. What are the acoustics which distinguish across two speakers? Spectral Qualities Prosodic Qualities Lexical, syntactic and content usage 15

Speech Synthesis Generating Speech from Text. There are tools like Festival, HTS and Mary TTS that make this relatively easy. Unit Selection: Use a corpus of a single speaker and paste together small slices of speech to make new words. Watson: http://www.youtube.com/watch?v=WFR3lOm_xhE Parametric Synthesis: Learn the spectral shape of different speech sounds, and synthesize them from oscillators and additive noise. Mary TTS Web client: http://mary.dfki.de:59125/ 16

Discourse and Dialog How do you accomplish some task through discourse? Understanding the semantics of a user turn Generating an appropriate prompt Dialog/Task planning. Semantic Frame filling. 17

Emotion Recognition What are the acoustic properties of emotion expression? Loudness, speaking rate, pitch, hesitation etc. This type of analysis can extend to other speaker states Intoxication Sleepiness Age Gender Personality Factors Deception 18 Three Hundred Twelve. Three Thousand Twelve.

Corpus Analysis A corpus is a body of linguistic material Corpora (plural of corpus) are generally shared across research groups Allow for reproducible findings Division of Labor Describing phenomena is an important first step in most research. What is the distribution of ratings? What are the correlations between features and labels? Are there errors in the annotation? 19
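The three descriptive questions on this slide can each be answered with a few lines of code. The sketch below assumes a tiny invented set of review records (star rating, review length in words); the data and the function names are hypothetical, and review length stands in for whatever feature a real study would examine.

```python
import math
from collections import Counter

# Invented review records: (star rating, review length in words).
RECORDS = [(5, 120), (4, 80), (5, 150), (1, 20),
           (2, 35), (3, 60), (5, 140), (1, 25)]

def rating_distribution(records):
    """How many records carry each rating, ordered by rating."""
    return dict(sorted(Counter(rating for rating, _ in records).items()))

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def suspicious(records, valid=range(1, 6)):
    """Records whose rating falls outside the annotation scheme (1 to 5 stars)."""
    return [rec for rec in records if rec[0] not in valid]

distribution = rating_distribution(RECORDS)
r = pearson([rating for rating, _ in RECORDS], [length for _, length in RECORDS])
print(distribution)
print(round(r, 3))
print(suspicious(RECORDS + [(7, 10)]))
```

A strong correlation in a sample like this says nothing about a real corpus; the point is only that these checks are cheap and worth running before any modeling.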

Some famous corpora Penn Treebank: Parse trees and part of speech. ACE and KBP: Information Extraction. Switchboard: Conversational telephone speech. TIMIT: Phonetic Transcription. Boston Radio News Corpus: Prosodic Annotation. 20

The standard approach Identify labeled training data. Decide what to label: What is a data point? Extract features based on the entity. Train a supervised classifier (Machine Learning). Evaluate: Cross-validation or a held-out test set. 21
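The steps of the standard approach can be sketched end to end on a toy problem. The example below assumes invented review snippets as labeled data, bag-of-words features, a Naive Bayes classifier with add-one smoothing, and a held-out test set. All of these are illustrative choices, not the specific setup from the lecture, and every name in the code is hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented labeled data: each (text, label) pair is one data point.
TRAIN = [
    ("great food and friendly service", "pos"),
    ("loved the great atmosphere", "pos"),
    ("friendly staff and tasty food", "pos"),
    ("terrible food and rude service", "neg"),
    ("rude staff and awful atmosphere", "neg"),
    ("awful terrible experience", "neg"),
]
HELD_OUT = [
    ("great friendly staff", "pos"),
    ("awful rude service", "neg"),
]

def extract_features(text):
    """Bag-of-words features: the lowercase tokens of the text."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Collect label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in extract_features(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Pick the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in extract_features(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

MODEL = train_naive_bayes(TRAIN)
held_out_accuracy = accuracy(MODEL, HELD_OUT)
print(held_out_accuracy)
```

With only two held-out examples the accuracy figure is not meaningful; cross-validation over a larger labeled set would give a more stable estimate.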

How does machine learning fit in? Automatically identifying patterns in data. Automatically making decisions based on data. Hypothesis: Data -> Learning Algorithm -> Behavior; Data -> Programmer or Expert -> Behavior 22

Challenges Conversational text Social Media: Facebook, Twitter, reddit Email Chat/IM Spoken Dialog Systems Text Dialog Systems Sentiment Analysis Reviews Collaborative Filtering Natural Language Generation 23

Publicly available web-data Social Media: Twitter, Google Plus, forums, etc. Reviews: Amazon, TripAdvisor, etc. Wikipedia: Find missing links in Wikipedia. Find potentially incorrect information in Wikipedia. YouTube videos, SoundCloud songs: Can you classify topics? Music genres? 24

Use of web technologies The feedback loop. The use of the tool provides information that can be used to improve the tool. The use of the product provides training data. Which search results are best. Which ads are useful Which recommendations are correct 25

Feedback in Google Rank the top hits in response to a query. When someone clicks on a link, boost its ranking/relevance. Same for ads. UI/UX experiments 26
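The feedback loop can be illustrated with a toy re-ranker. This is only a sketch of the general idea of boosting clicked results, not a description of how any real search engine works. The class `ClickFeedbackRanker`, its `boost` weight, and the base scores are all hypothetical.

```python
from collections import defaultdict

class ClickFeedbackRanker:
    """Re-rank results for a query using accumulated click counts."""

    def __init__(self, boost=0.1):
        self.boost = boost  # hypothetical score increase per recorded click
        self.clicks = defaultdict(int)

    def record_click(self, query, doc_id):
        """Log that a user clicked doc_id after issuing query."""
        self.clicks[(query, doc_id)] += 1

    def rank(self, query, base_scores):
        """Order doc ids by base relevance score plus the click boost."""
        def adjusted(doc_id):
            return base_scores[doc_id] + self.boost * self.clicks[(query, doc_id)]
        # Ties are broken by document id so the output is deterministic.
        return sorted(base_scores, key=lambda d: (-adjusted(d), d))

ranker = ClickFeedbackRanker()
base = {"a": 1.0, "b": 0.9, "c": 0.5}  # invented scores from a retrieval step
before = ranker.rank("orange line", base)
for _ in range(3):
    ranker.record_click("orange line", "b")
after = ranker.rank("orange line", base)
print(before, after)
```

Three clicks on "b" are enough to move it above "a" here, which shows how use of the tool becomes training data for the tool.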

Feedback in Amazon Try to give users an offer. If they take it, increase its value. 27

Feedback in Netflix Suggestions for people like you How do you group people How do you group movies 28
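One simple way to approach "how do you group people" is to compare users by the ratings they give. The sketch below assumes invented users and movies and uses cosine similarity between rating vectors, a common baseline in collaborative filtering. It is an illustration under those assumptions, not the method any particular service uses.

```python
import math

# Invented ratings: user -> {movie: stars}.
RATINGS = {
    "ann": {"movie_a": 5, "movie_b": 4, "movie_c": 1},
    "bob": {"movie_a": 4, "movie_b": 5, "movie_c": 2},
    "cat": {"movie_a": 1, "movie_b": 2, "movie_c": 5},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def most_similar(user, ratings):
    """Return the other user whose ratings are closest to this user's."""
    others = [other for other in ratings if other != user]
    return max(others, key=lambda other: cosine(ratings[user], ratings[other]))

print(most_similar("ann", RATINGS))
```

Swapping the roles of users and movies in the same computation gives a first answer to "how do you group movies".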

Project ideas Look at the most recent conferences in NLP and Speech: ICASSP, Interspeech, ASRU, ACL, EMNLP, NAACL-HLT, COLING. Also, Journals: Computational Linguistics, Computer Speech and Language, IEEE Transactions on Audio, Speech, and Language Processing. Consider real-world problems and applications. 29