Introduction to Natural Language Processing and Speech Computer Science Research Practicum Fall 2012 Andrew Rosenberg

Post on 17-Dec-2015

TRANSCRIPT

  • Slide 1
  • Slide 2
  • Introduction to Natural Language Processing and Speech Computer Science Research Practicum Fall 2012 Andrew Rosenberg
  • Slide 3
  • Artificial Intelligence: AI is no longer a single subdiscipline in computer science. It spans Natural Language Processing, Speech/Spoken Language Processing, Robotics, Logic/Planning, Cognitive Radio, and Machine Learning.
  • Slide 4
  • Artificial Intelligence: What is intelligence? How does computer science make intelligent tools, systems, and algorithms? Does computer science theory contribute to the definition of intelligence?
  • Slide 5
  • Language and Speech: What is the relationship between language and intelligence/thought/cognition?
  • Slide 6
  • Language and Speech: Most people consider language to be the most direct access to cognition and thought. Language is core to Artificial Intelligence.
  • Slide 7
  • Natural Language Processing: Information Retrieval (search); Information Extraction; Knowledge Base Population; Summarization; Question Answering; Named Entity Recognition; Named Entity Linking and Co-reference Resolution; Parsing; Sentiment Analysis.
  • Slide 8
  • Information Retrieval: Input: a query. Output: relevant documents. Simplest approach: identify every document that contains the word or words in the query. What about related words? "run" is related to "running", "runs", and "marathon". How do you rank for relevance?
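The "simplest approach" on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three documents, their IDs, and the whitespace tokenizer are invented for the example, and no stemming or ranking is attempted.

```python
from collections import defaultdict

# Hypothetical toy collection; the IDs and texts are made up for illustration.
DOCS = {
    "d1": "She runs every morning",
    "d2": "Training for a marathon takes months",
    "d3": "He likes to run and run again",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no punctuation handling)."""
    return text.lower().split()

def build_index(docs):
    """Build an inverted index: word -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(DOCS)
```

With this index, `search(INDEX, "run")` finds only `d3`. It misses `d1` ("runs") and `d2` ("marathon"), which is exactly the related-words problem the slide raises, and the result is an unordered set, so the ranking question is left open as well.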
  • Slide 9
  • Information Extraction: Identify specific information from a single document or a set of documents. Who works for what organization? Who was born when? Died when? Who did what to whom? This is *very* complex. Domain-specific systems are developed. How many different ways are there to say the same thing?
  • Slide 10
  • Named Entity Recognition and Linking: "Bo Obama is fat. POTUS says so." "The President called his dog fat." "Mr. Obama, speaking to an interviewer, said that the White House dog needs to go on a diet." How do you recognize that Bo Obama, POTUS, The President, Mr. Obama, and The White House are all ENTITIES? How do you recognize that POTUS, The President, Mr. Obama, and "his" all refer to the same person?
  • Slide 11
  • Parsing: Understanding grammatical structure from text. An important step in some relation extraction, question answering, etc.
  • Slide 12
  • Sentiment Analysis: Can you tell the difference between a positive review and a negative one? Some reviews come with labels. Some labels have no reviews. Some reviews have no stars.
  • Slide 13
  • Spoken Language Processing: Automatic Speech Recognition; Rich Transcription; Speaker Recognition; Speech Synthesis; Text Normalization; Discourse and Dialog; Turn Taking; Emotion Recognition.
  • Slide 14
  • Speech Recognition: Converting speech to text. Acoustic Modeling: speech to phoneme. Pronunciation Modeling: how are words pronounced? Language Modeling: what sequences of words are most common?
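The language-modeling question ("what sequences of words are most common?") can be illustrated with bigram counts. This is a minimal sketch, assuming a tiny invented corpus and unsmoothed maximum-likelihood estimates; real recognizers use far larger corpora and smoothing.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are conventional sentence-boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def bigram_prob(counts, prev, cur):
    """Unsmoothed estimate of P(cur | prev); 0.0 if prev was never seen."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    if total == 0:
        return 0.0
    return following[cur] / total

# Hypothetical three-sentence corpus, invented for illustration.
CORPUS = ["the orange line", "the red line", "the orange line runs"]
MODEL = train_bigrams(CORPUS)
```

Here `bigram_prob(MODEL, "the", "orange")` is 2/3 and `bigram_prob(MODEL, "the", "red")` is 1/3, so a recognizer unsure between two acoustically similar words could prefer the more likely sequence.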
  • Slide 15
  • Rich Transcription: ALSO FROM NORTH STATION I THINK THE ORANGE LINE RUNS BY THERE TOO SO YOU CAN ALSO CATCH THE ORANGE LINE AND THEN INSTEAD OF TRANSFERRING UM I YOU KNOW THE MAP IS REALLY OBVIOUS ABOUT THIS BUT INSTEAD OF TRANSFERRING AT PARK STREET YOU CAN TRANSFER AT UH WHATS THE STATION NAME DOWNTOWN CROSSING UM AND THATLL GET YOU BACK TO THE RED LINE JUST AS EASILY
  • Slide 16
  • Rich Transcription: Also, from the North Station... (I think the Orange Line runs by there too so you can also catch the Orange Line...) And then instead of transferring (um I- you know, the map is really obvious about this but) Instead of transferring at Park Street, you can transfer at (uh what's the station name) Downtown Crossing and (um) that'll get you back to the Red Line just as easily.
  • Slide 17
  • Speaker/Author Recognition: What makes one speaker or author distinguishable from another? Email hacks, chat transcripts, anonymous authors. Which acoustic properties distinguish two speakers? Spectral qualities; prosodic qualities; lexical, syntactic, and content usage.
  • Slide 18
  • Speech Synthesis: Generating speech from text. Tools like Festival, HTS, and Mary TTS make this relatively easy. Unit Selection: use a corpus of a single speaker and paste together small slices of speech to make new words. Watson (http://www.youtube.com/watch?v=WFR3lOm_xhE). Parametric Synthesis: learn the spectral shape of different speech sounds, and synthesize them from oscillators and additive noise. Mary TTS web client: http://mary.dfki.de:59125/
  • Slide 19
  • Discourse and Dialog: How do you accomplish some task through discourse? Understanding the semantics of a user turn; generating an appropriate prompt; dialog/task planning; semantic frame filling.
  • Slide 20
  • Emotion Recognition: What are the acoustic properties of emotion expression? Loudness, speaking rate, pitch, hesitation, etc. This type of analysis can extend to other speaker states: intoxication, sleepiness, age, gender, personality factors, deception. Three Hundred Twelve. Three Thousand Twelve.
  • Slide 21
  • Corpus Analysis: A corpus is a body of linguistic material. Corpora (plural of corpus) are generally shared across research groups, which allows for reproducible findings and division of labor. Describing phenomena is an important first step in most research. What is the distribution of ratings? What are the correlations between features and labels? Are there errors in the annotation?
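Two of the descriptive questions on this slide, the distribution of ratings and the correlation between a feature and a label, can be sketched as follows. The records are hypothetical: each pair is an invented (star rating, number of exclamation marks) for a review, and the feature choice is only an assumption for illustration.

```python
import math
from collections import Counter

# Hypothetical annotated records: (star rating, exclamation marks in the review).
RECORDS = [(5, 3), (4, 2), (5, 4), (1, 0), (2, 1), (3, 1), (4, 3), (1, 0)]

def rating_distribution(records):
    """Count how many records carry each rating."""
    return Counter(rating for rating, _ in records)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

ratings = [rating for rating, _ in RECORDS]
marks = [count for _, count in RECORDS]
distribution = rating_distribution(RECORDS)
correlation = pearson(ratings, marks)
```

A skewed `distribution` or a surprisingly weak `correlation` is often the first hint that labels are unbalanced or that the annotation contains errors.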
  • Slide 22
  • Some famous corpora: Penn Treebank (parse trees and part of speech); ACE and KBP (information extraction); Switchboard (conversational telephone speech); TIMIT (phonetic transcription); Boston Radio News Corpus (prosodic annotation).
  • Slide 23
  • The standard approach: Identify labeled training data. Decide what to label: what is a data point? Extract features based on the entity. Train a supervised classifier (machine learning). Evaluate with cross-validation or a held-out test set.
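The steps on this slide can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the eight labeled reviews are invented, the features are bag-of-words counts, and the classifier is a small multinomial Naive Bayes with add-one smoothing standing in for whichever supervised classifier a project actually uses.

```python
import math
from collections import Counter

# Step 1: hypothetical labeled training data, invented for illustration.
DATA = [
    ("great movie loved it", "pos"), ("loved the acting great fun", "pos"),
    ("wonderful great story", "pos"), ("fun and wonderful", "pos"),
    ("terrible movie hated it", "neg"), ("hated the acting boring", "neg"),
    ("boring terrible story", "neg"), ("awful and boring", "neg"),
]

def extract_features(text):
    """Step 2: represent a data point as bag-of-words counts."""
    return Counter(text.lower().split())

def train(examples):
    """Step 3: collect per-label word counts for Naive Bayes."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in examples:
        feats = extract_features(text)
        word_counts.setdefault(label, Counter()).update(feats)
        label_counts[label] += 1
        vocab.update(feats)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Pick the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word, n in extract_features(text).items():
            if word in vocab:  # words never seen in training are ignored
                score += n * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(data, k):
    """Step 4: k-fold cross-validation; returns overall accuracy."""
    correct = 0
    for fold in range(k):
        held_out = data[fold::k]
        training = [ex for i, ex in enumerate(data) if i % k != fold]
        model = train(training)
        correct += sum(predict(model, text) == label for text, label in held_out)
    return correct / len(data)

accuracy = cross_validate(DATA, k=4)
```

Every example is held out exactly once, so `accuracy` is measured only on data the model did not see during training, which is the point of the evaluation step.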
  • Slide 24
  • How does machine learning fit in? Automatically identifying patterns in data. Automatically making decisions based on data. Hypothesis: Data → Learning Algorithm → Behavior, compared with Data → Programmer or Expert → Behavior.
  • Slide 25
  • Challenges: Conversational text; social media (Facebook, Twitter, reddit); email; chat/IM; spoken dialog systems; text dialog systems; sentiment analysis; reviews; collaborative filtering; natural language generation.
  • Slide 26
  • Publicly available web data: Social media (Twitter, Google Plus, forums, etc.). Reviews (Amazon, TripAdvisor, etc.). Wikipedia: find missing links in Wikipedia; find potentially incorrect information in Wikipedia. YouTube videos, SoundCloud songs: can you classify topics? Music genres?
  • Slide 27
  • Use of web technologies: The feedback loop. The use of the tool provides information that can be used to improve the tool. The use of the product provides training data: which search results are best, which ads are useful, which recommendations are correct.
  • Slide 28
  • Feedback in Google: Rank the top hits in response to a query. When someone clicks on a link, boost its ranking/relevance. Same for ads. UI/UX experiments.
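The click-feedback loop on this slide can be illustrated with a toy ranker. This is not a description of how Google works: the `ClickRanker` class, the additive `boost`, and the base scores are all hypothetical, chosen only to show how usage can feed back into ranking.

```python
from collections import defaultdict

class ClickRanker:
    """Toy ranker: each click on a result adds a fixed boost to its score."""

    def __init__(self, boost=1.0):
        self.boost = boost
        # (query, url) -> accumulated boost from recorded clicks
        self.clicks = defaultdict(float)

    def record_click(self, query, url):
        """Feedback step: a click is treated as evidence of relevance."""
        self.clicks[(query, url)] += self.boost

    def rank(self, query, base_scores):
        """Order URLs by base relevance score plus accumulated click boost."""
        return sorted(
            base_scores,
            key=lambda url: base_scores[url] + self.clicks.get((query, url), 0.0),
            reverse=True,
        )

# Hypothetical base relevance scores for one query.
BASE = {"a": 2.0, "b": 1.5, "c": 1.0}
ranker = ClickRanker()
before = ranker.rank("nlp", BASE)
ranker.record_click("nlp", "b")
after = ranker.rank("nlp", BASE)
```

Before any clicks the order is `a, b, c`; after one click on `b` its score becomes 2.5 and it moves to the top. The same pattern applies to ads, with clicks acting as the training signal.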
  • Slide 29
  • Feedback in Amazon: Try to give users an offer. If they take it, increase its value.
  • Slide 30
  • Feedback in Netflix: Suggestions for "people like you". How do you group people? How do you group movies?
  • Slide 31
  • Project ideas: Look at the most recent conferences in NLP and Speech: ICASSP, Interspeech, ASRU; ACL, EMNLP, NAACL-HLT, COLING. Also, journals: Computational Linguistics; Computer Speech and Language; IEEE Transactions on Audio, Speech, and Language Processing. Consider real-world problems and applications.