Improving Vector Space Word Representations Using Multilingual Correlation

Manaal Faruqui and Chris Dyer
Language Technologies Institute, Carnegie Mellon University



Distributional Semantics
"You shall know a word by the company it keeps" (Harris, 1954; Firth, 1957)
I will take what is mine with fire and blood
the end battle would be between fire and ice
My dragons are large and can breathe fire now
flame is the visible portion of a fire
take place whereby fires can sustain their own heat

Translational Semantics
What other information? (Bannard & Callison-Burch, 2005)
That plane can seat more than 300 people
Russian airplanes are huge
Multilingual information!
plane, airplane

Outline
Distributional Semantics: monolingual context
Translational Semantics: multilingual context
Better Semantic Representations: using distributional + translational semantics

Word Vector Representations
How to encode such co-occurrences?
[Table: word-by-context co-occurrence counts; contexts (columns): day, night, cold, sleep; words (rows) include winter, the]

Word Vector Representation
Latent Semantic Analysis (Deerwester et al., 1990)
Singular Value Decomposition
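The slides do not state the window size, count weighting, or toolkit used for LSA, so the following is only a minimal sketch under assumed settings: a symmetric 2-token window, raw counts, and NumPy's SVD. The helper names `cooccurrence_matrix` and `lsa_vectors` are hypothetical, and the toy corpus is the set of example contexts for "fire" from the Distributional Semantics slide.

```python
import numpy as np

# Toy corpus: the example contexts for "fire" from the slides.
sentences = [
    "i will take what is mine with fire and blood",
    "the end battle would be between fire and ice",
    "my dragons are large and can breathe fire now",
    "flame is the visible portion of a fire",
    "take place whereby fires can sustain their own heat",
]

def cooccurrence_matrix(sentences, window=2):
    """Count how often each word occurs within `window` tokens of another.

    The window size is an assumption; the slides do not specify one.
    """
    tokenized = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokenized for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in tokenized:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[sent[j]]] += 1
    return vocab, counts

def lsa_vectors(counts, dim):
    """Truncated SVD: keep the top `dim` singular directions as word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dim] * s[:dim]

vocab, counts = cooccurrence_matrix(sentences)
vectors = lsa_vectors(counts, dim=5)
print(vectors.shape)  # one 5-dimensional row per vocabulary word
```

Each row of `vectors` is the low-dimensional representation of one vocabulary word; the talk's experiments use 80 dimensions on much larger corpora.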

[Figure: SVD of the word-by-context matrix; labels: words, context]
Note: One of the earliest ways of computing word vectors.

Multilingual Information
English: dragon
German: Drache
French: dragon
Spanish: dragón
Problem? Simply append the vectors.

Multilingual Information
Disadvantages of vector concatenation?
Vector size increases
Idiosyncratic information
What if a word is OOV?
Note: Languages might be capturing idiosyncratic aspects of the meaning of the word. Instead of adding them together, we want a consensus of what they mean!

Multilingual Information
I will take what is mine with fire and blood
the end battle would be between fire and ice
My dragons are large and can breathe fire now
... Das Ende der Schlacht würde zwischen Feuer und Eis ...
... gesehen ist Feuer eine Oxidationsreaktion mit ...
... Das Licht des Feuers ist eine physikalische Erscheinung
So, what can we do?
Two views: Canonical Correlation Analysis!
Note: We want agreement across languages, not just what one language thinks of another.

Canonical Correlation Analysis (CCA)
Project two sets of vectors (of equal cardinality) into a space where they are maximally correlated.
Convex optimization problem with an exact solution!

Canonical Correlation Analysis (CCA)
[Figure: X (n1 × d1) and Y (n2 × d2) are projected by W (d1 × k) and V (d2 × k) into spaces of size n1 × k and n2 × k]
W, V = CCA(·, ·), computed on the two aligned vector sets
k = min(r(·), r(·)), the smaller of the two ranks
The projected X and Y are now maximally correlated!

Canonical Correlation Analysis (CCA)
Problems addressed?
Vector size increases: it doesn't increase
Idiosyncratic information: lets you choose!
What if a word is OOV?: projection vectors for everyone!

Canonical Correlation Analysis (CCA)
The vocabularies can't be of equal size!
OK, but how do we get equal-cardinality sets?
Get word alignments from a parallel corpus
Preserve only words in the original vocabulary
For every word in English, select the best foreign word
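The CCA step described above can be sketched in NumPy. This is an illustrative implementation, not the authors' code (the talk mentions a MATLAB tool): it centers both aligned sets, takes an orthonormal basis of each via SVD, and reads the canonical correlations off the SVD of the cross-product of those bases. The function name `cca`, the tolerance `tol`, and the random matrices standing in for aligned word vectors are all assumptions.

```python
import numpy as np

def cca(x, y, tol=1e-10):
    """Return projections W (d1 x k), V (d2 x k) and the k canonical
    correlations for row-aligned matrices x (n x d1) and y (n x d2)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    ux, sx, vxt = np.linalg.svd(x, full_matrices=False)
    uy, sy, vyt = np.linalg.svd(y, full_matrices=False)
    # Drop numerically zero directions so that k = min(rank(x), rank(y)).
    rx = int((sx > tol * sx[0]).sum())
    ry = int((sy > tol * sy[0]).sum())
    ux, sx, vxt = ux[:, :rx], sx[:rx], vxt[:rx]
    uy, sy, vyt = uy[:, :ry], sy[:ry], vyt[:ry]
    # Singular values of ux^T uy are the canonical correlations.
    a, corr, bt = np.linalg.svd(ux.T @ uy, full_matrices=False)
    w = vxt.T @ (a / sx[:, None])
    v = vyt.T @ (bt.T / sy[:, None])
    return w, v, corr

# Synthetic stand-ins for aligned word pairs: 200 pairs, 5-dim and 3-dim.
rng = np.random.default_rng(0)
x_aligned = rng.standard_normal((200, 5))
y_aligned = (x_aligned @ rng.standard_normal((5, 3))
             + 0.1 * rng.standard_normal((200, 3)))
w, v, corr = cca(x_aligned, y_aligned)

# W is learned on the aligned subset but can project every vector,
# including words that have no translation in the aligned set.
x_all = rng.standard_normal((1000, 5))
x_projected = (x_all - x_aligned.mean(axis=0)) @ w
print(corr)
```

Because the projection matrices apply to any vector of the right dimensionality, words outside the aligned set still receive a projected vector, which is the point made on the "Problems addressed?" slide.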

Experimental Setup: LSA Word Vector Learning
Monolingual data

              English       German        French        Spanish
News corpus   WMT-2011      WMT-2011      WMT 2011-12   WMT-2011
Tokens        360,000,000   290,000,000   263,000,000   164,000,000
Types         180,000       294,000       137,000       145,000

Tokenizer and lowercasing: WMT scripts

Experimental Setup: LSA Word Vector Learning
Parallel data

                       De-En         Fr-En         Es-En
News Comm + Europarl   WMT           WMT           WMT
Tokens                 128,000,000   138,000,000   134,000,000
Word pairs             37,000        38,000        38,000

Word alignment tool: fast_align (Dyer et al., 2013)

Experimental Setup: LSA Word Vector Learning
Corpus preprocessing
... hello hello hello hello hello
Context:
23.45, 21st, 10-20-2014, 0.5e10 → NUM
anchfgugsjh, wekjfbg, bhguyq → UNK

Experimental Setup: Evaluation Benchmarks
Word similarity evaluation
WS-353 (Finkelstein et al., 2001)
WS-353-SIM (Agirre et al., 2009)
WS-353-REL (Agirre et al., 2009)
RG-65 (Rubenstein and Goodenough, 1965)
MC-30 (Miller and Charles, 1991)
MTurk-287 (Radinsky et al., 2011)
Word relation evaluation
Semantic relations (Mikolov et al., 2013)
Syntactic relations (Mikolov et al., 2013)

Experimental Setup: Multilingual Vector Learning
Monolingual vector length: 80
Multilingual vector length: ?
The length in the projected space can be chosen: k
Choose the best value of k for WS-353
k ∈ [0.1, 0.2, …, 1.0]
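Choosing k can be sketched as a small search. The sketch assumes that k is a fraction of the original vector length and that the projected dimensions are ordered from most to least correlated, so that truncation keeps the leading ones; neither detail is spelled out on the slide. `truncate`, `best_ratio`, and the toy score are hypothetical names introduced for illustration.

```python
def truncate(vectors, ratio):
    """Keep the leading `ratio` fraction of every vector's dimensions."""
    full_length = len(next(iter(vectors.values())))
    keep = max(1, round(ratio * full_length))
    return {word: vector[:keep] for word, vector in vectors.items()}

def best_ratio(vectors, score, ratios):
    """Return the ratio whose truncated vectors get the highest score."""
    return max(ratios, key=lambda ratio: score(truncate(vectors, ratio)))

# Candidate values of k from the slide: 0.1, 0.2, ..., 1.0.
ratios = [i / 10 for i in range(1, 11)]

# Toy stand-ins: two 80-dimensional vectors and a made-up score that
# peaks at 48 dimensions. A real score would be the Spearman correlation
# with human ratings on the tuning set.
toy_vectors = {"fire": [0.0] * 80, "ice": [1.0] * 80}

def toy_score(vectors):
    return -abs(len(vectors["fire"]) - 48)

print(best_ratio(toy_vectors, toy_score, ratios))
```

Passing the scoring function as an argument keeps the search independent of any particular benchmark.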

[Figure: Multilingual Vector Learning; performance on WS-353, k = 0.6; axes: Spearman's correlation, dimensions]
[Figure: Multilingual Vector Learning; Spearman's correlation]
[Figure: Multilingual Vector Learning; accuracy]

Experimental Setup: Multilingual Vectors: Neural Networks
RNNLM (Mikolov et al., 2011)
Predict next word given the history
Neural language model
Recurrent hidden layer connections

Skip-Gram, word2vec (Mikolov et al., 2013)
Predict context given the word
Removes hidden layer
Vocabulary represented in Huffman coding

[Figure: Multilingual Vector Learning; RNNLM and Skip-Gram]
[Figure: Multilingual Vectors: Scaling; Spearman's correlation on WS-353]
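The word-similarity numbers in these plots are Spearman correlations between model similarities and human ratings. A minimal pure-Python sketch of such an evaluation follows, assuming cosine similarity as the model score and assuming that pairs containing an out-of-vocabulary word are skipped; the slides state neither detail. The vectors and ratings are made-up toy values, not benchmark data.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            result[i] = average
        start = end + 1
    return result

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def evaluate(vectors, rated_pairs):
    """Correlate model cosines with human ratings, skipping OOV pairs."""
    model, human = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(rating)
    return spearman(model, human)

# Toy vectors and hypothetical ratings, for illustration only.
toy_vectors = {"fire": [1.0, 0.0], "flame": [0.9, 0.1], "ice": [0.0, 1.0]}
toy_pairs = [("fire", "flame", 9.0), ("fire", "ice", 3.0),
             ("flame", "ice", 2.0), ("fire", "dragon", 5.0)]
print(evaluate(toy_vectors, toy_pairs))
```

Rank correlation is used rather than Pearson on the raw scores because only the ordering of word pairs needs to agree with the human judgments.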

[Figure: Multilingual Vectors: Qualitative Analysis; antonyms and synonyms of "beautiful", monolingual setting; plotted with the t-SNE tool (van der Maaten and Hinton, 2008)]
[Figure: Multilingual Vectors: Qualitative Analysis; antonyms and synonyms of "beautiful", multilingual setting; plotted with the t-SNE tool (van der Maaten and Hinton, 2008)]

Conclusion
CCA: easy-to-use tool in MATLAB. Take vectors from two languages and improve them.
Multilingual information is important, even if the problems are inherently monolingual.
More effective for distributional vectors. Semantics generalizes better than syntax.
Vectors available at: http://cs.cmu.edu/~mfaruqui

Related Work
Document representation: Vinokourov et al., 2002; Platt et al., 2010
Synonymy and paraphrasing: Bannard and Callison-Burch, 2005; Ganitkevitch et al., 2013
Bilingual lexicon induction: Haghighi et al., 2008; Vulic and Moens, 2013
Bilingual word vectors: Klementiev et al., 2012; Zou et al., 2013
Translation models: Kalchbrenner & Blunsom, 2013
Compositional semantics: Hermann & Blunsom, 2014

Thanks!
Visit us at ACL-demo: wordvectors.org