AINL 2016: Romanova, Nefedov

Post on 23-Jan-2017

HSE-School of linguistics at Russian Paraphrase Detection Shared Task

Anastasia Romanova, Mikhail Nefedov

Saint-Petersburg, 2016

Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection Shared Task

Overview

1 Introduction

2 Task

3 Standard Features

4 Word Embedding Features

5 Results

6 Next steps

Introduction

Higher School of Economics School of Linguistics

Task

Compare two sentences

Two types of classification

Standard and non-standard runs

Standard Features

Precision

precision = word-overlap(sentence1, sentence2) / word-count(sentence1)

Recall

recall = word-overlap(sentence1, sentence2) / word-count(sentence2)
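
The two ratios above can be sketched in a few lines. The slides do not define word-overlap, so this sketch assumes it is the number of tokens shared by the two sentences, counted with multiplicity; the function names are illustrative, not taken from the authors' code.

```python
from collections import Counter


def word_overlap(tokens1, tokens2):
    """Number of shared tokens, counted with multiplicity (an assumption)."""
    shared = Counter(tokens1) & Counter(tokens2)
    return sum(shared.values())


def overlap_precision(tokens1, tokens2):
    """Overlap divided by the length of the first sentence."""
    return word_overlap(tokens1, tokens2) / len(tokens1) if tokens1 else 0.0


def overlap_recall(tokens1, tokens2):
    """Overlap divided by the length of the second sentence."""
    return word_overlap(tokens1, tokens2) / len(tokens2) if tokens2 else 0.0


sentence1 = "the boy smiles".split()
sentence2 = "the girl laughs loudly".split()
print(overlap_precision(sentence1, sentence2))  # 1 shared token out of 3
print(overlap_recall(sentence1, sentence2))     # 1 shared token out of 4
```

Because the denominators differ, the two features only coincide when both sentences have the same length.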

BLEU score

Proposed by IBM (Papineni et al., 2002) for evaluating machine translation systems
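
As a rough illustration of how BLEU can serve as a sentence-pair feature, here is a minimal single-reference sketch: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. The maximum n-gram order of 2 and the absence of smoothing are assumptions made for brevity; the slides do not say which settings were used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=2):
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(log_precision)


print(bleu("the boy smiles".split(), "the boy laughs".split()))
```

BLEU is asymmetric, so for paraphrase detection one option is to compute it in both directions and use both values as features.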

Standard Features

SyntaxNet

Released by Google in May 2016

Models for 40 languages

Dependency parse tree

Standard Features

Tree Edit Distance (Zhang and Shasha, 1989)
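
To make the feature concrete, the sketch below computes the ordered tree edit distance with unit costs for insertion, deletion and relabelling. It uses the plain memoised forest recursion rather than the faster Zhang and Shasha dynamic programme, and the tuple-based tree encoding is a hypothetical choice for this example; both formulations yield the same distance for the same costs.

```python
from functools import lru_cache


def tree(label, *children):
    """A node is (label, tuple_of_child_nodes)."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not g:
        v = f[-1]
        # Delete the rightmost root; its children take its place.
        return forest_distance(f[:-1] + v[1], g) + 1
    if not f:
        w = g[-1]
        # Insert the rightmost root of g.
        return forest_distance(f, g[:-1] + w[1]) + 1
    v, w = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v[1], g) + 1
    insert = forest_distance(f, g[:-1] + w[1]) + 1
    relabel = (
        forest_distance(v[1], w[1])          # match the two subtrees
        + forest_distance(f[:-1], g[:-1])    # match everything to their left
        + (0 if v[0] == w[0] else 1)
    )
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Toy dependency trees: the verb is the root, the noun depends on it.
t1 = tree("smiles", tree("boy", tree("the")))
t2 = tree("laughs", tree("girl", tree("the")))
print(tree_edit_distance(t1, t2))
```

For dependency parses the node labels could be lemmas, part-of-speech tags or dependency relations; which one the authors used is not stated on the slide.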

Standard Results

Standard run

Word Embedding Features

Words as vectors

Word Embedding Features

Drawbacks of the averaging approach (Kenter and de Rijke, 2015)

Vectors for words Mean vectors
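
A small sketch of the averaging approach, using hand-made two-dimensional vectors as stand-ins for a real Word2Vec model (the embedding values are invented for illustration). It also shows one commonly noted limitation of averaging: word order is lost, so two sentences with the same words in a different order receive identical mean vectors.

```python
import math


def mean_vector(tokens, embeddings):
    """Component-wise mean of the vectors of all known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy embeddings, not taken from any trained model.
toy_embeddings = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}

m1 = mean_vector("dog bites man".split(), toy_embeddings)
m2 = mean_vector("man bites dog".split(), toy_embeddings)
print(m1, m2, cosine(m1, m2))
```

The two sentences mean different things, yet their mean vectors coincide and the cosine similarity is maximal.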

Word Embedding Features

Before preprocessing

Клинтон выступила с первой речью после поражения на выборах (English: "Clinton gave her first speech after the election defeat")

After preprocessing

клинтон_S выступать_V первый_A речь_S поражение_S выбор_S
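
The transformation above can be sketched as a filter over the output of a morphological analyzer. The (lemma, tag) pairs below are hard-coded and hypothetical, standing in for whatever tagger the authors used; the set of retained tags is likewise an assumption inferred from the example, where prepositions are dropped and lemmas are lowercased.

```python
# Part-of-speech tags assumed to be kept (nouns, verbs, adjectives, adverbs).
CONTENT_POS = {"S", "V", "A", "ADV"}


def to_model_tokens(analysis, keep_pos=CONTENT_POS):
    """Turn (lemma, tag) pairs into lowercased lemma_TAG tokens."""
    return [f"{lemma.lower()}_{pos}" for lemma, pos in analysis if pos in keep_pos]


# Hypothetical analyzer output for the example sentence.
analysis = [
    ("Клинтон", "S"), ("выступать", "V"), ("с", "PR"),
    ("первый", "A"), ("речь", "S"), ("после", "PR"),
    ("поражение", "S"), ("на", "PR"), ("выбор", "S"),
]

tokens = to_model_tokens(analysis)
print(" ".join(tokens))
```

Tokens in this lemma_TAG form can then be looked up directly in a Word2Vec model whose vocabulary was built with the same convention.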

Word Embedding Features

BM25 + Word2Vec

sl - longest sentence

ss - shortest sentence

avgsl - average sentence length
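
The formula itself did not survive in this transcript. The sketch below assumes the slide follows the BM25-style semantic similarity of Kenter and de Rijke (2015), which uses the same sl, ss and avgsl symbols: each word of the longer sentence contributes its IDF times a saturated version of its best cosine match in the shorter sentence. The defaults k1 = 1.2 and b = 0.75 are conventional BM25 values, not values from the slides, and clamping negative cosines to zero is a further assumption.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sem(word, sentence, emb):
    """Best non-negative cosine match of `word` among the words of `sentence`."""
    if word not in emb:
        return 0.0
    sims = [cosine(emb[word], emb[w]) for w in sentence if w in emb]
    return max(0.0, max(sims, default=0.0))


def bm25_w2v(s1, s2, emb, idf, avgsl, k1=1.2, b=0.75):
    """BM25-weighted semantic similarity of two tokenised sentences."""
    sl, ss = (s1, s2) if len(s1) >= len(s2) else (s2, s1)
    length_norm = k1 * (1 - b + b * len(ss) / avgsl)
    score = 0.0
    for w in sl:
        s = sem(w, ss, emb)
        score += idf.get(w, 0.0) * s * (k1 + 1) / (s + length_norm)
    return score


# Toy inputs: orthogonal unit vectors and uniform IDF weights.
emb = {"the": [1.0, 0.0, 0.0], "boy": [0.0, 1.0, 0.0], "smiles": [0.0, 0.0, 1.0]}
idf = {"the": 1.0, "boy": 1.0, "smiles": 1.0}
sent = ["the", "boy", "smiles"]
print(bm25_w2v(sent, sent, emb, idf, avgsl=3.0))
```

With identical sentences, uniform IDF and a sentence length equal to avgsl, every word contributes exactly its IDF, so the score equals the sentence length.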

Word Embedding Features

All to all similarities

The boy smiles - The girl laughs

Similarity matrix

Word Embedding Features

All to all similarities

The boy smiles - The girl laughs

Bins for all values

Bins for maximum values
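
One way to read the two binning schemes: build the matrix of cosine similarities between every word of one sentence and every word of the other, then count how many values fall into each interval, once over all cells and once over the row maxima only. The sketch below does this with toy vectors; the bin edges are arbitrary placeholders, since the slides list finding good intervals as future work.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(s1, s2, emb):
    """Rows follow the words of s1, columns the words of s2."""
    return [[cosine(emb[a], emb[b]) for b in s2] for a in s1]


def histogram(values, edges):
    """Counts per bin [edges[i], edges[i+1]); the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        x = min(max(x, edges[0]), edges[-1])  # guard against rounding overshoot
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


def bin_features(s1, s2, emb, edges=(-1.0, 0.0, 0.5, 1.0)):
    matrix = similarity_matrix(s1, s2, emb)
    all_values = [x for row in matrix for x in row]
    max_values = [max(row) for row in matrix]
    return histogram(all_values, edges), histogram(max_values, edges)


# Toy embeddings in which each word matches exactly one counterpart.
toy = {
    "the": [1.0, 0.0, 0.0],
    "boy": [0.0, 1.0, 0.0], "girl": [0.0, 1.0, 0.0],
    "smiles": [0.0, 0.0, 1.0], "laughs": [0.0, 0.0, 1.0],
}
all_bins, max_bins = bin_features(
    "the boy smiles".split(), "the girl laughs".split(), toy
)
print(all_bins, max_bins)
```

The resulting count vectors have a fixed length regardless of sentence length, which makes them usable as classifier features.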

Word Embedding Features

Per-dimension similarities

Cosine similarity

Similarity bins

Results

Non-standard run

Next steps

Find optimal intervals for bins

Create a new Word2Vec model

Test AdaGram

Compute idf on a larger corpus

Include dependency weighting into BM25

Contacts

anastasiaromane@gmail.com

manefedov26@gmail.com
