Embedding NomLex-BR into OpenWN-PT

Valeria de Paiva (joint work with Alexandre Rademaker, Gerard de Melo and Livy Real)

Upload: valeria-de-paiva

Post on 11-May-2015

DESCRIPTION

Slides not presented at GWN2014

TRANSCRIPT

Page 1: Embedding Nomlex-BR into OpenWN-PT

Embedding NomLex-BR into OpenWN-PT

Valeria de Paiva (joint work with Alexandre Rademaker, Gerard de Melo and Livy Real)

Page 2: Embedding Nomlex-BR into OpenWN-PT

NomLex?

http://nlp.cs.nyu.edu/nomlex/index.html

Page 3: Embedding Nomlex-BR into OpenWN-PT

NomLex

• A dictionary of English nominalizations, developed under Catherine Macleod

• Relates the nominal complements to the arguments of the corresponding verb

• 1025 entries covering several types of lexical nominalizations

• First version released on January 15, 1999; latest version October 2001

• Developed into NomLex-Plus and NomBank

• Downloadable from http://nlp.cs.nyu.edu/nomlex/index.html

Example: Alexander’s destruction of the city happened in 330 BC.

Page 4: Embedding Nomlex-BR into OpenWN-PT

NOMLEX-BR?

• A dictionary of Portuguese nominalizations

• Relates nominals to their corresponding verbs

• Over 1000 entries covering several types of lexical nominalizations

• First version of NOMLEX-BR in 2011, much expanded in 2013

• Downloadable from https://github.com/arademaker/nomlex-br

Example: Construção da rodovia Transamazônica, na década de 70, pelo governo Médici, uma das obras faraônicas da ditadura militar. (Construction of the Trans-Amazonian highway, in the 1970s, by the Médici government, one of the pharaonic works of the military dictatorship.)

Page 5: Embedding Nomlex-BR into OpenWN-PT

Nominalizations in Portuguese

• Nominalizations are difficult to deal with in KR systems, because it is harder to obtain the arguments of a nominal predicate than those of a verb

• The NOMLEX project (Macleod et al., 1998) provides a well-established, open-access baseline

• NOMLEX covers nominalizations with the suffixes -ion, -ment and -er, which carry over well to Portuguese

• E.g. construction/construção, adjournment/adiamento and writer/escritor

• 90% of the original resource was easily translated by hand.
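The suffix correspondence above can be illustrated with a minimal sketch. The Portuguese suffixes in `SUFFIX_MAP` (-ção, -mento, -or) are an assumption read off the three example pairs on the slide, and the function names are hypothetical; this is not the procedure used to build NOMLEX-BR, which was translated manually.

```python
# Hypothetical table: English nominalizing suffix -> typical Portuguese counterpart.
# The Portuguese side is an assumption inferred from the slide's examples.
SUFFIX_MAP = {
    "ion": "ção",
    "ment": "mento",
    "er": "or",
}


def nominal_suffix(noun):
    """Return the English nominalizing suffix that `noun` ends with, or None."""
    # Try longer suffixes first so that "-ment" is checked before "-er".
    for suffix in sorted(SUFFIX_MAP, key=len, reverse=True):
        if noun.lower().endswith(suffix):
            return suffix
    return None


def shares_suffix_pattern(english, portuguese):
    """True if the pair follows one of the assumed suffix correspondences."""
    suffix = nominal_suffix(english)
    if suffix is None:
        return False
    return portuguese.lower().endswith(SUFFIX_MAP[suffix])


# The three example pairs from the slide.
examples = [
    ("construction", "construção"),
    ("adjournment", "adiamento"),
    ("writer", "escritor"),
]
matches = [shares_suffix_pattern(en, pt) for en, pt in examples]
```

A check like this only flags pairs that fit the expected pattern; pairs that do not fit would still need manual review.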

Page 6: Embedding Nomlex-BR into OpenWN-PT

Into OpenWordnet-PT? Why?

We need a Portuguese Wordnet for our work, as complete and accurate as we can get it.

Nomlex-BR helps improve the completeness and accuracy of OpenWN-PT.

Page 7: Embedding Nomlex-BR into OpenWN-PT

OpenWordNet-PT…

• Data is freely available

• Correspondence with Princeton WordNet

• Derived from Universal WordNet (de Melo and Weikum, 2009): high recall with high precision for the more salient words

• Useful embedding: checking that nominalizations from the Portuguese NOMLEX were related to their corresponding verbs exposed issues in OpenWN-PT.

https://github.com/arademaker/wordnet-br
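The consistency check described above can be sketched roughly as follows. The function name, the data layout (plain Python sets) and the toy entries are all illustrative assumptions; they do not reflect the actual contents or storage format of NOMLEX-BR or OpenWN-PT.

```python
def check_pairs(pairs, wordnet_verbs, wordnet_nouns, derivation_links):
    """Classify each (verb, nominalization) pair by what the wordnet lacks."""
    report = {"missing_verb": [], "missing_noun": [], "missing_link": [], "ok": []}
    for verb, noun in pairs:
        if verb not in wordnet_verbs:
            report["missing_verb"].append((verb, noun))
        elif noun not in wordnet_nouns:
            report["missing_noun"].append((verb, noun))
        elif (verb, noun) not in derivation_links:
            # Both words exist, but nothing relates the noun to its verb.
            report["missing_link"].append((verb, noun))
        else:
            report["ok"].append((verb, noun))
    return report


# Toy data for illustration only (not real OpenWN-PT contents).
pairs = [
    ("construir", "construção"),
    ("adiar", "adiamento"),
    ("escrever", "escritor"),
    ("nomear", "nomeação"),
]
verbs = {"construir", "adiar", "escrever"}
nouns = {"construção", "adiamento", "nomeação"}
links = {("construir", "construção")}

report = check_pairs(pairs, verbs, nouns, links)
```

Each non-empty bucket other than `ok` points to a different kind of gap: a missing verb, a missing noun, or two existing words with no relation between them.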

Page 8: Embedding Nomlex-BR into OpenWN-PT

OpenWN-PT: what does it look like?

• Typical good entry with minor manual improvements.

• Automatic processing produces candidate Portuguese words for some of the WN 3.0 synsets.

• Suggested words are then checked, and a Portuguese gloss and examples are added.

Page 9: Embedding Nomlex-BR into OpenWN-PT

OpenWN-PT: what does it look like?

Not very useful, but sense exists

No single verb in Portuguese for this synset…

Page 10: Embedding Nomlex-BR into OpenWN-PT

OpenWN-PT: some issues…

Capitalized items, plurals, duplicates, a few gender issues, missing items…

Page 11: Embedding Nomlex-BR into OpenWN-PT

OpenWN-PT: RDF Representation

• OpenWN-PT is encoded and distributed in RDF/OWL.

• Both the data model and the actual data are in the same format. Existing data processing tools can be reused, including databases (“triple stores”) with SQL-like query interfaces (SPARQL).

• A standard W3C encoding of WordNet in RDF has existed since 2006. OpenWN-PT is modelled after and fully interoperable with Princeton WordNet.

• This makes it possible to find Portuguese equivalents for specific English word senses and vice versa.

• OpenWN-PT is part of a large ecosystem of compatible resources, including domain identifiers and mappings to Wikipedia.
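As a rough illustration of the kind of lookup a SPARQL interface allows, the sketch below builds a query for the Portuguese words linked to an English word and encodes it as a GET request URL, without sending it. The `ex:` namespace and the predicates `ex:lexicalForm` and `ex:sameAs` are placeholders invented for this example; they are not the actual OpenWN-PT or W3C WordNet vocabulary, and the endpoint URL passed in is likewise an assumption.

```python
from urllib.parse import urlencode

# Placeholder namespace; the real vocabulary must be taken from the OpenWN-PT data.
EX = "http://example.org/wn/schema/"


def build_query(english_word):
    """Build a SPARQL query for Portuguese words aligned with an English word."""
    # Escape backslashes and double quotes so the word is a valid string literal.
    escaped = english_word.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX ex: <{EX}>
SELECT DISTINCT ?ptWord WHERE {{
  ?enSynset ex:lexicalForm "{escaped}"@en .
  ?ptSynset ex:sameAs ?enSynset ;
            ex:lexicalForm ?ptWord .
}}"""


def request_url(endpoint, query):
    """Encode the query as a GET request URL (the query is not sent here)."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query("construction")
# Hypothetical endpoint path; substitute the repository URL of the real endpoint.
url = request_url("http://example.org/sparql", query)
```

Swapping the language tag and the direction of the alignment pattern would give the reverse lookup, from a Portuguese word to its English equivalents.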

Page 12: Embedding Nomlex-BR into OpenWN-PT

A small Experiment…

• Accuracy: Since the lexicon was manually created, it is mostly accurate. Minor typos and bugs are caught when comparing it to OpenWN-PT.

• Coverage: DHBB was used to complete NOMLEX-BR; this work was finished after submission.

• A more systematic effort is needed, but the results were encouraging.

Page 13: Embedding Nomlex-BR into OpenWN-PT

Conclusions

• We presented NomLex-BR, a lexicon of nominalizations in Brazilian Portuguese.

• NomLex-BR is embedded into OpenWordNet-PT and shares its RDF representation.

• Recent improvements include better coverage: newer suffixes and the incorporation of Nomage.

• The data is freely available from http://github.com/arademaker/wordnet-br/ and through a SPARQL endpoint at logics.emap.fgv.br:10035.

• Browsing via Open Multilingual Wordnet (www.casta-net.jp/~kuribayashi/cgi-bin/wn-multi.cgi) is fun.

Page 14: Embedding Nomlex-BR into OpenWN-PT

NomLex-BR: next steps?

• Work with Claudia Freitas on leveraging Linguateca’s PAPEL, ACDC and Floresta Sintá(c)tica.

• Lists from Linguateca’s resources complement NomLex-BR with corpus evidence and help make sure our resource is not simply a translation.

• Classification of nominalizations?

• Adding the Portuguese terms that satisfy different relations? OpenVerbNet-PT?

• Glosses?

Page 15: Embedding Nomlex-BR into OpenWN-PT

Thanks!

Page 16: Embedding Nomlex-BR into OpenWN-PT

References

• Valeria de Paiva, Alexandre Rademaker (2012). Revisiting a Brazilian Wordnet. Proceedings of Global Wordnet Conference, Global Wordnet Association, Matsue.

• Valeria de Paiva, Alexandre Rademaker, and Gerard de Melo. OpenWordNet-PT: An Open Brazilian WordNet For Reasoning. In Proceedings of the 24th International Conference on Computational Linguistics. http://hdl.handle.net/10438/10274.

• Alexandre Rademaker, Valeria de Paiva, Gerard de Melo, Livy Real and Maira Gatti (2014). OpenWordNet-PT: A Project Report. Proceedings of the 7th Global Wordnet Conference, Tartu, Estonia. Global Wordnet Association.

• Livy Maria Real Coelho, Alexandre Rademaker, Valeria de Paiva, and Gerard de Melo (2014). Embedding NomLex-BR Nominalizations Into OpenWordnet-PT. In Proceedings of the 7th Global WordNet Conference, Tartu, Estonia.

Page 17: Embedding Nomlex-BR into OpenWN-PT

OpenWN-PT: true lexical gaps?

Page 18: Embedding Nomlex-BR into OpenWN-PT

Other stuff to add in?

• Onto.PT, ES wordnet?

• Editing interfaces?

• BabelNet?

• NER issues?

• Temporal issues?

• Work with Claudia Freitas? Leonel?

• Work on implicatives/factives in Portuguese?

• FOIS workshop

Page 19: Embedding Nomlex-BR into OpenWN-PT

References

• Gerard de Melo, Gerhard Weikum (2009). Towards a Universal Wordnet by Learning from Combined Evidence. 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China.

• Valeria de Paiva (2010). Bridges from Language to Logic: Concepts, Contexts and Ontologies. Logical and Semantic Frameworks with Applications, LSFA'10, Natal, Brazil.

• "A Basic Logic for Textual Inference". AAAI Workshop on Inference for Textual Question Answering, 2005.

• "Textual Inference Logic: Take Two". CONTEXT 2007.

• "Precision-focused Textual Inference". Workshop on Textual Entailment and Paraphrasing, 2007.

• PARC's Bridge and Question Answering System. Proceedings of Grammar Engineering Across Frameworks, 2007.

Page 20: Embedding Nomlex-BR into OpenWN-PT

Simplifying PARC’s Bridge Architecture

Idea: Simplify and reproduce components in PORTUGUESE

[Architecture diagram. Labels recoverable from the slide: Text, Sources, Question, Assertions, Query; Parsing, KR Mapping, Inference Engines; F-structure, semantics, KR; Grammar, Stanford Parser; Term rewriting, KR mapping rules, OpenWN-PT, SUMO-PT; Textual Inference logics.]