1
Neural Networks in Natural Language Processing-- with POS/NER as an example
Jie YANG 杨杰
Singapore University of Technology and Design
November 21, 2018
@Nanjing University of Information Science & Technology
2
Outline
• Introduction
• Machine Learning and Neural Networks Framework
• Neural Network Models in Natural Language Processing (NLP)
• Overview of NN for NLP – example of POS/NER
• Conclusion
3
Outline
• Introduction
• Machine Learning and Neural Networks Framework
• Neural Network Models in Natural Language Processing (NLP)
• Overview of NN for NLP – example of POS/NER
• Conclusion
Introduction
4
Ø Natural Language Processing: [1]
§ Subfield of Artificial Intelligence (AI).
§ Interactions between computers and human (natural) languages.
§ How to program computers to process and analyze large amounts of natural language data.
[1] https://en.wikipedia.org/wiki/Natural_language_processing
Introduction
5
Ø Natural Language Processing: application
Smart Speaker
Search Engine
Machine Translation
News Recommendation
Introduction
6
Ø Neural Network: [2]
§ Inspired by the biological neural networks.
§ Connected neurons, with weights and nonlinear functions.
[2] https://en.wikipedia.org/wiki/Artificial_neural_network
Biological Neuron vs. Artificial Neural Network (figure)
7
Outline
• Introduction
• Machine Learning and Neural Networks Framework
• Neural Network Models in Natural Language Processing (NLP)
• Overview of NN for NLP – example of POS/NER
• Conclusion
ML&NN
8
Ø Machine Learning: machine + learning
§ Machine: automatic, efficient, programmable
§ Learning: learning from data, rather than using rules
§ Supervised: large annotated data
§ Semi-supervised: small annotated data + large unannotated data
§ Unsupervised: unannotated data
ML&NN
9
Ø Supervised Learning:
§ What we have: annotated data, i.e. training data (x, y), where x is the input vector and y is the given label.
§ What we want: a model f() that predicts the label y' for given decode (test) data x', i.e. f(x') = y'.
e.g. f(x) = 0.5x² + 2.1x + 0.3, where 0.5/2.1/0.3 are the parameters.
§ Finding the representation of f() is what machine learning does.
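For intuition, here is a minimal sketch (not from the slides) of finding those parameters by gradient descent on annotated (x, y) pairs, using numpy:

```python
# A minimal sketch: fitting the parameters of f(x) = a*x^2 + b*x + c
# to annotated pairs (x, y) by gradient descent on mean squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 0.5 * x**2 + 2.1 * x + 0.3          # "gold" labels from the slide's example f()

a, b, c = rng.normal(size=3)             # random parameter initialization
lr = 0.1
for _ in range(2000):
    pred = a * x**2 + b * x + c          # model prediction f(x)
    err = pred - y                       # compare predicted vs. real y
    a -= lr * np.mean(2 * err * x**2)    # update each parameter with its gradient
    b -= lr * np.mean(2 * err * x)
    c -= lr * np.mean(2 * err)

print(a, b, c)                           # should approach 0.5, 2.1, 0.3
```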
ML&NN
10
Ø Supervised Learning:
§ Train:
(x, y) → model f(x) / parameters → predicted ŷ
Compare predicted ŷ against the real y (loss function): right or wrong
Update the parameters with the loss, then move on to the next x
§ Decode:
x' → model f(x') / parameters → predict y'
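A minimal sketch of those two loops in PyTorch terms (model and data names are illustrative; any framework has the same shape):

```python
# A minimal sketch of the train and decode loops above.
import torch

def train(model, data, epochs=10, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:                # annotated pairs (x, y)
            pred = model(x)              # predicted y-hat
            loss = loss_fn(pred, y)      # compare predicted vs. real y
            optimizer.zero_grad()
            loss.backward()              # gradients of the loss
            optimizer.step()             # update parameters with the loss

def decode(model, x_new):
    with torch.no_grad():
        return model(x_new).argmax(dim=-1)   # predict y' for unseen x'
```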
ML&NN
11
Ø Model 𝑓(𝑥) : structure + parameters
§ Linear Regression, SVM, Decision Tree, etc.
§ Neural Network: Feed-forward NN, Convolutional NN, Recurrent NN
https://www.learnopencv.com/understanding-feedforward-neural-networks/
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
12
Ø Problem:
§ Machine Learning or Neural Network models are trained to build a fitting function, which is calculated over numbers.
For example: f(x) = 0.2x² + 3x³ − 2.3x⁴
if x = 1, then y = f(x) = f(1) = 0.2 + 3 − 2.3 = 0.9
§ Language is represented with words/characters.
“我来到南京信息工程大学。” (“I came to Nanjing University of Information Science & Technology.”)
“Our neural sequence labeling framework contains three layers.”
§ How do we apply neural networks to language processing?
13
Outline
• Introduction
• Machine Learning and Neural Networks Framework
• Neural Network Models in Natural Language Processing (NLP)
• Overview of NN for NLP – example of POS/NER
• Conclusion
NN for NLP
14
Ø Word Representation:
§ Goal: map the words into numbers.
§ Method: word embeddings.
§ Format: distributed representations, i.e. vectors of real numbers
The chairman of the Federal Reserve is Ben Bernanke
[0.4, ..., 1.3, -0.6] [0.7, ..., 3.2, 1.5] [0.2, -1.2, 6.1, ...] ... (one vector per word)
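A minimal sketch of such a lookup table, reusing the illustrative vector values above, with cosine similarity as the usual word-similarity measure:

```python
# A minimal sketch (illustrative vectors, not real embeddings): a lookup
# table mapping words to dense vectors, plus cosine similarity between words.
import numpy as np

embeddings = {
    "chairman": np.array([0.4, 1.3, -0.6]),
    "federal":  np.array([0.7, 3.2,  1.5]),
    "reserve":  np.array([0.2, -1.2, 6.1]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sentence = ["chairman", "federal", "reserve"]
vectors = [embeddings[w] for w in sentence]      # words mapped into numbers
print(cosine(embeddings["federal"], embeddings["reserve"]))
```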
NN for NLP
15
Ø Word Embeddings:
§ Advantages: can be tuned during training, and captures word similarity
Word embeddings mapped into two dimensions
http://nlp.yvespeirsman.be/images/glove-word-embeddings-education.png
NN for NLP
16
Ø NN models in NLP:
x' → model f(x') / parameters → predict y'
Model choices (all built on top of embeddings): RNN/LSTM/GRU, CNN, Transformer
1. Design Challenges and Misconceptions in Neural Sequence Labeling
2. Attention Is All You Need
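A minimal PyTorch sketch of this pipeline, with a BiLSTM as the encoder (a CNN or Transformer layer slots into the same place); all sizes are illustrative:

```python
# A minimal sketch: embeddings feed an encoder, whose outputs are
# scored per token as tag predictions.
import torch
import torch.nn as nn

class Tagger(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=200, num_tags=45):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)            # word id -> vector
        self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                            batch_first=True)                   # RNN/LSTM encoder
        self.out = nn.Linear(hidden, num_tags)                  # per-token tag scores

    def forward(self, word_ids):                # (batch, sent_len)
        h, _ = self.lstm(self.emb(word_ids))    # (batch, sent_len, hidden)
        return self.out(h)                      # (batch, sent_len, num_tags)

scores = Tagger()(torch.randint(0, 10000, (1, 9)))   # one 9-word sentence
```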
17
Outline
• Introduction
• Machine Learning and Neural Networks Framework
• Neural Network Models in Natural Language Processing (NLP)
• Overview of NN for NLP – example of POS/NER
• Conclusion
POS/NER task
18
Ø POS: Part-of-speech (POS) tagging:
§ assign each word to the class whose members share similar syntactic behavior or fit a specific type.
Ø NER: Named Entity Recognition:
§ identify and classify the named entities in the input text into pre-defined entity categories.
Input: The complicated language in the huge new law has muddied the fight .
Output: DT VBN NN IN DT JJ JJ NN VBZ VBN DT NN .
Input: We ’re about to see if advertising works .
Output: PRP VBP IN TO VB IN NN VBZ .
Input: This time , the firms were ready .
Output: DT NN , DT NNS VBD JJ .
[Barack Obama] PER was born in [hawaii] LOC .
Rare [Hendrix] PER song draft sells for almost $ 17,000 .
[Volkswagen AG] ORG won 77,719 registrations .
[Burundi] LOC disqualification from [African Cup] MISC confirmed .
The bank is a division of [First Union Corp] ORG .
The chairman of the [Federal Reserve] ORG is [Ben Bernanke] PER .
[US] LOC President [Trump] PER and [KP] LOC leader [Kim] PER will meet in [Singapore] LOC .
POS/NER task
19
Input: 张三 来到 南京 信息 工程 大学 (“Zhang San arrived at Nanjing University of Information Science & Technology”)
NER (entities): 张三 = Person, 南京 信息 工程 大学 = Organization
POS: NR VV NR NN NN NN
NER (BIO tags): B-PER O B-ORG I-ORG I-ORG I-ORG
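A minimal sketch of how entity spans are recovered from such BIO tags, using this example sentence:

```python
# A minimal sketch: recovering (label, span) pairs from BIO tags.
def bio_to_spans(words, tags):
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):       # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:                # close the currently open span
                spans.append((label, " ".join(words[start:i])))
                start, label = None, None
        if tag.startswith("B-"):                 # open a new span
            start, label = i, tag[2:]
    return spans

words = "张三 来到 南京 信息 工程 大学".split()
tags = ["B-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
print(bio_to_spans(words, tags))   # [('PER', '张三'), ('ORG', '南京 信息 工程 大学')]
```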
Overview
20
Ø Overview of the entire process:
Data Annotation
Model Design
Model Training
Model Evaluation
Overview
21
Ø Data Annotation:
§ Based on the task, manually annotate text as training data, i.e. build (x, y) pairs.
Example of YEDDA annotation interface
Overview
22
Ø Data Annotation:
§ Data format example:
NER data segment / POS data segment (figures)
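A minimal reader sketch, assuming the common one-token-per-line CoNLL-style format (word in the first column, tag in the last, blank line between sentences); this is an assumption about the segments shown, not their confirmed layout:

```python
# A minimal sketch: reading CoNLL-style annotated data into (x, y) pairs.
def read_conll(path):
    sentences, words, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                          # blank line ends a sentence
                if words:
                    sentences.append((words, tags))
                    words, tags = [], []
            else:
                cols = line.split()
                words.append(cols[0])             # first column: word
                tags.append(cols[-1])             # last column: tag
    if words:                                     # flush a trailing sentence
        sentences.append((words, tags))
    return sentences                              # list of (x, y) pairs
```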
Overview
23
Ø Overview of the entire process:
Data Annotation
Model Design
Model Training
Model Evaluation
Overview
24
Ø Model Design:
§ Different tasks benefit from different models:
§ Parsing: transition-based, biaffine attention, tree-LSTM
§ Translation: encoder-decoder with attention, Transformer
§ Sequence labeling tasks: LSTM+CRF
Overview
25
Ø LSTM+CRF model:
Character Representation / Word Representation
Example of NCRF++
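A minimal sketch of the CRF side of LSTM+CRF: Viterbi decoding over the per-token tag scores (emissions) produced by the LSTM, with a learned tag-transition matrix. This is an illustration of the technique, not NCRF++'s actual code:

```python
# A minimal sketch: Viterbi decoding for a linear-chain CRF layer.
import numpy as np

def viterbi(emissions, transitions):
    # emissions: (sent_len, num_tags) scores from the LSTM
    # transitions: (num_tags, num_tags), score of tag i followed by tag j
    n, t = emissions.shape
    score = emissions[0].copy()                    # best score ending in each tag
    back = np.zeros((n, t), dtype=int)             # back-pointers
    for i in range(1, n):
        total = score[:, None] + transitions + emissions[i]   # (t, t)
        back[i] = total.argmax(axis=0)             # best previous tag for each tag
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for i in range(n - 1, 0, -1):                  # follow back-pointers
        best.append(int(back[i][best[-1]]))
    return best[::-1]                              # highest-scoring tag sequence

print(viterbi(np.random.randn(6, 5), np.random.randn(5, 5)))
```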
Overview
26
Ø Overview of the entire process:
Data Annotation
Model Design
Model Training
Model Evaluation
Overview
27
Ø Model Training:
§ Initialize the model parameters: randomly, or from pretrained values.
§ Feed in the annotated data (x, y).
§ Update the model parameters to fit the annotated data.
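A minimal sketch of the two initialization options, assuming GloVe-style text files of pretrained vectors ("word v1 v2 ..."); the function and file names are illustrative:

```python
# A minimal sketch: random embedding initialization, optionally
# overwritten with pretrained vectors for words found in the file.
import numpy as np

def init_embeddings(vocab, emb_dim=100, pretrained_path=None):
    # vocab: dict mapping word -> row index in the embedding table
    scale = np.sqrt(3.0 / emb_dim)                          # small random init
    table = np.random.uniform(-scale, scale, (len(vocab), emb_dim))
    if pretrained_path is not None:
        with open(pretrained_path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split()
                if parts[0] in vocab:                       # keep pretrained vector
                    table[vocab[parts[0]]] = np.array(parts[1:], dtype=float)
    return table
```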
Overview
28
Ø Overview of the entire process:
Data Annotation
Model Design
Model Training
Model Evaluation
Overview
29
Ø Model Evaluation:
§ After training, how to evaluate the model performance?
§ Different tasks require different evaluation metrics.
§ POS tagging: accuracy = #correct tags / #all tags
§ NER: F1 = 2PR / (P + R), where P is precision and R is recall
§ Some tasks are hard to evaluate, e.g. translation:
Original: I hate you .
Translator 1: 我讨厌你 (“I dislike you”)
Translator 2: 我厌恶你 (“I loathe you”)
Translator 3: 我不喜欢你 (“I don't like you”)
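For concreteness, a minimal sketch of the two metrics above: token-level accuracy for POS, and entity-level F1 over exactly-matched entity spans for NER (spans extracted as in the earlier BIO example):

```python
# A minimal sketch: POS accuracy and NER entity-level F1.
def accuracy(gold_tags, pred_tags):
    correct = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return correct / len(gold_tags)               # #correct tags / #all tags

def f1(gold_entities, pred_entities):
    gold, pred = set(gold_entities), set(pred_entities)
    hit = len(gold & pred)                        # exactly matched entities
    if hit == 0:
        return 0.0
    p, r = hit / len(pred), hit / len(gold)       # precision, recall
    return 2 * p * r / (p + r)                    # F1 = 2PR / (P + R)
```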
Overview
30
Ø Model Evaluation:
POS task evaluation / NER task evaluation (figures)
Overview
31
Ø Model Evaluation:
§ Use the trained model to decode the test data x and obtain the predicted result ŷ.
§ Evaluate the model by comparing ŷ against the gold label y.
§ If necessary, go back to refine the model design or training step, or even the data annotation step.
Data Annotation
Model Design
Model Training
Model Evaluation
32
Outline
• Introduction
• Machine Learning and Neural Networks Framework
• Neural Network Models in Natural Language Processing (NLP)
• Overview of NN for NLP – example of POS/NER
• Conclusion
Conclusion
33
Ø We have gone through the whole development process of neural network based NLP.
Ø Neural network models essentially train a function to fit the given annotated data.
Ø There are plenty of neural network models that can be used in NLP.
Ø Neural network based NLP uses embedding vectors to map words to numbers.
Ø Data + Model + Evaluation are the three key parts in developing NN based NLP.
34
Thanks!
Q&A