State-of-the-Art NLP Study Group 2017 (ACL17)

2017.09.16 NAIST Natural Language Processing Laboratory, D1 Masayoshi Kondo. Paper introduction at the State-of-the-Art NLP Study Group 2017: Selective Encoding for Abstractive Sentence Summarization (ACL17). Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou. Harbin Institute of Technology, Harbin, China / Microsoft Research, Beijing, China.

TRANSCRIPT

  1. 1. 2017.09.16 NAIST NLP Lab, D1 Masayoshi Kondo - State-of-the-Art NLP Study Group 2017. Selective Encoding for Abstractive Sentence Summarization (ACL17). Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou. Harbin Institute of Technology, Harbin, China / Microsoft Research, Beijing, China
  2. 2. 00: Summary of the paper. A seq2seq EncDec model extended with a Selective Gate, evaluated on three ROUGE metrics (R-1, R-2, R-L). Pipeline: Encoding -> Selection -> Decoding (Sentence Encoder / Selective gate network / Summary Decoder). Train Set: Annotated English Gigaword dataset. Test Sets: Annotated English Gigaword Test Set, DUC2004 Test Set, MSR-ATC Test Set.
  3. 3. Preliminaries / Introduction / Model / Experiments / Discussion & Conclusion / *Plus Alpha
  4. 4. 01: Text Summarization. [Chart: number of summarization papers at ACL/EMNLP per year, 2014-2017; total papers using neural nets vs. company papers using neural nets.] Neural abstractive summarization took off with [EMNLP15, Rush et al.], and companies such as Google, Facebook, and IBM are now active in NN-based summarization tasks.
  5. 5. 02: Neural Text Summarization.
  [Input]: the microsoft corporation will open its office in dhaka on november ## to expand its sales and fight piracy in the market of this country , reported the daily new age on saturday .
  [Output (predicted)]: microsoft to open new office in sri lanka.
  [Output (correct)]: microsoft to open office in dhaka.
  6. 6. 03: Neural Text Summarization. Two families of approaches. Extractive Summarization: selects sentences or phrases directly from the source text. Abstractive Summarization: the NN generates new wording (and may also copy words from the source). [Diagram: source (src) and target (trg) text blocks illustrating extraction vs. generation.]
  7. 7. 04: Neural Text Summarization. Seq2Seq neural model (RNN encoder - RNN decoder) with attention. [Input]: the reference (src) as a word-id sequence. [Output]: the summary (trg) as a word-id sequence. Train: on [src, trg] pairs; Test: generate trg from src. At each step the decoder predicts a 1-of-V distribution over its vocabulary, and the loss is cross-entropy.
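As a rough illustration of that training setup (not the authors' code), the per-step 1-of-V prediction and cross-entropy loss can be sketched in PyTorch; the batch size, step count, and vocabulary size below are made up for the example.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: a batch of 2 summaries, 5 decoding steps, a 10k-word vocabulary.
batch, steps, vocab = 2, 5, 10_000
logits = torch.randn(batch, steps, vocab, requires_grad=True)  # decoder scores: one 1-of-V distribution per step
targets = torch.randint(0, vocab, (batch, steps))              # gold summary word ids

# Cross-entropy between the predicted distribution and the gold word at every step.
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()  # gradients for the optimizer update computed from [src, trg] pairs
```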
  8. 8. 05: Neural Text Summarization - common datasets. Gigaword Corpus: [src] the first sentence of a news article, [trg] its headline; Train: ~4M / Dev: ~200k / Test: ~400k pairs, with a 2,000-pair test subset; the DUC04 shared-task test set is also used; evaluation is by ROUGE score (n-gram overlap). CNN/Daily Mail: [src] full article, [trg] multi-sentence highlights; 781 tokens / 56 tokens on average; Train: ~290k / Dev: 13,000 / Test: 11,000.
  9. 9. 06: Recent Research in Abstractive Summarization (two ACL17 papers).
  Get To The Point: Summarization with Pointer-Generator Networks [ACL17 / Stanford Univ. (C. D. Manning's lab) with Google]. Key ideas: Copy Mechanism - copy words directly from the source; Coverage Mechanism - suppress repetition. Dataset: CNN/DailyMail. Model: BiLSTM encoder - LSTM decoder with attention (src: full article, trg: multi-sentence summary).
  Selective Encoding for Abstractive Sentence Summarization [ACL17 / Harbin Institute of Technology with Microsoft]. Key idea: Selective Mechanism (Selective Gate) placed between the encoder and decoder. Dataset: Gigaword and others. Model: seq2seq with attention.
  10. 10. 07: Recent Research in Abstractive Summarization - related generation work at ACL17.
  Learning to Generate Market Comments from Stock Prices [Y. Miyao et al., ACL17]: generates market comments from numerical stock-price data.
  Program Induction for Rationale Generation: Learning to Solve and Explain Algebraic Word Problems [ACL17 / Oxford with DeepMind]: generates step-by-step rationales while solving algebraic word problems.
  Neural AMR: Sequence-to-Sequence Models for Parsing and Generation [Ioannis Konstas et al., ACL17]: covers both generation (AMR graph to text) and parsing. Dataset: Gigaword / original dataset.
  11. 11. Preliminaries / Introduction / Model / Experiments / Discussion & Conclusion / *Plus Alpha
  12. 12. 08: Introduction. The attention mechanism learns a (soft) Input/Output alignment: at each output step the decoder attends over the input through the attention mechanism placed between them (Encoder + Attention Mechanism + Decoder).
  13. 13. 09: Introduction. Proposed model: SEASS (Selective Encoding for Abstractive Sentence Summarization), an encoder + attention mechanism + decoder architecture with an extra selection step: Encoding -> Selection -> Decoding (Sentence Encoder / Selective gate network / Summary Decoder).
  [Encoding]: an RNN builds the first-level sentence representation.
  [Selection]: the selective gate network builds a second-level sentence representation from it.
  [Decoding]: the decoder generates the summary from the second-level sentence representation.
  14. 14. Preliminaries / Introduction / Model / Experiments / Discussion & Conclusion / *Plus Alpha
  15. 15. Preliminaries / Introduction / Model (1. Summary Encoder, 2. Selective Mechanism, 3. Summary Decoder, Objective Function) / Experiments / Discussion & Conclusion / *Plus Alpha
  16. 16. 10: Selective Encoder. [Diagram: input words x_0 ... x_T pass through a Word Embedding layer, a Bi-GRU (forward / backward), and the Selective Gate to produce the encoder output.] For each position the gate combines that word's hidden state with the sentence vector, roughly sGate_i = sigma(W h_i + U s + b), and the gated hidden states form the encoder output.
  17. 17. 11: Model - selective mechanism / Summary Encoder
  18. 18. 12: Model - summary encoder. Encoder: BiGRU. The forward and backward GRUs are initialized with zero vectors; at each position the forward and backward hidden states are concatenated to form the word's representation.
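A minimal PyTorch sketch of such a BiGRU encoder, using the sizes given later in the implementation details (300-dim embeddings, 512-dim hidden states, 119,504-word input vocabulary); it is illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid = 119_504, 300, 512        # sizes from the implementation-details slide
embed = nn.Embedding(vocab_size, emb_dim)
bigru = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)

src = torch.randint(0, vocab_size, (1, 8))           # one source sentence of 8 word ids
h0 = torch.zeros(2, 1, hid)                          # forward / backward states start as zero vectors
outputs, h_n = bigru(embed(src), h0)                 # outputs: (1, 8, 2*hid), forward and backward concatenated
# Sentence vector s = [last forward hidden state ; first backward hidden state]
s = torch.cat([outputs[:, -1, :hid], outputs[:, 0, hid:]], dim=-1)   # shape (1, 2*hid)
```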
  19. 19. Preliminaries / Introduction / Model (1. Summary Encoder, 2. Selective Mechanism, 3. Summary Decoder, Objective Function) / Experiments / Discussion & Conclusion / *Plus Alpha
  20. 20. 13: Model - selective mechanism Selective Mechanism
  21. 21. 14: Model - selective mechanism. In standard Seq2Seq (MT), the encoder-decoder tries to carry over all of the source information. Abstractive sentence summarization differs from MT in that 1. only the salient parts of the input should be kept, and 2. the output is not a word-by-word rendering of the input (the mapping is lossy). The Selective Mechanism therefore builds a tailored, selected representation on top of the ordinary seq2seq encoder.
  22. 22. 15: Model - selective mechanism. s is the concatenation of the last forward hidden state h_n and the first backward hidden state h_1, and serves as the sentence representation vector. For each word x_i, the selective gate network generates a gate vector sGate_i from h_i and s, sGate_i = sigma(W h_i + U s + b), and the tailored representation is h'_i = h_i ⊙ sGate_i (element-wise product).
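A small PyTorch sketch of that gate, assuming W and U are linear maps and b a bias as in the formula above; the module and variable names are mine, not the paper's.

```python
import torch
import torch.nn as nn

class SelectiveGate(nn.Module):
    """sGate_i = sigmoid(W h_i + U s + b);  h'_i = h_i * sGate_i (element-wise)."""
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)   # applied to each word state h_i
        self.U = nn.Linear(dim, dim, bias=True)    # applied to the sentence vector s (carries the bias b)

    def forward(self, h: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, dim) first-level word representations
        # s: (batch, dim) sentence vector [last forward state ; first backward state]
        gate = torch.sigmoid(self.W(h) + self.U(s).unsqueeze(1))
        return h * gate                            # second-level (tailored) representations h'_i

gate = SelectiveGate(dim=1024)                     # 2 * 512, matching the bidirectional states
h_tailored = gate(torch.randn(1, 8, 1024), torch.randn(1, 1024))
```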
  23. 23. Preliminaries / Introduction / Model (1. Summary Encoder, 2. Selective Mechanism, 3. Summary Decoder, Objective Function) / Experiments / Discussion & Conclusion / *Plus Alpha
  24. 24. 16: Model - summary decoder Summary Decoder
  25. 25. 17: Model - summary decoder. Decoder: GRU with attention. w_{t-1}: previous word embedding; c_{t-1}: previous context vector; s_t: new hidden state, s_t = GRU(w_{t-1}, c_{t-1}, s_{t-1}). The context vector is computed from the previous decoder state and the tailored encoder states:
  1. (12) attention energies e_{t,i} = v_a^T tanh(W_a s_{t-1} + U_a h'_i)
  2. (13) attention weights alpha_{t,i} = softmax_i(e_{t,i})
  3. (14) context vector c_t = sum_i alpha_{t,i} h'_i
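A hedged sketch of equations (12)-(14) as concat-style (Bahdanau) attention over the tailored encoder states; the dimensions and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConcatAttention(nn.Module):
    """e_{t,i} = v_a^T tanh(W_a s_{t-1} + U_a h'_i); alpha = softmax(e); c_t = sum_i alpha_i h'_i."""
    def __init__(self, dec_dim: int, enc_dim: int, att_dim: int):
        super().__init__()
        self.W_a = nn.Linear(dec_dim, att_dim, bias=False)
        self.U_a = nn.Linear(enc_dim, att_dim, bias=False)
        self.v_a = nn.Linear(att_dim, 1, bias=False)

    def forward(self, s_prev: torch.Tensor, h_tailored: torch.Tensor):
        # s_prev: (batch, dec_dim) previous decoder state; h_tailored: (batch, seq_len, enc_dim)
        e = self.v_a(torch.tanh(self.W_a(s_prev).unsqueeze(1) + self.U_a(h_tailored)))  # (batch, seq_len, 1)
        alpha = torch.softmax(e, dim=1)                      # attention weights over source positions
        c = (alpha * h_tailored).sum(dim=1)                  # context vector c_t: (batch, enc_dim)
        return c, alpha.squeeze(-1)

attn = ConcatAttention(dec_dim=512, enc_dim=1024, att_dim=512)
c_t, alpha_t = attn(torch.randn(1, 512), torch.randn(1, 8, 1024))
```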
  26. 26. 18: Model - summary decoder (output layer). w_{t-1}: previous word embedding; c_t: context vector; s_t: (current) decoder state; r_t: readout state. Decoder: GRU with attention.
  [15]: readout state r_t = W_r w_{t-1} + U_r c_t + V_r s_t
  [16]: maxout over pairs of components of r_t gives m_t
  [17]: output distribution p(y_t | y_{<t}, x) = softmax(W_o m_t)
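A sketch of that readout / maxout / softmax output layer; the readout size and the pairwise maxout follow the common formulation and are assumptions, not guaranteed to match the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MaxoutReadout(nn.Module):
    """r_t = W_r w_{t-1} + U_r c_t + V_r s_t; m_t = maxout(r_t) over pairs; p(y_t) = softmax(W_o m_t)."""
    def __init__(self, emb_dim: int, enc_dim: int, dec_dim: int, read_dim: int, vocab_size: int):
        super().__init__()
        self.W_r = nn.Linear(emb_dim, read_dim, bias=False)
        self.U_r = nn.Linear(enc_dim, read_dim, bias=False)
        self.V_r = nn.Linear(dec_dim, read_dim, bias=False)
        self.W_o = nn.Linear(read_dim // 2, vocab_size, bias=False)

    def forward(self, w_prev, c_t, s_t):
        r = self.W_r(w_prev) + self.U_r(c_t) + self.V_r(s_t)     # readout state r_t
        m = r.view(r.size(0), -1, 2).max(dim=-1).values          # maxout: keep the max of each pair
        return torch.log_softmax(self.W_o(m), dim=-1)            # log p(y_t | y_<t, x)

readout = MaxoutReadout(emb_dim=300, enc_dim=1024, dec_dim=512, read_dim=512, vocab_size=68_883)
log_probs = readout(torch.randn(1, 300), torch.randn(1, 1024), torch.randn(1, 512))
```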
  27. 27. Preliminaries / Introduction / Model (1. Summary Encoder, 2. Selective Mechanism, 3. Summary Decoder, Objective Function) / Experiments / Discussion & Conclusion / *Plus Alpha
  28. 28. 19: Model - objective function. Loss: Negative Log-Likelihood. D: a set of parallel sentence-summary pairs; theta: the model parameters; training: Stochastic Gradient Descent (SGD).
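Written out (a standard form consistent with the slide's definitions, not copied from the paper), the negative log-likelihood over the pair set D with parameters theta is:

```latex
% Negative log-likelihood objective over parallel sentence-summary pairs (x, y) in D
L(\theta) = -\frac{1}{|D|} \sum_{(x,\,y) \in D} \sum_{t=1}^{|y|} \log p\left(y_t \mid y_{<t},\, x;\ \theta\right)
```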
  29. 29. Preliminaries / Introduction / Model / Experiments / Discussion & Conclusion / *Plus Alpha
  30. 30. 20: Experiments - datasets. Training Set: Annotated English Gigaword dataset, with (src, trg) = (first sentence of the article, headline) and PTB-style tokenization; roughly 3.8M training pairs and 189k dev pairs. Test Sets: (1) English Gigaword test set: the 2,000 pairs of Rush et al. [EMNLP15] (1,951 pairs after filtering, following [Chopra et al., 2016]); (2) DUC2004 Test Set: 500 documents with 4 reference summaries per source, system output capped at 75 bytes; (3) MSR-ATC Test Set [Toutanova et al., 2016]: crowdsourced summaries for about 6,000 source texts, of which 785 sentences form the test set.
  31. 31. 21: Experiments. Evaluation Metric: ROUGE score, the DUC shared-task metric based on n-gram overlap with the reference: R-1 (unigrams), R-2 (bigrams), R-L (longest common subsequence, LCS).
  Implementation Details: Vocab size In: 119,504 / Out: 68,883; Word embeddings: 300; Unit type (hidden size): GRU (512); Batch size: 64; Dropout: 0.5; Optimization: Adam (alpha=0.001, beta1=0.9, beta2=0.999, epsilon=10^-8); Dev evaluation: every 2,000 training batches; Gradient clipping: [-5, 5]; Beam search size: 12.
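To make the n-gram-overlap idea concrete, here is a toy recall-oriented ROUGE-N computation in Python (not the official ROUGE toolkit, which also handles stemming, F-scores, and multiple references); the example strings reuse the Microsoft headlines from slide 5.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n):
    """Fraction of the reference's n-grams that also appear in the candidate."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())
    return overlap / max(sum(ref.values()), 1)

ref = "microsoft to open office in dhaka"
sys = "microsoft to open new office in sri lanka"
print(rouge_n_recall(sys, ref, 1), rouge_n_recall(sys, ref, 2))  # ROUGE-1 / ROUGE-2 recall
```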
  32. 32. 22: Experiments - Baselines.
  ABS [Rush et al., EMNLP15]: CNN encoder + attention / NNLM (FFNN) decoder.
  ABS+: ABS additionally tuned with extractive features added to the loss.
  CAs2s [Chopra et al., 2016]: successor to ABS; CNN encoder + attention / RNN decoder.
  Feats2s [Nallapati et al., 2016]: RNN seq2seq enriched with features such as POS and NER tags.
  Luong-NMT [Luong et al., 2015]: LSTM (500-dim) encoder-decoder.
  s2s-att: plain Seq2Seq with attention.
  33. 33. 23: Experiments: results on the Gigaword test set (the Rush et al. test set). [Results table: ROUGE scores of the baselines and SEASS.]
  34. 34. 24: Experiments DUC2004 test set
  35. 35. 25: Experiments MSR-ATC test set
  36. 36. Preliminaries / Introduction / Model / Experiments / Discussion & Conclusion / *Plus Alpha
  37. 37. 26: Discussion. Effectiveness of Selective Encoding: the proposed model (SEASS) is compared with the baseline (seq2seq-attention) on Gigaword, grouping inputs by length from 10 to 80 words and plotting ROUGE-2 F1, to isolate the contribution of selective encoding. Saliency Heat Map of the Selective Gate: the contribution of each input word to the selective gate (and thus to the output) is visualized as a heat map.
  38. 38. 27: Discussion: Effectiveness of Selective Encoding
  39. 39. 28: Discussion : Saliency Heat Map of Selective Gate [Input] : the council of europe s human rights commissioner slammed thursday as unacceptable conditions in france s overcrowded and dilapidated jails , where some ## inmates have committed suicide this year . [System] : council of europe slams french prison conditions. [True] : council of europe again slams french prison conditions.
  40. 40. 29: Conclusion. A selective-encoding model that extends seq2seq with a Selective Mechanism (selective gate), yielding an Encoding / Selection / Decoding pipeline. It reports state-of-the-art ROUGE scores on the English Gigaword, DUC2004, and MSR-ATC test sets, and the Selective Gate helps the model focus on the parts of the Input that matter for the Output.
  41. 41. Preliminaries / Introduction / Model / Experiments / Discussion & Conclusion / *Plus Alpha
  42. 42. 31: *Plus Alpha. How does SEASS relate to the other ACL17 paper, Get To The Point: Summarization with Pointer-Generator Networks [ACL17 / Stanford Univ. (C. D. Manning's lab) with Google]? Get To The Point: Copy Mechanism (copies source words) and Coverage Mechanism (suppresses repetition); Dataset: CNN/DailyMail; Model: BiLSTM-LSTM-attention (src: full article, trg: multi-sentence summary). SEASS [ACL17 / with Microsoft]: Selective Mechanism (Selective Gate) between the encoder and decoder; Dataset: Gigaword and others; Model: seq2seq-attention. Can the SEASS idea also help on long-text summarization?
  43. 43. 32: *Plus Alpha. Experiment: swap the encoder of Get To The Point: Summarization with Pointer-Generator Networks (keeping its pointer-generator mechanism / coverage mechanism) for a SEASS-style encoder. [Enc in Get To The Point]: Bi-LSTM. [Replaced with the SEASS Enc]: Bi-GRU + Selective Mechanism. CNN/Daily Mail dataset: [src] article / [trg] multi-sentence summary, 781 tokens / 56 tokens on average; Train: ~290k / Dev: 13,000 / Test: 11,000; Vocab size: 50k; articles truncated to 400 words, summaries limited to 100 words in training and 120 words at test time.
  44. 44. 33: Results (CNN/DailyMail dataset, 50k vocab) - applying SEASS, an ACL17 short-text (sentence) summarization model, to neural long-text summarization.
  Model                                                  | ROUGE-1 | ROUGE-2 | ROUGE-L | # of params
  Abigail et al. 2017 - ENC: BiLSTM - pointer-generator  | 36.44   | 15.66   | 33.42   | -
  Abigail et al. 2017                                    | 37.88   | 16.39   | 33.46   | -
  SEASS [ACL17] - BiGRU - Selective Enc                  | 37.44   | 16.00   | 33.35   | -
  Abigail et al. 2017 + coverage                         | 39.53   | 17.28   | 36.38   | -
  Abigail et al. 2017 + coverage                         | 39.86   | 17.50   | 35.38   | -
  SEASS [ACL17] + coverage                               | 38.65   | 16.88   | 34.36   | -
  45. 45. END