Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2102–2113, Florence, Italy, July 28 – August 2, 2019. ©2019 Association for Computational Linguistics
Learning to Select, Track, and Generate for Data-to-Text
Hayate Iso∗† Yui Uehara‡ Tatsuya Ishigaki\‡ Hiroshi Noji‡
Eiji Aramaki†‡ Ichiro Kobayashi[‡ Yusuke Miyao]‡ Naoaki Okazaki\‡ Hiroya Takamura\‡
†Nara Institute of Science and Technology ‡Artificial Intelligence Research Center, AIST
\Tokyo Institute of Technology [Ochanomizu University ]The University of Tokyo
{iso.hayate.id3,aramaki}@is.naist.jp [email protected]
{yui.uehara,ishigaki.t,hiroshi.noji,takamura.hiroya}@[email protected] [email protected]
Abstract
We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which records have been mentioned. Our generation module generates a summary conditioned on the state of the tracking module. Our model can be considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we also explore the effectiveness of writer information for generation. Experimental results show that our model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to both content planning and surface realization.
1 Introduction
Advances in sensor and data storage technologies have rapidly increased the amount of data produced in various fields such as weather, finance, and sports. To address the information overload caused by such massive data, data-to-text generation technology, which expresses the contents of data in natural language, is becoming more important (Barzilay and Lapata, 2005). Recently, neural methods have been able to generate high-quality short summaries, especially from small pieces of data (Liu et al., 2018).
Despite this success, it remains challenging to generate a high-quality long summary from data (Wiseman et al., 2017). One reason for the difficulty is that the input data is too large for a naive model to find its salient part, i.e., to determine which part of the data should be mentioned.
∗Work was done during the internship at Artificial Intelligence Research Center, AIST
In addition, the salient part moves as the summary explains the data. For example, when generating a summary of a basketball game (Table 1 (b)) from the box score (Table 1 (a)), the input contains numerous data records about the game, e.g., Jordan Clarkson scored 18 points. Existing models often refer to the same data record multiple times (Puduppully et al., 2019). The models may also mention an incorrect data record, e.g., Kawhi Leonard added 19 points, when the summary should mention LaMarcus Aldridge, who scored 19 points. Thus, we need a model that finds salient parts, tracks transitions of salient parts, and expresses information faithful to the input.
In this paper, we propose a novel data-to-text generation model with two modules, one for saliency tracking and another for text generation. The tracking module keeps track of saliency in the input data: when it detects a saliency transition, it selects a new data record1 and updates its state. The text generation module generates a document conditioned on the current tracking state. Our model can be considered to imitate the human-like writing process that gradually selects and tracks the data while generating the summary. In addition, we note some writer-specific patterns and characteristics: how data records are selected to be mentioned, and how data records are expressed as text, e.g., the order of data records and the word usage. We therefore also incorporate writer information into our model.
The experimental results demonstrate that, even without writer information, our model achieves the best performance among the previous models in all evaluation metrics: 94.38% precision of relation generation, 42.40% F1 score of content selection, 19.38% normalized Damerau-Levenshtein
1We use ‘data record’ and ‘relation’ interchangeably.
Distance (DLD) of content ordering, and a 16.15% BLEU score. We also confirm that adding writer information further improves the performance.
2 Related Work
2.1 Data-to-Text Generation
Data-to-text generation is a task of generating descriptions from structured or non-structured data, including sports commentary (Tanaka-Ishii et al., 1998; Chen and Mooney, 2008; Taniguchi et al., 2019), weather forecasts (Liang et al., 2009; Mei et al., 2016), biographical text from Wikipedia infoboxes (Lebret et al., 2016; Sha et al., 2018; Liu et al., 2018), and market comments from stock prices (Murakami et al., 2017; Aoki et al., 2018).
Neural generation methods have become the mainstream approach for data-to-text generation. The encoder-decoder framework (Cho et al., 2014; Sutskever et al., 2014) with attention (Bahdanau et al., 2015; Luong et al., 2015) and copy mechanisms (Gu et al., 2016; Gulcehre et al., 2016) has been successfully applied to data-to-text tasks. However, neural generation methods sometimes yield fluent but inadequate descriptions (Tu et al., 2017). In data-to-text generation, descriptions inconsistent with the input data are problematic.
Recently, Wiseman et al. (2017) introduced the ROTOWIRE dataset, which contains multi-sentence summaries of basketball games with box scores (Table 1). This dataset requires the selection of a salient subset of data records for generating descriptions. They also proposed automatic evaluation metrics for measuring the informativeness of generated summaries.
Puduppully et al. (2019) proposed a two-stage method that first predicts the sequence of data records to be mentioned and then generates a summary conditioned on the predicted sequence. Their idea is similar to ours in that both consider a sequence of data records as content planning. However, our proposal differs from theirs in that ours uses a recurrent neural network for saliency tracking, and our decoder dynamically chooses a data record to be mentioned without fixing the sequence of data records in advance.
2.2 Memory modules
Memory networks can be used to maintain and update representations of salient information (Weston et al., 2015; Sukhbaatar et al., 2015; Graves et al., 2016). Such modules are often used in natural language understanding to keep track of entity states (Kobayashi et al., 2016; Hoang et al., 2018; Bosselut et al., 2018).
Recently, entity tracking has become popular for generating coherent text (Kiddon et al., 2016; Ji et al., 2017; Yang et al., 2017; Clark et al., 2018). Kiddon et al. (2016) proposed a neural checklist model that updates predefined item states. Ji et al. (2017) proposed an entity representation for language modeling; their method updates the entity tracking states when an entity is introduced and selects the salient entity state.
Our model extends this entity tracking module to data-to-text generation tasks. The entity tracking module selects the salient entity and the appropriate attribute at each time step, updates their states, and generates coherent summaries from the selected data records.
3 Data
Through careful examination, we found that in the original ROTOWIRE dataset, some NBA games have two documents, one of which is sometimes in the training data while the other is in the test or validation data. Such documents are similar to each other, though not identical. To make this dataset more reliable as an experimental dataset, we created a new version.
We ran the script provided by Wiseman et al. (2017) for crawling the ROTOWIRE website for NBA game summaries. The script collected approximately 78% of the documents in the original dataset; the remaining documents had disappeared. We also collected the box scores associated with the collected documents. We observed that some of the box scores had been modified compared with the original ROTOWIRE dataset.
The collected dataset contains 3,752 instances (i.e., pairs of a document and box scores). However, the four shortest documents were not summaries; they were, for example, announcements about the postponement of a match. We thus deleted these 4 instances and were left with 3,748 instances. We followed the dataset split of Wiseman et al. (2017) to split our dataset into training, development, and test data. We found 14 instances that did not have corresponding instances in the original data. We randomly assigned 9, 2, and 3 of those 14 instances to the training, development, and test data, respectively. Finally, the sizes of
TEAM                  | H/V | WIN | LOSS | PTS | REB | AST | FG PCT | FG3 PCT | ...
KNICKS                | H   | 16  | 19   | 104 | 46  | 26  | 45     | 46      | ...
BUCKS                 | V   | 18  | 16   | 105 | 42  | 20  | 47     | 32      | ...

PLAYER                | H/V | PTS | REB | AST | BLK | STL | MIN | CITY      | ...
CARMELO ANTHONY       | H   | 30  | 11  | 7   | 0   | 2   | 37  | NEW YORK  | ...
DERRICK ROSE          | H   | 15  | 3   | 4   | 0   | 1   | 33  | NEW YORK  | ...
COURTNEY LEE          | H   | 11  | 2   | 3   | 1   | 1   | 38  | NEW YORK  | ...
GIANNIS ANTETOKOUNMPO | V   | 27  | 13  | 4   | 3   | 1   | 39  | MILWAUKEE | ...
GREG MONROE           | V   | 18  | 9   | 4   | 1   | 3   | 31  | MILWAUKEE | ...
JABARI PARKER         | V   | 15  | 4   | 3   | 0   | 1   | 37  | MILWAUKEE | ...
MALCOLM BROGDON       | V   | 12  | 6   | 8   | 0   | 0   | 38  | MILWAUKEE | ...
MIRZA TELETOVIC       | V   | 13  | 1   | 0   | 0   | 0   | 21  | MILWAUKEE | ...
JOHN HENSON           | V   | 2   | 2   | 0   | 0   | 0   | 14  | MILWAUKEE | ...
...

(a) Box score: the top table shows the number of wins and losses and a summary of each game. The bottom table shows the statistics of each player, such as points scored (PLAYER's PTS) and total rebounds (PLAYER's REB).

The Milwaukee Bucks defeated the New York Knicks, 105-104, at Madison Square Garden on Wednesday. The Knicks (16-19) checked in to Wednesday's contest looking to snap a five-game losing streak and, heading into the fourth quarter, they looked like they were well on their way to that goal. . . . Antetokounmpo led the Bucks with 27 points, 13 rebounds, four assists, a steal and three blocks, his second consecutive double-double. Greg Monroe actually checked in as the second-leading scorer and did so in his customary bench role, posting 18 points, along with nine boards, four assists, three steals and a block. Jabari Parker contributed 15 points, four rebounds, three assists and a steal. Malcolm Brogdon went for 12 points, eight assists and six rebounds. Mirza Teletovic was productive in a reserve role as well, generating 13 points and a rebound. . . . Courtney Lee checked in with 11 points, three assists, two rebounds, a steal and a block. . . . The Bucks and Knicks face off once again in the second game of the home-and-home series, with the meeting taking place Friday night in Milwaukee.

(b) NBA basketball game summary: each summary describes the victory or defeat in the game and highlights valuable players.

Table 1: Example of input and output data: the task takes the box score (1a) as input and the summary document of the game (1b) as output. Extracted entities are shown in boldface; extracted values are shown in green.
t   | 199           | 200           | 201         | 202           | 203    | 204 | 205           | 206      | 207 | 208           | 209
Y_t | Jabari        | Parker        | contributed | 15            | points | ,   | four          | rebounds | ,   | three         | assists
Z_t | 1             | 1             | 0           | 1             | 0      | 0   | 1             | 0        | 0   | 1             | 0
E_t | JABARI PARKER | JABARI PARKER | -           | JABARI PARKER | -      | -   | JABARI PARKER | -        | -   | JABARI PARKER | -
A_t | FIRST NAME    | LAST NAME     | -           | PLAYER PTS    | -      | -   | PLAYER REB    | -        | -   | PLAYER AST    | -
N_t | -             | -             | -           | 0             | -      | -   | 1             | -        | -   | 1             | -

Table 2: Running example of our model's generation process. At every time step t, the model predicts each random variable. The model first determines whether to refer to data records (Z_t = 1) or not (Z_t = 0). If Z_t = 1, the model selects the entity E_t, its attribute A_t, and the binary variable N_t if needed. For example, at t = 202, the model predicts Z_202 = 1 and then selects the entity JABARI PARKER and its attribute PLAYER PTS. Given these values, the model outputs the token 15 from the selected data record.
our training, development, and test datasets are respectively 2,714, 534, and 500. On average, each summary has 384 tokens and 644 data records. Each match has only one summary in our dataset, as far as we checked. We also collected the writer of each document. Our dataset contains 32 different writers. The most prolific writer in our dataset wrote 607 documents; there are also writers who wrote fewer than ten documents. On average, each writer wrote 117 documents. We call our new dataset ROTOWIRE-MODIFIED.2
4 Saliency-Aware Text Generation
At the core of our model is a neural language model with a memory state h^LM that generates a summary y_{1:T} = (y_1, ..., y_T) given a set of data records x. Our model has another memory state h^ENT, which is used to remember the data records that have been referred to. h^ENT is also used to update h^LM, meaning that the referred data records affect the text generation.
Our model decides whether to refer to x, which data record r ∈ x should be mentioned, and how to express a number. The selected data record is used to update h^ENT. Formally, we use the following four variables:
2For information about the dataset, please follow this link: https://github.com/aistairc/rotowire-modified
1. Z_t: a binary variable that determines whether the model refers to the input x at time step t (Z_t = 1) or not (Z_t = 0).
2. E_t: at each time step t, this variable indicates the salient entity (e.g., HAWKS, LEBRON JAMES).
3. A_t: at each time step t, this variable indicates the salient attribute to be mentioned (e.g., PTS).
4. N_t: if attribute A_t of the salient entity E_t is a numeric attribute, this variable determines whether its value in the data records should be output in Arabic numerals (e.g., 50) or in English words (e.g., five).
To keep track of the salient entity, our model predicts these random variables at each time step t through its summary generation process. A running example of our model is shown in Table 2, and the full algorithm is described in Appendix A. In the following subsections, we explain how to initialize the model, predict these random variables, and generate a summary. Due to space limitations, bias vectors are omitted.
Before explaining our method, we describe our notation. Let E and A denote the sets of entities and attributes, respectively. Each record r ∈ x consists of an entity e ∈ E, an attribute a ∈ A, and its value x[e, a], and is therefore represented as r = (e, a, x[e, a]). For example, the box score in Table 1 has a record r such that e = ANTHONY DAVIS, a = PTS, and x[e, a] = 20.
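As a concrete illustration of this record structure, here is a minimal Python sketch (illustrative only, not the authors' code); the nested box-score dict and the `make_records` helper are assumptions for the example:

```python
def make_records(box_score):
    """Flatten a nested {entity: {attribute: value}} box score into
    (e, a, x[e, a]) triples, i.e., the records r described above."""
    return [(e, a, v)
            for e, attrs in box_score.items()
            for a, v in attrs.items()]

# Toy box score in the spirit of Table 1 (values abbreviated).
box = {
    "JABARI PARKER": {"PTS": "15", "REB": "4", "AST": "3"},
    "GREG MONROE": {"PTS": "18", "REB": "9"},
}
records = make_records(box)
```

Each resulting triple corresponds to one data record r = (e, a, x[e, a]).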
4.1 Initialization

Let r denote the embedding of data record r ∈ x, and let ē denote the embedding of entity e. Note that ē depends on the set of data records, i.e., it depends on the game. We also use e for the static embedding of entity e, which, on the other hand, does not depend on the game.

Given the embedding e of an entity, the embedding a of an attribute, and the embedding v of its value, we use a concatenation layer to combine the information from these vectors and produce the embedding of each data record (e, a, v), denoted as r_{e,a,v}:

    r_{e,a,v} = tanh(W^R (e ⊕ a ⊕ v)),   (1)

where ⊕ indicates the concatenation of vectors and W^R denotes a weight matrix.3

We obtain ē for entity e in the set of data records x by summing all its data-record embeddings transformed by attribute-specific matrices:

    ē = tanh(Σ_{a∈A} W^A_a r_{e,a,x[e,a]}),   (2)

where W^A_a is a weight matrix for attribute a. Since ē depends on the game as above, ē is supposed to represent how entity e played in the game.

To initialize the hidden state of each module, we use the embedding of <SOD> for h^LM and the average of the entity embeddings ē for h^ENT.
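The two embedding layers can be sketched in numpy as follows; the embedding size, the attribute set, and the random weights are assumptions for illustration, not the paper's hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding size (assumption)
attrs = ["PTS", "REB", "AST"]           # toy attribute set A

W_R = rng.normal(size=(d, 3 * d))                   # W^R in Eq. (1)
W_A = {a: rng.normal(size=(d, d)) for a in attrs}   # W^A_a in Eq. (2)

def record_embedding(e, a, v):
    """Eq. (1): r_{e,a,v} = tanh(W^R (e concat a concat v))."""
    return np.tanh(W_R @ np.concatenate([e, a, v]))

def entity_embedding(e_static, a_embs, v_embs):
    """Eq. (2): game-dependent entity embedding, summing the
    attribute-transformed record embeddings of entity e."""
    total = np.zeros(d)
    for a in attrs:
        total += W_A[a] @ record_embedding(e_static, a_embs[a], v_embs[a])
    return np.tanh(total)
```

Because the sum in Eq. (2) runs over the entity's own records, the resulting embedding reflects that entity's performance in the particular game.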
4.2 Saliency transition

Generally, the saliency of text changes during text generation. In this work, we suppose that the saliency is represented by the entity and its attribute currently being talked about. We therefore propose a model that refers to a data record at each time point and transitions to another as the text proceeds.

To determine whether to transition to another data record at time t, the model calculates the following probability:

    p(Z_t = 1 | h^LM_{t-1}, h^ENT_{t-1}) = σ(W^z (h^LM_{t-1} ⊕ h^ENT_{t-1})),   (3)

where σ(·) is the sigmoid function. If p(Z_t = 1 | h^LM_{t-1}, h^ENT_{t-1}) is high, the model transitions to another data record.

When the model decides to transition, it determines which entity and attribute to refer to and generates the next word (Section 4.3). Otherwise, the model generates the next word without updating the tracking state, h^ENT_t = h^ENT_{t-1} (Section 4.4).

3We also concatenate an embedding vector that represents whether the entity is in the home or away team.
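The transition gate in Eq. (3) is a sigmoid over the concatenated states; a minimal sketch with assumed shapes (W^z is modeled here as a single weight vector):

```python
import numpy as np

def transition_prob(h_lm, h_ent, w_z):
    """Eq. (3): p(Z_t = 1 | h^LM_{t-1}, h^ENT_{t-1})
    = sigmoid(w_z . (h^LM concat h^ENT))."""
    logit = float(w_z @ np.concatenate([h_lm, h_ent]))
    return 1.0 / (1.0 + np.exp(-logit))
```

At each step the model thresholds (or samples) this probability to decide whether to select a new record.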
4.3 Selection and tracking

When the model refers to a new data record (Z_t = 1), it selects an entity and its attribute. It also tracks the saliency by putting the information about the selected entity and attribute into the memory vector h^ENT. The model first selects the salient entity, and updates the memory state if the salient entity changes.

Specifically, the model first calculates the probability of selecting an entity:

    p(E_t = e | h^LM_{t-1}, h^ENT_{t-1}) ∝ exp(h^ENT_s W^OLD h^LM_{t-1})   if e ∈ E_{t-1},
    p(E_t = e | h^LM_{t-1}, h^ENT_{t-1}) ∝ exp(ē W^NEW h^LM_{t-1})        otherwise,   (4)

where E_{t-1} is the set of entities that have already been referred to by time step t, ē is the game-dependent entity embedding from Section 4.1, and s = max{s : s ≤ t-1, e = e_s} indicates the time step when this entity was last mentioned.

The model selects the most probable entity as the next salient entity and updates the set of entities that have appeared (E_t = E_{t-1} ∪ {e_t}).

If the salient entity changes (e_t ≠ e_{t-1}), the model updates the hidden state of the tracking module h^ENT with a recurrent neural network with gated recurrent units (GRU; Chung et al., 2014):

    h^ENT'_t = h^ENT_{t-1}                       if e_t = e_{t-1},
    h^ENT'_t = GRU^E(ē, h^ENT_{t-1})             else if e_t ∉ E_{t-1},
    h^ENT'_t = GRU^E(W^S h^ENT_s, h^ENT_{t-1})   otherwise.   (5)

Note that if the entity e_t selected at time step t is identical to the previously selected entity e_{t-1}, the hidden state of the tracking module is not updated. If the selected entity e_t is new (e_t ∉ E_{t-1}), the hidden state is updated with the embedding ē of entity e_t as input. In contrast, if entity e_t has already appeared in the past (e_t ∈ E_{t-1}) but is not identical to the previous one (e_t ≠ e_{t-1}), we use h^ENT_s (i.e., the memory state when this entity last appeared) to fully exploit the local history of this entity.

Given the updated hidden state h^ENT'_t of the tracking module, we next select the attribute of the salient entity with the following probability:

    p(A_t = a | e_t, h^LM_{t-1}, h^ENT'_t) ∝ exp(r_{e_t,a,x[e_t,a]} W^ATTR (h^LM_{t-1} ⊕ h^ENT'_t)).   (6)

After selecting a_t, i.e., the most probable attribute of the salient entity, the tracking module updates the memory state h^ENT_t with the embedding of the data record r_{e_t,a_t,x[e_t,a_t]} introduced in Section 4.1:

    h^ENT_t = GRU^A(r_{e_t,a_t,x[e_t,a_t]}, h^ENT'_t).   (7)
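Equation (4) scores previously mentioned entities through their last tracking state and unseen entities through their embeddings; a toy sketch of this selection rule (bilinear forms with assumed shapes, not the trained model):

```python
import numpy as np

def entity_scores(h_lm, entity_embs, last_states, W_old, W_new):
    """Unnormalized scores from Eq. (4): entities in E_{t-1} use their
    last tracking state h^ENT_s with W^OLD; unseen entities use their
    game-dependent embedding with W^NEW."""
    scores = {}
    for e, emb in entity_embs.items():
        if e in last_states:                       # e already mentioned
            scores[e] = float(last_states[e] @ W_old @ h_lm)
        else:                                      # unseen entity
            scores[e] = float(emb @ W_new @ h_lm)
    return scores

def select_entity(scores):
    """Pick the most probable entity (argmax of Eq. (4))."""
    return max(scores, key=scores.get)
```

The selected entity then drives the GRU update of Eq. (5), using either its embedding (new entity) or its stored state (re-mentioned entity).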
4.4 Summary generation

Given the two hidden states, h^LM_{t-1} for the language model and h^ENT_t for the tracking module, the model generates the next word y_t. We also incorporate a copy mechanism that copies the value of the salient data record x[e_t, a_t].

If the model refers to a new data record (Z_t = 1), it directly copies the value of the data record x[e_t, a_t]. However, the values of numerical attributes can be expressed in at least two different manners: Arabic numerals (e.g., 14) and English words (e.g., fourteen). We decide which one to use with the following probability:

    p(N_t = 1 | h^LM_{t-1}, h^ENT_t) = σ(W^N (h^LM_{t-1} ⊕ h^ENT_t)),   (8)

where W^N is a weight matrix. The model then updates the hidden state of the language model via the context vector:

    h'_t = tanh(W^H (h^LM_{t-1} ⊕ h^ENT_t)),   (9)

where W^H is a weight matrix.

If the salient data record is the same as the previous one (Z_t = 0), the model predicts the next word y_t via a probability over words conditioned on the context vector h'_t:

    p(Y_t | h'_t) = softmax(W^Y h'_t).   (10)

Subsequently, the hidden state of the language model h^LM is updated:

    h^LM_t = LSTM(y_t ⊕ h'_t, h^LM_{t-1}),   (11)

where y_t is the embedding of the word generated at time step t.4
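The word-generation path of Eqs. (9) and (10) combines the two hidden states into a context vector and applies a softmax over the vocabulary; a sketch with an illustrative five-word vocabulary and assumed weight shapes:

```python
import numpy as np

vocab = ["the", "points", "rebounds", ",", "."]    # toy vocabulary

def next_word_dist(h_lm, h_ent, W_h, W_y):
    """Eqs. (9)-(10): h'_t = tanh(W^H (h^LM concat h^ENT)),
    p(Y_t | h'_t) = softmax(W^Y h'_t)."""
    h_ctx = np.tanh(W_h @ np.concatenate([h_lm, h_ent]))
    logits = W_y @ h_ctx
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return exp / exp.sum()
```

When Z_t = 1 the model instead copies x[e_t, a_t], rendering numeric values as numerals or words according to N_t.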
4.5 Incorporating writer information

We also incorporate information about the writer of the summaries into our model. Specifically, instead of using Equation (9), we concatenate the embedding w of a writer to h^LM_{t-1} ⊕ h^ENT_t to construct the context vector h'_t:

    h'_t = tanh(W'^H (h^LM_{t-1} ⊕ h^ENT_t ⊕ w)),   (12)

where W'^H is a new weight matrix. Since this new context vector h'_t is used for calculating the probability over words in Equation (10), the writer information directly affects word generation, which is regarded as surface realization in terms of traditional text generation. Simultaneously, the context vector h'_t enhanced with the writer information is used to obtain h^LM_t, the hidden state of the language model, which is further used to select the salient entity and attribute, as described in Sections 4.2 and 4.3. Therefore, in our model, the writer information affects both surface realization and content planning.
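Equation (12) only changes the context-vector computation by appending the writer embedding; a one-function sketch with assumed shapes:

```python
import numpy as np

def context_with_writer(h_lm, h_ent, w, W_h_prime):
    """Eq. (12): h'_t = tanh(W'^H (h^LM concat h^ENT concat w)),
    where w is the writer embedding; W'^H is wider than W^H in Eq. (9)."""
    return np.tanh(W_h_prime @ np.concatenate([h_lm, h_ent, w]))
```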
4.6 Learning objective

We apply fully supervised training that maximizes the following log-likelihood:

    log p(Y_{1:T}, Z_{1:T}, E_{1:T}, A_{1:T}, N_{1:T} | x)
      = Σ_{t=1}^{T} log p(Z_t = z_t | h^LM_{t-1}, h^ENT_{t-1})
      + Σ_{t: Z_t=1} log p(E_t = e_t | h^LM_{t-1}, h^ENT_{t-1})
      + Σ_{t: Z_t=1} log p(A_t = a_t | e_t, h^LM_{t-1}, h^ENT'_t)
      + Σ_{t: Z_t=1, a_t is a numeric attribute} log p(N_t = n_t | h^LM_{t-1}, h^ENT_t)
      + Σ_{t: Z_t=0} log p(Y_t = y_t | h'_t).

4In our initial experiments, we observed a word repetition problem when the tracking module is not updated while generating a sentence. To avoid this problem, we also update the tracking module with a special trainable vector v^REFRESH to refresh its state after the model generates a period: h^ENT_t = GRU^A(v^REFRESH, h^ENT_t).
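The objective simply accumulates per-step log-probabilities, gating the entity/attribute/numeral terms on z_t = 1 and the word term on z_t = 0; a schematic sketch with placeholder probabilities:

```python
import math

def log_likelihood(steps):
    """Sum of the five terms in Section 4.6. Each step is a dict with
    'z' and 'p_z', plus 'p_e', 'p_a' (and 'p_n' for numeric attributes)
    when z = 1, or 'p_y' when z = 0."""
    total = 0.0
    for s in steps:
        total += math.log(s["p_z"])                # p(Z_t = z_t | ...)
        if s["z"] == 1:
            total += math.log(s["p_e"]) + math.log(s["p_a"])
            if "p_n" in s:                         # numeric attributes only
                total += math.log(s["p_n"])
        else:
            total += math.log(s["p_y"])            # p(Y_t = y_t | h'_t)
    return total
```

In practice each probability comes from the corresponding model component (Eqs. (3), (4), (6), (8), and (10)).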
Method                   | RG #  | RG P% | CS P% | CS R% | CS F1% | CO DLD% | BLEU
GOLD                     | 27.36 | 93.42 | 100.  | 100.  | 100.   | 100.    | 100.
TEMPLATES                | 54.63 | 100.  | 31.01 | 58.85 | 40.61  | 17.50   | 8.43
Wiseman et al. (2017)    | 22.93 | 60.14 | 24.24 | 31.20 | 27.29  | 14.70   | 14.73
Puduppully et al. (2019) | 33.06 | 83.17 | 33.06 | 43.59 | 37.60  | 16.97   | 13.96
PROPOSED                 | 39.05 | 94.43 | 35.77 | 52.05 | 42.40  | 19.38   | 16.15

Table 3: Experimental results. Each metric evaluates whether important information (CS) is described accurately (RG) and in the correct order (CO).
5 Experiments
5.1 Experimental settings
We used ROTOWIRE-MODIFIED, explained in Section 3, as the dataset for our experiments. The training, development, and test data respectively contained 2,714, 534, and 500 games.
Since we take a supervised training approach, we need annotations of the random variables (i.e., Z_t, E_t, A_t, and N_t) in the training data, as shown in Table 2. Instead of simple lexical matching with r ∈ x, which is prone to annotation errors, we use the information extraction system provided by Wiseman et al. (2017). Although this system is trained on noisy rule-based annotations, we conjecture that it is more robust to errors because it is trained to minimize a marginalized loss function for ambiguous relations. All training details are described in Appendix B.
5.2 Models to be compared
We compare our model5 against two baseline models. One is the model of Wiseman et al. (2017), which generates a summary with an attention-based encoder-decoder model. The other baseline is the model proposed by Puduppully et al. (2019), which first predicts the sequence of data records to be mentioned and then generates a summary conditioned on the predicted sequence. Wiseman et al. (2017)'s model refers to all data records at every time step, while Puduppully et al. (2019)'s model refers to a subset of the data records, predicted in the first stage. Unlike these models, our model uses one memory vector h^ENT_t that tracks the history of the data records during generation. We retrained the baselines on our new dataset. We also report the performance of the GOLD and TEMPLATES summaries. The GOLD summary is exactly identical to the reference summary, and each TEMPLATES summary is generated in the same manner as in Wiseman et al. (2017).

In the latter half of our experiments, we examine the effect of adding information about writers. In addition to our model enhanced with writer information, we also add writer information to the model of Puduppully et al. (2019). Their method consists of two stages corresponding to content planning and surface realization. Therefore, by incorporating writer information into each of the two stages, we can clearly see which part of the model the writer information contributes to. For the Puduppully et al. (2019) model, we attach the writer information in the following three ways:

1. concatenating writer embedding w with the input vector for the LSTM in the content planning decoder (stage 1);
2. concatenating writer embedding w with the input vector for the LSTM in the text generator (stage 2);
3. using both 1 and 2 above.

For more details about each decoding stage, readers can refer to Puduppully et al. (2019).

5Our code is available from https://github.com/aistairc/sports-reporter
5.3 Evaluation metrics
As evaluation metrics, we use the BLEU score (Papineni et al., 2002) and the extractive metrics proposed by Wiseman et al. (2017): relation generation (RG), content selection (CS), and content ordering (CO). The extractive metrics measure how well the relations extracted from the generated summary match the correct relations:6

- RG: the ratio of correct relations among all extracted relations, where correct relations are those found in the input data records x. The average number of extracted relations is also reported.
- CS: precision and recall of the relations extracted from the generated summary against those from the reference summary.
- CO: edit distance, measured with normalized Damerau-Levenshtein Distance (DLD), between the sequences of relations extracted from the generated and reference summaries.

6 Results and Discussions

We first focus on the quality of the tracking module and the entity representations in Sections 6.1 to 6.4, where we use the model without writer information. We examine the effect of writer information in Section 6.5.

6.1 Saliency tracking-based model

As shown in Table 3, our model outperforms all baselines across all evaluation metrics.7 One noticeable result is that our model achieves slightly higher RG precision than the gold summary. Owing to the extractive nature of the evaluation, the relation-generation precision of a generated summary can beat that of the gold summary; in fact, the template model achieves 100% precision of relation generation.

The other is that only our model exceeds the template model in the F1 score of content selection, and it obtains the highest content-ordering performance. This implies that the tracking module encourages selecting salient input records in the correct order.

6.2 Qualitative analysis of entity embedding

Our model has the entity embedding ē, which depends on the box score for each game, in addition to the static entity embedding e. We now analyze the difference between these two types of embeddings. We present two-dimensional visualizations of both embeddings produced using PCA (Pearson, 1901).

6The model for extracting relation tuples was trained on tuples made from the entities (e.g., team name, city name, player name) and attribute values (e.g., "Lakers", "92") extracted from the summaries, and the corresponding attributes (e.g., "TEAM NAME", "PTS") found in the box- or line-score. The precision and recall of this extraction model are respectively 93.4% and 75.0% on the test data.

7The scores of Puduppully et al. (2019)'s model significantly dropped from what they reported, especially on the BLEU metric. We speculate this is mainly due to the reduced amount of our training data (Section 3). That is, their model might be more data-hungry than other models.
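The CO metric in Section 5.3 is based on normalized Damerau-Levenshtein distance between relation sequences. A generic sketch of this distance (optimal-string-alignment variant; this is not the authors' evaluation script):

```python
def normalized_dld(a, b):
    """Normalized Damerau-Levenshtein distance between sequences a and b,
    i.e., edit distance with adjacent transpositions, divided by the
    longer length."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m] / max(n, m, 1)
```

In the evaluation, a and b would be the relation sequences extracted from the generated and reference summaries.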
[Figure: two-dimensional PCA visualizations of the entity embeddings, with each point labeled by an NBA player name.]
Glenn Robinson III
Trevor Booker
Jonas Jerebko
Marc GasolToney Douglas
Tyus Jones
Richard Jefferson
Archie GoodwinMirza Teletovic
Jordan Hamilton
Thabo Sefolosha
John Jenkins
Landry Fields
Brandon Ingram
Joe Young
Samuel DalembertPero Antic
Lou Amundson
Jeremy LambDwight Powell
Luol Deng
Patty Mills
Anthony Tolliver
Thanasis Antetokounmpo
Rashad Vaughn
Mike Conley
Enes Kanter
Devin Booker
Elfrid Payton
Langston Galloway
D'Angelo Russell
Will Barton
Kostas Papanikolaou
Kevin MartinRyan Kelly
James Michael McAdoo
Kent Bazemore
Al-Farouq AminuPaul Pierce
Josh Smith
Josh McRoberts
James JohnsonBen McLemore Mindaugas Kuzminskas
David Lee
Rudy Gay
George Hill
Jordan Mickey
Jameer NelsonCleanthony Early
Myles TurnerJason Terry
John Salmons
James Anderson
Kyle Lowry
Tayshaun Prince
Gigi DatomeLeandro Barbosa
Nate Robinson
Figure 1: Illustrations of static entity embeddings e.Players with colored letters are listed in the ranking top100 players for the 2016-17 NBA season at https://www.washingtonpost.com/graphics/sports/nba-top-100-players-2016/.Only LeBron James is in red and the other players intop 100 are in blue. Top-ranked players have similarrepresentations of e.
1901). As shown in Figure 1, which visualizes the static entity embeddings e, the top-ranked players are located close to each other.
We also present visualizations of the dynamic entity embeddings e in Figure 2. Although we did not carry out feature engineering specific to the NBA (e.g., whether a player scored double digits or not)8 for representing the dynamic entity embedding e, the embeddings of the players who performed well in each game have similar representations. In addition, a change in the embeddings of the same player was observed depending on the box scores of each game. For instance, LeBron James recorded a double-double in a game on April 22, 2016. For this game, his embedding is located close to the embedding of Kevin Love, who also scored a double-double. However, he did not participate in the game on December 26, 2016. His embedding for this game became closer to those of other players who also did not participate.
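Two-dimensional maps like Figures 1 and 2 can be produced by projecting the learned embeddings onto their top two principal components; a minimal sketch, assuming PCA (Pearson, 1901) is the projection used (variable names are illustrative):

```python
import numpy as np

def pca_2d(embeddings):
    """Project a (num_entities, dim) embedding matrix onto its top-2
    principal components via SVD of the centered matrix."""
    X = embeddings - embeddings.mean(axis=0)   # center each dimension
    # Rows of vt are the principal axes, in decreasing variance order.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                        # (num_entities, 2) coordinates

# Toy usage: 5 players with 128-dimensional embeddings e.
rng = np.random.default_rng(0)
coords = pca_2d(rng.normal(size=(5, 128)))
assert coords.shape == (5, 2)
```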
6.3 Duplicate ratios of extracted relations

As Puduppully et al. (2019) pointed out, a generated summary may mention the same relation multiple times. Such duplicated relations are not favorable in terms of the brevity of the text.
Figure 3 shows the ratios of generated summaries with duplicate mentions of relations in the development data. While the models by Wiseman et al. (2017) and Puduppully et al. (2019) respec-
8 In the NBA, a player who accumulates a double-digit total in one of five categories (points, rebounds, assists, steals, and blocked shots) in a game is regarded as a good player. If a player reaches double digits in two of those five categories, it is referred to as a double-double.
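The footnote's definition reduces to a small predicate over a box-score line; a sketch (the dict keys are illustrative, not the dataset's actual record attribute names):

```python
# The five categories in which a double-digit total counts toward a double-double.
CATEGORIES = ("points", "rebounds", "assists", "steals", "blocks")

def count_doubles(box_line):
    """Number of the five categories in which the player reached double digits."""
    return sum(box_line.get(c, 0) >= 10 for c in CATEGORIES)

def is_double_double(box_line):
    """True when at least two categories reached double digits."""
    return count_doubles(box_line) >= 2

# E.g., a 24-point / 12-rebound line is a double-double:
assert is_double_double({"points": 24, "rebounds": 12, "assists": 6})
assert not is_double_double({"points": 27, "rebounds": 6, "assists": 9})
```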
Figure 2: Illustrations of dynamic entity embeddings e. Both the left and right plots are for Cleveland Cavaliers vs. Detroit Pistons, on different dates (April 22, 2016 and December 26, 2016). LeBron James is in red letters. Entities with orange symbols appeared only in the reference summary. Entities with blue symbols appeared only in the generated summary. Entities with green symbols appeared in both the reference and the generated summary. The others have red symbols. One marker shape represents players who scored in double digits, another represents players who recorded a double-double, and a third marks players who did not participate in the game; ◦ represents other players. [Scatter-plot point labels (player names) omitted.]
Figure 3: Ratios of generated summaries with duplicate mentions of relations, for Wiseman et al. (2017), Puduppully et al. (2019), and the proposed model. Each label represents the number of duplicated relations within each document. While Wiseman et al. (2017)'s model exhibited 36.0% duplication and Puduppully et al. (2019)'s model exhibited 15.8%, our model exhibited only 4.2%. [Bar chart omitted.]
tively showed duplicate ratios of 36.0% and 15.8%, our model exhibited only 4.2%. This suggests that our model dramatically suppressed the generation of redundant relations. We speculate that the tracking module successfully memorized which input records had been selected in h_s^ENT.
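The duplicate ratios above can be computed by counting repeated extracted relations per summary; a minimal sketch, assuming each extracted relation is an (entity, attribute, value) tuple from the information-extraction step:

```python
from collections import Counter

def duplicate_count(relations):
    """Number of redundant mentions: total relations minus distinct relations."""
    counts = Counter(relations)
    return sum(c - 1 for c in counts.values())

def duplicate_ratio(summaries):
    """Fraction of summaries that mention at least one relation more than once."""
    flagged = sum(duplicate_count(rels) >= 1 for rels in summaries)
    return flagged / len(summaries)

rels_a = [("Rose", "PTS", 15), ("Rose", "PTS", 15), ("Lee", "PTS", 11)]
rels_b = [("Parker", "PTS", 15)]
assert duplicate_count(rels_a) == 1
assert duplicate_ratio([rels_a, rels_b]) == 0.5
```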
6.4 Qualitative analysis of output examples
Table 5 shows examples generated from validation inputs with Puduppully et al. (2019)'s model and our model. Whereas both generations seem fluent, the summary of Puduppully et al. (2019)'s model includes erroneous relations, colored in orange.
Specifically, the description of DERRICK ROSE's relations, "15 points, four assists, three rebounds and one steal in 33 minutes.", is also used for other entities (e.g., JOHN HENSON and WILLY HERNANGOMEZ). This is because Puduppully et al. (2019)'s model has no tracking module, unlike our model, which mitigates redundant references and therefore rarely produces erroneous relations.
However, when complicated expressions such as parallel structures are used, our model also generates erroneous relations, as illustrated by the underlined sentences describing the two players who scored the same number of points. For example, "11-point efforts" is correct for COURTNEY LEE but not for DERRICK ROSE. As future work, it is necessary to develop a method that can handle such complicated relations.
6.5 Use of writer information
We first look at the results of an extension of Puduppully et al. (2019)'s model with writer information w in Table 4. By adding w to content planning (stage 1), the method obtained improvements in CS (37.60 to 47.25), CO (16.97 to 22.16), and BLEU (13.96 to 18.18). By adding w to the component for surface realization (stage 2), the method obtained an improvement in BLEU (13.96 to 17.81), while the effects on the other metrics were not very significant. By adding w to both stages, the method scored the highest BLEU, while the other metrics were not very different from those obtained by adding w to stage 1 only. This result suggests that writer information contributes to both content planning and surface realization when it is properly used, and that improvements in content planning lead to much better performance in surface realization.
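A common way to condition a decoder on writer information, and the one we assume for illustration here, is to concatenate a learned writer embedding to the input of each decoding step; a sketch with illustrative dimensions and names:

```python
import numpy as np

# Illustrative sizes; the paper's embeddings are 128-dimensional.
NUM_WRITERS, W_DIM, TOK_DIM = 32, 128, 128
rng = np.random.default_rng(0)
writer_table = rng.normal(size=(NUM_WRITERS, W_DIM))  # learned lookup table

def decoder_input(token_emb, writer_id):
    """Concatenate the token embedding with the writer embedding w,
    so every decoding step is conditioned on who is writing."""
    return np.concatenate([token_emb, writer_table[writer_id]])

x = decoder_input(rng.normal(size=TOK_DIM), writer_id=3)
assert x.shape == (TOK_DIM + W_DIM,)
```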
Our model showed improvements in most metrics and achieved the best performance by incor-
Method                     RG #   RG P%   CS P%   CS R%   CS F1%   CO DLD%   BLEU
Puduppully et al. (2019)   33.06  83.17   33.06   43.59   37.60    16.97     13.96
  + w in stage 1           28.43  84.75   45.00   49.73   47.25    22.16     18.18
  + w in stage 2           35.06  80.51   31.10   45.28   36.87    16.38     17.81
  + w in stage 1 & 2       28.00  82.27   44.37   48.71   46.44    22.41     18.90
PROPOSED                   39.05  94.38   35.77   52.05   42.40    19.38     16.15
  + w                      30.25  92.00   50.75   59.03   54.58    25.75     20.84

Table 4: Effects of writer information. w indicates that WRITER embeddings are used. Numbers in bold are the largest among the variants of each method.
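The CO column reports normalized Damerau-Levenshtein distance (DLD) between the sequence of records mentioned in the generated summary and that of the reference; a sketch using the optimal-string-alignment variant (the normalization shown is one common choice, an assumption here):

```python
def dld(a, b):
    """Optimal-string-alignment Damerau-Levenshtein distance between
    two record sequences (lists of hashable record identifiers)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def co_score(gen, ref):
    """Normalized DLD similarity in [0, 1]; higher means closer ordering."""
    if not gen and not ref:
        return 1.0
    return 1.0 - dld(gen, ref) / max(len(gen), len(ref))

assert dld(list("abc"), list("acb")) == 1   # one adjacent transposition
assert co_score(["r1", "r2"], ["r1", "r2"]) == 1.0
```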
The Milwaukee Bucks defeated the New York Knicks, 105-104, at Madison Square Garden on Wednesday evening. The Bucks (18-16) have been one of the hottest teams in the league, having won five of their last six games, and they have now won six of their last eight games. The Knicks (16-19) have now won six of their last six games, as they continue to battle for the eighth and final playoff spot in the Eastern Conference. Giannis Antetokounmpo led the way for Milwaukee, as he tallied 27 points, 13 rebounds, four assists, three blocked shots and one steal, in 39 minutes. Jabari Parker added 15 points, four rebounds, three assists, one steal and one block, and 6-of-8 from long range. John Henson added two points, two rebounds, one assist, three steals and one block. John Henson was the only other player to score in double digits for the Knicks, with 15 points, four assists, three rebounds and one steal, in 33 minutes. The Bucks were led by Derrick Rose, who tallied 15 points, four assists, three rebounds and one steal in 33 minutes. Willy Hernangomez started in place of Porzingis and finished with 15 points, four assists, three rebounds and one steal in 33 minutes. Willy Hernangomez started in place of Jose Calderon (knee) and responded with one rebound and one block. The Knicks were led by their starting backcourt of Carmelo Anthony and Carmelo Anthony, but combined for just 13 points on 5-of-16 shooting. The Bucks next head to Philadelphia to take on the Sixers on Friday night, while the Knicks remain home to face the Los Angeles Clippers on Wednesday.
(a) Puduppully et al. (2019)
The Milwaukee Bucks defeated the New York Knicks, 105-104, at Madison Square Garden on Saturday. The Bucks (18-16) checked into Saturday's contest with a well, outscoring the Knicks (16-19) by a margin of 39-19 in the first quarter. However, New York by just a 25-foot lead at the end of the first quarter, the Bucks were able to pull away, as they outscored the Knicks by a 59-46 margin into the second. 45 points in the third quarter to seal the win for New York with the rest of the starters to seal the win. The Knicks were led by Giannis Antetokounmpo, who tallied a game-high 27 points, to go along with 13 rebounds, four assists, three blocks and a steal. The game was a crucial night for the Bucks' starting five, as the duo was the most effective shooters, as they posted Milwaukee to go on a pair of low low-wise (Carmelo Anthony) and Malcolm Brogdon. Anthony added 11 rebounds, seven assists and two steals to his team-high scoring total. Jabari Parker was right behind him with 15 points, four rebounds, three assists and a block. Greg Monroe was next with a bench-leading 18 points, along with nine rebounds, four assists and three steals. Brogdon posted 12 points, eight assists, six rebounds and a steal. Derrick Rose and Courtney Lee were next with a pair of {11 / 11}-point efforts. Rose also supplied four assists and three rebounds, while Lee complemented his scoring with three assists, a rebound and a steal. John Henson and Mirza Teletovic were next with a pair of {two / two}-point efforts. Teletovic also registered 13 points, and he added a rebound and an assist. Jason Terry supplied eight points, three rebounds and a pair of steals. The Cavs remain in last place in the Eastern Conference's Atlantic Division. They now head home to face the Toronto Raptors on Saturday night.
(b) Our model
Table 5: Example summaries generated with Puduppully et al. (2019)'s model (left) and our model (right). Names in bold face are salient entities. Blue numbers are correct relations derived from input data records but not observed in the reference summary. Orange numbers are incorrect relations. Green numbers are correct relations mentioned in the reference summary.
porating writer information w. As discussed in Section 4.5, w is supposed to affect both content planning and surface realization. Our experimental results are consistent with this discussion.
7 Conclusion
In this research, we proposed a new data-to-text model that produces a summary text while tracking the salient information, imitating a human writing process. As a result, our model outperformed the existing models in all evaluation measures. We also explored the effect of incorporating writer information into data-to-text models. With writer information, our model successfully generated the highest-quality summaries, scoring 20.84 BLEU points.
Acknowledgments
We would like to thank the anonymous reviewers for their helpful suggestions. This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO), JST PRESTO (Grant Number JPMJPR1655), and the AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL).
References

Tatsuya Aoki, Akira Miyazawa, Tatsuya Ishigaki, Keiichi Goshima, Kasumi Aoki, Ichiro Kobayashi, Hiroya Takamura, and Yusuke Miyao. 2018. Generating Market Comments Referring to External Resources. In Proceedings of the 11th International Conference on Natural Language Generation, pages 135–139.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the Third International Conference on Learning Representations.

Regina Barzilay and Mirella Lapata. 2005. Collective content selection for concept-to-text generation. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 331–338.

Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. 2018. Simulating Action Dynamics with Neural Process Networks. In Proceedings of the Sixth International Conference on Learning Representations.

David L. Chen and Raymond J. Mooney. 2008. Learning to sportscast: a test of grounded language acquisition. In Proceedings of the 25th International Conference on Machine Learning, pages 128–135.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724–1734.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. 2018. Neural Text Generation in Stories Using Entity Representations as Context. In Proceedings of the 16th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2250–2260.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256.

Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwinska, Sergio Gomez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. 2016. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1631–1640.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the Unknown Words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 140–149.

Luong Hoang, Sam Wiseman, and Alexander Rush. 2018. Entity Tracking Improves Cloze-style Reading Comprehension. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1049–1055.

Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, and Noah A. Smith. 2017. Dynamic Entity Representations in Neural Language Models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1830–1839.

Chloe Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339.

Sosuke Kobayashi, Ran Tian, Naoaki Okazaki, and Kentaro Inui. 2016. Dynamic entity representation with max-pooling improves machine reading. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 850–855.

Remi Lebret, David Grangier, and Michael Auli. 2016. Neural Text Generation from Structured Data with Application to the Biography Domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1203–1213.

Percy Liang, Michael I. Jordan, and Dan Klein. 2009. Learning semantic correspondences with less supervision. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 91–99.

Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text Generation by Structure-aware Seq2seq Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421.
Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 720–730.

Soichiro Murakami, Akihiko Watanabe, Akira Miyazawa, Keiichi Goshima, Toshihiko Yanase, Hiroya Takamura, and Yusuke Miyao. 2017. Learning to generate market comments from stock prices. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1374–1384.

Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, et al. 2017. DyNet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.

Karl Pearson. 1901. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572.

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-Text Generation with Content Selection and Planning. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.

Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the convergence of Adam and beyond. In Proceedings of the Sixth International Conference on Learning Representations.

Lei Sha, Lili Mou, Tianyu Liu, Pascal Poupart, Sujian Li, Baobao Chang, and Zhifang Sui. 2018. Order-planning neural text generation from structured data. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.

Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems, pages 2440–2448.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

Kumiko Tanaka-Ishii, Koiti Hasida, and Itsuki Noda. 1998. Reactive content selection in the generation of real-time soccer commentary. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 1282–1288.

Yasufumi Taniguchi, Yukun Feng, Hiroya Takamura, and Manabu Okumura. 2019. Generating Live Soccer-Match Commentary from Play Data. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.

Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, and Hang Li. 2017. Context gates for neural machine translation. Transactions of the Association for Computational Linguistics, 5:87–99.

Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory Networks. In Proceedings of the Third International Conference on Learning Representations.

Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263.

Zichao Yang, Phil Blunsom, Chris Dyer, and Wang Ling. 2017. Reference-Aware Language Models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1850–1859.
A Algorithm
The generation process of our model is shown in Algorithm 1. For a concise description, we omit the conditions from each probability notation. <SOD> and <EOD> represent "start of the document" and "end of the document", respectively.
B Experimental settings
We set the dimensions of the embeddings to 128 and those of the hidden states of the RNNs to 512; all parameters are initialized with Xavier initialization (Glorot and Bengio, 2010). We set the maximum number of epochs to 30 and choose the model with the highest BLEU score on the development data. The initial learning rate is 2e-3, and AMSGrad (Reddi et al., 2018) is used for automatically adjusting the learning rate. Our implementation uses DyNet (Neubig et al., 2017).
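Xavier (Glorot) uniform initialization draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)); a minimal numpy sketch (the dimensions match the settings above, but the function itself is an illustration, not DyNet's API):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform initialization (Glorot and Bengio, 2010)."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# E.g., a projection from 128-dim embeddings to a 512-dim hidden state.
W = xavier_uniform(128, 512, np.random.default_rng(0))
assert W.shape == (128, 512)
assert abs(W).max() <= np.sqrt(6.0 / (128 + 512))
```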
Algorithm 1: Generation process
Input: Data records s, annotations Z_{1:T}, E_{1:T}, A_{1:T}, N_{1:T}

 1: Initialize {r_{e,a,v}}_{r in x}, {e}_{e in E}, h_0^LM, h_0^ENT
 2: t <- 0
 3: e_t, y_t <- NONE, <SOD>
 4: while y_t != <EOD> do
 5:   t <- t + 1
 6:   if p(Z_t = 1) >= 0.5 then
        /* Select the entity */
 7:     e_t <- argmax p(E_t = e'_t)
 8:     if e_t not in E_{t-1} then
          /* If e_t is a new entity */
 9:       h_t^ENT' <- GRU_E(e_t, h_{t-1}^ENT)
10:       E_t <- E_{t-1} ∪ {e_t}
11:     else if e_t != e_{t-1} then
          /* If e_t has been observed before, but is different from the previous one */
12:       h_t^ENT' <- GRU_E(W^S h_s^ENT, h_{t-1}^ENT),
13:         where s = max{s : s <= t-1, e = e_s}
14:     else
15:       h_t^ENT' <- h_{t-1}^ENT
        /* Select an attribute for the entity e_t */
16:     a_t <- argmax p(A_t = a'_t)
17:     h_t^ENT <- GRU_A(r_{e_t, a_t, x[e_t, a_t]}, h_t^ENT')
18:     if a_t is a number attribute then
19:       if p(N_t = 1) >= 0.5 then
20:         y_t <- numeral of x[e_t, a_t]
21:       else
22:         y_t <- x[e_t, a_t]
23:       end
24:     else
25:       y_t <- x[e_t, a_t]
26:     h'_t <- tanh(W^H (h_{t-1}^LM ⊕ h_t^ENT))
27:     h_t^LM <- LSTM(y_t ⊕ h'_t, h_{t-1}^LM)
28:   else
29:     e_t, a_t, h_t^ENT <- e_{t-1}, a_{t-1}, h_{t-1}^ENT
30:     h'_t <- tanh(W^H (h_{t-1}^LM ⊕ h_t^ENT))
31:     y_t <- argmax p(Y_t)
32:     h_t^LM <- LSTM(y_t ⊕ h'_t, h_{t-1}^LM)
33:   end
34:   if y_t is "." then
35:     h_t^ENT <- GRU_A(v^REFRESH, h_t^ENT)
36:   end
    end
37: return y_{1:t-1}