cs146hw2
TRANSCRIPT
-
8/19/2019 CS146hw2
1/3
Homework 2: MTCS 146 – Spring 2016
March 1, 2016
Exercise 2.1
The alignment is < 2, 1, 4, 5, 3 >.
Exercise 2.2
We start off by setting τ_{e,f} = 1 for all English words e and French words f. We also start the algorithm with n_{e,f} = 0 for all e and f.

First iteration. The first iteration is easy to compute. From the first sentence:
p_mange = τ_{She,mange} + τ_{eats,mange} + τ_{bread,mange} = 1 + 1 + 1 = 3 (1)

With this we add 1/3 to n_{eats,mange}, which started off at 0. So right now it is at 1/3.
The second sentence has a remarkably similar calculation:
p_mange = τ_{He,mange} + τ_{eats,mange} + τ_{soup,mange} = 1 + 1 + 1 = 3 (2)

Then again we need to add 1/3 to n_{eats,mange}, so we have n_{eats,mange} = 2/3. Now, at this point, for all e, f where e and f are English and French words in our corpus respectively, n_{e,f} = 1/3, with the exceptions n_{eats,mange} = n_{eats,du} = 2/3.
We should note that

n_{eats,◦} = n_{eats,elle} + n_{eats,mange} + n_{eats,du} + n_{eats,pain} + n_{eats,il} + n_{eats,boeuf} = 1/3 + 2/3 + 2/3 + 1/3 + 1/3 + 1/3 = 8/3. (3)

In the maximization step, we then set

τ_{eats,mange} = n_{eats,mange} / n_{eats,◦} = (2/3) / (8/3) = 1/4.

So, at the end of the first iteration, τ_{eats,mange} = 1/4 and n_{eats,mange} = 2/3.
Second iteration. First sentence:

p_mange = τ_{She,mange} + τ_{eats,mange} + τ_{bread,mange} = 1/8 + 1/4 + 1/8 = 1/2 (4)

So now we add to n_{eats,mange}: n_{eats,mange} += (1/4) / (1/2) = 1/2.
The second sentence:
p_mange = τ_{He,mange} + τ_{eats,mange} + τ_{soup,mange} = 1/8 + 1/4 + 1/8 = 1/2 (5)

So again we add (1/4) / (1/2) = 1/2 to n_{eats,mange}. n_{eats,mange} is now at 1.
What about τ? Well, as n_{eats,mange} is now at 1, τ_{eats,mange} is just 1 / n_{eats,◦}.

n_{eats,◦} = n_{eats,elle} + n_{eats,mange} + n_{eats,du} + n_{eats,pain} + n_{eats,il} + n_{eats,boeuf} (6)
= 1/3 + 1 + 1 + 1/3 + 1/3 + 1/3 (7)
= 10/3 (8)

Then τ_{eats,mange} is the reciprocal of this, so 3/10.
In summary:
1. After the first iteration, τ_{eats,mange} = 1/4 and n_{eats,mange} = 2/3.
2. After the second iteration, τ_{eats,mange} = 3/10 and n_{eats,mange} = 1.
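The first iteration above can be checked with a short sketch of the E- and M-steps. This is a minimal IBM Model 1 iteration (no NULL word), and the sentence pairs are an assumption reconstructed from the words appearing in the counts above; it reproduces the first-iteration values n_{eats,mange} = 2/3 and τ_{eats,mange} = 1/4.

```python
from collections import defaultdict

# Sentence pairs assumed from the words used in the derivation above.
corpus = [
    (["She", "eats", "bread"], ["elle", "mange", "du", "pain"]),
    (["He", "eats", "soup"], ["il", "mange", "du", "boeuf"]),
]

def em_iteration(tau):
    """One expectation step and one maximization step of IBM Model 1."""
    n = defaultdict(float)
    for eng, fra in corpus:
        for f in fra:
            p = sum(tau[(e, f)] for e in eng)   # p_f, as in eqs. (1)-(2)
            for e in eng:
                n[(e, f)] += tau[(e, f)] / p    # expected counts (E-step)
    totals = defaultdict(float)                 # n_{e,◦}
    for (e, f), c in n.items():
        totals[e] += c
    new_tau = {(e, f): c / totals[e] for (e, f), c in n.items()}  # M-step
    return new_tau, n

tau, n = em_iteration(defaultdict(lambda: 1.0))  # tau initialized to 1
print(n[("eats", "mange")])    # 2/3
print(tau[("eats", "mange")])  # 0.25
```

Later iterations depend on how the τ values of the other word pairs are renormalized, so only the first iteration is checked here.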
Exercise 2.3
In using the IBM Model 1, we make the assumption that each French word corresponds to exactly one English word. This is not a one-to-one mapping: an alignment does not require each English word to correspond to a French word. n_{e,◦} is the number of times that an English word is aligned to any French word, but if there is not a one-to-one correspondence we cannot guarantee that this is equal to the number of times e occurs in the corpus.
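As a concrete check, take the toy corpus from Exercise 2.2 (sentence pairs assumed, uniform τ): after one E-step, n_{eats,◦} = 8/3 ≈ 2.67, while "eats" occurs only twice in the corpus.

```python
from collections import defaultdict

# Toy corpus assumed from Exercise 2.2.
corpus = [
    (["She", "eats", "bread"], ["elle", "mange", "du", "pain"]),
    (["He", "eats", "soup"], ["il", "mange", "du", "boeuf"]),
]

n = defaultdict(float)
for eng, fra in corpus:
    for f in fra:
        # With uniform tau, each French word splits its unit of mass
        # evenly over the English words in the sentence.
        for e in eng:
            n[(e, f)] += 1 / len(eng)

n_eats_total = sum(c for (e, _), c in n.items() if e == "eats")  # n_{eats,◦}
occurrences = sum(eng.count("eats") for eng, _ in corpus)
print(n_eats_total)  # 8/3: the French sentences have 4 words each
print(occurrences)   # 2
```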
Exercise 2.4
The alignment adheres to the definition of an alignment and is therefore an entirely legal alignment. An alignment of this sort could occur with a particular expression or idiom that makes no sense in a literal translation.

For example, in one section of Ulysses, Leopold Bloom finds an old potato in his pocket as he's closing his front door and realizes he left his keys inside. In the stream-of-consciousness narrative, this sentence becomes "Keys: not here. Potato I have." An Argentinian translator of the novel interpreted the "potato" sentence as a turn of phrase (not as a literal potato) and translated it with the phrase "Lost my carrots," meaning Bloom was starting to lose his grip on reality. Here the colloquial Argentinian expression would have a < 0, 0, 0 > alignment with the original English sentence (unless you're willing to take some liberties when it comes to root vegetables).
Exercise 2.5
If all initial values for τ_{e,f} are the same, it is easily shown that the subsequent iterations of the EM algorithm are not functions of this initial value.
Where do our initial values of τ get used? Say we assign τ_{e,f} = k for all e, f, where k is a positive constant. We see that, for any m, p_m is the summation of these values of τ, so p_m = ck (where c is a positive integer: the number of terms in the sum). Now, a particular n_{e,f} will, throughout the algorithm, be incremented by τ_{e,f}/p_f, which, of course (as all our τ values have been initialized equally), is k/(ck) = 1/c.

Concretely, this means that our actual initial values for τ have cancelled out, and all our arithmetic in assigning the new τ values is no longer based on k. Hence, our future values of τ are independent of this initial value. We do see that the values of p_m are dependent on these initial assignments of τ (in the explanation above, we denoted them as ck). By extension, the likelihood L_d(Φ) of the data will also be affected by this initial assignment.
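This cancellation can be checked numerically, again under the assumption of the Exercise 2.2 toy corpus: one EM iteration started from different uniform values k produces identical τ values.

```python
from collections import defaultdict

# Toy corpus assumed from Exercise 2.2.
corpus = [
    (["She", "eats", "bread"], ["elle", "mange", "du", "pain"]),
    (["He", "eats", "soup"], ["il", "mange", "du", "boeuf"]),
]

def em_iteration(tau):
    """One E-step and one M-step of IBM Model 1."""
    n = defaultdict(float)
    for eng, fra in corpus:
        for f in fra:
            p = sum(tau[(e, f)] for e in eng)   # p_f = c*k under uniform init
            for e in eng:
                n[(e, f)] += tau[(e, f)] / p    # k / (c*k) = 1/c: k cancels
    totals = defaultdict(float)
    for (e, f), c in n.items():
        totals[e] += c
    return {(e, f): c / totals[e] for (e, f), c in n.items()}

results = [em_iteration(defaultdict(lambda k=k: k))[("eats", "mange")]
           for k in (1.0, 7.0, 0.01)]
print(results)  # the same value, 0.25, for every k
```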
Exercise 2.6
The word 'the' will align most with 'pain' simply because 'the' is the most common word in the English language. Most sentences include at least one or two instances of the article, so there will just be more alignments where it aligns with 'pain' than with any other word. However, the fact that 'pain' aligns often with 'the' is not a fact exclusive to a particular French word. τ is calculated as a proportion, not as an absolute count of alignment incidences: 'the' aligns with so many different words that, for any f, τ_{the,f} will be comparatively low.
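A hypothetical count table (the numbers here are invented purely for illustration) shows the distinction between absolute counts and the proportion τ:

```python
# Invented expected counts: 'the' co-occurs with many French words,
# while 'bread' co-occurs with few.
n = {("the", f): 50.0
     for f in ("pain", "le", "la", "du", "mange", "elle",
               "il", "un", "et", "boeuf")}
n[("bread", "pain")] = 30.0
n[("bread", "du")] = 10.0

def tau(e, f):
    """tau is a proportion of e's total expected count, not a raw count."""
    total = sum(c for (e2, _), c in n.items() if e2 == e)
    return n[(e, f)] / total

# 'the' has the larger raw count with 'pain' (50 vs 30), yet the smaller tau:
print(tau("the", "pain"))    # 0.1
print(tau("bread", "pain"))  # 0.75
```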
Exercise 2.7
We can mathematically illustrate the difference between these two equations by invoking equation (2.14) from the textbook:
P(A_k = j | e, f_k) = [ P(f_k | A_k = j, e) / P(f_k | e) ] · P(A_k = j | e) (9)
This equation establishes the relationship between the two probabilities compared and contrasted in this question. What this equation shows is that knowledge of the French sentence comes along with an ability to weight certain alignments more favorably than others. (This is shown by the way the right-hand side splits into the factor conditioned on e and the factor regarding the probabilities of f_k.) When the French sentence is known, for example, we could weight alignment probabilities based on length, thereby making alignments unequal in probability.

Lacking this information, though, as in equation (2.23), we cannot discriminate between various French sentences and by the IBM model must assign them all the same probability. Therefore, each particular alignment, solely conditioned on the English sentence, is equally likely.
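A small sketch of this contrast, using the τ values for 'mange' from Exercise 2.2's second iteration: given only e, the prior over alignment positions is uniform, but observing f_k = 'mange' reweights it via equation (9).

```python
# tau values taken from the second iteration of Exercise 2.2; the uniform
# prior is what conditioning on e alone (as in eq. 2.23) would give.
tau = {("She", "mange"): 1/8, ("eats", "mange"): 1/4, ("bread", "mange"): 1/8}
eng = ["She", "eats", "bread"]

prior = [1 / len(eng)] * len(eng)   # only e known: all positions equal

# Observing f_k = "mange" reweights by P(f_k | A_k = j, e), per eq. (9):
weights = [tau[(e, "mange")] * p for e, p in zip(eng, prior)]
posterior = [w / sum(weights) for w in weights]
print(posterior)  # roughly [0.25, 0.5, 0.25]: 'eats' is now favored
```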