
Homework 2: MT CS 146 – Spring 2016

    March 1, 2016

    Exercise 2.1

The alignment is < 2, 1, 4, 5, 3 >.

    Exercise 2.2

We start off by setting τ_{e,f} = 1 for all English words e and French words f. We also start the algorithm with n_{e,f} = 0 for all e and f.

First iteration. The first iteration is easy to compute. From the first sentence:

p_{mange} = τ_{She,mange} + τ_{eats,mange} + τ_{bread,mange} = 1 + 1 + 1 = 3    (1)

With this we add 1/3 to n_{eats,mange}, which started off at 0. So right now it is at 1/3.

    The second sentence has a remarkably similar calculation:

p_{mange} = τ_{He,mange} + τ_{eats,mange} + τ_{soup,mange} = 1 + 1 + 1 = 3    (2)

Then again we add 1/3 to n_{eats,mange}, so we have n_{eats,mange} = 2/3. Now at this point, for all e, f where e and f are English and French words in our corpus respectively, n_{e,f} = 1/3, with the exceptions of n_{eats,mange} = n_{eats,du} = 2/3.

We should note that

n_{eats,◦} = n_{eats,elle} + n_{eats,mange} + n_{eats,du} + n_{eats,pain} + n_{eats,il} + n_{eats,boeuf}    (3)

In the maximization step, we then set

τ_{eats,mange} = n_{eats,mange} / n_{eats,◦} = (2/3) / (8/3) = 1/4.

So, at the end of the first iteration, τ_{eats,mange} = 1/4 and n_{eats,mange} = 2/3.
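To make the bookkeeping of this first iteration concrete, here is a minimal Python sketch of the expectation and maximization steps described above. The sentence pairs in the corpus are an assumption reconstructed from the words that appear in the counts (the write-up never spells them out); only the update rules follow the description above.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed toy corpus, reconstructed from the words that appear in the counts above.
corpus = [
    (["She", "eats", "bread"], ["elle", "mange", "du", "pain"]),
    (["He", "eats", "soup"],   ["il", "mange", "du", "boeuf"]),
]

# Initialization: tau[e][f] = 1 for every English/French word pair.
tau = defaultdict(lambda: defaultdict(lambda: Fraction(1)))

# Expectation step: accumulate the expected counts n[e][f].
n = defaultdict(lambda: defaultdict(lambda: Fraction(0)))
for eng, fra in corpus:
    for f in fra:
        p = sum(tau[e][f] for e in eng)       # e.g. p_mange = 1 + 1 + 1 = 3
        for e in eng:
            n[e][f] += tau[e][f] / p          # each English word in the sentence gets 1/3

# Maximization step: tau[e][f] = n[e][f] / n[e][.]
for e, row in n.items():
    total = sum(row.values())                 # for e = "eats" this is 8/3
    for f in row:
        tau[e][f] = row[f] / total

print(n["eats"]["mange"])     # 2/3
print(tau["eats"]["mange"])   # (2/3) / (8/3) = 1/4
```

Using Fraction keeps the arithmetic exact, so the printed values match the fractions computed by hand above.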

Second iteration. The expected counts n_{e,f} are reset to 0 before the expectation step is repeated. First sentence:

p_{mange} = τ_{She,mange} + τ_{eats,mange} + τ_{bread,mange} = 1/8 + 1/4 + 1/8 = 1/2    (4)

So now we add τ_{eats,mange} / p_{mange} = (1/4) / (1/2) = 1/2 to n_{eats,mange}.
    The second sentence:

p_{mange} = τ_{He,mange} + τ_{eats,mange} + τ_{soup,mange} = 1/8 + 1/4 + 1/8 = 1/2    (5)

So again we add τ_{eats,mange} / p_{mange} = (1/4) / (1/2) = 1/2 to n_{eats,mange}, which is now at 1.

What about τ? Well, as n_{eats,mange} is now at 1, τ_{eats,mange} is just 1 / n_{eats,◦}.

n_{eats,◦} = n_{eats,elle} + n_{eats,mange} + n_{eats,du} + n_{eats,pain} + n_{eats,il} + n_{eats,boeuf}    (6)

= 1/3 + 1 + 1 + 1/3 + 1/3 + 1/3    (7)

= 10/3    (8)

Then τ_{eats,mange} is the reciprocal of this, so 3/10.

In summary:

1. After the first iteration, τ_{eats,mange} = 1/4 and n_{eats,mange} = 2/3.

2. After the second iteration, τ_{eats,mange} = 3/10 and n_{eats,mange} = 1.
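Both iterations above apply the same pair of updates; stated once in the notation of this walkthrough (p_f is the per-sentence normalizer from equations (1), (2), (4) and (5), and n_{e,◦} is the row total from equations (3) and (6)):

```latex
% Expectation step: for each sentence pair and each French word f in it,
% and for each English word e in that sentence
p_f = \sum_{e \,\in\, \text{English sentence}} \tau_{e,f},
\qquad
n_{e,f} \mathrel{+}= \frac{\tau_{e,f}}{p_f}

% Maximization step: renormalize each English word's row of counts
\tau_{e,f} = \frac{n_{e,f}}{n_{e,\circ}} = \frac{n_{e,f}}{\sum_{f'} n_{e,f'}}
```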

    Exercise 2.3

In using the IBM Model 1, we make the assumption that each French word corresponds to exactly one English word. This is not a one-to-one mapping: an alignment does not require each English word to correspond to a French word. n_{e,◦} is the number of times that an English word is aligned to any French word, but if there is not a one-to-one correspondence we cannot guarantee that this is equal to the number of times e occurs in the corpus.

    Exercise 2.4

An alignment of all zeroes, such as < 0, 0, 0 >, adheres to the definition of an alignment and is therefore an entirely legal alignment. An alignment of this sort could occur with a particular expression or idiom that makes no sense in a literal translation. For example, in one section of Ulysses, Leopold Bloom finds an old potato in his pocket as he's closing his front door and realizes he left his keys inside. In the stream-of-consciousness narrative, this sentence becomes "Keys: not here. Potato I have." An Argentinian translator of the novel interpreted the "potato" sentence as a turn of phrase (not as a literal potato) and translated it with the phrase "Lost my carrots," meaning Bloom was starting to lose his grip on reality. Here the colloquial Argentinian expression would have a < 0, 0, 0 > alignment with the original English sentence (unless you're willing to take some liberties when it comes to root vegetables).

    Exercise 2.5

If all initial values for τ_{e,f} are the same, it is easily shown that the subsequent iterations of the EM algorithm are not functions of this initial value.

Where do our initial values of τ get used? Say we assign τ_{e,f} = k for all e, f, where k is a positive integer. We see that for any m, p_m is the summation of these values of τ, so p_m = ck (where c is a positive integer). Now, for a particular n_{e,f}, we will throughout the algorithm increment by τ_{e,f} / p_f, which of course (as all our τ values have been initialized equally) is k / (ck) = 1/c.

Concretely, this means that our actual initial values for τ have cancelled out, and all our arithmetic in assigning the new τ values is no longer based on k. Hence, our future values of τ are independent of this initial value. We do see that the values of p_m are dependent on these initial assignments of τ (in the explanation above, we denoted this as ck). By extension, the likelihood L_d(Φ) of the data will also be affected by this initial assignment.
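A quick way to check this claim is to run the same EM updates from several different uniform initial values and confirm that the resulting τ table is identical. This is a minimal sketch, reusing the assumed toy corpus from the Exercise 2.2 sketch; em_iterations is a hypothetical helper written just for this check, not something defined in the homework or the textbook.

```python
from collections import defaultdict
from fractions import Fraction

# The same assumed toy corpus as in the Exercise 2.2 sketch.
corpus = [
    (["She", "eats", "bread"], ["elle", "mange", "du", "pain"]),
    (["He", "eats", "soup"],   ["il", "mange", "du", "boeuf"]),
]

def em_iterations(k, num_iters=5):
    """Run num_iters EM iterations starting from the uniform value tau[e][f] = k."""
    tau = defaultdict(lambda: defaultdict(lambda: Fraction(k)))
    for _ in range(num_iters):
        # Expectation step: expected counts are reset at the start of each iteration.
        n = defaultdict(lambda: defaultdict(lambda: Fraction(0)))
        for eng, fra in corpus:
            for f in fra:
                p = sum(tau[e][f] for e in eng)   # on the first pass this is c*k
                for e in eng:
                    n[e][f] += tau[e][f] / p      # k cancels out: k / (c*k) = 1/c
        # Maximization step: renormalize each English word's counts over French words.
        tau = defaultdict(lambda: defaultdict(lambda: Fraction(0)))
        for e, row in n.items():
            total = sum(row.values())
            for f in row:
                tau[e][f] = row[f] / total
    return {(e, f): v for e, row in tau.items() for f, v in row.items()}

# The learned tau table is identical no matter which uniform initial value we pick.
print(em_iterations(1) == em_iterations(7) == em_iterations(Fraction(1, 1000)))   # True
```

Exact rational arithmetic makes the comparison literally bit-for-bit, which is exactly the cancellation argument above: the uniform k never survives past the first division.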

    Exercise 2.6

The word 'the' will align most with 'pain' simply because 'the' is the most common word in the English language. Most sentences include at least one or two instances of the article, so there will just be more alignments where 'pain' aligns with 'the' than with any other English word. However, the fact that 'pain' aligns often with 'the' is not a fact exclusive to a particular French word. τ is calculated as a proportion, not as an absolute count of alignment incidences: 'the' aligns with so many different words that for any f, τ_{the,f} will be comparatively low.
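A purely hypothetical set of counts (not drawn from this homework's corpus) makes the proportion point concrete: suppose 'the' is aligned a total of 1000 times across many French words, only 20 of which are to 'pain'. Then

```latex
\tau_{\text{the},\,\text{pain}}
  = \frac{n_{\text{the},\,\text{pain}}}{n_{\text{the},\,\circ}}
  = \frac{20}{1000}
  = 0.02
```

so even if those 20 alignments are more than 'pain' receives from any other English word, the proportion τ_{the,pain} stays small.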

    Exercise 2.7

We can mathematically illustrate the difference between these two equations by invoking equation (2.14) from the textbook:

P(A_k = j | e, f_k) = ( P(f_k | A_k = j, e) / P(f_k | e) ) · P(A_k = j | e)    (9)

This equation establishes the relationship between the two probabilities compared and contrasted in this question. What it shows is that knowledge of the French sentence comes along with an ability to weight certain alignments more favorably than others. (This is shown by the way the right-hand side splits into the factor conditioned on e and the factor regarding the probabilities of f_k.) When the French sentence is known, for example, we could weight alignment probabilities based on length, thereby making alignments unequal in probability.

Lacking this information, though, as in equation (2.23), we cannot discriminate between the various French sentences and, by the IBM model, must assign them all the same probability. Therefore, each particular alignment, conditioned solely on the English sentence, is equally likely.
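As a small illustration with hypothetical numbers (not from the homework corpus): suppose the English sentence has three words and, for the French word f_k, the current translation probabilities are τ_{e_1,f_k} = 0.5 and τ_{e_2,f_k} = τ_{e_3,f_k} = 0.25. Without seeing f_k the alignment prior is uniform, P(A_k = j | e) = 1/3 for each j; once f_k is observed, equation (9) reweights these priors by the likelihoods, giving

```latex
P(A_k = 1 \mid e, f_k) = \frac{0.5}{0.5 + 0.25 + 0.25} = 0.5,
\qquad
P(A_k = 2 \mid e, f_k) = P(A_k = 3 \mid e, f_k) = 0.25
```

so the observed French word shifts probability mass toward the alignment whose English word best explains it.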
