Meta-Prod2Vec: Simple Product Embeddings with Side-Information

Flavian Vasile, Elena Smirnova @Criteo, Alexis Conneau @FAIR

Upload: recsysfr

Posted on 14-Jan-2017


TRANSCRIPT

Page 1: Meta-Prod2Vec: Simple Product Embeddings with Side-Information

Meta-Prod2Vec: Simple Product Embeddings with Side-Information

Flavian Vasile, Elena Smirnova @Criteo, Alexis Conneau @FAIR

Page 2

Contents

• Product Embeddings for Recommendation

• Embedding CF signal: Word2Vec and Prod2Vec

• Meta-Prod2Vec: Embedding with Side-Information

• Experimental Results

• Conclusions

Page 3

Product Embeddings for Recommendation

Page 4

Product Embeddings for Recommendation

Represent items (and sometimes users) as vectors in the same space and use their distances to compute recommendations.
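A minimal sketch of "use their distances to compute recommendations", assuming cosine similarity as the closeness measure and a tiny hypothetical set of 2-d item vectors (the item names and values are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend(query, embeddings, k=2):
    """Return the k items whose vectors are closest (by cosine) to the query item."""
    q = embeddings[query]
    scores = {item: cosine(q, vec) for item, vec in embeddings.items() if item != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy item vectors, for illustration only.
embeddings = {
    "song_a": [1.0, 0.1],
    "song_b": [0.9, 0.2],
    "song_c": [0.1, 1.0],
    "song_d": [-1.0, 0.0],
}
print(recommend("song_a", embeddings))  # ['song_b', 'song_c']
```

Any method that produces such vectors (MF, Prod2Vec, Meta-Prod2Vec) can plug into the same nearest-neighbour lookup; only the way the vectors are learned changes.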

Page 5

• At a certain level, nothing new!

• We already had Matrix Factorization

• It is yet another way of creating latent representations for recommendation

Page 6

Some of the NN methods can be translated back into MF techniques.

Differences:

• new ways to compute matrix entries

• new loss functions
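One known instance of this translation: for skip-gram with negative sampling, it has been shown that, under idealized assumptions (a large enough embedding dimension), the optimal dot product between a word vector and a context vector equals the pointwise mutual information (PMI) of the pair shifted by log k, where k is the number of negative samples. The model therefore implicitly factorizes a matrix whose entries are computed in a new way. The minimal sketch below computes those shifted-PMI entries from raw (item, context) pairs; the function name `shifted_pmi` and the toy pairs are hypothetical.

```python
import math
from collections import Counter

def shifted_pmi(pairs, k=5):
    """Return {(w, c): PMI(w, c) - log(k)} for every observed (w, c) pair.

    PMI(w, c) = log( #(w, c) * |D| / (#(w) * #(c)) ), where |D| is the
    total number of pairs and k is the number of negative samples.
    """
    pair_counts = Counter(pairs)
    w_counts = Counter(w for w, _ in pairs)
    c_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    return {
        (w, c): math.log(n * total / (w_counts[w] * c_counts[c])) - math.log(k)
        for (w, c), n in pair_counts.items()
    }

# Hypothetical (item, context item) co-occurrence pairs.
pairs = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")]
matrix = shifted_pmi(pairs, k=1)  # k=1 means no shift: plain PMI
```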

Page 7

Where do we fit?

• Hybrid model that uses CF with content side-information

• An incursion into embedding methods that use side information

Page 8

Embedding CF signal: Word2Vec and Prod2Vec

Page 9

Word2Vec: Skip-gram

(Word-to-hidden matrix) x (Hidden-to-word context matrix)
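The product of the two matrices is what scores a (word, context) pair. A minimal sketch of full-softmax skip-gram scoring, assuming a tiny hypothetical vocabulary: row `w` of `W_in` is the hidden vector of word `w`, row `c` of `W_out` is the output vector of context word `c` (so `W_out` is stored transposed relative to the hidden-to-word matrix), and P(c | w) is a softmax over their dot products. Practical Word2Vec training usually replaces the full softmax with negative sampling or a hierarchical softmax.

```python
import math

def context_probs(word, W_in, W_out):
    """P(c | word) for every context word c.

    Softmax over the dot products of the word's row in the input
    (word-to-hidden) matrix with each row of the output matrix.
    """
    v = W_in[word]
    logits = [sum(a * b for a, b in zip(v, u)) for u in W_out]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical 3-word vocabulary with a 2-d hidden layer.
W_in = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # word-to-hidden
W_out = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # one output vector per context word
probs = context_probs(0, W_in, W_out)
```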

Page 10

Word2Vec: in this space, words that appear in similar contexts tend to be close.

Page 11

The same idea can be applied to other sequential data, such as user shopping sessions: Prod2Vec.

Page 12

Prod2Vec

Words = products, sentences = shopping sessions.

Grbovic et al. E-commerce in Your Inbox: Product Recommendations at Scale, KDD 2015
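Treating products as words and sessions as sentences means the usual skip-gram training pairs are read off each session. A minimal sketch, assuming a fixed symmetric context window (Word2Vec implementations commonly sample the window size per position, so the fixed `window` parameter is a simplification) and hypothetical product IDs:

```python
def skipgram_pairs(sessions, window=2):
    """(target, context) pairs from each session: products play the role of
    words and shopping sessions the role of sentences."""
    pairs = []
    for session in sessions:
        for i, target in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, session[j]))
    return pairs

sessions = [["p1", "p2", "p3"], ["p2", "p4"]]
print(skipgram_pairs(sessions, window=1))
# [('p1', 'p2'), ('p2', 'p1'), ('p2', 'p3'), ('p3', 'p2'), ('p2', 'p4'), ('p4', 'p2')]
```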

Page 13

Prod2Vec: the resulting embedding will co-locate products that appear in the vicinity of the same products.

Page 14

Prod2Vec loss function
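The loss formula on this slide did not survive extraction. As a stand-in, and not necessarily the exact form shown in the talk, here is the standard skip-gram negative-sampling objective that Word2Vec optimizes for one observed (product, context product) pair with a few sampled negatives. `v_in` is the input product's vector, `u_pos` the observed context product's output vector and `u_negs` the output vectors of the sampled negatives; all names are illustrative.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgns_loss(v_in, u_pos, u_negs):
    """Negative-sampling loss for one observed pair:
    -log sigma(u_pos . v_in) - sum over negatives of log sigma(-u_neg . v_in)."""
    loss = -log_sigmoid(dot(u_pos, v_in))
    for u_neg in u_negs:
        loss -= log_sigmoid(-dot(u_neg, v_in))
    return loss

# An aligned positive and a well-separated negative give a small loss...
good = sgns_loss([1.0, 0.0], [3.0, 0.0], [[-3.0, 0.0]])
# ...while the reverse situation gives a large one.
bad = sgns_loss([1.0, 0.0], [-3.0, 0.0], [[3.0, 0.0]])
```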

Page 15

Meta-Prod2Vec: Embedding with Side-Information

Page 16

Meta-Prod2Vec: Embedding with Side-Information

Idea: use not only the product sequence information, but also product metadata.

Page 17

Where is it useful? Product cold-start, when sequence information is sparse.

Page 18

How can it help? We place additional constraints on product co-occurrences using external information. This lets us create more noise-robust embeddings for products that suffer from cold-start.

Page 19

Types of product side-information:

• Categories

• Brands

• Title & Description

• Tags

Page 20

How does Meta-Prod2Vec leverage this information for cold-start?

Page 21

Motivating example:

Page 22

Let’s say we are trying to build a recommender system for songs...

Page 23

We want to build a very simple solution that recommends the next song based on the last song the user heard.

Page 24

Two different recommendation situations:

• Simple: the previous song is popular

• Hard: the previous song is relatively unknown (suffers from cold-start)

Page 25

Simple case:

Query song: Shake It Off by Taylor Swift.

Best next song: All About That Bass by Meghan Trainor.

CF and Prod2Vec both work!

Page 26

Hard case:

Query song: still a Taylor Swift song, but one of her earlier ones, e.g. You’re Not Sorry.

Best next song: ?

Page 27

Hard case + unlucky:

• Just one user listened to You’re Not Sorry

• He also listened to Rammstein’s Du Hast!

Page 28

Hard case + unlucky:

Your recommendation is not working!

Page 29

This is where Meta-Prod2Vec comes in handy!

Page 30

When computing how plausible it is for a user to like a pair of songs, you can place additional constraints by taking into account the song artists.

Page 31

Prod2Vec constraints

(Diagram: songs You’re Not Sorry and Du Hast)

P(Du Hast | You’re Not Sorry): the next song depends on the current song.

Page 32

Prod2Vec constraints

(Diagram: songs You’re Not Sorry and Du Hast)

You’re Not Sorry is a fringe song: low evidence for the positive and negative pairs.

Page 33

Artist metadata constraints

(Diagram: songs You’re Not Sorry and Du Hast; artists Taylor Swift and Rammstein)

However, the associated singer is popular: good evidence that Taylor Swift and Rammstein do not really co-occur (they have distant vectors).

Page 34

Artist and song constraints (1)

(Diagram: songs You’re Not Sorry and Du Hast; artists Taylor Swift and Rammstein)

Furthermore, we can enforce that the songs and their artists should be close...

Page 35

Artist and song constraints (2)

(Diagram: songs You’re Not Sorry and Du Hast; artists Taylor Swift and Rammstein)

Finally, we add two more constraints between the artists and the previous/next song (they still have more support than the original pairs).

Page 36

Meta-Prod2Vec constraints

(Diagram: songs You’re Not Sorry and Du Hast; artists Taylor Swift and Rammstein)

#1. P(Rammstein | You’re Not Sorry): the artist of the next song should be plausible given the current song.

#2. P(Du Hast | Taylor Swift): the next song should depend on the current artist selection.

#3. P(You’re Not Sorry | Taylor Swift) and P(Du Hast | Rammstein): the current artist selection should also influence the current song selection.

#4. P(Rammstein | Taylor Swift): the probability of the next artist should be high given the current artist.

Page 37

Putting it all together: Meta-Prod2Vec loss
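The formula on this slide did not survive extraction. A plausible reconstruction from the four constraints listed on Page 36 and the importance hyperparameter mentioned on Page 39 is sketched below. The notation is ours, and the use of a single shared weight λ for all four side-information terms is an assumption. Write I for the current (input) products, J for the next (output) products, M for their metadata (here, artists), and L_{B|A} for the skip-gram loss of predicting B given A:

```latex
L_{\mathrm{MP2V}} = L_{J|I} + \lambda \left( L_{M|I} + L_{J|M} + L_{I|M} + L_{M|M} \right)
```

Here L_{J|I} is the plain Prod2Vec term, e.g. P(Du Hast | You’re Not Sorry), and the four bracketed terms correspond, in order, to constraints #1 to #4.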

Page 38

Relationship with MF with side-info:

Page 39

MP2V implementation

• No changes in the Word2Vec code!

• Changes just in the input pairs: we generate 4 additional types of pairs (in proportion to the importance hyperparameter).
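Because only the input pairs change, the extra training data can be produced in a preprocessing step and fed to an unmodified Word2Vec trainer. A minimal sketch under several assumptions: `meta` maps each product to one metadata token (e.g. its artist) whose name does not collide with any product ID, `lam` is read as the probability of keeping each side-information pair, and `window` is a fixed symmetric window, so "next" covers neighbours on both sides. Pairs are written as (input, output), and the comments map each type to constraints #1 to #4 on Page 36. The function name `mp2v_pairs` is hypothetical.

```python
import random

def mp2v_pairs(session, meta, window=1, lam=1.0, rng=random):
    """Return (input, output) training pairs for one session.

    Plain Prod2Vec pairs are always kept; each of the four extra
    side-information pair types is kept with probability lam
    (lam >= 1 keeps all of them, lam <= 0 keeps none).
    """
    pairs = []

    def maybe_add(pair):
        # Subsample side-information pairs according to the importance weight.
        if rng.random() < lam:
            pairs.append(pair)

    for i, item in enumerate(session):
        maybe_add((meta[item], item))  # #3: current song given its own artist
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            ctx = session[j]
            pairs.append((item, ctx))            # Prod2Vec: song given song
            maybe_add((item, meta[ctx]))         # #1: neighbour's artist given song
            maybe_add((meta[item], ctx))         # #2: neighbour song given artist
            maybe_add((meta[item], meta[ctx]))   # #4: artist given artist
    return pairs

session = ["You're Not Sorry", "Du Hast"]
meta = {"You're Not Sorry": "artist:Taylor Swift", "Du Hast": "artist:Rammstein"}
pairs = mp2v_pairs(session, meta, window=1, lam=1.0)
```

The `artist:` prefix keeps metadata tokens distinct from song IDs, since both end up in the same Word2Vec vocabulary.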

Page 40

Experimental Results

Page 41

Task & Metrics

Task: next-event prediction

Metrics:

• Hit ratio at K (HR@K)

• Normalized Discounted Cumulative Gain at K (NDCG@K)
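A minimal sketch of both metrics, assuming each test event has exactly one held-out next item and each method returns a ranked candidate list (the slide does not spell out the exact evaluation protocol). With a single relevant item the ideal DCG is 1, so NDCG@K reduces to 1/log2(rank + 1) when the true item sits at 1-based position rank ≤ K, and 0 otherwise. The toy rankings are hypothetical.

```python
import math

def hit_ratio_at_k(ranked_lists, targets, k):
    """Fraction of test events whose true next item appears in the top k."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def ndcg_at_k(ranked_lists, targets, k):
    """Mean NDCG@k with one relevant item per event (ideal DCG = 1)."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            rank = top.index(t) + 1  # 1-based position of the true item
            total += 1.0 / math.log2(rank + 1)
    return total / len(targets)

ranked_lists = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "d"]
hr = hit_ratio_at_k(ranked_lists, targets, k=2)   # 2 of 3 events hit
ndcg = ndcg_at_k(ranked_lists, targets, k=2)
```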

Page 42

Methods

• BestOf: rank by popularity

• CoCounts: cosine similarity of candidate item to query item

• Prod2Vec: cosine similarity of item embedding vectors

• Meta-Prod2Vec: cosine similarity of improved embedding vectors

• Mix(Prod2Vec, CoCounts): linear combination of the two scores

• Mix(Meta-Prod2Vec, CoCounts): same as above, with Meta-Prod2Vec scores
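A minimal sketch of the two non-embedding ingredients, assuming CoCounts compares the co-occurrence count vectors of the query and candidate items (the slide does not say which vectors are compared) and that Mix uses a single weight `alpha`. The parameters `alpha` and `window` and all function names are hypothetical; in practice the two scores may need to be put on comparable scales before mixing.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sessions, window=1):
    """vecs[x][y] = how often y appears within `window` positions of x."""
    vecs = defaultdict(Counter)
    for s in sessions:
        for i, x in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vecs[x][s[j]] += 1
    return vecs

def sparse_cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(n * v[key] for key, n in u.items())
    norm = math.sqrt(sum(n * n for n in u.values())) * math.sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0

def mix(score_a, score_b, alpha=0.5):
    """Linear combination of two scores, e.g. an embedding score and a CoCounts score."""
    return alpha * score_a + (1 - alpha) * score_b

sessions = [["a", "b", "c"], ["a", "b"], ["d", "c"]]
vecs = cooccurrence_vectors(sessions)
cocounts = sparse_cosine(vecs["a"], vecs["c"])  # about 0.71
```

This cosine rewards items with similar co-occurrence profiles, so "a" and "c" score about 0.71 here even though they never appear side by side.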

Page 43

Dataset: 30Music dataset

• playlist data from the Last.fm API

• sample of 100k user sessions

• resulting vocabulary size: 433k songs and 67k artists

Page 44

Global Results

Method                        Type    HR@20   NDCG@20
BestOf                        Head    0.0003  0.002
CoCounts                      Head    0.0160  0.141
Prod2Vec                      Tail    0.0101  0.113
MetaProd2Vec                  Tail    0.0124  0.125
Mix(Prod2Vec, CoCounts)       Global  0.0158  0.152
Mix(MetaProd2Vec, CoCounts)   Global  0.0180  0.161

Page 45

Results on Cold Start (HR@20)

Method                        Type    Pair freq = 0   Pair freq < 3
BestOf                        Head    0.0002          0.0002
CoCounts                      Head    0.0000          0.0197
Prod2Vec                      Tail    0.0003          0.0078
MetaProd2Vec                  Tail    0.0013          0.0198
Mix(Prod2Vec, CoCounts)       Global  0.0002          0.0200
Mix(MetaProd2Vec, CoCounts)   Global  0.0007          0.0291

Page 46

Conclusions and Next Steps

Page 47

Conclusions and Next Steps

Using side-info for product embeddings helps, especially on cold-start.

Page 48

Conclusions and Next Steps

• Better ways to mix Head and Tail recommendation methods

• Mix CF and metadata at test time: product embeddings using all available signal (CF, categorical, text and image product information)

Page 49

Thanks!

Page 50

Questions?