Personalized next-song recommendation in online karaokes (RecSys 2013)

Personalized next-song recommendation in online karaokes. Xiang Wu, et al. RecSys 2013 reading group @y_benjo

Upload: ybenjo

Post on 26-May-2015



TRANSCRIPT

Page 1: Personalized next-song recommendation in online karaokes (RecSys 2013)

Personalized next-song recommendation in online karaokes

Xiang Wu, et al.

RecSys 2013 reading group @y_benjo

Page 2

Outline

• Problem setting

• Method

• Evaluation

• Impressions

Page 3

Problem setting

• We want to predict what a user will sing next at karaoke

• Prior work: music recommendation typically predicts a user's ratings

• "What will they sing next?" is a fundamentally different problem setting

Page 4

Method

• Personalized Markov Embedding (PME) learns two things:

• closeness between songs (short-term preference)

• closeness between a user and a song (long-term preference)

• Training is done with Metric Embedding
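To make the idea concrete, here is a minimal sketch (not the authors' code) of how PME scores the next song: Equation (2) of the paper is a softmax over negative squared Euclidean distances, combining a song-to-song term (short-term) with a user-to-song term (long-term). The toy sizes and the arrays X and Y are hypothetical stand-ins for the learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_songs, n_users = 2, 5, 3         # toy sizes, purely illustrative
X = rng.normal(size=(n_songs, d))     # x(s): learned song coordinates
Y = rng.normal(size=(n_users, d))     # y(u): learned user coordinates

def next_song_probs(sa, u):
    """Pr(s_b | s_a, u) as in Eq. (2): a softmax over the negative squared
    distances song->song (short-term) plus user->song (long-term)."""
    logits = (-((X[sa] - X) ** 2).sum(axis=1)
              - ((Y[u] - X) ** 2).sum(axis=1))
    logits -= logits.max()            # for numerical stability
    p = np.exp(logits)
    return p / p.sum()

p = next_song_probs(sa=0, u=1)
print(p)   # a proper distribution over all songs; nearby songs score higher
```

Recommending then just means sorting the candidate songs by this probability given the user's last song.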

when the cheerful "Uptown Girl" is just sung, even if you do like these two songs. Thus, besides capturing the general (long-term) preference of each user, it is also important to model the sequential nature of singing lists for the user's contextual (short-term) preference [13] when making personalized next-song recommendations.

To address the above challenges, in this paper, we extend the existing Logistic Markov Embedding (LME) [5] algorithm and propose Personalized Markov Embedding (PME), a next-song recommendation strategy for online karaoke users. Specifically, we first embed songs and users into a Euclidean space in which distances between songs and users reflect the strength of their relationships. This embedding could efficiently combine users' long-term and short-term preferences together. Then, given each user's last song, we can generate recommendations by ranking the candidate songs according to the embedding. Moreover, our PME can be trained without any content information of songs, namely just with the interaction history of users' singing. Finally, we perform an experimental evaluation on a real-world data set, and the results demonstrate that the PME algorithm outperforms several state-of-the-arts, including the non-personalized LME algorithm.

2. NEXT-SONG RECOMMENDATION

2.1 Problem Formalization

Given the songs that have been performed previously by each karaoke user, our goal is to recommend a ranked list of songs to a given user for the current context (the last song). We formalize this problem as follows. Let $S = \{s_1, s_2, \cdots, s_{|S|}\}$ be the song set and $U = \{u_1, u_2, \cdots, u_{|U|}\}$ be the user set. The neighbouring song records, which are usually similar, performed by the same user during short time intervals form a session. For instance, the $j$-th session of user $u$ is represented by $p_{u,j} = \langle p_{u,j}^{(1)}, p_{u,j}^{(2)}, \cdots, p_{u,j}^{(|p_{u,j}|)} \rangle$, where $p_{u,j}^{(k)}$ is the $k$-th song in session $p_{u,j}$. All sessions produced by user $u$ form $p_u = \{p_{u,1}, p_{u,2}, \cdots, p_{u,|p_u|}\}$. Thus, the entire data set can be represented as $D = \{(u, p_u) \mid u \in U\}$. In other words, given $D$, the current user $u$ and the last song $s$ performed by user $u$, we want to train a model to generate a candidate song list for user $u$ to choose from for his/her next performance. Along this line, we should estimate the transition probabilities between songs for each user (session) by measuring user preferences.

2.2 Personalized Markov Embedding

In this subsection, we describe the way to simultaneously measure users' long-term and short-term preferences and the relationships between users and songs by PME. Thus, the next-song recommendation list can be generated naturally.

To this end, we map songs and users into points in an $\mathbb{R}^d$ space, and use vector $x(s)$ and vector $y(u)$ to denote song $s$'s and user $u$'s coordinates in this space, respectively. The Euclidean distance between two points reflects the relation between corresponding users/songs. If two songs stay apart from each other, it is not likely that they will show up in the same session. Also, if a user stays close to a song, he/she may often sing it. It is worth noting that the relations are asymmetric, and this can be inferred from the later illustration.

First, we assume the probability $\Pr(s_b \mid s_a, u)$ of a transition from song $s_a$ to song $s_b$ made by user $u$ is related to the Euclidean distances $\|x(s_a) - x(s_b)\|_2$ and $\|y(u) - x(s_b)\|_2$, which can be viewed as user $u$'s short-term and long-term preferences. Straightforwardly, it can be described as:

$$\Pr(s_b \mid s_a, u) \propto e^{-\|x(s_a) - x(s_b)\|_2^2 - \|y(u) - x(s_b)\|_2^2} \qquad (1)$$

Note that $\Pr(s_b \mid s_a)$ is proportional to $e^{-\|x(s_a) - x(s_b)\|_2^2}$ in LME [5]. Thus, the user information is ignored by LME, while being considered by our PME.

Since $\sum_{b=1}^{|S|} \Pr(s_b \mid s_a, u) = 1$, we add a denominator to the exponential value for normalization, and Equation (1) becomes:

$$\Pr(s_b \mid s_a, u) = \frac{e^{-\|x(s_a) - x(s_b)\|_2^2 - \|y(u) - x(s_b)\|_2^2}}{\sum_{s \in S} e^{-\|x(s_a) - x(s)\|_2^2 - \|y(u) - x(s)\|_2^2}} \qquad (2)$$

And now, we can get the coordinate mappings through a likelihood maximization approach:

$$(X, Y) = \arg\max_{X,Y} \prod_{(u,p_u) \in D} \prod_{j=1}^{|p_u|} \prod_{k=2}^{|p_{u,j}|} \Pr\!\left(p_{u,j}^{(k)} \,\middle|\, p_{u,j}^{(k-1)}, u\right) \qquad (3)$$

where the song-mapping matrix $X = \left[x(s_1), x(s_2), \cdots, x(s_{|S|})\right]$ and the user-mapping matrix $Y = \left[y(u_1), y(u_2), \cdots, y(u_{|U|})\right]$.

Then, we could transform Equation (3) into its equivalent form by applying the ln function:

$$\begin{aligned} (X, Y) &= \arg\max_{X,Y} \sum_{(u,p_u) \in D} \sum_{j=1}^{|p_u|} \sum_{k=2}^{|p_{u,j}|} \ln \Pr\!\left(p_{u,j}^{(k)} \,\middle|\, p_{u,j}^{(k-1)}, u\right) \\ &= \arg\max_{X,Y} \sum_{u \in U} \sum_{s_a \in S} \sum_{s_b \in S} c_{u,s_a,s_b} \ln \Pr(s_b \mid s_a, u) \\ &= \arg\max_{X,Y} \sum_{u \in U} \sum_{s_a \in S} \sum_{s_b \in S} c_{u,s_a,s_b} \Big[ -\|x(s_a) - x(s_b)\|_2^2 - \|y(u) - x(s_b)\|_2^2 - \ln \sum_{s \in S} e^{-\|x(s_a) - x(s)\|_2^2 - \|y(u) - x(s)\|_2^2} \Big] \\ &\overset{\text{def}}{=} \arg\max_{X,Y} L_1(D \mid X, Y) \end{aligned} \qquad (4)$$

where $c_{u,s_a,s_b}$ is the number of occurrences of song $s_b$ after song $s_a$ by user $u$ in the whole data set $D$. While both $\partial L_1(D \mid X, Y)/\partial x(i)$ and $\partial L_1(D \mid X, Y)/\partial y(i)$ are non-convex, we find that the gradient descent algorithm can still find proper solutions. However, both of the partial derivatives are so complex that the time complexity of a single iteration is as high as $O(|U||S|^2)$ even after optimization.

Then, to overcome the time-consuming problem, we propose Equation (5) to simulate Equation (2). In this way, the two types of Euclidean distances can be decoupled:

$$\Pr(s_b \mid s_a)\Pr(s_b \mid u) = \frac{e^{-\|x(s_a) - x(s_b)\|_2^2}}{\sum_{s \in S} e^{-\|x(s_a) - x(s)\|_2^2}} \cdot \frac{e^{-\|y(u) - x(s_b)\|_2^2}}{\sum_{s \in S} e^{-\|y(u) - x(s)\|_2^2}} \qquad (5)$$

where $\Pr(s_b \mid s_a)$ is the transition probability from song $s_a$ to song $s_b$ and $\Pr(s_b \mid u)$ is the probability of user $u$ singing song $s_b$. Note that Equation (5) is not simply an assembled model, since all parameters will be trained simultaneously.

Following a similar process of Equation (3) and Equation (4), we could get:

$$\begin{aligned} (X, Y) &= \arg\max_{X,Y} \sum_{u \in U} \sum_{s_a \in S} \sum_{s_b \in S} c_{u,s_a,s_b} \ln \Pr(s_b \mid s_a)\Pr(s_b \mid u) \\ &= \arg\max_{X,Y} \sum_{u \in U} \sum_{s_a \in S} \sum_{s_b \in S} c_{u,s_a,s_b} \Big[ -\|x(s_a) - x(s_b)\|_2^2 - \ln \sum_{s \in S} e^{-\|x(s_a) - x(s)\|_2^2} - \|y(u) - x(s_b)\|_2^2 - \ln \sum_{s \in S} e^{-\|y(u) - x(s)\|_2^2} \Big] \\ &\overset{\text{def}}{=} \arg\max_{X,Y} L_2(D \mid X, Y) \end{aligned} \qquad (6)$$

It can be found that the time complexity of one iteration has decreased to $O(|U||S|)$, if $\sum_{s \in S} e^{-\|x(s_a) - x(s)\|_2^2}$ and $\sum_{s \in S} e^{-\|y(u) - x(s)\|_2^2}$ are cached for approximate calculation.
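The decoupled model of Equation (5) is what makes caching pay off: each factor's normalizer depends only on s_a (or only on u), so it can be precomputed once per iteration. A rough numpy sketch under hypothetical toy embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_songs, n_users = 2, 6, 4
X = rng.normal(size=(n_songs, d))   # song embeddings x(s), toy values
Y = rng.normal(size=(n_users, d))   # user embeddings y(u), toy values

def sq_dists(points, X):
    # pairwise squared Euclidean distances, one row per point
    return ((points[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# Cache the two normalizers once: Z_song[a] = sum_s exp(-||x(s_a)-x(s)||^2)
# and Z_user[u] = sum_s exp(-||y(u)-x(s)||^2).  With these cached, scoring
# one (user, last-song) pair over all candidates is O(|S|), not O(|S|^2).
E_song = np.exp(-sq_dists(X, X))    # |S| x |S|
E_user = np.exp(-sq_dists(Y, X))    # |U| x |S|
Z_song = E_song.sum(axis=1)
Z_user = E_user.sum(axis=1)

def decoupled_score(sa, u):
    """Pr(s_b|s_a) * Pr(s_b|u) from Eq. (5), for every candidate s_b."""
    return (E_song[sa] / Z_song[sa]) * (E_user[u] / Z_user[u])

scores = decoupled_score(sa=0, u=2)
ranking = np.argsort(-scores)       # recommend highest-scoring songs first
print(ranking[:3])
```

Each of the two factors is itself a valid distribution over songs, which is why the per-iteration cost of training drops to O(|U| * |S|).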

(Annotations on the equation: the song-to-song term, the user-to-song term, and the next song to choose.)

Page 5

What is Metric Embedding?

• Roughly speaking, it learns a space (or rather, a mapping into a space) in which similar items are placed close together and dissimilar items far apart

• Note: metric learning learns a distance function, so it is a slightly different thing
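A tiny illustration of the "place close things close" intuition, on a made-up toy setup: repeatedly pulling co-occurring items toward each other shrinks their distance. This sketch has only the attractive term; a real metric embedding such as PME also needs the softmax normalizer, which effectively pushes non-co-occurring items apart so the space does not collapse to a point.

```python
import numpy as np

rng = np.random.default_rng(2)
emb = rng.normal(size=(4, 2))        # 4 items embedded in R^2 (toy data)
pairs = [(0, 1), (2, 3)]             # item pairs that should end up close
lr = 0.1

d_before = np.linalg.norm(emb[0] - emb[1])
for _ in range(100):
    for i, j in pairs:
        # one gradient step on ||e_i - e_j||^2: pull the pair together
        g = 2 * (emb[i] - emb[j])
        emb[i] -= lr * g
        emb[j] += lr * g
d_after = np.linalg.norm(emb[0] - emb[1])
print(d_before, d_after)             # the co-occurring pair ends up close
```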

Page 6

With that in mind, let's look at the equations

• What we learn are x and y

• We want an R^d song/song space and song/user space

• x: a function that maps a song into R^d

• y: a function that maps a user into R^d (the original slide said "song" here, but per the paper y maps users)


Page 7

A very rough analogy

• A song/song space (R^2), with genres clustering: anime songs, Vocaloid, enka, metal, idol songs, oldies, Southern All Stars, rock

Page 8

Training: maximum likelihood estimation


Page 9

Training: maximum likelihood estimation

$$\begin{aligned} (X, Y) &= \arg\max_{X,Y} \sum_{(u,p_u) \in D} \sum_{j=1}^{|p_u|} \sum_{k=2}^{|p_{u,j}|} \ln \Pr\!\left(p_{u,j}^{(k)} \,\middle|\, p_{u,j}^{(k-1)}, u\right) \\ &= \arg\max_{X,Y} \sum_{u \in U} \sum_{s_a \in S} \sum_{s_b \in S} c_{u,s_a,s_b} \ln \Pr(s_b \mid s_a, u) \\ &= \arg\max_{X,Y} \sum_{u \in U} \sum_{s_a \in S} \sum_{s_b \in S} c_{u,s_a,s_b} \Big[ -\|x(s_a) - x(s_b)\|_2^2 - \|y(u) - x(s_b)\|_2^2 - \ln \sum_{s \in S} e^{-\|x(s_a) - x(s)\|_2^2 - \|y(u) - x(s)\|_2^2} \Big] \\ &\overset{\text{def}}{=} \arg\max_{X,Y} L_1(D \mid X, Y) \end{aligned}$$

!!! One iteration costs O(|U| * |S|^2) !!!

Page 10

Training

• That is too expensive, so decompose the objective into the song/song term and the song/user term

• This brings one iteration down to O(|U| * |S|)

• Then add regularization and train by gradient descent
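A toy sketch of this training loop, on hypothetical random counts: it maximizes the regularized decoupled log-likelihood L2 of Equation (6) by gradient ascent. For brevity the gradient here is taken numerically (finite differences); the paper's approach uses the analytic partial derivatives, which is what makes the O(|U| * |S|) per-iteration cost achievable.

```python
import numpy as np

rng = np.random.default_rng(3)
n_songs, n_users, d, lam = 3, 2, 2, 0.1
# toy transition counts c[u, sa, sb] (made-up data)
C = rng.integers(0, 4, size=(n_users, n_songs, n_songs)).astype(float)

def objective(theta):
    """Decoupled log-likelihood L2 (Eq. 6) minus an L2 penalty lam*||theta||^2."""
    X = theta[: n_songs * d].reshape(n_songs, d)
    Y = theta[n_songs * d :].reshape(n_users, d)
    dss = ((X[:, None] - X[None, :]) ** 2).sum(-1)   # song-song sq. distances
    dus = ((Y[:, None] - X[None, :]) ** 2).sum(-1)   # user-song sq. distances
    log_p_song = -dss - np.log(np.exp(-dss).sum(1, keepdims=True))
    log_p_user = -dus - np.log(np.exp(-dus).sum(1, keepdims=True))
    ll = (C * (log_p_song[None, :, :] + log_p_user[:, None, :])).sum()
    return ll - lam * (theta ** 2).sum()

def num_grad(f, x, eps=1e-5):
    # finite-difference gradient, for illustration only
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

theta = rng.normal(size=n_songs * d + n_users * d)
before = objective(theta)
for _ in range(50):
    theta += 0.01 * num_grad(objective, theta)   # gradient ascent step
after = objective(theta)
print(before, after)                             # the objective improves
```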


Page 11

Results

• Evaluation metrics: Prec@K, Recall@K, F1@K, MAP@K

• Baselines: bigram, LME, LME + UE

• LME is the method that metric-embeds songs only

• Noteworthy point: songs that appear fewer times in the training data are predicted with higher accuracy
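For reference, the ranking metrics can be sketched for a single user as below (toy song IDs; MAP@K is just this AP@K averaged over users):

```python
def metrics_at_k(ranked, relevant, k):
    """Prec@K, Recall@K, F1@K and AP@K for one user's recommendation list."""
    topk = ranked[:k]
    n_hits = sum(1 for s in topk if s in relevant)
    prec = n_hits / k
    rec = n_hits / len(relevant)
    f1 = 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)
    # average precision: precision at each rank where a relevant song appears
    ap, seen = 0.0, 0
    for i, s in enumerate(topk, start=1):
        if s in relevant:
            seen += 1
            ap += seen / i
    ap /= min(len(relevant), k)
    return prec, rec, f1, ap

ranked = ["s3", "s1", "s7", "s2", "s9"]   # model's ranked list (toy)
relevant = {"s1", "s2"}                   # songs the user actually sang next
prec5, rec5, f15, ap5 = metrics_at_k(ranked, relevant, k=5)
print(prec5, rec5, f15, ap5)
```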

Page 12

Embedding examples

Specifically, we first separate transitions in the test set into 5 different splits according to their occurrence count in the training data, with each split having roughly the same amount of transitions. For instance, the transitions in the test set that didn't occur or occurred only once in the training set and transitions in the test set that occurred 2 to 5 times are evaluated separately. We calculate the Recall value on Top-10 recommendations of three typical algorithms (Bigram, LME and PME) for all 5 splits in R50, and the final results are shown in Figure 3.

Figure 3: Comparisons of the Recall value on Top-10 recommendations of Bigram, LME and PME with different splits (values read off the figure):

Occurrence in training data | Bigram | LME   | PME
[0,1]                       | 1.0%   | 2.1%  | 8.3%
[2,5]                       | 6.1%   | 6.7%  | 13.6%
[6,16]                      | 18.0%  | 17.5% | 26.0%
[17,75]                     | 47.3%  | 45.5% | 50.2%
[76,4268]                   | 97.4%  | 96.2% | 94.0%

From Figure 3 we can see that for the transitions that don't occur many times (fewer than 17) in the training set, PME performs much better than Bigram and LME. However, the advantage of PME becomes smaller as the occurrence count increases. Specifically, PME achieves an average lead of 7.6% over the first three ranges, which take up about 60% of the test data, and Bigram is slightly better than PME only on the last range. Since most of the transitions have a comparably low occurrence rate, PME could outperform LME and Bigram in real applications, i.e., PME is better at predicting unseen and sparse data.
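The quoted "average lead of 7.6%" can be checked directly from the Recall@10 values read off Figure 3 (it is PME versus Bigram over the three sparsest ranges; versus LME the lead works out to about 7.2%):

```python
# Recall@10 (%) read off Figure 3, per occurrence range in the training data
ranges = ["[0,1]", "[2,5]", "[6,16]", "[17,75]", "[76,4268]"]
bigram = [1.0, 6.1, 18.0, 47.3, 97.4]
lme    = [2.1, 6.7, 17.5, 45.5, 96.2]
pme    = [8.3, 13.6, 26.0, 50.2, 94.0]

# PME's average lead over the first three (sparse) ranges
lead_vs_bigram = sum(p - b for p, b in zip(pme[:3], bigram[:3])) / 3
lead_vs_lme    = sum(p - l for p, l in zip(pme[:3], lme[:3])) / 3
print(lead_vs_bigram, lead_vs_lme)
```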

Figure 4: Visualization of PME in R^2 (axis ticks and legend omitted; the plot shows all songs as dots, with the songs of User 1, User 2 and User 3 highlighted).

Case Study. Figure 4 is a visualization of the trained PME model in R^2 where all songs are represented by blue dots and 3 randomly picked users are represented by circles with different colors. We find the songs sung by these 3 users in the training set, and then highlight them by semitransparent circles with their sizes proportional to the singing frequency of the corresponding user.

It shows that PME can successfully extract user preferences, since the embedding position of each user is near his/her previous actions. In the recommendation stage, the songs that are close to the given user, or in other words close to the songs which the given user has sung, are more likely to be recommended to this user according to Equation (5).

4. CONCLUSIONS

In this paper, we presented Personalized Markov Embedding, a next-song recommendation strategy for online karaoke users. We first proposed to embed the songs and users into a Euclidean space in which distances between songs and users reflect the strength of their relationships. We also used a simulation to ensure that the embedding process can converge in time. In this way, the short-term and long-term preferences of each user can be captured without requiring any content information. Then, the next-song recommendations are conducted according to the embedding. Finally, the performance of the proposed PME is evaluated on a real-world data set, and the experimental study delivers encouraging results.

5. ACKNOWLEDGEMENTS

The work was supported by grants from the Natural Science Foundation of China (Grant No. 61073110), the Key Program of the National Natural Science Foundation of China (Grant No. 60933013), the Research Fund for the Doctoral Program of Higher Education of China (20113402110024), and the National Major Special Science and Technology Project (Grant No. 2011ZX04016-071).

6. REFERENCES

[1] P. F. Brown, P. V. Desouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479, 1992.

[2] R. Cai, C. Zhang, C. Wang, L. Zhang, and W.-Y. Ma. MusicSense: contextual music recommendation using emotional allocation modeling. In Proc. of MM'07, pages 553–556. ACM, 2007.

[3] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li. Context-aware query suggestion by mining click-through and session data. In Proc. of KDD'08, pages 875–883. ACM, 2008.

[4] H.-C. Chen and A. L. Chen. A music recommendation system based on music data grouping and user interests. In Proc. of CIKM'01, volume 5, pages 231–238, 2001.

[5] S. Chen, J. L. Moore, D. Turnbull, and T. Joachims. Playlist prediction via metric embedding. In Proc. of KDD'12, pages 714–722. ACM, 2012.

[6] S. F. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13(4):359–393, 1999.

[7] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM TOIS, 22(1):5–53, 2004.

[8] K. Hoashi, K. Matsumoto, and N. Inoue. Personalization of user profiles for content-based music retrieval based on relevance feedback. In Proc. of MM'03, pages 110–119. ACM, 2003.

[9] N. Koenigstein, G. Dror, and Y. Koren. Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy. In Proc. of RecSys'11, pages 165–172. ACM, 2011.

[10] Q. Liu, E. Chen, H. Xiong, C. H. Ding, and J. Chen. Enhancing collaborative filtering by user interest expansion via personalized ranking. IEEE TSMC-B, 42(1):218–233, 2012.

[11] K. Oku, S. Nakajima, J. Miyazaki, S. Uemura, and H. Kato. A recommendation method considering users' time series contexts. In Proc. of ICUIMC'09, pages 465–470. ACM, 2009.

[12] E. Pampalk. Computational models of music similarity and their application in music information retrieval. 2006.

[13] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In Proc. of WWW'10, pages 811–820. ACM, 2010.

[14] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proc. of CSCW'94, pages 175–186. ACM, 1994.

[15] M. Tiemann and S. Pauws. Towards ensemble learning for hybrid music recommendation. In Proc. of RecSys'07, pages 177–178. ACM, 2007.

Page 13

Impressions

• Metric Embedding looks interesting

• Is mapping into R^20 or R^50 okay with respect to the curse of dimensionality?

• Isn't the data a bit dense? 943 items for 13,000 users

• Computing the gradients looks tedious