Connectives-Based Semantic Information Retrieval Model
Contents
Motivation
– Limitations of keyword-based information retrieval
Related Work
– Overview of Semantic Search
– Overview of Recommendation
A Semantic Information Retrieval Model
– Unified Graph Model for Semantic Information Retrieval
– Modeling: Modeling of Connectives, Modeling of Relationships
– Probabilistic Approach to Ranking
Experiments
Conclusion
IDS Lab. - 2
Motivation
The number and variety of items available on the Web have grown explosively
Two types of technologies are widely used to overcome information overload problems
– Search (pull service)
– Recommendation (push service)
[Figure: an Information Retrieval System takes a user query or a user profile and turns massive data into useful information]
Motivation
Most information retrieval (IR) systems are based on keywords because of their simplicity and efficiency [Xu et al., 2008]
Documents and users’ needs (e.g., queries, profiles) are represented with keywords
– Keywords are referred to as connectives that connect documents to users’ needs
Exact Matching of Keywords
Since keyword-based IR systems rely on the exact matching of keywords between documents and users’ needs, they cannot return documents that are semantically relevant but do not contain the given keywords
Query or Profile = “OLAP”
pid   Paper text
p1    “Index selection for OLAP”, ICDE, H Gupta et al., 1997
p2    “Range Queries in OLAP Data Cubes”, SIGMOD, CT Ho et al., 1997
p3    “Implementing Data Cubes Efficiently”, SIGMOD, V Harinarayan et al., 1996
p4    “Data Cube: A Relational Aggregation Operator…”, MS technical report, J Gray et al., 1995
Only p1 and p2 contain the keyword “OLAP”; p3 and p4 deal with data cubes but are not matched
[Balmin et al., 2004]
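The exact-matching behaviour described above can be sketched in a few lines of Python. This is only an illustration over the four titles from the table; the function name `keyword_search` and the tokenization are assumptions, not part of the original slides.

```python
# Toy corpus: the four paper titles listed on the slide.
papers = {
    "p1": "Index selection for OLAP",
    "p2": "Range Queries in OLAP Data Cubes",
    "p3": "Implementing Data Cubes Efficiently",
    "p4": "Data Cube: A Relational Aggregation Operator…",
}


def keyword_search(query, docs):
    """Return ids of documents that contain every query keyword (exact match)."""
    terms = query.lower().split()
    hits = []
    for pid, text in docs.items():
        tokens = text.lower().replace(":", " ").split()
        if all(term in tokens for term in terms):
            hits.append(pid)
    return hits


results = keyword_search("OLAP", papers)
print(results)  # ['p1', 'p2']
```

The data-cube papers p3 and p4 never surface, because nothing links the token "olap" to "cube".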
Limitations of Keyword-based IR [problem]
Semantic Ambiguity of Keywords [Dragut et al., 2006]
Homonym (“apple” as fruit vs. “apple” as company)
Synonym (“movies” vs. “films”)
Example of Homonym
[Figure: the information seeker’s query term “apple” (concept: “fruit”) is matched against the index term “apple” of Web documents (items) whose concept is “computer”]
Limitations of Keyword-based IR [solution]
Semantic Ambiguity of Keywords
Query Expansion Approaches
– Co-occurrence [Wolfman et al., 1999]
– Ontology, Thesaurus [Vogel et al., 2005][Gong et al., 2005][Shen et al., 2006]
[Figure: Query Expansion using Term Co-occurrence vs. Query Expansion using Concept Keywords, separating contents related to “Apple” computer (irrelevant to the user) from contents related to “Apple” fruit (relevant to the user)]
Limitations of Keyword-based IR [problem]
Sparse Annotation
In keyword-based IR systems, users’ needs and items are represented with a “bag of keywords”
– Vector-space model
Because items are sparsely annotated, it is hard to compute the degree of relevance exactly
– Some items may not be provided to users, although they are semantically relevant to the given needs
document-term matrix
      t1 t2 t3 t4 t5 t6 t7 t8
d1     0  1  1  0  1  0  0  0
d2     0  1  0  0  0  0  1  0
d3     1  0  0  0  0  0  0  0
d4     0  0  1  0  0  1  0  0
d5     0  0  0  1  0  0  0  0
q      1  0  0  0  0  0  0  1
cos(q, d1) = 0; d1 cannot be retrieved, although its semantic relevance is high
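As a quick check of the vector-space example, the cosine between the query and each document row can be computed directly. This is a minimal sketch using the matrix values from the slide; the helper name `cosine` is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Document-term matrix and query vector from the slide (columns t1..t8).
doc_term = {
    "d1": [0, 1, 1, 0, 1, 0, 0, 0],
    "d2": [0, 1, 0, 0, 0, 0, 1, 0],
    "d3": [1, 0, 0, 0, 0, 0, 0, 0],
    "d4": [0, 0, 1, 0, 0, 1, 0, 0],
    "d5": [0, 0, 0, 1, 0, 0, 0, 0],
}
q = [1, 0, 0, 0, 0, 0, 0, 1]

scores = {doc: cosine(q, vec) for doc, vec in doc_term.items()}
print(scores["d1"])  # 0.0
```

Only d3 shares a term (t1) with the query, so every other document scores zero regardless of its meaning.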
Limitations of Keyword-based IR [solution]
Sparse Annotation
In the document-term matrix, q and d1 share no term, so cos(q, d1) = 0 and d1 cannot be retrieved, although its semantic relevance is high
document-concept matrix
      c1 c2
d1     2  1
d2     1  1
d3     1  0
d4     1  1
d5     1  0
In the concept space, cos(q, d1) = 0.95
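The slide does not state how terms map to concepts. The sketch below assumes that t1 to t4 belong to c1 and t5 to t8 belong to c2; this assumption reproduces every row of the document-concept matrix and the reported value of 0.95, but it remains a guess about the original figure.

```python
import math

# ASSUMPTION: term index -> concept index (t1..t4 -> c1, t5..t8 -> c2).
TERM_TO_CONCEPT = [0, 0, 0, 0, 1, 1, 1, 1]


def to_concepts(term_vec, n_concepts=2):
    """Project a term vector into concept space by summing term weights per concept."""
    concept_vec = [0] * n_concepts
    for k, weight in enumerate(term_vec):
        concept_vec[TERM_TO_CONCEPT[k]] += weight
    return concept_vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


d1 = [0, 1, 1, 0, 1, 0, 0, 0]
q = [1, 0, 0, 0, 0, 0, 0, 1]

d1_c = to_concepts(d1)   # [2, 1], the d1 row of the document-concept matrix
q_c = to_concepts(q)     # [1, 1]
score = cosine(q_c, d1_c)
print(round(score, 2))   # 0.95
```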
Semantic Information Retrieval
Semantic Ambiguity of Keywords
– Although documents contain the keywords derived from users’ needs (queries, profiles), they may be irrelevant to those needs
Sparse Annotation
– Because documents are sparsely annotated, it is hard to compute the exact relevance between documents and users’ needs
Semantic Information Retrieval
– Uses conceptual matching (semantic relevance) instead of keyword matching (literal relevance) between documents and users’ needs
– Concepts (not keywords) are utilized as connectives
Overview of Semantic Search
Logic-based Approaches
Expressing users’ needs (i.e., queries) with specific ontology languages (e.g., RDQL, OWL-QL)
Logically inferred search results are provided to users
– OWL-QL [Fikes et al., 2004]
– ONTOWEB [Kim, 2005]
[Figure: example of OWL-QL [Fikes et al., 2004]]
Overview of Semantic Search
Link-based Approaches (Graph Traverse Approaches)
Searching semantically relevant documents through the hyperlinks between Web documents
– TAP [Guha et al., 2003]
– Hybrid Spread Activation [Roch et al., 2004]
– ObjectRank [Balmin et al., 2004]
[Figure: ObjectRank example starting from the initial results of “OLAP”, with linked papers P1 “Index Selection for OLAP”, P2 “Range Queries in OLAP Data Cubes”, P3 “Implementing Data Cubes Efficiently”, P4 “Data Cube: A Relational…”, and P5 “Modeling Multidimensional Databases”]
Overview of Semantic Search
Concept-based Approaches
Representing documents and users’ needs with concepts derived from domain knowledge
– Some studies regard controlled vocabulary as concepts
[Figure: the keyword “music” and the concepts (connectives) Sports Team: Seattle Sonics, Company: Starbucks, City: Seattle, Person: Howard Schultz, and Music: When I Fall in Love]
Overview of Semantic Search
Concept-based Approaches
Result Processing
– Re-ranking documents according to each user’s conceptual profiles
– Keyword matching based approach
   OBIWAN [Gauch et al., 2004]
   DySe [Rinaldi et al., 2009]
   OntoSearch [Jiang et al., 2009]
Query Expansion
– Converting queries and documents in a keyword space to those in a concept space
– Conceptual matching based approach
   Adaptive Vector Approach [Vallet et al., 2005][Castells et al., 2007]
   Folksonomy Approach [Wu et al., 2006][Xu et al., 2008]
Overview of Semantic Search
Adaptive Vector Approach [Castells et al., 2007]
[Figure: Adaptive Vector Approach, with the labels “concept vectors of query & document”, “SPARQL query results”, and “Concepts of Knowledge Base”]
Overview of Semantic Search
Folksonomy Approach [Xu et al., 2008]
Regarding tags as concepts
– A user annotates a document with tags which represent his/her interests
– A document has tags which represent the semantics of the document
[Figure: a user annotates documents with tags; a Web page has tags]
Overview of Semantic Search
Comparison of Three Approaches
                    | Logic-based | Link-based | Concept-based
Goal                | Data Retrieval | Web Documents Retrieval | Not limited (web pages, images, music etc.)
Query               | Ontology Language (a barrier for ordinary users) | Keywords | Keywords
Ranking             | No | Yes | Yes
Semantic Ambiguity  | Low | High | Medium
Sparse Annotation   | - | High | Medium
Connectives         | - | Web Documents | Concepts (tags, categories etc.)
Overview of Semantic Search
Issues of Previous Concept-based Semantic Searches
Sparse Annotation
– It is difficult to completely annotate the semantics of documents (or users’ queries) with a few connectives
– Lexical analysis relies on the exact matching of connectives (i.e., concepts and keywords)
– Semantically relevant documents may therefore not be provided
Example
– Concept vector of user u: <1, 1, 0>
– Concept vector of document d: <0, 0, 1>
– sim(u, d) = 0
[Figure: user u and the documents Romeo & Juliet (1968) and Romeo & Juliet (1996), connected through the concepts (connectives) Hollywood, Movie, and Romance]
Overview of Semantic Search
Issues of Previous Concept-based Semantic Searches
Authority
– It is hard to determine the ranks if some documents have the same degree of relevance
– Some search engines, such as Google, use the authority of documents
– Documents that are frequently referenced by others have high authority
Example
[Figure: user u, the concepts (connectives) Hollywood, Movie, and Romance, and the documents Romeo & Juliet (1968) and Romeo & Juliet (1996); both documents have semantic relevance 0.5, but their authority differs (0.2 vs. 0.8)]
Overview of Recommendation
Content-based Filtering
Recommending documents similar to those a given user has preferred in the past
– Similar to keyword search
Collaborative Filtering
Identifying like-minded users whose preferences are similar to those of the given user
Recommending documents that the like-minded users have preferred
Overview of Recommendation
Example of Collaborative Filtering
Identifying like-minded users whose preferences are similar to those of the given user
– The preference of user1 is similar to that of userm
Recommending documents that the like-minded users have preferred
users × documents (“-”: no rating, “?”: candidate to recommend)
        d1 d2 d3 d4 d5 d6
user1    3  -  4  -  6  6
user2    -  2  -  5  -
userm    4  -  3  -  5  ?
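The table above can be turned into a small user-based collaborative filtering sketch. The slides do not specify a similarity measure or a prediction rule; cosine similarity over co-rated documents and a similarity-weighted average are assumptions chosen for illustration. The user2 row has only five entries on the slide and is read here as covering d1 to d5.

```python
import math

# Ratings from the slide; missing entries ("-") are simply omitted.
ratings = {
    "user1": {"d1": 3, "d3": 4, "d5": 6, "d6": 6},
    "user2": {"d2": 2, "d4": 5},  # ASSUMPTION: the row is read as d1..d5
    "userm": {"d1": 4, "d3": 3, "d5": 5},
}


def similarity(a, b):
    """Cosine similarity over the documents both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[d] * b[d] for d in common)
    norm_a = math.sqrt(sum(a[d] ** 2 for d in common))
    norm_b = math.sqrt(sum(b[d] ** 2 for d in common))
    return dot / (norm_a * norm_b)


def predict(user, doc):
    """Similarity-weighted average of the ratings other users gave to `doc`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or doc not in other_ratings:
            continue
        s = similarity(ratings[user], other_ratings)
        num += s * other_ratings[doc]
        den += s
    return num / den if den else None


sim_1m = similarity(ratings["user1"], ratings["userm"])
sim_2m = similarity(ratings["user2"], ratings["userm"])
pred = predict("userm", "d6")
print(round(sim_1m, 3), sim_2m, pred)
```

user2 shares no rated document with userm, so its similarity is zero: the sparsity problem the following slides discuss.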
Overview of Recommendation
Limitations of Previous Recommendation Systems
Content-based Filtering
– Ambiguity of keywords
– Sparse Annotation
Collaborative Filtering
– Sparse Annotation
Approaches to sparse annotation and their drawbacks
– Dimension Reduction [Billsus et al., 1998][Sarwar et al., 2000]: removing insignificant users or documents causes loss of information
– Hybrid Approaches of Content-based and Collaborative Filtering [Balabanovic et al., 1997][Pazzani et al., 1999]: keyword-based connectives
– Clustering of Users [Chee et al., 2002]: bad quality of recommendations
– Tag [Zanardi et al., 2008][Kim et al., 2010]: explicit feedback (users’ annoyance or hesitation)
Unified Graph Model for Semantic Information Retrieval
A Unified Graph for Semantic Information Retrieval
Objects are interrelated to each other in the real world
We assume that 4 types of objects are interrelated to each other
– Users, documents, terms, concepts (complete 4-partite graph)
– The graph can be expanded to an n-partite graph depending on applications (or domains)
[Figure: unified graph over users, documents, terms, and concepts. Edge labels: accessing (users, documents), submitting (users, terms), preferring (users, concepts), containing (documents, terms), relating (documents, concepts), containing (terms, concepts). Inset: Document-Concept Relationship between d1, d2, d3 and c1, c2]
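One way to hold such a 4-partite graph in memory is an adjacency map keyed by typed nodes. The sketch below is an illustration only: the class name `UnifiedGraph`, the `(type, name)` node encoding, and the pairing of edge labels with node types (read from the figure and the later analysis slides) are assumptions.

```python
from collections import defaultdict

# Edge labels between node types, as read from the figure.
EDGE_LABELS = {
    frozenset({"user", "document"}): "accessing",
    frozenset({"user", "term"}): "submitting",
    frozenset({"user", "concept"}): "preferring",
    frozenset({"document", "term"}): "containing",
    frozenset({"document", "concept"}): "relating",
    frozenset({"term", "concept"}): "containing",
}


class UnifiedGraph:
    """Undirected 4-partite graph; nodes are (type, name) tuples."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, a, b):
        # Edges are only allowed between two different node types.
        if frozenset({a[0], b[0]}) not in EDGE_LABELS:
            raise ValueError(f"no relationship between {a[0]} and {b[0]}")
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node, node_type):
        """Neighbors of `node` restricted to one node type, sorted by name."""
        return sorted(n for n in self.adj.get(node, ()) if n[0] == node_type)


g = UnifiedGraph()
g.add_edge(("user", "u1"), ("term", "t1"))       # u1 submits t1
g.add_edge(("document", "d3"), ("term", "t1"))   # d3 contains t1
g.add_edge(("document", "d1"), ("term", "t2"))   # d1 contains another term

# Keyword search as a two-hop walk: user -> terms -> documents.
hits = sorted({d for t in g.neighbors(("user", "u1"), "term")
               for d in g.neighbors(t, "document")})
print(hits)  # [('document', 'd3')]
```

Swapping the intermediate node type (terms, documents, concepts) changes which retrieval method the walk corresponds to.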
Unified Graph Model for Semantic Information Retrieval
Derivatives in a Unified Graph
Keyword Search
– Documents containing keywords submitted by a user are regarded as search results
[Figure: unified graph with the keyword-search path highlighted: u1 (submitting) t1 (containing) d3; terms act as the connectives]
Unified Graph Model for Semantic Information Retrieval
Derivatives in a Unified Graph
Conventional Collaborative Filtering
– Identifying like-minded users whose preferences are similar to those of an active user
– The preferences can be derived from the click-through log (or rating log)
[Figure: unified graph with the collaborative-filtering path highlighted: users u1, u2 and documents d1, d3 linked by “accessing” edges, with the connectives marked]
Unified Graph Model for Semantic Information Retrieval
Derivatives in a Unified Graph
Concept-based Semantic Search
– Representing a user’s query and documents with their corresponding concepts
– Documents containing concepts derived from a user’s query are regarded as search results
[Figure: unified graph with the semantic-search path highlighted over u1, t1, c1, and d3, using “submitting”, “containing”, and “relating” edges, with the connectives marked]
Unified Graph Model for Semantic Information Retrieval
Derivatives in a Unified Graph
Semantic Collaborative Filtering
– Identifying like-minded users by utilizing the concepts derived from users’ preferences
– Although users have accessed different documents, it is possible to compute the semantic relevance between them
[Figure: unified graph with the semantic collaborative-filtering path highlighted over u1, u2, c1, and d3, using two “preferring” edges and one “accessing” edge, with the connectives marked]
Unified Graph Model for Semantic Information Retrieval
Semantic Information Retrieval in a Unified Graph
Ambiguity of Keywords
– Exploiting concept connectives
Sparse Annotation
– Exploiting lexical analysis and non-lexical analysis through heterogeneous connectives
Authority
– Exploiting collaborative filtering to derive implicit authority
– Documents that like-minded users preferred have high authority
[Figure: unified graph with the four derivatives marked: keyword search, collaborative filtering, concept-based semantic search, and semantic collaborative filtering]
Unified Graph Model for Semantic Information Retrieval
Analysis of Unified Graph
Links btw. Users & Documents (accessing)
– Many users access a few documents
– A few users access many documents
Links btw. Documents & Terms (containing)
– Many documents contain a few terms
– A few documents contain many terms
Sparse Relationship (Sparsity: 0.999 and 0.998)
Unified Graph Model for Semantic Information Retrieval
Analysis of Unified Graph
Links btw. Concepts (ODP) & Documents (relating)
– Many concepts are related to many documents
Links btw. Concepts (Wikipedia) & Documents (relating)
– Many concepts are related to many documents
Dense Relationship (Sparsity: 0.614 and 0.575)
Unified Graph Model for Semantic Information Retrieval
Analysis of Unified Graph
Links btw. Terms & Concepts (ODP) (containing)
– Many terms are contained in a few concepts
– A few terms are contained in many concepts
Links btw. Terms & Concepts (Wikipedia) (containing)
– Many terms are contained in a few concepts
– A few terms are contained in many concepts
Sparse Relationship (Sparsity: 0.999 and 0.998)
Unified Graph Model for Semantic Information Retrieval
Analysis of Unified Graph
Links btw. Users & Concepts (ODP) (preferring)
– Many users prefer many concepts
Links btw. Users & Concepts (Wikipedia) (preferring)
– Many users prefer many concepts
Dense Relationship (Sparsity: 0.418 and 0.371)
Unified Graph Model for Semantic Information Retrieval
Types of Relationships
[Figure: unified graph over users, documents, terms, and concepts with each relationship marked as a Dense Relationship or a Sparse Relationship]
Unified Graph Model for Semantic Information Retrieval
Research Questions
What kind of relationship exists between the performance of semantic IR and the density between objects (i.e., nodes in a unified graph)?
What combination of relationships (or connectives) can contribute to the improvement of performance in semantic IR?
– Whether both dense relationships and sparse relationships contribute to the improvement of performance or not
[Figure: plot of Performance (e.g., precision) against Density]
Unified Graph Model for Semantic Information Retrieval
[Figure: Combination of Relationships. A grid of unified-graph variants grouped by the dense relationships they use: No Dense Relationship; 1 Dense Relationship (user-concept); 1 Dense Relationship (document-concept); 2 Dense Relationships (user-concept & document-concept). Recommendation row: Conventional Collaborative Filtering (CCF), Semantic Collaborative Filtering (SCF), CCF + SCF. Search row: Keyword Search (KS), Semantic Search (SS), KS + SS]
Unified Graph Model for Semantic Information Retrieval
[Figure: Comparison of Research Coverage. The same grid of Recommendation methods (CCF, SCF, CCF + SCF) and Search methods (KS, SS, KS + SS), marking the Coverage of Previous Approaches and the Coverage of Our Approach]
Modeling for Connectives - Document
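Document
Each document is represented by a |V|-dimensional term vector
$$d_i = \langle w_{i,1}, \ldots, w_{i,k}, \ldots, w_{i,|V|} \rangle$$
where $w_{i,k}$ is the weight (tf-idf) of the kth term in $d_i$ and V is the set of index terms
To remove the effect of document length, the term vector is normalized
$$d_i = \left\langle \frac{w_{i,1}}{\sqrt{\sum_j (w_{i,j})^2}}, \ldots, \frac{w_{i,k}}{\sqrt{\sum_j (w_{i,j})^2}}, \ldots, \frac{w_{i,|V|}}{\sqrt{\sum_j (w_{i,j})^2}} \right\rangle$$
[Figure: document vector d1 in the space spanned by terms t1, t2, t3]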
Modeling for Connectives – User
User
Explicit Approach
– A user is represented by keywords that the user explicitly provides to IR systems
Implicit Approach
– By analyzing a user’s access log, it is possible to represent the user with keywords derived from the log
A user is defined as the average of term vectors
$$u_p = \frac{1}{n} \sum_{d_i^{u_p} \in D^{u_p}} d_i^{u_p}$$
where $D^{u_p} = \{d_1^{u_p}, \ldots, d_i^{u_p}, \ldots, d_n^{u_p}\}$ and $d_i^{u_p} = \langle w_{i,1}, \ldots, w_{i,k}, \ldots, w_{i,|V|} \rangle$
– The derived term vector is normalized to remove the length effect
[Figure: user u accesses documents d1, d2, d3, which contain terms t1 to t6]
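The implicit user model (average of accessed term vectors, then length normalization) can be sketched as follows. The three document vectors are made-up values for illustration, and the helper names `average` and `l2_normalize` are assumptions.

```python
import math


def average(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


# HYPOTHETICAL term vectors of the documents a user has accessed (terms t1..t4).
accessed = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [2.0, 2.0, 2.0, 0.0],
]

mean_vec = average(accessed)        # [1.0, 1.0, 2.0, 0.0]
user_vec = l2_normalize(mean_vec)   # unit-length user profile
print(mean_vec)
```

The same two steps apply to the concept model when the objects belonging to a concept are documents.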
Modeling for Connectives – Concept
Concept
Definition from the American Heritage Dictionary
– A general idea derived or inferred from specific instances or occurrences
A concept is defined as the average of term vectors derived from objects (or attributes) that belong to the concept
– If the objects are documents, the concept modeling is similar to the user modeling
– The derived term vector is also normalized to remove the length effect
$$c_x = \frac{1}{m} \sum_{o_i^{c_x} \in O^{c_x}} o_i^{c_x}$$
where $O^{c_x} = \{o_1^{c_x}, \ldots, o_i^{c_x}, \ldots, o_m^{c_x}\}$ and $o_i^{c_x} = \langle w_{i,1}, \ldots, w_{i,k}, \ldots, w_{i,|V|} \rangle$
[Figure: objects (or attributes) that belong to a concept, which is described by terms t1 to t6]
Modeling for Relationships
Relationships
Explicit Relationship
– Relationships that explicitly exist between two types of objects
[Figure: explicit relationships in the unified graph: Document-Term, User-Term (Explicit Approach), Concept-Term, User-Document (User Access Log)]
Example in Document-Term Relationships
$$\Pr(t_k \mid d_i) = \begin{cases} \dfrac{w(d_i, t_k)}{\sum_{t_j \in d_i} w(d_i, t_j)} & \text{if } t_k \in d_i \\ 0 & \text{otherwise} \end{cases}$$
where $w(d_i, t_k)$ denotes the weight of the kth term in $d_i$
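A minimal sketch of the document-term probability: each term weight is divided by the sum of the weights in the document, and terms that do not occur get probability 0. The weights below are hypothetical, as is the function name `term_given_doc`.

```python
def term_given_doc(weights):
    """Map term weights w(d, t) of one document to Pr(t | d); absent terms are omitted."""
    total = sum(weights.values())
    if total == 0:
        return {}
    return {term: w / total for term, w in weights.items() if w > 0}


# HYPOTHETICAL tf-idf weights of the terms contained in one document.
d1_weights = {"t2": 2.0, "t3": 1.0, "t5": 1.0}

p_t_given_d1 = term_given_doc(d1_weights)
print(p_t_given_d1["t2"])            # 0.5
print(p_t_given_d1.get("t1", 0.0))   # 0.0, since t1 does not occur in d1
```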
Modeling for Relationships
Relationships
Implicit Relationship
– Relationships that are inferred (or derived) from explicit relationships
[Figure: implicit relationships in the unified graph: User Modeling (Implicit Approach), Document-Concept Relationship, User-Concept Relationship]
Modeling for Relationships
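Relationships
Implicit Relationship
– Relevance between two objects $(o_i, o_j)$ is estimated with a conditional probability $\Pr(o_i \mid o_j)$
– Assuming that the prior probabilities $\Pr(o_i)$, $\Pr(o_j)$, $\Pr(e_r)$ are constant for their random variables
$$\begin{aligned}
\Pr(o_i \mid o_j) &= \frac{\Pr(o_j \mid o_i)\,\Pr(o_i)}{\Pr(o_j)} && \text{(Bayes' theorem)} \\
&= \frac{\Pr(o_i, o_j)}{\Pr(o_j)} && \text{(the definition of conditional probability)} \\
&= \frac{1}{\Pr(o_j)} \sum_{e_r} \Pr(o_i, o_j \mid e_r)\,\Pr(e_r) && \text{(the law of total probability)} \\
&= \frac{1}{\Pr(o_j)} \sum_{e_r} \Pr(o_i \mid e_r)\,\Pr(o_j \mid e_r)\,\Pr(e_r) && \text{(assuming } o_i \text{ and } o_j \text{ are conditionally independent given } e_r\text{)} \\
&= \frac{1}{\Pr(o_j)} \sum_{e_r} \frac{\Pr(e_r \mid o_i)\,\Pr(o_i)}{\Pr(e_r)} \cdot \frac{\Pr(e_r \mid o_j)\,\Pr(o_j)}{\Pr(e_r)} \cdot \Pr(e_r) && \text{(Bayes' theorem)}
\end{aligned}$$
With constant priors, this gives
$$\Pr(o_i \mid o_j) \propto \Pr(o_j \mid o_i) \propto \sum_{e_r} \Pr(e_r \mid o_i)\,\Pr(e_r \mid o_j)$$
– The left-hand side is the relevance between $o_i$ and $o_j$; the sum runs over the connectives $e_r$ connecting $o_i$ with $o_j$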
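The final proportionality can be evaluated directly once each object is described by a distribution over connectives. This is a sketch with hypothetical numbers; the function name `relevance` and the dictionary representation are assumptions.

```python
def relevance(p_e_given_oi, p_e_given_oj):
    """Unnormalized relevance: sum over connectives e of Pr(e | o_i) * Pr(e | o_j)."""
    return sum(p * p_e_given_oj.get(e, 0.0) for e, p in p_e_given_oi.items())


# HYPOTHETICAL distributions over connectives (here: terms) for two objects.
p_e_given_oi = {"t1": 0.5, "t2": 0.5}
p_e_given_oj = {"t2": 0.25, "t3": 0.75}

score = relevance(p_e_given_oi, p_e_given_oj)
print(score)  # 0.125
```

Only the shared connective t2 contributes; objects with no connective in common get a score of zero. The score is symmetric in its two arguments, matching Pr(o_i | o_j) ∝ Pr(o_j | o_i).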
Modeling for Relationships
Relationships
Implicit Relationship
– Between Users and Terms (User Modeling, Implicit Approach)
$$\Pr(u_p \mid t_k) \propto \Pr(t_k \mid u_p) \propto \sum_{d_i} \Pr(d_i \mid u_p)\,\Pr(d_i \mid t_k)$$
– Between Documents and Concepts (Document-Concept Relationship)
$$\Pr(d_i \mid c_x) \propto \Pr(c_x \mid d_i) \propto \sum_{t_k} \Pr(t_k \mid d_i)\,\Pr(t_k \mid c_x)$$
– Between Users and Concepts (User-Concept Relationship)
$$\Pr(u_p \mid c_x) \propto \Pr(c_x \mid u_p) \propto \sum_{d_i} \Pr(d_i \mid u_p)\,\Pr(d_i \mid c_x) \propto \sum_{d_i} \Pr(d_i \mid u_p) \sum_{t_k} \Pr(t_k \mid d_i)\,\Pr(t_k \mid c_x)$$
Probabilistic Approach to Ranking
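Search
Keyword Search
$$\mathrm{Pr}_k(d_i \mid u_q) \propto \sum_{t_k} \Pr(t_k \mid d_i)\,\Pr(t_k \mid u_q)$$
Semantic Search
$$\mathrm{Pr}_s(d_i \mid u_q) \propto \sum_{c_x} \Pr(c_x \mid d_i)\,\Pr(c_x \mid u_q) \propto \sum_{c_x} \Big[ \sum_{t_k} \Pr(t_k \mid c_x)\,\Pr(t_k \mid d_i) \Big] \Big[ \sum_{t_k} \Pr(t_k \mid c_x)\,\Pr(t_k \mid u_q) \Big]$$
– The document-concept relationship, $\sum_{t_k} \Pr(t_k \mid c_x)\,\Pr(t_k \mid d_i)$, is computed offline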
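The two ranking formulas can be compared on the synonym case from the motivation slides (“movies” vs. “films”). Everything below is a toy sketch: the concept names “cinema” and “fruit”, all probabilities, and the helper names are assumptions, and the scores are left unnormalized since the formulas are stated only up to proportionality.

```python
def connect(p_a, p_b):
    """Sum over shared connectives e of Pr(e | a) * Pr(e | b)."""
    return sum(p * p_b.get(e, 0.0) for e, p in p_a.items())


# HYPOTHETICAL distributions Pr(t | d), Pr(t | u_q), and Pr(t | c).
term_given_doc = {"d1": {"films": 1.0}, "d2": {"apple": 1.0}}
term_given_query = {"movies": 1.0}
term_given_concept = {
    "cinema": {"movies": 0.5, "films": 0.5},
    "fruit": {"apple": 1.0},
}


def concept_profile(term_dist):
    """Concept scores for an object, using terms as the connectives."""
    return {c: connect(term_dist, tc) for c, tc in term_given_concept.items()}


# Document-concept relationship: can be computed offline, before any query arrives.
doc_concepts = {d: concept_profile(td) for d, td in term_given_doc.items()}
query_concepts = concept_profile(term_given_query)

keyword_scores = {d: connect(td, term_given_query) for d, td in term_given_doc.items()}
semantic_scores = {d: connect(dc, query_concepts) for d, dc in doc_concepts.items()}

print(keyword_scores)   # {'d1': 0.0, 'd2': 0.0}
print(semantic_scores)  # {'d1': 0.25, 'd2': 0.0}
```

Keyword search finds nothing because the query and d1 share no term, while the shared concept lets semantic search rank d1 first.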
Probabilistic Approach to Ranking
Recommendation (Collaborative Filtering-based Approach)
$$\Pr(d_i \mid u_p) \propto \sum_{u_{p'}} \Pr(u_{p'} \mid d_i)\,\Pr(u_{p'} \mid u_p)$$
Conventional Collaborative Filtering
$$\Pr(d_i \mid u_p) \propto \sum_{u_{p'}} \Pr(u_{p'} \mid d_i) \sum_{d_{x'}} \Pr(d_{x'} \mid u_{p'})\,\Pr(d_{x'} \mid u_p)$$
Semantic Collaborative Filtering
$$\Pr(d_i \mid u_p) \propto \sum_{u_{p'}} \Pr(u_{p'} \mid d_i) \sum_{c_r} \Pr(c_r \mid u_{p'})\,\Pr(c_r \mid u_p)$$
– The user-concept relationship is computed offline
[Figure: unified graph over users, terms, concepts, and documents with the connectives marked and annotated with $\Pr(t_k \mid u_q)$, $\Pr(t_k \mid c_x)$, and, computed off-line, $\Pr(c_x \mid d_i)$]
Experiments
Contributions
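Proposing a Unified Model for Semantic Information Retrieval
– A semantic information retrieval model that uses multi-typed connectives
– A model that covers both semantic search and recommendation; related studies are special cases of the proposed model that use specific link information
Providing a Guide to Ranking in Semantic Information Retrieval
– Examining and proposing ranking models that consider the relationships among connectives within the proposed model
– Examining the characteristics of the semantic information retrieval model by using various types of concept connectives
Resolving Limitations of Previous Approaches
Overcoming the limitations of previous studies by using the unified model
– Ambiguity of Keywords
– Sparse Annotation
– Exact Matching of Concept-based Approaches
– Novelty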
Unified Equation
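$$p(d \mid u) = \sum_{K \in \{T, C, U\}} \lambda_K \sum_{k \in K} p(d \mid k)\, p(k \mid u)$$
$$\lambda_T + \lambda_C + \lambda_U = 1, \qquad \sum_{t} p(t \mid u) = 1$$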
Graph Density
$$\text{graph\_density} = \frac{\text{number of actual links present in the graph}}{\text{number of possible links in the graph}}$$
Source: “Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering” (TOIS’04)
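For a bipartite relationship stored as a matrix, the density is the fraction of non-zero cells. The sketch below applies the definition to the document-term matrix from the sparse-annotation slide; treating sparsity as 1 − density is an assumption, since the slides report sparsity values without defining them.

```python
def graph_density(matrix):
    """Actual links (non-zero cells) divided by possible links (all cells)."""
    actual = sum(1 for row in matrix for value in row if value != 0)
    possible = len(matrix) * len(matrix[0])
    return actual / possible


# Document-term matrix from the sparse-annotation slide (rows d1..d5, columns t1..t8).
doc_term = [
    [0, 1, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]

density = graph_density(doc_term)   # 9 links out of 40 possible
sparsity = 1.0 - density            # ASSUMPTION: sparsity = 1 - density
print(density)  # 0.225
```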
Modeling of Relationships
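Lemma 1. The probability-based similarity of any two objects is proportional to their cosine-based similarity in the vector space model
$$\Pr(o_i \mid o_j) \propto sim(o_i, o_j)$$
Proof.
In $\Pr(o_i \mid o_j) \propto \sum_{e_r} \Pr(e_r \mid o_i)\,\Pr(e_r \mid o_j)$, define $\Pr(e_r \mid o_i)$ and $\Pr(e_r \mid o_j)$ as
$$\Pr(e_r \mid o_i) = \frac{w_{i,r}}{\sqrt{\sum_x (w_{i,x})^2}}, \qquad \Pr(e_r \mid o_j) = \frac{w_{j,r}}{\sqrt{\sum_x (w_{j,x})^2}}$$
Then
$$\Pr(o_i \mid o_j) \propto \sum_{e_r} \Pr(e_r \mid o_i)\,\Pr(e_r \mid o_j) = \sum_{e_r} \frac{w_{i,r}\, w_{j,r}}{\sqrt{\sum_x (w_{i,x})^2}\,\sqrt{\sum_x (w_{j,x})^2}} = sim(o_i, o_j)$$
* Note: in the vector space model, the two objects are defined as
$$o_i = \left\langle \frac{w_{i,1}}{\sqrt{\sum_x (w_{i,x})^2}}, \ldots, \frac{w_{i,r}}{\sqrt{\sum_x (w_{i,x})^2}}, \ldots, \frac{w_{i,|R|}}{\sqrt{\sum_x (w_{i,x})^2}} \right\rangle$$
$$o_j = \left\langle \frac{w_{j,1}}{\sqrt{\sum_x (w_{j,x})^2}}, \ldots, \frac{w_{j,r}}{\sqrt{\sum_x (w_{j,x})^2}}, \ldots, \frac{w_{j,|R|}}{\sqrt{\sum_x (w_{j,x})^2}} \right\rangle$$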
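The lemma can be checked numerically: with Pr(e_r | o) defined as the length-normalized weight, the sum of products equals the cosine of the raw weight vectors. The weight vectors below are hypothetical and the helper names are assumptions.

```python
import math


def unit(weights):
    """Length-normalized weights, i.e. Pr(e_r | o) as defined in the proof."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# HYPOTHETICAL weight vectors of two objects over three connectives.
w_i = [1.0, 2.0, 0.0]
w_j = [2.0, 1.0, 1.0]

prob_score = sum(p * q for p, q in zip(unit(w_i), unit(w_j)))
cos_score = cosine(w_i, w_j)
print(abs(prob_score - cos_score) < 1e-12)  # True
```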