
Andreas Lommatzsch, Jérôme Kunegis, Torsten Schmidt, Stefan Marx

Whitepaper Semantic Engine

Technologies and Solutions


Semantic Engine - Technologies and Solutions

DAI-Labor, TU-Berlin

Content

Introduction
Motivation
    Fundamentals
    Existing Recommender Algorithms
        Content-based Filtering
        Collaborative Filtering
        Social filtering
        Hybrid Recommender
    Problems of existing Approaches - Requirements
Approach
    Semantic Data stores
    Semantic Data Model
    Representing semantic Datasets as Graphs
        Normalizations and relative Weights
        Learning a Network and a Prediction Model
        Memory based Recommender
        Model based Recommender
        Universal Latent Decomposition
        Meta Recommender
        Current Research Topics
Implementation
Application scenarios
    Smart Media Assistant: Integration with an IPTV Application
        The Integration of the semantic Engine into personalized IPTV
        Data sources and semantic data
        Recommender scenario
        Evaluation and Experiences
    SERUM: Music Recommender based on encyclopedic data
        The SERUM data sources
        The SERUM recommender scenarios
Summary
Bibliography


Introduction

With the growing availability of semantic datasets, the processing of these datasets is coming into focus. Efficient processing of large semantic datasets can improve applications in many domains. In this whitepaper, we introduce an architecture that supports the aggregation of different types of semantic data and provides components for deriving recommendations and predicting relevant relationships between dataset entities. The developed architecture supports different types of data sources (e.g. databases, semantic networks) and enables the efficient processing of large semantic datasets with several different semantic relationship types. We discuss the developed architecture and describe an implemented application for the entertainment domain.

Motivation

Current information systems are usually designed to compute results for only one predefined scenario. These systems are often based on a fixed processing pipeline, supporting only one data source and one recommender algorithm. By contrast, the approach described in this whitepaper defines a universal framework that supports different data stores (e.g. databases, RDF triple stores) and several state-of-the-art recommender algorithms. The framework is open to new recommender approaches and can be customized to the specific requirements of the respective scenario. A meta-recommender component allows the context-aware aggregation of results from different recommender components.

Fundamentals

In the field of information retrieval, a recommender system is defined as a system that is able to find entities in a dataset that may be of interest to the user. In contrast to search engines, recommender systems do not base their results on a query. Instead, they rely on implicit and explicit connections between users and items, such as ratings or other past interactions. Research and development in the area of recommender systems have grown in recent years, as witnessed by the creation of a high-profile conference devoted to them.

In the general case, a recommender system applies to a dataset described by a data model containing entities (such as users and items) and relationships (such as ratings and social links). In the simplest recommender system, the data consists of one relationship type connecting one or two entity types. In more complex cases, the dataset contains multiple relationship types connecting any number of entity types. The simple case, with only one relationship type, corresponds to several well-studied recommendation subproblems, such as link prediction, collaborative filtering and citation analysis. In the case of multiple relationship types, hybrid recommenders are normally used. As we will show, however, most hybrid recommenders are specific to one data model and cannot be generalized to other data models. Therefore, each time a new data model is introduced, a new hybrid recommender has to be developed.


Existing Recommender Algorithms

We review classical recommendation settings motivating the complexity of typical recommendation datasets. We also review existing solutions for recommendation problems that apply to specific dataset types.

Content-based Filtering

The first kind of recommender we describe uses the content of items to generate recommendations, similarly to search engines. While search engines require the user to enter a specific keyword for searching, content-based recommenders usually take keywords from another source, for instance a user profile containing words describing user interests, or from items already seen or rated [La95, AWL07].

Figure 1: A content based recommender computes recommendations based on item features (e.g. words).
(Labels in the figure: Rating, Feature-Similarity, Recommendation)

Figure 1 shows a situation in which a user-item recommendation is found by comparing common features of two items. Content-based recommendation follows the idea that items having common properties (e.g. similar term frequency statistics) have the same relevance for the user. Content-based filtering has traditionally been applied to document recommendation, using the tf-idf measure as edge weights. In the entertainment domain, content-based approaches can be used for computing interesting TV recommendations based on the electronic program guide (EPG) or for computing movie recommendations based on data retrieved from the Internet Movie Database (IMDB)1. A disadvantage of content-based filtering is that these systems consider neither ratings nor the individual taste of the user. Content-based recommender systems can be improved by integrating linguistic knowledge. For computing the similarity between two textual descriptions, synonyms, homonyms and antonyms should be taken into account. Moreover, glossaries and dictionaries can be integrated for computing the similarity between texts in different languages.
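The tf-idf comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the program names and descriptions are invented, the tokenisation is a plain whitespace split, and the helper names (`tfidf_vectors`, `cosine`, `most_similar`) are hypothetical rather than part of the framework.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical EPG-style descriptions, already tokenised (assumption).
programs = {
    "crime_show_a": "detective murder city police".split(),
    "crime_show_b": "police detective case city".split(),
    "cooking_show": "chef kitchen recipe dinner".split(),
}
names = list(programs)
vectors = dict(zip(names, tfidf_vectors([programs[n] for n in names])))


def most_similar(liked):
    """Rank all other items by feature similarity to a liked item."""
    scores = {
        name: cosine(vectors[liked], vec)
        for name, vec in vectors.items() if name != liked
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `most_similar("crime_show_a")` ranks the second crime show ahead of the cooking show, because only the two crime descriptions share terms. Ratings and individual taste play no role here, which is exactly the limitation noted above.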

Collaborative Filtering

While the content-based approach is simple, only making use of one user's relations to items, collaborative filters attempt to make use of all known relations between users and items [HKB99]. Many systems track users and collect data on which items were interesting to them [SKD06, SZL05]. This information can be aggregated and used for collaborative filtering.

1 http://www.imdb.com/

Figure 2: A collaborative recommender computes recommendations based only on user-item ratings without considering item features.
(Labels in the figure: Rating, Recommendation)

Figure 2 schematically shows the idea of collaborative filtering: users who rated items similarly in the past tend to have the same taste. Items relevant to users with a similar taste are potentially interesting to the user. For example, if two users U1 and U2 watched the same TV programs in the past and U1 rated a new TV program positively, this program is probably also interesting to user U2. The advantage of collaborative filtering is that the recommendations are based only on ratings and no additional knowledge is required. This makes it possible to compute recommendations for items for which neither a description nor metadata is available. Well-known collaborative filtering systems are Amazon [LSY03], Netflix [Be06] and MovieLens [RIS94].
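The U1/U2 example can be made concrete with a small user-based collaborative filter. The following is a minimal sketch, not the method of any of the systems cited above: the rating data are invented, and cosine similarity over co-rated items with a similarity-weighted average is just one common choice among many.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1..5 scale}.
ratings = {
    "U1": {"news": 5, "thriller": 4, "quiz": 1, "documentary": 5, "soap": 1},
    "U2": {"news": 5, "thriller": 4, "quiz": 1},
    "U3": {"news": 1, "thriller": 2, "quiz": 5, "documentary": 2, "soap": 5},
}


def similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        w = similarity(user, other)
        num += w * their[item]
        den += w
    return num / den if den else None


def recommend(user):
    """Unseen items ordered by predicted rating, best first."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = {i: predict(user, i) for i in candidates}
    scored = {i: s for i, s in scored.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)
```

In this toy data U2 agrees with U1 on every co-rated program, so `recommend("U2")` places the documentary that U1 liked ahead of the soap that U1 disliked. No item description is consulted at any point.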

Social filtering

For many users, social networks play a very important role. A friendship relationship in a social network ("buddy list") often indicates that users have similar interests [BHS08]. For instance, if a user rated a TV program very positively, this TV program may also be interesting to the friends of this user. This type of recommendation is particularly useful when trust is important. In this case, a trust measure can be defined between users, denoting the level of confidence a user has in another user. Methods to compute trust include local measures and global approaches, which often generalize the PageRank measure.

Figure 3: The Figure visualizes social filtering. An item rated positively by user U is probably also interesting to friends of user U.
(Labels in the figure: Rating, Friendship-Relation, Recommendation)

Figure 3 visualizes social filtering. Users connected by a friendship relationship potentially have similar interests. Thus, items rated positively by a user can be recommended to the friends of this user. Well-known network datasets for social filtering are Advogato [St05], Slashdot Zoo [KLB09b], Epinions [GKR04], Enron [KY04], Twitter [KLP10], Facebook [VMC09], YouTube [Mi09] and dating datasets [BP07].


Hybrid Recommender

In many recommender systems, several of the previously described dataset types are known. For instance, a recommender system may have user ratings for items and at the same time content information about items. Recommenders that apply to such datasets are called hybrid recommenders [AST05, AT05]. While hybrid recommenders exist for many combinations of entity and relationship types, none of these can be applied to all semantic networks since they are not generic. Systems combining user ratings and reference networks are discussed in [Kl99, Le02, Ne01, PBM98, ZOZ07].

Problems of existing Approaches - Requirements

Current recommender systems are usually designed for one scenario. In general, these recommenders consider only one kind of recommender algorithm or combine two recommender algorithms in a static way. Thus, it is difficult to adapt these systems to new scenarios. Moreover, the integration of new data is in most cases impossible. A framework for computing recommendations and for processing large datasets should fulfill the following requirements:

• The recommender must provide high quality recommendations.

• The recommendations must match the user's preferences. The recommender should provide a component that provides an explanation for each recommendation (proving that the recommendation is correct).

• The recommender should be fast. A change in the user profile should result in an adaptation of recommendations.

• Semantic data are available from several different data sources. Thus, a framework for the processing of semantic data must support all relevant data sources such as databases, text files and RDF stores [LS98]. Moreover, it must be open to new data sources that might become relevant in the future.

• The framework should support different recommender algorithms so that, depending on the respective scenario and the available data, the best matching algorithms can be chosen.

• Memory-based as well as model-based recommenders should be supported. Memory-based recommenders have the advantage that changes in the dataset are reflected in the calculated recommendations. Model-based recommenders use a model to enable a more efficient computation of recommendations.

• The recommender must support different recommender scenarios, such as user-item, item-item and user-user recommendations.

A universal recommender should be able to handle different kinds of relationships and compute recommendations based on a weighted combination of the relevant relationship types. Figure 4 schematically visualizes a hybrid recommender.


Figure 4: A semantic recommender should take into account several different relationship types for computing recommendations.
(Labels in the figure: Rating, Feature-Similarity, Friendship-Relation, References, Recommendation)

Approach

In this section we introduce the developed system architecture and discuss the components and the implemented algorithms in detail. Our framework, the "semantic engine", is designed to work with large semantic datasets that are internally represented as semantic networks consisting of nodes and labeled, weighted edges. To handle large real-world networks, the framework supports datasets with multiple semantic relationship types and uses a data representation optimized for sparse networks. The developed architecture consists of four main components, shown in Figure 5.

Figure 5: The semantic engine components.
(Labels in the figure: Integration of data sources: databases, semantic networks, ontologies (rdf, owl); Normalization, Weighting, Latent decomposition, Clustering; Recommender models: select best matching models, aggregate the results; Recommendations, Aggregation; GUI, result visualisation, explanations; Visualisation, Presentation)

The leftmost component connects the recommender with different types of data sources. Data can be imported from ontologies or triple stores (e.g. Virtuoso). Moreover, different methods are supported for importing structured data (e.g. from relational databases).

The imported data are preprocessed to extract the knowledge needed for computing recommendations and to speed up the calculation. The framework provides methods for data normalization, for defining weights of nodes and edges, and for data clustering. In addition, components for noise reduction and for extracting the most relevant information are provided (e.g. based on latent decomposition).

The third component defines the rules and models for computing recommendations. The framework supports memory-based as well as model-based recommender models. Based on the respective scenario, a meta-recommender selects the most appropriate models and aggregates the recommendations computed by the different recommender models.

The rightmost component in Figure 5 provides functions for visualizing the results. It generates explanations for computed recommendations and provides an interface to external services.


Semantic Data stores

Recommender systems are usually classified by the type of data they use as the basis for recommendation. If the objects to be recommended are analyzed by their content, the recommender is called feature-based. If only feedback from users is used, the recommender is called collaborative. Hybrid recommenders use both types of information.

Due to their powerful knowledge representation formalism and associated inference mechanisms, ontology-based systems are emerging as a natural choice for the data management of the next generation of recommender systems. Therefore, the approach taken by the Semantic Store consists of an open semantic architecture design, supporting different ontologies, e.g. for personal user information, interests, etc.

Information managed in the Semantic Store can be linked to create meta knowledge that enhances the recommendation quality. For instance, if the user expresses interest in a document, implicitly through reading or explicitly through rating, this information can be linked to DBpedia2 information to expand the feature set of the read document. This record linkage can be done via explicit rules or through algorithms. In addition, the data quality of the information in the Semantic Store is maintained by removing duplicates and handling temporal inconsistencies. To support any kind of recommender system, a focus lies on the easy extensibility of the ontology data model, allowing an application to define and integrate its own ontologies.

Semantic Data Model

The Semantic Store stores information in the common RDF triple notation. Each triple consists of a subject entity, an object entity and a predicate. Initially, the Semantic Store data model supports a set of open, widely accepted ontologies. This allows data exchange across application borders, so that the Universal Recommender can use data from different applications and compute recommendations on a larger information basis. The following figure depicts a user model containing linked information from different areas, modeled in different ontologies.

Figure 6: Relevant data formats for recommender framework.
(Labels in the figure: MOAT, SIOC, Microformats, Context, Interests, FOAF)

2 http://dbpedia.org


Representing semantic Datasets as Graphs

Semantic datasets can be represented as large graphs consisting of nodes and edges. The nodes stand for the entities (subjects and objects in semantic triples), the edges for the relationships between the entities (the predicates in semantic triples). An example of a semantic dataset for the entertainment domain is shown in Figure 7. The visualized dataset contains the entity types "Artist and Band", "Genre", "Albums", "Track" and "User". The edges define the semantic relationships between the entity sets. For example, the relationship set "MusicRelease" defines which albums have been released by an artist or a band; the relationship "loved Artists" records which user "liked" which artists.
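A minimal sketch of this mapping from triples to a graph is given below. The statements, the predicate names `hasGenre` and `lovedArtist`, the user `alice` and the helper functions are all invented for illustration, and storing every edge in both directions is an assumption made here so that the graph can be traversed either way; the framework's internal sparse representation is not described at this level of detail.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object, weight) statements.
triples = [
    ("Madonna", "MusicRelease", "LikeAVirgin", 1.0),
    ("Madonna", "MusicRelease", "Music", 1.0),
    ("Madonna", "hasGenre", "Pop", 1.0),
    ("alice", "lovedArtist", "Madonna", 1.0),
]


def build_graph(statements):
    """Adjacency lists with labeled, weighted edges in both directions."""
    graph = defaultdict(list)
    for subj, pred, obj, weight in statements:
        graph[subj].append((pred, obj, weight))
        graph[obj].append((pred, subj, weight))  # assumption: undirected use
    return graph


def neighbors(graph, node, predicate=None):
    """Nodes adjacent to node, optionally restricted to one edge label."""
    return [
        target for label, target, _ in graph[node]
        if predicate is None or label == predicate
    ]
```

With `g = build_graph(triples)`, the call `neighbors(g, "Madonna", "MusicRelease")` returns the two albums, while `neighbors(g, "Madonna")` also includes the genre and the user who likes the artist.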

Figure 7: A graph representing a semantic dataset for the music domain.

The data used in most recommendation scenarios can be interpreted semantically as triples of entities and a predicate. The following table gives common examples.

Predicate              Entities         Weights
Explicit feedback      User, Item       Signed
Content                Item, Feature    Nonnegative
Implicit feedback      User, Item       Nonnegative
Friend                 User             Unweighted
Friend/foe             User             Signed
Trust                  User             Unweighted
Trust/distrust         User             Signed
Communication          User             Unweighted
Selling data           User, Item       Nonnegative
Hyperlink/reference    Item             Unweighted

Table 1: Information triples in the semantic datasets

Normalizations and relative Weights

For computing recommendations based on a graph, it is necessary to assign appropriate weights to each edge [BK07, ABB06]. The weights must consider the semantics ("label", "predicate") and the relevance of the edges. Usually, edge weights are scaled to values between 0 and 1. The developed framework supports additive and multiplicative normalization, such as subtracting the overall minimum edge weight or dividing by the overall maximum edge weight.


In semantic datasets containing several different semantic relationships, a weighting model must be defined for each edge type. The developed framework supports expert-defined edge weights as well as machine learning algorithms that derive the optimal weights from a training dataset.
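The two normalization variants named above can be sketched as follows. The function names and the sample ratings are hypothetical, and chaining the additive and the multiplicative step to reach the interval from 0 to 1 is one plausible reading of the text rather than the framework's documented behaviour.

```python
def normalize_additive(weights):
    """Shift all weights so that the smallest one becomes 0."""
    low = min(weights.values())
    return {edge: w - low for edge, w in weights.items()}


def normalize_multiplicative(weights):
    """Divide all weights by the largest one (assumed to be positive)."""
    high = max(weights.values())
    if high <= 0:
        raise ValueError("maximum weight must be positive")
    return {edge: w / high for edge, w in weights.items()}


def scale_to_unit_interval(weights):
    """Apply both steps in turn, giving values between 0 and 1."""
    shifted = normalize_additive(weights)
    if max(shifted.values()) == 0:
        return dict.fromkeys(shifted, 0.0)  # all weights were equal
    return normalize_multiplicative(shifted)


# Hypothetical star ratings on a 1..5 scale, keyed by (user, item).
ratings = {("u1", "a"): 1, ("u1", "b"): 3, ("u2", "a"): 5}
scaled = scale_to_unit_interval(ratings)
```

For the sample data the ratings 1, 3 and 5 become 0.0, 0.5 and 1.0. In a dataset with several relationship types, a function like this would be applied per edge type before the relative weights of the types are chosen.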

Learning a Network and a Prediction Model

We define a model for combining the edge weights of a path between two nodes. The model must consider the case of parallel edges and the case of a sequence of edges. Figure 8 shows three approaches for defining an edge algebra. In most applications, we apply the model of weighted paths for simplicity.

                      Shortest path    Resistance distance    Weighted path
Sequence of edges     a + b            a + b                  a * b
Parallel edges        min(a, b)        (a * b) / (a + b)      a + b

Figure 8: Different edge algebras
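The three edge algebras of Figure 8 can be written as pairs of binary operations, one for edges in sequence and one for parallel edges. The sketch below is illustrative only: the dictionary `EDGE_ALGEBRAS`, the function `combine_paths` and the sample weights are assumptions, and the function treats its input as a set of independent parallel paths between the same two nodes.

```python
from functools import reduce

# Each algebra is a pair (series, parallel) of binary operations.
EDGE_ALGEBRAS = {
    "shortest_path": (lambda a, b: a + b, min),
    "resistance_distance": (lambda a, b: a + b,
                            lambda a, b: (a * b) / (a + b)),
    "weighted_path": (lambda a, b: a * b, lambda a, b: a + b),
}


def combine_paths(paths, algebra="weighted_path"):
    """Combine parallel paths, each given as a list of edge weights."""
    series, parallel = EDGE_ALGEBRAS[algebra]
    per_path = [reduce(series, path) for path in paths]
    return reduce(parallel, per_path)


# Two hypothetical parallel routes between the same pair of nodes.
routes = [[0.5, 1.0], [0.5, 0.5]]
```

Under the weighted path algebra the two routes contribute 0.5 and 0.25, which sum to 0.75. The same routes give 1.0 under the shortest path algebra and 0.6 under the resistance distance algebra, where the weights are read as lengths or resistances rather than as similarities.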

Another important step for computing recommendations is to specify a prediction model that defines which properties an entity must fulfill to be relevant for a query. The most commonly used recommender model is the semantic similarity of an entity to the input entities. This model follows the idea that the entities most relevant to the input entities are those connected to them by short paths (having a high weight) and those for which several parallel paths exist. Alternative recommender models are triangle closing and the number of common neighbors [LHK10].
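As a small illustration of the alternative models, the number of common neighbors can be used to rank candidate links; every common neighbor corresponds to one triangle that the new link would close. The network and the function names below are hypothetical, and the sketch ignores edge weights and edge labels.

```python
# Hypothetical undirected friendship network as adjacency sets.
FRIENDS = {
    "ann": {"bob", "cem", "dee"},
    "bob": {"ann", "cem"},
    "cem": {"ann", "bob", "dee"},
    "dee": {"ann", "cem"},
    "eve": set(),
}


def common_neighbors(graph, a, b):
    """Number of nodes adjacent to both a and b."""
    return len(graph[a] & graph[b])


def predict_links(graph, node):
    """Non-neighbors of node ranked by their number of common neighbors."""
    candidates = set(graph) - graph[node] - {node}
    scored = [(common_neighbors(graph, node, c), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score > 0]
```

For `bob`, the only suggested link is `dee`, who shares the two neighbors `ann` and `cem`; `eve` has no neighbors in common and is not suggested.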

Memory based Recommender

For computing recommendations, the framework supports memory-based and model-based recommender algorithms. Memory-based recommenders calculate paths in the original dataset. Starting from a set of given input entities (e.g. entities in the user profile), the potentially relevant paths through the semantic network are examined. The entities reachable from the input entities are ordered according to a semantic similarity rating. This rating is calculated from the edge weights of the respective paths. For parallel edges or paths, the ratings are summed up; for a sequence of edges, the weights are multiplied and weighted by a discount factor that depends on the path length.

Figure 9: Search related entities in a semantic network (memory based recommender)
(Labels in the figure: artists Madonna and DavidArnold; albums DieAnotherDay2, Music7, LikeAVirgin3, BedtimeStories2, DiamondsAreForever2, TheWorldIsNotEnough2, Stargate_4; edge weights 0.5 and 1; layers Artist, Album, Artist, Album)


Figure 9 shows paths through a bipartite (artist-album) network. Starting from the artist "Madonna", semantically related entities are computed. The advantage of memory-based recommenders is that updates in the semantic network immediately affect the computed recommendations. Moreover, the paths from an input entity to a "recommended" entity can be presented to the user as a simple explanation [SNM08]. The disadvantage of memory-based recommenders is that, on sparse networks, their results are less precise than those of model-based recommenders. Another disadvantage is that the computation of long paths is resource-demanding.
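The path search of a memory-based recommender can be sketched as a depth-limited walk over the network. This is an illustration under assumptions, not the framework's implementation: the graph and its weights are invented, the discount is modeled as `discount ** path_length`, and only simple paths (no repeated nodes) are followed.

```python
from collections import defaultdict

# Hypothetical artist-album network: node -> {neighbor: edge weight}.
GRAPH = {
    "Madonna": {"DieAnotherDay": 0.5, "Music": 1.0, "LikeAVirgin": 1.0},
    "DavidArnold": {"DieAnotherDay": 0.5, "Stargate": 1.0},
    "DieAnotherDay": {"Madonna": 0.5, "DavidArnold": 0.5},
    "Music": {"Madonna": 1.0},
    "LikeAVirgin": {"Madonna": 1.0},
    "Stargate": {"DavidArnold": 1.0},
}


def related(graph, start, max_len=3, discount=0.5):
    """Score nodes by summing discounted path products from start."""
    scores = defaultdict(float)

    def walk(node, product, visited):
        length = len(visited) - 1  # number of edges used so far
        if length > 0:
            # Parallel paths to the same node add up.
            scores[node] += product * discount ** length
        if length == max_len:
            return
        for nxt, weight in graph[node].items():
            if nxt not in visited:  # simple paths only, no cycles
                # Weights along a sequence of edges are multiplied.
                walk(nxt, product * weight, visited + [nxt])

    walk(start, 1.0, [start])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Starting from "Madonna", the artist "DavidArnold" is reached over the shared album with a score of 0.5 * 0.5 * 0.5 ** 2 = 0.0625. The path that produced a score can be kept alongside it and shown to the user as an explanation. The number of paths grows quickly with `max_len`, which illustrates why long paths are expensive.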

Model based Recommender

Real world datasets are often sparse, noisy and very large [GA03]. Thus, building models based on the dataset enables an effective complexity reduction by restricting the spanned search space to the most relevant dimensions. Our framework deploys algorithms for computing low-rank approximations of complex datasets [SKK02, KL09, GRG01, GG05] and for clustering similar entities [JMF99]. The clustering component supports the integration of additional expert knowledge to improve the computed network model. Figure 10 visualizes two methods for building a network model.

Figure 10: The figure schematically visualizes the complexity reduction of a network, represented as a sparse matrix. Supported methods for building network models are the dimensionality reduction based on a matrix decomposition and the clustering of the sparse matrix.

The computation of related items in a semantic network containing clustered entities is shown in Figure 11. In general, path based search algorithms can be applied. Due to the reduced network complexity, longer paths through the network can be computed efficiently with fewer resources. The disadvantage of model based recommenders is that additional effort is needed when the dataset changes. Another disadvantage is that network models are often difficult to understand. Thus, it is not easy to provide understandable explanations with model based recommenders.

Figure 11: The figure visualizes the search for related entities in a semantic network containing a clustered entity set. Node labels in the figure: the artists Madonna, DavidArnold and Garbage, and the album clusters "Cluster3: JamesBond-Soundtracks" and "Cluster7: Madonna's pop albums", connected by weighted edges (weights between 0.1 and 0.6).

Universal Latent Decomposition

A very important model for the preprocessing of large semantic datasets is the universal latent decomposition. The idea behind this method is to represent entities in a latent space in which relationships are predicted by a scalar product. The advantage of this model is that the computation of the latent model can be interpreted as a decomposition of the adjacency matrix of the complete network, which allows us to use well understood mathematical methods (e.g. graph kernels). The predictions made by the latent model have to be mapped to recommendations. Several state-of-the-art recommendation algorithms are based on a decomposition into a latent model:

• The singular value decomposition (SVD) and eigenvalue decomposition (EVD) [Mi09] and their applications to principal component analysis (PCA) and latent semantic indexing (LSI) [EC07].

• Graph kernels such as the exponential kernel, the von Neumann kernel, path counting and rank reduction methods [GA03, GKR04, HJS06, HCC04]. These can be applied to the eigenvalue or singular value decomposition of graphs, and their parameters can be learned efficiently [ISK05].

• Methods based on the Laplacian matrix such as the commute time and resistance distance [ABB06, BHS08, Kl99], the heat diffusion kernel [HKT04] and the random forest kernel [BNJ03].

• Probabilistic approaches such as probabilistic latent semantic analysis (PLSA) [HKB99] and latent Dirichlet allocation (LDA).

• Other matrix decompositions such as nonnegative matrix factorization [Ko08], maximum margin matrix factorization [La95] and low-rank approximations with missing values [KLP10].

• Higher-order decompositions such as parallel factor analysis (PARAFAC) [BCA07], the Tucker decomposition [Le02] and others [KY04].

The universal latent decomposition is a powerful model for large semantic datasets. It allows efficient complexity reduction and noise reduction. The disadvantage of the approach is that the model is difficult to explain and does not provide explanations for computed recommendations.
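
To make the idea of predicting relationships by a scalar product in a latent space concrete, the following sketch computes a truncated singular value decomposition of a small matrix. It is an illustration only: the matrix, the function names and the choice of power iteration with deflation (chosen here to stay within the Python standard library) are assumptions, not the algorithms used in the framework. The sketch also assumes that the leading singular values are distinct, which holds for the toy matrix.

```python
import math


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) with a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def top_singular_triplet(matrix, iterations=200):
    """Power iteration on M^T M: returns (sigma, u, v) for the largest singular value."""
    matrix_t = transpose(matrix)
    n = len(matrix_t)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = matvec(matrix_t, matvec(matrix, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    u = matvec(matrix, v)
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u] if sigma > 1e-12 else [0.0] * len(matrix)
    return sigma, u, v


def latent_factors(matrix, rank):
    """Rank-`rank` truncated SVD; returns latent vectors for rows and columns."""
    residual = [row[:] for row in matrix]
    row_vectors = [[] for _ in matrix]
    col_vectors = [[] for _ in matrix[0]]
    for _ in range(rank):
        sigma, u, v = top_singular_triplet(residual)
        root = math.sqrt(sigma)
        for i, ui in enumerate(u):
            row_vectors[i].append(root * ui)
        for j, vj in enumerate(v):
            col_vectors[j].append(root * vj)
        # Deflation: remove the component that was just extracted.
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
                    for i in range(len(u))]
    return row_vectors, col_vectors


def score(row_vector, col_vector):
    """Predicted relationship strength: scalar product in the latent space."""
    return sum(a * b for a, b in zip(row_vector, col_vector))


# Invented example: rows and columns form two groups; the link (1, 1) is missing.
B = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
rows, cols = latent_factors(B, rank=2)
print("missing link inside the group:", score(rows[1], cols[1]))
print("link across groups:", score(rows[1], cols[2]))
```

In this toy example the rank-2 model assigns a clearly positive score to the missing link inside the first group and a score close to zero to a link across groups. The latent coordinates themselves carry no obvious meaning, which illustrates why such models are hard to explain to a user.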

Meta Recommender

A framework for processing semantic data should be able to integrate several different recommender algorithms and models [Po06]. That is why our framework provides a meta-recommender that selects the most appropriate recommender algorithms based on the respective scenario and the query. The results from the chosen recommenders are aggregated in a unified result list. The meta-recommender strategy can aim at the "diversity" of the recommended entities (meaning that the entities have been calculated based on different models) or at the consensus of different recommender models (e.g. computed with the aggregation strategies CombSUM or CombMIN [Lee97]).
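
The two aggregation strategies named above can be sketched as follows. The function names, the example scores and the node "ArtistX" are hypothetical. The sketch assumes that every recommender returns scores on a comparable scale (e.g. normalized to [0, 1]) and that a recommender which did not return an entity is simply ignored for that entity; other conventions for missing entities are possible.

```python
def comb_sum(result_lists):
    """CombSUM: add up the scores an entity received from all recommenders."""
    fused = {}
    for scores in result_lists:
        for entity, value in scores.items():
            fused[entity] = fused.get(entity, 0.0) + value
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def comb_min(result_lists):
    """CombMIN: keep the lowest score an entity received.

    Assumption: recommenders that did not return the entity are ignored.
    """
    fused = {}
    for scores in result_lists:
        for entity, value in scores.items():
            fused[entity] = min(fused.get(entity, value), value)
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented scores from two hypothetical recommender components.
PATH_BASED = {"DavidArnold": 0.5, "Garbage": 0.2}
MODEL_BASED = {"DavidArnold": 0.25, "ArtistX": 0.5}

print(comb_sum([PATH_BASED, MODEL_BASED]))
print(comb_min([PATH_BASED, MODEL_BASED]))
```

CombSUM favours entities on which several recommenders agree, whereas CombMIN is the more cautious choice because a single low score caps the fused score.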

Current Research Topics

The implemented framework is continuously improved, taking into account new semantic datasets and recommender scenarios. Current fundamental research topics are:

• Learning the optimal parameters for scaling, normalization, recommender model selection and the deployed link algebra.

• Integration of current recommender algorithms into the meta-recommender.

• Learning and evaluation of scenario dependent delegation and result aggregation strategies.

• A user study with the goal to rank recommender models according to the user acceptance.

• Context aware selection of graph kernels and submatrix weights.

• An evaluation framework that allows us to compare the implemented algorithms on the most commonly used datasets.

These topics will be researched in the LSR project, starting in April 2010.

Implementation

The semantic engine framework is implemented in Java. The base components are optimized for the handling of large datasets and the efficient calculation of recommendations. Figure 12 visualizes the Java classes for the management of semantic datasets.

Figure 12: The core classes of the semantic engine framework (package de.dailab.semantic.suggest). The class diagram contains the following classes and annotations:

• EntitySetInterface («interface», with getName() and getType()): the interface definition of an entity set. It defines the entity name, the type, and the set of entities.

• AbstractEntitySet: basic implementation for entity sets.

• EntitySet, SemanticEntitySet, EntitySetClustered: entity sets optimized for different semantic data sources. EntitySet is optimized for locally stored text files, SemanticEntitySet is optimized for SPARQL endpoints, and EntitySetClustered is optimized for clustered entity sets.

• RelationshipInterface («interface», with getName() and getType()): the interface definition for relationship sets. It defines the name, the type and the edge set.

• AbstractRelationshipSet: base implementation for relationship sets.

• RelationshipSet, SemanticRelationshipSet: a RelationshipSet reads its data from the file system, a SemanticRelationshipSet reads its data from a SPARQL endpoint.

• Recommender classes: SemanticRecommenderBI, SemanticRecommenderBIPClustered, RecommenderEnsemble, RecommenderEnsembleManager and Wrapper. The annotations attached to this group read: computation of relevant paths in bipartite networks; a recommender that aggregates several recommenders (path based and model based); meta-recommender that aggregates the results from different recommenders.

For computing recommendations the following processing steps are applied:

• The datasets relevant for the current scenario are loaded.

• The datasets are normalized and the edge weights are scaled.

• Depending on the respective scenario and the dataset, the data models are computed. This can be done with clustering algorithms or with a matrix decomposition. If a matrix decomposition is used, the weights for each submatrix are set and a graph kernel is chosen (based on the scenario).

• For computing recommendations, based on the respective query, the meta-recommender chooses adequate recommender components and delegates the query to these recommenders. The results provided by the recommender components are aggregated in a single result list.

Depending on the respective scenario and the type of query, some steps might be left out. The semantic engine framework provides several APIs (e.g. a web service interface) so that the recommender component can be integrated in new applications.
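
The normalization, scaling and submatrix weighting steps listed above can be sketched as follows. This is one possible reading, not the framework's code: row normalization (the outgoing weights of each entity sum to 1) is an assumption, and the relationship types, entity names and weights in the example are invented.

```python
def normalize_rows(relationship):
    """Scale the outgoing edge weights of every source entity so that they sum to 1."""
    normalized = {}
    for source, targets in relationship.items():
        total = sum(targets.values())
        normalized[source] = (
            {target: weight / total for target, weight in targets.items()}
            if total > 0 else {}
        )
    return normalized


def combine_relationships(relationships, type_weights):
    """Merge several relationship types into one weighted edge map.

    Each relationship type corresponds to one submatrix of the network; its
    normalized edge weights are multiplied by the weight chosen for that type.
    Types without a configured weight are ignored (factor 0).
    """
    combined = {}
    for rel_type, relationship in relationships.items():
        factor = type_weights.get(rel_type, 0.0)
        for source, targets in normalize_rows(relationship).items():
            row = combined.setdefault(source, {})
            for target, weight in targets.items():
                row[target] = row.get(target, 0.0) + factor * weight
    return combined


# Invented example: two relationship types between one user and three programs.
RELATIONSHIPS = {
    "rating": {"user1": {"showA": 4.0, "showB": 1.0}},
    "view": {"user1": {"showA": 1.0, "showC": 1.0}},
}
TYPE_WEIGHTS = {"rating": 1.0, "view": 0.5}

print(combine_relationships(RELATIONSHIPS, TYPE_WEIGHTS))
```

The combined edge map could then be passed to a path based or model based recommender. In practice the per-type weights would depend on the scenario rather than being fixed constants.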

Application scenarios

To demonstrate the features and capabilities of the semantic engine framework, we deployed it in several applications. The following sections explain the Smart Media Assistant and SERUM, the recommender for the music domain, in detail.

Smart Media Assistant: Integration with an IPTV Application

The ever-increasing amount of TV and Internet based content requires filter and recommendation mechanisms as well as an intelligent linking of multimedia content. These assist the user in content selection and provide an enhanced and enriched consumption of content and information. The overall goal of the IPTV Assistant is to provide the user with a quadruple play application that combines IP based TV, Internet, communication and the integration of mobile devices. It assists the user in TV content selection by providing a personalized, easily accessible and enriched content view, combined with extra interactivity and communication features for increasing social interaction.

Besides common IPTV services such as Live TV, Video on Demand (VoD) and Private Video Recorder (PVR), the IPTV Assistant provides an enhanced EPG view that combines EPG data with a personalized, recommendation based MyTV program. For this it uses a recommendation system that provides user-to-content and content-to-content recommendations to the IPTV Assistant application. From the EPG program view the user can trigger streaming of the current broadcasts, schedule recordings, mark program items as personal favourites, rate program items and recommend program items to other users. In a preferences configuration view the user defines a set of genres he likes. Besides the recommendation based MyTV program overview there is a special MyTV channel providing a multi-stream view that shows the most relevant current TV broadcasts simultaneously. While watching TV the user is able to communicate with other users by audio and video telephony or chat. Because open standards are used, connections to different instant messaging systems, VoIP solutions and legacy telephony networks (PSTN/ISDN) are possible.

Currently the IPTV Assistant supports its full functionality on Windows platforms. For the mobile platforms Android, iPad and iPhone as well as for platform-independent web browser applications, limited functionality is provided. The mobile applications in particular provide additional remote control features over WLAN and UMTS networks, so that the IPTV Assistant can be controlled from the smartphone the user carries all day anyway. Mobile use cases range from an ordinary TV remote control to an IPTV session transfer from the TV to the mobile device and vice versa, whereby streams can be moved to or cloned between devices.

The Windows based IPTV Assistant application is a hybrid application consisting of a web application based on ECMAScript and HTML/CSS for the EPG and recommendation views and a Java based application utilizing SIP and H.264 streaming for live TV stream delivery. The web based approach allows the platform-independent integration of our application as long as a web browser is available. The IPTV Assistant is based on a full-featured SIP based VoIP phone, with basic functionality such as audio and video telephony and additional SIP based services such as contacts and presence management, instant messaging, and VoIP call and TV streaming session transfer. The presence service, which manages service- and user-related presence information (e.g. availability, status, situation), is enhanced for the IPTV context to hold valuable context information such as the program currently watched by a user (See-what-I-see) or information about devices that are currently available.

In our IPTV system development we follow an IP Multimedia Subsystem (IMS) based approach, which is promoted by several standardization bodies such as ETSI TISPAN, ITU-T and the Open IPTV Forum for future NGN based IPTV architectures. Here the IPTV service control functions and applications are built on top of an IMS based service control core that uses SIP for service control. As the IMS provides a platform for converged, personalized, and controlled person-to-person and person-to-content communication services in next generation networks, it is a well suited infrastructure for running unified IPTV services. Our approach utilizes the IMS and signalling according to NGN based IPTV standards to initiate and control IPTV services, provided by a dedicated IPTV application server in conjunction with a Media Resource Function. Moreover, standardized service building blocks such as Group Management and Presence Enablers are used. The recommender components are integrated in the service layer of the IPTV architecture, too.

Figure 13: A screenshot of the start page from the SERUM web application.

The use of recommenders for personalized TV environments is common practice, and semantic techniques are fairly well established for the underlying data representation on which the recommenders work. The IPTV Assistant makes use of a smart recommender, which uses such semantic technologies to provide personalized recommendations of resources to the user. In the IPTV context the smart recommender is used to identify the most relevant TV shows and videos for a user based on his interests, watching behaviour, social networks, and the feedback provided to the system. To provide an enriched content view, the semantic recommendation system aggregates EPG and related data from several sources for transcription and presentation in a semantic store.

The Integration of the semantic Engine into personalized IPTV

In this section we describe the IPTV recommender system as an example setting for the Semantic Engine Framework. In the Internet Protocol Television system, users can watch TV programs over the Internet. In addition to the functionality provided by regular television, our IPTV includes a semantic recommender system based on the Semantic Engine. The primary entity types are users and items, which are TV programs in this case. The main relationship types connect users and items. In our example these are view, flashback, rating, record and reminder events. This scenario shows a common feature of recommender systems: several relationships connect the same entity types. Other relationship types connect secondary entities such as location, genre, series and title. User-user relationships are represented by message events and buddy lists, both common in recommender systems. This dataset also contains higher-order relationship types, in the form of tag assignments and shared events. Recommendations in this dataset can be computed by using a recommender model and a recommender index. This example also shows how difficult it is, in general, to find and build a good hybrid recommender system out of simple recommender systems, because the number of relationship types is too large to be optimized by trial and error.

Data sources and semantic data

WebTV datasets that can potentially be used consist of entities of different types and relationships that connect them. The following table gives an overview of entity and relationship types.

Entity type | Dataset source | Obtained by
User | IPTV user database | Registration process
Program | IPTV and EPG program and movie data | EPG data stream
Program metadata | Information about actors, related movies, etc. collected from IMDB or DBPedia | Using open APIs
Genre | EPG data | Part of the EPG data; can be refined by the collected program metadata

Table 2: The entity types and the data sources used in the IPTV system.

Relationship type | Connected entity types | Obtained by
Rating | User, program | Explicit user rating or implicit interest prediction based on the user behavior and/or watch time
Genre/program information | Program, genre | Collected from EPG data stream and from open data sources like IMDB and DBPedia
Implicit feedback | User, program, advertisement | Interest prediction based on implicit user feedback
Advertisement information | Advertisement, genre, program | Explicitly set
User profile | User, genre | Explicitly entered user information during the registration process and information collected using implicit feedback
Social connection | User, user | Explicitly set by the user or computed based on TV watching behavior similarities

Table 3: The entity and the relationship types used in the IPTV system.

These datasets can be used by the Universal Recommender if they are available in the Semantic Store. Not all entity and relationship types are necessary for recommendation. The Universal Recommender also works with a partially filled Semantic Store.

Recommender scenario

The following recommendation scenarios are implemented by the Universal Recommender.

Scenario | Involved entity types | Involved relationship types
Program recommendation | User, program | Program rating
Advertisement recommendation | User, program, advertisement | Program rating, advertisement information

Table 4: The currently supported recommender scenarios in the IPTV system.

For a recommendation scenario to be implemented by the Universal Recommender, the Semantic Store must provide all involved entity and relationship types. By analyzing the Semantic Store as a whole, the Universal Recommender is however also able to consider all other entity and relationship types in the Semantic Store.
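
The requirement that the Semantic Store must provide all involved entity and relationship types can be expressed as a simple subset check. The sketch below encodes the two scenarios of Table 4; the class name, the function name and the lower-case type labels are hypothetical and not part of the Universal Recommender's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A recommendation scenario with the types it needs (after Table 4)."""
    name: str
    entity_types: frozenset
    relationship_types: frozenset


SCENARIOS = [
    Scenario(
        "Program recommendation",
        frozenset({"user", "program"}),
        frozenset({"program rating"}),
    ),
    Scenario(
        "Advertisement recommendation",
        frozenset({"user", "program", "advertisement"}),
        frozenset({"program rating", "advertisement information"}),
    ),
]


def supported_scenarios(available_entity_types, available_relationship_types):
    """Return the names of all scenarios whose required types are available."""
    entities = set(available_entity_types)
    relationships = set(available_relationship_types)
    return [
        scenario.name
        for scenario in SCENARIOS
        if scenario.entity_types <= entities
        and scenario.relationship_types <= relationships
    ]


# A partially filled store without advertisement data supports only the first scenario.
print(supported_scenarios({"user", "program", "genre"}, {"program rating"}))
```

Types that no scenario requires, such as "genre" in the example, do not disturb the check. They remain available as additional evidence when the store is analyzed as a whole.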

Evaluation and Experiences

By combining the Semantic Store and the Semantic Engine a flexible recommendation engine for WebTV can be written without domain-specific knowledge of the dataset at hand, under the constraint that the dataset can be mapped to the semantic store.

SERUM: Music Recommender based on encyclopedic data

The project SERUM³ connects unstructured news with large semantic entertainment datasets. The key functionality is a recommender that suggests relevant entities (artists, albums, tracks) and news articles for potentially relevant artists. The basis for the recommender is an encyclopedic dataset retrieved from Freebase⁴. The recommendations are computed based on the semantic engine framework. The user interface is implemented as a web application that allows the user to enter explicit preferences (Figure 14 shows the start page of the SERUM web application). The support of implicit feedback is planned for the near future.

Figure 14: A screenshot of the start page from the SERUM web application.

The first experiences with the SERUM web application show that highly relevant recommendations are computed based on the encyclopedic dataset and the developed framework [PLR10]. Due to the provided explanations, the users understand why the respective entities have been recommended and to what extent they are semantically related to the user preferences. The size of the encyclopedic dataset used (≈2.0 million nodes and ≈1.6 million edges) allows computing recommendations even for artists who are only regionally known. Unfortunately the encyclopedic dataset does not provide ratings or information about the popularity of artists. Thus we plan to enrich the dataset with rating data. This will allow us to ensure that a recommended artist is not only semantically related, but also matches the user's individual preference.

³ An acronym for "Semantische Empfehlungen basierend auf grossen, unstrukturierten Datenmengen" (semantic recommendations based on large, unstructured datasets): http://www.dai-labor.de/irml/serum/. SERUM is a joint project between the TU Berlin and the Neofonie GmbH.

⁴ http://www.freebase.com

The SERUM data sources

The semantic data used in the SERUM application were retrieved from Freebase in February 2011. The most important entity sets are Artist, Album, Tracks and Genres. Figure 15 shows the number of entities for each type and the number of edges between the respective entity sets.

Figure 15: The number of entities and the number of edges in the dataset deployed in SERUM.

For the evaluation of the recommender quality, we use data from LastFM. We used a dataset consisting of ≈500k triples, each containing a user ID, an artist and a track. The evaluation considers commonly used quality measures such as MAP and F-measure [HKTR].
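
For reference, the two quality measures mentioned above can be computed as in the following sketch. The function names and the example data are hypothetical. Average precision is divided by the number of relevant items, which is the usual convention, and each ranked list is assumed to contain no duplicates.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at the ranks where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_list, relevant_set) pairs, e.g. one pair per user."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


def f_measure(recommended, relevant, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Invented example: four recommended artists, two of which the user listened to.
ranking = ["a", "b", "c", "d"]
listened = {"a", "c"}
print(average_precision(ranking, listened))  # (1/1 + 2/3) / 2
print(f_measure(ranking, listened))          # precision 0.5, recall 1.0
```

MAP rewards rankings that place relevant items near the top, while the F-measure ignores the order and only balances precision against recall.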

The SERUM recommender scenarios

Because SERUM is based on a graph based dataset, the node that best matches the user input (a string) must be determined first. The system deploys several fuzzy matching algorithms for computing potentially relevant entities. Based on data about the entity type and a textual entity description, the user chooses the right entity.
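
The whitepaper does not specify which fuzzy matching algorithms SERUM uses. As a stand-in, the following sketch uses the similarity ratio from Python's standard difflib module to propose candidate entities for a misspelled input. The entity list and the function name are invented for illustration.

```python
import difflib


def candidate_entities(query, entities, limit=3, cutoff=0.6):
    """Return up to `limit` (name, type) candidates whose name resembles `query`.

    Matching is case-insensitive; candidates are ordered by decreasing similarity.
    """
    by_key = {}
    for name, entity_type in entities:
        by_key.setdefault(name.lower(), []).append((name, entity_type))
    keys = difflib.get_close_matches(query.lower(), list(by_key), n=limit, cutoff=cutoff)
    return [entity for key in keys for entity in by_key[key]]


# Invented entity list; in SERUM the candidates would come from the graph dataset.
ENTITIES = [
    ("Elvis Presley", "Artist"),
    ("Elvis Costello", "Artist"),
    ("Elvis", "Album"),
    ("Madonna", "Artist"),
]

for name, entity_type in candidate_entities("Elvis Presly", ENTITIES):
    print(name, entity_type)
```

Returning the entity type together with the name corresponds to the disambiguation step described above: the user sees several plausible candidates and picks the intended one.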

Figure 16: Entity disambiguation in the SERUM web application.

The entity chosen by the user is used as input for the recommender. The SERUM recommender deploys the music ontology for computing the most relevant albums and tracks for the user input. Additionally, the system searches for the user input entity in current news articles. The latest news articles are presented to the user. Furthermore, a popularity index is computed, visualizing how many news articles were found for the respective entity in the last month. Figure 17 shows the popularity index for the artist "Elvis Presley" and the recommended albums and tracks.

Figure 17: Recommendations and popularity index for a user defined entity.

The key feature of the SERUM web application is to recommend artists for the user input. This is done by aggregating the results computed based on several different recommender models. Figure 18 shows the recommended artists for the user input "Elvis Presley". For each suggested artist the system presents the three latest news articles found. Furthermore, the types assigned to each entity are shown.

Figure 18: Recommended artists and the assigned news articles for these artists.

Summary

The semantic engine framework allows the efficient processing of large semantic datasets (due to the different supported recommender models) and the creation of adaptive recommender systems. The available components support several different recommender algorithms and graph models. This enables the context-aware selection of the most appropriate recommender algorithms. In contrast to many existing frameworks, we support different semantic edge types within a semantic dataset and provide a meta-recommender that allows us to combine recommender results computed based on different recommender models. In future work we will focus on machine learning algorithms for optimizing the parameter settings for building the recommender models and for aggregating the edge weights for long paths. Another current research topic is the integration of additional semantic datasets, allowing an improved personalization of computed recommendations.

Bibliography

[AST05] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Trans. Information Systems, 23(1):103–145, 2005.

[AT05] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Trans. on Knowledge and Data Engineering, 17:734–749, 2005.

[ASA07] S. Albayrak, S. Wollny, A. Lommatzsch, and D. Milošević. Agent technology for personalized information filtering: The PIA system. Scalable Computing: Practice and Experience, 8:29–40, 2007.

[ABB06] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In Proc. Int. Conf. on Machine Learning, pages 17–24, 2006.

[AWL07] S. Albayrak, S. Wollny, A. Lommatzsch, and D. Milošević. Agent technology for personalized information filtering: The PIA system. Scalable Computing: Practice and Experience, 8(1):29–40, 2007.

[AZ05] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Trans. on Knowledge and Data Engineering, 17:734–749, 2005.

[BR99] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.

[BAA07] C. Bauckhage, T. Alpcan, S. Agarwal, F. Metze, R. Wetzker, M. Ilić, and S. Albayrak. An intelligent knowledge sharing system for web communities. In Proc. Int. Conf. on Systems, Man and Cybernetics, pages 3069–3074, 2007.

[BK07] R. Bell and Y. Koren. Improved neighborhood-based collaborative filtering. In Proc. KDDCup, pages 7–14, 2007.

[Be06] J. Bennett. The Cinematch system: Operation, scale coverage, accuracy, impact. Summer School on the Present and Future of Recommender Systems, 2006.

[BCA07] C. Bizer, R. Cyganiak, S. Auer, and G. Kobilarov. DBpedia.org–querying Wikipedia like a database. In Proc. Int. World Wide Web Conf., 2007.

[BNJ03] D. Blei, A. Ng, M. Jordan, and J. Lafferty. Latent Dirichlet allocation. Machine Learning Research, 3:993–1022, 2003.

[BP07] L. Brožovský and V. Petříček. Recommender system for online dating service. In Proc. Znalosti, pages 29–40, 2007.

[BHS08] M. Brzozowski, T. Hogg, and G. Szabo. Friends and foes: Ideological social networking. In Proc. Conf. on Human Factors in Computing Systems, pages 817–820, 2008.

[EC07] K. Emamy and R. Cameron. CiteULike: A researcher’s social bookmarking service. Ariadne, (51), 2007.

[FPR07] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. Trans. on Knowledge and Data Engineering, 19(3):355–369, 2007.

[FYP06] F. Fouss, L. Yen, A. Pirotte, and M. Saerens. An experimental investigation of graph kernels on a collaborative recommendation task. In Proc. Int. Conf. on Data Mining, pages 863–868, 2006.

[GG05] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In Proc. Int. Conf. on Research and Development in Information Retrieval, pages 601–602, 2005.

[GRG01] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.

[GA03] R. Guerreiro and P. Aguiar. Estimation of rank deficient matrices from partial observations: Two-step iterative algorithms. In Proc. Int. Conf. on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 450–466, 2003.

[GKR04] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In Proc. Int. World Wide Web Conf., pages 403–412, 2004.

[HKB99] J. Herlocker, J. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proc. Int. Conf. on Research and Development in Information Retrieval, pages 230–237, 1999.

[HKTR] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Information Systems, 22(1):5–53, 2004.

[Ho99] T. Hofmann. Probabilistic latent semantic analysis. In Proc. Conf. on Uncertainty in Artificial Intelligence, pages 289–296, 1999.

[HJS06] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. BibSonomy: A social bookmark and publication sharing system. In Proc. Workshop on Conceptual Structure Tool Interoperability, pages 87–102, 2006.

[HCC04] Z. Huang, W. Chung, and H. Chen. A graph model for E-commerce recommender systems. American Society for Information Science and Technology, 55(3):259–274, 2004.

[ISK05] T. Ito, M. Shimbo, T. Kudo, and Y. Matsumoto. Application of kernels to link analysis. In Proc. Int. Conf. on Knowledge Discovery in Data Mining, pages 586–592, 2005.

[JMF99] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31:264–323, 1999.

[KSC02] J. Kandola, J. Shawe-Taylor, and N. Cristianini. Learning semantic similarity. In Advances in Neural Information Processing Systems, pages 657–664, 2002.

[KL09] J. Kunegis and A. Lommatzsch. Learning spectral graph transformations for link prediction. In Proc. Int. Conf. on Machine Learning, pages 1–8, 2009.

[Kl99] J. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604–632, 1999.

[KY04] B. Klimt and Y. Yang. The Enron corpus: A new dataset for email classification research. In Proc. Eur. Conf. on Machine Learning, pages 217–226, 2004.

[Ko08] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proc. Int. Conf. on Knowledge Discovery and Data Mining, pages 426–434, 2008.

[KLA10] J. Kunegis, E. W. De Luca, and S. Albayrak. The link prediction problem in bipartite networks. In Proc. Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-based Systems, 2010.

[KLB08] J. Kunegis, A. Lommatzsch, and C. Bauckhage. Alternative similarity functions for graph kernels. In Proc. Int. Conf. on Pattern Recognition, 2008.

[KLB09b] J. Kunegis, A. Lommatzsch, and C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. Int. World Wide Web Conf., pages 741–750, 2009.

[KLA08] J. Kunegis, A. Lommatzsch, C. Bauckhage, and S. Albayrak. On the scalability of graph kernels applied to collaborative recommenders. In Proc. ECAI Workshop on Recommender Systems, pages 35–38, 2008.

[KS07] J. Kunegis and S. Schmidt. Collaborative filtering using electrical resistance network models with negative edges. In Proc. Industrial Conf. on Data Mining, pages 269–282, 2007.

[KSL10] J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner, E. W. De Luca, and S. Albayrak. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proc. SIAM Int. Conf. on Data Mining, 2010.

[KLP10] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proc. Int. World Wide Web Conf., 2010.

[La95] K. Lang. NewsWeeder: Learning to filter Netnews. In Proc. Int. Conf. on Machine Learning, 1995.

[LHK10] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In Proc. Int. World Wide Web Conf., pages 641–650, 2010.

[LS98] O. Lassila and R. R. Swick. Resource description framework (RDF) model and syntax, 1998.

[Le02] M. Ley. The DBLP computer science bibliography: Evolution, research issues, perspectives. In Proc. Int. Symposium on String Processing and Information Retrieval, pages 1–10, 2002.

[Lee97] J. H. Lee. Analyses of multiple evidence combination. In Proc. Int. Conf. on Research and Development in Information Retrieval, pages 267–276, 1997.

[LSY03] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.

[Mi09] A. Mislove. Online Social Networks: Measurement, Analysis, and Applications to Distributed Information Systems. PhD thesis, Rice University, 2009.

[Ne01] M. E. J. Newman. The structure of scientific collaboration networks. Proc. National Academy of Sciences, 98(2):404–409, 2001.

[PBM98] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.

[PLR10] T. Plumbaum, A. Lommatzsch, S. Rudnitzki, E. W. De Luca, H. Düwiger, and S. Albayrak. Adaptive music news recommendations based on large semantic datasets. In Proc. Workshop on Music Recommendation and Discovery (WOMRAD), colocated with ACM RecSys, 2010.

[Po06] R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6:21–45, 2006.

[RIS94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of NetNews. In Proc. Conf. on Computer Supported Cooperative Work, pages 175–186, 1994.

[SKD06] N. Sahoo, R. Krishnan, G. Duncan, and J. Callan. Collaborative filtering with multicomponent rating for recommender systems. In Proc. Workshop on Information Technologies and Systems, 2006.

[SKK00] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender systems–a case study. In Proc. ACM WebKDD Workshop, 2000.

[SNM08] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. Providing justifications in recommender systems. IEEE Trans. on Systems, Man and Cybernetics, Part A: Systems and Humans, 38(6):1262–1272, 2008.

[SKK02] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Incremental SVD-based algorithms for highly scalable recommender systems. In Proc. Int. Conf. on Computer and Information Technology, pages 399–404, 2002.

[St05] D. Stewart. Social status in an open-source community. American Sociological Review, 70(5):823–842, 2005.

[SZL05] J. Sun, H. Zeng, H. Liu, Y. Lu, and Z. Chen. CubeSVD: A novel approach to personalized web search. In Proc. Int. World Wide Web Conf., pages 382–390, 2005.

[TPN08] G. Takács, I. Pilászy, B. Németh, and D. Tikk. Matrix factorization and neighbor based algorithms for the Netflix Prize problem. In Proc. Int. Conf. on Recommender Systems, pages 267–274, 2008.

[VMC09] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, pages 37–42, 2009.

[WWZ07] Y. Wang, H. Wang, H. Zhu, and Y. Yu. Exploit semantic information for category annotation recommendation in Wikipedia. In Natural Language Processing and Information Systems, pages 48–60, 2007.

[WC04] Y. Wu and E. Y. Chang. Distance-function design and fusion for sequence data. In Proc. Int. Conf. on Information and Knowledge Management, pages 324–333, 2004.

[XSB06] L. Xiao, J. Sun, and S. Boyd. A duality view of spectral methods for dimensionality reduction. In Proc. Int. Conf. on Machine Learning, pages 1041–1048, 2006.

[ZOZ07] D. Zhou, S. Orshanskiy, H. Zha, and C. Giles. Co-ranking authors and documents in a heterogeneous network. In Proc. Int. Conf. on Data Mining, pages 739–744, 2007.