gcontext: a context-based query construction service for google

GContext: A context-based query construction service for Google Ioannis Apostolatos and Ioannis Papadakis Ionian University, Greece

Upload: john-pap

Post on 28-Nov-2014


Page 1

GContext: A context-based query construction service for Google

Ioannis Apostolatos and Ioannis Papadakis

Ionian University, Greece

Page 2

Presentation outline

Introduction
Rationale
Proposed approach
Usage scenarios
Discussion

Page 3

Introduction

On the web, information about virtually anything can be found, provided that a searcher knows where to look
Searchers largely rely on large-scale web search engines (SE) to get assistance in locating useful resources
The quality of the search results depends on the ability of searchers to accurately express their information needs as keywords in the search engine's input box
How do SE aid their users in creating successful queries?

Page 4

Rationale

The query construction phase of a search session is crucial to the fulfillment of the searchers’ information needs
During the query construction phase, a searcher has to express their information needs in the specific dialect (i.e. keyword-based) of the underlying SE
The searcher has to 'guess' the words that the SE has used to index the web resources that correspond to such needs

Page 5

Rationale

Spoken languages have certain features that should be taken into consideration:
Polysemy of words
  Polysemy occurs when a word has more than one sense
  A query that consists of an ambiguous word, without further information that correctly disambiguates it, may result in a search results list with completely useless information
Synonymy of words
  Synonymy occurs when two or more words share the same meaning
  The probability of two persons using the same term in describing the same thing is less than 20%

Page 6

Proposed approach

A query construction/refinement service on top of the Google SE, powered by the LOD cloud and especially DBpedia
The proposed service is a two-step process:
1. Initially, it provides autosuggest functionality by reacting to the corresponding keystrokes of a searcher
  Prefix search is performed on an index comprising words and/or phrases that originate from Wikipedia and are made available through DBpedia (‘article titles’ dataset)
  Such functionality facilitates query disambiguation, since Wikipedia's disambiguations follow a pattern that is promoted by prefix search, i.e. <ambiguous word> (disambiguation info), e.g. bass (fish)
  DBpedia’s suggestions are appended to Google’s original suggestions
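
The first step can be sketched in Python as follows. This is only an illustration under stated assumptions: the slides do not say how the index is stored or how duplicates are handled, so the sorted in-memory list, the case-insensitive matching, the de-duplication rule, and the sample titles are all hypothetical.

```python
from bisect import bisect_left


def build_index(titles):
    """Return (lowercased key, original title) pairs sorted by key."""
    return sorted((title.lower(), title) for title in titles)


def prefix_search(index, prefix, limit=5):
    """Return up to `limit` titles whose lowercased form starts with `prefix`."""
    key = prefix.lower()
    # Binary search for the first entry that is >= the prefix.
    start = bisect_left(index, (key,))
    matches = []
    for lowered, title in index[start:]:
        if len(matches) >= limit or not lowered.startswith(key):
            break
        matches.append(title)
    return matches


def merge_suggestions(google, dbpedia):
    """Append DBpedia suggestions after Google's, skipping case-insensitive duplicates."""
    seen = {s.lower() for s in google}
    merged = list(google)
    for suggestion in dbpedia:
        if suggestion.lower() not in seen:
            seen.add(suggestion.lower())
            merged.append(suggestion)
    return merged


# Hypothetical sample standing in for DBpedia's 'article titles' dataset.
TITLES = ["Bass (fish)", "Bass (guitar)", "Bass (sound)", "Basset Hound", "Jaguar"]
index = build_index(TITLES)

# Typing "bass (" surfaces the Wikipedia-style disambiguated titles.
dbpedia_hits = prefix_search(index, "bass (")
# Hypothetical Google suggestions; the DBpedia hits are appended after them.
suggestions = merge_suggestions(["bass guitar", "bass fishing"], dbpedia_hits)
print(suggestions)
```

Because the disambiguation info follows the ambiguous word in parentheses, all senses of a word sit next to each other in the sorted index, which is why a plain prefix lookup is enough to list them.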

Page 7

Proposed approach

The proposed service is a two-step process (continued):
2. Upon selection of a suggestion, the searcher is offered the chance to refine the initial query through the interactions provided by the service (i.e. query replacements and refinements)
  Query replacements and refinements derive from the results of SPARQL queries that are addressed to DBpedia's endpoint
  Every interaction results in the construction of an appropriate query that is addressed to Google's Custom Search, which, in turn, provides the corresponding search results
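
A minimal sketch of the last point, assuming the service talks to the Custom Search JSON API. The slides only say "Google's Custom Search", so the endpoint choice, the space-joined composition rule, and the placeholder credentials below are assumptions rather than GContext's actual behaviour.

```python
from urllib.parse import urlencode

# Endpoint of Google's Custom Search JSON API (assumed to be the variant in use).
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def compose_query(selected, refinements=()):
    """Join the selected suggestion and any chosen refinement keywords with spaces.

    The composition rule is hypothetical; the slides do not describe it.
    """
    return " ".join([selected, *refinements])


def custom_search_url(query, api_key, engine_id):
    """Build the request URL; `key`, `cx` and `q` are the API's query parameters."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return CSE_ENDPOINT + "?" + urlencode(params)


# "KEY" and "ENGINE" are placeholders, and "Percidae" is a made-up refinement.
query = compose_query("Bass (fish)", ["Percidae"])
url = custom_search_url(query, api_key="KEY", engine_id="ENGINE")
print(url)
```

Each interaction (picking a replacement, adding or removing a refinement) would rebuild the query string and issue a new request in the same way.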

Page 8

Proposed approach – under the hood: Query replacements

Words or phrases that are alternatives to the suggestion the user has chosen from the search box
They are actually Wikipedia's redirects to the article whose title the user selected from the search box
  The SPARQL query revolves around the <http://dbpedia.org/ontology/wikiPageRedirects> predicate
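
A query of this kind might look like the sketch below, which only builds the SPARQL text and does not contact the endpoint. The exact query used by GContext is not shown in the slides; the simplified title-to-URI mapping, the helper names, and the sample redirect URI are assumptions made for illustration.

```python
from urllib.parse import unquote

DBO_REDIRECTS = "http://dbpedia.org/ontology/wikiPageRedirects"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def resource_uri(title):
    """Map an article title to a DBpedia resource URI (simplified: spaces become underscores)."""
    return RESOURCE_PREFIX + title.replace(" ", "_")


def redirects_query(target_uri):
    """Select every resource that redirects to the selected article."""
    return (
        "SELECT DISTINCT ?alt WHERE {\n"
        "  ?alt <%s> <%s> .\n"
        "}" % (DBO_REDIRECTS, target_uri)
    )


def label_from_uri(uri):
    """Turn a returned redirect URI into a displayable query replacement."""
    return unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")


print(redirects_query(resource_uri("Bass (fish)")))
# Hypothetical URI, used only to show the label conversion.
print(label_from_uri("http://dbpedia.org/resource/Sea_bass"))
```

The redirect page is the subject of the triple and the selected article is the object, so the article's URI goes in the object position of the pattern.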

Page 9

Proposed approach – under the hood: Query refinements

Query refinements are keywords that a user can add to the initial query in order to semantically refine it. They are organized in three groups: Categories, WordNet categories and Context words
The 'Categories' group is populated with the categories of the Wikipedia article that the user selected from the search box
  The corresponding SPARQL query revolves around the <http://purl.org/dc/terms/subject> predicate
The 'WordNet categories' group is populated with the WordNet categories of the Wikipedia title that the user selected from the search box
  The corresponding SPARQL query revolves around the <http://dbpedia.org/property/wordnet_type> predicate
The 'Context words' group is populated with information deriving from the infobox of the corresponding Wikipedia article
  The corresponding SPARQL query revolves around the <http://dbpedia.org/property/.*> predicates, along with numerous ‘FILTER’ clauses
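
The three groups could be fetched with queries along the lines of the sketch below, which again only builds SPARQL text. The predicates come from the slide; the function names, the query shapes, and in particular the FILTER clauses are assumptions, since the slides do not list the filters that are actually used.

```python
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
DBP_WORDNET_TYPE = "http://dbpedia.org/property/wordnet_type"
DBP_NAMESPACE = "http://dbpedia.org/property/"


def _values_of(resource_uri, predicate_uri, var):
    """Select all values of one predicate for the selected article."""
    return "SELECT DISTINCT ?%s WHERE { <%s> <%s> ?%s }" % (
        var, resource_uri, predicate_uri, var)


def categories_query(resource_uri):
    """'Categories' group: Wikipedia categories of the article."""
    return _values_of(resource_uri, DCT_SUBJECT, "category")


def wordnet_query(resource_uri):
    """'WordNet categories' group: WordNet type of the article."""
    return _values_of(resource_uri, DBP_WORDNET_TYPE, "type")


def context_words_query(resource_uri, excluded=()):
    """'Context words' group: infobox properties under dbpedia.org/property/.

    `excluded` is a hypothetical list of property URIs to drop; it stands in
    for the unspecified 'numerous FILTER clauses' mentioned on the slide.
    """
    lines = [
        "SELECT DISTINCT ?property ?value WHERE {",
        "  <%s> ?property ?value ." % resource_uri,
        '  FILTER regex(str(?property), "^%s")' % DBP_NAMESPACE,
    ]
    for prop in excluded:
        lines.append("  FILTER (?property != <%s>)" % prop)
    lines.append("}")
    return "\n".join(lines)


article = "http://dbpedia.org/resource/Jaguar"
print(categories_query(article))
print(wordnet_query(article))
print(context_words_query(article, excluded=[DBP_WORDNET_TYPE]))
```

The first two queries bind a fixed predicate, whereas the third leaves the predicate as a variable and narrows it with filters, which is the more expensive pattern for the endpoint to evaluate.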

Page 10

Usage scenarios: Autosuggestions

Dealing with ambiguous queries: Jaguar, the hero from Archie Comics

Page 11

Usage scenarios: Autosuggestions

Dealing with ambiguous queries: Jaguar, the hero from Archie Comics

Page 12

Usage scenarios: Autosuggestions

Dealing with ambiguous queries: Jaguar, the hero from Archie Comics

Page 13

Usage scenarios: Query replacements

Page 14

Usage scenarios: Query refinements

Page 15

Usage scenarios: Query refinements

Page 16

Usage scenarios: Query refinements

Page 17

Discussion

So, can we compete with Google? Certainly not:
Linked data is full of ‘noise’
  Things could improve if we all put some effort into it: http://pedantic-web.org/
SPARQL endpoints are often too slow to respond
  Unions are expensive
  “FILTER regex” clauses take forever to resolve
  Maybe the database community will provide solutions that speed things up
Size matters
  Google’s index is far larger and fresher
And much more…

Page 18

Discussion

Then, why bother?
  We believe that GContext can be seamlessly integrated with any major search engine that provides access to its search box
What about the ‘knowledge graph’?
  Too early to jump to any conclusions. It was announced on May 16th and is so far only partially deployed
  Proof that we are on the right track:
    “… go deeper and broader”, i.e. infoboxes from DBpedia
    “… Find the right thing”, i.e. PageRedirects from DBpedia

Page 19

Discussion

Thank you very much,

Questions?