GContext: A context-based query construction service for Google

Ioannis Apostolatos and Ioannis Papadakis

Ionian University, Greece

Presentation outline

Introduction
Rationale
Proposed approach
Usage scenarios
Discussion

Introduction

On the web, information about virtually anything can be found, provided that a searcher knows where to look

Searchers largely rely on large-scale web search engines (SE) for assistance in locating useful resources

The quality of the search results depends on the ability of the searchers to accurately express their information needs as keywords in the search engine's input box

How do SE aid their users in creating successful queries?

Rationale

The query construction phase of a search session is crucial to the fulfillment of the searchers’ information needs

During the query construction phase, a searcher has to express his information needs according to the specific dialect (i.e. keyword-based) of the underlying SE

The searcher has to 'guess' the words that the SE has chosen to index the web resources that correspond to such needs

Rationale

Spoken languages have certain features that should be taken into consideration:

Polysemy of words
Polysemy occurs when a word has more than one sense
A query that consists of an ambiguous word without further information that correctly disambiguates it may result in a search results list with completely useless information

Synonymy of words
Synonymy occurs when two or more words share the same meaning
The probability of two persons using the same term in describing the same thing is less than 20%

Proposed approach

A query construction/refinement service on top of Google SE that is powered by the LOD cloud and especially DBpedia

The proposed service is a two-step process:

1. Initially, it provides autosuggest functionality by reacting to the corresponding keystrokes of a searcher

Prefix search is performed on an index comprising words and/or phrases originating from Wikipedia and made available through DBpedia (‘article titles’ dataset)

Such functionality facilitates query disambiguation, since Wikipedia's disambiguated titles follow a pattern that is favored by prefix search, i.e. <ambiguous word> (<disambiguation info>), e.g. bass (fish)

DBpedia’s suggestions are appended to Google’s original suggestions
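The first step can be illustrated with a minimal sketch. It assumes the 'article titles' dataset has already been loaded into a Python list; the sample titles, the function names (`build_index`, `prefix_search`, `merge_suggestions`) and the case-insensitive matching are all assumptions made for illustration, not details taken from the GContext implementation.

```python
from bisect import bisect_left

def build_index(titles):
    """Return (lowercased key, original title) pairs sorted by key."""
    return sorted((t.lower(), t) for t in titles)

def prefix_search(index, prefix, limit=5):
    """Return up to `limit` titles whose lowercased form starts with `prefix`."""
    key = prefix.lower()
    # Binary search for the first entry that is >= the typed prefix.
    start = bisect_left(index, (key,))
    matches = []
    for lowered, title in index[start:]:
        if not lowered.startswith(key) or len(matches) >= limit:
            break
        matches.append(title)
    return matches

def merge_suggestions(google_suggestions, dbpedia_suggestions):
    """Append DBpedia suggestions to Google's, skipping case-insensitive duplicates."""
    seen = {s.lower() for s in google_suggestions}
    merged = list(google_suggestions)
    for s in dbpedia_suggestions:
        if s.lower() not in seen:
            seen.add(s.lower())
            merged.append(s)
    return merged

# Hypothetical sample of article titles (illustrative only).
titles = ["Bass (fish)", "Bass (guitar)", "Bass (voice type)", "Basset Hound", "Jaguar"]
index = build_index(titles)

# Typing "bass (" surfaces only the disambiguated titles.
suggestions = prefix_search(index, "bass (")
print(merge_suggestions(["bass guitar", "Bass (fish)"], suggestions))
```

Because disambiguated titles share the ambiguous word as a prefix, they sit next to each other in the sorted index, which is why a plain prefix lookup is enough to list the available senses.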

Proposed approach

The proposed service is a two-step process (continued):

2. Upon selection of a suggestion, the searcher is offered the chance to refine the initial query through the appropriate interactions that are provided by the service (i.e. query replacements and refinements)

Query replacements and refinements derive from the results of SPARQL queries that are addressed to DBpedia's endpoint

Every interaction results in the construction of an appropriate query that is addressed to Google's Custom Search, which, in turn, provides the corresponding search results
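The final hop to Google can be sketched as follows. The slides do not say how the selected suggestion, replacements and refinements are combined, nor which Custom Search interface is used, so this sketch assumes a simple space-separated keyword query sent to the Custom Search JSON API (`key`, `cx` and `q` parameters). The function names, the placeholder credentials and the sample terms are hypothetical.

```python
from urllib.parse import urlencode

CUSTOM_SEARCH_URL = "https://www.googleapis.com/customsearch/v1"

def build_query(selection, refinements=(), replacement=None):
    """Combine the chosen suggestion (or its replacement) with refinement keywords.

    Assumption: terms are simply joined with spaces.
    """
    base = replacement if replacement is not None else selection
    terms = [base] + [r for r in refinements if r]
    return " ".join(terms)

def build_search_url(query, api_key, engine_id):
    """Build a Custom Search JSON API request URL (no request is sent here)."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{CUSTOM_SEARCH_URL}?{urlencode(params)}"

# Made-up sample values; the key and engine id are placeholders.
query = build_query("Jaguar (Archie Comics)", refinements=["superhero"])
url = build_search_url(query, api_key="YOUR_API_KEY", engine_id="YOUR_ENGINE_ID")
print(url)
```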

Proposed approach – under the hood: Query replacements

Words or phrases that correspond to alternatives to the suggestion the user has chosen from the search box

They are actually Wikipedia's redirects to the article whose title the user selected from the search box

The SPARQL query revolves around the <http://dbpedia.org/ontology/wikiPageRedirects> predicate
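A query of this kind might look like the sketch below, which only builds the SPARQL text and the request URL. The exact query used by GContext is not shown on the slides. The English-label filter, the `LIMIT`, the naive title-to-URI mapping (spaces to underscores) and the `format` request parameter are assumptions, as are the helper names.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
REDIRECT_PREDICATE = "http://dbpedia.org/ontology/wikiPageRedirects"

def resource_uri(title):
    """Map an article title to a DBpedia resource URI.

    Assumption: replacing spaces with underscores is enough; titles with
    other special characters would need extra care.
    """
    return "http://dbpedia.org/resource/" + title.replace(" ", "_")

def redirects_query(title, limit=20):
    """Build a SPARQL query for the labels of pages that redirect to `title`."""
    target = resource_uri(title)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        f"  ?redirect <{REDIRECT_PREDICATE}> <{target}> .\n"
        "  ?redirect rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )

def endpoint_url(query):
    """Build a GET URL for the endpoint (assumed to accept `query` and `format`)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)

sparql = redirects_query("Bass (fish)")
print(sparql)
print(endpoint_url(sparql))
```

Each returned label is a candidate replacement for the selected suggestion.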

Proposed approach – under the hood: Query refinements

Query refinements are keywords that a user can add to the initial query in order to semantically refine it. They are organized in three groups: Categories, Wordnet categories and Context words

The 'Categories' group is populated with the categories of the Wikipedia article that the user selected from the search box. The corresponding SPARQL query revolves around the <http://purl.org/dc/terms/subject> predicate

The 'Wordnet categories' group is populated with the wordnet categories of the Wikipedia article title that the user selected from the search box. The corresponding SPARQL query revolves around the <http://dbpedia.org/property/wordnet_type> predicate

The 'Context words' group is populated with information deriving from the infobox of the corresponding Wikipedia article. The corresponding SPARQL query revolves around <http://dbpedia.org/property/.*> predicates along with numerous ‘FILTER’ clauses
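The three refinement groups could be backed by queries along the following lines. This is a sketch that only generates SPARQL text; the actual queries and the actual ‘FILTER’ clauses are not given on the slides. The helper names, the naive title-to-URI mapping and the single exclusion filter shown here are assumptions chosen for illustration.

```python
SUBJECT = "http://purl.org/dc/terms/subject"
WORDNET_TYPE = "http://dbpedia.org/property/wordnet_type"
PROPERTY_NS = "http://dbpedia.org/property/"

def resource_uri(title):
    """Naive title-to-URI mapping (assumption: spaces become underscores)."""
    return "http://dbpedia.org/resource/" + title.replace(" ", "_")

def single_predicate_query(title, predicate):
    """Select every value of one predicate for the selected article."""
    return (
        "SELECT DISTINCT ?value WHERE {\n"
        f"  <{resource_uri(title)}> <{predicate}> ?value .\n"
        "}"
    )

def context_words_query(title, excluded=("wordnet_type",)):
    """Select infobox-derived properties, i.e. predicates in the property namespace.

    The regex dots are left unescaped for readability; `excluded` stands in
    for the 'numerous FILTER clauses' and is purely illustrative.
    """
    lines = [
        "SELECT DISTINCT ?property ?value WHERE {",
        f"  <{resource_uri(title)}> ?property ?value .",
        f'  FILTER regex(str(?property), "^{PROPERTY_NS}")',
    ]
    for name in excluded:
        lines.append(f"  FILTER (?property != <{PROPERTY_NS}{name}>)")
    lines.append("}")
    return "\n".join(lines)

def refinement_queries(title):
    """Return one SPARQL query per refinement group."""
    return {
        "Categories": single_predicate_query(title, SUBJECT),
        "Wordnet categories": single_predicate_query(title, WORDNET_TYPE),
        "Context words": context_words_query(title),
    }

queries = refinement_queries("Jaguar")
for group, text in queries.items():
    print(f"-- {group}\n{text}\n")
```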

Usage scenarios: Autosuggestions

Dealing with ambiguous queries: Jaguar, the hero from Archie Comics

Usage scenarios: Query replacements

Usage scenarios: Query refinements

Discussion

So, can we compete with Google? Certainly not:

Linked data is full of ‘noise’
Things could improve if we all put some effort into it: http://pedantic-web.org/

SPARQL endpoints are often too slow to respond
Unions are expensive
“FILTER regex” clauses take forever to resolve
Maybe the database community will provide solutions that speed things up

Size matters
Google’s index is far larger and fresher

And much more…

Discussion

Then, why bother?
We believe that GContext can be seamlessly integrated with any major search engine that provides access to its search box

What about the ‘knowledge graph’?
Too early to jump to any conclusions. It was announced on May 16th and so far is only partially deployed

A proof that we are on the right track:
“… go deeper and broader”, i.e. infoboxes from DBpedia
“… Find the right thing”, i.e. PageRedirects from DBpedia

Discussion

Thank you very much

Questions?
