gcontext: a context-based query construction service for google
GContext: A context-based query construction service for Google
Ioannis Apostolatos and Ioannis Papadakis
Ionian University, Greece
Presentation outline
Introduction
Rationale
Proposed approach
Usage scenarios
Discussion
Introduction
On the web, information about virtually anything can be found, provided that a searcher knows where to look
Searchers largely rely on large-scale web search engines (SE) for assistance in locating useful resources
The quality of the search results depends on the ability of the searchers to accurately express their information needs as keywords in the search engine's input box
How do SE aid their users in creating successful queries?
Rationale
The query construction phase of a search session is crucial to the fulfillment of the searchers’ information needs
During the query construction phase, a searcher has to express his information needs according to the specific dialect (i.e. keyword-based) of the underlying SE
The searcher has to 'guess' the words that the SE has chosen to index the web resources that correspond to such needs
Rationale
Spoken languages have certain features that should be taken into consideration:
Polysemy of words
  Polysemy occurs when a word has more than one sense
  A query that consists of an ambiguous word, without further information that correctly disambiguates it, may result in a search results list with completely useless information
Synonymy of words
  Synonymy occurs when two or more words share the same meaning
  The probability of two persons using the same term in describing the same thing is less than 20%
Proposed approach
A query construction/refinement service on top of Google SE that is powered by the LOD cloud and especially DBpedia
The proposed service is a two-step process:
1. Initially, it provides autosuggest functionality by reacting to the searcher's keystrokes
  Prefix search is performed on an index comprising words and/or phrases that originate from Wikipedia and are made available through DBpedia (‘article titles’ dataset)
  Such functionality facilitates query disambiguation, since Wikipedia's disambiguated titles follow a pattern that prefix search favors, i.e. <ambiguous word> (disambiguation info), e.g. bass (fish)
  DBpedia’s suggestions are appended to Google’s original suggestions
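The autosuggest step can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the title list is a tiny hypothetical stand-in for the DBpedia 'article titles' dataset, the function names are invented for the example, and skipping case-insensitive duplicates when appending is an assumption.

```python
from bisect import bisect_left

# Hypothetical sample; the real index would be built from DBpedia's article titles.
SAMPLE_TITLES = [
    "Bass (fish)", "Bass (guitar)", "Bass (sound)", "Basset Hound",
    "Jaguar", "Jaguar (Archie Comics)", "Jaguar Cars",
]

def build_index(titles):
    """Sort titles by a case-folded key so prefixes form contiguous runs."""
    return sorted((t.casefold(), t) for t in titles)

def prefix_search(index, prefix, limit=5):
    """Return up to `limit` titles whose case-folded form starts with `prefix`."""
    key = prefix.casefold()
    start = bisect_left(index, (key,))  # first entry whose key is >= prefix
    matches = []
    for folded, title in index[start:]:
        if not folded.startswith(key) or len(matches) == limit:
            break
        matches.append(title)
    return matches

def merge_suggestions(google, dbpedia):
    """Append DBpedia suggestions to Google's, skipping duplicates (assumption)."""
    seen = {s.casefold() for s in google}
    merged = list(google)
    for s in dbpedia:
        if s.casefold() not in seen:
            seen.add(s.casefold())
            merged.append(s)
    return merged
```

Typing "bass (" narrows the suggestions to the disambiguated titles only, which is the pattern the slide refers to.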
Proposed approach
The proposed service is a two-step process (continued):
2. Upon selection of a suggestion, the searcher is offered the chance to refine the initial query through the interactions provided by the service (i.e. query replacements and refinements)
  Query replacements and refinements derive from the results of SPARQL queries addressed to DBpedia's endpoint
  Every interaction results in the construction of an appropriate query that is addressed to Google's Custom Search, which, in turn, provides the corresponding search results
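How an interaction might turn into a search query can be sketched as below. The slides do not say how GContext talks to Google's Custom Search, so the use of the Custom Search JSON API URL, the quoting of multi-word terms, and the function names are all assumptions made for illustration.

```python
from urllib.parse import urlencode

def quote_term(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases (assumption)."""
    return f'"{term}"' if " " in term else term

def build_query(selection, refinements=(), replacement=None):
    """Combine the chosen suggestion (or its replacement) with refinement keywords."""
    base = replacement if replacement is not None else selection
    return " ".join(quote_term(t) for t in [base, *refinements])

def custom_search_url(query, api_key, engine_id):
    """Build a Custom Search JSON API request URL; api_key and engine_id are placeholders."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)
```

For example, selecting "Jaguar (Archie Comics)" and adding the refinement "superhero" yields the query `"Jaguar (Archie Comics)" superhero`.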
Proposed approach – under the hood: Query replacements
Words or phrases that correspond to alternatives to the suggestion the user has chosen from the search box
They are actually Wikipedia's redirects to the title of the article that the user selected from the search box
The SPARQL query revolves around the <http://dbpedia.org/ontology/wikiPageRedirects> predicate
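A query of this kind could look roughly like the sketch below. The exact query used by the service is not shown in the slides; the helper names, the simple title-to-IRI mapping (spaces to underscores), and the `format` request parameter (a convention of DBpedia's Virtuoso endpoint, not of the SPARQL protocol itself) are assumptions.

```python
from urllib.parse import unquote, urlencode

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"

def resource_iri(title):
    """Map an article title to a DBpedia resource IRI (simplified: spaces become underscores)."""
    return RESOURCE_PREFIX + title.replace(" ", "_")

def redirects_query(title, limit=20):
    """SPARQL query listing the pages that redirect to the selected article."""
    return (
        "SELECT DISTINCT ?alt WHERE {\n"
        f"  ?alt <http://dbpedia.org/ontology/wikiPageRedirects> <{resource_iri(title)}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )

def request_url(query):
    """GET URL for the endpoint; 'format' is assumed to be honored by the server."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)

def label_from_iri(iri):
    """Turn a redirect IRI back into a human-readable replacement phrase."""
    return unquote(iri[len(RESOURCE_PREFIX):]).replace("_", " ")
```

Each `?alt` binding returned by the endpoint would then be converted with `label_from_iri` and offered as a query replacement.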
Proposed approach – under the hood: Query refinements
Query refinements are keywords that a user can add to the initial query in order to semantically refine it. They are organized in three groups: Categories, Wordnet categories and Context words
The 'Categories' group is populated with the categories of the Wikipedia article that the user selected from the search box
  The corresponding SPARQL query revolves around the <http://purl.org/dc/terms/subject> predicate
The 'Wordnet categories' group is populated with the Wordnet categories of the Wikipedia title that the user selected from the search box
  The corresponding SPARQL query revolves around the <http://dbpedia.org/property/wordnet_type> predicate
The 'Context words' group is populated with information deriving from the infobox of the corresponding Wikipedia article
  The corresponding SPARQL query revolves around the <http://dbpedia.org/property/.*> predicates, along with numerous ‘FILTER’ clauses
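The three groups could be fetched with queries along the lines of the sketch below. Only the predicates come from the slides; the function name, the variable names, and the two FILTER clauses shown for 'Context words' are illustrative assumptions, since the service's actual filters are not listed.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"
SUBJECT_PREDICATE = "http://purl.org/dc/terms/subject"
WORDNET_PREDICATE = "http://dbpedia.org/property/wordnet_type"

def refinement_queries(title, limit=20):
    """Return one SPARQL query per refinement group for the selected article title."""
    subject = f"<{RESOURCE_PREFIX}{title.replace(' ', '_')}>"

    def select(body):
        return "SELECT DISTINCT ?value WHERE {\n" + body + "}\nLIMIT " + str(limit)

    return {
        "Categories": select(f"  {subject} <{SUBJECT_PREDICATE}> ?value .\n"),
        "Wordnet categories": select(f"  {subject} <{WORDNET_PREDICATE}> ?value .\n"),
        # Infobox properties: any predicate under the dbpedia.org/property/ namespace.
        # The filters below are examples only, not the service's real filter set.
        "Context words": select(
            f"  {subject} ?p ?value .\n"
            '  FILTER regex(str(?p), "^http://dbpedia.org/property/")\n'
            f"  FILTER (?p != <{WORDNET_PREDICATE}>)\n"
        ),
    }
```

The regex filter over `?p` is the kind of clause that makes such queries slow on public endpoints, since it cannot be answered from a predicate index alone.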
Usage scenarios: Autosuggestions
Dealing with ambiguous queries: Jaguar, the hero from Archie Comics
Usage scenarios: Query replacements
Usage scenarios: Query refinements
Discussion
So, can we compete with Google? Certainly not:
Linked data is full of ‘noise’
  Things could improve if we all put some effort into it: http://pedantic-web.org/
SPARQL endpoints are often too slow to respond
  Unions are expensive
  “FILTER regex” clauses take forever to resolve
  Maybe the database community will provide solutions that speed things up
Size matters
  Google’s index is far larger and fresher
And much more…
Discussion
Then, why bother?
  We believe that GContext can be seamlessly integrated with any major search engine that provides access to its search box
What about the ‘knowledge graph’?
  Too early to jump to any conclusions. It was announced on May 16th and has so far been only partially deployed
  Proof that we are on the right track:
    “… go deeper and broader” i.e. infoboxes from DBpedia
    “… Find the right thing” i.e. PageRedirects from DBpedia
Discussion
Thank you very much,
Questions?