
Brainwriting

COST KEYSTONE Meeting

Leiden, The Netherlands

24-25 March 2014

Analysis Procedure

• Task 1: Organize – Clustered into 6 groups based on the questions addressed

• Task 2: Review – 5 people were assigned to read the forms of each cluster

• Task 3: Read – Read the forms and highlighted the main keywords in the text with a pen

• Task 4: Write – Each reviewer wrote a “bullet list” summary report

• Task 5: Integrate – One reviewer prepared the slides by integrating duplicated content across questions

Question 1: Challenges

• What are the 3 main challenges in keyword-based search for structured data sources and data analytics?


• Understand Query Semantics

– Associate Keywords to Concepts

– Match User profile

– Different Facets of the same dataset

– For being queried by different people



• Query Compilation

– Relevant Keywords

• More than a flat list of keywords

– Uncertainty and Session


• Presentation of Results

– Annotate & Contextualize the results

– Give IDs for future presentation, Use and Re-Use

– Enhance the results with more related information

– Visualization

– Explanations



• Other

– Evolution

– Performance of different Algorithms

Question 2: Scenario/Use case

• Which practical scenarios do you think can benefit from keyword search research on big data?

Question 3: Methods

• How can the user be supported in the formulation of keyword queries and the analysis of the obtained results?


• Exploit an ontology/semantic network

– Analysis of the keywords against an ontology

– Suggest related keywords and rank results

– Use sources like DBpedia / Freebase

– Links between keywords / link to other keywords

• User profiling

– Popular queries/keywords from similar users

• Exploit statistics and historical data
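
As a rough illustration of the "analysis of the keywords against an ontology" idea, the sketch below expands a keyword query with related terms. The ontology here is a tiny hand-written dictionary and the names `TOY_ONTOLOGY` and `expand_query` are hypothetical; a real system might draw these links from a source such as DBpedia, which is not attempted here.

```python
# Hypothetical toy ontology: each keyword maps to related terms.
# This is an assumption for illustration, not data from any real source.
TOY_ONTOLOGY = {
    "car": ["automobile", "vehicle"],
    "jaguar": ["big cat", "car brand"],
}


def expand_query(keywords, ontology):
    """Return the lower-cased keywords followed by their related terms.

    Order is preserved and duplicates are dropped.
    """
    expanded = {}  # dict keeps insertion order (Python 3.7+)
    for keyword in keywords:
        key = keyword.lower()
        expanded[key] = None
        for related in ontology.get(key, []):
            expanded[related] = None
    return list(expanded)


if __name__ == "__main__":
    print(expand_query(["Jaguar", "car"], TOY_ONTOLOGY))
```

The expanded list could then feed the suggestion and ranking steps named above; how the related terms are weighted is left open.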


• Auto-completion

– Exploiting images

– Query expansion / enrichment

• Disambiguation

– Query and acronym

• Displaying results

– Interactive / incremental learning technique

– Faceted results

– Summaries

• Finding related queries
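
One simple way to picture auto-completion driven by historical data is to rank past queries that share the typed prefix by how often they occurred. The sketch below assumes a plain list of past query strings; the function name `suggest` and the sample log are hypothetical.

```python
from collections import Counter


def suggest(prefix, query_log, limit=3):
    """Suggest past queries starting with `prefix`, most frequent first.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(query.lower() for query in query_log)
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


if __name__ == "__main__":
    # Hypothetical query log, for illustration only.
    log = ["open data", "open data portal", "open data", "ontology", "open source"]
    print(suggest("op", log))
```

The same frequency table could also serve as a starting point for finding related queries, for example by grouping queries from similar users.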

Question 3: Methods

• What should be the result of a keyword query in big data?


• Two outputs from search engines

– One for humans, the second for machines

• Structure

– Ontologies

– Graph-based structures: nodes with multiple meanings associated with the results / application of community detection algorithms

– Clusters and subclusters providing results with different levels of granularity

– Explain why results were selected


• Returned content

– Manage ambiguity

– Provide small chunks of data

– Provide results with a significance measure

• Return multimedia data related to a query

Question 4: Benchmarking/Evaluation

• What kind of benchmarking environments (which include scalability, accuracy, and feasibility) can be devised for big data analysis?

• Content

• Functionality

• Environments

• Metrics



• Content

– Wikipedia, WordNet, social networks, manufacturing

– Multilayer integration framework

• Linked users, linked followers, spreading of information

– Combine log files from information systems

– Combine log files from information systems



• Functionality

– Provenance

– Visualization techniques

– Compare different algorithms



• Environment

– Real-time vs. offline

– Software vs. hardware/infrastructure

– Go-To center

• Evaluate software/hardware requirements for specific analytics over big data



• Metrics

– Standardized benchmarks

– Guidelines for benchmarking

Question 5: Application Fields

• How can the results of KEYSTONE be used to add value to open data and foster the creation of data-intensive companies?


• Guidelines

– To create standardized vocabularies

– Standardized vocabularies to federate data sources

– To create open data repositories

– To anonymize data from companies

• Evaluation of current techniques and tools

• User-friendly tools

– To create search services and integrate resources


• Software focused on semantic search, mining data, sentiment analysis and machine learning


a) Other results

• Sentiment analysis techniques

• Machine Learning and Data Mining techniques

• To improve the quality of the retrieval

• Intuitive tools for search and publication of structured data

• Scalable Semantic search algorithms

• Connections between academics/researchers and real-world/enterprises/companies

• H2020 writing

• Gathering the expertise from different groups
