Brainwriting - keystone-cost.eu
Analysis Procedure
• Task 1: Organize – Clustered into 6 groups based on the questions addressed
• Task 2: Review – 5 people were assigned to read the forms of each cluster
• Task 3: Read – Read and highlight main keywords in text with a pen
• Task 4: Write – Each reviewer wrote a “bullet list” summary report
• Task 5: Integrate – One reviewer prepared the slides by integrating duplicated content across questions
Question 1: Challenges
• What are the 3 main challenges in keyword-based search for structured data sources and data analytics?
Question 1: Challenges
• Understand Query Semantics
– Associate Keywords to Concepts
– Match User profile
– Different Facets of the same dataset, for querying by different people
• Query Compilation
– Relevant Keywords
• More than a flat list of keywords
– Uncertainty and Session
Question 1: Challenges
• Presentation of Results
– Annotate & Contextualize the results
– Give IDs for future presentation, Use and Re-Use
– Enhance the results with more related information
– Visualization
– Explanations
• Other
– Evolution
– Performance of different Algorithms
Question 2: Scenario/Use case
• Which practical scenarios do you think can benefit from keyword search research on big data?
Question 3: Methods
• How can the user be supported in the formulation of keyword queries and the analysis of the obtained results?
Question 3: Methods
• Exploit an ontology/semantic network
– Analysis of the keywords against an ontology
– Suggest related keywords and rank results
– Use sources like DBpedia / Freebase
– Links between keywords / link to other keywords
• User profiling
– Popular queries/keywords from similar users
• Exploit statistics and historical data
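The idea of suggesting related keywords from a semantic network and ranking them with historical statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not a method reported by the workshop: the toy `SEMANTIC_NETWORK` dictionary stands in for a real source such as DBpedia, and `QUERY_LOG` and `suggest_related` are hypothetical names.

```python
from collections import Counter

# Hypothetical toy semantic network: keyword -> related concepts.
# A real system might derive these links from DBpedia or a similar source.
SEMANTIC_NETWORK = {
    "jaguar": ["animal", "car", "cat"],
    "car": ["vehicle", "jaguar", "engine"],
    "python": ["snake", "programming language"],
}

# Hypothetical query log standing in for statistics and historical data.
QUERY_LOG = ["car", "car", "animal", "engine", "car", "animal"]


def suggest_related(keyword, network=SEMANTIC_NETWORK, log=QUERY_LOG, limit=3):
    """Return keywords related to `keyword`, most frequently queried first."""
    popularity = Counter(log)
    related = network.get(keyword.lower(), [])
    # Sort by descending popularity, then alphabetically so ties are stable.
    ranked = sorted(related, key=lambda k: (-popularity[k], k))
    return ranked[:limit]


print(suggest_related("jaguar"))
```

The ranking key here is plain query frequency; a user-profiling variant could restrict the log to queries from similar users before counting.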
Question 3: Methods
• Auto-completion
– Exploiting images
– Query expansion / enrichment
• Disambiguation
– Query and acronym
• Displaying results
– Interactive / incremental learning technique
– Faceted results
– Summaries
• Finding related queries
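As one possible reading of the auto-completion item, the sketch below completes a typed prefix from previously issued queries, most frequent first. It is only an illustration: `build_index`, `autocomplete`, and the sample queries are hypothetical and do not come from the workshop material.

```python
from collections import Counter


def build_index(past_queries):
    """Count how often each (normalized) past query was issued."""
    return Counter(q.strip().lower() for q in past_queries)


def autocomplete(prefix, index, limit=3):
    """Return past queries starting with `prefix`, most frequent first."""
    prefix = prefix.strip().lower()
    matches = [q for q in index if q.startswith(prefix)]
    # Descending frequency, then alphabetical order for ties.
    return sorted(matches, key=lambda q: (-index[q], q))[:limit]


# Hypothetical query history.
past = ["big data", "big data analytics", "big data", "bigram model", "keyword search"]
index = build_index(past)
print(autocomplete("big", index))
```

A linear scan is enough for a small history; a prefix tree would be the usual choice once the query log grows.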
Question 3: Methods
• Two outputs from search engines
– One for humans, the second for machines
• Structure
– Ontologies
– Graph-based structures: nodes with multiple meanings associated with the results / application of community detection algorithms
– Clusters and subclusters providing results with different levels of granularity
– Explain why results were selected
Question 3: Methods
• Returned content
– Manage ambiguity
– Provide small chunks of data
– Provide results with a significance measure
• Return multimedia data related to a query
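Two of the items above, small chunks of data and a significance measure per result, can be combined in a short sketch. The measure used here (the fraction of query keywords found in a result) is an assumption chosen for simplicity, and `significance` and `ranked_chunks` are hypothetical names.

```python
def significance(query_keywords, document_text):
    """Fraction of query keywords that occur in the document (0.0 to 1.0)."""
    words = set(document_text.lower().split())
    keywords = [k.lower() for k in query_keywords]
    if not keywords:
        return 0.0
    return sum(k in words for k in keywords) / len(keywords)


def ranked_chunks(query_keywords, documents, chunk_size=2):
    """Yield (score, document) pairs in small chunks, highest score first."""
    scored = [(significance(query_keywords, d), d) for d in documents]
    scored = [item for item in scored if item[0] > 0]  # drop non-matching results
    scored.sort(key=lambda item: -item[0])
    for start in range(0, len(scored), chunk_size):
        yield scored[start:start + chunk_size]


# Hypothetical result set.
docs = [
    "keyword search over structured data",
    "big data benchmarking",
    "structured data analytics",
]
for chunk in ranked_chunks(["structured", "data", "search"], docs):
    print(chunk)
```

Returning the score alongside each result also gives a simple basis for explaining why it was selected.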
Question 4: Benchmarking/Evaluation
• What kind of benchmarking environments (which include scalability, accuracy, and feasibility) can be devised for big data analysis?
• Content
• Functionality
• Environments
• Metrics
Question 4: Benchmarking/Evaluation
• Content
– Wikipedia, WordNet, social networks, manufacturing
– Multilayer integration framework
• Linked users, linked followers, spreading of information
– Combine log files from information systems
Question 4: Benchmarking/Evaluation
• Functionality
– Provenance
– Visualization techniques
– Compare different algorithms
Question 4: Benchmarking/Evaluation
• Environment
– Realtime vs. offline
– Software vs. hardware/infrastructure
– Go-To center
• Evaluate software/hardware requirements for specific analytics over big data
Question 4: Benchmarking/Evaluation
• Metrics
– Standardized benchmarks
– Guidelines for benchmarking
Question 5: Application Fields
• How can the results of KEYSTONE be used to add value to open data and foster the creation of data intensive companies?
Question 5: Application Fields
• Guidelines
– To create standardized vocabularies
– Standardized vocabularies to federate data sources
– To create open data repositories
– To anonymize data from companies
• Evaluation of current techniques and tools
• User-friendly tools
– To create search services and integrate resources
Question 5: Application Fields
• Software focused on semantic search, mining data, sentiment analysis and machine learning
• Evaluation of current techniques and tools
• User-friendly tools to create search services and integrate resources
a) Other results
• Sentiment analysis techniques
• Machine Learning and Data Mining techniques
• To improve the quality of the retrieval
• Intuitive tools for search and publication of structured data
• Scalable Semantic search algorithms
• Connections between academics/researchers and real-world/enterprises/companies
• H2020 writing
• Gathering the expertise from different groups