Event reports
Tomáš Kliegr
• EDBT 2008
• KDD 2008
• ECML/PKDD 2008
• SSMS 2008
EDBT 2008
International Conference on Extending Database Technology
Nantes, France
Paper - PhD Workshop
• Higher acceptance rate than in previous years (about 46%)
• Dimensionality Reduction of Semantically Enriched Clickstreams
• Excellent feedback: 5 extensive reviews
• Very broad range of topics
• Postproceedings: extended version of the conference paper published in the ACM DL
Selected workshop papers
• Improving the Accuracy of Entity Identification through Refinement
  – The goal of entity identification is to correctly identify all the instances of the same entity so as to eliminate the inconsistency of data sources during data integration.
• Full-text indexing and Information Retrieval in P2P Systems – Distributed IR
• Reasoning about Taxonomies and Articulations
  – This work formalizes taxonomies and relationships between them as formulas in logic. This formalization concretizes notions such as consistency and inconsistency of taxonomies and articulations (inter-taxonomic relations) between them, enables the derivation of new articulations based on a given set of taxonomies and articulations and provides a framework for testing assumptions about under-specified taxonomies.
Other Czech participants
• A Cost-based Join Selection for XML Twig Content-based Queries
• Radim Baca, Michal Kratky
Conference focus
• P2P
• XML
• Streaming
• Caching
• Query Processing
• Data Fusion
Industrial section
• Data Challenges at Yahoo!
  – Ricardo Baeza-Yates and Raghu Ramakrishnan
• Automatic Content Targeting on Mobile Phones
KDD 2008
14th ACM SIGKDD International Conference, Las Vegas
Paper - MDM KDD Workshop
• Combining Image Captions and Visual Analysis for Image Concept Classification
  – Kliegr, Svátek, Nemrava, Chandramouli, Izquierdo
• As a point of interest, Pavel Praks published at the same workshop in the past:
Multimedia Data Mining Workshop (Pavel Praks ’05): Iris Recognition Using the SVD-Free Latent Semantic Indexing
Interesting workshop papers
• Annotating images and image objects using a hierarchical Dirichlet process model
  – We apply this model for predicting labels of objects in images containing multiple objects. During training, the model has access to an un-segmented image and its caption, but not the labels for each object in the image. The trained model is used to predict the label for each region of interest in a segmented image.
• Mining the Web for Visual Concepts
  – Relevance feedback on image + text data retrieved from the web
Interesting conference papers
• Building Semantic Kernels for Text Classification using Wikipedia
  – In this paper, we overcome the shortcomings of the bag-of-words (BOW) approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents.
• Entity Categorization Over Large Document Collections
  – In this paper, we significantly improve the accuracy of entity categorization by (i) considering an entity’s context across multiple documents containing it, and (ii) exploiting existing large lists of related entities (e.g., lists of actors, directors, books).
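The semantic-kernel idea from the Wikipedia paper above can be illustrated with a toy example. This is only a minimal sketch of one common form of semantic kernel, in which documents are projected through a term-to-concept proximity matrix before taking a dot product. The vocabulary, the hand-written matrix `PROXIMITY`, and all function names are assumptions made for illustration; the slides do not describe how the paper actually derives its matrix from Wikipedia.

```python
# Toy vocabulary (assumed): index 0 = "puma", 1 = "cougar", 2 = "stock".
# Toy concepts (assumed):   index 0 = "big cat", 1 = "finance".
# PROXIMITY[t][c] says how strongly term t relates to concept c.
# In the paper this knowledge comes from Wikipedia; here it is hand-written.
PROXIMITY = [
    [1, 0],  # puma   -> big cat
    [1, 0],  # cougar -> big cat
    [0, 1],  # stock  -> finance
]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(doc, proximity):
    """Map a bag-of-words vector into concept space (doc times proximity)."""
    n_concepts = len(proximity[0])
    return [
        sum(doc[t] * proximity[t][c] for t in range(len(doc)))
        for c in range(n_concepts)
    ]


def bow_kernel(d1, d2):
    """Bag-of-words kernel: only exact term overlap counts."""
    return dot(d1, d2)


def semantic_kernel(d1, d2, proximity):
    """Kernel computed after enriching both documents with concepts."""
    return dot(project(d1, proximity), project(d2, proximity))


doc_a = [1, 0, 0]  # mentions "puma"
doc_b = [0, 1, 0]  # mentions "cougar"
doc_c = [0, 0, 1]  # mentions "stock"

print(bow_kernel(doc_a, doc_b))
print(semantic_kernel(doc_a, doc_b, PROXIMITY))
print(semantic_kernel(doc_a, doc_c, PROXIMITY))
```

In this toy setup the two documents about the same animal share no terms, so the bag-of-words kernel sees them as unrelated, while the enriched kernel gives them a non-zero similarity.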
ArnetMiner: Extraction and Mining of Academic Social Networks
• The system covers: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results.
Heterogeneous Data Fusion for Alzheimer’s Disease Study
• In this paper, we propose to integrate heterogeneous data for Alzheimer’s disease (AD) prediction based on a kernel method. We further extend the kernel framework for selecting features (biomarkers) from heterogeneous data sources.
• Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy.
Febrl – An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface
• Febrl (Freely Extensible Biomedical Record Linkage)
• It contains many recently developed techniques for data cleaning, deduplication and record linkage, and encapsulates them into a graphical user interface (GUI).
• https://sourceforge.net/projects/febrl/
Using tagFlake for Condensing Navigable Tag Hierarchies from Tag Clouds
• Luigi Di Caro (University of Torino)
• K. Selçuk Candan (Arizona State University)
• Maria Luisa Sapino (University of Torino)
Pictor: An Interactive System for Importing Data from a Website
• A demonstration of an interactive wrapper induction system, called Pictor, which is able to minimize labeling cost, yet extract data with high accuracy from a website. Our demonstration will introduce two proposed technologies: record-level wrappers and a wrapper-assisted labeling strategy. These approaches allow Pictor to exploit previously generated wrappers, in order to predict similar labels in a partially labeled webpage or a completely new webpage.
Trends
• Text mining
• Web advertising
• The 2nd International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD 2008)
• Medical data mining
  – Workshop on Mining Medical Data and KDD Cup 2008
Further highlights
• The 2nd SNA-KDD Workshop on Social Network Mining and Analysis (SNA-KDD 2008)
• Workshop on Mining Medical Data and KDD Cup 2008
• The 2nd International Workshop on Mining Multiple Information Sources
ECML/PKDD 2008
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Antwerp, Belgium
Paper at the WBBT Workshop
• Wikis, Blogs and Bookmarking Tools workshop
• Chair: Bettina Berendt
• Wikipedia As the Premiere Source for Targeted Hypernym Discovery
  – Tomas Kliegr, Vojtech Svatek, Krishna Chandramouli, Jan Nemrava and Ebroul Izquierdo
• http://www.kde.cs.uni-kassel.de/ws/wbbtmine2008/pdf/all_wbbtmine2008.pdf
Selected invited talks
• The Role of Hierarchies in Exploratory Data Mining
  – In a broad range of data mining tasks, the fundamental challenge is to efficiently explore a very large space of alternatives. The difficulty is two-fold: first, the size of the space raises computational challenges, and second, it can introduce data sparsity issues even in the presence of very large datasets. In this talk, we’ll consider how the use of hierarchies (e.g., taxonomies, or the OLAP multidimensional model) can help mitigate the problem.
Learning Language from Its Perceptual Context
• Raymond J. Mooney
• The training data consists of textual human commentaries on Robocup simulation games. A set of possible alternative meanings for each comment is automatically constructed from game event traces. Our previously developed systems for learning to parse and generate natural language (KRISP and WASP) were augmented to learn from this data and then commentate novel games. The system is evaluated based on its ability to parse sentences into correct meanings and generate accurate descriptions of game events.
Watch, Listen & Learn: Co-training on Captioned Images and Videos
• The approach leverages the text that often accompanies visual data to learn robust models of scenes and actions from partially labeled collections. It uses co-training.
Co-training
• A semi-supervised learning algorithm that requires two distinct “views” of the training data
• First learns a separate classifier for each view using any labeled examples
• The most confident predictions of each classifier on the unlabeled data are then used to iteratively construct additional labeled training data.
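The co-training steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup from the talk: the per-view classifier is a simple nearest-centroid model chosen only to keep the example self-contained, confidence is taken to be the distance margin between the two closest centroids, and the names `co_train`, `centroid_fit`, `centroid_predict`, `rounds` and `per_round` are hypothetical.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid model: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def centroid_predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, center)), label)
        for label, center in centroids.items()
    )
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else 0.0
    return best_label, margin


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Co-training over two views; labels[i] is None when unlabeled."""
    labels = list(labels)
    for _ in range(rounds):
        labeled = [i for i, lab in enumerate(labels) if lab is not None]
        unlabeled = [i for i, lab in enumerate(labels) if lab is None]
        if not unlabeled:
            break
        newly_labeled = {}
        for view in (view_a, view_b):
            # Learn a separate classifier for this view from labeled data.
            model = centroid_fit(
                [view[i] for i in labeled], [labels[i] for i in labeled]
            )
            scored = [(centroid_predict(model, view[i]), i) for i in unlabeled]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            # Keep only the most confident predictions of this classifier.
            # If both views pick the same example, the first view wins.
            for (label, _confidence), i in scored[:per_round]:
                newly_labeled.setdefault(i, label)
        # Add the confident predictions to the labeled training data.
        for i, label in newly_labeled.items():
            labels[i] = label
    return labels


# Toy data (assumed): view A could be a caption feature, view B a visual one.
view_a = [[0.0], [0.2], [0.1], [1.0], [0.9], [0.8]]
view_b = [[5.0], [5.1], [4.9], [9.0], [9.2], [8.8]]
seed_labels = ["cat", None, None, "dog", None, None]

result = co_train(view_a, view_b, seed_labels)
print(result)
```

Starting from one labeled example per class, each round lets both views nominate their most confident unlabeled examples, so the labeled pool grows until nothing is left to label or the round limit is reached.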
SSMS 2008
3rd Summer School on Multimedia Semantics, Chania, Crete
Overview presentations
• http://www.mesh-ip.eu/ssms08.aspx?Page=ssms08
• Presentations are available for download
Wrap up
• ECML 09
  – Bled, Slovenia
  – 7 Sep 2009 - 11 Sep 2009
• KDD 09
  – Paris, France
  – Jun 28-Jul 1, 2009
• EDBT/ICDT 2009
  – Saint-Petersburg, Russia
  – March 23-26