Impact of Open Data and Linked Open Data
Venezuela
Maria-Esther Vidal
Universidad Simón Bolívar
[email protected] http://www.ldc.usb.ve/~mvidal Twitter: @Maria11576561 Skype: mevs2006
Lights around London's 2012 Olympic stadium describe Sir Tim Berners-Lee's invention, the World Wide Web. The Open Data Institute, which he co-founded, declares a mandate of 'Knowledge for Everyone'.
Sir Tim Berners-Lee (right) and Sir Nigel Shadbolt (left)
“The ODI announced 13 new nodes: US, Canada, France, Dubai, Italy, Russia, Sweden and Argentina.” Oct 29, 2013
Agenda
Ø Open Data
Ø Linked Open Data
ü Linked Open Data in Journalism
Ø Linked Open Data Applications
ü Linked Open Data at USB
Ø Conclusions and Future Directions
OPEN DATA
Open Data
Definition http://opendefinition.org/: “A piece of data or content is open if anyone is
free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”
Availability and access Reuse and Distribution Universal Participation
Image source: http://ec.europa.eu/digital-agenda/sites/digital-agenda/files/Open_Data_stickers.jpg
Open Data
Availability and Access: Data should be available as a whole, preferably by download over the Internet.
Data should be available in a convenient format.
Data should be free of charge or offered at no more than the cost of reproduction.
Open Data
Reuse and Distribution: Data should be offered under terms that allow it to be reused, redistributed and combined with other datasets.
Open Data
Universal Participation: Any person should be able to use, reuse and redistribute the data.
No discrimination between:
Commercial vs. non-commercial use
Educational vs. non-educational use
For-profit vs. non-profit use
Types of Open Data
Why Open Data?
Interoperability Transparency
Why Open Data?
Avoid Corruption
Wealth: in Europe alone, over 140 billion euros per year http://www.economist.com/news/business/21578084-making-official-data-public-could-spur-lots-innovation-new-goldmine
Why Open Data?
Quality of Life Research and Development
Why Open Data?
Data Quality Improve Public Administration
Why Open Data?
Citizens can express themselves and unite so that their voices can be heard.
http://www.ted.com/talks/sanjay_pradhan_how_open_data_is_changing_international_aid.html
Open Licenses
Open Source
Open Standards
Open Participation
Open Data
What is and what is not Open Data
“A piece of content or data is open if you are free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”
The difference between open data and data that is merely publicly available lies in the use of formats that any citizen can read, process and redistribute. Examples of public data that is not open data: data locked in spreadsheets, PDF files, etc. Open data is usually published as CSV. http://opensource.com/government/10/12/what-“open-data”-means-–-and-what-it-doesn’t
Opening Up Data Rules
Ø Keep it simple
Ø Engage early and engage often
Ø Address common fears and misunderstandings
Four Steps
Ø Choose your Dataset(s)
Ø Apply an Open License
Ø Make the data available
Ø Make it discoverable
Open Data Conditions
Data Providers' Requirements
Ø Attribution: data providers may require that they receive credit.
Ø Integrity: data providers may require that users indicate whether the data has been changed.
Ø Share-alike: data providers may require that any dataset created using their data is also open.
Distributing Open Data
Ø Data is machine-readable
Ø Data is available in bulk, rather than only through an API.
http://opendatahandbook.org/en/
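Why machine-readable bulk data matters can be illustrated with a minimal sketch. It assumes a small CSV export whose column names and figures are invented for illustration; they do not come from any real dataset.

```python
import csv
import io

# Hypothetical bulk export: the columns and values are invented for illustration.
SAMPLE_CSV = """region,year,budget
North,2012,1200
South,2012,800
North,2013,1500
"""

def total_by_region(csv_text):
    """Sum the 'budget' column per region from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["budget"])
    return totals

print(total_by_region(SAMPLE_CSV))
```

The same aggregation over a PDF or a scanned table would first require manual transcription, which is the practical difference between public data and open data.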
OPEN DATA APPLICATIONS
OPEN DATA AND GOVERNMENT
Some Open Data Applications
At least 77 countries comply with level > 2
http://wheredoesmymoneygo.org/
Why Open Data?
Citizens may unite so that
their voices can be heard
Why Open Data?
http://www.ted.com/talks/sanjay_pradhan_how_open_data_is_changing_international_aid.html
Monrovia, Africa
Why Open Data?
Tanzania
http://www.ted.com/talks/sanjay_pradhan_how_open_data_is_changing_international_aid.html
http://www.tableausoftware.com/public/gallery/london-deprivation
OPEN DATA AND PUBLIC HEALTH
Vaccines and Immunisation in Australia
http://www.theguardian.com/society/datablog/interactive/2013/oct/16/children-vaccination-australia-map
http://www.cfr.org/interactives/GH_Vaccine_Map/index.html#map
Applications of Open Data http://pinterest.com/socrata/open-data-applications/
Kenya
OPEN DATA AND SOCIETY
Applications of Open Data http://www.crimemapping.com/
http://www.tableausoftware.com/public/gallery/tcc13friends
OPEN DATA AND FINANCES
http://opencorporates.com/viz/financial/index.html#bankofamerica/ch/934
Who Owns Who
http://opencorporates.com/viz/financial/index.html#goldman/ch/2213
Who Owns Who
OPEN DATA AND ENVIRONMENT
Why Open Data?
Improve Public Services
Improve Public Administration
Smart Cities
http://visualization.geblogs.com/visualization/co2/
http://www.visualizing.org/full-screen/27036
Applications of Open Data http://opendatachallenge.org/
LINKED OPEN DATA
http://www.ted.com/talks/lang/en/tim_berners_lee_the_year_open_data_went_worldwide.html
What to do with Open Data?
At least 77 countries comply with level > 2 http://www.slideshare.net/mgarrigap/opendata-en-el-ararteko
What to do with Open Data?
At least 11 countries comply with level > 4 http://www.slideshare.net/mgarrigap/opendata-en-el-ararteko
What to do with Open Data?
http://www.theguardian.com/news/datablog/2013/oct/28/uk-top-open-data-index-how-countries-compare#!
Bottom 10 by Open Data Index Score
http://www.theguardian.com/news/datablog/2013/oct/28/uk-top-open-data-index-how-countries-compare
Local governments must use Open Data to stay connected with their citizens!
IRMLs 2010-ESWC 2010
Motivation: Semantic Web Evolution
80’s: Arpanet: four servers connected. Files were transferred. Tools: ftp, telnet, e-mail.
90’s: Hyperlink-based systems. Protocols: http, uri, html. Documents and data were published.
00’s: Published data are enhanced with semantics! Standards to annotate and describe data: XML, RDF, RDFS, OWL. Standards to query data: SPARQL. Ontologies representing almost any domain.
Now: The Linked Open Data cloud, using the Web to connect related data that was not previously linked!
The Linked Open Data Cloud
• Explosion in the number of:
– Linked Open Data resources and databases
– Different quality parameters
– Controlled vocabularies: MeSH, GO, PO…
– Highly interconnected data sources: different sizes, many links, different in- and out-degrees, etc.
• Biological Web: large datasets of linked data.
– Genes, Diseases, Clinical Drugs, Proteins, and so on.
Statistics
• Molecular databases: 1170, 95 more than in 2008 and 110 more than the year before! Services and tools published by these databases follow a similar progression!
• In October 2007, the Linked Data cloud consisted of over two billion RDF triples, interlinked by over two million RDF links.
• By May 2009 this had grown to 4.2 billion RDF triples, interlinked by around 142 million RDF links!
• Today the Linked Open Data cloud has at least 295 datasets, 31,634,213,770 triples, and 503,998,829 links.
LINKED DATA IN JOURNALISM
Open Data in Journalism
Ø It may be trendy, but it is not new.
Ø Open Data implies Open Data Journalism.
Ø Data is not necessarily curated.
Ø Bigger datasets and small things.
Ø Data Journalism is 80% perspiration, 10% great ideas, 10% output.
Ø Long and short-form.
Ø Anyone can do it.
Ø Visualization is important.
Ø Data publishers do not have to be programmers.
Ø It is all about stories.
http://www.theguardian.com/news/datablog/2011/jul/28/data-journalism
Open Data in Journalism
Shared Data, Running Events, Breaking News, Open Data
Data Integration
• Data Cleansing
• Conflict Resolution
Semantification
• Meta-Data Annotation
• Vocabularies
Publication
• Visualization
• Publishing the Story
Meta-Data: BBC News
http://www.bbc.co.uk/blogs/internet/posts/News-Linked-Data-Ontology
This will help users to find news content about the stories they want to know about and ultimately help to open up references to the data contained in those stories.
Data Management Tools: BBC News
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
http://www.slideshare.net/moustaki/linked-data-on-the-bbc-2638734
More Ontologies to Represent Meta-Data
VISUALIZING LINKED OPEN DATA
Challenges for Linked Data Visualization
EUCLID – Interaction with Linked Data
• Enabling user interaction
– Users must be able to navigate through the data by exploiting the connections between Linked Data resources
– The user might edit the underlying data to enrich it by:
• Creating additional metadata
• Highlighting or correcting errors
• Validating data
• Supporting data reusability
– The output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies
• Scalability
– Linked Data visualization techniques should support the display of large amounts of data in an efficient way
Challenges for Linked Open Data Visualization
• Extracting data from different repositories
– A Linked Data set might be partitioned into several repositories
– The region of interest (ROI) might include data from different data sets, requiring access to distributed repositories
• Handling heterogeneous data
– The same data (concepts) might be modeled differently, for example, using different vocabularies
– Certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
• Dealing with missing values
– Due to the semi-structured nature of Linked Data, some instances might have missing values for certain properties
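The heterogeneous date formats and missing values listed above can be handled with a small normalization step before plotting. The sketch below is only one possible approach. The function name and the `day_first` flag are illustrative assumptions, and the caller has to know which convention a source uses, because a value such as 03-04-2013 is ambiguous on its own.

```python
from datetime import datetime

def normalize_date(value, day_first=True):
    """Return an ISO 8601 string (YYYY-MM-DD or YYYY), or None when the
    value is missing or does not match the expected format."""
    if value is None or not value.strip():
        return None  # missing value: keep it explicit instead of guessing
    value = value.strip()
    if len(value) == 4 and value.isdigit():
        return value  # year-only value, kept at its original precision
    fmt = "%d-%m-%Y" if day_first else "%m-%d-%Y"
    try:
        return datetime.strptime(value, fmt).date().isoformat()
    except ValueError:
        return None

print(normalize_date("25-12-2013"))                   # DD-MM-YYYY source
print(normalize_date("12-25-2013", day_first=False))  # MM-DD-YYYY source
print(normalize_date("2013"))                         # year only
print(normalize_date(None))                           # missing value
```

Returning None for unparseable values lets the visualization decide how to show gaps rather than silently plotting a wrong date.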
Linked Open Data Visualization Techniques
Comparison of Attributes / Values
Bar/column chart: Allows the comparison of values of different categories.
Pie chart: Useful for comparing percentages or proportions.
Line chart: Allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered.
Histogram: Graphical representation of the distribution of the data.
Image sources: http://mbostock.github.io/protovis/ and http://musicbrainz.fluidops.net
Analysis of Relationships and Hierarchies
Graph: The data entries are represented as nodes and the links as edges.
Arc diagram: The nodes are displayed in one dimension, and the arcs represent the connections.
Adjacency matrix diagram: The nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
Node-link visualizations: The data is organized in hierarchies.
Source of images: http://mbostock.github.io/protovis/
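The adjacency matrix view can be made concrete with a minimal sketch. The node names and links are invented for illustration, and links are treated as directed, as RDF links are.

```python
def adjacency_matrix(edges):
    """Build a 0/1 matrix: rows are link sources, columns are link targets."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return nodes, matrix

# Hypothetical links between three resources.
EDGES = [("A", "B"), ("A", "C"), ("C", "B")]

nodes, matrix = adjacency_matrix(EDGES)
print(nodes)
for row in matrix:
    print(row)
```

A matrix layout avoids the edge crossings of a node-link drawing, at the cost of one row and one column per node.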
Analysis of Relationships and Hierarchies (2)
Space-filling techniques
Treemaps: Subdivide area into rectangles.
Icicles and sunburst: Hierarchies are represented by adjacencies.
Circle-packing: Containment is used to represent the hierarchies.
Rose diagrams: Areas have equal angles and the data is represented by the extension of the area.
Source of images: http://mbostock.github.io/protovis/
Analysis of Temporal or Geographical Events
Timeline
Discrete data points in time
Continuous data in time
Sources: http://www.kottke.org/08/08/2008-movie-box-office-chart, http://musicbrainz.fluidops.net
Maps
Choropleth maps: Aggregate data by geographical area
Location maps: Display geo-points on a map
Dorling cartograms: Aggregate data and replace each area with a circle
Sources: http://mbostock.github.io/protovis/, Google Map API, http://musicbrainz.fluidops.net
Libraries
https://github.com/mbostock/d3/wiki/Gallery
APPLICATIONS
Tasks to be Solved …
SPARQL endpoints have been developed to access data from the LOD cloud.
Traverse and consume Linked Data from the LOD cloud or locally.
SPARQL ENDPOINTS
SELECT DISTINCT * WHERE { <http://dbpedia.org/resource/Venezuela> ?p ?o }
http://dbpedia.org/sparql
All the information related to Venezuela
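The query above can also be sent to the endpoint programmatically. The sketch below uses only the Python standard library and the `query` parameter defined by the SPARQL Protocol. The function names are illustrative, and `run_query` assumes that the endpoint is reachable and honours the JSON results media type requested in the `Accept` header.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"
QUERY = "SELECT DISTINCT * WHERE { <http://dbpedia.org/resource/Venezuela> ?p ?o }"

def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})

def run_query(endpoint, query):
    """Send the query and return the list of result bindings.
    Needs network access, so it is not called at import time."""
    request = Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]

print(build_request_url(ENDPOINT, QUERY))
```

Calling `run_query(ENDPOINT, QUERY)` would return one binding per (?p, ?o) pair, provided the endpoint is online.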
http://worldbank.270a.info/about.html
SPARQL Endpoint URL
SPARQL Query
Data (RDF graph): dbpedia:The_Beatles foaf:made three records, each identified by a URI of the form <http://musicbrainz.org/record/...>, with dc:title "Help!", "Let It Be" and "Abbey Road" respectively.
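How a SPARQL engine matches triple patterns against such a graph can be sketched in a few lines. This is a toy matcher, not a real SPARQL implementation. The identifiers `ex:record1` to `ex:record3` are hypothetical placeholders, because the slide elides the actual MusicBrainz URIs.

```python
# Toy RDF graph as (subject, predicate, object) tuples.
# ex:record1..3 are hypothetical stand-ins for the elided record URIs.
TRIPLES = [
    ("dbpedia:The_Beatles", "foaf:made", "ex:record1"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:record2"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:record3"),
    ("ex:record1", "dc:title", "Help!"),
    ("ex:record2", "dc:title", "Let It Be"),
    ("ex:record3", "dc:title", "Abbey Road"),
]

def match(pattern, triples):
    """Yield variable bindings for one triple pattern.
    Pattern terms that start with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            yield binding

def titles_made_by(artist, triples):
    """Evaluate: artist foaf:made ?record . ?record dc:title ?title"""
    titles = []
    for made in match((artist, "foaf:made", "?record"), triples):
        for titled in match((made["?record"], "dc:title", "?title"), triples):
            titles.append(titled["?title"])
    return titles

print(titles_made_by("dbpedia:The_Beatles", TRIPLES))
```

The nested loops correspond to a join on the shared variable ?record, which is the step a query engine optimizes on large graphs.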
SELECT ?x ?name ?mbox ?country ?reviewer ?product ?title
WHERE {
  <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite293/Review2883011> <http://purl.org/stuff/rev#reviewer> ?x .
  ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
  ?x <http://xmlns.com/foaf/0.1/name> ?name .
  ?x <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?mbox .
  ?x <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/country> ?country .
  ?reviewer <http://purl.org/stuff/rev#reviewer> ?x .
  ?reviewer <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewFor> ?product .
  ?reviewer <http://purl.org/dc/elements/1.1/title> ?title
}
Graph Databases
APPLICATIONS DEVELOPED AT USB: FEDERATED QUERIES
ANAPSID
SPARQL-DQP
Federations of Endpoints
Life Sciences Query:
“Genes and diseases that have been studied for drugs tested in clinical trials where Breast Cancer was studied”
SELECT DISTINCT ?D1 ?TGD ?GN1 ?GN2 WHERE {
  ?CT1 <http://data.linkedct.org/resource/linkedct/condition> ?C1 .
  ?CT1 <http://data.linkedct.org/resource/linkedct/intervention> ?I .
  ?I <http://data.linkedct.org/resource/linkedct/intervention_type> "Drug" .
  ?C1 <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?D1 .
  ?I <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?I1 .
  ?C1 <http://data.linkedct.org/resource/linkedct/condition_name> "Breast Cancer" .
  ?CT <http://data.linkedct.org/resource/linkedct/intervention> ?I .
  ?CT <http://data.linkedct.org/resource/linkedct/condition> ?A4 .
  ?I1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?TGD .
  ?TGD <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?GN1 .
  ?D1 <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene> ?GN2 .
}
Federated Queries ANAPSID
https://github.com/anapsid/anapsid
SELECT DISTINCT ?D1 ?TGD ?GN1 ?GN2 WHERE {
  # S1:
  { SERVICE <http://virtuoso.bd.cesma.usb.ve/sparql> {
      ?C1 <http://data.linkedct.org/resource/linkedct/condition_name> "Breast Cancer" .
      ?C1 <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?D1 .
      ?C3 <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?D1 .
      ?CT3 <http://data.linkedct.org/resource/linkedct/condition> ?C3 }} .
  # S2:
  { SERVICE <http://virtuoso.bd.cesma.usb.ve/sparql> {
      ?C1 <http://data.linkedct.org/resource/linkedct/condition_name> "Breast Cancer" .
      ?I <http://data.linkedct.org/resource/linkedct/intervention_type> "Drug" .
      ?CT1 <http://data.linkedct.org/resource/linkedct/condition> ?C1 .
      ?CT1 <http://data.linkedct.org/resource/linkedct/intervention> ?I }} .
  # S3:
  { SERVICE <http://www4.wiwiss.fu-berlin.de/drugbank/> {
      ?I1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?TGD .
      ?TGD <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?GN1 }} .
  # S4:
  { SERVICE <http://virtuoso.bd.cesma.usb.ve/sparql> {
      ?I <http://data.linkedct.org/resource/linkedct/intervention_type> "Drug" .
      ?I <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?I1 .
      ?CT3 <http://data.linkedct.org/resource/linkedct/intervention> ?I .
      ?CT3 <http://data.linkedct.org/resource/linkedct/condition> ?C3 }} .
}
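Once each sub-query has been answered by its endpoint, the partial results have to be combined on their shared variables. The sketch below shows a plain hash join over solution mappings. It is a generic illustration and not ANAPSID's actual join operators, and the bindings are invented. It assumes that all rows of one input bind the same variables.

```python
def hash_join(left, right):
    """Join two lists of solution mappings (dicts) on their shared variables."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the left input by the values of the shared variables.
    table = {}
    for row in left:
        table.setdefault(tuple(row[v] for v in shared), []).append(row)
    # Probe phase: look up each right row and merge compatible mappings.
    joined = []
    for row in right:
        for candidate in table.get(tuple(row[v] for v in shared), []):
            joined.append({**candidate, **row})
    return joined

# Hypothetical partial results from two sub-queries that share ?I.
trials = [
    {"?CT1": "ex:trial1", "?I": "ex:drugA"},
    {"?CT1": "ex:trial2", "?I": "ex:drugB"},
]
links = [{"?I": "ex:drugA", "?I1": "ex:drugbankA"}]

print(hash_join(trials, links))
```

When two inputs share no variable, the join key is the empty tuple and the result is the cross product, which matches SPARQL join semantics.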
Federated Queries ANAPSID
Figure labels: S4, S1, S2, S3
http://silurian.thalassa.cbm.usb.ve/
“Drugs that possibly target Leukemia”
SELECT DISTINCT ?drug1 WHERE {
  ?drug1 drugbank:possibleDiseaseTarget diseasome:673 .
  ?drug1 drugbank:target ?o .
  ?o drugbank:genbankIdGene ?g .
  ?o drugbank:locus ?l .
  ?o drugbank:molecularWeight ?mw .
  ?o drugbank:hprdId ?hp .
  ?o drugbank:swissprotName ?sn .
  ?o drugbank:proteinSequence ?ps .
  ?o drugbank:generalReference ?gr .
  ?drug drugbank:target ?o .
  ?drug drugbank:synonym ?o1 .
  OPTIONAL {
    ?drug owl:sameAs ?drug5 .
    ?drug5 rdf:type dbcategory:Drug .
    ?drug drugbank:keggCompoundId ?cpd .
    ?enzyme kegg:xSubstrate ?cpd .
    ?enzyme rdf:type kegg:Enzyme .
    ?reaction kegg:xEnzyme ?enzyme .
    ?reaction kegg:equation ?equation .
  }
}
http://silurian.thalassa.cbm.usb.ve/
APPLICATIONS: LINK PREDICTION AND PATTERN DISCOVERY
Tasks to be Solved … (2)
A significant increase of graph data in the form of social and biological information.
Patterns of connections between people help to understand the functioning of society.
Topological properties of graphs can be used to identify patterns that reveal phenomena and anomalies, and that can potentially lead to a discovery.
Annotation Graph
Patterns or Signatures: Brentuzumab_vedotin and Catumaxomab
Annotation similarity between two genes based on shared GO annotations
Genes: AtVHA-C5 and AtVHA-C
GO Terms: Chloroplast; Vacuole; Vacuolar Membrane; Golgi apparatus; Plant-type vacuole; Vacuole proton-transporting V-type ATPase, V1 domain
GO Paths
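One simple way to quantify similarity based on shared annotations is the Jaccard coefficient over the two sets of GO terms. The slides do not say which measure is used, so this is only an illustrative stand-in. The assignment of terms to each gene below is hypothetical, because the figure does not survive in the text.

```python
def jaccard(a, b):
    """Share of annotations common to both sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical annotation sets, using term names from the slide.
gene_1 = {"chloroplast", "vacuole", "vacuolar membrane", "Golgi apparatus"}
gene_2 = {"chloroplast", "vacuole", "vacuolar membrane", "plant-type vacuole"}

print(jaccard(gene_1, gene_2))
```

A set-based measure ignores the GO hierarchy. A measure over GO paths, as the slide suggests, would also give credit to terms that are related but not identical.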
Patterns or Signatures between genes AtVHA-C5 and AtVHA-C
Drug-Target Interaction Network
Patterns Between Interactions
Potential new interaction
Patterns of connections between people help to understand the functioning of society.
http://www.visualizing.org/full-screen/27036
Conclusions
Ø Open Data:
ü Transparency
ü Interoperability
ü Avoid Corruption
ü Boost research and development
ü Data Quality
Ø Linked Open Data:
ü RDF data
ü Linked to existing datasets
ü Endpoints can be used to access data
Conclusions
Ø Open Data Applications:
ü Citizens can develop applications to take control of their lives.
Ø (Linked) Open Data can be used for:
ü Link Prediction
ü Discovering Complex Patterns
Future Directions
THANKS! QUESTIONS