Department of Information Engineering, University of Padua, Italy
Gianmaria Silvello (@giansilv)
Reproducibility for IR Evaluation
slideReproducibility for IR EvaluationG. Silvello
IR Evaluation Initiatives
2
Evaluation in IR is often conducted in large, shared, international campaigns
FIRE
IR Evaluation Initiatives
3
[Figure: the evaluation campaign workflow. Roles: Organizer, Assessor, Participant, Visitor. Phases: Preparation of Documents, Creation of Topics, Experiment Submission, Creation of Pools, Relevance Assessment, Performance Measures, Statistical Analyses, Scientific Production. Levels: Data, Information, Knowledge, Wisdom.]
We have shared experimental collections and we perform statistical validation.
But are we done?
Multiple targets for reproducibility:
- experimental collections
- system runs
- meta-evaluation studies
The Format Babel
4
This situation hampers:
- automatic management
- interpretability
- reproducibility
- ease of (re-)use
- take-up from newcomers
<topic number="6" type="ambiguous">
  <query>
    kcs
  </query>
  <description>
    Find information on the Kansas City Southern railroad.
  </description>
  <subtopic number="1" type="nav">
    Find the homepage for the Kansas City Southern railroad.
  </subtopic>
  <subtopic number="2" type="inf">
    I'm looking for a job with the Kansas City Southern railroad.
  </subtopic>
  <subtopic number="3" type="nav">
    Find the homepage for Kanawha County Schools in West Virginia.
  </subtopic>
  <subtopic number="4" type="nav">
    Find the homepage for the Knox County School system in Tennessee.
  </subtopic>
  <subtopic number="5" type="inf">
    Find information on KCS Energy, Inc., and their merger with Petrohawk Energy Corporation.
  </subtopic>
</topic>

<session num="1" starttime="08:59:47.258675">
  <topic>
    <title>
      peacecorp
    </title>
    <desc>
      Find information about the peace corp
    </desc>
    <narr>
      When was it started and by whom? What services does it provide and where does it provide these services? What is the criteria for applying? What is the salary or stipend? What positions are available?
    </narr>
  </topic>
  <interaction num="1" starttime="09:00:04.155323">
    <query>
      peace corp
    </query>
    <results>
      <result rank="1">
        <url>
          http://www.peacecorps.gov/
        </url>
        <clueweb09id>
          clueweb09-en0011-60-08003
        </clueweb09id>
        <title>
          Peace Corps
        </title>
        <snippet>
          Fighting hunger, disease, poverty, and lack of opportunity.
        </snippet>
      </result>
      ...
    </results>
    <clicked>
      <click num="1" starttime="09:00:09.943356" endtime="09:01:13.434255">
        <rank>

<top>

<num> Number: 303
<title> Hubble Telescope Achievements

<desc> Description:
Identify positive accomplishments of the Hubble telescope since it
was launched in 1991.

<narr> Narrative:
Documents are relevant that show the Hubble telescope has produced
new data, better quality data than previously available, data that
has increased human knowledge of the universe, or data that has led
to disproving previously existing theories or hypotheses.  Documents
limited to the shortcomings of the telescope would be irrelevant.
Details of repairs or modifications to the telescope without
reference to positive achievements would not be relevant.

</top>
ad-hoc
303 0 APW19980609.1531 2
303 0 APW19980610.1778 1
303 0 APW19980715.1061 2
303 0 APW19980910.1078 0

diversity
1 0 clueweb09-en0120-13-20479 0
1 1 clueweb09-en0120-13-20479 0
1 2 clueweb09-en0120-13-20479 0

ad-hoc with grades
101 0 clueweb09-en0047-33-20039 1
101 0 clueweb09-en0004-66-09322 2
101 0 clueweb09-en0033-30-08382 0
101 0 clueweb09-en0000-45-05740 -2
101 0 clueweb09-en0020-92-11795 1

relevance feedback
20002 0 clueweb09-en0006-85-33170 1 1 10.5
20004 0 clueweb09-en0005-28-20976 1 1 10.5
20006 0 clueweb09-en0010-07-21538 1 1 10.5
We need:
- to agree on a common data model which allows for extension
- to provide the basic experimental data with proper metadata (descriptive, administrative, copyright, ...)
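A minimal sketch of what a common, extensible record for relevance judgments could look like, assuming the whitespace-separated layouts shown on the slide (topic, iteration or subtopic, document id, grade). The `Judgment` class, the parser names and the `extra` field are hypothetical; they illustrate the idea and are not part of any campaign's tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Judgment:
    """One relevance judgment in a single shape (hypothetical model)."""
    topic: str
    doc_id: str
    grade: int
    subtopic: Optional[str] = None  # only filled for diversity judgments
    extra: Dict[str, str] = field(default_factory=dict)  # room for extensions


def parse_adhoc(line: str) -> Judgment:
    # Assumed layout: topic, iteration (ignored), document id, grade
    topic, _iteration, doc_id, grade = line.split()
    return Judgment(topic=topic, doc_id=doc_id, grade=int(grade))


def parse_diversity(line: str) -> Judgment:
    # Assumed layout: topic, subtopic, document id, grade
    topic, subtopic, doc_id, grade = line.split()
    return Judgment(topic=topic, doc_id=doc_id, grade=int(grade), subtopic=subtopic)


PARSERS = {"ad-hoc": parse_adhoc, "diversity": parse_diversity}


def load(lines: List[str], fmt: str) -> List[Judgment]:
    """Parse non-empty lines with the parser registered for `fmt`."""
    return [PARSERS[fmt](line) for line in lines if line.strip()]


adhoc = load(
    ["303 0 APW19980609.1531 2", "101 0 clueweb09-en0000-45-05740 -2"],
    "ad-hoc",
)
diversity = load(["1 2 clueweb09-en0120-13-20479 0"], "diversity")
print(adhoc[0], diversity[0])
```

Once every format is mapped to the same record, downstream tools only need to understand one shape; a new format is supported by registering one more parser.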
Referenceability and Traceability
5
- Explanations of experimental data are usually reported in scientific papers that do not provide direct links to the data
- The data may be referred to in many different ways within the same paper (experiment id, system version, participant id, …)
- It is often difficult to know exactly which data have been used in a paper and to get access to them
- It is even more difficult to know exactly which data cleaning and processing operations were performed
[Ferro, 2016]
We need:
- to be able to cite experimental data in our papers like any other reference
- to link the data with the claims made in the papers
- to make our papers actionable and executable, providing access to the experimental data they mention
The DIRECT Experience
6
[Figure labels: BIBLIOGRAPHICAL, EXPERIMENT, VISUAL ANALYTICS, EVALUATION ACTIVITY, EXPERIMENTAL COLLECTION, RESOURCE MANAGEMENT, MEASUREMENT, METADATA]
http://direct.dei.unipd.it/
http://lod-direct.dei.unipd.it/
[Agosti et al., 2012]
LOD DIRECT
7
[Figure: an RDF graph from LOD DIRECT around the contribution CLEF2012wn-RepLab-KarlgrenEtAl2012, titled "Profiling Reputation of Corporate Entities in Semantic Space", and its author Jussi Karlgren.
Node labels: Reputation Management, Information Retrieval, RepLab 2012, CLEF 2012, profiling_kthgavagai_1, Measure, Effectiveness, Accuracy, Link.
Properties: ims:relation (is-expert-in, feature), ims:has-source, ims:has-target, ims:score, ims:backward-score, ims:title, swrc:has-author, owl:sameAs, ims:refersTo, ims:submittedTo, ims:isPartOf, ims:evaluates, ims:isEvaluatedBy, ims:assignedTo, ims:measuredBy.
Scores shown (ims:score, ims:backward-score): 0.46, 0.84; 0.53, 0.87; 0.42, 0.23. Measure score: 0.77.
External links (owl:sameAs): dbpedia.org/resource/Reputation_management, dbpedia.org/resource/Information_retrieval, dblp.l3s.de/d2r/resource/publications/conf/clef/KarlgrenSOEH12, dblp.l3s.de/d2r/resource/authors/Jussi_Karlgren]
[Silvello et al., 2016]
@prefix dc: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix aktors: <http://www.aktors.org/ontology/portal#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ims: <http://ims.dei.unipd.it/data/rdf/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix swrc: <http://swrc.ontoware.org/ontology#> .

<http://lod-direct.dei.unipd.it/user/Fredrik+Olsson;http://ims.dei.unipd.it/author/>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/author/> ;
    ims:identifier "Fredrik+Olsson" .

<http://lod-direct.dei.unipd.it/user/Fredrik+Espinoza;http://ims.dei.unipd.it/author/>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/author/> ;
    ims:identifier "Fredrik+Espinoza" .

<http://lod-direct.dei.unipd.it/user/Magnus+Sahlgren;http://ims.dei.unipd.it/author/>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/author/> ;
    ims:identifier "Magnus+Sahlgren" .

<http://lod-direct.dei.unipd.it/user/Jussi+Karlgren;http://ims.dei.unipd.it/author/>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/author/> ;
    ims:identifier "Jussi+Karlgren" .

<http://lod-direct.dei.unipd.it/user/Ola+Hamfors;http://ims.dei.unipd.it/author/>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/author/> ;
    ims:identifier "Ola+Hamfors" .

<http://lod-direct.dei.unipd.it/contribution/CLEF2012wn-RepLab-KarlgrenEt2012b>
    ims:contribution-type <http://lod-direct.dei.unipd.it/concept/Publication;http://www.aktors.org/ontology/portal%23> ;
    ims:copyrighted "false" ;
    ims:created "2013-05-19T17:01:05.644+02:00" ;
    ims:file-metadata _:b0 ;
    ims:last-modified "2013-05-19T17:01:05.644+02:00" ;
    ims:link "http://www.clef-initiative.eu/documents/71612/155385/CLEF2012wn-RepLab-KarlgrenEt2012b.pdf" ;
    ims:owner <http://lod-direct.dei.unipd.it/user/root;http://ims.dei.unipd.it/> ;
    ims:title "Profiling Reputation of Corporate Entities in Semantic Space " ;
    swrc:has-author <http://lod-direct.dei.unipd.it/user/Magnus+Sahlgren;http://ims.dei.unipd.it/author/> ,
        <http://lod-direct.dei.unipd.it/user/Jussi+Karlgren;http://ims.dei.unipd.it/author/> ,
        <http://lod-direct.dei.unipd.it/user/Fredrik+Espinoza;http://ims.dei.unipd.it/author/> ,
        <http://lod-direct.dei.unipd.it/user/Fredrik+Olsson;http://ims.dei.unipd.it/author/> ,
        <http://lod-direct.dei.unipd.it/user/Ola+Hamfors;http://ims.dei.unipd.it/author/> .

<http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/>
    ims:file-metadata _:b0 ;
    ims:identifier "http://ims.dei.unipd.it/" ;
    ims:prefix "e6fe2c43" .

<http://lod-direct.dei.unipd.it/user/root;http://ims.dei.unipd.it/>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/> ;
    ims:identifier "root" .

<http://lod-direct.dei.unipd.it/namespace/http://www.aktors.org/ontology/portal%23>
    ims:file-metadata _:b0 ;
    ims:identifier "http://www.aktors.org/ontology/portal%23" ;
    ims:prefix "37675fe1" .

_:b0 dc:created "2015-11-06T15:55:20.052+01:00" ;
    dc:creator "LOD DIRECT (Distributed Information Retrieval Evaluation Campaign Tool) - Version 3.10" ;
    dc:rights "Copyright (c) 2006-2015 - Information Management Systems (IMS) Research Group (http://ims.dei.unipd.it/) - Department of Information Engineering (http://www.dei.unipd.it/) - University of Padua (http://www.unipd.it/)" .

<http://lod-direct.dei.unipd.it/namespace/http://ims.dei.unipd.it/author/>
    ims:file-metadata _:b0 ;
    ims:identifier "http://ims.dei.unipd.it/author/" ;
    ims:prefix "9c5e2261" .

<http://lod-direct.dei.unipd.it/concept/Publication;http://www.aktors.org/ontology/portal%23>
    ims:file-metadata _:b0 ;
    ims:has-namespace <http://lod-direct.dei.unipd.it/namespace/http://www.aktors.org/ontology/portal%23> ;
    ims:identifier "Publication" .

Sections highlighted on the slide: prefixes, authors, contribution, metadata.
http://lod-direct.dei.unipd.it/contribution/CLEF2012wn-RepLab-KarlgrenEt2012b/
[Silvello et al., 2016]
Towards a Support for Run Reproducibility
8
Actionable Papers
9
<a href="http://direct.dei.unipd.it/user/UPV">UPV</a>
<a href="http://direct.dei.unipd.it/experiment/EXP_UKB_WN100">EXP_UKB_WN100</a>
<a href="http://direct.dei.unipd.it/estimate/017c333a-4b7c-4267-926d-f15fe3554efd">51.61%</a>
<img src="http://direct.dei.unipd.it/visualization/017c333a-4b7c-4267-926d-f15fe3554efd/snapshot/177bcef2-00a0-4f59-b781-f285610f1c6f"/>
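The anchors above share one pattern: a resource type and an identifier appended to the DIRECT base URL. A small sketch of a helper that could produce such anchors when writing a paper; the function name `actionable_link` and its parameters are hypothetical, and only the URL shapes shown on the slides are assumed.

```python
from html import escape
from urllib.parse import quote

DIRECT_BASE = "http://direct.dei.unipd.it"  # base URL as shown on the slides


def actionable_link(kind: str, identifier: str, label: str) -> str:
    """Return an HTML anchor pointing at a DIRECT resource (hypothetical helper)."""
    url = f"{DIRECT_BASE}/{quote(kind)}/{quote(identifier)}"
    # Escape both the attribute value and the visible label
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'


experiment_link = actionable_link("experiment", "EXP_UKB_WN100", "EXP_UKB_WN100")
estimate_link = actionable_link(
    "estimate", "017c333a-4b7c-4267-926d-f15fe3554efd", "51.61%"
)
print(experiment_link)
print(estimate_link)
```

In this way a figure quoted in the text, such as 51.61%, stays connected to the estimate it was computed from.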
Reproducibility is tied to data citation
10
Being able to uniquely identify data (e.g., DOI, URI) is fundamental, but it is not enough
We need to:
- automatically generate pertinent, consistent and complete human- and machine-readable citation snippets
- define tools to make data citation easy: click, generate, copy and paste
- develop citation systems which require low or no effort from data creators/curators and low or no modification to the actual data being cited
- make data citations persistent
[Silvello and Ferro, 2016]
Data Citation is a Computational Problem
11
The four main computational issues of data citation:
- Identity
- Completeness
- Fixity
- Validity
[Buneman et al., 2016]
Towards a General Data Citation System
12
The identity+completeness issues
To identify and generate a citation for a single resource
Rule-based system for hierarchical data [Buneman and Silvello, 2010]

<iuphar>
  <name>IUPHAR-DB</name>
  <citation>Rule0</citation>
  [...]
  <gpcr>
    <name>G protein-coupled receptors</name>
    <citation>Rule1</citation>
    [...]
    <family>
      <id>29</id>
      <name>Glucagon receptor family</name>
      <citation>Rule2</citation>
      <receptor>
        <id>247</id>
        <name>GHRH</name>
        [...]
        <agonists>
          <ligand> [...] </ligand>
        </agonists>
        [...]
      </receptor>
      [...]
    </family>
    [...]
  </gpcr>
  <ionchannels> [...] </ionchannels>
</iuphar>

Rules:
iuphar[name=$.d,url=$.u, version=$.v]
iuphar[]/gpcr[name=$.n]
iuphar[]/gpcr[]/family[name=$.f,id=$.i]/contributors[]/contributor[name=$?c]
{database=$d, version=$v, contributors=$c, db-family=$n, family=$f, idFamily=$i}

The rules are recursively processed by the system and then transformed into a conjunction of XPaths.
The interpretation of the XPaths generates the citation.

The citation that gets generated (example):
{ database=IUPHAR-DB: the IUPHAR database || url=http://www.iuphar-db.org/ || version=15 || dbFamily=G protein-coupled receptors || family=Glucagon receptor family || idFamily=29 || contributor={Laurence J. Miller;;Daniel J. Drucker;;[...];;Rebecca Hills;;}}

[Slide callouts: instantiation of the variables; the first, second and third rule interpreted by the system]
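The idea of collecting citation fields along the path from the root to the cited node can be sketched in a few lines. This is a simplified illustration and not the rule language shown on the slide: the path expressions with variables are replaced here by a plain dictionary (`RULES`, hypothetical) that maps an element name to the child elements to read, and the XML is a cut-down version of the IUPHAR example.

```python
import xml.etree.ElementTree as ET

XML = """
<iuphar>
  <name>IUPHAR-DB</name>
  <gpcr>
    <name>G protein-coupled receptors</name>
    <family>
      <id>29</id>
      <name>Glucagon receptor family</name>
    </family>
  </gpcr>
</iuphar>
""".strip()

# Hypothetical, simplified rules: element tag -> {citation field: child element to read}
RULES = {
    "iuphar": {"database": "name"},
    "gpcr": {"dbFamily": "name"},
    "family": {"family": "name", "idFamily": "id"},
}


def cite(root, target):
    """Collect citation fields on the path from the root down to `target`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first
    parent = {child: node for node in root.iter() for child in node}
    path = [target]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    citation = {}
    for node in reversed(path):  # root first, cited node last
        for field_name, child_tag in RULES.get(node.tag, {}).items():
            citation[field_name] = node.findtext(child_tag).strip()
    return citation


root = ET.fromstring(XML)
family = root.find("./gpcr/family")
citation = cite(root, family)
print(citation)
```

Each ancestor contributes the fields its own rule asks for, so citing a deeper node automatically yields a more specific citation.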
Towards a General Data Citation System
13
The identity+completeness issues
To identify and generate a citation for a single resource
Learning to cite framework for hierarchical data [Silvello, 2016]
[Figure labels: Training Data (XML Files Collection, Human-Readable Citations), Learner, Citation Model, Citation System, Test Data (XML File, Citation XPath), Output Reference (Machine-Readable Citation, Human-Readable Citation); steps numbered 1 to 6]
Towards a General Data Citation System
14
The identity+completeness issues (+ fixity)
To identify and generate a citation for a single resource
View+rule based system for RDF datasets [Alawini, Chen, Davidson and Silvello, in preparation]
[Figure: an RDF graph with nodes e1 to e10 and edges labelled px, py and pz; VSW(e1). Resource to be cited: e1 (check type). Citation query parametrized by e1: CSW(e1,s,v,d,t,o,u). Components: eagle-i triple store, eagle-i versioning system, RDF Citation Model, Citation Function, Citation Formatter. Outputs: machine-readable citation (JSON), human-readable citation]
Final citation:
{eagle-id: "eagle-id: e1", name: "Significance Tester", developers: {"Grant, G.", "Lazar, M.l", "Manduchi, E."}, url: "http://www.cbil.upenn.edu/STAR/"}
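The last step of this pipeline, turning the machine-readable citation (JSON) into a human-readable one, can be sketched as follows. The field values come from the final citation shown on the slide, with the identifier shortened to "e1" for readability; the function `format_citation` and the output style are assumptions made for illustration, not the formatter of the system described here.

```python
import json

# Machine-readable citation, using the values shown on the slide
machine_readable = json.dumps({
    "eagle-id": "e1",
    "name": "Significance Tester",
    "developers": ["Grant, G.", "Lazar, M.l", "Manduchi, E."],
    "url": "http://www.cbil.upenn.edu/STAR/",
})


def format_citation(citation_json: str) -> str:
    """Render a JSON citation as a one-line string (hypothetical style)."""
    c = json.loads(citation_json)
    developers = "; ".join(c["developers"])
    return (
        f'{c["name"]}, developed by {developers}, '
        f'{c["url"]} (eagle-i id: {c["eagle-id"]})'
    )


human_readable = format_citation(machine_readable)
print(human_readable)
```

Keeping the JSON as the primary output and deriving the text from it means that a different citation style only requires a different formatter.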
Towards a General Data Citation System
15
The identity+completeness issues
To identify and generate a citation for multiple resources
Named graphs for RDF subsets [Silvello, 2015]
[Figure: the cited RDF subset drawn as a graph (ex:systemA, ex:expA, ex:CLEF2009, ex:measureA, with ex:name "precision" and ex:value 0.70) next to the meta-graph of the named graphs ex:n1 to ex:n5, connected by schema:is-related-to]

Original cited LOD subset
Subject       Property          Object        Name
ex:systemA    ex:produce        ex:expA       ex:n1
ex:expA       ex:measure        ex:measureA   ex:n2
ex:expA       ex:submitted-to   ex:CLEF2009   ex:n3
ex:measureA   ex:name           "precision"   ex:n4
ex:measureA   ex:value          "0.7"         ex:n5

Machine-readable citation meta-graph
Subject   Property               Object   Name
ex:n1     schema:is-related-to   ex:n2    ex:cit-sysA-CLEF2009
ex:n1     schema:is-related-to   ex:n3    ex:cit-sysA-CLEF2009
ex:n2     schema:is-related-to   ex:n4    ex:cit-sysA-CLEF2009
ex:n2     schema:is-related-to   ex:n5    ex:cit-sysA-CLEF2009
Copyright © 2015 Gianmaria Silvello
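In the example above, the meta-graph appears to link two named graphs whenever the object of the first triple is the subject of the second. A minimal sketch under that assumption, which is inferred from the example and not stated on the slide; the function `build_meta_graph` is hypothetical.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # subject, property, object, graph name

# The cited LOD subset, one named graph per triple, as in the example
cited_subset: List[Quad] = [
    ("ex:systemA", "ex:produce", "ex:expA", "ex:n1"),
    ("ex:expA", "ex:measure", "ex:measureA", "ex:n2"),
    ("ex:expA", "ex:submitted-to", "ex:CLEF2009", "ex:n3"),
    ("ex:measureA", "ex:name", '"precision"', "ex:n4"),
    ("ex:measureA", "ex:value", '"0.7"', "ex:n5"),
]


def build_meta_graph(quads: List[Quad], citation_name: str) -> List[Quad]:
    """Link graph a to graph b when a's object is b's subject (assumed rule)."""
    meta: List[Quad] = []
    for _s1, _p1, obj, name_a in quads:
        for subj, _p2, _o2, name_b in quads:
            if obj == subj:
                meta.append((name_a, "schema:is-related-to", name_b, citation_name))
    return meta


meta_graph = build_meta_graph(cited_subset, "ex:cit-sysA-CLEF2009")
for quad in meta_graph:
    print(*quad)
```

The resulting four quads all carry the same name, so the whole cited subset can be referred to through that single identifier.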
Towards a General Data Citation System
16
The identity+completeness issues
To identify and generate a citation for multiple resources
View-based model for relational databases [Davidson et al., 2017]
[Figure labels: Database D, Query Q, Specification Language, Database Views V, Citation Views, Citation Policies, Query Rewriting Function (q1, q2, ..., qn), Preference Model, Set of best rewritings, Citation Queries CQ, Citation Function (c1, c2, ..., cm), Aggregation Function, Citation C, Citation Checking Mechanism; steps numbered 1 to 6]
Conclusions
17
- Reproducibility is a fundamental topic for science
- Information retrieval evaluation is a challenging domain
- Data Citation is a complex and open problem
  - new models of citations
  - computational solutions
  - intrinsically related to reproducibility
References
18
[Agosti et al., 2012] Agosti, M., Di Buccio, E., Ferro, N., Masiero, I., Peruzzo, S., and Silvello, G. (2012). DIRECTions: Design and Specification of an IR Evaluation Infrastructure. In Proceedings of the Third International Conference of the CLEF Initiative (CLEF 2012). LNCS 7488, Springer, Heidelberg, Germany.
[Buneman et al., 2016] Buneman, P., Davidson, S. B., and Frew, J. (2016). Why data citation is a computational problem. Communications of the ACM (CACM), 59(9):50–57.
[Buneman and Silvello, 2010] Buneman, P. and Silvello, G. (2010). A Rule-Based Citation System for Structured and Evolving Datasets. IEEE Data Eng. Bull., 33(3):33–41.
[Davidson et al., 2017] Davidson, S. B., Deutch, D., Milo, T., and Silvello, G. (2017). A Model for Fine-Grained Data Citation. In 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017).
[Ferro, 2016] Ferro, N. (2016). Reproducibility Challenges in Information Retrieval Evaluation. ACM Journal of Data and Information Quality (JDIQ), to appear.
[Silvello, 2015] Silvello, G. (2015). A Methodology for Citing Linked Open Data Subsets. D-Lib Magazine, 21(1/2).
[Silvello, 2016] Silvello, G. (2016). Learning to Cite Framework: How to Automatically Construct Citations for Hierarchical Data. Journal of the American Society for Information Science and Technology (JASIST), in print:1–28.
[Silvello et al., 2016] Silvello, G., Bordea, G., Ferro, N., Buitelaar, P., and Bogers, T. (2016). Semantic Representation and Enrichment of Information Retrieval Experimental Data. International Journal on Digital Libraries (IJDL), in press:1–28.
[Silvello and Ferro, 2016] Silvello, G. and Ferro, N. (2016). "Data Citation is Coming". Introduction to the special issue on data citation. Bulletin of IEEE Technical Committee on Digital Libraries, Special Issue on Data Citation, 12(1):1–5.