Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Liberating OA Figures

Upload: ross-mounce

Post on 06-May-2015


DESCRIPTION

My short wrap-up talk about my project to liberate Open Access figures from (CC BY) PDFs, to enable their re-use, discovery and promotion.

TRANSCRIPT

Page 1: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Liberating OA Figures

Page 2: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Why?

● Facilitates

– Content Discovery

– Re-use

– Innovation (seriously)

PDFs are 'electronic paper' – we can do better than this

Page 3: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Where?

● Free-to-use platform (free as in beer, it's not open)

– One terabyte of free storage per account

● Highly popular platform for image sharing (in top 100 most frequently visited websites of the world)

● Supports Creative Commons licensing (many platforms don't)

● Feature-rich, good UI, useful API, etc...

Page 4: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

How? (technical)

● PDFimages – to extract just images from PDFs

● Exiftool – to embed appropriate metadata in the images

– e.g. the Publisher

– the Authors

– the paper Title and DOI

– the figure Caption Text

– the licence under which the image is available for use

http://en.wikipedia.org/wiki/Pdfimages

http://en.wikipedia.org/wiki/Exiftool
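The two steps above can be sketched roughly as follows. This is a minimal illustration, not the exact commands used in the project: it assumes the poppler build of pdfimages (for the `-png` option), assumes Dublin Core XMP tags are an acceptable place for the metadata, and every function name, file name and directory here is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def pdfimages_cmd(pdf: str, out_prefix: str) -> List[str]:
    """Build a pdfimages call that writes the embedded images as PNG files."""
    # Assumption: poppler's pdfimages, which names output <out_prefix>-NNN.png
    return ["pdfimages", "-png", pdf, out_prefix]


def exiftool_cmd(image: str, meta: Dict[str, str]) -> List[str]:
    """Build an exiftool call that writes XMP Dublin Core tags into one image."""
    # Assumption: keys are XMP-dc tag names such as Title, Creator, Publisher,
    # Rights (the licence), Description (the caption), Identifier (the DOI).
    tags = [f"-XMP-dc:{name}={value}" for name, value in meta.items()]
    return ["exiftool", "-overwrite_original", *tags, image]


def liberate(pdf: str, out_dir: str, meta: Dict[str, str]) -> None:
    """Extract all images from one PDF, then tag each with the same metadata."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf, str(out / "fig")), check=True)
    for image in sorted(out.glob("fig-*.png")):
        subprocess.run(exiftool_cmd(str(image), meta), check=True)
```

Calling `liberate("paper.pdf", "figures", {...})` needs both tools installed and on the PATH. In practice the caption differs per figure, so the `Description` value would have to be set image by image rather than once per paper as in this sketch.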

Page 5: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

How? (legal)

● Open Access facilitates and empowers re-use

– BOAI-compliant OA papers are typically licensed under CC BY or CC0.

● Flickr free accounts are ad-supported

● Advertising is a commercial endeavour – it generates money

● Thus CC BY-NC or CC BY-NC-ND content of which you are not the original copyright holder cannot be reposted to your free Flickr account

Key point: Re-posting to Flickr is only possible if the content is openly licensed (no non-commercial restriction). Licensing details are important!
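The rule on this slide can be written down as a small filter to run before anything is uploaded. This is only a sketch of the reasoning above: the allow-list holds just the two licences the slide names, the function name is hypothetical, and any licence string it does not recognise is treated as not repostable.

```python
# Licences the slide names as fine to repost on an ad-supported account.
REPOSTABLE = {"CC BY", "CC0"}


def can_repost_to_ad_supported_site(licence: str) -> bool:
    """Return True only for licences known to allow commercial re-use."""
    # Normalise case and hyphens, so "cc-by" and "CC BY" compare equal.
    tokens = licence.upper().replace("-", " ").split()
    # Drop version numbers such as "4.0" ("CC0" is kept, it is not all digits).
    tokens = [t for t in tokens if not t.replace(".", "").isdigit()]
    if "NC" in tokens:
        return False  # non-commercial clause: not allowed next to adverts
    return " ".join(tokens) in REPOSTABLE
```

For example, `can_repost_to_ad_supported_site("CC BY 4.0")` is true and `can_repost_to_ad_supported_site("CC BY-NC-ND")` is false. Defaulting to false for anything unrecognised is a deliberately cautious choice.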

Page 6: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

https://www.flickr.com/photos/79472036@N07/sets/

Page 7: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

A Flickr 'album' for the figure content of each paper.

Clark, J.L. & Mora M.M. (2014) Nautilocalyx erytranthus (Gesneriaceae), a new species from Northwestern Amazonia. Phytotaxa. Licensed under CC BY

Page 8: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

“Resource not found”

Page 9: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Using Content Negotiation for CrossRef DOIs you can get the full citation details of the paper from the DOI.

But ONLY if the publisher has done their job and registered the DOI with CrossRef

A major problem for this project is that newly-published Magnolia Press article DOIs are often NOT registered with CrossRef. I can find articles from 2013 with DOIs that still aren't registered with CrossRef. Extremely annoying – this causes real problems.

http://crosstech.crossref.org/2011/04/content_negotiation_for_crossr.html
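As a rough sketch of what DOI content negotiation looks like in practice: the request goes to the DOI resolver with an `Accept` header asking for a formatted citation instead of the landing page. The function names are hypothetical, the `apa` style is just one possible choice, and treating HTTP 404 as "DOI not registered" is an assumption about how the failure surfaces.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build a content-negotiation request for a plain-text citation."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa") -> Optional[str]:
    """Return the formatted citation, or None if the DOI cannot be resolved."""
    try:
        with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: unregistered DOIs come back as 404
            return None
        raise
```

Returning `None` rather than raising makes it easy to collect the DOIs that fail and report them, which is the problem case described above.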

Results of content negotiation performed 11-June-2014:

Page 10: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Full attribution visible next to figure. One-click link to source. Full caption text. Searchable. View-counter (METRICS!). Open licensing marked (tells you it's CC BY on mouse-over).

Page 11: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Enriching OA literature – adding embedded metadata to figures

Using exiftool one can embed the attribution (plain-text citation), Publisher, Re-use Rights, and Figure Caption inside the figure image file.

Page 12: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Only one publisher currently embeds useful metadata in their figure images

Well done PLOS! Not perfect though. Author names & the paper title are NOT embedded
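One way to check what a publisher's figure file actually carries is to dump its metadata with `exiftool -json` and look for the fields of interest. The sketch below only parses that JSON. The list of wanted tag names is an assumption (plain Dublin Core names, as exiftool prints them without group prefixes), the function name is hypothetical, and it makes no claim about which fields any real publisher embeds.

```python
import json
from typing import List, Sequence

# Assumed tag names for attribution, publisher, licence and caption.
WANTED = ("Title", "Creator", "Publisher", "Rights", "Description")


def missing_fields(exiftool_json: str, wanted: Sequence[str] = WANTED) -> List[str]:
    """List the wanted tags that are absent or empty in one image's metadata."""
    # `exiftool -json FILE` prints a JSON array with one object per file.
    record = json.loads(exiftool_json)[0]
    return [name for name in wanted if not record.get(name)]
```

Fed the output of `exiftool -json figure.png`, it returns the names that still need to be embedded. An empty list means every wanted field is present.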

Page 13: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Searching for phylogeny is hard

Make it a lot easier!

Search by “presence of phylogenetic trees”

Link to journal search here

Page 14: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Status as of 11-June-2014

● 4045 phylogeny figures from PLOS ONE

– bit.ly/PLOStrees

● 5074 phylogeny figures from 128 other OA journals (Pensoft, BMC, other PLOS journals, Hindawi, MDPI)

– bit.ly/phylofigs

● 708 figures from 152 (OA-only) Phytotaxa papers

– On Twitter @PhytoFigs, and on Flickr

● 1326 figures from 146 (OA-only) Zootaxa papers

– on Flickr

Page 15: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Screencrop of the @PhytoFigs twitter account: https://twitter.com/PhytoFigs

Page 16: Liberating OA figures from PDF to Flickr (A Pro-iBiosphere talk)

Other work in this area (people are already re-using content)

Rod Page (2010) has re-imagined Zootaxa content: http://iphylo.org/~rpage/zootaxa/ explained here: http://iphylo.blogspot.be/2010/08/extracting-semantic-goodness-from.html Numerous other Rod Page projects are probably relevant too. Thoughts on a comparison of Flickr & Pinterest in terms of features / use would be interesting.

Yale Image Finder: http://krauthammerlab.med.yale.edu/imagefinder/ takes OA figures + captions from PubMed Central articles, with good search capability, BUT it's only for PMC articles. A lot of biodiversity literature is NOT in PMC.

British Library content on Flickr: https://www.flickr.com/photos/britishlibrary/ explained more here: http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html

Biosearch (last updated October 2007!!!): http://biosearch.berkeley.edu/ another seemingly abandoned open access figure search database

PLOS's Tumblr highlighting visually appealing figures they publish: http://openfigs.tumblr.com/