Text and Data Mining: what librarians need to know

EIFL-Licensing/EIFL-IP webinar, 6 February 2014

Upload: eifl

Post on 12-Jan-2015

DESCRIPTION

Text and data mining of large datasets is often described as the new frontier for science and research. This presentation is from a webinar hosted by the EIFL-Licensing Programme and the EIFL-IP (Copyright and Libraries) Programme on February 6, 2014, which can be found here: http://bit.ly/1iwr4io

In the webinar, Benjamin White (Head of Intellectual Property at the British Library) provided a clear introduction to what text and data mining is, and how it differs from other methods of information retrieval.

About EIFL

Working in collaboration with libraries in more than 60 developing and transition countries in Africa, Asia, Europe, and Latin America, EIFL enables access to knowledge for education, learning, research and sustainable community development. Visit eifl.net to learn more.

Connect to EIFL on:
Facebook - facebook.com/eIFL.net
Twitter - twitter.com/EIFLnet
LinkedIn - linkedin.com/groups/Friends-EIFL-1862455
Google+ - plus.google.com/+EiflNet/posts

TRANSCRIPT

Page 1: Text and Data Mining: what librarians need to know

www.bl.uk 1

Text and Data Mining: what librarians need to know

EIFL-Licensing/EIFL-IP webinar, 6 February 2014

Page 2: Text and Data Mining: what librarians need to know

Ben White

Ben O’Steen

British Library

Page 3: Text and Data Mining: what librarians need to know

Page 4: Text and Data Mining: what librarians need to know

How Much Data is there?

2013

1.8 zettabytes?

And 80% is unstructured.

Page 5: Text and Data Mining: what librarians need to know

Page 6: Text and Data Mining: what librarians need to know

Learning and Research

• For millennia learning has been based on people reading;

• Taking notes;

• Extracting facts and data; and

• Organising information.

Page 7: Text and Data Mining: what librarians need to know

Before the mid-1990s = pen, pencil and eyes

Page 8: Text and Data Mining: what librarians need to know

Computers can now read

© Woodguy

Page 9: Text and Data Mining: what librarians need to know

And a lot faster than humans

Page 10: Text and Data Mining: what librarians need to know

How to Do Research in 2013?

After the mid-1990s = pen, pencil, eyes AND computers.

There are off-the-shelf text and data mining tools from software providers, but researchers also write their own programs.

Page 11: Text and Data Mining: what librarians need to know

What is Text and Data Mining?

(NOT search by a search engine)

Algorithms “intelligently” read and analyse the text / data (using statistics, probabilities, computational linguistics, etc.) to, amongst other things:

i) Make assumptions about what text strings mean (e.g. is the “tree” a piece of wood, a family tree, or the tree of life in biology?);

ii) Analyse what the entire text is about;

iii) See if there is a positive or negative relationship between two pre-selected variables.
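As a rough illustration of point (i), the sketch below guesses which sense of “tree” a sentence uses by counting how many context words it shares with a small list of cue words per sense. The cue lists, the function names and the example sentences are all invented for illustration; real tools learn such associations statistically from large corpora rather than from hand-written lists.

```python
import re

# Hypothetical cue words for each sense of "tree" (assumption: hand-picked
# for this sketch; a real system would learn them from data).
SENSE_CUES = {
    "wood": {"wood", "forest", "leaves", "branch", "oak", "timber"},
    "family tree": {"family", "ancestors", "grandfather", "genealogy", "born"},
    "tree of life": {"species", "evolution", "phylogenetic", "organisms"},
}


def tokenize(text):
    """Lower-case the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def guess_sense(sentence, cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears at all.
    """
    words = set(tokenize(sentence))
    scores = {sense: len(words & cue_words) for sense, cue_words in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_sense("The oak tree lost its leaves in the forest."))
print(guess_sense("Her family tree lists ancestors born in 1800."))
```

This differs from a search engine, which would simply return every document containing the string “tree”; here the surrounding words are used to decide what the string is about.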

Page 12: Text and Data Mining: what librarians need to know

Text Mining Shakespeare

Page 13: Text and Data Mining: what librarians need to know

What is Text and Data Mining?

This allows people to, for example:

i) See if there is some kind of relationship between a chemical, enzyme, etc. and a disease;

ii) Discover a previously unknown use for a drug or a chemical compound;

iii) Organise electronic data by subject category, etc.
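A minimal sketch of how a relationship between two pre-selected terms might be detected: count how often the terms appear in the same document and compare that with what chance alone would predict. The placeholder terms `compoundx` and `diseasey`, the four toy documents and the function names are assumptions made up for this example, not data or methods from the presentation.

```python
import re


def tokenize(text):
    """Return the set of lower-cased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cooccurrence(documents, term_a, term_b):
    """Count documents mentioning term_a, term_b, and both together."""
    a = b = both = 0
    for doc in documents:
        words = tokenize(doc)
        has_a, has_b = term_a in words, term_b in words
        a += has_a
        b += has_b
        both += has_a and has_b
    return {"a": a, "b": b, "both": both, "total": len(documents)}


def lift(counts):
    """Observed co-occurrence divided by the count expected under independence.

    Above 1 hints at a positive association, below 1 at a negative one.
    Returns None if either term never appears.
    """
    if counts["a"] == 0 or counts["b"] == 0:
        return None
    expected = counts["a"] * counts["b"] / counts["total"]
    return counts["both"] / expected


# Toy corpus with placeholder terms (invented for illustration only).
docs = [
    "compoundx lowered symptoms of diseasey in a small trial",
    "patients with diseasey responded to compoundx",
    "compoundx is stable at room temperature",
    "a survey of library opening hours",
]

counts = cooccurrence(docs, "compoundx", "diseasey")
print(counts)
print(lift(counts))
```

On a corpus of tens of thousands of articles the same counting is what lets a machine surface candidate links that no single reader would notice; a high score only suggests a lead worth checking, not a confirmed finding.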

Page 14: Text and Data Mining: what librarians need to know

TDM & Libraries

Libraries are important because they provide access to scholarly information.

There is a lot of text and data on the web, but there is also very valuable content in books and journals.

People want to hold the data locally and work on it using their own tools.

Page 15: Text and Data Mining: what librarians need to know

Text and Data Mining – Big Business

Video Time!

(hopefully)

http://www.youtube.com/watch?v=2YQNQ_GLe9Q

Page 16: Text and Data Mining: what librarians need to know

Savings in the Health Sector

Page 17: Text and Data Mining: what librarians need to know

Page 18: Text and Data Mining: what librarians need to know

New Medical Discoveries

Page 19: Text and Data Mining: what librarians need to know

Reduces Reading Times Exponentially

Page 20: Text and Data Mining: what librarians need to know

Not Just Computer Scientists Either

© South Wiltshire Girls School

Page 21: Text and Data Mining: what librarians need to know

The Right to Read is the Right to Mine?

• Facts and data are not subject to copyright and database rights

• But computers have to copy in order to mine the data, so is it a licensable activity? (EU has an “internet browser” exception as browsers cache …)

• European Commission stakeholder dialogue on TDM (“Licences for Europe”): research and library organisations, the technology sector and open access publishers boycotted it.

Page 22: Text and Data Mining: what librarians need to know

The Right to Read is the Right to Mine?

• How would you license the internet?

• UKPMC: 75 publishers had articles with the word “malaria” in the title. From its experience, the BL estimates that negotiating a new licence takes 16 months on average.

• TDM goes across thousands / tens of thousands of articles which you ALREADY have legal access to. How can you renegotiate this with all publishers concerned?

• UK universities are experiencing access being suspended automatically when abnormal access patterns are detected.
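The UKPMC figures on this slide can be turned into a rough back-of-the-envelope calculation. The sketch below uses only the two numbers from the slide (75 publishers, 16 months per licence); the idea of several teams negotiating in parallel, and the `parallel_teams` parameter, are assumptions added for illustration.

```python
PUBLISHERS = 75          # from the slide: publishers with "malaria" in a title
MONTHS_PER_LICENCE = 16  # from the slide: BL's average negotiation time


def negotiation_years(publishers, months_each, parallel_teams=1):
    """Elapsed years if each team negotiates one licence at a time.

    parallel_teams is a hypothetical parameter, not a figure from the slide.
    """
    rounds = -(-publishers // parallel_teams)  # ceiling division
    return rounds * months_each / 12


print(negotiation_years(PUBLISHERS, MONTHS_PER_LICENCE))     # one at a time
print(negotiation_years(PUBLISHERS, MONTHS_PER_LICENCE, 5))  # five in parallel
```

Under these assumptions, negotiating one licence at a time would take 100 years for a single search term, and even five parallel negotiations would take 20 years.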

Page 23: Text and Data Mining: what librarians need to know

Thank you

(unless indicated otherwise)

Page 24: Text and Data Mining: what librarians need to know

Now it’s question time!

Page 25: Text and Data Mining: what librarians need to know

Further information

• Find out more about the EIFL-Licensing programme

– www.eifl.net/licensing

• Find out more about the EIFL-IP programme

– www.eifl.net/copyright

Page 26: Text and Data Mining: what librarians need to know

Stay connected

• Visit our website - www.EIFL.net

• Subscribe to our newsletter - www.EIFL.net/subscribe

• Join email lists for EIFL programmes

• facebook.com/EIFLnet

• twitter.com/EIFLnet

• www.flickr.com/photos/EIFL