horstmann humanities research_data

Humanities Research Data – Rate me! Wolfram Horstmann, Digital.Humanities@Oxford Summer School, 3 July 2012

Upload: bdlss

Post on 27-Jun-2015


TRANSCRIPT

Page 1: Horstmann humanities research_data

 Humanities  Research  Data  –  Rate  me!  

Wolfram  Horstmann  

Digital.Humanities@Oxford Summer School, 3 July 2012

Page 2: Horstmann humanities research_data

The Research Data Question

Data-driven research is called the 4th Paradigm in the sciences. Where are the humanities in the current discussion about research data?

http://www.flickr.com/photos/desconciertos/160752180/

Page 3: Horstmann humanities research_data

 Ratings,  Skepticism  &  Anxiety    

The Research Excellence Framework is a reality. But it is objected that: “Humanities research threatened by demands for 'economic impact'” (Guardian, 13 October 2009)

http://www.flickr.com/photos/komoda/7187391601/

Page 4: Horstmann humanities research_data

Outline

The current awareness of the importance of research data provides opportunities for the humanities to show their value.

~

The challenge is to communicate what research data means for the humanities.

~

The proposal is to state the obvious more clearly: text and images as research data of the humanities and libraries as humanities research facilities.

Page 5: Horstmann humanities research_data

HUMANITIES  AND  LIBRARIES  AS  SOULMATES  

Page 6: Horstmann humanities research_data

Texts  and  Images  as  Data  

The humanities work with texts and images as other subject areas work with matter, wetware, hardware or numbers.

http://www.Flickr.com/photos/gorgmorg/9944210/  

Page 7: Horstmann humanities research_data

Libraries  as  Research  Facilities  

The humanities institutionalized their research facilities centuries ago; other subject areas did so much later, with labs and centers like CERN or EMBL.

http://vi.sualize.us/carl_spitzweg_bucherworm_1850_books_library_ladder_reading_picture_2Qp9.html

Page 8: Horstmann humanities research_data

The  Advent  of  the  Digital  

Transforming the physical research facilities into digital ones is a laborious and expensive exercise – and its potential is not yet exploited.

http://www.bodley.ox.ac.uk/librarian/rpc/manchesterpres/slide15.jpg

http://www.flickr.com/photos/flex/27334821/

http://tei.oucs.ox.ac.uk/Talks/2008-08-kazan/exercise-2.xml

Page 9: Horstmann humanities research_data

Digital  Humanities  &  Libraries  

World Data Centers or the EBI are centralized – can Humanities Data Centers be at each institution?

http://adamcrymble.blogspot.com.es/2012/01/is-old-bailey-online-film-or-science.html

Page 10: Horstmann humanities research_data

SOME  EXAMPLES  

Page 11: Horstmann humanities research_data

Digital Resources in the Bodleian

~ approaching petabyte scale of highly structured storage for texts and images

~ 2.000.000 digitized images, another million to come in the next 3 years, plus 350.000 Google Books

~ 100 virtual machines

… and by far most of these are resources of the humanities.

REFERENCE  MISSING  

Page 12: Horstmann humanities research_data

Cultures  of  Knowledge  

An example of highly structured, intellectually curated data: more than 12.000 unique people and 3500 locations identified in 60.000 letters with 25.000 annotations.

http://www.history.ox.ac.uk/coj/
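
To make "highly structured, intellectually curated data" a little more concrete, the following is a minimal sketch of how people, locations, letters and annotations might be linked. It is not the project's actual data model: the class names, field names and sample records are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    # Hypothetical identifier scheme; the slide does not describe one.
    person_id: str
    name: str


@dataclass(frozen=True)
class Place:
    place_id: str
    name: str


@dataclass
class Letter:
    letter_id: str
    author: Person
    recipient: Person
    origin: Place
    # Free-text scholarly annotations attached to this letter.
    annotations: list[str] = field(default_factory=list)


def letters_per_author(letters):
    """Count how many letters each author name wrote."""
    return Counter(letter.author.name for letter in letters)


# Invented sample records, for illustration only.
person_a = Person("p1", "Correspondent A")
person_b = Person("p2", "Correspondent B")
oxford = Place("l1", "Oxford")

letters = [
    Letter("x1", person_a, person_b, oxford, ["mentions a shared acquaintance"]),
    Letter("x2", person_a, person_b, oxford),
    Letter("x3", person_b, person_a, oxford),
]

counts = letters_per_author(letters)

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: {n} letter(s)")
```

Because each person and place is identified once and then referenced from many letters, questions such as "who wrote most often" become simple aggregations over the records rather than a close reading of every document.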

Page 13: Horstmann humanities research_data

What’s  the  Score?  

In only a few months over 10.000 scores have been described by the public.

http://www.whats-the-score.org/

Page 14: Horstmann humanities research_data

Broadside  Ballads  

Collaborative research introduces novel qualities into humanities research data management.

http://ballads.bodley.ox.ac.uk

Page 15: Horstmann humanities research_data

Google  Books  at  the  Bodleian  

Approaching one download a minute: 350.000 Google books with an estimated 10.000.000 pages and 25.000.000.000 words

   

Week | 12-18 Mar | 19-25 Mar | 26 Mar - 1 Apr | 2-8 Apr | 9-15 Apr | 16-22 Apr | 23-29 Apr | 30 Apr - 6 May | 7-13 May | 14-20 May | 21-27 May | 28 May - 2 Jun
Total | 5150 | 3338 | 7111 | 3010 | 3955 | 4528 | 6901 | 4566 | 6883 | 5300 | 5165 | 2844
.uk | 1202 | 2088 | 5950 | 1705 | 2532 | 3360 | 5386 | 3445 | 3667 | 2704 | 3092 | 1347
.ac.uk | 1033 | 1328 | 5751 | 1610 | 1262 | 2970 | 4482 | 3123 | 2988 | 2525 | 2803 | 1194
.ox.ac.uk | 991 | 1296 | 5636 | 1559 | 1249 | 2938 | 4435 | 3111 | 2973 | 2498 | 2737 | 1186
Bodleian Libraries | 291 | 464 | 516 | 306 | 319 | 524 | 562 | 680 | 552 | 499 | 649 | 224
.bodley | 0 | 0 | 15 | 3 | 3 | 8 | 14 | 8 | 6 | 21 | 7 | 4
.bodleian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
.ouls | 106 | 48 | 43 | 26 | 15 | 88 | 89 | 94 | 39 | 50 | 112 | 39
.sers | 79 | 187 | 102 | 63 | 64 | 154 | 105 | 131 | 139 | 181 | 126 | 26
.library-public | 0 | 4 | 0 | 3 | 3 | 0 | 3 | 3 | 3 | 0 | 2 | 0
.bodley-open | 3 | 9 | 17 | 4 | 7 | 18 | 10 | 14 | 11 | 6 | 17 | 5
.bodley-public | 5 | 14 | 14 | 12 | 19 | 28 | 21 | 32 | 18 | 21 | 30 | 18
.odl | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
.ouls-open | 98 | 202 | 325 | 195 | 205 | 223 | 313 | 381 | 322 | 212 | 348 | 128
.saclib | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 14 | 10 | 4 | 3 | 1
.taylor | 0 | 0 | 0 | 0 | 1 | 5 | 6 | 3 | 4 | 3 | 4 | 3
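
The "one download a minute" figure can be checked against the weekly totals with a little arithmetic. The sketch below copies the numbers from the "Total" row of the slide and assumes, for simplicity, that every reporting period is a full seven-day week (the last period, 28 May to 2 June, is actually one day shorter). The function and variable names are illustrative only.

```python
# Weekly download totals copied from the "Total" row of the slide's table
# (12 reporting periods, 12 March to 2 June).
WEEKLY_TOTALS = [5150, 3338, 7111, 3010, 3955, 4528,
                 6901, 4566, 6883, 5300, 5165, 2844]

# Assumption: every period is treated as a full 7-day week.
MINUTES_PER_WEEK = 7 * 24 * 60


def downloads_per_minute(weekly_total: int) -> float:
    """Convert a weekly download count into an average per-minute rate."""
    return weekly_total / MINUTES_PER_WEEK


def summarize(totals):
    """Return the overall total plus the mean and peak per-minute rates."""
    rates = [downloads_per_minute(t) for t in totals]
    return {
        "total": sum(totals),
        "mean_rate": sum(rates) / len(rates),
        "peak_rate": max(rates),
    }


if __name__ == "__main__":
    summary = summarize(WEEKLY_TOTALS)
    print(f"Total downloads: {summary['total']}")
    print(f"Mean rate: {summary['mean_rate']:.2f} per minute")
    print(f"Peak rate: {summary['peak_rate']:.2f} per minute")
```

Under these assumptions the busiest week works out to roughly 0.7 downloads per minute and the average to roughly 0.5, which fits the slide's wording of "approaching" rather than reaching one download a minute.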

Page 16: Horstmann humanities research_data

THE  STORY  SO  FAR  

Page 17: Horstmann humanities research_data

Size  matters!  

Even though the humanities often use qualitative and hermeneutic methodology – rather than quantitative – the size of data is significant.

http://randommization.com/2011/03/08/library-has-giant-books-for-facade/

Page 18: Horstmann humanities research_data

Structure  matters!  

Sizable numbers alone will not give a thorough idea of digital humanities data – structure is equally important. This can only be understood by example.

http://cacm.acm.org/magazines/2010/4/81499-the-data-structure-canon/fulltext

011010101001010101010101011000100010101001010001000101010010011010101001010101010101011000100010101001010001000101010010011010101001010101010101011000100010101001010001000101010010011010101001010101010101011000100010101001010001000101010010011010101001010101010101011000100010101001010001000101010  

Page 19: Horstmann humanities research_data

Collaboration  matters!  

Involvement of colleagues in collaborative research and of the public in crowdsourcing makes a difference.

http://www.flickr.com/photos/ludovicmauduit/2646525907

Page 20: Horstmann humanities research_data

RESEARCH  DATA  CHALLENGES  IN  THE  HUMANITIES  

Page 21: Horstmann humanities research_data

1st  Challenge:  Diversity  

The humanities have a varied typology of research data, often requiring idiographic approaches. Thus, standardization is difficult (cf. citation), and so is finding computational skills.

http://www.ucl.ac.uk/archaeology/studying/undergraduate/courses/ARCL2037

Page 22: Horstmann humanities research_data

2nd  Challenge:  Openness  

http://www.flickr.com/photos/uncene/364730693/

As with all researchers, competition, privacy and exploitation are impediments to data sharing. Do the humanities, more than others, keep the “ivory tower” attitude?

Page 23: Horstmann humanities research_data

Accessibility  of  Humanities  Texts  

From some 30.000.000 bibliographic records it is hard to fill the humanities corpus. This might constrain discoverability of humanities resources.

Lösch, M., Waltinger, U., Horstmann, W., & Mehler, A. (2011). Building a DDC-annotated Corpus from OAI Metadata. Journal of Digital Information, 12(2).

Waltinger, U., Mehler, A., Lösch, M., & Horstmann, W. (2011). Hierarchical Classification of OAI Metadata Using the DDC Taxonomy. In Chambers et al (Eds.), Advanced Language Technologies for Digital Libraries (Vol. 6699, pp. 29-40). Berlin / Heidelberg: Springer.

Page 24: Horstmann humanities research_data

3rd  Challenge:  Inherent  Obstacles  

Humanities research data show some peculiarities. An extreme example is the closure of archaeological data to protect sites against tomb raiders.

Hogenaar, A., H. Tjalsma, & M. Priddy. 2011. “Research in the Humanities and Social Sciences.” http://dx.doi.org/10.2390/PUB-2011-7

Page 25: Horstmann humanities research_data

4th  Challenge:  Implementing  Policy  

Funders' policies are an approach to opening up data – but the humanities produce much data outside of the regular project life cycle.

Deposit of resources or datasets

Grant Holders in all areas must make any significant electronic resources or datasets created as a result of research funded by the Council available in an accessible and appropriate depository for at least three years after the end of their grant. The choice of depository should be appropriate to the nature of the project and accessible to the targeted audiences for the material produced.

http://www.ahrc.ac.uk/FundingOpportunities/Documents/Research%20Funding%20Guide.pdf

Page 26: Horstmann humanities research_data

RESEARCH  DATA  OPPORTUNITIES  IN  THE  HUMANITIES  

Page 27: Horstmann humanities research_data

1st  Opportunity:  Public  Understanding  

Humanities research data are often more easily understood by the public than science data. The “Impact Regime” may even be an advantage for the humanities.

http://www.queenvictoriasjournals.org/home.do

Page 28: Horstmann humanities research_data

2nd  Opportunity:  Cultural  Heritage  

Humanities research data are more likely to be accessed and preserved than research data in other subject areas.

http://www.europeana.eu/portal/

Page 29: Horstmann humanities research_data

3rd  Opportunity:  Infrastructure  

The infrastructure requirements for many humanities research data resemble those of digital libraries. No new research facilities have to be built.

National Library of China

Page 30: Horstmann humanities research_data

4th  Opportunity:  New  Metrics  

It is likely that humanities research data have a web impact advantage. High societal interest could result in higher webometric and usage statistics ratings.

http://newsinfo.iu.edu/pub/libs/images/usr/9584_h.jpg  

Page 31: Horstmann humanities research_data

CONCLUSION  

Page 32: Horstmann humanities research_data

Another  mindset?  

…to see text & images as humanities research data.

~

…to see the humanities as data intensive.

~

…to see a web impact advantage for the humanities.

~

…to see libraries as humanities research facilities.

 

Page 33: Horstmann humanities research_data

Recommendations  

Exploit the good accessibility of humanities research themes through newspapers, exhibitions, crowdsourcing and citizen science.

~

Make as many research outputs web accessible as possible.

~

Invest in and support new metrics such as usage statistics and web-impact.

~

Strengthen partnership between humanities and other disciplines and libraries.

Page 34: Horstmann humanities research_data

Suggestion  

     

Rate  your  data!  

Page 35: Horstmann humanities research_data

Thank you