Archiving and Presenting Journals with Rosetta
Matthias Groß, Bavarian State Library, Munich, Germany
10th IGeLU Conference, Budapest, September 2nd 2015
Short timeline (1) - DigiTool
BVB: Bavarian Library Network, regional consortium for research libraries
Head Office: department of the Bavarian State Library
2004-2006: looking for powerful „multimedia“ software
2006-: implementing DigiTool, going live 2007/08
How to manage journals?
complex objects / collections / METS objects
BVB chooses METS-objects for journals
Short timeline (2) - Rosetta
2010-: implementing Rosetta at BSB
journals not included in pilot workflows
How to manage journals?
collections / METS-objects / …
2013/14: collection management gets better, but …
2014: … decision to follow own approach in parallel
2015: struggling with some problems, then:
Welcome, journals, to Rosetta!
Presenting journals with Rosetta
• BSB uses Rosetta as „light“ archive whenever reasonable
• A tree structure with several levels (unlimited depth) is powerful enough to handle most common journal structures and seems natural for end user presentation
• If the tree structure is represented by an „object“, this can correspond with catalogue entries / persistent identifier on the title level
WANTED:
WANTED:
(elsewhere)
Re-shaping our DigiTool concept for Rosetta
• In the „Manual Legal Deposit“ workflow, new issues are ingested as new IEs
• When testing collection management in Rosetta in 2014, we still saw some shortcomings (addressed in the Pressure Points document)
• Adding new components (issues) to METS-objects would create new versions and lead to a confusing situation, obfuscating genuine preservation actions
BVB wants something that acts like METS, but is not a METS-object
Starting at the end …
BVB developed its own METS viewer for DigiTool in 2012/13, which is basically independent of the system holding the objects; display uses jQuery/CSS. Only a few interfaces to the system are needed:
1. Table of contents: from StructMap/FileSec JSON (Precache); tree structure with DigiTool PIDs of components as leaves
2. Bibliographic metadata: on-the-fly from original MARC/MODS/DC data (2-layer XSLT transformation to JSON)
3. Request for a child object: uses delivery URL for embedded mode (provides main title and stream)
4. Thumbnail preview: based on Table of contents using special Delivery Rule
Facial composite of the solution (1)
1. Table of contents as „near-METS“
• All components of a journal share the same bibliographic ID in dc:relation
• Store reference data (volume, issue, year) in dcterms:bibliographicCitation (trick: use OpenURL 1.0)
• Based on this information, a ToC can be created and stored in the file system as BibID.json with Rosetta's IE IDs as leaves.
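The three bullets above can be illustrated with a minimal Python sketch. It is not the BVB implementation. It assumes the citation is stored as an OpenURL 1.0 key/encoded-value string using the standard journal keys rft.date, rft.volume and rft.issue, and that the tree is nested year > volume > issue. The IE IDs, the JSON field names (bibId, toc) and the function names are hypothetical.

```python
import json
from urllib.parse import parse_qs


def parse_citation(citation):
    """Read year, volume and issue from an OpenURL 1.0 KEV string (assumed layout)."""
    fields = parse_qs(citation)

    def first(key):
        # Missing keys are marked with "?" instead of raising an error.
        return fields.get(key, ["?"])[0]

    return first("rft.date"), first("rft.volume"), first("rft.issue")


def build_toc(bib_id, components):
    """Build a year > volume > issue tree with IE IDs as leaves.

    components: iterable of (ie_id, dc_relation, bibliographic_citation).
    """
    tree = {}
    for ie_id, relation, citation in components:
        if relation != bib_id:
            continue  # component belongs to another journal
        year, volume, issue = parse_citation(citation)
        tree.setdefault(year, {}).setdefault(volume, {})[issue] = ie_id
    return {"bibId": bib_id, "toc": tree}


# Hypothetical sample components (IE IDs are invented for illustration).
components = [
    ("IE1001", "BV123456789", "rft.date=2015&rft.volume=2&rft.issue=3"),
    ("IE1002", "BV123456789", "rft.date=2015&rft.volume=2&rft.issue=4"),
    ("IE2001", "BV000000001", "rft.date=2014&rft.volume=1&rft.issue=1"),
]
toc = build_toc("BV123456789", components)
toc_json = json.dumps(toc, indent=2, sort_keys=True)  # content for BV123456789.json
```

Keeping the IE IDs only at the leaves means the viewer needs no other knowledge of the repository to request a child object.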
Facial composite of the solution (1a)
Plan: Using MARC/MODS metadata instead; OpenURL trick is not so friendly for human editing
OpenURL as container
Facial composite of the solution (2)
2. Bibliographic metadata
BibID is known (from each component); for display fetch recent MARC-XML record via Aleph SRU interface
3. Request for child object
DeliveryRule „embedded“ in Rosetta
4. Thumbnail preview
DeliveryFunction „thumbnail“ in Rosetta
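For step 2, a request for the current MARC-XML record could be built roughly as follows. This is a sketch under assumptions: the base URL, the CQL index name for the bibliographic ID and the record schema name are placeholders, since they depend on the local Aleph SRU configuration. Only the parameter names (version, operation, query, recordSchema, maximumRecords) come from the SRU standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: replace with the local Aleph SRU endpoint and index name.
SRU_BASE_URL = "https://sru.example.org/aleph"
BIB_ID_INDEX = "rec.id"


def sru_request_url(bib_id, base_url=SRU_BASE_URL):
    """Build an SRU searchRetrieve URL asking for one MARC-XML record."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": f'{BIB_ID_INDEX}="{bib_id}"',  # CQL query on the assumed index
        "recordSchema": "marcxml",  # schema short name is server-specific
        "maximumRecords": "1",
    }
    return f"{base_url}?{urlencode(params)}"


url = sru_request_url("BV123456789")
query_params = parse_qs(urlsplit(url).query)
```

Because the BibID is carried by every component, the same call works no matter which issue the user opened first.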
Creation of near-METS industrialized
Our approach: Harvesting the OAI interface (good experience with DigiTool)
However, we encountered problems getting valid XML output from Rosetta. After some months it turned out that there is a config parameter 'dublincore_additional_namespaces' (see Home > Advanced > Configuration > General > General Parameters) that should be defined as [blank], which was not the case in our installation.
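A harvester can guard against such output by checking each response for well-formedness before processing it. The sketch below uses only the Python standard library. The sample records are invented, and the unbound namespace prefix in the second one is just one plausible way XML can break; it is an assumption, not a description of the actual Rosetta output.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Invented samples: the second one uses the dc: prefix without declaring it.
good = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:relation>BV123456789</dc:relation></record>"
)
bad = "<record><dc:relation>BV123456789</dc:relation></record>"

good_error = well_formedness_error(good)
bad_error = well_formedness_error(bad)
```

Logging the error message together with the record identifier makes it easier to report the problem than a harvest that simply stops.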
Data processing (simplified: without deletions)
• Rosetta OAI repository, filter by journal
• Harvest: What's new since …?
• Found new component? Example: BibID BV123456789, issue 3, vol. 2, year 2015
• Known journal: add to StructMap BV123456789.json
• New journal: create StructMap BV123456789.json, get bibliographic MD from Aleph
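The branch between known and new journals might look like the following Python sketch. It assumes one JSON file per BibID with a year > volume > issue tree; the function name, the JSON field names and the IE IDs are hypothetical. Harvesting itself and the Aleph lookup for new journals are left out.

```python
import json
import tempfile
from pathlib import Path


def add_component(toc_dir, bib_id, year, volume, issue, ie_id):
    """Add one harvested component to <bib_id>.json, creating the file if needed."""
    toc_path = Path(toc_dir) / f"{bib_id}.json"
    if toc_path.exists():
        status = "known journal"
        toc = json.loads(toc_path.read_text(encoding="utf-8"))
    else:
        status = "new journal"
        # A real workflow would also fetch bibliographic metadata here.
        toc = {"bibId": bib_id, "toc": {}}
    toc["toc"].setdefault(year, {}).setdefault(volume, {})[issue] = ie_id
    toc_path.write_text(json.dumps(toc, indent=2, sort_keys=True), encoding="utf-8")
    return status


# Demo in a temporary directory with invented IE IDs.
with tempfile.TemporaryDirectory() as tmp:
    first = add_component(tmp, "BV123456789", "2015", "2", "3", "IE1001")
    second = add_component(tmp, "BV123456789", "2015", "2", "4", "IE1002")
    stored = json.loads((Path(tmp) / "BV123456789.json").read_text(encoding="utf-8"))
```

Writing the same issue twice simply overwrites the leaf, so repeating a harvest window does not duplicate entries.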
Following two tracks
Combining near-METS with Rosetta-Collections
1 collection equals 1 journal
Metadata on journal level
URN on journal level (PP: CM 2.2.2)
AssignCMS for journal level (metadata in Rosetta // URN, ArchiveURL in ALEPH) (Collection Support – WP, 2012)
Searching monographs and journals in parallel (IEs and collections, PP: CM 2.2.3)
Manual Legal Deposit: Issue goes to correct journal „automatically“
Easy administration of IEs in Rosetta
They are waiting:
Legal Deposit:
- in DigiTool: 450 journals, 15,000 issues
- on heap: 100+ journals, constantly new titles arriving
OA publications
- finalizing collection strategy for Bavarica and special subject fields
Licensed publications (E-journal backfiles):
- responsibility on national, regional and local levels
- for hosting and long term preservation
Digitized material
- from ZEND / TSM
Thank you very much for your interest in the most fascinating format of scientific literature!