THE FRONT MATTERS:
Capturing Journal Front Matter Content with JATS
Front Matter vs. Journal Matter (disambiguation)
For the purposes of this presentation: “front matter” = “journal matter”
In the current publishing environment where more and more journals are published online, there are many examples of journals without a traditional “front”.
Obvious
This… not as much
Team Introduction
Rachael Carter is a journal manager at PMC at the National Center for Biotechnology Information at the US National Library of Medicine. Rachael graduated in 2010 from the University of Maryland with a Master's degree in Library Science.
Kathryn Funk is a technical editor for NIHMS and PubMed Health at the National Center for Biotechnology Information at the US National Library of Medicine. Kathryn graduated from The Catholic University of America with a Master's degree in Library and Information Science.
Rebecca Mooney, formerly a journal manager at PMC at the National Center for Biotechnology Information at the US National Library of Medicine, recently moved to a new position as a Project Analyst in the IT Department of the American Association for the Advancement of Science (AAAS). Rebecca graduated in 2008 from the University of Maryland with a Master's degree in Library Science.
“Decisions must be made about what will actually be saved for future use… Will the content consist only of articles in a journal, or will it also include front matter (such as the names of the members of the journal’s editorial board)?”
Marcum, 2001
The Big Picture
PMC as an archive has a responsibility to answer:
What should we preserve? How should we preserve it? Why preserve it?
NLM Initiative
PMC Submission Method A
• Currently, PMC strives to archive data at the article level, but sees potential benefit in preserving information about the journal in which the articles were published. For example: Who was Editor in Chief at the time of publication? What was the journal’s philosophy at that time?
• TOCs: PMC creates its own table of contents, organized by article type. This is still very article-based, not at the issue level.
PMC structure
Front Matter “capturing” in PMC as it currently exists – through banner journal-links only
What PMC Front Matter IS:
• Editorial board
• Journal philosophy
• Submission guidelines
• Subscription information
• Covers
• Journal contact information
• Publisher information
What PMC Front Matter is NOT:
• Tables of contents
• Advertisements
• Forewords
• Prefaces
Scope of Front Matter within project
Frontmatter DTD development Timeline
NLM DTD developed
issue-admin.dtd was made available
pmc-journalmatter.dtd developed
Atypon Issue XML presented at JATS-Con
2001 2011 2012
XML to the rescue
- The content is queryable and reusable
- Updating just requires editing a file
- Allows for data manipulation over various platforms/formats
Value of capturing front matter as XML
Limitations of PDF
- Assumes there is an issue to scan
- Difficult to update content
- Limited to certain platforms and technologies
o Mostly because we already use JATS
o It’s flexible
o Already had meaningful framework to capture journal article content
o Works well within the structure of PMC
  • Consistency
Why we chose to create an extension to JATS
Why JATS isn’t enough to capture front matter:
No meaningful way to capture front matter elements such as editorial boards
No way to tag journal metadata at a level higher than article-meta
Limitations of JATS
To capture front matter in the environment in which it was published
To work as much as possible with the existing JATS framework
To create a DTD that would allow for flexibility in both use and rendering
Goals
Testing 1 2 3
Looking at samples
Defined content types
Created new elements
Completed first iteration of the pmc-journalmatter.dtd
Tagged samples of front matter using our DTD and made adjustments
User testing: PMC journal managers
Adjustments made to final DTD based on user feedback
Highlighted physical example of a journal’s front matter
Anything in RED is required
<journal-meta> contains, in order:
• <journal-id>*
• <journal-title-group>
• <issn>*
• <isbn>*
• <publisher>?
<issue-meta> contains, in order:
• <pub-date>*
• <volume>?
• <issue>?
• <issue-title>*
• <issue-sponsor>*
• <first-page> <last-page>? <page-range>? OR <elocation-id>?
<document-meta> contains, in order:
• <pub-date>*
• <document-title>
• <self-uri>*
<body> contains, in order:
• <person-list>, which requires one or more <person>
• <person> contains, in order:
  • <name> OR <string-name> OR <collab>
  • <degrees>*
  • <address>*
  • <aff>*
  • <role>*
  • <ext-link>*
  • <xref>*
Initial Classification
Created new elements
<person-list>
<issue-meta>
<document-meta>
Tagged samples of front matter using our DTD and made adjustments
User testing: PMC journal managers
DTD technical details
[Diagram: pmc-journalmatter.dtd, drawing on .ent and .mod modules, with customizations held in a pmc-journalmatter custom .ent file]
<journalmatter journalmatter-type="issue" content-type="edboard">
Root element: journalmatter
How to generate a foundation for organizing and labeling the front matter content?
Can we tag all of this content in one document?
Challenges
Root element attribute: @journalmatter-type
Prevents hybrid of issue and non-issue content in the same document
Changes in content can be more easily updated
Allows a single journal to have issue and standing documents
Issue vs. Standing: The Benefits
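As a rough illustration of how a processing script might lean on the root attribute, the sketch below reads @journalmatter-type and @content-type from a document and rejects anything that is neither "issue" nor "standing". It assumes those two are the only values, which is how the slides present them; the function name and the abbreviated sample document are hypothetical, not part of the DTD.

```python
import xml.etree.ElementTree as ET

# Assumption: "issue" and "standing" are the only journalmatter-type values.
JOURNALMATTER_TYPES = {"issue", "standing"}


def read_root_attributes(xml_text):
    """Return (journalmatter-type, content-type) from a journalmatter document."""
    root = ET.fromstring(xml_text)
    if root.tag != "journalmatter":
        raise ValueError(f"unexpected root element: {root.tag}")
    jm_type = root.get("journalmatter-type")
    if jm_type not in JOURNALMATTER_TYPES:
        raise ValueError(f"unexpected journalmatter-type: {jm_type!r}")
    return jm_type, root.get("content-type")


# Hypothetical, heavily abbreviated document (metadata and body omitted).
sample = '<journalmatter journalmatter-type="issue" content-type="edboard"/>'
print(read_root_attributes(sample))
```

Because the type lives on the root element, a document is either issue-level or standing, never both, and a script can route it before reading any further.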
standing: Information for Authors
Example: Standing & Issue
issue: Cover
@content-type: separate documents
Flexibility in tagging and rendering
Update as needed (e.g., journal philosophy vs. editorial board)
Root element: @content-type
@content-type
edboard
cover
general-info
publisher
info-for-authors
other
Individual documents for each @content-type.
Cover ("cover"): can include cover image, caption, and cover image copyright information.
Editorial Board ("edboard"): can include executive editors, associate editors, etc. as well as general editorial board members.
General Journal Information ("general-info"): can include but is not limited to journal mission statement, scope, journal contact information, subscription information, copyright, and other journal-specific content.
Publisher Information ("publisher"): can include publisher philosophy, other journals published, contact information, etc.
Information for Authors ("info-for-authors"): can include article submission and formatting instructions.
Other ("other"): if the document is not one of the listed types or the type of document cannot be determined, the "other" attribute value may be used.
@content-type values
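The value list above could be held in a small lookup table, for example to label documents in a rendering or indexing step. This is only a sketch: the values and labels come from the slide, while the helper names and the choice to fall back to "other" for unrecognized values are assumptions made for illustration (the slide describes "other" as a tagging option, not a processing rule).

```python
# Values and labels as listed on the slide.
CONTENT_TYPES = {
    "cover": "Cover",
    "edboard": "Editorial Board",
    "general-info": "General Journal Information",
    "publisher": "Publisher Information",
    "info-for-authors": "Information for Authors",
    "other": "Other",
}


def normalize_content_type(value):
    """Assumed fallback: treat missing or unlisted values as "other"."""
    return value if value in CONTENT_TYPES else "other"


def display_label(value):
    """Return a human-readable label for a @content-type value."""
    return CONTENT_TYPES[normalize_content_type(value)]


print(display_label("edboard"))
print(display_label("masthead"))  # hypothetical unlisted value
```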
The 4 Main elements of a document
<journal-meta>
<issue-meta>
<document-meta>
<body>
<journalmatter>
<!ENTITY % journal-meta-model "(journal-id*, journal-title-group*, issn*, isbn*, publisher*)">
<journal-meta>
JATS journal-meta
pmc-journalmatter journal-meta
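A content model such as the journal-meta one above is an ordered sequence, so child order matters. The sketch below checks that order by joining the child element names into a string and matching it against a regular expression that mirrors the model. It is not a substitute for DTD validation and ignores attributes and nested content; the function name, sample fragments, and placeholder ISSN are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Mirrors the slide's model:
# (journal-id*, journal-title-group*, issn*, isbn*, publisher*)
JOURNAL_META_ORDER = re.compile(
    r"(journal-id )*(journal-title-group )*(issn )*(isbn )*(publisher )*"
)


def follows_journal_meta_model(xml_text):
    """Check that the children of <journal-meta> appear in the declared order."""
    root = ET.fromstring(xml_text)
    child_tags = "".join(child.tag + " " for child in root)
    return JOURNAL_META_ORDER.fullmatch(child_tags) is not None


# Hypothetical fragments; the identifier and ISSN are placeholders.
GOOD = (
    "<journal-meta>"
    "<journal-id>example</journal-id>"
    "<journal-title-group><journal-title>Example Journal</journal-title></journal-title-group>"
    "<issn>0000-0000</issn>"
    "</journal-meta>"
)
BAD = (
    "<journal-meta>"
    "<issn>0000-0000</issn>"
    "<journal-id>example</journal-id>"
    "</journal-meta>"
)

print(follows_journal_meta_model(GOOD))
print(follows_journal_meta_model(BAD))
```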
<!ENTITY % issue-meta-model "(pub-date*, volume?, issue?, issue-id*, issue-title*, issue-sponsor*)">
<issue-meta>
JATS article-meta
pmc-journalmatter issue-meta
<!ENTITY % document-meta-model "((document-title, document-subtitle?)?, contrib-group?, pub-date*, (((fpage, lpage?, page-range?) | elocation-id)?), self-uri*, permissions?)">
<document-meta>
JATS article-meta
pmc-journalmatter document-meta
Borrowed directly from JATS (with a few additions)
<body>
Addition: <person-list>
<!ELEMENT person-list (title?, person+) >
Person-list vs. Person-group
advisory-board: A board appointed to advise the editorial board
editor: Content editors
editorial-board: A group of editors on a publication
guest-editor: Content editors that have been invited to edit all or part of a work
reviewer: Content reviewer
transed: Editors of a translated version of a work
@person-list-type
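To make the (title?, person+) model concrete, the sketch below builds a small editorial-board list with Python's standard library. It assumes that @person-list-type sits on <person-list> and that each <person> holds a <string-name> followed by a <role>, which is one ordering the earlier element listing allows; the builder function and the people in it are invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_person_list(list_type, title, people):
    """Build <person-list> per (title?, person+); people is a list of (name, role)."""
    if not people:
        # The model requires one or more <person> elements.
        raise ValueError("person-list requires at least one person")
    person_list = ET.Element("person-list", {"person-list-type": list_type})
    if title:
        ET.SubElement(person_list, "title").text = title
    for name, role in people:
        person = ET.SubElement(person_list, "person")
        ET.SubElement(person, "string-name").text = name
        ET.SubElement(person, "role").text = role
    return person_list


# Hypothetical board members.
board = build_person_list(
    "editorial-board",
    "Editorial Board",
    [("Jane Doe", "Editor in Chief"), ("John Roe", "Associate Editor")],
)
print(ET.tostring(board, encoding="unicode"))
```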
Not required; suggested list
Not a controlled attribute
Only used when content-type=“general-info”
Intent was to give meaning for searching and grouping purposes.
Used similarly to JATS’ @sec-types
@sec-type
@sec-type is not a required or controlled attribute. However, when "general-info" is the @content-type of the document, the following is a suggested list of types:
association*
copyright
journal-contact
journal-philosophy
subscription-info
*This refers to associations that may be affiliated with a journal but do not necessarily publish the journal.
List of @sec-types
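Since the stated intent of @sec-type is searching and grouping, here is a minimal sketch of grouping section titles by that attribute in a "general-info" body. It assumes sections are tagged as <sec> with a <title> child, as in JATS; the sample body, the function name, and the "untyped" bucket for sections without the attribute are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical body of a "general-info" document.
SAMPLE_BODY = """
<body>
  <sec sec-type="journal-philosophy"><title>Aims and Scope</title></sec>
  <sec sec-type="subscription-info"><title>Subscriptions</title></sec>
  <sec sec-type="copyright"><title>Copyright</title></sec>
  <sec><title>Miscellany</title></sec>
</body>
"""


def titles_by_sec_type(xml_text):
    """Group section titles by @sec-type; the attribute is optional."""
    groups = defaultdict(list)
    for sec in ET.fromstring(xml_text).iter("sec"):
        groups[sec.get("sec-type", "untyped")].append(sec.findtext("title"))
    return dict(groups)


print(titles_by_sec_type(SAMPLE_BODY))
```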
http://dtd.nlm.nih.gov/ncbi/pmc/journalmatter/
DTD Documentation
So how’s it all going to look?
?
Still relatively untested
No rendering
No actual use
Lack of an existing model
Based on the perceived needs of PMC as an archive; uses beyond that are unanticipated.
Different naming conventions and structures of published journal front matter
Limitations
Trying to start a conversation
Looking for ways to best capture front matter to suit needs both inside PMC and in the broader JATS community
Determining whether the content types will be applicable for future applications
Initiating the usage for the DTD and seeing what happens
Looking Forward
Breena Krick, Jeff Beck, Audrey Hamelers, Christopher Maloney, PMC Journal Managers
Acknowledgements
Andrew N. The Oxford Journals Online Archives: The Purpose and Practicalities of a Major Digitization Program. Serials Review. (2006, June). 32(12), 78-80.
Holdsworth David. Preservation Strategies for Digital Libraries. Glasgow, UK: HATII, University of Glasgow; DCC Digital Curation Manual. (2007, November). Retrieved from: http://www.dcc.ac.uk/resource/curation-manual/chapters/preservation-strategies-digital-libraries.
Marcum D. Scholars as Partners in Digital Preservation. CLIR Issues. (2001, March/April). 20. Retrieved from: http://www.clir.org/pubs/issues/issues20.html.
Markantonatos N. Article vs Issue XML: Capturing the Table of Contents under the NLM DTD. Bethesda, MD: National Center for Biotechnology Information; Journal Article Tag Suite Conference (JATS-Con) Proceedings 2011. (2011). Retrieved from: http://www.ncbi.nlm.nih.gov/books/NBK57236/.
Wheeler B. Journal Identity in the Digital Age. Journal of Scholarly Publishing. (2010). 42(1), 45-88.
NLM Journal Archiving and Interchange Tag Suite. Retrieved from: http://dtd.nlm.nih.gov/.
PMC Journal Matter DTD Documentation. Retrieved from: http://dtd.nlm.nih.gov/ncbi/pmc/journalmatter/.
BMC Cancer. Retrieved from: http://www.biomedcentral.com/bmccancer/.
Frontiers in Cancer Genetics. Retrieved from: http://www.frontiersin.org/cancer_genetics.
References
Questions?
Multiple documents: Dependent on information being captured
1 XML document: journalmatter-type=“standing” OR “issue”
2 XML documents: one journalmatter-type=“standing”, one journalmatter-type=“issue”
Cover Editorial Board
General Journal Information
Publisher Information
Information for Authors
“standing”: “edboard”, “general-info”, “publisher”, “info-for-authors”
“issue”: “cover”, “edboard”, “general-info”, “publisher”, “info-for-authors”
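One way to read this slide is as a table of which content types accompany each document type, with "cover" appearing only under "issue". The sketch below encodes that reading as a lookup. The pairing is an interpretation of the slide rather than a rule confirmed by the DTD, "other" is left out because the slide does not show it, and the function name is hypothetical.

```python
# Pairings as read from the slide; an interpretation, not a DTD constraint.
ALLOWED_CONTENT_TYPES = {
    "standing": {"edboard", "general-info", "publisher", "info-for-authors"},
    "issue": {"cover", "edboard", "general-info", "publisher", "info-for-authors"},
}


def is_expected_pairing(journalmatter_type, content_type):
    """Return True if the slide shows this content type under this document type."""
    return content_type in ALLOWED_CONTENT_TYPES.get(journalmatter_type, set())


print(is_expected_pairing("issue", "cover"))
print(is_expected_pairing("standing", "cover"))
```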