Research Libraries: Digital Intermediaries & Digital Archives --
Stanford’s plan, practice, & application
Michael A. Keller
University Librarian
Director of Academic Information Resources
Founder/Publisher of HighWire Press
Publisher of Stanford University Press
CALIS Conference, Chengdu, Sichuan, PRC
15 May 2007
The Cycle
Authors (Professors, Teachers)
Publishers
Reviewers (Professors)
Distributors, Booksellers
Hosts, ISPs (HighWire Press)
Libraries
Readers (Professors, Teachers, Students)
The Cycle – Intermediaries
Authors (Professors, Teachers)
Publishers
Reviewers (Professors)
Distributors, Booksellers
Hosts, ISPs (HighWire Press)
Libraries
Readers (Professors, Teachers, Students)
The Cycle -- Digitization
Authors (Professors, Teachers)
Publishers
Reviewers (Professors)
Distributors, Booksellers
Hosts, ISPs (HighWire Press)
Libraries
Readers (Professors, Teachers, Students)
Digitization
Stanford Strategic Positions
• HighWire Press, a unit of Stanford University Libraries
– Serves scholarly publishers
– Intersection of professors as authors, publishers, editors and reviewers
– with librarians as information managers
– with information technologists as service providers
• Merging Libraries with Academic Computing
– Undertaking digital archiving (LOCKSS/CLOCKSS, Stanford Digital Repository)
• Including Stanford University Press
• http://sulair.stanford.edu
[Screenshot: Stanford University Libraries web page]
HighWire Press
• Receives digital “manuscripts” of articles, including data supplements not found in print editions
• Processes, adding features
– Publishing before print edition
– Several image resolutions
– Hyperlinking citations to cited references
– Alerting services
– PDF & HTML versions
– Citation mapping
– Corresponding with authors
• World-wide instantaneous delivery enabling many researchers to read simultaneously; no waiting for the print edition
– Some publishers abandoning print; more to follow
• http://highwire.stanford.edu
[Screenshots: sample article from the Journal of Cell Biology (7 slides)]
“Thumbnail image”
Medium size image
Large size image
Toll Free Linking
Citation linking: destination
“Prospective citing”
Citation Map
[Screenshot: HighWire Press website (1)]
[Screenshot: HighWire Press website (2)]
[Screenshot: HighWire Press website (3)]
Stanford’s HighWire Press: summary of strategic importance
• Enables publication of web versions of many high-impact, highly cited scholarly journals
• Network distribution makes instant distribution possible, making all readers equal
• Provides numerous services making research faster, better, more penetrating
• Links readers and authors
• Embodies collaboration among publishers, librarians, information technologists
• Not for profit, a Stanford enterprise
"Science, Scholarship, and Internet Publishing: The HighWire Story"
Syllabus Magazine, October 1998
• EXCERPT: "Scientists, scientific editors and publishers, scholarly society officers, and an enterprise unit of the Stanford University Libraries named HighWire Press have worked together over the past three and a half years to publish Internet editions of 70 influential scientific journals. Three significant accomplishments have resulted. First, there has evolved a mode of scholarly communication which serves readers, and facilitates research as much as it supports the clarity and validity of scientific discourse; this model has become a standard in Internet scholarly publishing. Second, an active community of scholarly editors and publishers has intensified the benefits of online scholarly publishing to the scientific, medical and technical communities at large. Third, the products of life sciences research in the advanced economies of Europe and North America are now more widely available than ever before, stimulating scientific and other cultural developments in other parts of the world."
[Screenshot: Stanford University website, pages of individual units]
The Challenge of Digital Preservation
• Bit rot
• Obsolescence
– Format
– Technology
• Distribution and dissipation
• Migrations and transitions
– People (2 – 20 years)
– Software (5 – 10 years)
– Hardware (3 – 5 years)
Benign neglect doesn’t work for digital objects. Preservation requires active, managed care.
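One common form of active, managed care is a periodic fixity check. The sketch below is only an illustration, not a description of Stanford's systems: it records a SHA-256 digest for each object at deposit time and recomputes it later to detect bit rot or loss. The function names and the in-memory manifest are assumptions made for this example.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(objects: dict[str, bytes]) -> dict[str, str]:
    """Build a manifest mapping each object name to its digest at deposit time."""
    return {name: sha256_digest(data) for name, data in objects.items()}


def audit(objects: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the sorted names of objects that are missing or whose digest changed."""
    damaged = []
    for name, expected in manifest.items():
        data = objects.get(name)
        if data is None or sha256_digest(data) != expected:
            damaged.append(name)
    return sorted(damaged)


if __name__ == "__main__":
    store = {"article.pdf": b"original bytes", "figure.tif": b"image bytes"}
    manifest = record_fixity(store)
    store["figure.tif"] = b"image bytez"  # simulate silent corruption
    print(audit(store, manifest))
```

Run on a schedule, an audit like this turns silent corruption into a detectable event that can be repaired from another copy.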
Three Major Areas of Preservation Needs
• Digital Library
– SULAIR collections & resources
– Digitization artifacts
• Institutional Repository
– Research data
– Publications, dissertations
– Learning objects, university assets
• “External” Depositors
– Online preservation and access
– Dark archive
Google Books (’000s of TB)
Parker Manuscripts (75 TB)
MJF Media (50 TB)
NGDA (10 TB)
~30 other digitization projects (15 TB)
Purchased collections (25 TB)
HighWire Press (32 TB)
Stanford Univ Press (10 TB)
Other Academic Publishers
Design Objectives & Assumptions
• Preservation-focused archive
• Replicated content
– Multiple copies, geographically distributed
• Secure
• Auditable
• Modular
• Tiered storage environment
– Online, nearline, offline
• Version rather than delete
• Content-agnostic
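Two of these objectives, "version rather than delete" and "replicated content", can be sketched in a few lines. This is a hypothetical illustration, not the SDR's design: the class names are invented for the example, and each replica stands in for a copy held at a different site.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedStore:
    """Append-only store: every write adds a version; nothing is overwritten."""
    versions: dict = field(default_factory=dict)

    def put(self, object_id: str, data: bytes) -> int:
        """Store a new version and return its 1-based version number."""
        history = self.versions.setdefault(object_id, [])
        history.append(data)
        return len(history)

    def get(self, object_id: str, version: Optional[int] = None) -> bytes:
        """Return the requested version, or the latest one by default."""
        history = self.versions[object_id]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{object_id} has no version {version}")
        return history[version - 1]


@dataclass
class ReplicatedStore:
    """Writes every version to several independent replicas."""
    replicas: list

    def put(self, object_id: str, data: bytes) -> int:
        numbers = {replica.put(object_id, data) for replica in self.replicas}
        if len(numbers) != 1:
            raise RuntimeError("replicas disagree on the version number")
        return numbers.pop()
```

Because no version is ever removed, an earlier state of an object stays retrievable from every replica after later deposits.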
Core Repository Functionality
• Preserving access to digital information over time… through generations of technology obsolescence and change.
• Maintaining integrity of that information over time… through generations of migration and reformatting.
Repository Services Functionality
• All (or almost all) user-facing services
• Enhanced access & delivery through applications
• Data mining, dry research, new indexing, e-science, etc.
• Federation
SDR: Core Repository vs. Repository Services
SDR Serves As Common Preservation Infrastructure
Stanford Digital Repository (SDR): content-agnostic preservation repository, while specialty archives and applications provide focused digital content collection, access and value-added services:
• National Geospatial Digital Archive (NGDA): geospatial data
• SUL Digital Bookshelves (Google Books, internally digitized, vendors' e-books)
• Digital Library Applications (images, mss, media, Special Collections showcases)
• Institutional Repository (faculty- and student-submitted papers, data, websites, etc.)
SDR Workflow
[Diagram labels: Digital Collections, Geospatial Data, External Collections; Conversion; Ingest; Virus Check; SDR (Ingest, Storage Layer, Access Layer); Luna; Book Reader; DEWI (?)]
SDR High-Level Architecture
SDR Component Diagram
• Conversion: generates TM from existing MD
• Content
• Staging Area: acts as gatekeeper: virus checking, file & format validation
• Directory Watcher: monitors and validates gatekeeper; transfers files to ingest
• Ingest: validates, packages (with TM), sends to storage
• Storage Manager: directs objects to storage layers
– Archival Disk (Honeycomb)
– Archival Tape (TSM, L700): three tape copies (A, B, C)
– All SDR MD; some objects + their MD
• Accessory: watches for access requests, consults storage manager, funnels content from disk & tape to Reconstructor
• Access Director: access request db, checks cache for content, queues & tracks requests for objects
• Reconstructor: reconstructs digital objects from AIPs to DIPs
• Delivery Cache
• Fedora: all SDR MD; Disseminators
• Applications
• Users
• Logging
• Secure Preservation Environment
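The ingest path in the diagram (gatekeeper validation, packaging, then routing to storage layers) can be sketched as follows. This is a hypothetical sketch, not SDR code. The accepted-format list, the size threshold for the disk tier, and the manifest fields are assumptions made for the example; the slide does not expand "TM", so the manifest below is only a stand-in for it, and virus checking is omitted.

```python
import hashlib
from dataclasses import dataclass

ALLOWED_FORMATS = {"pdf", "tiff", "xml"}  # hypothetical accepted formats
DISK_LIMIT_BYTES = 1024                   # hypothetical threshold for the disk tier


@dataclass
class Package:
    name: str
    data: bytes
    manifest: dict


def gatekeeper(name: str, data: bytes) -> None:
    """Staging-area checks: reject empty files and unknown formats."""
    extension = name.rsplit(".", 1)[-1].lower()
    if not data:
        raise ValueError(f"{name}: empty file")
    if extension not in ALLOWED_FORMATS:
        raise ValueError(f"{name}: format {extension!r} not accepted")


def ingest(name: str, data: bytes) -> Package:
    """Validate the object, then package it with a manifest describing it."""
    gatekeeper(name, data)
    manifest = {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return Package(name, data, manifest)


def storage_manager(package: Package, layers: dict) -> list:
    """Write to three tape copies; also write small objects to disk."""
    targets = ["tape_a", "tape_b", "tape_c"]
    if package.manifest["size"] <= DISK_LIMIT_BYTES:
        targets.insert(0, "disk")
    for target in targets:
        layers[target][package.name] = package
    return targets
```

Keeping validation, packaging and routing as separate functions mirrors the modular objective: each stage can be replaced without touching the others.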
SDR Physical Topology, March 2006
Module(s): Hardware
Conversion, Gatekeeper: Sun Fire X4100 Server; 4 TB Nexsan SATA Disk
Ingest, Storage code, Storage Request Processor: Sun Fire X4100 Server; 4 TB Nexsan SATA Disk
Online storage: 32 TB Sun Honeycomb Storage System
Tape Copies: Sun StorEdge L700 Tape Library, with LTO2 drives; IBM Tivoli Storage Manager; Iron Mountain data protection plan
Access Service, Access Cache: Sun Fire X4100 Server; 8 TB of Nexsan SATA Disk
Stanford Digital Repository
• Managed care for digital objects of all genres & formats
• Serves several strategic needs
– Digital Library
– Institutional Repository
– Enterprise Repository
• A strategic development for research, teaching & learning
• Will provide a distinctive, competitive edge
[Slide: SDR access policy for the institutional repository]
What is LOCKSS?
Lots Of Copies Keep Stuff Safe
Digital Preservation Infrastructure
Decentralized, Peer to Peer, Continuous Audit & Repair
Internet computers chattering away among themselves
Open Source
163 LOCKSS Libraries in 18 countries
http://lockss.stanford.edu
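The idea of continuous audit and repair among peers can be illustrated with a simple majority vote over content digests. This sketch is deliberately simplified and does not reproduce the actual LOCKSS polling protocol; the function names and box names are assumptions made for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the hex SHA-256 digest of one box's copy."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(boxes: dict) -> list:
    """Compare copies across boxes and repair those that disagree with a strict majority.

    Returns the sorted names of repaired boxes. Raises RuntimeError when no
    single version is held by more than half of the boxes.
    """
    votes = Counter(digest(content) for content in boxes.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(boxes):
        raise RuntimeError("no majority; cannot decide which copy is good")
    good_copy = next(c for c in boxes.values() if digest(c) == winner)
    repaired = []
    for name, content in list(boxes.items()):
        if digest(content) != winner:
            boxes[name] = good_copy  # repair from a copy the majority agrees on
            repaired.append(name)
    return sorted(repaired)


if __name__ == "__main__":
    boxes = {"box1": b"issue 1", "box2": b"issue 1", "box3": b"issue ?"}
    print(poll_and_repair(boxes))
```

With many independent copies, a damaged box is outvoted and repaired without any central authority deciding which copy is correct.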
LOCKSS Boxes: Collection
[Diagram labels: Title 1, Title 2; Patron; LOCKSS box 1; LOCKSS box 2]
LOCKSS Boxes: Preservation
[Diagram labels: Title 1, Title 2; Patron; LOCKSS box 1; LOCKSS box 2]
Access
Prevents the publisher from revoking access rights to back content
[Diagram labels: Title 1, Title 2; Patron; LOCKSS box 1; LOCKSS box 2]
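The access guarantee can be pictured as a fallback: a patron's request goes to the publisher first, and the library's preserved copy is served when the publisher no longer supplies the content. The sketch below is an assumption-laden illustration, not the LOCKSS implementation; `fetch_from_publisher` is a hypothetical callable that returns `None` when the content is unavailable.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    url: str,
    fetch_from_publisher: Callable[[str], Optional[bytes]],
    preserved: dict,
) -> bytes:
    """Serve from the publisher when possible; otherwise serve the preserved copy."""
    content = fetch_from_publisher(url)
    if content is not None:
        return content
    if url in preserved:
        return preserved[url]
    raise LookupError(f"{url} is unavailable and was never preserved")


if __name__ == "__main__":
    preserved = {"journal/vol1/article1": b"preserved article"}
    # Simulate a publisher that has withdrawn access to back content.
    print(fetch_with_fallback("journal/vol1/article1", lambda u: None, preserved))
```

The patron sees the same content either way, which is why a local preserved copy protects access to back issues.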
CLOCKSS
• Controlled LOCKSS
• Limited network of library caches
• LOCKSS technology underlies CLOCKSS
• Shared governance model
The CLOCKSS Prototype
• Two-year demonstrator, ending in 2007
– Public reports of progress & outcome
– Demonstration that this solution is credible for the long term
– Proof of scalability for publisher content & library deployment
• Funded first by participants, with recent grant support from NDIIPP (Library of Congress)
• http://www.clockss.org
CLOCKSS Participants
• CLOCKSS acting on behalf of wider community of libraries & publishers
• 7 libraries distributed across tectonic plates
• 12 publishers, commercial & scholarly societies
• Numbers & types sufficient to cover the bases
• Commitment based on stewardship of libraries & responsibility of publishers
Libraries
• University of Edinburgh
• New York Public Library
• Indiana University
• Rice University
• University of Virginia
• OCLC
• Stanford University
NB: more to be added on more tectonic plates
Publishers
• Blackwell Publishing
• Elsevier
• Nature Publishing Group
• Oxford University Press
• SAGE Publications
• Springer
• Taylor and Francis
• John Wiley & Sons
• American Chemical Association
• American Medical Association
• American Physiological Society
• Institute of Physics
NB: aim to add all the rest
Equal Partners
• Librarians, with Publishers agreeing, retain stewardship role as society’s memory institutions
• Publishers have decided to trust & engage Libraries, committing to prospect of preservation for continuing access
• Both are exploring social & technical model in a two-year test, working to build a full-scale production system
• Costs are equally shared, with additional funding from NDIIPP for audit & reporting
CLOCKSS Mission
• “CLOCKSS is a not-for-profit community partnership between publishers and libraries that is developing a distributed, validated, comprehensive archive that preserves and ensures continuing access to electronic scholarly content.”
CLOCKSS Governance
• Jointly governed by founding library & publisher partners
• Each partner represents an organization, but collectively sectors are represented
– University libraries & public libraries
– Commercial publishers & scholarly societies
• No single point of failure or institutional interest will hinder long-term governance
• Consensus driven, united for support of scholarly communication over the long term
• CLOCKSS seen as complementary to national arrangements for legal deposit
LOCKSS/CLOCKSS
• Distributed preservation function
• Caches authorized e-content locally
• Empowers libraries
• Inexpensive, easily implemented
• Flexible, open source application
• Expanding community of users
• Expanding community of uses
[Screenshot: Library of Congress web page profiling digital preservation advocates]
Other SULAIR Strategic Programs
• Google Book Search & other digitization
• Development of “Bookless” Libraries
• CourseWork
– Sakai (Open Source) Course Management System
• Media Preservation
• Expanding the East Asia Library
• Expanding the Middle Eastern Collection