The field of digital preservation is, as one colleague recently pointed out, still very experimental. However, there is a wealth of documentation and support coming from the community. Following up on my previous post about the NEDCC Digital Directions workshop, here is a summary list of the tools and resources mentioned at the session.
Attributes of Trusted Digital Repositories, OCLC 2002
Open Archival Information System, CCSDS 2003, 2012
Overview: What is Digital Curation?
DCC Lifecycle Model (University of Edinburgh)
Access and Reuse: Examples and Networks
Rijksstudio in Amsterdam
Project Planning & Management
Handbook for Digital Projects, [PDF, 1.4 Mb] Paul Conway, NEDCC, 2000
IMLS National Grant Application – useful framework for organizing thoughts and plans, even if not applying for funding
Project management tools: Basecamp, Microsoft Project, Zoho Projects
Task management tools: Asana, Trello
Guidelines and Best Practices
Blue Ribbon Task Force on Sustainable Digital Preservation and Access, San Diego Supercomputer Center
Guide to Developing a Request for Proposal for the Digitization of Video (and more), AVPreserve
Case Study: California Audiovisual Preservation Project (on the Internet Archive)
CALIPR – Automated tool for preservation needs assessment
CAVPP Workflow Overview [PDF, 172Kb] – similar to a statement of work
Last week I had the pleasure of attending the Digital Directions workshop, hosted by the Northeast Document Conservation Center. There was a ton of fantastic information, along with perspectives from seasoned professionals in the field, and colleagues who are tackling some of the same challenges we’re facing at my institution.
It will be a little while until I’m finished digesting all of this information, but in the meantime I wanted to post my notes. In addition to the links sprinkled in the notes, check out the free resources for digital preservation on the NEDCC website.
Digital Directions Day 1
Digital Preservation (Introduction)
Digital Preservation – ensuring access across technologies and over time.
Digital Curation – actions people take to maintain and add value to digital information over its lifecycle.
Curatorial actions must serve needs of current and future users.
Content creators are often not aware of how their output might be used, or of its value, outside their own use.
Why curate? Provide information about: changes, custody, usability, metadata, findability, versions, cultural memory (activities), advance knowledge (research)
Two Main Docs
Trusted Digital Repositories
Tech and procedure suitability
Open Archival Information System Reference Model (OAIS)
Producer -> Submission Information Package (SIP) -> Archival Information Package (AIP) -> Dissemination Information Package (DIP) -> User community
Over the past 12 years the field has been operating in a merged model of these two.
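To make the OAIS flow above a bit more concrete for myself, here is a minimal sketch of the three package types as plain Python data classes. This is my own illustration, not something shown at the workshop and not the standard's data model: the class names, fields, and the `ingest` and `disseminate` helpers are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SubmissionPackage:
    """SIP: what the producer hands to the archive."""
    producer: str
    files: dict[str, bytes]
    descriptive_metadata: dict[str, str]


@dataclass
class ArchivalPackage:
    """AIP: what the archive stores and preserves."""
    identifier: str
    files: dict[str, bytes]
    descriptive_metadata: dict[str, str]
    preservation_metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class DisseminationPackage:
    """DIP: what a user receives in response to a request."""
    identifier: str
    files: dict[str, bytes]
    descriptive_metadata: dict[str, str]


def ingest(sip: SubmissionPackage, identifier: str) -> ArchivalPackage:
    # Hypothetical ingest step: copy the content and record where it came from.
    return ArchivalPackage(
        identifier=identifier,
        files=dict(sip.files),
        descriptive_metadata=dict(sip.descriptive_metadata),
        preservation_metadata={"provenance": f"submitted by {sip.producer}"},
    )


def disseminate(aip: ArchivalPackage, wanted: list[str]) -> DisseminationPackage:
    # Hypothetical access step: deliver only the requested files and keep
    # the internal preservation metadata inside the archive.
    return DisseminationPackage(
        identifier=aip.identifier,
        files={name: aip.files[name] for name in wanted},
        descriptive_metadata=dict(aip.descriptive_metadata),
    )
```

The point of the sketch is only that the three packages are different views of the same content: the AIP carries extra preservation information that the user-facing DIP does not need.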
DCC Lifecycle Model (University of Edinburgh)
Graphical, high-level overview of the stages required for successful curation and preservation of data.
Moving forward, away from “Public” vs. “Technical”
New triad: Infrastructure, Content, Services (including helping users create new content, managing rights)
David Lankes on knowledge production (iSchool at Syracuse)
Librarians as facilitators of conversations.
Libraries, Kitchens, and Grocery Stores – Joan Frye Williams (2008)
Where do collecting institutions go after preserving and providing access? People want to do stuff with digital information and artifacts.
Member-facing content creation services
Engage producers to build literacies and skills
Provide content creation, production conversion tools
Offer content hosting & production services
Cultural Institutions in the Evolving Paradigm
Teachers / Instructional Partners
Observers / Anthropologists of information users (members) to study evolving user needs.
Content Producers and Communicators
Organizational Designers (new services, new staffing, etc)
Collaborative Network Creator (partnering with other organizations)
Digital Curation Networks (eScholarship – California, NITLE, Alliance)
Digital Preservation Networks (MetaArchive, Chronopolis, LOCKSS)
Digital Project Planning
Emily Gore, DPLA Director for Content
The Power of Where Your Collections Can Go
Start with reuse.
Create sharable metadata.
Thinking beyond the institutional portal – international data models (broader than institution or local community)
Tell a more complete story by creating virtual collections and linking to complementary collections at other institutions. Link to other relevant content or context. Reuse/remixing.
What? Or what do you want to select from born-digital items?
Selecting particular collections should be part of the core mission and goals of the institution – taking on the commitment.
Value – what is valuable, what fits with institutional mission, what is potential use, what is the cost of NOT digitizing?
Ability – can you? do you have staff etc? RIGHTS RIGHTS RIGHTS and licence, not just on the objects but also the metadata
Legal considerations – DPLA and Europeana are working on standardized actionable statements that could be used across institutions. Searchable collections by copyright: public domain, Creative Commons flavors, etc.
Workflow of what to outsource for example. Can this step be done in-house? Some steps in house and some outsourced, project-by-project basis.
Level of discovery needed. Minimal level metadata means minimal accessibility.
More product/less metadata, or rich metadata and more selective collections?
Crowdsourcing transcription projects (NYPL menu project)
Delivery Expectations – what do you want users to do?
Do you have to place restrictions? If so, be very clear.
Create documented APIs
Look at Rijksstudio in Amsterdam – making money (from prints, postcards, etc) AND making images available for download.
Serendip-o-matic “let your sources surprise you” – run your text through this and discover related content in major collections.
No need to create a portal anymore – just make collections open and allow others (aggregators like DPLA) to build the interface.
Part of larger effort or collaborations?
(GLAM, DPLA Hub, MetaArchive, Chronopolis, DPN)
IIIF (Stanford et al.) and Mirador – the International Image Interoperability Framework (Media Ecology Project?) – participating institutions have an IIIF plugin running on their collections, and Mirador is the interface to search and compare across institutions. Artstor has released an IIIF-compliant viewer.
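Part of what makes IIIF work across institutions is that every image server answers the same URL pattern. As a rough sketch (my own, not from the session), the helper below builds a request URL following the IIIF Image API pattern of identifier, region, size, rotation, quality, and format. The function name, the server address, and the identifier are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an image request URL following the IIIF Image API pattern:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers may contain slashes or other reserved characters, so
    # percent-encode everything (safe="" also encodes "/").
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
print(iiif_image_url("https://iiif.example.org/images", "maps/sheet 12"))
```

Because the pattern is shared, a viewer like Mirador can ask any compliant server for, say, a thumbnail by changing only the size segment (for example `size="!200,200"`), without knowing anything else about the institution's system.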
Goals for long-term access, preservation & sustainability?
Essential part of the process; partner with other institutions or outsource. LOCKSS, HathiTrust?
How will you $$$?
See grant guidelines for best ways to plan the process even if you don’t apply for the money. They force you to address each consideration. Take the IMLS national grant application for example. Consider local funding options.
Digital Directions Day 2
Sr. Director, Archives, Special Collections and Digital Curation
Preservation is the preservation of access
Creating durable access:
Sustainability – maintained and accessed over time
Authenticity – digital object is reliably true to the original
Interoperability – standards-based object can be used in a standards-based system
Reusability – can be used in ways not related to original purpose
Parts of a Digital Repository System
Repository (the infrastructure for preservation)
Systems that support the application of policies and activities
Five Attributes of Digital Integrity (RLG)
Digital Integrity (Paul Conway): content, fixity, reference, provenance, context
DCC Curation Lifecycle Model: Integrity + Time + Actions = Preservation
Not just getting stuff in and being able to get it back out again, but maintaining usability over time, via: metadata maintenance, format migration, transforming the original resource to a usable digital object for today (example of a QuarkXPress file). Continual attention to preserve access.
Preservation is a value proposition based on purpose & mission, and available resources.
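Of the integrity attributes listed above, fixity is the one that lends itself most directly to automation. Here is a minimal sketch of a fixity check, assuming checksums were recorded when the files were ingested. The function names and the manifest layout (a dictionary of file path to SHA-256 digest) are my own assumptions, not a tool mentioned at the workshop.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Compare stored checksums with freshly computed ones.

    `manifest` maps file paths to the digest recorded at ingest.
    Returns the paths that are missing or whose content has changed.
    """
    failures = []
    for path, recorded in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != recorded:
            failures.append(path)
    return failures
```

Run on a schedule, a check like this is one small piece of the "continual attention" side of preservation: an empty list means every file still matches what was originally deposited.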
The downside of a complete repository system is that it is like having to replace an entire house of plumbing if you want a new kitchen faucet. Keep tools modular and connect them together.
Flickr, DPLA, WordPress, Omeka are the shiny faucet that can reuse your stuff and present it in new ways.
Presentation (Discovery access)
Tools that enable simple or sophisticated user experiences within the control of the repository manager.
Neatline sits on Omeka and takes an object to put on a map. Viewshare – visualizes objects. These only work because the foundation is there and the digital objects are durable.
British Library interactive collections online.
People can use your stuff any way they want.
Systems that leverage repository data without management or ownership responsibilities (except for the rights statement).
If you build good objects and have attribution information in metadata attached to objects, then when people build layers and layers on top or remix items they can always track back to the original source.
Digital repositories provide the structure within which preservation decisions can be made and implemented.
Digital Workflow Roundtable
Trying to create a vetting process – digital project proposal questionnaire?
Examples posted after session? Syracuse University, project proposal and checklist for evaluating questionnaire.
No metadata is bad, just misunderstood – use what you can.
200 DPI greyscale for best OCR experience?
For microfilm newspapers, going from original microfilm is fine if the quality is good enough for purposes and less fragile than paper.
Using a wiki to document procedures and policies – allowing students to comment on points and nominate them for staff review and clarification.
Project management strategies:
Task tracker platforms (web-based and students can post their progress)
Zoho Projects, Basecamp, MS Project
Asana for task management (in addition to Trello)
Rebecca Chandler, AVPreserve
Managing Digital Collections for Preservation and Access
Managing digital collections is the same as managing physical objects (sort of)
Require item-level control
Require intervening technology at every stage
Appraisal in a digital world:
Carrier media & file format (can the carrier medium sustain the information over time?)
What are we trying to save? – the experience of the original digital object, or the information on the carrier?
Original source file characteristics
Normalized information objects
The experience of the original
Given the chance, people have always chosen convenience over durability (of the carrier media)
Given information overload, we go back to assessment: what is important? what is culturally valuable? one grocery list vs. 1,000? fitbit info to tell you something you already know?
Cultural Armageddon: The Digital Attic (versioning, naming, in the digital world you get all the drafts (in paper world the curator might only get one or two drafts and the final version))
Appraisal is hard because there’s more stuff AND it’s harder to view and assess it all.
Paul Conway, Handbook for Digital Projects, NEDCC, 2000
Analog items become digital objects:
Source: condition, container, readability
Purpose: Protect original from handling by making digital surrogate, Represent information rather than the thing, Transform use?
Technology: Does the technology exist? Do you own the equipment, can you afford to outsource it? Can you manage and DELIVER the resulting output?
“Born Digital” content doesn’t work this way:
No such thing as physical arrangement, only intellectual arrangement.
How you define “objects” affects management and access more than arrangement.
Advantage: presentation is very flexible, using metadata to mix, remix, match, and rearrange objects and display different kinds of relationships, groups, etc.
OAIS IS rocket science!
Conceptual model of an information object that is self-contained and self-describing.
A set of data elements combined into a package that is internally coherent and can be managed in a digital preservation environment (digital repository).
How do you manage?
Largest/smallest information unit that becomes a unit? (lumper vs. splitter) Creating complex objects from small parts (recombining individually scanned pages back into a browsable book, or lumping them back into a PDF).
Quality decisions about the primary content file and its metadata.
How much context is enough? Do we need John Hancock’s pants to understand why he signed the Declaration?
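The lumper vs. splitter question above gets practical quickly when pages are scanned as individual files. As an illustration only, this sketch lumps page scans back into one ordered page list per book. The filename convention (`<book-id>_<page-number>.tif`) and the function name are assumptions of mine, so adjust the pattern to whatever naming scheme your project actually uses.

```python
import re
from collections import defaultdict

# Assumed naming convention: <book-id>_<page-number>.tif,
# for example "diary1862_003.tif".
PAGE_PATTERN = re.compile(r"^(?P<book>.+)_(?P<page>\d+)\.tif$")


def lump_pages(filenames):
    """Group page scans into one ordered page list per book."""
    books = defaultdict(list)
    for name in filenames:
        match = PAGE_PATTERN.match(name)
        if match is None:
            continue  # not a page scan under the assumed convention
        books[match["book"]].append((int(match["page"]), name))
    # Sort numerically so page 10 follows page 9, not page 1.
    return {book: [name for _, name in sorted(pages)]
            for book, pages in books.items()}
```

The resulting page list is the skeleton of a complex object: the same ordered list could feed a page-turning viewer or be bundled into a single PDF.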
Management Requires Tools
Managing digital objects requires intervening technology at EVERY stage.
Translate functional needs into application services.
Software Reality (goal is the free movement of content – from depository to discovery to access to remixing)
Let the tools do the work
Create it once, use it often. Central repository manages metadata, archival masters. Metadata is interoperable across multiple schemas, crosswalks are key.
Automate activities as much as possible (let the system create the derivatives at the point of need).
Step 1: scan RR pictures, basic metadata automated with duplicate records (these are all railroad pictures, #1, 2, 3, 4). Users look for caboose, users can describe and add metadata to the system to delineate different kinds of trains.
Archive does metadata on an item level but does NOT describe in detail each item. (These are all part of the trains collection.)
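Since crosswalks came up as the key to interoperable metadata, here is a minimal sketch of what one looks like in code. The local field names on the left are hypothetical; the targets on the right are ordinary Dublin Core element names. The `crosswalk` function is my own illustration, not a feature of any particular repository system.

```python
# Hypothetical local field names on the left; Dublin Core elements on the right.
LOCAL_TO_DC = {
    "photo_title": "title",
    "photographer": "creator",
    "date_taken": "date",
    "keywords": "subject",
}


def crosswalk(record, mapping=LOCAL_TO_DC):
    """Translate a record from one schema to another.

    Returns the mapped record plus the names of any fields with no
    equivalent, so nothing is dropped silently.
    """
    mapped, unmapped = {}, []
    for field_name, value in record.items():
        target = mapping.get(field_name)
        if target is None:
            unmapped.append(field_name)
        else:
            mapped[target] = value
    return mapped, unmapped


record = {"photo_title": "Caboose at the depot", "photographer": "Unknown",
          "shelf": "B-12"}
print(crosswalk(record))
```

Reporting the unmapped fields matters as much as the mapping itself: a crosswalk that quietly drops local fields loses exactly the context a later user might need.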
Case Study: California Audiovisual Preservation Project (on the Internet Archive)
CALIPR as the assessment for items to be digitized – how to choose what to pick?
CAVPP specifications for vendors to inspect items and treatment if necessary.
Statement of Work includes technical metadata that they ask vendors to capture as well as descriptive metadata fields.
Documentation and templates available from http://calpreservation.org
Pop-up Archive from Berkeley.
Like many institutions of our size (or larger), my library has a large collection of commercially produced VHS tapes. Many of these titles are out of print, yet still heavily used for teaching and research. And as technology is ever-changing, what used to be a collections bragging point has now become an albatross, and for years we haven’t had much of a notion of how to address the issue. However, developments in the archival and fair use circles in the past two years have slowly revealed a possible solution.
I’ve written before about the Code of Best Practices in Fair Use for Academic and Research Libraries, and wondered how this new set of guidelines would support the work and processes already in play. At the time, entities such as the Chronicle of Higher Education predicted that the Code would “solve the problem” of VHS; however, it is not clear that this document has had such a direct impact. Institutions have remained slow to publicly adopt and advocate for a change in practice.
Interestingly, New York University was quietly working on their own, more direct solution to the “VHS problem,” supported by a grant from the Mellon Foundation. In 2012 they published Video at Risk: Strategies for Preserving Commercial Video Collections in Research Libraries. The guidelines [PDF 406KB] provide more specific interpretation on the clauses within the copyright law that allow for archival preservation of copyrighted materials. I had reviewed the project information when I heard about it from CCUMC and other mailing lists last year, and was further interested after hearing Howard Besser speak about the Video at Risk project at the National Media Market last fall.
Although the preservation allowance within copyright law isn’t a complete solution for our VHS problem, the Video at Risk guidelines in combination with the Code present a groundwork I hope we can use to build a collections policy around these items. We’ve already begun to replace VHS titles in other formats when possible. The next step will be weeding the VHS collections to remove items that are no longer meeting current needs (including federal government documents on VHS and films about computer technology from the 1980s). We will also develop a framework for justifying digital preservation of remaining out-of-print VHS titles where damage can be proven. Hopefully, within the next year or so the VHS format will become officially obsolete, which will further allow libraries and archives to justify long-term preservation of these materials in an accessible form.