Digital Preservation Resources

The field of digital preservation is, as one colleague recently pointed out, still very experimental. However, there is a wealth of documentation and support coming from the community. Following up on my previous post about the NEDCC Digital Directions workshop, here is a summary list of the tools and resources mentioned at the session.

Digital preservation workflow at the most basic level.

Digital Curation
Attributes of Trusted Digital Repositories, OCLC 2002

Open Archival Information System, CCSDS 2003, 2012

Overview: What is Digital Curation?

DCC Lifecycle Model (University of Edinburgh)

 

Access and Reuse: Examples and Networks

Rijksstudio in Amsterdam

Serendip-o-matic: Let Your Sources Surprise You

Established collaboration networks:
Chronopolis — Digital Preservation Across Space & Time
Mirador: a complementary interface for searching IIIF-compliant collections

 

Project Planning & Management

Handbook for Digital Projects, [PDF, 1.4 MB] Paul Conway, NEDCC, 2000

IMLS National Grant Application – useful framework for organizing thoughts and plans, even if not applying for funding

Project management tools: Basecamp, Microsoft Project, Zoho Projects

Task management tools: Asana, Trello

 

Guidelines and Best Practices

Blue Ribbon Task Force on Sustainable Digital Preservation and Access, San Diego Supercomputer Center

Federal Agencies Digitization Guidelines Initiative (FADGI) (general)

FADGI: Audio-Visual Working Group

Guide to Developing a Request for Proposal for the Digitization of Video (and more), AVPreserve

 

Evaluation and Quality Assurance

AVPreserve: Tools

FADGI Guidelines

Embedded Metadata in Broadcast WAVE Files

Minimal Descriptive Embedded Metadata in Digital Still Images

Field Audio Collection Evaluation Tool (FACET)

Format Characteristics and Preservation Problems

 

Case Study: California Audiovisual Preservation Project (on the Internet Archive)

Preservation Management Resources

CALIPR – Automated tool for preservation needs assessment

CAVPP Workflow Overview [PDF, 172 KB] – similar to a statement of work

Notes from Digital Directions 2014

Last week I had the pleasure of attending the Digital Directions workshop, hosted by the Northeast Document Conservation Center. There was a ton of fantastic information, along with perspectives from seasoned professionals in the field and colleagues who are tackling some of the same challenges we’re facing at my institution.

It will be a little while until I’m finished digesting all of this information, but in the meantime I wanted to post my notes. In addition to the links sprinkled through the notes, check out the free resources for digital preservation on the NEDCC website.

Digital Directions Day 1

Digital Preservation (Introduction)

Tyler Walters
@tywalters1

Digital Preservation – ensuring access across technologies and over time.

Digital Curation – actions people take to maintain and add value to digital information over its lifecycle.

Curatorial actions must serve needs of current and future users.

dcc.ac.uk/digital-curation

Content creators are often unaware of how their output is used, or of its value, beyond their own use.

Why curate? Provide information about: changes, custody, usability, metadata, findability, versions, cultural memory (activities), advance knowledge (research)

Two Main Docs
Trusted Digital Repositories
oclc.org/programs/ourwork/past/trusted…
Admin Responsibility
Org. Viability
Financial sustainability
Tech and procedure suitability
System security
Procedural Accountability

Open Archival Information System Reference Model (OAIS)

Producer -> Submission Information Package (SIP) -> Archival Information Package (AIP) -> Dissemination Information Package (DIP) -> User community
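The flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not an implementation of the OAIS standard: the `Package` class, the `ingest` and `disseminate` functions, and the sample metadata are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical stand-in for an OAIS information package."""
    kind: str                      # "SIP", "AIP", or "DIP"
    content: bytes
    metadata: dict = field(default_factory=dict)


def ingest(sip: Package) -> Package:
    """Turn a producer's SIP into an AIP by adding archive-side metadata."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    meta = dict(sip.metadata, size_bytes=len(sip.content))
    return Package("AIP", sip.content, meta)


def disseminate(aip: Package, fields: tuple) -> Package:
    """Derive a DIP that exposes only the metadata fields a user needs."""
    if aip.kind != "AIP":
        raise ValueError("disseminate expects an AIP")
    meta = {k: v for k, v in aip.metadata.items() if k in fields}
    return Package("DIP", aip.content, meta)


# Producer hands over a SIP; the archive stores an AIP; the user receives a DIP.
sip = Package("SIP", b"scanned page", {"title": "Letter, 1901", "donor": "private"})
aip = ingest(sip)
dip = disseminate(aip, fields=("title",))
print(dip.kind, dip.metadata)
```

The point of the sketch is that the same content passes through three differently described packages: what the producer submits, what the archive keeps, and what the user community receives.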

Over the past 12 years, the field has been operating in a merged model of these two.

DCC Lifecycle Model (University of Edinburgh)
Graphical, high-level overview of the stages required for successful curation and preservation of data.

http://www.dcc.ac.uk/lifecycle-model

Moving forward, away from “Public” vs. “Technical”
New triad: Infrastructure, Content, Services (including helping users create new content, managing rights)

David Lankes on knowledge production (iSchool at Syracuse)
Librarians as facilitators of conversations.

Libraries, Kitchens, and Grocery Stores – Joan Frye Williams (2008)
Where do collecting institutions go after preserving and providing access? People want to do stuff with digital information and artifacts.

Creation/Curation/Use
Member-facing content creation services
Engage producers to build literacies and skills
Provide content creation, production conversion tools
Offer content hosting & production services

Cultural Institutions in the Evolving Paradigm
Teachers / Instructional Partners
Observers / Anthropologists of information users (members) to study evolving user needs.
Systems Builders
Content Producers and Communicators
Organizational Designers (new services, new staffing, etc)
Collaborative Network Creator (partnering with other organizations)

Digital Curation Networks (eScholarship – California, NITLE, Alliance)
Digital Preservation Networks (MetaArchive, Chronopolis, LOCKSS)

 

Digital Project Planning
Emily Gore, DPLA Director for Content

The Power of Where Your Collections Can Go

Start with reuse.
Create sharable metadata.
Thinking beyond the institutional portal – international data models (broader than institution or local community)

Why?
Tell a more complete story by creating virtual collections and linking to complementary collections at other institutions. Linking to other relevant content or context. Reuse/remixing.

What? Or what do you want to select from born-digital items?
Selecting particular collections should be part of the core mission and goals of the institution – making the commitment.
Value – what is valuable, what fits with institutional mission, what is potential use, what is the cost of NOT digitizing?
Ability – can you? do you have staff etc? RIGHTS RIGHTS RIGHTS and licence, not just on the objects but also the metadata
Legal considerations – DPLA and Europeana are working on standardized actionable statements that could be used across institutions. Searchable collections by copyright: public domain, Creative Commons flavors, etc.

How?
Workflow of what to outsource for example. Can this step be done in-house? Some steps in house and some outsourced, project-by-project basis.

Level of discovery needed. Minimal-level metadata means minimal accessibility.
More product/less metadata, or rich metadata and more selective collections?
Crowdsourcing transcription projects (NYPL menu project)

Delivery Expectations – what do you want users to do?
Do you have to place restrictions? If so, be very clear.
Create documented APIs
Look at Rijksstudio in Amsterdam – making money (from prints, postcards, etc) AND making images available for download.
Serendip-o-matic “let your sources surprise you” – run your text through this and discover related content in major collections.
No need to create a portal anymore – just make collections open and allow others (aggregators like DPLA) to build the interface.

Part of larger effort or collaborations?
(GLAM, DPLA Hub, MetaArchive, Chronopolis, DPN)
IIIF (Stanford et al.) and Mirador – International Image Interoperability Framework (media ecology project?) – participating institutions have an IIIF plugin running on their collections; Mirador is the interface to search and compare across institutions. Artstor has released an IIIF-compliant viewer.

Goals for long-term access, preservation & sustainability?
Essential part of the process; partner with other institutions or outsource. LOCKSS, HathiTrust?

How will you $$$?
See grant guidelines for the best ways to plan the process, even if you don’t apply for the money. They force you to address each consideration. Take the IMLS national grant application, for example. Consider local funding options.

Collaborate.

———–

Digital Directions Day 2

Digital Preservation

Greg Colati
Sr. Director, Archives, Special Collections and Digital Curation
UCONN
Preservation is the preservation of access

Creating durable access:
Sustainability – maintained and accessed over time
Authenticity – digital object is reliably true to the original
Interoperability – standards-based object can be used in a standards-based system
Reusability – can be used in ways not related to original purpose

Parts of a Digital Repository System

Repository (the infrastructure for preservation)
Systems that support the application of policies and activities

Five Attributes of Digital Integrity (RLG)
Digital Integrity (Paul Conway): content, fixity, reference, provenance, context
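Of these five attributes, fixity is the one most easily checked by machine. A minimal sketch, assuming fixity is tracked by recording a SHA-256 checksum at ingest and comparing it later (the session did not prescribe an algorithm; the function names and the file name `master.wav` are hypothetical):

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


# Demonstration on a throwaway file: record a digest, then alter the file.
with tempfile.TemporaryDirectory() as folder:
    master = os.path.join(folder, "master.wav")  # hypothetical file name
    with open(master, "wb") as handle:
        handle.write(b"preservation master bytes")
    recorded = sha256_of(master)
    unchanged = fixity_ok(master, recorded)
    with open(master, "ab") as handle:
        handle.write(b"!")  # simulate corruption or an unrecorded edit
    changed = fixity_ok(master, recorded)

print(unchanged, changed)
```

In practice the recorded digest would live in the repository's metadata, and the check would run on a schedule rather than once.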

Management

DCC Curation Lifecycle Model: Integrity + Time + Actions = Preservation

Not just getting stuff in and being able to get it back out again, but maintaining usability over time, via: metadata maintenance, format migration, transforming the original resource to a usable digital object for today (example of a QuarkXPress file). Continual attention to preserve access.

Preservation is a value proposition based on purpose & mission, and available resources.

The downside of a complete repository system is that it is like having to replace an entire house of plumbing if you want a new kitchen faucet. Keep tools modular and connect them together.

Flickr, DPLA, WordPress, and Omeka are the shiny faucets that can reuse your stuff and present it in new ways.

Presentation (Discovery access)
Tools that enable simple or sophisticated user experiences within the control of the repository manager.

Neatline sits on Omeka and takes an object to put on a map. Viewshare – visualizes objects. These only work because the foundation is there and the digital objects are durable.
British Library interactive collections online.

Information Universe
People can use your stuff any way they want.
Systems that leverage repository data without management or ownership responsibilities (except for the rights statement).

If you build good objects and have attribution information in metadata attached to objects, then when people build layers and layers on top or remix items they can always track back to the original source.

Digital repositories provide the structure within which preservation decisions can be made and implemented.

 

Digital Workflow Roundtable

Trying to create a vetting process – digital project proposal questionnaire?
Examples posted after session? Syracuse University, project proposal and checklist for evaluating questionnaire.

No metadata is bad, just misunderstood – use what you can.

200 DPI greyscale for best OCR experience?

For microfilm newspapers, going from original microfilm is fine if the quality is good enough for purposes and less fragile than paper.

Using a wiki to document procedures and policies – allowing students to comment on points and nominate them for staff review and clarification.

Project management strategies:
Task tracker platforms (web-based and students can post their progress)
Zoho Projects, Basecamp, MS Project
Asana for task management (in addition to Trello)

 

Audiovisual Digitization

Rebecca Chandler, AVPreserve

Statement of Work is primary document outlining steps, responsibilities.
Preservation is focused on the recorded signal, not the container (cassette).
Factors affecting digitization: Quality of equipment, expertise of operator, condition of media.
Quality of digital process factors (bit depth, sample rate, etc)
SOW can be used for both outside vendor and in-house lab.
Defines specifications that can be used for quality control.
SOW Introduction – what is the mission of the organization and what is the purpose of the project?
Brief description of materials to be digitized: content, formats, run times, quantity, condition issues.
Care, Handling and Storage: staff must be knowledgeable. Items should be played as little as possible, secure entry to building, no items left unattended during playback, stored in proper environment.
Media Issues: Tell the vendors how to address problems.  They need to contact you.  Talk to vendors about how they might clean specific formats and make sure they consult before cleaning.
Field Audio Collection Evaluation Tool (FACET)
Format Characteristics and Preservation Problems
Reformatting: reproduction setup – specify that the deck should be cleaned between each tape.
Set up the deck each time, adjusting tension, azimuth, etc.
For records: correct EQ, stylus, etc, must be selected for each item.
TBC in addition to mechanical adjustments.
Specify that the vendor calibrate signal path and not include equipment that is not part of the signal path. Use distribution amplifier not cable splitter.
Any processing (trim, EQ, levels, etc.) is applied only to the access master and access copies, and should be specified. Preservation masters remain untouched.
File Formats
Choosing and specifying sample rate.
IASA-TC 04 guidelines recommend a sampling rate of 96 kHz.
Bit depth (quantizing level): capture at 24 bits.
chrominance – color information
luminance – bw information
4:2:2 for video; 4:4:4 for film
Digital tapes should be transferred as-is, not up-converted.
Audio masters: Broadcast WAVE, 96 kHz at 24-bit. Access masters are optional and may not be quite this high fidelity. Access copies might be 256 kbps MP3: small file, universal format, streaming-compatible.
Video is less conforming.
QuickTime .mov wrapper, 10-bit YUV 4:2:2 uncompressed, v210 codec
Standard, widely adopted, accessible, high quality, flexible for future migrations.
Lossy compression schemes might be fine for talking-head videos.
Access Masters for production and frequent access copies of different flavors. Not needed for stuff that just gets played back.
Determine whether metadata is external or embedded – at least minimal info should be embedded.
FADGI Guidelines for embedded metadata.
Vendor can provide external data such as who transferred, on what machine, any
Checking and editing file’s embedded metadata
MDQC, QT7 lets you edit
BWF MetaEdit
Quality assurance DURING transfer
Quality control AFTER transfer
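The audio specifications in these notes translate directly into storage planning. A rough worked example, assuming stereo (two-channel) recordings and reading "256 mp3" as 256 kbps; both are assumptions, and the function names are made up for illustration. Header and metadata overhead in the files is ignored.

```python
def pcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size of uncompressed PCM audio data, ignoring file headers."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


def mp3_bytes(bitrate_kbps: int, seconds: int) -> int:
    """Approximate size of a constant-bitrate MP3 stream."""
    return bitrate_kbps * 1000 // 8 * seconds


hour = 3600
master = pcm_bytes(96_000, 24, channels=2, seconds=hour)  # preservation master
access = mp3_bytes(256, seconds=hour)                     # access copy

print(f"master: {master / 1e9:.2f} GB per hour")
print(f"access: {access / 1e6:.1f} MB per hour")
print(f"ratio:  {master // access}x")
```

Under these assumptions an hour of preservation-master audio is roughly 2 GB against about 115 MB for the access copy, which is why the two are usually planned and stored separately.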

Digital Directions Day 3

Managing Digital Collections for Preservation and Access

Greg Colati

Managing digital collections is the same as managing physical objects (sort of)

Digital collections…

Fragile

Require item-level control

Require intervening technology at every stage

 

Appraisal in a digital world:

Carrier media & file format (can the carrier medium sustain the information over time?)

De-duplication!!!!!

Deleted files

Disk images

Virtual machines
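De-duplication is one appraisal step that lends itself to automation. A minimal sketch, assuming that "duplicate" means byte-identical content regardless of file name; the `find_duplicates` function and the sample file names are hypothetical, and near-duplicates such as successive drafts would not be caught this way.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict) -> list:
    """Group file names whose contents are byte-identical.

    `files` maps a file name to its bytes; a real tool would read from disk.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


sample = {
    "draft_v1.doc": b"grocery list",
    "draft_v1 (copy).doc": b"grocery list",
    "final.doc": b"grocery list, revised",
}
duplicates = find_duplicates(sample)
print(duplicates)
```

Each group in the result is a set of files where only one copy needs to be retained.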

 

What are we trying to save? – the experience of the original digital object, or the information on the carrier?

Original source file characteristics

Normalized information objects

Disc images

Operating systems

Software applications

The experience of the original

Given the chance, people have always chosen convenience over durability (of the carrier media)

Given information overload, we go back to assessment: what is important? what is culturally valuable? one grocery list vs. 1,000?  fitbit info to tell you something you already know?

Cultural Armageddon: The Digital Attic (versioning, naming; in the digital world you get all the drafts, whereas in the paper world the curator might only get one or two drafts and the final version)

Appraisal is hard because there’s more stuff AND it’s harder to view and assess it all.

Paul Conway, Handbook for Digital Projects, NEDCC, 2000

Analog items become digital objects:

Source: condition, container, readability

Purpose: Protect original from handling by making digital surrogate, Represent information rather than the thing, Transform use?

Technology: Does the technology exist? Do you own the equipment, can you afford to outsource it? Can you manage and DELIVER the resulting output?

“Born Digital” content doesn’t work this way:

No such thing as physical arrangement, only intellectual arrangement.

How you define “objects” affects management and access more than arrangement.

Advantage: presentation is very flexible, using metadata to mix, remix, match, and rearrange objects and display different kinds of relationships, groups, etc.

OAIS IS rocket science!

Conceptual model of an information object that is self-contained and self-describing.

A set of data elements combined into a package that is internally coherent and can be managed in a digital preservation environment (digital repository).

How do you manage?

Largest/smallest information unit that becomes a unit? (lumper vs. splitter) Creating complex objects from small parts (recombining individually scanned pages back into a browsable book, or lumping them back into a PDF).

Quality decisions about the primary content file and its metadata.

How much context is enough? Do we need John Hancock’s pants to understand why he signed the Declaration?

Management Requires Tools

Managing digital objects requires intervening technology at EVERY stage.

Translate functional needs into application services.

Software Reality (goal is the free movement of content – from depository to discovery to access to remixing)

Let the tools do the work

Create it once, use it often. Central repository manages metadata, archival masters. Metadata is interoperable across multiple schemas, crosswalks are key.
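A crosswalk is, at its simplest, a lookup table from one schema's field names to another's. The sketch below pairs a few Dublin Core elements with schema.org properties purely for illustration; it is not a published crosswalk, and the mapping, the `crosswalk` function, and the sample record (including the local `shelfmark` field) are assumptions made up for this example.

```python
# Illustrative pairing only, not an official or complete crosswalk.
DC_TO_SCHEMA_ORG = {
    "title": "name",
    "creator": "creator",
    "date": "dateCreated",
    "rights": "license",
}


def crosswalk(record: dict, mapping: dict) -> tuple:
    """Rename fields per the mapping; report fields with no target."""
    converted, unmapped = {}, []
    for field_name, value in record.items():
        target = mapping.get(field_name)
        if target is None:
            unmapped.append(field_name)
        else:
            converted[target] = value
    return converted, unmapped


# Hypothetical record with one local field that has no mapping.
record = {"title": "Caboose at the depot", "date": "1952", "shelfmark": "RR-0004"}
converted, unmapped = crosswalk(record, DC_TO_SCHEMA_ORG)
print(converted)
print("needs review:", unmapped)
```

Reporting unmapped fields rather than silently dropping them keeps the central repository record as the authoritative source while each target schema receives only what it can express.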

Automate activities as much as possible (let the system create the derivatives at the point of need).

Step 1: scan RR pictures, basic metadata automated with duplicate records (these are all railroad pictures, #1, 2, 3, 4). Users look for caboose, users can describe and add metadata to the system to delineate different kinds of trains.

Archive does metadata on an item level but does NOT describe in detail each item. (These are all part of the trains collection.)

 

Case Study: California Audiovisual Preservation Project (on the Internet Archive)

CALIPR as the assessment for items to be digitized – how to choose what to pick?

CAVPP specifications for vendors to inspect items and treatment if necessary.

Statement of Work includes technical metadata that they ask vendors to capture as well as descriptive metadata fields.

Documentation and templates available from http://calpreservation.org

Pop-up Archive from Berkeley.

At-risk VHS collections & the NYU solution

Like many institutions of our size (or larger), my library has a large collection of commercially produced VHS tapes. Many of these titles are out of print, yet still heavily used for teaching and research. And as technology is ever-changing, what used to be a collections bragging point has now become an albatross, and for years we haven’t had much of a notion of how to address the issue. However, developments in the archival and fair use circles in the past two years have slowly revealed a possible solution.

I’ve written before about the Code of Best Practices in Fair Use for Academic and Research Libraries, and wondered how this new set of guidelines would support the work and processes already in play. At the time, entities such as the Chronicle of Higher Education predicted that the Code would “solve the problem” of VHS; however, it is not clear that this document has had such a direct impact. Institutions have remained slow to publicly adopt and advocate for a change in practice.

Interestingly, New York University was quietly working on their own, more direct solution to the “VHS problem,” supported by a grant from the Mellon Foundation. In 2012 they published Video at Risk: Strategies for Preserving Commercial Video Collections in Research Libraries. The guidelines [PDF 406KB] provide more specific interpretation on the clauses within the copyright law that allow for archival preservation of copyrighted materials. I had reviewed the project information when I heard about it from CCUMC and other mailing lists last year, and was further interested after hearing Howard Besser speak about the Video at Risk project at the National Media Market last fall.

Although the preservation allowance within copyright law isn’t a complete solution for our VHS problem, the Video at Risk guidelines in combination with the Code present a groundwork I hope we can use to build a collections policy around these items. We’ve already begun to replace VHS titles in other formats when possible. The next step will be weeding the VHS collections to remove items that are no longer meeting current needs (including federal government documents on VHS and films about computer technology from the 1980s). We will also develop a framework for justifying digital preservation of remaining out-of-print VHS titles where damage can be proven. Hopefully, within the next year or so the VHS format will become officially obsolete, which will further allow libraries and archives to justify long-term preservation of these materials in an accessible form.
