Please note some links lead to information accessible only to the staff of UH Libraries.
1. Roles and Responsibilities
Creators/Producers: The role played by those persons or client systems that provide the information to be preserved. Creators/producers include faculty, students, staff, alumni, collectors, content creators, publishers, and others. Creators/producers can also be internal persons or systems. They can generate “born digital” content or digitized surrogates from physical objects. Creators/producers will be responsible for complying with established deposit requirements and working with the management of the UH Libraries digital preservation program to ensure a successful transfer.
Management: The chair of the Digital Preservation Working Group, working with other key stakeholders, will be responsible for setting digital preservation policies and integrating them into broader organizational contexts.
Digital Preservation Administrators: Designated staff responsible for selection and for ongoing curation of specific collections (see below). Digital Preservation Administrators will be responsible for the establishment and day-to-day management of the digital preservation program.
Cooperating Repositories: Those repositories that have designated communities with related interests. They may ingest and provide access to each other’s collections. At a minimum, cooperating repositories must agree to support at least one common Submission Information Package (SIP) and Dissemination Information Package (DIP) among repositories. Examples include: UH Digital Library (UHDL), UH Libraries’ Institutional Repository (through TDL partnership), DuraSpace (through TDL partnership), the Texas Data Repository (through TDL partnership), and HathiTrust Digital Library.
Consumers/Client Groups: The role played by those persons or client systems who interact with UH Libraries repositories to access information of interest. This can include other institutions and repositories, as well as internal persons or systems.
Chart 1: Digital Preservation Management and Administrators
Head of Digital Research Services: Coordinates with internal and external partners (including TDL and HathiTrust) to implement digital preservation policy and program; collaborates with the Head of Library Technology Services, Digital Projects Coordinator, and the Digitization Services Coordinator to conduct audits.
Head of Library Technology Services: Collaborates with DPWG to implement digital preservation policies and procedures; maintains hardware and software for the digital preservation program, including the distributed digital preservation service; provides support for Archivematica; assists with the digital preservation audit process.
Head of Metadata and Digitization Services: Collaborates with DPWG to implement digital preservation policies and procedures; oversees digital preservation work done by librarians and staff members in MDS.
Digitization Services Coordinator: Ensures digitized content meets SIP requirements; creates AIPs for digitized content following digital preservation requirements; assists with the digital preservation audit process; maintains preservation storage.
Digital Projects Coordinator: Ensures born-digital Special Collections content meets SIP requirements; creates AIPs for born-digital Special Collections content following digital preservation requirements; creates DIPs for born-digital content; assists with the digital preservation audit process.
Systems Administrator 3: Provides server and storage support and error troubleshooting for Archivematica.
Metadata Unit Managers (Andy Weidner, Anne Washington): Provide appropriate descriptive and administrative metadata for access objects.
2. Digital Assets
2.a Quality Creation and Benchmarking
Image capture specifications
UH Libraries bases its image capture specifications on the Federal Agencies Digitization Guidelines Initiative’s (FADGI’s) document Technical Guidelines for Digitizing Cultural Heritage Materials.
Technicians must capture each aspect of an object (page, recto/verso of photograph, etc.) as a 16-bit TIFF file, with the Adobe RGB 1998 colorspace embedded. All captures should be taken at 300 ppi or higher, depending on equipment used. Pixel dimensions vary based on the size of the original object. The TIFF file serves as the preservation master file.
For access files, technicians create 8-bit JPEGs with an embedded sRGB colorspace. Typically, the dimensions of JPEGs will not exceed 3,000 x 3,000 pixels.
For further details, see FADGI Guidelines for Local Implementation.
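The sizing rules above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the choice never to upscale a small capture is an assumption rather than a stated UH Libraries rule.

```python
def capture_pixels(width_in, height_in, ppi=300):
    """Pixel dimensions of a capture for an object measured in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def access_dimensions(width_px, height_px, max_side=3000):
    """Scale master dimensions to fit within max_side x max_side pixels.

    The aspect ratio is preserved. Images that already fit are returned
    unchanged (assumption: access copies are never upscaled).
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("pixel dimensions must be positive")
    scale = min(1.0, max_side / width_px, max_side / height_px)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))
```

For example, an 8 x 10 inch photograph captured at 600 ppi yields a 4,800 x 6,000 pixel master, and its access JPEG would be scaled by half to 2,400 x 3,000 pixels.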
Audio capture specifications
UH Libraries bases its audio capture specifications on Sound Directions: Best Practices for Audio Preservation and the International Association of Sound and Audiovisual Archives’ Guidelines on the Production and Preservation of Digital Audio Objects.
All audio digitized by UH Libraries must be captured as 96 kHz, 24-bit WAVE files. Basic metadata is embedded in the file using FADGI’s BWF MetaEdit tool.
For access files, mid-range quality (44.1 kHz Stereo, 16-bit, 192 kbps, CBR) MP3s are created from the WAVE files. This process may include stitching files, increasing volume, and improving the clarity of sound.
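A quality-control check against the master specification might look like the following sketch, which uses Python's standard `wave` module to read the header of a PCM WAVE file. The function and constant names are hypothetical, and this is not a description of the tooling UH Libraries actually uses.

```python
import wave

# Master-file targets taken from the specification above.
MASTER_SAMPLE_RATE_HZ = 96_000
MASTER_BIT_DEPTH = 24


def read_wav_params(path):
    """Return (sample rate in Hz, bit depth, channel count) for a PCM WAVE file.

    wave.open raises wave.Error if the file is not a format it can parse.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def meets_master_spec(path):
    """True if the file is 96 kHz, 24-bit, matching the preservation master spec."""
    rate, bits, _channels = read_wav_params(path)
    return rate == MASTER_SAMPLE_RATE_HZ and bits == MASTER_BIT_DEPTH
```

A check of this kind only inspects header values; it says nothing about the audible quality of the transfer.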
Video capture specifications
For film and video digitized by an outside vendor, we follow such institutions as the Library of Congress, Library and Archives Canada, and 20th Century Fox and request a lossless JPEG2000 file in an MXF wrapper (OP1a). This file serves as the preservation master. For video digitized by UH Libraries, the preservation master will be captured as an uncompressed AVI or MOV. This model was selected to expedite access to A/V holdings. As we expand in-house A/V digitization, we hope to move to a more robust software and hardware setup that would allow for the creation of MXF.
Mezzanine files will not be preserved at this time due to limitations in archival storage; they will be stored on local network drives.
In order to keep master files accessible over time, and to ensure a digital object’s authenticity and reliability, metadata must be collected to meet certain functional requirements. All content entering the preservation system must have descriptive, structural, and administrative metadata, and the metadata must be made available in well-documented and widely-adopted formats.
Sufficient metadata must be created to support a number of essential functions, listed below.
- Functions required of all digital masters
  - It will be possible to produce, in print or as an online (on-screen) display, a faithful, citable rendering of the physical source, including the sequencing of its component parts (pages, volumes, boxes, folders, etc.).
  - It will be possible to navigate sequentially through the physical components (go to next, previous, first, last, or nth page, etc.).
  - The relationship between component parts of the physical source (pages, volumes, boxes, folders, etc.) will be represented.
  - Images of blank pages, photographic versos, and other like materials will be included as sequenced components.
  - It will be possible to associate higher-level descriptive metadata with digital component parts of the object (for example, for the purposes of citation).
- Functions required where applicable
  - Where possible, masters will support navigation to, between, and among logical structures (chapters, volumes, parts, boxes, folders). Citation of those features will also be supported.
  - Where applicable and in a manner appropriate for the physical object in question, any enumeration found on the physical object will be represented. Representation will maintain all variations in the enumeration of the physical object’s component parts (signature pages, preface, etc.).
  - Placeholders for known missing materials (pages, photos, etc.) will be included as sequenced components. In the interest of creating complete digital masters, missing pages and other components should be identified as such in higher-level metadata. Where parts of the object are provided by third parties, information to that effect should be noted in descriptive metadata.
- Functions strongly preferred
  - High-level logical structures will be identified in ways that enable more complex rendering and navigation functions. This would play an important role should objects be displayed in more associative, non-hierarchical contexts, such as image clouds or image recommendation services.
  - For purposes of citation, access, etc., it will be possible to support association of higher-level metadata with both logical structures and specific instances of an object’s representation.
PREMIS metadata will be captured and stored within the canonical metadata documents associated with all digital objects. These values will be monitored on a regular basis to ensure fixity and ongoing access. For more information on what PREMIS semantic units or components are captured, refer to Archivematica’s PREMIS documentation.
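To illustrate what a fixity check involves, the sketch below recomputes a file's SHA-256 digest and compares it with a previously recorded value. It is a generic example built on Python's standard `hashlib`; the function names are hypothetical, the choice of SHA-256 is an assumption, and it does not represent how Archivematica performs or records its own checks.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large master files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_hex):
    """True if the file's current digest matches the recorded digest."""
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

A mismatch indicates that the stored bytes differ from those that were originally recorded, which would normally prompt recovery from another copy.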
2.b Selection and Acquisition Policies and Procedures
The UH Libraries Digital Preservation Program acquires and preserves digital assets from the following sources: the UH Digital Library, UH Libraries’ Institutional Repository, and digital archival materials from Special Collections. As a general operating procedure, content that is either digitized by, acquired by, or submitted directly to UH Libraries will be preserved. Content submitted by a producer to partner organizations, including TDL, is preserved by those organizations.
Selection of digitized materials for UH Libraries’ digital collections
Selection criteria for objects deposited into digital repositories at UH Libraries should conform to the Digital Collections Development Policy, administered by the UH Libraries’ Digital Collections Management Committee (DCMC).
While repositories may have specific selection criteria based on their scope and purpose, any UH Libraries repository should retain collections and items that generate national recognition for the University of Houston and UH Libraries. Additionally, collections and items should:
- Be of significant research and/or teaching interest
- Align with University of Houston strategic priority areas (Energy, Health, Arts)
- Exist nowhere else as digital content that is easily accessible and/or of comparable quality
- Meet existing or anticipated demand
Specific selection criteria for each repository can be found in UH Libraries’ Digital Collections Development Policy.
Selection of born-digital assets to be preserved in UH Digital Preservation Program
The selection of materials in born-digital form for preservation is guided by the same general objectives that direct the selection of materials in other media. These collections include unique born-digital resources that are part of UH Libraries’ archival/manuscript collections and which are unlikely to be preserved anywhere else.
Note on selection of electronic theses and dissertations (ETDs)
All ETDs produced for University of Houston master’s and doctoral programs are submitted by students to the Vireo Thesis and Management System, hosted by TDL. ETDs without embargoes, or with expired embargoes, are ingested into and made available through the UH Libraries’ Institutional Repository.
TDL stores backups of Vireo and the UH Libraries’ Institutional Repository to Amazon at differing intervals (last five days, last day of month, last day of year). These backups are also stored in Amazon’s S3 preservation service, a cloud-based distributed digital preservation network.
See also Content Selection Strategies for Cloud-Based Storage in Appendix B, Section 3.g: “Preservation Planning.”
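The backup intervals TDL keeps for Vireo and the Institutional Repository (last five days, last day of month, last day of year) can be read as three retention tiers. The sketch below classifies a backup date under that reading. The tier names, the function name, and the exact interpretation of the intervals are assumptions for illustration; TDL's actual retention rules are not documented here.

```python
from datetime import date, timedelta


def retention_tiers(backup_date, today, daily_window=5):
    """Return the set of retention tiers a backup taken on backup_date falls into.

    Tiers (hypothetical labels):
      "daily"   - taken within the last `daily_window` days, counting today
      "monthly" - taken on the last day of a month
      "yearly"  - taken on the last day of a year (December 31)
    """
    tiers = set()
    age_days = (today - backup_date).days
    if 0 <= age_days < daily_window:
        tiers.add("daily")
    # A date is the last day of its month if the following day is in another month.
    if (backup_date + timedelta(days=1)).month != backup_date.month:
        tiers.add("monthly")
    if backup_date.month == 12 and backup_date.day == 31:
        tiers.add("yearly")
    return tiers
```

Under this reading, a backup that falls into no tier (an empty set) would be eligible for removal.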
Acquiring and deaccessioning digital assets
Files are acquired from creators through methods that assure the authority and integrity of the files. UH Special Collections has outlined their procedures in Special Collections Procedures for Accessioning Born Digital Content.
Digital assets that no longer support the teaching and research activities of the University of Houston, the scholarly community, and the general public may be deaccessioned from collections and will no longer be maintained or preserved by UH Libraries. The policies and procedures for this process will be determined by DCMC.
Format specifications for producers of born-digital content
These recommendations serve as a general guideline on file formats for producers of born-digital content. The digital preservation workflow accepts all file formats, but some formats are more sustainable and easier to preserve over the long term. It may not be possible to fully preserve higher-risk formats over time, because the hardware and software needed to read the files may become obsolete, and normalization and migration can lead to data loss. The UH Libraries Digital Preservation Program ranks each format as high, moderate, or low preference based on the amount of support required to maintain it and its probability of long-term stability.
High Preference indicates that the formats have the most support and the highest probability of long-term stability. The formats are typically openly documented and not compressed (or have lossless data compression).
Moderate Preference formats do not meet the minimum requirements for long-term retention but come close, and for practical reasons may be necessary for long-term maintenance. These formats are more likely to require migration in order to remain renderable.
Low Preference formats are not recommended or supported for long-term preservation. These files may be difficult or even impossible to render or provide access to in the future. These formats are likely candidates for normalization or migration. Information is often lost during this process.
Chart 2: File Format Preferences for Digital Preservation
PDF/A-1a (.pdf), OpenDocument Text (.odt)
PDF (.pdf), Microsoft Word (.doc), Microsoft Open XML (.docx), Rich Text Format (.rtf)
Corel WordPerfect (.wpd), Lotus WordPro (.lwp)
Plain text (.txt), comma-separated file (.csv), tab-delimited file (.txt)
Structural markup text documents
SGML with DTD/Schema, XML (.xml) with DTD/Schema
SGML without DTD/Schema, XML without DTD/Schema
OpenDocument Spreadsheet (.ods), comma-separated file (.csv), tab-delimited file (.txt), PDF/A-1a (.pdf)
Microsoft Excel (.xls), Microsoft Excel Open XML (.xlsx)
WAVE format (.wav)
AIFF uncompressed (.aif, .aiff), standard MIDI (.mid, .midi), Windows Media Audio (.wma), MPEG3 (.mp3), MP2 AAC (.m4a)
Audio CD, DVD-Audio, QuickTime MP4 AAC Protected (.m4p, .m4b), QuickTime MP3, iTunes, RealAudio (.rm, .ra), Shorten (.shn), RIFF-RMID (.rmi), Extended MIDI (.xmi), Module Music Formats, Mods (.mod)
Lossless JPEG2000 in an MXF wrapper (OP1a)
QuickTime (.mov), AVI (.avi), MPEG-1 (.mpg), MPEG-2 (.mpg), MPEG-4 (.mp4)
Windows Media Video (.wmv)
TIFF (.tif, .tiff)
JPEG (.jpg, .jpeg), JPEG2000 (.jp2), PNG (.png), PDF/A-1a (.pdf), GIF (.gif)
RAW (.raw, various), Adobe Photoshop (.psd), Kodak PhotoCD, Encapsulated PostScript (.eps), FlashPix (.fpx), PDF (.pdf)
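A producer preparing a deposit could screen files against these preferences with a simple lookup by extension. The sketch below covers only a small subset of Chart 2 and assumes that, for each media type with three lines, the lines run high, moderate, low. The dictionary, function name, and "unlisted" label are hypothetical. The .pdf extension is deliberately omitted because an extension alone cannot distinguish PDF/A-1a from ordinary PDF.

```python
from pathlib import Path

# Partial, illustrative mapping; see the assumptions stated above.
PREFERENCE_BY_EXTENSION = {
    # High preference
    ".tif": "high", ".tiff": "high", ".wav": "high", ".odt": "high",
    # Moderate preference
    ".jpg": "moderate", ".jpeg": "moderate", ".jp2": "moderate",
    ".png": "moderate", ".mp3": "moderate", ".mov": "moderate",
    ".avi": "moderate", ".doc": "moderate", ".docx": "moderate",
    ".rtf": "moderate",
    # Low preference
    ".psd": "low", ".wmv": "low", ".wpd": "low", ".lwp": "low",
    ".rm": "low", ".ra": "low",
}


def format_preference(filename):
    """Return "high", "moderate", "low", or "unlisted" for a file name."""
    suffix = Path(filename).suffix.lower()
    return PREFERENCE_BY_EXTENSION.get(suffix, "unlisted")
```

Extension matching is only a first pass; reliable identification would require inspecting file contents with a format identification tool.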
2.c Transfer Requirements and Deposit Guidelines
At UH Libraries, digital assets that are candidates for preservation originate in Metadata and Digitization Services (MDS) for digitized assets and UH Special Collections for born-digital assets.
UH Libraries’ digitization projects are initiated by a project plan meeting that brings together MDS and relevant stakeholders, such as Special Collections, the Architecture and Art Library, or the Music Library, to discuss project scope and parameters. Upon completion of the project, Digital Preservation Administrators prepare the content for transfer to UH Libraries’ digital preservation system.
With born digital items, Special Collections negotiates an agreement with the information producer, which includes a deed of gift or deposit agreement. Archives personnel work with information producers to collect information detailing provenance, file organization, hardware and software needed to read files, and context of the files and how they were created. Once content is accessioned, Digital Preservation Administrators transfer it to UH Libraries’ digital preservation system.
Digital Preservation Administrators will be responsible for ensuring that required descriptive, structural, and technical metadata is preserved. In preparing materials for transfer, both MDS and Special Collections create submission information packages (SIPs) that contain the materials necessary for the long-term preservation of digital assets. To ensure consistent quality across all submissions, the Digital Preservation Working Group has created SIP specifications aimed at preserving the digital assets themselves; the minimal descriptive metadata attributed to these assets that is necessary to ensure appropriate levels of reliability, authenticity, and provenance; and the structural metadata necessary to preserve a canonical record of original order and/or hierarchy (if applicable).
See Appendix B, Section 3.a: “Pre-Ingest: Producer-Archive Interactions” and Appendix B, Section 3.c: “Ingest,” Preparation of Digital Assets for Transfer, UHL Digital Preservation System SIP Specification, for details on these processes.
2.d Access and Use Policies
University of Houston Libraries acquires, manages, and preserves digital assets so that they remain accessible to its constituents over the long term. Certain limitations may be placed on access due to legal, donor, or other restrictions, but in general, insofar as possible, UH Libraries endeavors to make its digital assets accessible to all users.
Each individual digital collection will have its own defined restrictions for access and use. These restrictions may be determined by intellectual property rights, legal requirements, privacy concerns, or a project’s mission. UH Libraries provides access to its digital assets in such a way that all license and donor agreements are respected.
Preservation copies of digital assets are kept in preservation storage, which is closed to direct public access. Public access, where appropriate, to derived copies of digital assets in the UHDL is provided through the Libraries’ digital asset management system (DAMS) and the Libraries’ Web sites. Dissemination of these digital objects is managed by the Metadata Unit through the DAMS/digital project workflow. External requests for master copies of digital assets will be reviewed on a case-by-case basis. Additionally, consumers are permitted to download certain static image files through the UHDL Digital Cart service.
Currently, Special Collections provides public access to processed born digital files in the reading room. Future plans include expanding access through the same avenues as other UH digital materials.
For additional information, see Appendix B, Section 3.h: “Access.”
3. Digital Preservation Strategies
In general, the preservation strategies utilized by UH Libraries are based on both the OAIS conceptual model and the Trusted Digital Repository specifications. All decision-making surrounding digital preservation stems, whenever possible, from this common core. To that end, UH Libraries will:
- Establish and maintain a robust preservation system that is able to ensure the reliability, authenticity, and provenance of digital objects. This repository is modular in its structure, so services can be added or updated over time as the information landscape evolves and preservation needs change.
- Manage risk through the Digital Preservation Working Group and its administrators, who are actively involved in the day-to-day preservation of digital content. Decisions regarding specific preservation strategies (format-specific, etc.) will be documented by members of the team and consolidated into standard preservation rules and operational procedures. Members will monitor research/developments in the field of digital preservation and will reevaluate existing documentation and procedures to ensure continued relevance.
- Educate all staff working directly with digital content so that sustainability and preservation are consistently taken into consideration throughout the digital object lifecycle. Care will be taken to ensure that digital content falling within the scope of the repository is prepared according to stringent submission specifications.
- Commit to budgeting for the long-term preservation of digital assets.
- Collaborate with other institutions and organizations to strengthen our commitment to digital preservation and to share resources and best practices.
For more information on digital preservation strategies, see Appendix B, Section 3.g: “Preservation Planning,” Strategic Priorities for Digital Preservation, 2015-2018.
- Trusted Digital Repositories: Attributes and Responsibilities
- Reference Model for an Open Archival Information System
4. Technological Infrastructure
4.a Digital Archive Operations
The UH Libraries digital preservation workflow must process digital objects from ingest to archival storage and access in compliance with the ISO-OAIS functional model (Figure 1) and other digital preservation standards and best practices. The digital preservation function, including Archivematica, is situated within the larger digital access and preservation framework (Figure 2).
Figure 1. OAIS Reference Model
Figure 2. UH Digital Access and Preservation Workflow
4.b Platform Requirements and Procedures
To meet current and future needs, the UH Libraries Digital Preservation System should fulfill the following requirements:
- Scalability: The ability for the repository to scale to manage large collections of digital objects.
- Extensibility: The ability to integrate external tools with the repository to extend the functionality of the repository, via provided software interfaces (APIs), or by modifying the code-base (open source software).
- Interoperability: The ability for the repository to interoperate with other repositories.
- Security: The system must provide multiple mechanisms to prevent compromise and data loss.
- Performance: The system must perform at a level that satisfies the needs of the users of the system. It must have quick response times and high system availability.
- Flexibility: The ability for multiple instances for offsite recovery; the ability to function with the offsite backup facility; the ability for components to reside at different physical locations; the ability for development, testing, and production environments; capability for disaster recovery.
- System support: The quality of documentation and responsiveness of support staff or developer/user community (open source) to assist with problems.
- Development community: Reliability and support track record of the company providing the software; or size, productivity, and cohesion of the open source developer community.
- Development organization: Viability of the company providing the software; or stability of the funding sources and organizations developing open source software.
- Technology roadmap for the future: Technology roadmap that defines a system evolution path incorporating innovations and “next practices” that are likely to deliver value.
Purdue University Libraries, “File Format Recommendations”
Digital Library Foundation, “Benchmark for Faithful Digital Reproductions of Monographs and Serials,” 2002