Appendix B: Technological Infrastructure

Please note some links lead to information accessible only to the staff of UH Libraries.

1. OAIS Reference Model

UH Libraries uses the Open Archival Information System (OAIS) reference model as a basis and a means for communicating about digital preservation. The OAIS reference model is an ISO standard (ISO 14721) and outlines a common set of requirements and practices developed around the globe. OAIS also provides a common language for communication, collaboration, and sharing information across repositories. UH Libraries is committed to establishing and maintaining a digital preservation system that is OAIS compliant. Functional entities implemented in our system, such as pre-ingest, ingest, archival storage, data management, administration, preservation planning, and access, are OAIS compliant. UH Libraries is establishing its program using a planning document designed around the OAIS model and the Trusted Digital Repository requirements and will periodically review the program to ensure ongoing compliance.

Related Documentation:

 

2. Information Packages

A basic concept of the OAIS Reference Model is the need to combine data and representation information into information objects. This model is valid for all the types of information in an OAIS compliant repository, such as the UH Libraries’ digital preservation system. The OAIS Model contains the following three types of information packages:

Submission Information Packages (SIPs) are delivered by the producer to the digital preservation repository for use in the construction of one or more AIPs. Policies and procedures related to the delivery of digital content by the producer to the repository are further detailed in Appendix B, Section 3.a: “Pre-ingest: Producer-Archive Interaction” and Appendix B, Section 3.c: “Ingest.”

Archival Information Packages (AIPs) consist of the content information and the associated preservation description information (PDI), and are preserved within the digital preservation repository. See Appendix B, Section 3.d: “Archival Storage.”

Dissemination Information Packages (DIPs) are derived from one or more AIPs and received by the consumer in response to a request to the digital preservation repository. See Appendix B, Section 3.h: “Access” for information on how access is provided to the consumer from the digital preservation repository.

Related Documentation:

 

3. Functional Entities

3.a Pre-ingest: Producer-Archive Interaction

Digitization projects conducted by UH Libraries are initiated by a project plan meeting, which brings relevant stakeholders together, including as appropriate Special Collections, MDS, the Architecture and Art Library, the Music Library, and/or the Health Sciences Library, to discuss project scope and parameters. In addition to identifying physical condition concerns of analog materials and establishing a project timeline, the meetings outline instructions for digital object creation, including digitization and metadata specifications. Upon the completion of the project and the successful ingest of digital objects into the UHDL (or other access system depending on project specifications), preservation administrators prepare this content for transfer to UH Libraries’ digital preservation system by ensuring that it complies with the SIP specification (see Appendix B, Section 3.c: “Ingest” for additional information).

With born-digital items, steps are taken to ensure the authenticity and integrity of the files, such as the use of write blockers or capture of disk images, during transfer of the files to the archives. Files are previewed before being transferred and validated within the archival setting. The information producer is asked to retain a copy of the files until they are ingested into the digital preservation system.

The Digital Preservation Working Group will determine workflows and revisions to policies for other materials that require preservation but have no existing policy, such as outsourced digitization or electronic serials produced by the University.

Related Documentation:

3.b Common Services

Operating system services

The digital preservation system Archivematica will be run on the Ubuntu Operating System per the developers’ recommendation. Ubuntu is a Debian-based Linux operating system that uses the open source development model. It is a mature product with a large installed base and an active development community. Ubuntu has a large and extensible suite of utilities, services, and applications that fully support the digital preservation system. Our Ubuntu server is run in a virtual machine from within the Libraries’ highly scalable and highly available server virtualization environment.

Network services

Archivematica has full access to the Libraries’ 1 Gbps switched Ethernet network using TCP/IP for communication. It will access common network services such as DNS and SMTP. The digital preservation system will access the Libraries’ storage area network using the iSCSI protocol and will access cloud storage locations using HTTPS over the Internet.

Security services

Shell level access to the operating system is limited to the Library Technology Services staff who are responsible for administering the server. Non-LTS staff access is granted on a case-by-case basis. Shell accounts are created and administered by LTS staff. Sudo is used to assign temporary privileges when performing administrative tasks in the shell.

All web interfaces that require a login are secured with a 2048-bit SSL certificate. Accounts for the Archivematica web interface (dashboard) are limited to staff who need access to perform their roles in the preservation workflow. Archivematica allows two types of accounts: users and administrators. Administrator accounts have full access to all the functions of the system. User accounts have access to all the functions of the system, except that they cannot create other users and cannot modify the preservation planning settings. Dashboard accounts are created and administered by LTS staff. Access to, and administration of, the Archivematica storage services dashboard is limited to LTS staff and Digital Preservation Administrators.

All unnecessary operating system services are disabled. The built-in software firewall blocks all incoming network connections except those required for the administration and normal operations of the digital preservation system.

Security patches and updates are applied to the operating system each month or as critical patches are released. Key users will be notified when updates are scheduled for installation and if a server restart is required. Updates to the digital preservation system software are applied as released by the developer after thorough review by LTS staff and the DPWG.

System health, performance, and security will be actively monitored by the Libraries’ Xymon Monitor service, which alerts LTS staff when events occur.

Preservation data will be backed up on a weekly basis to the Libraries’ rdiff-backup server and retained for 30 days.

3.c Ingest

Preparation of digital assets for transfer

At UH Libraries, digital assets that are candidates for preservation originate/reside in one of two departments: for digitized assets, MDS serves as a producer or steward of content; for born-digital assets, Special Collections plays a critical role through its accessioning of born-digital materials from donors or campus entities. Thus, to ensure a chain of custody for our digital assets, preservation responsibilities reside in both departments.

In preparing materials for transfer, both MDS and Special Collections create submission information packages (SIPs) that contain the materials necessary for the long-term preservation of digital assets. To ensure consistent quality across all submissions, the Digital Preservation Working Group has created SIP specifications aimed at preserving:

  • The digital assets themselves,
  • The minimal descriptive metadata attributed to these assets that is necessary to ensure appropriate levels of reliability, authenticity, and provenance,
  • The structural metadata necessary to preserve a canonical record of original order and/or hierarchy (if applicable).

All metadata preserved in the preservation repository adheres to thoroughly-documented, widely-adopted standards that are executed in languages that are both human and machine readable for persistent access across time and changing technologies.

The Digital Preservation Working Group will determine workflows and revisions to policies for other materials that require preservation but have no existing policy, such as outsourced digitization or electronic serials produced by the University.

 

University of Houston digital preservation system SIP specification

This section outlines the minimum requirements necessary for all SIPs that are to be ingested into the system.

Directory Structure & File Placement

Create a top-level folder complying with local naming conventions for top-level folders (either born-digital or digital conventions). Beneath that level, create three directories named “logs,” “metadata,” and “objects.”

The Objects Directory

For digitized collections, within the objects directory, create two folders: one named “pm” and the other named “mm.” Place all preservation master files in the pm directory, and modified masters, if any, in the mm directory.

For born-digital collections, within the objects directory, create a structure that faithfully represents the original order/structure of the digital materials, using the original names given to folders by the creator(s), as well as a structure presenting files as they have been arranged and described, if available. All spaces or other forbidden characters will be replaced by appropriate characters during the Archivematica transfer process.

The Metadata Directory

Place the completed .csv file “metadata.csv” directly into the metadata directory. This file contains core descriptive metadata for either an entire collection (at the collection level) or both the collection and the individual files within that collection (in the case of collections migrated from CONTENTdm to the new DAMS). For future collections, item-level descriptive metadata and access files will be preserved in a separate SIP exported from the DAMS after a collection is published.

Supplemental Documentation

If there are donor agreements, transfer forms, copyright agreements, or any correspondence or other documentation relating to the transfer, create a folder within the “metadata” directory named “submissionDocumentation” and place these items in it. “submissionDocumentation” is also where any access system METS files or metadata in other formats not currently supported by Archivematica’s ingest function are stored.
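
The directory layout described above can be sketched as a small script. This is only an illustration, not part of the SIP specification: the function name, its parameters, and the example folder name are hypothetical, and the top-level folder name must still follow the local naming conventions, which are not reproduced here.

```python
from pathlib import Path


def create_sip_skeleton(parent, sip_name, digitized=True, with_submission_docs=False):
    """Create the minimum SIP directory layout and return its top-level path.

    sip_name is a placeholder; real names follow local naming conventions.
    """
    sip_root = Path(parent) / sip_name
    # Three required directories directly beneath the top-level folder.
    for name in ("logs", "metadata", "objects"):
        (sip_root / name).mkdir(parents=True, exist_ok=True)
    if digitized:
        # Digitized collections: preservation masters (pm), modified masters (mm).
        (sip_root / "objects" / "pm").mkdir(exist_ok=True)
        (sip_root / "objects" / "mm").mkdir(exist_ok=True)
    if with_submission_docs:
        # Optional folder for donor agreements, transfer forms, and similar items.
        (sip_root / "metadata" / "submissionDocumentation").mkdir(exist_ok=True)
    return sip_root
```

For a born-digital collection, the same function could be called with digitized=False, after which the creator's original folder structure is copied into the objects directory by hand or by a separate step.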

Core Descriptive Metadata Record

The core descriptive metadata record will typically contain all of the following fields:

  • dcterms:title
  • dcterms:creator
  • dc:date
  • dcterms:description
  • dcterms:publisher
  • dc:rights
  • dcterms:accessRights

Of these elements, only dcterms:title is required. The rest of the elements are recommended.

For guidance on what information should be placed within these fields, refer to the UH BCDAMS-MAP.

For information on how to construct the .csv Dublin Core record for ingest, see Metadata import in the Archivematica user documentation.
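
The required/recommended rule for the core descriptive record can be checked before transfer with a short script. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the column headers simply mirror the field names listed above. The exact headers Archivematica expects in metadata.csv are defined in its Metadata import documentation and may differ.

```python
import csv
import io

# Field names copied from the core descriptive metadata list above.
REQUIRED_FIELDS = ("dcterms:title",)
RECOMMENDED_FIELDS = (
    "dcterms:creator",
    "dc:date",
    "dcterms:description",
    "dcterms:publisher",
    "dc:rights",
    "dcterms:accessRights",
)


def check_record(record):
    """Return (missing_required, missing_recommended) for one metadata row."""
    def is_blank(field):
        # A field counts as missing if the column is absent or the cell is empty.
        return not (record.get(field) or "").strip()

    missing_required = [f for f in REQUIRED_FIELDS if is_blank(f)]
    missing_recommended = [f for f in RECOMMENDED_FIELDS if is_blank(f)]
    return missing_required, missing_recommended


def check_csv(text):
    """Check every data row of CSV text; return one result tuple per row."""
    return [check_record(row) for row in csv.DictReader(io.StringIO(text))]
```

A row with an empty missing_required list satisfies the minimum requirement; the missing_recommended list is informational only.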

Archivematica ingest of SIPs and AIP/DIP generation

During the ingest process, digital objects are packaged into SIPs and run through a wide range of microservices, potentially including normalization. In Archivematica, “normalizing is the process of converting digital objects to preservation and/or access formats.” Though this process results in new files, the original objects are always kept.

After the SIP is approved by the system, the package is run through additional microservices, which include the processing of submission documentation, generation of the Archivematica METS file, indexing, generation of the DIP, and packaging of the AIP.

The Digital Preservation Working Group will determine workflows and revisions to policies for DIP content.

Retrieving and updating AIPs

In order to effectively track ingests, they must be logged in accordance with departmental documentation standards. A key piece of information that must be documented by all departments is the ARK that will be assigned to each preservation package during the digital access and preservation workflow. This will aid in AIP retrieval in the future. Currently, the only way to find and retrieve an AIP from Archivematica is through its search feature, which utilizes a limited index of digital asset metadata. Once an AIP is created, quality assured, and passed to archival storage, the package is not to be altered.

Related Documentation:

3.d Archival Storage

The storage locations used by the digital preservation system are volumes on the Libraries’ storage area network (SAN). The Libraries’ SAN is composed of multiple storage arrays that each have a large number of disk drives. The drives are managed by high performance redundant hard drive controllers in a RAID 6 configuration. This RAID level tolerates the failure of up to two disks in each array; data is lost only if a third disk in the same array fails. The disk drives have predictive media-error detection and correction capabilities.

Collection files are prepared and temporarily stored on SAN storage volumes designated for that purpose. Users with accounts in the preservation system dashboard can transfer collection files of various types into the system and create SIPs for ingestion into archival storage. Once successfully transferred and ingested into archival storage, the collection files are deleted from the temporary and processing locations. SIPs and AIPs are stored in predefined locations in the preservation system storage that can be selected during processing.

Files in archival storage are periodically duplicated to offsite cloud storage location(s), at the completion of the digital project workflow. Copies will be retrieved from the cloud storage locations in the event that local files are lost or corrupted, or backups do not contain the needed files.

Fixity checking of collections in archival storage is performed on a monthly schedule using Archivematica’s fixity tool. A random sample of checksums will be checked each month, and a full fixity check of all checksums will be conducted during the 3-year audit.
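
The difference between the monthly random sample and the full check can be illustrated with a short sketch. This is not Archivematica’s fixity tool and does not reflect how it is invoked. The function names, the manifest structure (file path mapped to a recorded digest), and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest, sample_size=None, seed=None):
    """Return the paths whose current checksum differs from the recorded one.

    manifest maps file path -> recorded SHA-256 hex digest (assumed format).
    sample_size=None checks every entry (full check); otherwise a random
    sample of that many entries is checked.
    """
    paths = sorted(manifest)
    if sample_size is not None and sample_size < len(paths):
        paths = random.Random(seed).sample(paths, sample_size)
    failures = []
    for path in paths:
        # A missing file is reported as a failure, like a changed checksum.
        if not Path(path).is_file() or sha256_of(path) != manifest[path]:
            failures.append(path)
    return failures
```

Calling check_fixity(manifest, sample_size=n) corresponds to a monthly sample, and check_fixity(manifest) to a full check such as the one conducted during the three-year audit.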

Users with accounts in the preservation system dashboard can create DIPs from stored AIPs by browsing and downloading collections as needed.

3.e Data Management

AIPs placed in archival storage will not be changed or modified. Adding or changing objects in a collection is accomplished by ingesting new files, creating a new SIP, and relating it to the original AIP with the accession number in the package title.

Simple queries for filenames and metadata contents can be performed using the search function in the dashboard.

3.f Administration

Management of UH Libraries’ digital preservation function will be a collaborative process, led by the Digital Preservation Working Group in conjunction with key library stakeholders. This section reviews the major services related to the system’s implementation and functionality, provides the guidelines that inform these services, and identifies the agents who are responsible for implementing the services. It also addresses physical access and environmental control issues important to the long-term maintenance of electronic media.

 

Chart 4: Administration of UH Libraries Digital Preservation System

Service: Establish Standards and Policies

Description: Establish and maintain the UH Libraries Digital Preservation Policy and standards.

Guidelines: As workflows and technology change, so too will the need for updating current standards as well as establishing new policies. The Digital Preservation Working Group will work closely with the UH Libraries’ key stakeholders to monitor professional developments, craft changes to the Digital Preservation Policy, and coordinate the integration of these changes into existing workflows and programs.

Responsible Agent: DPWG

Service: Manage System Configuration

Description: Provide system specifications for the repository to continuously monitor the functionality of the entire repository and systematically control changes to the configuration.

Guidelines: Changes to the UH Libraries’ digital preservation system are made through Archivematica’s Dashboard Administration Tab. This interface controls content related to:

  • Processing Configuration
  • Failure Reporting
  • Transfer Source Locations
  • AIP Storage Locations
  • User Administration

The Digital Preservation Working Group will determine default values for these content areas.

Related Documentation: Archivematica, Dashboard Administration Tab, 2014

Responsible Agents: DPWG; Head of Library Technology Services

Service: Negotiate Submission Agreement

Description: Solicit desirable archival information for the DPR and negotiate Submission Agreements with producers.

Guidelines: SIP data intended for ingest into Archivematica should comply with the specifications for digital objects and associated metadata, outlined in A.2 “Digital Assets” and in B.3 “Functional Entities” of the UH Libraries’ Digital Preservation Policy.

The Digital Preservation Working Group, in conjunction with the Digital Collections Management Committee, will determine the workflow for objects that fall outside this scope.

Responsible Agents: Digitization Services Coordinator; Digital Projects Coordinator

Service: Update Archival Information

Description: Provide a mechanism for updating archive contents.

Guidelines: Archivematica will preserve AIPs in the condition in which they were ingested into the system. Changes to metadata previously ingested will be preserved through the submission of a new package from the DAMS, which will be identified by the ARK.

Responsible Agents: Digitization Services Coordinator; Digital Projects Coordinator

Service: Repository Self-Audit

Description: Verify that submissions meet the specifications of the Submission Agreement; maintain a record of event-driven requests and periodically compare it to the contents of the archive to determine if all needed data is available.

Guidelines: In the final year of each three-year cycle, the Head of Digital Research Services, the Head of Library Technology Services, the Digitization Services Coordinator, and the Digital Projects Coordinator will conduct an audit of the UH Libraries’ digital preservation system and its contents. During each review cycle, this group will establish digital preservation audit criteria, such as:

  • Content inventory
  • Core metadata
  • PREMIS event recording
  • Selection criteria
  • Emerging standards

Responsible Agents: Head of Digital Research Services; Head of Library Technology Services; Digitization Services Coordinator; Digital Projects Coordinator

Service: Provide Customer Support

Description: Create, maintain, and delete administrator accounts.

Guidelines: Because digital objects are at great risk for degradation or loss, their exposure to human interaction should be minimized during the preservation process. Consequently, only key stakeholders with digital preservation responsibilities will have access to Archivematica. These positions include:

  • Head of Digital Research Services
  • Head of Metadata and Digitization Services
  • Head of Library Technology Services
  • Digitization Services Coordinator
  • Digital Projects Coordinator

Responsible Agent: Head of Library Technology Services

 

Physical access and environmental control

Ensuring a secure and stable environment for electronic media is essential to the long-term preservation of digital objects. Key units in UH Libraries have implemented security measures to prevent media from damage or theft. Servers in the Library Technology Services department are positioned in one physical location. UH Libraries restricts access to these areas through door locks and card-access devices. Those employees with direct reporting lines to Library Technology Services receive access to these respective areas.

In addition to secure physical locations, environmental conditions also play an important role in long-term digital preservation. Strict environmental controls are necessary to slow the rate of deterioration, since even electronic media are affected by the temperature, relative humidity, light, and air pollution levels of the environment in which they are stored. Establishing baseline temperature and relative humidity controls and minimizing fluctuations in these areas slows chemical deterioration. Best practices recommend the following values for areas containing magnetic media for long-term digital preservation.

Chart 5: Environmental Controls for Magnetic Media

Temperature (Degrees F)

Allowable Range (+ or -)

Relative Humidity

Allowable Fluctuations (+ or -)

65°

30%

5%

 

3.g Preservation Planning

Strategic priorities for digital preservation, 2015-2018

Preparing for the demands of digital preservation requires UH Libraries to be strategic with its allocation of resources and talent. This section identifies the future actions of a digital preservation program, based on principles in the policy framework, and categorizes them as short-term (within the next three years) and long-term priorities. These principles focus on four key areas:

  1. Repository robustness
  2. Risk management
  3. Planning and development
  4. Collaboration

 

Chart 6: Short-Term Priorities

Category

Action

Repository Robustness

  • Normalize higher risk files (while retaining the original file) to preservation standards for those file formats where open specification or less risky file formats exist. As part of their implementation work, Digital Preservation Administrators will jointly determine which files should be converted and to which particular file format to best provide characteristics such as functionality, longevity, and preservability. Archivematica will automate these preservation decisions.
  • Continually identify and implement essential preservation tools within the long-term digital repository, so that we can use them to reliably preserve our collections for future re-use.
  • Devise preservation plans for all major types of digital collection content held in the repository, so that we can invoke the necessary preservation tools in a timely manner.
  • Monitor file integrity, so that we may identify corrupt files and act accordingly to ensure only files with their integrity intact are delivered to users.
  • Utilize shared technical digital preservation services where appropriate such as representation information registries, so that we do not unnecessarily duplicate efforts.
  • Audit the repository against a recognized digital preservation repository audit methodology, such as ISO 16363:2012, so that we may independently validate our approach and measure our progress over time.

Risk Management

  • Clearly define our technical requirements and collection policies for preservation throughout the information lifecycle, so that we can ensure preservation needs are known and can be addressed as and when relevant.
  • Implement rigorous quality assurance process for digitized content, so that we can identify content of inadequate quality before it enters the preservation workflow.
  • Implement tools and end-to-end workflows for digital content, so that we control the risks associated with receiving, managing, processing, and ingesting digital collection content.
  • Ingest valid legacy digital content into preservation storage as soon as possible, so that distributed and inconsistent storage and management practices are minimized and the risks associated with such practices addressed.

Planning and Development

  • Document our relevant policies, procedures, standards, and systems development, so that they may be sustained, audited, and understood over time.
  • Plan and budget for long-term preservation of content at point of acquisition, so that financial sustainability is considered early in the lifecycle.
  • Consider sustainability in all future system procurement exercises and content oriented partnerships, so that we enter new initiatives with a long-term vision and plan.
  • Ensure that all staff with responsibility for digital content understand the issues associated with preserving it, so that sustainability and preservation become an embedded consideration when developing and planning new systems and workflows.

Collaboration

  • Seek out appropriate opportunities to collaborate with other institutions and organizations on digital preservation initiatives that meet our business needs, so that we may benefit from shared resources available to address shared challenges.
  • Ensure our collaboration with professional digital preservation membership organizations such as the National Digital Stewardship Alliance (NDSA) and the Digital Preservation Network (DPN) is in line with organizational requirements, so as to achieve the maximum return on investment in terms of time, effort, and financial commitment.

 

 

Chart 7: Long-Term Priorities

Category

Action

Repository Robustness

  • Test different technical strategies such as migration and emulation so that we can identify appropriate large scale approaches and tools to combat technological obsolescence.

Risk Management

  • Integrate digital preservation risk management into our collection management and risk management strategies, so that digital risks are treated comparably with those facing analog content and regular preservation risk assessments are undertaken.

Collaboration

  • Deliver successful contributions to collaborative projects already underway, including the NDSA, so that we meet existing commitments and maintain the Libraries’ place at the cutting edge of international collaborative digital preservation.
  • Exchange knowledge and expertise across the wider international digital preservation and digital cultural heritage communities, so that other institutions can learn from our work and we can identify potential future partners with similar interests.

 

Content selection strategies for cloud-based storage

As digital preservation activities increase at UH Libraries, the need to prioritize the type of content redundantly stored to the cloud may arise. We believe that everything preserved in Archivematica storage should be synced to DuraCloud. However, in the event that resources will not allow for everything to be transferred to DuraCloud, one potential model for prioritization analyzes the level of risk certain kinds of digital and analog formats face and ranks them according to their current and future physical condition. UH Libraries’ digital preservation stakeholders should review this model periodically, update it as resources and circumstances change, and be transparent if and when UH Libraries moves to this model for cloud-based storage selection.

 

Chart 8: Risk Levels for Content Selection

Risk Level

Description

Type of Content

Extreme Risk

No other copies of the digital or analog content are preserved.

 

  • Born digital: unique holdings
  • Physical content: no analog resource available
  • Physical content: analog resource too fragile or cost-prohibitive to re-digitize

 

Significant Risk

Analog content exists but additional digitization beyond an initial capture could result in damage to the item. Content could also be residing on formats that will likely become obsolete in the near future.

  • Magnetic tape and film content: resource in poor condition

  • Digitized content: obsolete file formats

High Risk

Analog content exists but the material to be digitized is fragile or at a high preservation risk (e.g., magnetic tape). Re-digitization is not out of the question, though it is not ideal.

 

  • Born digital: available elsewhere
  • Magnetic tape and film content: analog resource in fair condition
  • Physical content: analog resource in poor condition
  • Transparencies and photographic film: no photographic prints available

 

Low Risk

Analog content exists and there are few limitations on the number of times an object can be re-digitized

  • Physical content: analog resource in stable condition
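
One way to apply the risk levels in Chart 8 is to rank packages by risk and select them for cloud storage until an available storage budget is used up. The sketch below is only an illustration of that idea. The policy describes ranking by risk but does not prescribe a selection algorithm, so the function name, the (name, risk level, size) tuple shape, the capacity parameter, and the rule of skipping packages that do not fit are all assumptions.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Risk levels from Chart 8; a higher value means a higher priority."""
    LOW = 1
    HIGH = 2
    SIGNIFICANT = 3
    EXTREME = 4


def prioritize_for_cloud(packages, capacity_bytes):
    """Select package names for cloud storage, highest risk first.

    packages is an iterable of (name, risk_level, size_bytes) tuples.
    Packages that do not fit in the remaining capacity are skipped
    (an assumed rule, not one stated in the policy).
    """
    selected = []
    remaining = capacity_bytes
    # Sort by descending risk, then by name so the order is deterministic.
    for name, risk, size in sorted(packages, key=lambda p: (-p[1], p[0])):
        if size <= remaining:
            selected.append(name)
            remaining -= size
    return selected
```

With unlimited capacity the function returns every package, which matches the stated preference that everything preserved in Archivematica storage be synced to the cloud.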

3.h Access

The vast majority of digitized content is made available to consumers through the University of Houston Digital Library. The repository’s contents are indexed by all major search engines, as well as by UH Libraries’ discovery platform. Additionally, the UHDL interface allows consumers to search for content using simple and advanced search features. Browsing content is also available through the “Browse the Collection,” “Related Collections,” and “Related Items” features. Consumers are permitted to download certain static image files through the UHDL Digital Cart Service.

Currently, consumers determine the availability of born-digital content stored in the UH Libraries digital preservation system through the online finding aid. For collections in which the digital materials have already been processed and described in the finding aid, access files will be available for viewing on the Reading Room computer. Consumers may also contact the collection curator to request and receive information products that are served up in the reading room on demand.

 

4. References

Archivematica 1.5 Documentation, 2016
https://www.archivematica.org/en/docs/archivematica-1.5/

British Library, Digital Preservation Strategy, 2013
http://www.bl.uk/aboutus/stratpolprog/collectioncare/digitalpreservation/strategy/BL_DigitalPreservationStrategy_2013-16-external.pdf

Lyrasis, Environmental Specifications for the Storage of Library & Archival Materials, 2009
http://www.lyrasis.org/LYRASIS%20Digital/Documents/Preservation%20PDFs/environspec.pdf

Gail McMillan and Rachel Howard, “Chapter 5: Content Selection, Preparation, and Management,” A Guide for Distributed Digital Preservation, MetaArchive Cooperative Publications, eds. Katherine Skinner and Matt Schultz, 2010
https://educopia.org/sites/educopia.org/files/publications/A_Guide_to_Distributed_Digital_Preservation_0.pdf