Media content libraries are extremely valuable, and even more so now that archive content is increasingly popular and can be monetised long after it was first created. Ensuring your content is preserved and usable long into the future has therefore become extremely important, and yet many broadcasters and content providers are not taking the necessary steps to ensure fixity and digital preservation.
Some time ago, the National Digital Stewardship Alliance (NDSA) published a number of excellent articles on the different levels of digital preservation you should apply to your content and how you can measure them. The concepts they described are still valid and relevant; however, the media industry has changed dramatically since then, and the importance given to some of the categories they defined could be adjusted.
Why is Fixity Important?
A fundamental goal of digital preservation is to establish and check the “fixity”, or stability, of digital content. In the context of digital preservation, fixity is the property of a digital file or object being fixed or unchanged. This is synonymous with bit-level integrity. Fixity information offers evidence that one set of bits is identical to another.
The PREMIS data dictionary defines fixity information as "information used to verify whether an object has been altered in an undocumented or unauthorized way." Fixity information is normally based on checksums or cryptographic hashes.
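As an illustration of checksum-based fixity information, the sketch below computes a cryptographic hash of a file and uses it to compare two copies. SHA-256, the chunk size and the function names `file_checksum` and `same_bits` are assumptions made for this example; the article does not prescribe a particular algorithm or implementation.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file (hypothetical helper, SHA-256 assumed)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large media files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(path_a, path_b):
    """Equal digests are strong evidence that two files hold identical bits."""
    return file_checksum(path_a) == file_checksum(path_b)
```

Reading in chunks keeps memory use constant regardless of file size, which matters for large media files.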
There is a whole range of reasons to collect, maintain and verify fixity information on your digital content:
- Confirm that the content was received correctly
- Confirm that the content hasn't changed unexpectedly
- Confirm that the content hasn't changed during transfers
- Support the repair of corrupted or altered content
- Monitor hardware degradation
- Allow a portion of the content to change while confirming that the rest is intact
- Support the monitoring of production or digitisation processes
- Document provenance and history
- Detect human errors in the handling of the content
When and Where Should Fixity be Generated?
There are different approaches to how and when fixity information is generated. It can be generated on ingest, on transfer, at regular intervals, or when content is moved into storage systems, for example. Sometimes, it may be generated on portions of the content.
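Generating fixity on portions of the content could look like the sketch below, which keeps one checksum per fixed-size portion so that a later check can point to the portions that differ. The 8 MiB portion size, the use of SHA-256 and the names `portion_checksums` and `changed_portions` are assumptions made for illustration.

```python
import hashlib


def portion_checksums(path, portion_size=8 * 1024 * 1024):
    """Return one SHA-256 hex digest per fixed-size portion of the file."""
    digests = []
    with open(path, "rb") as handle:
        for portion in iter(lambda: handle.read(portion_size), b""):
            digests.append(hashlib.sha256(portion).hexdigest())
    return digests


def changed_portions(recorded, current):
    """Indexes of portions whose digest differs, or that were added or removed."""
    longest = max(len(recorded), len(current))
    return [
        index
        for index in range(longest)
        if index >= len(recorded)
        or index >= len(current)
        or recorded[index] != current[index]
    ]
```

With fixed-size portions, an in-place change affects only the portions it touches, whereas inserting or removing bytes shifts every later portion and so marks all of them as changed.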
Generating fixity information means that you have to access the content. Something like the uncertainty principle applies here: reading the content in order to extract fixity information affects the systems holding or accessing it, and the effects depend on how, and how often, fixity is generated and checked. Some of those effects can be:
- Taking CPU time away from other services while fixity information is being calculated
- Additional wear on the hardware holding the data
- Redundant fixity information when several systems each calculate it
For reference, the categories defined by the NDSA are:
- Storage and Geographic Location
- File Fixity and Data Integrity
- Information Security
- Metadata
- File Formats
By Francisco Ontoso, CTO, Object Matrix