
Interoperability and service provision centres Working group

Technical Guidelines
for Digital Cultural Content Creation Programmes
Version 1.0: Revised 08 April 2004


This document has been developed on behalf of the Minerva Project by UKOLN, University of Bath, in association with MLA The Council for Museums, Libraries & Archives

 

5. Storage and management of the digital master material

Preservation issues must be considered an integral part of the digital creation process. Preservation will depend upon documenting all of the technological procedures that led to the creation of an object, and much critical information can - in many cases - be captured only at the point of creation.
Projects must consider the value of creating a fully documented, high-quality ‘digital master’ from which all other versions (e.g. compressed versions for access via the Web) can be derived. This will help with the periodic migration of data and with the development of new products and resources.
It is important to realise that preservation is not just about choosing suitable file formats or media types. Instead, it should be seen as a fundamental management responsibility for those who own and manage digital information content, ensuring its long-term use and re-use. This depends upon a variety of factors that are outside the digitisation process itself, e.g. institutional stability, continued funding and the ownership of intellectual property rights.
However, there are technical strategies that can be adopted during the digitisation process to facilitate preservation. For example, many digitisation projects have begun to adopt strategies based on the creation of metadata-rich ‘digital masters’. A brief technical overview of the ‘digital master’ strategy is given in the information paper on the digitisation process produced for the UK NOF-digitise programme by HEDS.

Guidance:
Joint NPO and RLG Preservation Conference Guidelines for Digital Imaging 28 - 30 September 1998
http://www.rlg.org/preserv/joint/
Available 2005-02-15

Preservation Management of Digital Materials Handbook
http://www.dpconline.org/graphics/handbook/
Available 2005-02-15

The Digitisation Process
http://www.ukoln.ac.uk/nof/support/help/papers/digitisation_process/
Available 2005-02-15

5.1 File formats

Open standard formats should be used when creating digital resources in order to maximise access. (Note that file formats for the delivery of digital records to users are outlined in 7.1.) The use of open file formats will help with interoperability, ensuring that resources are reusable and can be created and modified by a variety of applications. It will also help to avoid dependency on a particular supplier.
However, in some cases there may be no relevant open standards, or the relevant standards may be sufficiently new that conformant tools are not widely available. In such cases the use of proprietary formats may be acceptable. Where proprietary formats are used, the project should explore a migration strategy that will enable a transition to open standards to be made in the future.
If open standards are not used, projects should justify their requirement for use of proprietary formats within their proposals for funding, paying particular attention to issues of accessibility.

Text capture and storage

Character encoding
A character encoding is an algorithm for representing characters in digital form by mapping sequences of character code numbers (the integers corresponding to characters in a repertoire) into sequences of 8-bit values (bytes or octets). An application needs to know which character encoding a document uses in order to interpret the bytes that make up that digital object.
The character encoding used by text-based documents should be explicitly stated. For XML documents, the character encoding should usually be recorded in the encoding declaration of the XML declaration.
For XHTML documents, the XML declaration may be omitted, but the encoding must then be recorded in a meta element whose http-equiv attribute is set to Content-Type, within the value of its content attribute.
For character encoding issues in the delivery of documents, see 7.1.1.1.
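As a minimal sketch of why the declaration matters, the following Python fragment reads the encoding named in an XML declaration and uses it to decode the document's bytes. The function name declared_encoding and the sample document are illustrative assumptions, and the pattern is a simplification that only handles ASCII-compatible encodings.

```python
import re

# Simplified pattern for the encoding declaration inside an XML declaration.
# Assumption: the document uses an ASCII-compatible encoding, so the
# declaration itself can be read as ASCII bytes.
_ENCODING_RE = re.compile(
    rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default`."""
    match = _ENCODING_RE.match(data)
    return match.group(1).decode("ascii").lower() if match else default


# Hypothetical sample document stored as ISO-8859-1 bytes.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><title>Café</title>'
data = text.encode("iso-8859-1")

encoding = declared_encoding(data)   # "iso-8859-1"
decoded = data.decode(encoding)      # round-trips to the original text
```

Without the declaration, an application would have to guess: decoding these same bytes as UTF-8 fails, because the byte used for "é" in ISO-8859-1 is not valid UTF-8 in that position.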

Standards:
The Unicode Consortium. The Unicode Standard, Version 4.0.0, defined by: The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1)
http://www.unicode.org/versions/Unicode4.0.0/
Available 2005-02-15

Extensible Markup Language (XML) 1.0
http://www.w3.org/TR/REC-xml/
Available 2005-02-15

XHTML 1.0 The Extensible HyperText Markup Language
http://www.w3.org/TR/xhtml1/
Available 2005-02-15

Guidance:
Jukka Korpela, A Tutorial on Character Code Issues
http://www.cs.tut.fi/~jkorpela/chars.html
Available 2005-02-15

Document formats
Text-based content should be created and managed in a structured format that is suitable for generating HTML or XHTML documents for delivery.
In most cases storing text-based content in an SGML- or XML-based form conforming to a published Document Type Definition (DTD) or XML Schema will be the most appropriate option. Projects may choose to store such content either in plain files or within a database of some kind. All documents should be validated against the appropriate DTD or XML Schema.
Projects should display awareness of and understand the purpose of standardised formats for the encoding of texts, such as the Text Encoding Initiative (TEI), and should store text-based content in such formats when appropriate. Projects may store text-based content as HTML 4 or XHTML 1.0 (or subsequent versions). Projects may store text-based content in SGML or XML formats conforming to other DTDs or Schemas, but must provide mappings to a recognised schema.
In some instances, projects may choose to store text-based content using Adobe Portable Document Format (PDF). PDF is a proprietary file format owned by Adobe that preserves the fonts, formatting, colours and graphics of the source document. PDF files are compact and can be viewed and printed with the freely available Adobe Acrobat Reader. However, as with any proprietary solution, there are dangers in its adoption and projects should be aware of the potential costs of this approach and should explore a migration strategy that will enable a future transition to open standards to be made. (See also section 7.1.1 for considerations regarding the accessibility of PDF documents).
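To illustrate the idea of storing text in a structured form from which HTML can be generated, here is a minimal Python sketch. The stored document, its element names (text, head, p) and the function to_html are hypothetical and do not follow any published DTD. The parser used here only checks that the XML is well-formed; validation against a DTD or XML Schema would need a validating parser, which is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stored document: a tiny XML structure, not a real TEI file.
SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<text>
  <head>Catalogue entry</head>
  <p>Watercolour, 1821.</p>
  <p>Paper &amp; ink.</p>
</text>"""


def to_html(xml_source: str) -> str:
    """Parse the stored XML (raises ParseError if it is not well-formed)
    and generate a simple HTML fragment for delivery."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    parts = []
    for element in root:
        content = escape(element.text or "")
        if element.tag == "head":
            parts.append(f"<h1>{content}</h1>")
        elif element.tag == "p":
            parts.append(f"<p>{content}</p>")
    return "\n".join(parts)


html_fragment = to_html(SOURCE)
```

The stored master keeps the structure (heading versus paragraph), so a different delivery format could be generated later from the same source without re-keying the text.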

Standards:
ISO 8879:1986. Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML)

Extensible Markup Language (XML) 1.0
http://www.w3.org/TR/REC-xml/
Available 2005-02-15

Text Encoding Initiative (TEI)
http://www.tei-c.org/
Available 2005-02-15

HTML 4.01 HyperText Markup Language
http://www.w3.org/TR/html401/
Available 2005-02-15

XHTML 1.0 The Extensible HyperText Markup Language
http://www.w3.org/TR/xhtml1/
Available 2005-02-15

Other references:
Portable Document Format (PDF)
http://www.adobe.com/products/acrobat/adobepdf.html
Available 2005-02-15

Guidance:
AHDS Guide to Good Practice:
Creating and Documenting Electronic Texts
http://ota.ahds.ac.uk/documents/creating/
Available 2005-02-15

Still image capture and storage

Digital still images fall into two main categories: raster (or ‘bit-mapped’) images and vector (‘object-oriented’) images. Raster images take the form of a grid or matrix, with each ‘picture element’ (pixel) in the matrix having a unique location and an independent colour value that can be edited separately. Vector files provide a set of mathematical instructions that are used by a drawing program to construct an image.
The digitisation process will usually generate a raster image; vector images are usually created as outputs of drawing software.  

Raster images
When creating and storing raster images, two factors need to be considered: the file format and the quality parameters.
Raster images should usually be stored in the uncompressed form generated by the digitisation process, without the application of any subsequent processing. Raster images must be created using one of the following formats: Tagged Image File Format (TIFF), Portable Network Graphics (PNG), Graphics Interchange Format (GIF) or JPEG Still Picture Interchange File Format (JPEG/SPIFF).

There are two primary quality parameters to be considered:

  • Spatial resolution: the frequency at which samples of the original are taken by the capture device, expressed as a number of samples per inch (spi), or more commonly just as pixels per inch (ppi) in the resulting digital image.
  • Colour resolution (bit depth): the number of colours (or levels of brightness) available to represent different colours (or shades of grey) in the original, expressed in terms of the number of bits available to represent colour information, e.g. a colour resolution of 8 bits means 256 different colours are available.

In general photographic images should be created as TIFF images.
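A project may wish to check that stored masters really are in one of the formats listed above, whatever their file extension says. The following Python sketch, offered as an illustration only, inspects the leading bytes of a file. The function name sniff_raster_format is hypothetical, and the check identifies the container format only; it says nothing about compression or image quality.

```python
from typing import Optional

# Leading "magic" bytes for the four formats named in this section.
SIGNATURES = {
    "TIFF": (b"II*\x00", b"MM\x00*"),   # little-endian and big-endian TIFF
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "GIF": (b"GIF87a", b"GIF89a"),
    "JPEG": (b"\xff\xd8\xff",),          # start-of-image marker
}


def sniff_raster_format(header: bytes) -> Optional[str]:
    """Return the format name matching the first bytes of a file, or None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return sniff_raster_format(handle.read(8))
```

A file whose header matches none of the signatures (for example a Windows bitmap) returns None and can be flagged for review.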
The selection of quality parameters required to capture a useful image of an item is determined by the size of the original, the amount of detail in the original and the intended uses of the digital image. Digitising a 35mm transparency will require a higher resolution than a 6x4 print because it is smaller and more detailed; if a required use of an image of a watercolour is the capacity to analyse fine details of brushstrokes, then that requires a higher resolution than that required to simply display the picture as a whole on a screen.
Images should be created at the highest suitable resolution and bit depth that is both affordable and practical given the intended uses of the images, and each project must identify the minimum level of quality and information density it requires.
As a guide, a resolution of 600 dots per inch (dpi) and a bit depth of 24-bit colour or 8-bit greyscale should be considered for photographic prints. A resolution of 2400 dpi should be considered for 35 mm slides to capture the increased density of information. (Source: EMII DCF)
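The storage implications of these figures can be estimated with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the 6x4 print is assumed to be measured in inches, the 35 mm frame is assumed to have a nominal image area of 36 x 24 mm, and file headers and padding are ignored.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int):
    """Pixel width and height of a scan of an original measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bit_depth: int) -> int:
    """Approximate size of the uncompressed pixel data, ignoring headers."""
    return width_px * height_px * bit_depth // 8


def colour_levels(bit_depth: int) -> int:
    """Number of distinct colours (or grey levels) a bit depth can represent."""
    return 2 ** bit_depth


# A 6x4 inch photographic print at 600 dpi, 24-bit colour.
print_px = pixel_dimensions(6, 4, 600)               # (3600, 2400)
print_size = uncompressed_bytes(*print_px, 24)       # 25,920,000 bytes

# A 35 mm slide (assumed 36 x 24 mm image area) at 2400 dpi, 24-bit colour.
MM_PER_INCH = 25.4
slide_px = pixel_dimensions(36 / MM_PER_INCH, 24 / MM_PER_INCH, 2400)  # (3402, 2268)
slide_size = uncompressed_bytes(*slide_px, 24)       # 23,147,208 bytes
```

Under these assumptions both masters come to roughly 23 to 26 million bytes of pixel data each, which gives a starting point for estimating the storage a project will need.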
In some cases, for example when using cheaper digital cameras, it may be appropriate to store images in JPEG/SPIFF format as an alternative to TIFF. This will result in smaller, but lower quality, images. Such images may be appropriate for displaying photographs of events etc. on a Web site, but the use of such cameras is not recommended for the large-scale digitisation of content. (Source: NOF-digitise)

Standards:
Tagged Image File Format (TIFF)
http://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pdf
Available 2005-02-15

Joint Photographic Experts Group (JPEG)
http://www.w3.org/Graphics/JPEG/
Available 2005-02-15

JPEG Still Picture Interchange File Format (SPIFF)
http://www.jpeg.org/public/spiff.pdf
Available 2005-02-15

Guidance:
TASI: Advice: Creating Digital Images
http://www.tasi.ac.uk/advice/creating/creating.html
Available 2005-02-15

Graphic non-vector images
Computer-generated images such as logos, icons and line drawings should normally be created as PNG or GIF images at a resolution of 72 dpi. (N.B. Images resulting from the digitisation of physical line drawings should be managed as described in the previous section.)

Standards:
Portable Network Graphics (PNG)
http://www.w3.org/TR/PNG
Available 2005-02-15

Vector images
Vector images consist of multiple geometric objects (lines, ellipses, polygons, and other shapes) constructed through a sequence of commands or mathematical statements to plot lines and shapes. Vector graphics should be created and stored using an open format such as Scalable Vector Graphics (SVG), an XML language for describing such graphics. SVG drawings can be interactive and dynamic, and are scalable to different screen display and printer resolutions.
Use of the proprietary Macromedia Flash format may also be appropriate; however, projects should explore a migration strategy so that they can move to more open formats once these become widely deployed. In addition, the use of text within the Flash format should be avoided, in order to enable the development of multilingual versions.
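Because SVG is an XML language, a vector drawing can be produced and inspected with ordinary XML tools. The Python sketch below builds a very small SVG document with the standard library. The drawing itself (a circle and a line) and the function name make_simple_svg are purely illustrative.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Serialise SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)


def make_simple_svg() -> str:
    """Build a minimal SVG drawing: a circle crossed by a diagonal line."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": "0 0 100 100"})
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}circle",
        {"cx": "50", "cy": "50", "r": "40", "fill": "none", "stroke": "black"},
    )
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}line",
        {"x1": "10", "y1": "10", "x2": "90", "y2": "90", "stroke": "black"},
    )
    return ET.tostring(svg, encoding="unicode")


svg_text = make_simple_svg()
```

The shapes are described by coordinates rather than pixels, which is why the same file can be rendered at any screen or printer resolution.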

Standards:
Scalable Vector Graphics (SVG)
http://www.w3.org/TR/SVG/
Available 2005-02-15

Other references:
Macromedia Flash
http://www.macromedia.com/
Available 2005-02-15

Video capture and storage

Video should usually be stored in the uncompressed form obtained from the recording device without the application of any subsequent processing. Video should be created at the highest suitable resolution, colour depth and frame rate that are both affordable and practical given its intended uses, and each project must identify the minimum level of quality it requires.
Video should be stored using the uncompressed RAW AVI format, without the use of any codec, at a frame size of 720x576 pixels, a frame rate of 25 frames per second, using 24-bit colour. PAL colour encoding should be used.
Video may be created and stored using the appropriate MPEG format (MPEG-1, MPEG-2 or MPEG-4) or the proprietary formats Microsoft WMV, ASF or QuickTime.
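The uncompressed parameters given above imply a substantial data rate, which is worth estimating before committing to them. The following Python sketch does the arithmetic. The function name is hypothetical, and the figures cover picture data only, excluding audio and any container overhead.

```python
def video_bytes_per_second(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Data rate of uncompressed video frames, in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps


# 720x576 pixels, 24-bit colour, 25 frames per second.
rate = video_bytes_per_second(720, 576, 24, 25)   # 31,104,000 bytes per second
per_hour = rate * 3600                            # 111,974,400,000 bytes per hour
```

At roughly 31 million bytes per second, one hour of uncompressed video at these settings needs about 112 thousand million bytes of storage, which helps explain why compressed formats are permitted as an alternative.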

Standards:
Moving Pictures Experts Group (MPEG)
http://www.cselt.it/mpeg/
Available 2005-02-15

Audio capture and storage

Audio should usually be stored in the uncompressed form obtained from the recording device without the application of any subsequent processing such as noise reduction. Audio should be created and stored in an uncompressed format such as Microsoft WAV or Apple AIFF. 24-bit stereo sound at a sample rate of 48 or 96 kHz should be used for master copies. This sampling rate is suggested by the Audio Engineering Society (AES) and the International Association of Sound and Audiovisual Archives (IASA).
Audio may be created and stored using compressed formats such as MP3, WMA, RealAudio, or Sun AU formats.
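The master-copy parameters suggested above (24-bit, stereo, 48 or 96 kHz) can be checked automatically for WAV files. The Python sketch below uses the standard wave module, which reads plain PCM WAV files only. The function names and the demonstration file are illustrative assumptions.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read the basic technical parameters of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
        }


def meets_master_guideline(info: dict) -> bool:
    """True if the file is 24-bit stereo sampled at 48 or 96 kHz."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 24
        and info["sample_rate"] in (48000, 96000)
    )


# Demonstration: write one second of silence as 24-bit stereo at 48 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "master.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(3)       # 3 bytes = 24 bits per sample
    out.setframerate(48000)
    out.writeframes(bytes(2 * 3 * 48000))

info = describe_wav(demo_path)
```

A check of this kind could be run over a whole directory of masters so that files recorded at lower settings are identified early.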

 

5.2 Media choices

Different digital storage media have different software and hardware requirements for access, and different media present different storage and management challenges. The threats to continued access to digital media are two-fold:

  • The physical deterioration of, or damage to, the medium itself
  • Technological change resulting in the obsolescence of the hardware and software infrastructure required to access the medium

The resources generated during a digitisation project will typically be stored on the hard disks of one or more file servers, and also on portable storage media. At the time of writing, the most commonly used types of portable medium are magnetic tape and optical media (CD-R and DVD).
Portable media should be of good quality and purchased from reputable brands and suppliers, and new instances should always be checked for faults. Media should be handled, used and stored in accordance with their suppliers’ instructions.
Projects should consider creating copies of all their digital resources (metadata records as well as the digitised objects) on two different types of storage medium. At least one copy should be kept at a location other than the primary site to ensure that the resources are safe in the case of any disaster affecting the main site. All transfers to portable media should be logged. (Source: Minerva GPH, DPC)
Media should be refreshed (i.e. the data copied to a new instance of the same medium) on a regular cycle within the lifetime of the medium. Refreshment activity should be logged. (Source: Minerva GPH, DPC)
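One way to log transfers and refreshment activity is to record each copy in a simple log file. The Python sketch below is an illustration under stated assumptions: the use of a SHA-256 checksum to confirm that the copy matches the original is an addition not required by the text above, and the function names, log layout and file names are hypothetical.

```python
import csv
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_log(source: Path, target_dir: Path, log_file: Path) -> str:
    """Copy a file, confirm the copy is identical, and append a log entry."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    before = sha256_of(source)
    shutil.copy2(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"checksum mismatch copying {source} to {target}")
    with open(log_file, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(
            [datetime.now(timezone.utc).isoformat(), str(source), str(target), after]
        )
    return after


# Demonstration with a temporary directory standing in for real media.
workspace = Path(tempfile.mkdtemp())
master = workspace / "image_0001.tif"
master.write_bytes(b"example master data")
checksum = copy_and_log(master, workspace / "offsite_copy", workspace / "transfers.csv")
```

Each log line records when the transfer happened, where the file came from and went to, and the checksum of the copy, so that later refreshment cycles can be verified against it.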

Guidance:
Preservation Management of Digital Materials Handbook
http://www.dpconline.org/graphics/handbook/
Available 2005-02-15

TASI: Advice: Using CD-R and DVD-R for Digital Preservation
http://www.tasi.ac.uk/advice/delivering/cdr-dvdr.html
Available 2005-02-15

 

5.3 Preservation strategies

There are three main technical approaches to digital preservation: technology preservation, technology emulation and data migration. The first two focus on the technology used to access the object, either maintaining the original hardware and software or using current technology to replicate the original environment. The work on “persistent archives”, based on the articulation of the essential characteristics of the objects to be preserved, may also be of interest.
Migration strategies focus on maintaining the digital objects in a form that is accessible using current technology. In this scenario, objects are periodically transferred from one technical environment to another, newer one, while as far as possible maintaining the content, context, usability and functionality of the original. Such migrations may require the copying of the object from one medium or device to a new medium or device and/or the transformation of the object from one format to a new format. Some migrations may require only a relatively simple format transformation; a migration to a very different environment may involve a complex process with considerable design effort.
Projects should understand the requirements for a migration-based preservation strategy and should develop policies and guidelines to support its implementation.
The capture of metadata is a critical part of a migration-based preservation strategy (see 6.2.3). Metadata is required to support the management of the object and of the migration process. Furthermore, migration inevitably leads, at least in the longer term, to some changes in, or losses of, original functionality. Where this is significant to the interpretation of the object, users will rely on metadata about the migration process, and about the original object and its transformations, to provide some understanding of the functionality provided in the original technological environment.
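As a rough illustration of the kind of metadata a migration might generate, the Python sketch below records each migration as an event attached to the object. The class names, field names and example values are all hypothetical and are not drawn from any metadata standard; a real project would map them to the schema it has adopted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MigrationEvent:
    date: str                 # ISO 8601 date of the migration
    source_format: str
    target_format: str
    tool: str                 # software used for the transformation
    functionality_lost: List[str] = field(default_factory=list)


@dataclass
class PreservationRecord:
    object_id: str
    original_format: str
    original_environment: str
    events: List[MigrationEvent] = field(default_factory=list)

    def record_migration(self, event: MigrationEvent) -> None:
        """Append a migration event to the object's history."""
        self.events.append(event)

    def current_format(self) -> str:
        """Format after the latest migration, or the original if none."""
        return self.events[-1].target_format if self.events else self.original_format

    def known_losses(self) -> List[str]:
        """All functionality recorded as lost across the object's migrations."""
        return [loss for event in self.events for loss in event.functionality_lost]


# Hypothetical example: an animated GIF migrated to a still PNG.
record = PreservationRecord("obj-0001", "GIF", "Web browser, 2004")
record.record_migration(
    MigrationEvent("2010-01-15", "GIF", "PNG", "hypothetical-converter 1.0",
                   ["animation frames"])
)
```

Keeping the losses alongside the history lets a later user see that the current version no longer shows something the original did.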

Guidance:
Preservation Management of Digital Materials Handbook
http://www.dpconline.org/graphics/handbook/
Available 2005-02-15

The State of Digital Preservation: An International Perspective
http://www.tasi.ac.uk/advice/creating/creating.html
Available 2005-02-15

 

 

Copyright Minerva Project 2005-07, last revision 2005-07-11 edited by Minerva Editorial Board.
URL: www.minervaeurope.org/publications/technicalguidelines/storagemanagement.htm