Data archive workflow



The social science data archive step by step

Ekkehard Mochmann (Central Archive for Empirical Social Research, Cologne, FRG)
Paul de Guchteneire (UNESCO, Paris, France)
 


Content:
 

1. Identification of datasets
2. Sources of data
3. Selection criteria
4. Data transfer to the archive
5. Data processing
6. Documentation
7. Storage
8. Information retrieval
9. Dissemination of data
10. Notes


7. Storage

A single dataset from the archive's holdings may be available in several different formats and on several different storage media. Datasets that are used often, for instance during the documentation phase, will reside on the computer's magnetic disk systems. For long-term storage, datasets are frequently stored on magnetic tape cassettes. Unfortunately, even these cassettes are not a reliable medium for long-term storage. Computer tapes may lose their information within a couple of years through deterioration of the magnetic carrier or through external physical influences. A tape can also break, or it can be overwritten by accident. For these reasons there should be at least one backup copy of every dataset. The magnetic tape cassettes used should be of the best ("archive") quality, and every cassette has to be replaced within a couple of years. The maximum lifetime for a cassette holding valuable information should not be set higher than three years, provided it is stored in a dry and not too warm climate. If sufficient backups are available, the tapes may be kept for some five or six years. A yearly computer check can be used to monitor the condition of the tapes.
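The rules in the paragraph above can be turned into a simple yearly routine. The following is only a sketch under stated assumptions: the Tape record, the function names and the threshold of two copies for "sufficient" backups are hypothetical and not part of the archive procedure described here. The three-year and five-year limits and the yearly check are taken from the text.

```python
from dataclasses import dataclass
from datetime import date

# Limits taken from the text; the constant names are hypothetical.
MAX_YEARS_DEFAULT = 3        # maximum lifetime of a cassette with valuable data
MAX_YEARS_WITH_BACKUPS = 5   # lower end of "some five or six years"
SUFFICIENT_BACKUPS = 2       # assumption: the text does not define "sufficient"
CHECK_INTERVAL_DAYS = 365    # "a yearly computer check"


@dataclass
class Tape:
    label: str
    written: date        # date the cassette was written
    last_checked: date   # date of the last condition check
    backups: int         # number of backup copies of the same dataset


def age_in_years(written: date, today: date) -> float:
    """Age of a cassette in (average) calendar years."""
    return (today - written).days / 365.25


def actions_for(tape: Tape, today: date) -> list[str]:
    """Return the maintenance actions that are due for one cassette."""
    actions = []
    # There should be at least one backup copy of every dataset.
    if tape.backups < 1:
        actions.append("make backup")
    # The longer lifetime applies only if sufficient backups exist.
    if tape.backups >= SUFFICIENT_BACKUPS:
        limit = MAX_YEARS_WITH_BACKUPS
    else:
        limit = MAX_YEARS_DEFAULT
    if age_in_years(tape.written, today) >= limit:
        actions.append("replace")
    # The condition of the tape is monitored once a year.
    if (today - tape.last_checked).days >= CHECK_INTERVAL_DAYS:
        actions.append("check")
    return actions
```

For example, a cassette written on 1 January 2000 with one backup copy and last checked on 1 June 2003 would, on 1 January 2004, be reported as due for replacement only. With two backup copies the same cassette would not yet be due.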

Laser disks are now available for storage of computer information. The technique is similar to the one used for storing digital audio information on compact disks. Laser disks for data storage are available in two formats: CD-ROM and WORM. CD-ROM stands for Compact Disk Read Only Memory. A CD-ROM is used whenever the disks have to be available in a large number of copies. A CD-ROM holds between 600 megabytes and 2.5 gigabytes of data. The data has to be written on specialised machines, the CD burners, which are by now available at reasonable prices. WORM stands for Write Once Read Many times. A WORM disk holds some 400 MB of data (systems with larger capacities are becoming available). The information can be written onto the disk by the user.
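The capacities quoted above allow a rough estimate of how many disks a given volume of data would occupy. The sketch below is illustrative only: the function name and the table of capacities are hypothetical, the lower figure of 600 MB is assumed for CD-ROM, and it is assumed that data may be split freely across disks.

```python
import math

# Capacities in megabytes as quoted in the text (lower bound for CD-ROM).
CAPACITY_MB = {"CD-ROM": 600, "WORM": 400}


def disks_needed(total_mb: float, medium: str) -> int:
    """Number of disks of the given medium needed to hold total_mb of data."""
    return math.ceil(total_mb / CAPACITY_MB[medium])
```

Under these assumptions, holdings of 5000 MB would occupy 9 CD-ROMs or 13 WORM disks.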

Neither CD-ROM nor WORM is standardised yet. It is also not yet clear how long the disks will hold their information. One may expect, though, that disks using this technology can keep information much longer than tapes. Estimates range from ten to fifty years.


 
Copyright © IFDOnet - All rights reserved - 11-05-2005