Data archive workflow



The social science data archive step by step

Ekkehard Mochmann (Central Archive for Empirical Social Research, Cologne, FRG)
Paul de Guchteneire (UNESCO, Paris, France)
 


Content:
 

1. Identification of datasets
2. Sources of data
3. Selection criteria
4. Data transfer to the archive
5. Data processing

6. Documentation
7. Storage
8. Information retrieval
9. Dissemination of data

10. Notes


4. Data transfer to the archive

4.1. Administrative transfer

4.1.1. Restrictions

The most important point to settle in the administrative transfer of a dataset to the archive is which restrictions should be imposed on future use of the data.

The data archive offers a set of possible clauses, such as:

  • No restrictions

  • Free for academic public research

  • Publication on the basis of data depends on donor’s consent

  • Accessible only after written permission of donor

  • Usage and publications to be brought to the attention of donor

Ad hoc clauses are sometimes added depending on the wishes of the donor. Whenever severe restrictions are imposed on the accessibility of the data, a time limit for the restrictions should be negotiated.
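
As an illustration only, the clauses above could be recorded alongside each deposited dataset roughly as follows. This is a minimal sketch, not a description of any archive's actual system: the names `AccessClause`, `DepositAgreement`, `restricted_until` and `ad_hoc_clauses` are hypothetical, and the rule that a dataset becomes unrestricted once the negotiated time limit has passed is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional, Tuple


class AccessClause(Enum):
    """The standard clauses listed above (identifiers are hypothetical)."""
    NO_RESTRICTIONS = "No restrictions"
    ACADEMIC_RESEARCH = "Free for academic public research"
    PUBLICATION_NEEDS_CONSENT = "Publication on the basis of data depends on donor's consent"
    WRITTEN_PERMISSION = "Accessible only after written permission of donor"
    NOTIFY_DONOR = "Usage and publications to be brought to the attention of donor"


@dataclass
class DepositAgreement:
    """Administrative record for one deposited dataset (hypothetical layout)."""
    dataset_id: str
    clause: AccessClause
    # Negotiated time limit for the restriction; None means no limit was agreed.
    restricted_until: Optional[date] = None
    # Free-text ad hoc clauses added at the donor's request.
    ad_hoc_clauses: Tuple[str, ...] = ()

    def effective_clause(self, today: date) -> AccessClause:
        """Return the clause in force on the given day.

        Assumption for this sketch: once the time limit has passed,
        the dataset is treated as unrestricted.
        """
        if self.restricted_until is not None and today >= self.restricted_until:
            return AccessClause.NO_RESTRICTIONS
        return self.clause


# Example: a severely restricted dataset with a negotiated time limit.
agreement = DepositAgreement(
    dataset_id="example-study",
    clause=AccessClause.WRITTEN_PERMISSION,
    restricted_until=date(2010, 1, 1),
)
print(agreement.effective_clause(date(2009, 6, 1)).value)
print(agreement.effective_clause(date(2011, 6, 1)).value)
```

Keeping the time limit as a date in the record, rather than in free text, lets the archive check mechanically whether a restriction still applies.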

4.1.2. Ownership, copyright

The question of who owns the data after transfer to the archive should be clarified at least to some degree. Speaking of ownership of a machine-readable dataset is often inadequate: a copy of that dataset on another tape is indistinguishable from the original, yet strict ownership cannot meaningfully be applied to that copy. The concept of copyright is often easier to use than ownership. The original researcher who created the dataset normally holds a copyright to it, and this copyright can be handled much like the copyright on books or works of art. The data archive may acquire its own copyright to a dataset. Storing a dataset in the archive almost always involves changes to the original materials: data is reformatted, documentation is added, variables are recoded to standards, and so on. This process adds a copyright claim for the part that the archive contributed. So for any archived dataset, both the donor and the archive will have a claim.

4.2. Technical transfer

Many different media are used for the actual transfer of machine-readable data. A data archive needs very flexible machinery to cope with the variety of media used. During the past 40 years the common media for data transfer have changed from punched cards, via magnetic tapes and disks, to CD-ROM. Nowadays data is commonly transferred over the Internet. A major problem in the past was the lack of a standard format for writing information onto electronic media. Today computer departments are able to produce tapes in fairly common standard formats. It is a fact, though, that no standard in this field lasts longer than a couple of years. Changing media and recording formats therefore deserve particular attention in the long-term preservation of electronic records.

Disks from microcomputers are still used to transfer datasets to the data archive, provided the datasets are not very large. More comprehensive datasets are increasingly stored on CD-ROM.

A medium now more routinely used to carry machine-readable data is the computer network. The Internet connects data providers and users worldwide. With a simple command, large datasets can be sent from one computer to another via such a network. The advantage of networks is that, in interconnecting the various types and brands of computers, most technical incompatibility problems are solved in the design of the network, so that the user does not have to deal with them.
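
Whatever medium carries the data, the archive may want to confirm that the copy it received is bit-for-bit identical to what the donor sent. The sketch below is one common way to do this, assuming the donor supplies a checksum together with the file. It is not a procedure described in this text; the function names `file_digest` and `transfer_is_intact` are hypothetical, and SHA-256 is simply an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a hex digest of a file, reading it in chunks so that
    large datasets do not have to fit into memory."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(received_path, expected_digest, algorithm="sha256"):
    """Return True if the received file matches the digest supplied by
    the donor (assumed to be sent alongside the data)."""
    return file_digest(received_path, algorithm) == expected_digest.strip().lower()
```

The same check can be repeated whenever a dataset is migrated to a new medium or recording format, which is where the preservation concerns raised above become practical.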


 
Copyright © IFDOnet - All rights reserved - 11-05-2005