Data archive workflow



The social science data archive step by step

Ekkehard Mochmann (Central Archive for Empirical Social Research, Cologne, FRG)
Paul de Guchteneire (UNESCO, Paris, France)


Contents:

1. Identification of datasets
2. Sources of data
3. Selection criteria
4. Data transfer to the archive
5. Data processing
6. Documentation
7. Storage
8. Information retrieval
9. Dissemination of data
10. Notes


5. Data processing

Before incoming datasets can be stored in the archive, the data are processed extensively. First of all, the data archive has to convert its new datasets in such a way that the information is fully accessible internally.

5.1. Conversion of the internal format

When data is stored on disk or another transport medium, the information is written in a particular format. Reading the information back on a different machine therefore often requires converting the storage format into the standard format used on the archive's own machine.
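As an illustration of such a conversion, the sketch below assumes a fixed-width record written on an IBM mainframe: text in the EBCDIC code page 037 and integers in big-endian byte order. The record layout (an 8-byte identifier followed by one 4-byte integer) and the function names are hypothetical; the conversion itself relies only on Python's built-in `cp037` codec and the `struct` module.

```python
import struct

# Hypothetical record layout (an assumption for this sketch):
#   bytes 0-7 : respondent identifier, EBCDIC text (code page 037)
#   bytes 8-11: age, signed 32-bit integer, big-endian byte order
RECORD_LENGTH = 12


def convert_record(raw: bytes) -> tuple[str, int]:
    """Convert one mainframe record into a native (str, int) pair."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    # Decode EBCDIC text; the cp037 codec ships with Python's standard library.
    ident = raw[0:8].decode("cp037").rstrip()
    # '>i' reads a big-endian signed 32-bit integer, whatever the host byte order.
    (age,) = struct.unpack(">i", raw[8:12])
    return ident, age


def convert_file(data: bytes) -> list[tuple[str, int]]:
    """Split a byte stream into fixed-length records and convert each one."""
    if len(data) % RECORD_LENGTH:
        raise ValueError("file length is not a multiple of the record length")
    return [convert_record(data[i:i + RECORD_LENGTH])
            for i in range(0, len(data), RECORD_LENGTH)]


# Build a sample record the way the source machine would have written it.
sample = "R0001   ".encode("cp037") + struct.pack(">i", 42)
print(convert_file(sample))  # [('R0001', 42)]
```

The same letters have different byte values in EBCDIC and ASCII (for example, "A" is 0xC1 in code page 037), which is why the bytes cannot simply be copied across machines.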

5.2. Conversion of data structures

The archive has to store the data in such a way that commonly available software can be used to analyse them. Most statistical packages are designed to work with a rectangular matrix of cases by variables (or variables by cases). As techniques for data analysis advance, social science datasets use increasingly complex data structures. When a structure is so complex that dedicated software is needed to access the information, the archive can either reformat the data or include the special programs with the archived data. Standard statistical packages such as SAS and SPSS nowadays have built-in facilities for handling hierarchical or relational data, and even more complex data structures are supported by relational database systems.
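One common way to reformat a hierarchical file into a rectangular one can be sketched as follows. The example assumes a hypothetical file in which each household record (type "H") is followed by the records of its members (type "P"); the record types and variable names are illustrative, not taken from any particular archive. Flattening copies the household variables onto every person record, giving a person-by-variable matrix that a standard package can read.

```python
from typing import Iterable


def flatten(records: Iterable[tuple]) -> list[dict]:
    """Turn household ('H') / person ('P') records into one row per person."""
    rows = []
    household = None
    for rec in records:
        kind = rec[0]
        if kind == "H":
            # Hypothetical layout: ('H', household_id, region, household_size)
            _, hh_id, region, size = rec
            household = {"hh_id": hh_id, "region": region, "hh_size": size}
        elif kind == "P":
            # Hypothetical layout: ('P', person_no, sex, age)
            if household is None:
                raise ValueError("person record before any household record")
            _, person_no, sex, age = rec
            # Repeat the household variables on every person row.
            rows.append({**household, "person_no": person_no,
                         "sex": sex, "age": age})
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return rows


hierarchical = [
    ("H", 101, "North", 2),
    ("P", 1, "F", 47),
    ("P", 2, "M", 50),
    ("H", 102, "South", 1),
    ("P", 1, "F", 33),
]
for row in flatten(hierarchical):
    print(row)
```

The price of the rectangular form is redundancy: household variables are repeated for every member, which is acceptable for analysis files but is one reason complex structures are sometimes better kept in a relational database.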

5.3. Cleaning the data

Before a dataset can be stored in the archive for further distribution, the data has to be checked and cleaned.

  • Is the number of records as expected from the file description?

  • Are all variables included?

  • Is the data sorted in the appropriate way?

  • Are there any unexpected character codes in the files?
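The four checks in this list lend themselves to automation. The sketch below assumes a fixed-width text file described by a codebook giving the expected number of records and each variable's column positions; the `Codebook` structure, the variable names and the sort key are hypothetical. It only reports problems, leaving the corrections to the archivist.

```python
import string
from dataclasses import dataclass

# Characters regarded as "expected" in a plain-text data file (an assumption).
ALLOWED = set(string.digits + string.ascii_letters + string.punctuation + " ")


@dataclass
class Codebook:
    n_records: int                        # record count stated in the file description
    columns: dict[str, tuple[int, int]]   # variable -> (start, end), 0-based, end exclusive
    sort_key: str                         # variable the file should be sorted on


def check_file(lines: list[str], cb: Codebook) -> list[str]:
    """Return a list of problems found in a fixed-width data file."""
    problems = []
    # 1. Is the number of records as expected from the file description?
    if len(lines) != cb.n_records:
        problems.append(f"expected {cb.n_records} records, found {len(lines)}")
    # 2. Are all variables included? Each record must reach the last column.
    record_len = max(end for _, end in cb.columns.values())
    for i, line in enumerate(lines, start=1):
        if len(line) < record_len:
            problems.append(f"record {i}: {len(line)} characters, expected {record_len}")
    # 3. Is the data sorted? (string comparison, so keys are assumed zero-padded)
    start, end = cb.columns[cb.sort_key]
    keys = [line[start:end] for line in lines]
    if keys != sorted(keys):
        problems.append(f"records are not sorted by {cb.sort_key}")
    # 4. Are there any unexpected character codes?
    for i, line in enumerate(lines, start=1):
        bad = {c for c in line if c not in ALLOWED}
        if bad:
            problems.append(f"record {i}: unexpected character codes {sorted(ord(c) for c in bad)}")
    return problems


cb = Codebook(n_records=3, columns={"id": (0, 4), "sex": (4, 5), "age": (5, 7)}, sort_key="id")
data = ["0001F47", "0003M5", "0002F3\x07"]
for problem in check_file(data, cb):
    print(problem)
```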


Copyright © IFDOnet - All rights reserved - 11-05-2005