Data archive workflow



The social science data archive step by step

Ekkehard Mochmann (Central Archive for Empirical Social Research, Cologne, FRG)
Paul de Guchteneire (UNESCO, Paris, France)
 


The general function of a social science data archive is to make machine-readable data available to scientists. Data is acquired, manipulated, documented, stored and finally distributed to the scientific community for further analysis. A large number of steps are involved in this process. Here the various aspects of practical work in a social science data archive are listed in a more or less chronological sequence, following the flow of information through the archive.
 
Contents:

1. Identification of datasets
2. Sources of data
3. Selection criteria
4. Data transfer to the archive
5. Data processing
6. Documentation
7. Storage
8. Information retrieval
9. Dissemination of data
10. Notes


1. Identification of datasets

Data archives function as national repositories for datasets. They present themselves as such and they are seen as such. This implies that the archive has to keep a good overview of what is available and what is needed.

A number of information channels are used to find out which data are available. Nowadays, any archive working on a national level can select only a limited number of the many datasets produced in the country. The annual growth of data archives varies between countries; the smaller archives grow by some 30 to 50 datasets per year, the larger ones by more than 300. Even so, these numbers represent only a small proportion of the potentially available datasets, and a selection has to be made based on the following information channels:

1.1. On-going research registers

On-going research registers are usually compiled on the basis of mail surveys. Standard forms are sent out to researchers and research institutes. Information is gathered on research topics, type of research, time schedules and methodology. For the archive's acquisition work it is very useful if the register also records whether machine-readable datasets are produced as a result of the registered research projects.
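
As a rough illustration of how such a register could support acquisition, the sketch below models one register entry with the kinds of fields named above and picks out the projects that report producing machine-readable data. The class name, field names and sample entries are hypothetical and do not describe any actual register.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    """One entry in an on-going research register (hypothetical layout)."""
    institute: str
    topic: str
    research_type: str
    start_year: int
    end_year: int
    methodology: str
    produces_machine_readable_data: bool


def acquisition_candidates(entries):
    """Return the entries whose projects report producing machine-readable data."""
    return [e for e in entries if e.produces_machine_readable_data]


# Invented sample entries, for illustration only.
entries = [
    RegisterEntry("Institute A", "Voting behaviour", "survey",
                  1990, 1992, "face-to-face interviews", True),
    RegisterEntry("Institute B", "History of sociology", "literature study",
                  1991, 1991, "document analysis", False),
]

candidates = acquisition_candidates(entries)
print([e.topic for e in candidates])
```

In practice the flag would only be a first filter; the archive would still need to follow up on each candidate.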

1.2. Research reports

Empirical research is reported in many different forms. Next to officially published books there is a vast amount of so-called grey literature, which is not distributed via the official publishing channels. Some countries have a specialised grey literature library for the social sciences; such a library forms an effective information channel for selecting datasets. The research report should normally give a good indication of whether the dataset is of interest for acquisition. Grey literature is becoming increasingly important for keeping track of empirical research: desktop publishing gives researchers a relatively fast way of bringing out research results, and it often leads to research reports that appear outside the official channels.

1.3. Periodicals, newsletters

Unlike other disciplines, the social sciences generally have no key journals that cover the most important research. On the contrary, important contributions may appear in journals of other disciplines, e.g. medical journals. For the data archive this means that many periodicals have to be screened for reports on research that produced data which might be archived.

1.4. Direct contacts

Potential producers of datasets are asked to give overviews of the data they produce. Especially with commercial research institutes, where no traditional publications on the research may be expected, direct contacts may be the only source of information. Hints from reports on the Internet or arriving via e-mail facilitate the identification of new data sources.

A problem with these institutes is often that under their professional code they may not reveal the contract research projects on which they are working. The archive therefore needs the cooperation of the data-collecting institute to get in touch with the "owners" of the data.

1.5. Inquiry forms

The final step in identifying datasets is sending out inquiry forms to get detailed information on whether specific data can be archived. The form should answer questions such as who owns the data and when the data will be available.


 
Copyright © IFDOnet - All rights reserved - 11-05-2005