5. Data processing
Before incoming datasets can be stored in the archive, the data must be
processed extensively. First of all, the archive has to convert its
new datasets so that the information is fully accessible internally.
5.1. Conversion of internal format
When data is stored on disk or another transport medium, the
information is written in a particular format. Reading the information
back on a different machine often involves converting that storage
format into the standard format used on the archive's machine, for
example translating character codes from EBCDIC to ASCII or reversing
the byte order of binary numbers.
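As an illustration, the following minimal Python sketch performs two such conversions. It decodes EBCDIC text and unpacks 4-byte big-endian binary integers. The `cp037` code page (EBCDIC US/Canada) is an assumption, because the actual code page depends on the source machine. The record layout in the usage part is hypothetical.

```python
import struct


def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python string.

    cp037 is assumed; other EBCDIC code pages (e.g. cp500) are also
    available as built-in Python codecs.
    """
    return raw.decode(codepage)


def big_endian_ints(raw: bytes) -> list[int]:
    """Unpack consecutive 4-byte big-endian signed integers.

    The '>' prefix fixes the byte order explicitly, so the result does
    not depend on the byte order of the machine running the archive.
    """
    if len(raw) % 4 != 0:
        raise ValueError("binary field length must be a multiple of 4")
    return list(struct.unpack(f">{len(raw) // 4}i", raw))


if __name__ == "__main__":
    # Hypothetical record: 3 EBCDIC characters followed by two integers.
    record = "ABC".encode("cp037") + struct.pack(">2i", 256, -1)
    print(ebcdic_to_text(record[:3]))
    print(big_endian_ints(record[3:]))
```

In practice the code page, field positions and integer widths would come from the file description supplied with the dataset.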
5.2. Conversion of data structures
The archive has to store the data in such a way that commonly
available software can be used for analysing the data. Most statistical
packages are designed to work with a rectangular matrix of cases
by variables or variables by cases.
As techniques for data analysis advance, social science datasets
increasingly use complex data structures. When a structure is so
complex that dedicated software is needed to access the information,
the archive can either try to reformat the data or include the special
programs with the archived data.
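One common way to reformat hierarchical data is to flatten it into a rectangular file with one row per lowest-level unit. The higher-level variables are copied into each row. The minimal sketch below assumes a hypothetical household/person hierarchy, and the field names are illustrative only.

```python
from typing import Any


def flatten(households: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn household records with nested person lists into one
    rectangular row per person (a cases-by-variables matrix)."""
    rows = []
    for hh in households:
        # Household-level variables are repeated on every person row.
        hh_vars = {k: v for k, v in hh.items() if k != "persons"}
        for person in hh["persons"]:
            # Person variables are added after household variables;
            # a person field with the same name would override it.
            rows.append({**hh_vars, **person})
    return rows


if __name__ == "__main__":
    data = [
        {"hh_id": 1, "region": "north", "persons": [
            {"person_id": 1, "age": 34},
            {"person_id": 2, "age": 31},
        ]},
        {"hh_id": 2, "region": "south", "persons": [
            {"person_id": 1, "age": 70},
        ]},
    ]
    for row in flatten(data):
        print(row)
```

The cost of this layout is redundancy. Household variables are stored once per person, which is the usual trade-off for obtaining a plain rectangular file that any statistical package can read.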
Standard statistical packages such as SAS and SPSS nowadays have
built-in facilities for handling hierarchical or relational data.
Even more complex data structures can be handled by relational
database systems.
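In a relational database, each level of a structure is kept in its own table, and the tables are linked by identifiers. A rectangular extract for a statistical package can then be produced with a join when needed. The following minimal sketch uses Python's built-in `sqlite3` module. The `household` and `person` tables and their columns are hypothetical.

```python
import sqlite3


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical household and
    person tables linked by hh_id."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE household (hh_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE person (
            hh_id INTEGER REFERENCES household(hh_id),
            person_id INTEGER,
            age INTEGER,
            PRIMARY KEY (hh_id, person_id)
        );
        INSERT INTO household VALUES (1, 'north'), (2, 'south');
        INSERT INTO person VALUES (1, 1, 34), (1, 2, 31), (2, 1, 70);
        """
    )
    return con


def rectangular_extract(con: sqlite3.Connection) -> list[tuple]:
    """Join the tables into one row per person, sorted by identifiers."""
    return con.execute(
        """
        SELECT h.hh_id, h.region, p.person_id, p.age
        FROM person AS p
        JOIN household AS h ON h.hh_id = p.hh_id
        ORDER BY h.hh_id, p.person_id
        """
    ).fetchall()


if __name__ == "__main__":
    con = build_example_db()
    for row in rectangular_extract(con):
        print(row)
```

Storing each level once avoids the redundancy of a flat file. The archive can still deliver a rectangular file on request.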
5.3. Cleaning the data
Before a dataset can be stored in the archive for further
distribution, the data has to be checked and cleaned. Typical checks
include:
- Is the number of records as expected from the file description?
- Are all variables included?
- Is the data sorted in the appropriate way?
- Are there any unexpected character codes in the files?
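The checks in this list are easy to automate. The minimal Python sketch below makes several assumptions, all illustrative. The dataset is a CSV file with a header row. The expected record count and variable names come from the file description. The sort identifier is an integer. "Expected" characters are printable ASCII plus tab, carriage return and line feed.

```python
import csv
import io
import string

# Assumption: printable ASCII plus tab, CR and LF counts as "expected".
ALLOWED = set(string.printable) - {"\x0b", "\x0c"}


def check_dataset(text: str, expected_records: int,
                  expected_vars: list[str], sort_key: str) -> list[str]:
    """Return a list of problems found in a CSV dataset (empty if none)."""
    problems = []

    # Are there any unexpected character codes in the file?
    bad = sorted({ch for ch in text if ch not in ALLOWED})
    if bad:
        problems.append("unexpected character codes: "
                        + ", ".join(hex(ord(ch)) for ch in bad))

    reader = csv.DictReader(io.StringIO(text))
    records = list(reader)
    fieldnames = reader.fieldnames or []

    # Is the number of records as expected from the file description?
    if len(records) != expected_records:
        problems.append(f"expected {expected_records} records, "
                        f"found {len(records)}")

    # Are all variables included?
    missing = [v for v in expected_vars if v not in fieldnames]
    if missing:
        problems.append("missing variables: " + ", ".join(missing))

    # Is the data sorted on the (assumed integer) identifier?
    if sort_key in fieldnames:
        try:
            keys = [int(r[sort_key]) for r in records]
        except (ValueError, TypeError):
            problems.append(f"non-integer values in {sort_key}")
        else:
            if keys != sorted(keys):
                problems.append(f"records not sorted on {sort_key}")

    return problems


if __name__ == "__main__":
    sample = "id,age,sex\n1,34,F\n3,31,M\n2,70,F\n"
    for problem in check_dataset(sample, 3, ["id", "age", "sex", "income"], "id"):
        print(problem)
```

The function reports problems rather than repairing them. Whether a deviation is an error or an undocumented feature of the dataset usually needs a human decision.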