R: Merge study data from different countries and/or respondents

lsa.merge.data {RALSA}

R Documentation

Merge study data from different countries and/or respondents

Description

lsa.merge.data combines data from different countries and/or different respondents (e.g. students and teachers, or students and schools).

Usage

lsa.merge.data(inp.folder, file.types, ISO, out.file)

Arguments

`inp.folder`	Folder containing the data sets. The data sets must be `.RData`, produced by `lsa.convert.data`. See details.
`file.types`	What file types (i.e. respondents) shall be merged? See details.
`ISO`	Vector containing character ISO codes of the countries' data files to include in the merged file. See details.
`out.file`	Full path to the file the data shall be stored in. The object stored in the file will have the same name. See details.

Details

The function merges files from studies where the files are per country and respondent type (e.g. student, school, teacher). That is, all studies except PISA.

The inp.folder specifies the path to the folder containing the .RData files produced by lsa.convert.data. The folder must contain files for only a single study, a single cycle, and a single population (e.g. TIMSS 2015 grade 4 or TIMSS 2015 grade 8, but not both) or mode of administration (e.g. either PIRLS 2016 or ePIRLS 2016, but not both; or TIMSS 2019 or TIMSS 2019 Bridge, but not both). All files in the input folder must have been exported with the same value (TRUE or FALSE) of the missing.to.NA argument of the lsa.convert.data function. If inp.folder is not provided, the working folder (getwd()) is used.

The file.types is a list with the respondent types as component names and the variables to be merged as elements. The file type names are three-character codes, the first three characters of the corresponding file names. The elements are vectors of upper-case variable names; NULL takes all variables in the corresponding file. For example, in TIMSS list(asg = NULL) will merge only student-level data from grade 4, list(asg = NULL, atg = NULL) will merge the student-level and teacher-level data from grade 4, and list(bsg = NULL, btm = NULL) will merge student-level and mathematics teacher-level data from grade 8. If a merge is not possible by the study design, the function will stop with an error. See the examples.
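As a minimal sketch of how such a list might be put together, the snippet below builds two file.types values in base R and runs a few sanity checks on them. The object names (all.vars, some.vars, names.ok, upper.ok) are hypothetical, and the checks are only an illustration of the rules stated above, not the validation that lsa.merge.data itself performs.

```r
# Take all variables from the student (asg) and teacher (atg) files.
# list(asg = NULL) keeps the component; note that assigning NULL with
# `$<-` to an existing list would remove the component instead.
all.vars <- list(asg = NULL, atg = NULL)

# Take only selected variables (names borrowed from the Examples section).
some.vars <- list(asg = c("ASBG01", "ASBG02A", "ASBG02B"),
                  atg = c("ATBG01", "ATBG02", "ATBG03"))

# Illustrative checks: three-character component names, upper-case variables.
names.ok <- all(nchar(names(some.vars)) == 3)
upper.ok <- all(vapply(some.vars,
                       function(v) is.null(v) || all(v == toupper(v)),
                       logical(1)))
```

Either list could then be passed as the file.types argument.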

The ISO is a character vector specifying the countries whose data shall be merged. The elements of the vector are the fourth, fifth and sixth characters of the file names. For example, c("aus", "swe", "svn") will merge the data from Australia, Sweden and Slovenia for the file types specified in file.types. The three-letter ISO codes for each country can be found in the user guide for the study in scope. For example, the ISO codes of the countries participating in PIRLS 2016 can be found in its user guide on pages 52-54. If the file for a specific country does not exist in the inp.folder, a warning will be issued. If the ISO argument is missing, the files for all countries in the folder will be merged for the specified file.types.
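The naming convention described above (file type in characters 1 to 3, ISO code in characters 4 to 6) can be used to see which countries are available before merging. The following is a rough sketch in base R; the file names are hypothetical stand-ins for what a folder might contain, and in practice they could be obtained with list.files(inp.folder, pattern = "\\.RData$").

```r
# Hypothetical file names following the pattern described above.
files <- c("asgausm6.RData", "atgausm6.RData",
           "asgswem6.RData", "asgsvnm6.RData")

# Characters 1-3: respondent file type; characters 4-6: country ISO code.
file.type <- tolower(substr(files, 1, 3))
iso <- tolower(substr(files, 4, 6))

# Countries available for each file type.
available <- split(iso, file.type)

# Countries requested but lacking a teacher (atg) file in this folder.
requested <- c("aus", "swe", "svn")
missing.atg <- setdiff(requested, available$atg)
```

A non-empty missing.atg indicates countries for which a warning could be expected when merging that file type.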

The out.file must contain the full path to the output file (i.e. the file containing the merged data), including the .RData extension; if the extension is missing, it will be added. The file contains an object with the same name as the file, with class lsa.data. The object has an additional attribute, file.type, showing which respondents' data are available after the merge. For example, merging the student-level data with teacher-level data in TIMSS grade 4 will assign "std.bckg.tch.bckg" to this attribute. The object has two further attributes: study name (study) and study cycle (cycle). The object in the .RData file is keyed on the country ID variable. If out.file is not provided, the merged file will be saved in the working folder (getwd()) as merged_data.RData.

Value

An .RData data file containing an object with class lsa.data, an extension of the data.table class. The data.table object has the same name as the .RData file it is saved in. The object contains the merged data from different respondents and/or countries and has additional attributes: study name (study), study cycle (cycle), and respondent file type (file.type). Each variable carries its own label as an attribute, if one existed in the source SPSS file. If missing.to.NA was set to TRUE when the source file was converted, each variable also has an attribute missings, containing the user-defined missing values.
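To illustrate how the stored object and its attributes might be inspected, the sketch below saves a stand-in object to a temporary file and reads it back with base R. Everything about the stand-in is an assumption made for the example: it is a plain data.frame rather than a real lsa.data object, the column name IDCNTRY and the attribute values for study and cycle are illustrative, and only the "std.bckg.tch.bckg" value is taken from the text above.

```r
# Stand-in for a merged file (a real one would be produced by lsa.merge.data).
Merged <- data.frame(IDCNTRY = c(36, 705))
attr(Merged, "study") <- "TIMSS"            # illustrative value
attr(Merged, "cycle") <- "2015"             # illustrative value
attr(Merged, "file.type") <- "std.bckg.tch.bckg"
tmp <- file.path(tempdir(), "Merged.RData")
save(Merged, file = tmp)

# Load into a separate environment; load() returns the stored object names.
env <- new.env()
obj.name <- load(tmp, envir = env)
merged <- get(obj.name, envir = env)

# Read the object-level attributes described above.
info <- list(study = attr(merged, "study"),
             cycle = attr(merged, "cycle"),
             file.type = attr(merged, "file.type"))
```

Loading into a separate environment avoids overwriting an object of the same name in the workspace.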

References

Foy, P. (Ed.). (2018). PIRLS 2016 User Guide for the International Database. TIMSS & PIRLS International Study Center.

Examples


# Merge TIMSS 2015 grade 4 student and teacher variables for Australia, Chinese Taipei and
# Slovenia taking all variables in both files
## Not run: 
lsa.merge.data(inp.folder = "C:/Data", file.types = list(asg = NULL, atg = NULL),
               ISO = c("aus", "twn", "svn"), out.file = "C:/Merged/Merged.RData")

## End(Not run)

# Same as the above, taking just a few variables from each file
## Not run: 
lsa.merge.data(inp.folder = "C:/Data",
               file.types = list(asg = c("ASBG01", "ASBG02A", "ASBG02B"),
                                 atg = c("ATBG01", "ATBG02", "ATBG03")),
               ISO = c("aus", "twn", "svn"),
               out.file = "C:/Merged/Merged.RData")

## End(Not run)

[Package RALSA version 1.4.7 Index]