lsa.convert.data {RALSA} | R Documentation |
Convert Large-Scale Assessments' Datasets to .RData Format
Description
lsa.convert.data converts datasets from large-scale assessments from their original formats (SPSS or ASCII text) into .RData files. print prints the properties of an lsa.data object on screen. lsa.select.countries.PISA lets the user select PISA data from specific countries for analysis.
Usage
lsa.convert.data(
inp.folder,
PISApre15 = FALSE,
ISO,
missing.to.NA = FALSE,
out.folder
)
## S3 method for class 'lsa.data'
print(x, col.nums, ...)
lsa.select.countries.PISA(data.file, data.object, cnt.names, output.file)
Arguments
inp.folder |
The folder containing the IEA-like SPSS data files or ASCII text files and
|
PISApre15 |
When converting PISA files, set to |
ISO |
Vector containing character ISO codes of the countries' data files to
convert (e.g. |
missing.to.NA |
Should the user-defined missing values be recoded to |
out.folder |
Path to the folder where the converted files will be stored. If omitted,
same as the |
x |
( |
col.nums |
( |
... |
( |
data.file |
( |
data.object |
( |
cnt.names |
( |
output.file |
( |
Details
The lsa.convert.data function converts the originally provided data files into .RData sets. RALSA adds its own method for printing lsa.data objects on screen. The lsa.select.countries.PISA function is a utility that allows the user to select countries of interest from a converted PISA data file (or a PISA object residing in memory) and remove the rest of the countries' data. This is useful when the user does not want to analyze the data from all countries in a PISA file.
-
lsa.convert.data
IEA studies, as well as OECD TALIS and some studies conducted by other organizations, provide their data in SPSS .sav format with the same or a very similar structure: one file per country and type of respondent (e.g. school principal, student, teacher, etc.) per population. For IEA studies and OECD TALIS, use the ISO argument to specify the three-letter ISO codes of the countries whose data is to be converted. The three-letter ISO codes for each country can be found in the user guide for the study in scope. For example, the ISO codes of the countries participating in PIRLS 2016 can be found in its user guide on pages 52-54. To convert the files from all countries in the downloaded data from IEA studies and OECD TALIS, simply omit the ISO argument. Cycles of OECD PISA prior to 2015, on the other hand, do not provide SPSS .sav or other binary files, but ASCII text files accompanied by SPSS syntax (.sps) files that are used to import the text files into SPSS. There is one such file per type of respondent, containing the data from all countries. The lsa.convert.data function converts the data from either source, ensuring that the structure of the output .RData files is the same even though the structure of the input files differs (SPSS binary files vs. ASCII text files plus .sps import files). The data from PISA 2015 and later is provided in SPSS format (all countries in one file per type of respondent). Thus, the PISApre15 argument needs to be set to TRUE when converting data sets from PISA cycles prior to 2015. The default for the PISApre15 argument is FALSE, which means that the function expects to find, in the directory given in inp.folder, either IEA-like SPSS binary files per country and type of respondent or OECD PISA 2015 (or later) SPSS .sav files. If PISApre15 = TRUE and country codes are provided to ISO, they will be ignored because PISA files contain data from all countries together.
The files to be converted must be in a folder of their own, from a single study, single cycle and single population. In addition, if there is more than one file type per study, cycle and population, these must also be in different folders. For example, in TIMSS 2019 the grade 8 data files are main (end with "m7", electronic version of the paper-administered items), bridge (end with "b7", paper administration with trend items for countries participating in previous TIMSS cycles) and Problem Solving and Inquiry (PSI) tasks (end with "z7", electronic administration only, optional for countries). These different types must be in separate folders. In the case of OECD PISA prior to 2015, the folder must contain both the ASCII text files and the SPSS .sps import syntax files. If the folder contains data sets from more than one study or cycle, the operation will fail with error messages.
If the path for the inp.folder argument is not specified, the function will search for files in the working directory (i.e. as returned by getwd()). If the folder path for out.folder is not specified, the path from inp.folder will be used and the files will be stored there. If both the inp.folder and out.folder arguments are missing, the directory from getwd() will be used to search for, convert and store files.
If missing.to.NA is set to TRUE, all user-defined missing values from the SPSS files will be imported as NA, which is R's only kind of missing value. This will most often be the case when analyzing these data, since the reason why a response is missing is irrelevant most of the time. However, if the reasons for missing responses need to be known, as when analyzing achievement items (i.e. not administered vs. omitted or not reached), the argument should be set to FALSE (the default for this argument), which will import all user-defined missing values as valid ones.
-
print
RALSA uses its own method for printing objects of class lsa.data on screen. Passing just the object name to the console will print summarized information about the study's data and the first six columns of the dataset (see the Value section). If specified, col.nums determines which columns from the dataset are included in the output (see the examples).
-
lsa.select.countries.PISA
lsa.select.countries.PISA lets the user take a PISA dataset, either a converted file or an lsa.data object in memory, and reduce the number of countries in it by passing the names of the countries to keep as a character vector to the cnt.names argument. If the full path (including the file name) to the resulting file is specified in the output.file argument, it will be written to disk. If not, the data will be written to an lsa.data object in memory with the same name as the input file. See the examples.
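The effect of the missing.to.NA argument described above can be illustrated with a minimal base R sketch. It does not call RALSA and is not how the package works internally. The item values and the missing codes (6 for "not reached", 9 for "omitted") are assumptions chosen only for illustration.

```r
# Hypothetical achievement item: 1 = correct, 0 = incorrect,
# 6 = not reached, 9 = omitted (codes are illustrative assumptions)
item <- c(1, 0, 9, 1, 6, 0)
user.missings <- c(6, 9)

# Comparable to missing.to.NA = TRUE: every user-defined missing code
# becomes NA, so the reason for the missing response is lost
item.na <- replace(item, item %in% user.missings, NA)

# Comparable to missing.to.NA = FALSE: the codes stay in the data as
# valid values and can still be told apart
table(item, useNA = "ifany")
table(item.na, useNA = "ifany")
```

Keeping the codes is what makes it possible to treat, for example, omitted and not reached responses differently when scoring achievement items.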
Value
-
lsa.convert.data
.RData data files, each containing an object of class lsa.data, an extension of the data.table class. The data.table object has the same name as the .RData file it is saved in. The object has additional attributes: study name (study), study cycle (cycle), and respondent file type (file.type). Each variable has its own additional attributes: its own label attached to it, if it existed in the source SPSS file. If missing.to.NA was set to TRUE, each variable has an attribute missings, containing the user-defined missing values from the SPSS files.
The object in the .RData file is keyed on the country ID variable.
-
print
Prints the information of an lsa.data object (study, cycle, respondent type, number of countries, the key, i.e. the country ID, and whether the variables have user-defined missing values) and a preview of the data. The default preview (when col.nums is not specified) includes the first six columns.
-
lsa.select.countries.PISA
Writes a file containing an lsa.data object with the data for the countries passed to the cnt.names argument, if the output.file argument is used. If the output.file argument is not used, the lsa.data object will be written to memory with the same name as the file name in data.file.
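As a rough sketch of how the object-level attributes listed above could be inspected after loading a converted file, the following base R code builds a mock object, saves it and loads it back. This is only a stand-in: a real converted object is a data.table of class lsa.data created by lsa.convert.data, whereas the data frame, its values, the file name asgausr4.RData and the file.type value used here are assumptions for illustration.

```r
# Mock stand-in for a converted object (a real one is a data.table of class
# "lsa.data"); the values and the file.type string are made up
asgausr4 <- data.frame(IDCNTRY = c(36, 36, 36), ASBG01 = c(1, 2, 1))
attr(asgausr4, "study") <- "PIRLS"
attr(asgausr4, "cycle") <- "2016"
attr(asgausr4, "file.type") <- "placeholder"  # real values are set by RALSA

# Save under a file name matching the object name, as described above
rdata.file <- file.path(tempdir(), "asgausr4.RData")
save(asgausr4, file = rdata.file)

# Load into a separate environment to see which object the file contains
loaded.env <- new.env()
object.name <- load(rdata.file, envir = loaded.env)
object.name

# Inspect the object-level attributes
loaded.data <- get(object.name, envir = loaded.env)
attributes(loaded.data)[c("study", "cycle", "file.type")]
```

Loading into a separate environment avoids overwriting an object of the same name that may already exist in the workspace.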
Note
When downloading the files (ASCII text data and .sps control files) for OECD PISA cycles prior to 2015 (say http://www.oecd.org/pisa/pisaproducts/pisa2009database-downloadabledata.htm), save them without changing their names and without modifying the file contents. The function will look for the files as they were originally named.
Different studies and cycles define the "I don't know" (or similar) category of discrete variables in different ways, either as a valid or as a missing value. The lsa.convert.data function sets all such or similar codes to missing values. If this has to be changed, lsa.recode.vars can be used (see also lsa.vars.dict).
References
Foy, P. (Ed.). (2018). PIRLS 2016 User Guide for the International Database. TIMSS & PIRLS International Study Center.
See Also
lsa.merge.data, lsa.vars.dict, lsa.recode.vars
Examples
# Convert all IEA-like SPSS files in the working directory, setting all user-defined missing
# values to NA
## Not run:
lsa.convert.data(missing.to.NA = TRUE)
## End(Not run)
# Convert IEA TIMSS 2011 grade 8 data from Australia and Slovenia, keeping all user-defined
# missing values as valid ones and specifying custom input and output directories
## Not run:
lsa.convert.data(inp.folder = "C:/TIMSS_2011_G8", ISO = c("aus", "svn"), missing.to.NA = FALSE,
out.folder = "C:/Data")
## End(Not run)
# Convert OECD PISA 2009 files, converting all user-defined missing values to NA,
# using custom input and output directories
## Not run:
lsa.convert.data(inp.folder = "/media/PISA_2009", PISApre15 = TRUE, missing.to.NA = TRUE,
out.folder = "/tmp")
## End(Not run)
# Print the 20th to 25th columns of the PISA 2018 student questionnaire dataset loaded into memory
## Not run:
print(x = cy07_msu_stu_qqq, col.nums = 20:25)
## End(Not run)
# Select data from Albania and Slovenia from PISA 2018 student questionnaire dataset
# and save it under the same file name in a different folder
## Not run:
lsa.select.countries.PISA(data.file = "C:/PISA/cy07_msu_stu_qqq.RData",
cnt.names = c("Albania", "Slovenia"),
output.file = "C:/PISA/Reduced/cy07_msu_stu_qqq.RData")
## End(Not run)
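The Details section requires that different file types from the same study, cycle and population (e.g. the TIMSS 2019 grade 8 "m7", "b7" and "z7" files) be placed in separate folders before conversion. One possible way to prepare such a layout is sketched below in base R. The file names are made up and created as empty files in a temporary folder, and the sketch assumes that the file type can be read from the last two characters of the file name, as the examples in Details suggest.

```r
# Illustrative sketch: the file names are made up and created as empty files
src <- tempfile("TIMSS_2019_G8_")
dir.create(src)
example.files <- c("bsgausm7.sav", "bsgsvnm7.sav", "bsgausb7.sav", "bsgausz7.sav")
invisible(file.create(file.path(src, example.files)))

# Take the file type from the last two characters of each file name
sav.files <- list.files(src, pattern = "\\.sav$", ignore.case = TRUE)
base.names <- tools::file_path_sans_ext(sav.files)
file.types <- tolower(substr(base.names, nchar(base.names) - 1, nchar(base.names)))

# Move the files of each type into their own subfolder (m7, b7, z7)
for (type in unique(file.types)) {
  type.folder <- file.path(src, type)
  dir.create(type.folder)
  type.files <- sav.files[file.types == type]
  invisible(file.rename(file.path(src, type.files),
                        file.path(type.folder, type.files)))
}

# Each subfolder could then be converted separately, for example:
# lsa.convert.data(inp.folder = file.path(src, "m7"))
```

With real data, src would point to the folder holding the downloaded .sav files instead of a temporary folder.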