convert {DDIwR} | R Documentation |
Converts a dataset from one statistical software to another
Description
This function converts (or transfers) between R, Stata, SPSS, SAS, Excel and DDI XML files. Unlike the regular import / export functions from packages haven or rio, this function uses the DDI standard as an exchange platform and facilitates a consistent conversion of the missing values.
Usage
convert(
from,
to = NULL,
declared = TRUE,
chartonum = FALSE,
recode = TRUE,
encoding = "UTF-8",
csv = NULL,
...
)
Arguments
from |
A path to a file, or a data.frame object |
to |
Character, the name of a software package or a path to a specific file |
declared |
Logical, return the resulting dataset as a declared object |
chartonum |
Logical, recode character categorical variables to numerical categorical variables |
recode |
Logical, recode missing values |
encoding |
The character encoding used to read a file |
csv |
Path to the CSV file, if not embedded in XML file containing the DDI Codebook |
... |
Additional parameters passed to other functions, see the Details section |
Details
When the argument to specifies a certain statistical package ("R", "Stata", "SPSS", "SAS", "XPT") or "Excel", the name of the destination file will be identical to the one in the argument from, with an automatically added software specific extension.
SPSS portable files (with the extension ".por") can only be read, not written.
The argument to can also be specified as a path to a specific file, in which case the software package is determined from its file extension. The following extensions are currently recognized: .xml for DDI, .rds for R, .dta for Stata, .sav for SPSS, .xpt for SAS, and .xlsx for Excel.
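As a brief illustration of choosing the destination through a file extension, the sketch below writes a Stata file to an explicit path. The input file test.sav and the output location are assumptions made for the example, and the call is guarded so that it only runs when the package DDIwR and the input file are actually available.

```r
# Hypothetical input file, assumed to sit in the working directory
src <- "test.sav"

# Explicit destination: the .dta extension selects Stata as the target
dest <- file.path(tempdir(), "test.dta")

if (requireNamespace("DDIwR", quietly = TRUE) && file.exists(src)) {
    DDIwR::convert(src, to = dest)
}
```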
Additional parameters can be specified via the three dots argument ..., and are passed to the respective functions from packages haven and readxl. For instance, the function write_dta() has an additional argument called version, used when writing a Stata file.
The most important argument to consider is called user_na, part of the function read_sav(). It defaults to FALSE in package haven, but package DDIwR uses it with the value TRUE; it can be deactivated by explicitly specifying user_na = FALSE in function convert().
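The pass-through of haven arguments described above might look as follows. This is only a sketch: the file test.sav is an assumed input, the Stata version number is an arbitrary choice for illustration, and the calls are guarded so that they run only when DDIwR and the input file are available.

```r
# Hypothetical SPSS file, assumed to sit in the working directory
src <- "test.sav"

if (requireNamespace("DDIwR", quietly = TRUE) && file.exists(src)) {
    # version belongs to haven::write_dta() and is forwarded through ...
    DDIwR::convert(src, to = "Stata", version = 14)

    # user_na belongs to haven::read_sav(); FALSE restores haven's own default
    plain <- DDIwR::convert(src, user_na = FALSE)
}
```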
The same three dots argument is used to pass additional parameters to other functions in this package, for instance exportDDI() when converting to a DDI file. One of its arguments, embed (activated by default), can be used to control embedding the data in the XML file. Deactivating it will create a CSV file in the same directory, using the same file name as the XML file.
When converting from DDI, if the dataset is not embedded in the XML file, the CSV file is expected to be found in the same directory as the DDI Codebook, and it should have the same file name as the XML file. Alternatively, the path to the CSV file can be provided via the csv argument. Additional formal parameters of the function read.csv() can be passed via the same three dots ... argument.
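When the CSV file lives somewhere other than next to the codebook, the csv argument can point to it. The sketch below assumes a codebook called test.xml and a data file under a data subdirectory; both paths are hypothetical, and the call is guarded so that it runs only when DDIwR and both files are available.

```r
# Hypothetical DDI Codebook without embedded data
xml <- "test.xml"

# Hypothetical location of the accompanying CSV file
csvfile <- file.path("data", "test.csv")

if (requireNamespace("DDIwR", quietly = TRUE) &&
    file.exists(xml) && file.exists(csvfile)) {
    # Read the metadata from the codebook and the data from the CSV file
    test <- DDIwR::convert(xml, csv = csvfile)
}
```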
The DDI .xml file generates unique IDs for all variables, if not already present in the attributes. These IDs are useful for newer versions of the DDI Codebook, for referencing purposes.
The argument chartonum signals recoding character categorical variables, and employs the function recodeCharcat(). This only makes sense when recoding to Stata, which does not allow allocating labels for anything but integer variables.
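A possible use of chartonum when the target is Stata is sketched below. The input file test.sav is assumed, as is the presence of labelled character variables in it; the call is guarded so that it runs only when DDIwR and the input file are available.

```r
# Hypothetical SPSS file, assumed to contain labelled character variables
src <- "test.sav"

if (requireNamespace("DDIwR", quietly = TRUE) && file.exists(src)) {
    # Recode character categorical variables to numeric ones before writing,
    # since Stata only accepts value labels on integer variables
    DDIwR::convert(src, to = "Stata", chartonum = TRUE)
}
```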
If the argument to is left as NULL, the data is (invisibly) returned to the R environment. Conversion to R, either in the working space or as a data file, will result (by default) in a data frame containing declared labelled variables, as defined in package declared.
The current version reads and creates DDI Codebook version 2.6, with future versions to extend the functionality for DDI Lifecycle versions 3.x and link to the future package DDI4R for the UML model based version 4. It extends the standard DDI Codebook by offering the possibility to embed a serialized version of the R dataset into the XML file containing the Codebook, within a notes child of the fileDscr component. This type of generated codebook is unique to this package and automatically detected when converting to another statistical software. This will likely be replaced with a time insensitive text version.
Converting to SAS is experimental, and it relies on the same package haven that uses the ReadStat C library. The safest way to convert, which at the same time consistently converts the missing values, is to export the data to a CSV file, create a setup file with the function setupfile(), and run the commands manually.
Converting data from SAS is possible, however reading the metadata is also experimental (the current version of haven only partially imports the metadata). Either specify the path to the catalog file using the argument catalog_file from the function read_sas(), or have the catalog file in the same directory as the data set, with the same file name and the extension .sas7bcat.
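Supplying the catalog file explicitly could look like the sketch below. Both file names are hypothetical, and the call is guarded so that it runs only when DDIwR and both files are available; since reading SAS metadata is experimental, the result should be inspected rather than trusted.

```r
# Hypothetical SAS data set and catalog file with different base names
sasdata <- "test.sas7bdat"
catalog <- "formats.sas7bcat"

if (requireNamespace("DDIwR", quietly = TRUE) &&
    file.exists(sasdata) && file.exists(catalog)) {
    # catalog_file belongs to haven::read_sas() and is forwarded through ...
    test <- DDIwR::convert(sasdata, catalog_file = catalog)
}
```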
The argument recode controls how missing values are treated. If the input file has SPSS like numeric codes, they will be recoded to extended (a-z) missing types when converting to Stata or SAS. If the input has Stata like extended codes, they will be recoded to SPSS like numeric codes.
The character encoding is usually passed to the corresponding functions from package haven. It can be set to NULL to reset to the default in that package.
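Resetting the encoding to the default used by haven might be written as in the sketch below. The input file test.sav is assumed, and the call is guarded so that it runs only when DDIwR and the input file are available.

```r
# Hypothetical SPSS file, assumed to sit in the working directory
src <- "test.sav"

if (requireNamespace("DDIwR", quietly = TRUE) && file.exists(src)) {
    # encoding = NULL hands the choice of encoding back to package haven
    test <- DDIwR::convert(src, encoding = NULL)
}
```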
Converting to SPSS works with numerical and character labelled vectors, with or without labels. Date/Time variables are partially supported by package haven: either having such a variable with no labels and missing values, or if labels and missing values are declared the variable is automatically coerced to numeric, and users may have to make the proper settings in SPSS.
Value
An invisible R data frame, when the argument to is NULL.
Author(s)
Adrian Dusa
References
DDI - Data Documentation Initiative, see the DDI Alliance website.
See Also
setupfile, getMetadata, declared, labelled
Examples
## Not run:
# Assuming an SPSS file called test.sav is located in the working directory
# The following command imports the file into the R environment:
test <- convert("test.sav")
# The following command will extract the metadata in a DDI Codebook and
# produce a test.xml file in the same directory
convert("test.sav", to = "DDI")
# The data may be saved separately from the DDI file, using:
convert("test.sav", to = "DDI", embed = FALSE)
# To produce a Stata file:
convert("test.sav", to = "Stata")
# To produce an R file:
convert("test.sav", to = "R")
# To produce an Excel file:
convert("test.sav", to = "Excel")
## End(Not run)