import {rio} | R Documentation |
Import
Description
Read in a data.frame from a file. Exceptions to this rule are Rdata, RDS, and JSON input file formats, which return the originally saved object without changing its class.
Usage
import(
file,
format,
setclass = getOption("rio.import.class", "data.frame"),
which,
...
)
Arguments
file
A character string naming a file, URL, or single-file (can be Gzip or Bzip2 compressed), .zip or .tar archive.
format
An optional character string code of file format, which can be used to override the format inferred from
setclass
An optional character vector specifying one or more classes to set on the import. By default, the return object is always a “data.frame”. Allowed values include “tbl_df”, “tbl”, or “tibble” (if using tibble), “arrow”, “arrow_table” (if using arrow table; the suggested package
which
This argument is used to control import from multi-object files; as a rule
...
Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
Details
This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified).
import supports the following file formats:
Comma-separated data (.csv), using data.table::fread()
Pipe-separated data (.psv), using data.table::fread()
Tab-separated data (.tsv), using data.table::fread()
SAS (.sas7bdat), using haven::read_sas()
SAS XPORT (.xpt), using haven::read_xpt()
SPSS (.sav), using haven::read_sav()
SPSS compressed (.zsav), using haven::read_sav()
Stata (.dta), using haven::read_dta()
SPSS Portable Files (.por), using haven::read_por()
Excel (.xls and .xlsx), using readxl::read_xlsx() or readxl::read_xls(). Use which to specify a sheet number.
R syntax object (.R), using base::dget(); see Trust below.
Saved R objects (.RData, .rda), using base::load() for single-object .Rdata files. Use which to specify an object name for multi-object .Rdata files. This can be any R object (not just a data frame); see Trust below.
Serialized R objects (.rds), using base::readRDS(). This can be any R object (not just a data frame); see Trust below.
Serialized R objects (.qs), using qs::qread(), which is significantly faster than .rds. This can be any R object (not just a data frame).
Epiinfo (.rec), using foreign::read.epiinfo()
Minitab (.mtp), using foreign::read.mtp()
Systat (.syd), using foreign::read.systat()
"XBASE" database files (.dbf), using foreign::read.dbf()
Weka Attribute-Relation File Format (.arff), using foreign::read.arff()
Data Interchange Format (.dif), using utils::read.DIF()
Fortran data (no recognized extension), using utils::read.fortran()
Fixed-width format data (.fwf), using a faster version of utils::read.fwf() that requires a widths argument and by default in rio has stringsAsFactors = FALSE
CSVY (CSV with a YAML metadata header), using data.table::fread()
Apache Arrow Parquet (.parquet), using nanoparquet::read_parquet()
Feather R/Python interchange format (.feather), using arrow::read_feather()
Fast storage (.fst), using fst::read.fst()
JSON (.json), using jsonlite::fromJSON()
Matlab (.mat), using rmatio::read.mat()
EViews (.wf1), using hexView::readEViews()
OpenDocument Spreadsheet (.ods, .fods), using readODS::read_ods() or readODS::read_fods(). Use which to specify a sheet number.
Single-table HTML documents (.html), using xml2::read_html(). There is no standard HTML table and we have only tested this with HTML tables exported with this package. HTML tables will only be read correctly if the HTML file can be converted to a list via xml2::as_list(). This import feature is not robust, especially for HTML tables in the wild. Please use a proper web scraping framework, e.g. rvest.
Shallow XML documents (.xml), using xml2::read_xml(). The data structure will only be read correctly if the XML file can be converted to a list via xml2::as_list().
YAML (.yml), using yaml::yaml.load()
Clipboard import, using utils::read.table() with row.names = FALSE
Google Sheets, as Comma-separated data (.csv)
GraphPad Prism (.pzfx), using pzfx::read_pzfx()
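As a minimal sketch of how which selects one object from a multi-object .RData file, the following saves two data frames with base::save() and imports one of them by name. The object names cars_df and iris_df and the temporary file path are illustrative assumptions. trust = TRUE is passed because .RData is a serialization format.

```r
library(rio)

# Two illustrative objects saved into a single .RData file with base R
cars_df <- head(mtcars)
iris_df <- head(iris)
rdata_file <- tempfile(fileext = ".RData")
save(cars_df, iris_df, file = rdata_file)

# `which` names the object to return; trust = TRUE affirms the file is trusted
picked <- import(rdata_file, which = "iris_df", trust = TRUE)
str(picked)
```

Without which, only one of the stored objects would be returned, so naming the object explicitly is the safer habit for multi-object files.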
import attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). If you would prefer these attributes to be stored at the data.frame-level (i.e., in attributes(mtcars)), see gather_attrs().
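The difference between variable-level and data.frame-level attributes can be sketched with a small hand-built data frame. The "label" attributes here are set manually as a stand-in for what a metadata-rich import might carry; the column names and label text are illustrative assumptions.

```r
library(rio)

# Illustrative data frame whose variables each carry a "label" attribute
df <- data.frame(v1 = 1:3, v2 = c(10, 20, 30))
attr(df$v1, "label") <- "Variable 1"
attr(df$v2, "label") <- "Variable 2"

# The label is stored on the variable itself, not on the data frame
attributes(df$v1)

# gather_attrs() moves variable-level attributes to the data frame level
g <- gather_attrs(df)
str(attributes(g))
```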
After importing metadata-rich file formats (e.g., from Stata or SPSS), it may be helpful to recode labelled variables to character or factor using characterize() or factorize() respectively.
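A short sketch of this recoding step, using a hand-built data frame rather than a real Stata or SPSS file. The "labels" attributes below are assumptions that mimic value labels from such an import.

```r
library(rio)

# Illustrative labelled data: numeric codes carrying a "labels" attribute
x <- data.frame(
  v1 = structure(1:4, labels = c("A" = 1, "B" = 2, "C" = 3)),
  v2 = structure(c(1, 0, 0, 1), labels = c("foo" = 0, "bar" = 1))
)

# Recode labelled variables to character, or to factor
chr <- characterize(x)
fct <- factorize(x)
str(chr)
str(fct)
```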
Value
A data frame. If setclass is used, this data frame may have additional class attribute values, such as “tibble” or “data.table”.
Trust
For serialization formats (.R, .RDS, and .RData), please note that you should only load these files from trusted sources. This is because these formats are not necessarily used for storing rectangular data and can also store many other things, e.g. code. Importing these files could lead to arbitrary code execution. Please read the security principles by the R Project (Plummer, 2024). When importing these files via rio, you should affirm that you trust these files, i.e. trust = TRUE. See example below. If this affirmation is missing, the current version assumes trust to be true for backward compatibility and a deprecation notice will be printed. In the next major release (2.0.0), you must explicitly affirm your trust when importing these files.
Which
For compressed archives (zip and tar, where a compressed file can contain multiple files), the parameter which can become ambiguous because it could refer to two different things. For example, it is unclear for .xlsx.zip whether which refers to the selection of an exact file in the archive or the selection of an exact sheet in the decompressed Excel file. In these cases, rio assumes that which is only used for the selection of the file. After the file is selected with which, rio will return the first item, e.g. the first sheet.
Please note, however, that .gz and .bz2 files (e.g. .xlsx.gz) are compressed but are not archive formats. In those cases, which is used the same way as for the non-compressed format, e.g. selection of a sheet for Excel.
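To illustrate the non-archive case, the sketch below writes a Gzip-compressed CSV with base R and imports it directly. The temporary file name is an illustrative assumption. A .gz file holds a single compressed file, so no file selection is involved.

```r
library(rio)

# Write a Gzip-compressed CSV with base R (file name is illustrative)
gz_file <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_file, "w")
write.csv(iris, con, row.names = FALSE)
close(con)

# .gz is compression, not an archive: the file is read like a plain CSV
dat <- import(gz_file)
dim(dat)
```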
Note
For csv and txt files with row names exported from export(), it may be helpful to specify row.names as the column of the table that contains the row names. See example below.
References
Plummer, M. (2024). Statement on CVE-2024-27322. https://blog.r-project.org/2024/05/10/statement-on-cve-2024-27322/
See Also
import_list(), characterize(), gather_attrs(), export(), convert()
Examples
library(rio)
## For demo, a temp. file path is created with the file extension .csv
csv_file <- tempfile(fileext = ".csv")
## .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")
## create CSV to import
export(iris, csv_file)
## specify `format` to override default format: see export()
export(iris, xlsx_file, format = "csv")
## basic
import(csv_file)
## You can also pass the file name directly as a string instead of a variable:
## import("starwars.csv"); import("mtcars.xlsx")
## Override the default format
## import(xlsx_file) # Error, it is actually not an Excel file
import(xlsx_file, format = "csv")
## import CSV as a `data.table`
import(csv_file, setclass = "data.table")
## import CSV as a tibble (or "tbl_df")
import(csv_file, setclass = "tbl_df")
## pass arguments to underlying import function
## data.table::fread is the underlying import function and `nrows` is its argument
import(csv_file, nrows = 20)
## data.table::fread has an argument `data.table` to set the class explicitly to data.table. The
## argument setclass, however, takes precedence over such undocumented features.
class(import(csv_file, setclass = "tibble", data.table = TRUE))
## the default import class can be set with options(rio.import.class = "data.table")
## options(rio.import.class = "tibble"), or options(rio.import.class = "arrow")
## Security
rds_file <- tempfile(fileext = ".rds")
export(iris, rds_file)
## You should only import serialized formats from trusted sources
## In this case, you can trust it because it's generated by you.
import(rds_file, trust = TRUE)