read.any {easyr} | R Documentation |
Read Any File
Description
Flexible read function to handle many types of files. Currently handles CSV, TSV, DBF, RDS, XLS (incl. when formatted as HTML), and XLSX. Also handles common issues like strings being read in as factors: by default this function does NOT read strings in as factors, so convert them later if you need factors. Author: Bryce Chamberlain. Tech Review: Dominic Dillingham.
Usage
read.any(
filename = NA,
folder = NA,
sheet = 1,
file_type = "",
first_column_name = NA,
header = TRUE,
headers_on_row = NA,
nrows = -1L,
row.names.column = NA,
row.names.remove = TRUE,
make.names = FALSE,
field_name_map = NA,
require_columns = NA,
all_chars = FALSE,
auto_convert_dates = TRUE,
allow_times = FALSE,
check_numbers = TRUE,
nazero = FALSE,
check_logical = TRUE,
stringsAsFactors = FALSE,
na_strings = easyr::nastrings,
na_level = "(Missing)",
ignore_rows_with_na_at = NA,
drop.na.cols = TRUE,
drop.na.rows = TRUE,
fix.dup.column.names = TRUE,
do.trim.sheetname = TRUE,
x = NULL,
isexcel = FALSE,
encoding = "unknown",
verbose = TRUE
)
Arguments
filename |
File path and name for the file to be read in. |
folder |
Folder path to look for the file in. |
sheet |
The sheet to read in. |
file_type |
Specify the file type (CSV, TSV, DBF). If not provided, the file extension is used to determine the file type. Useful when the file extension doesn't indicate the file type, like .rpt, etc. |
first_column_name |
Define headers location by providing the name of the left-most column. Alternatively, you can choose the row via the [headers_on_row] argument. |
header |
Choose if your file contains headers. |
headers_on_row |
Choose a specific row number to use as headers. Use this when you want to tell read.any exactly where the headers are. |
nrows |
Number of rows to read. Leave blank/NA to read all rows. This only speeds up file reads (CSV, XLSX, etc.), not compressed data that must be read all at once. This is applied BEFORE headers_on_row or first_column_name removes top rows, so it should be greater than those values if headers aren't in the first row. |
row.names.column |
Specify the column (by character name) to use for row names. This drops the column and lets rows be referenced directly by this id. Values must be unique. |
row.names.remove |
If you move a column to row names, it is removed from the data by default. If you'd like to keep it, set this to FALSE. |
make.names |
Apply the make.names function to make column names R-friendly (replaces invalid characters with ., prefixes names that start with a number with X, etc.) |
field_name_map |
Rename fields for consistency. Provide as a named vector where the names are the file's names and the vector values are the output names desired. See examples for how to create this input. |
require_columns |
List of required columns to check for. Calls stop() with helpful message if any aren't found. |
all_chars |
Keep all column types as characters. This makes using bind_rows easier; you can then use atype() later to set types. |
auto_convert_dates |
Identify date fields and automatically convert them to dates. |
allow_times |
Times are not allowed when reading data in, to facilitate easy binding. If you need times, set this to TRUE. |
check_numbers |
Identify numbers formatted as characters and convert them to numeric. |
nazero |
Convert NAs in numeric columns to 0. |
check_logical |
Identify logical columns formatted as characters (Yes/No, etc.) or numbers (0, 1) and convert them to logical. |
stringsAsFactors |
Convert characters to factors to increase processing speed and reduce file size. |
na_strings |
Strings to treat like NA. By default we use the easyr NA strings. |
na_level |
dplyr doesn't like factors to have NAs, so NAs are replaced with this value, for factors only. Set to NULL to skip. |
ignore_rows_with_na_at |
Vector or value, numeric or character, identifying column(s) that require a value. read.any will remove rows with NA in these columns after the read and column name swaps, before type conversion. Especially helpful for removing things like page numbers at the bottom of an Excel report that break type discovery. Suggest using the claim number column here. |
drop.na.cols |
Drop columns with only NA values. |
drop.na.rows |
Drop rows with only NA values. |
fix.dup.column.names |
Adds 'DUPLICATE #' to duplicated column names to avoid issues with multiple columns having the same name. |
do.trim.sheetname |
read.any will trim sheet names to get better matches. This will cause an error if the actual sheet name has spaces on the left or right side. Set this to FALSE to disable trimming. |
x |
If you want to use read.any functionality on an existing data frame, pass it with this argument. |
isexcel |
If you want to use read.any functionality on an existing data frame, you can tell read.any that the data came from Excel by setting isexcel manually. This comes in handy when Excel-integer date conversions are necessary. |
encoding |
Encoding passed to fread and read.csv. |
verbose |
Print helpful information via cat. |
Value
Data frame with the data that was read.
Examples
folder = system.file('extdata', package = 'easyr')
read.any('date-time.csv', folder = folder)
# if dates are being converted incorrectly, disable date conversion:
read.any('date-time.csv', folder = folder, auto_convert_dates = FALSE)
# to handle type conversions manually:
read.any('date-time.csv', folder = folder, all_chars = TRUE)
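# to rename columns with field_name_map, build a named character vector:
# names are the column names as they appear in the input, values are the desired output names.
# This is only a sketch: the data frame and its column names are hypothetical (made up
# for illustration), and it assumes field_name_map is also applied to data passed via x.
raw = data.frame(
  `Policy No` = c('A1', 'B2'),
  `Loss Dt` = c('2020-01-31', '2020-02-29'),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)
field_map = c('Policy No' = 'policy_number', 'Loss Dt' = 'loss_date')
read.any(x = raw, field_name_map = field_map)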