read.any {easyr}    R Documentation

Read Any File

Description

Flexible read function that handles many file types. Currently handles CSV, TSV, DBF, RDS, XLS (including XLS files that are formatted as HTML), and XLSX. It also handles common issues such as strings being read in as factors: by default this function does NOT read strings in as factors, so convert them afterward if you need factors. Author: Bryce Chamberlain. Tech Review: Dominic Dillingham.

Usage

read.any(
  filename = NA,
  folder = NA,
  sheet = 1,
  file_type = "",
  first_column_name = NA,
  header = TRUE,
  headers_on_row = NA,
  nrows = -1L,
  row.names.column = NA,
  row.names.remove = TRUE,
  make.names = FALSE,
  field_name_map = NA,
  require_columns = NA,
  all_chars = FALSE,
  auto_convert_dates = TRUE,
  allow_times = FALSE,
  check_numbers = TRUE,
  nazero = FALSE,
  check_logical = TRUE,
  stringsAsFactors = FALSE,
  na_strings = easyr::nastrings,
  na_level = "(Missing)",
  ignore_rows_with_na_at = NA,
  drop.na.cols = TRUE,
  drop.na.rows = TRUE,
  fix.dup.column.names = TRUE,
  do.trim.sheetname = TRUE,
  x = NULL,
  isexcel = FALSE,
  encoding = "unknown",
  verbose = TRUE
)

Arguments

filename

File path and name for the file to be read in.

folder

Folder path to look for the file in.

sheet

The sheet to read in.

file_type

Specify the file type (CSV, TSV, DBF). If not provided, R will use the file extension to determine the file type. Useful when the file extension doesn't indicate the file type, like .rpt, etc.

first_column_name

Define headers location by providing the name of the left-most column. Alternatively, you can choose the row via the [headers_on_row] argument.

header

Choose if your file contains headers.

headers_on_row

Choose a specific row number to use as headers. Use this when you want to tell read.any exactly where the headers are.

nrows

Number of rows to read. Leave blank/NA to read all rows. This only speeds up file reads (CSV, XLSX, etc.), not compressed data that must be read all at once. This is applied BEFORE headers_on_row or first_column_name removes top rows, so it should be greater than those values if headers aren't in the first row.

row.names.column

Specify the column (by character name) to use for row names. This drops the column from the data and lets rows be referenced directly by this id. Values must be unique.

row.names.remove

If you move a column to row names, it is removed from the data by default. If you'd like to keep it, set this to FALSE.

make.names

Apply the make.names function to make column names R-friendly (replaces invalid characters with ".", prepends "X" to names that start with a number, etc.).
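As a rough illustration of what R-friendly names look like, the sketch below calls base R's make.names() directly on some hypothetical raw headers. It assumes read.any applies the same base function when make.names = TRUE; the header strings are invented for the example.

```r
# Hypothetical raw column headers, as they might appear in a spreadsheet.
raw_names <- c("claim number", "2020 total", "paid-amount", "ok_name")

# Base R's make.names(): invalid characters become ".",
# and names starting with a digit get an "X" prefix.
clean_names <- make.names(raw_names)
print(clean_names)
# "claim.number" "X2020.total" "paid.amount" "ok_name"
```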

field_name_map

Rename fields for consistency. Provide as a named vector where the names are the file's names and the vector values are the output names desired. See examples for how to create this input.
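A minimal sketch of building the named vector described above. The file, its column names, and the output names are all hypothetical; the call is guarded so the snippet still runs where easyr is not installed.

```r
# Hypothetical input: a small CSV written to a temporary file.
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(`Claim No` = c("A1", "A2"), Amt = c(10, 20), check.names = FALSE),
  tmp,
  row.names = FALSE
)

# Names are the column names as they appear in the file;
# values are the names wanted in the output.
name_map <- c("Claim No" = "claim_number", "Amt" = "amount")

if (requireNamespace("easyr", quietly = TRUE)) {
  dt <- easyr::read.any(tmp, field_name_map = name_map, verbose = FALSE)
  print(names(dt))
}
```

Keeping the map in one named vector makes it easy to reuse the same renaming across several files that are later bound together.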

require_columns

List of required columns to check for. Calls stop() with a helpful message if any aren't found.
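The following sketch shows one way to use this check defensively. The file and column names are hypothetical, and the exact wording of the error message is not shown here because it depends on the package.

```r
# Hypothetical input: a CSV that lacks one of the columns we need.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, amount = c(10, 20, 30)), tmp, row.names = FALSE)

# "paid_date" is deliberately absent from the file.
required <- c("id", "amount", "paid_date")

if (requireNamespace("easyr", quietly = TRUE)) {
  # Capture the stop() raised for the missing column instead of halting.
  result <- tryCatch(
    easyr::read.any(tmp, require_columns = required, verbose = FALSE),
    error = function(e) conditionMessage(e)
  )
  print(result)
}
```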

all_chars

Keep all column types as characters. This makes using bind_rows easier; you can then use atype() later to set types.

auto_convert_dates

Identify date fields and automatically convert them to dates.

allow_times

Times are not allowed when reading data in, to facilitate easy binding. If you need times, set this to TRUE.

check_numbers

Identify numbers formatted as characters and convert them to numeric.

nazero

Convert NAs in numeric columns to 0.

check_logical

Identify logical columns formatted as characters (Yes/No, etc.) or numbers (0, 1) and convert them to logical.

stringsAsFactors

Convert characters to factors to increase processing speed and reduce file size.

na_strings

Strings to treat like NA. By default we use the easyr NA strings.

na_level

dplyr doesn't like factors to have NAs so we replace NAs with this value for factors only. Set NULL to skip.

ignore_rows_with_na_at

Vector or value, numeric or character, identifying column(s) that require a value. read.any removes rows that are NA in these columns after column-name swaps and the read, but before type conversion. Especially helpful for removing things like page numbers at the bottom of an Excel report that break type discovery. Suggest using the claim number column here.
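A small sketch of the page-number scenario, under the assumption that an empty cell in the file is read as NA. The file contents and column names are hypothetical.

```r
# Hypothetical report: two data rows plus a footer row ("Page 1")
# that has no value in the claim_id column.
report <- data.frame(
  claim_id = c("C001", "C002", NA),
  paid     = c("100", "250", "Page 1"),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".csv")
write.csv(report, tmp, row.names = FALSE, na = "")

if (requireNamespace("easyr", quietly = TRUE)) {
  # Rows with NA in claim_id should be dropped before type conversion,
  # so the footer text should not block "paid" from being typed.
  dt <- easyr::read.any(tmp, ignore_rows_with_na_at = "claim_id", verbose = FALSE)
  str(dt)
}
```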

drop.na.cols

Drop columns with only NA values.

drop.na.rows

Drop rows with only NA values.

fix.dup.column.names

Adds 'DUPLICATE #' to duplicated column names to avoid issues with multiple columns having the same name.

do.trim.sheetname

read.any trims sheet names to get better matches. This causes an error if the actual sheet name has leading or trailing spaces. Set this to FALSE to disable trimming.

x

If you want to use read.any functionality on an existing data frame, pass it with this argument.

isexcel

If you use read.any functionality on an existing data frame (via x), set isexcel to TRUE to tell read.any that the data came from Excel. This comes in handy when Excel-integer date conversions are necessary.
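As a sketch of the x argument, the snippet below passes an in-memory data frame instead of a file. The data frame and its column names are hypothetical, and the resulting column types are not asserted here because they depend on the package's type discovery.

```r
# Hypothetical data frame whose columns are all stored as characters.
raw <- data.frame(
  id   = c("1", "2", "3"),
  flag = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

if (requireNamespace("easyr", quietly = TRUE)) {
  # No file is read; read.any's cleanup and type checks run on `raw`.
  # Add isexcel = TRUE if the data originally came from an Excel sheet.
  typed <- easyr::read.any(x = raw, verbose = FALSE)
  str(typed)
}
```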

encoding

Encoding passed to fread and read.csv.

verbose

Print helpful information via cat.

Value

Data frame with the data that was read.

Examples


folder = system.file('extdata', package = 'easyr')
read.any('date-time.csv', folder = folder)

# if dates are being converted incorrectly, disable date conversion:
read.any('date-time.csv', folder = folder, auto_convert_dates = FALSE)

# to handle type conversions manually:
read.any('date-time.csv', folder = folder, all_chars = TRUE)


[Package easyr version 0.5-11 Index]