loadData {baytrends} | R Documentation |
Load/Clean CSV and TXT Data File
Description
Load a comma-delimited (*.csv) or tab-delimited (*.txt) file and perform some rudimentary data cleaning.
Usage
loadData(
file = NA,
folder = ".",
pk = NA,
remDup = TRUE,
remNAcol = TRUE,
remNArow = TRUE,
convDates = TRUE,
tzSel = "America/New_York",
commChar = "#",
naChar = NA
)
Arguments
file |
file (can use wildcards, e.g., "*.csv") |
folder |
folder (i.e., directory to look in; can use a relative path) |
pk |
vector of columns that form the primary key for data set |
remDup |
logical field indicating whether duplicate rows are deleted |
remNAcol |
logical field indicating whether columns with all NA are deleted |
remNArow |
logical field indicating whether rows with all NA are deleted |
convDates |
vector or logical field indicating whether date-like columns should be converted to POSIXct format (see details) |
tzSel |
time zone to use for date conversions (default: "America/New_York") |
commChar |
character for comment line to be skipped |
naChar |
characters to treat as NA |
Details
This function reads in a single comma-delimited (*.csv) or tab-delimited
(*.txt) file using either utils::read.table or utils::read.csv, based on
the file extension. The user can use the wildcard feature for the file
argument (e.g., file='*.csv'), and the function will identify the most
recently modified csv or txt file in the folder for importing.
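The wildcard behavior described above can be approximated in base R. The following is a minimal sketch, not the package's own code: `newest_file` is a hypothetical helper name, and it assumes "most recently modified" means the file with the latest modification time.

```r
# Hypothetical helper (not part of baytrends): return the most recently
# modified file in `folder` whose name matches a wildcard such as "*.csv".
newest_file <- function(pattern, folder = ".") {
  # glob2rx() turns a wildcard like "*.csv" into a regular expression
  files <- list.files(folder, pattern = utils::glob2rx(pattern),
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("no files in '", folder, "' match '", pattern, "'")
  }
  # file.info()$mtime holds each file's modification time
  files[which.max(file.info(files)$mtime)]
}

# Example: newest_file("*.csv", folder = "data")
```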
Some specific features of this function include the following:
1. Leading '0's in character strings that would otherwise be trimmed and treated as numeric variables (e.g., USGS flow gages, state and county FIPS codes) are preserved. To use this functionality effectively, data maintained in a spreadsheet should be enclosed in quotes (e.g., "01578310"). When exported to csv or txt files, the field will then be in triple quotes (e.g., """01578310"""). Any column read in as integer is converted to numeric.
2. Rows and columns with no data (i.e., all NA) are deleted unless default settings for remNAcol and remNArow are changed to FALSE.
3. Completely duplicate rows are deleted unless default setting for remDup is changed to FALSE.
4. Rows beginning with '#' are skipped unless commChar is set to "".
5. If a primary key (either single or multiple columns) is selected, the function enforces the primary key by deleting duplicate entries based on the primary key. Columns corresponding to the primary key (when specified) are moved to the first columns.
6. If convDates is a vector (e.g., c('beginDate', 'endDate')), then a date
conversion is attempted for the corresponding columns found in the input
file. If TRUE, then a date conversion is attempted for all columns found in
the input file with 'date' in the name. If FALSE, no date conversion is
attempted.
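The cleaning steps in items 2, 3, 5, and 6 can be illustrated with a rough base R approximation. This is a sketch only, not the package source: the example data and column names are made up, and matching 'date' case-insensitively is an assumption.

```r
# Made-up example data: one all-NA column, one all-NA row, one duplicate row.
df <- data.frame(
  station   = c("A1", "A1", NA, "B2"),
  beginDate = c("2020-01-15", "2020-01-15", NA, "2020-02-01"),
  empty     = NA,
  stringsAsFactors = FALSE
)

df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]  # drop all-NA columns (remNAcol)
df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]  # drop all-NA rows (remNArow)
df <- df[!duplicated(df), , drop = FALSE]          # drop fully duplicated rows (remDup)

# Enforce a primary key and move its columns to the front (pk)
pk <- "station"
df <- df[!duplicated(df[, pk, drop = FALSE]), , drop = FALSE]
df <- df[, c(pk, setdiff(names(df), pk)), drop = FALSE]

# Convert columns with 'date' in the name to POSIXct (convDates = TRUE, tzSel)
# Assumption: the name match is case-insensitive.
date_cols <- grep("date", names(df), ignore.case = TRUE, value = TRUE)
for (col in date_cols) {
  df[[col]] <- as.POSIXct(df[[col]], tz = "America/New_York")
}

str(df)
```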
Some common time zones include the following: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, America/Honolulu, America/Jamaica, America/Managua, America/Phoenix, America/Metlakatla.
A brief table reporting the results of the import is printed.
Note that columns containing just F, T, FALSE, TRUE are stored as logical fields.
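The following sketch calls utils::read.csv directly (not loadData) to show the default type conversion behind both the logical-field note and the leading-zero concern. The gage values are illustrative only, and colClasses is shown as a base R alternative, not as what loadData does internally.

```r
csv_text <- "gage,flag\n01578310,T\n01594440,F"

# Default type conversion: the gage IDs become integers and lose their
# leading zero, and the T/F column becomes logical.
d1 <- utils::read.csv(text = csv_text)
str(d1)

# Forcing character columns keeps the IDs intact (and leaves flag as text).
d2 <- utils::read.csv(text = csv_text, colClasses = "character")
str(d2)
```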
Value
Returns a data frame.
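A minimal usage sketch, assuming the baytrends package is installed. The file name, column names, and data are hypothetical, and passing pk as a column name is an assumption based on the argument description above. The call is skipped if the package is not available.

```r
# Write a small, made-up csv file to a temporary folder.
tmp_dir <- tempdir()
writeLines(
  c("# exported for illustration",
    "station,beginDate,value",
    "A1,2020-01-15,1.2",
    "A1,2020-01-15,1.2",
    "B2,2020-02-01,3.4"),
  file.path(tmp_dir, "stations.csv")
)

if (requireNamespace("baytrends", quietly = TRUE)) {
  # Arguments follow the Usage section; pk = "station" is an assumption.
  df <- baytrends::loadData(
    file      = "stations.csv",
    folder    = tmp_dir,
    pk        = "station",
    convDates = TRUE,
    tzSel     = "America/New_York"
  )
  str(df)
}
```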