detect_dm_csv {LaF} | R Documentation |
Automatically detect data models for CSV-files
Description
Automatically detect data models for CSV files. Files can then be opened
with these data models using laf_open.
Usage
detect_dm_csv(
filename,
sep = ",",
dec = ".",
header = FALSE,
nrows = 1000,
nlines = NULL,
sample = FALSE,
stringsAsFactors = TRUE,
factor_fraction = 0.4,
...
)
Arguments
filename |
character containing the filename of the csv-file. |
sep |
character vector containing the separator used in the file. |
dec |
the character used for decimal points. |
header |
logical; whether the first line in the file contains the column names. |
nrows |
the number of lines that should be read in to detect the column types. The more lines the more likely that the correct types are detected. |
nlines |
(only needed when the sample option is used) the expected number of lines in the file. If not specified the number of lines in the file is first calculated. |
sample |
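by default the first nrows lines are read to detect the column types; when sample = TRUE, a sample of lines from the file is used instead. |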
stringsAsFactors |
passed on to |
factor_fraction |
the fraction of unique strings in a column below which the column is converted to a factor/categorical. For more information see Details. |
... |
additional arguments are passed on to |
Details
The argument factor_fraction determines the fraction of unique strings
below which the column is converted to factor/categorical. If all columns need
to be converted to character, a value larger than one can be used. A value
smaller than zero will ensure that all columns are converted to
categorical. Note that LaF stores the levels of a categorical in memory.
Therefore, categorical columns with a very large number of (almost) unique
levels can cause memory problems.
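The effect of factor_fraction is easiest to judge by detecting the model for the same file under two thresholds and comparing the results. The following is a minimal sketch, not part of the original help page: the test data, the column names id and grp, and the threshold 0.01 are made up for illustration, and str() is used to inspect the result because the structure of the returned data model is not described here.

```r
library(LaF)

# Temporary CSV with one all-unique string column and one column
# that has only two distinct values (illustrative data)
tmpcsv <- tempfile(fileext = ".csv")
testdata <- data.frame(
  id  = sprintf("id%03d", 1:100),
  grp = rep(c("a", "b"), times = 50),
  stringsAsFactors = FALSE
)
write.table(testdata, file = tmpcsv, row.names = FALSE, col.names = TRUE, sep = ",")

# Detect the data model with the default threshold and with a much lower one
model_default <- detect_dm_csv(tmpcsv, header = TRUE)
model_low <- detect_dm_csv(tmpcsv, header = TRUE, factor_fraction = 0.01)

# Inspect both models and compare the detected column types
str(model_default)
str(model_low)

# Cleanup
file.remove(tmpcsv)
```

Comparing the two printed structures shows which columns change type when the threshold changes, which is a quick check before opening a large file.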
Value
detect_dm_csv returns a data model which can be used by laf_open.
The data model can be written to file using write_dm.
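A detected data model can be stored and reused so that detection does not have to be repeated for every session. The sketch below assumes that write_dm takes the model and a target file name, that read_dm takes that file name, and that both need the yaml package to be installed; the file names and test data are hypothetical.

```r
library(LaF)

# Temporary CSV file and a temporary file for the stored data model
tmpcsv <- tempfile(fileext = ".csv")
modelfile <- tempfile(fileext = ".yaml")

# Small illustrative data set
testdata <- data.frame(a = 1:10, b = runif(10))
write.table(testdata, file = tmpcsv, row.names = FALSE, col.names = TRUE, sep = ",")

# Detect the data model and write it to file
# (assumption: write_dm and read_dm rely on the yaml package)
model <- detect_dm_csv(tmpcsv, header = TRUE)
write_dm(model, modelfile)
model_written <- file.exists(modelfile)

# Read the stored model back and use it to open the CSV file
restored <- read_dm(modelfile)
laf <- laf_open(restored)

# Cleanup
file.remove(tmpcsv, modelfile)
```

Storing the model this way also makes it possible to correct a wrongly detected column type by hand before the file is opened.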
See Also
See write_dm to write the data model to file. The data models
can be used to open a file using laf_open.
Examples
# Create temporary filename
tmpcsv <- tempfile(fileext=".csv")
# Generate test data
ntest <- 10
column_types <- c("integer", "integer", "double", "string")
testdata <- data.frame(
a = 1:ntest,
b = sample(1:2, ntest, replace=TRUE),
c = round(runif(ntest), 13),
d = sample(c("jan", "pier", "tjores", "corneel"), ntest, replace=TRUE),
stringsAsFactors = FALSE
)
# Write test data to csv file
write.table(testdata, file=tmpcsv, row.names=FALSE, col.names=TRUE, sep=',')
# Detect data model
model <- detect_dm_csv(tmpcsv, header=TRUE)
# Create LaF-object
laf <- laf_open(model)
# Cleanup
file.remove(tmpcsv)