| dataset-to-R {crunch} | R Documentation |
as.data.frame method for CrunchDataset
Description
This method is defined principally so that you can use a CrunchDataset as
a data argument to other R functions (such as stats::lm()) without
needing to download the whole dataset. You can, however, choose to download
a true data.frame.
Usage
## S3 method for class 'CrunchDataset'
as.data.frame(
x,
row.names = NULL,
optional = FALSE,
force = FALSE,
categorical.mode = "factor",
row.order = NULL,
include.hidden = TRUE,
...
)
## S3 method for class 'CrunchDataFrame'
as.data.frame(
x,
row.names = NULL,
optional = FALSE,
include.hidden = attr(x, "include.hidden"),
...
)
Arguments
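x |
a CrunchDataset or CrunchDataFrame |
row.names |
part of the as.data.frame signature |
optional |
part of the as.data.frame signature |
force |
logical: actually coerce the dataset to a data.frame? (default: FALSE) |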
categorical.mode |
what mode should categoricals be pulled as? One of factor, numeric, id (default: factor) |
row.order |
vector of indices: which rows of the dataset to present, and in what order (default: NULL) |
include.hidden |
logical: should hidden variables be included? (default: TRUE) |
... |
additional arguments passed to as.data.frame |
Details
By default, the as.data.frame method for CrunchDataset does not return a
data.frame but instead a CrunchDataFrame, which behaves like a
data.frame without bringing the whole dataset into memory.
When you access the variables of a CrunchDataFrame,
you get an R vector, rather than a CrunchVariable. This allows modeling functions
that require select columns of a dataset to retrieve only those variables from
the remote server, rather than pulling the entire dataset into local
memory.
If you call as.data.frame() on a CrunchDataset with force = TRUE, you
will instead get a true data.frame. You can also get this data.frame by
calling as.data.frame on a CrunchDataFrame (effectively calling
as.data.frame on the dataset twice).
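The calls described above can be sketched as follows. This is a minimal illustration rather than an official example: `ds` is assumed to be a CrunchDataset that you have already loaded in an authenticated session, the aliases `income` and `age` are hypothetical, and the wrapper function names are invented for this sketch. Nothing contacts the server until one of the functions is called.

```r
# Sketch only: `ds` is assumed to be a CrunchDataset loaded elsewhere.
# The aliases `income` and `age` are hypothetical placeholders.

# Modeling without a full download: the dataset is passed as `data`,
# so only the variables named in the formula need to be retrieved.
fit_without_download <- function(ds) {
  stats::lm(income ~ age, data = ds)
}

# Default call: returns a CrunchDataFrame, not a true data.frame.
as_lazy_frame <- function(ds) {
  as.data.frame(ds)
}

# force = TRUE: downloads the variables and returns a true data.frame.
download_all <- function(ds) {
  as.data.frame(ds, force = TRUE)
}

# Two-step route: calling as.data.frame on the CrunchDataFrame
# also yields a true data.frame.
download_in_two_steps <- function(ds) {
  as.data.frame(as.data.frame(ds))
}
```

The lazy forms are usually preferable for large datasets; reserve `force = TRUE` for cases where you need every column locally.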
When a data.frame is returned, the function coerces Crunch Variable
values into their R equivalents using the following rules:
Numeric variables become numeric vectors
Text variables become character vectors
Datetime variables become either Date or POSIXt vectors
Categorical variables become either factors with levels matching the Crunch Variable's categories (the default), or, if categorical.mode is specified as "id" or "numeric", a numeric vector of category ids or numeric values, respectively
Array variables (Categorical Array, Multiple Response) are decomposed into their constituent categorical subvariables. An array with three subvariables, for example, will result in three columns in the data.frame.
Column names in the data.frame are the variable/subvariable aliases.
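To make the three categorical.mode options concrete, the following base R sketch emulates them locally. It does not call crunch and is not the package's implementation; the category table and the stored responses are invented for illustration.

```r
# Hypothetical category definitions for one categorical variable.
categories <- data.frame(
  id = c(1, 2, 3),
  name = c("Low", "Medium", "High"),
  numeric_value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical stored responses, recorded as category ids.
stored_ids <- c(2, 3, 1, 2)
position <- match(stored_ids, categories$id)

# "factor" (the default): levels follow the category names.
as_factor <- factor(categories$name[position], levels = categories$name)

# "id": a numeric vector of category ids.
as_id <- stored_ids

# "numeric": a numeric vector of the categories' numeric values.
as_numeric <- categories$numeric_value[position]
```

The same responses therefore yield a factor, a vector of ids, or a vector of numeric values, depending on the mode requested.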
Value
When called on a CrunchDataset, the method returns an object of
class CrunchDataFrame unless force = TRUE, in which case it returns a
data.frame. When called on a CrunchDataFrame, the method returns a data.frame.