mxComputeLoadData {OpenMx} | R Documentation |
Load columns into an MxData object
Description
Usage
mxComputeLoadData(
  dest,
  column,
  method = c("csv", "data.frame"),
  ...,
  path = c(),
  originalDataIsIndexOne = FALSE,
  byrow = TRUE,
  row.names = c(),
  col.names = c(),
  skip.rows = 0,
  skip.cols = 0,
  verbose = 0L,
  cacheSize = 100L,
  checkpointMetadata = TRUE,
  na.strings = c("NA"),
  observed = NULL,
  rowFilter = c()
)
Arguments
dest |
the name of the model where the columns will be loaded |
column |
a character vector. The column names to replace. |
method |
name of the conduit used to load the columns. |
... |
Not used. Forces remaining arguments to be specified by name. |
path |
the path to the file containing the data |
originalDataIsIndexOne |
logical. Whether to use the initial data for index 1 |
byrow |
logical. Whether the data columns are stored in rows. |
row.names |
optional integer. Column containing the row names. |
col.names |
optional integer. Row containing the column names. |
skip.rows |
integer. Number of rows to skip before reading data. |
skip.cols |
integer. Number of columns to skip before reading data. |
verbose |
integer. Level of run-time diagnostic output. Set to zero to disable.
cacheSize |
integer. How many columns to cache per scan through the data. Only used when byrow=FALSE. |
checkpointMetadata |
logical. Whether to add per-record metadata to the checkpoint.
na.strings |
character vector. A vector of strings that denote a missing value. |
observed |
data frame. The reservoir of data for |
rowFilter |
logical vector. Whether to skip the source row. |
Details
The purpose of this compute step is to help quickly perform many similar analyses. For example, given a sample of people with a few million SNPs (single-nucleotide polymorphisms) per person, we could fit a separate model for each SNP by iterating over the SNP data.
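The iteration described above can be sketched as a compute plan that wraps the load step in a loop. This is a minimal sketch, not a definitive recipe: the model name "snpModel", the placeholder column "snp", and the file "snps_by_row.csv" are hypothetical, and the choice of optimizer and checkpoint steps is an assumption about a typical plan.

```r
library(OpenMx)

# Hypothetical names: "snpModel" is a model whose MxData already has a
# placeholder column "snp"; "snps_by_row.csv" is an assumed data file that
# stores one SNP per line.
numSnps <- 3

plan <- mxComputeLoop(list(
  # Replace the placeholder column with the SNP for the current index.
  mxComputeLoadData("snpModel", column = "snp",
                    path = "snps_by_row.csv", byrow = TRUE),
  # Restart each fit from the original starting values.
  mxComputeSetOriginalStarts(),
  # Optimize the model for the currently loaded SNP.
  mxComputeGradientDescent(),
  # Record the results of each iteration in the returned model.
  mxComputeCheckpoint(toReturn = TRUE)
), i = seq_len(numSnps))
```

The plan would then be attached to the model with mxModel(model, plan) and executed with mxRun.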
The column names given in the column parameter must already exist in the model's MxData object. Pre-existing data is assumed to be a placeholder and is not used unless originalDataIsIndexOne is set to TRUE.
For method='csv', the highest performance arrangement is byrow=TRUE because entire columns are stored in single chunks (rows) on the disk and can be easily loaded. For byrow=FALSE, the data requires transposition. To load a single column of observed data, it is necessary to read through the whole file. This can be slow for large files. To amortize the cost of transposition, cacheSize columns are loaded on every pass through the file.
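The difference between the two layouts can be illustrated with base R alone. The sketch below does not use OpenMx's own reader. It only shows, under the assumption of a small whitespace-delimited toy file, why a one-column-per-line layout lets a single line be read in isolation, and how many passes a cache of a given size implies for the other layout. The helper names readOneByRow and passesNeeded are hypothetical.

```r
# Hypothetical toy data: 4 people, 3 SNP columns.
snps <- data.frame(snp1 = c(0, 1, 2, 1),
                   snp2 = c(2, 2, 0, 1),
                   snp3 = c(1, 0, 0, 2))

# Store each SNP as one line of the file (the byrow = TRUE arrangement).
byRowFile <- tempfile(fileext = ".csv")
write.table(t(snps), file = byRowFile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Fetching one SNP only needs the line that holds it.
readOneByRow <- function(path, index) {
  scan(path, skip = index - 1, nlines = 1, quiet = TRUE)
}
snp2 <- readOneByRow(byRowFile, 2)

# With one SNP per file column (byrow = FALSE), every pass scans the whole
# file, so loading cacheSize columns per pass bounds the number of passes.
passesNeeded <- function(numColumns, cacheSize = 100L) {
  ceiling(numColumns / cacheSize)
}
passesNeeded(250)
```

With the default cacheSize of 100, 250 columns would need three passes through the file rather than 250.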
After mxRun returns, the dest mxData object will contain the most recently loaded data. Hence, any single analysis of a series can be reproduced by issuing mxComputeLoadData with the single index associated with a particular dataset, replacing the compute plan with something like omxDefaultComputePlan, and then passing the model back through mxRun. This can be a helpful approach when investigating unexpected results.
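The reproduction procedure described above might be wrapped in a small helper along the following lines. This is a sketch under assumptions: refitSingleIndex is a hypothetical function, not part of OpenMx, and it assumes that a loop containing only the load step is an acceptable compute plan and that loadStep is an mxComputeLoadData object configured exactly as in the original series.

```r
library(OpenMx)

# Hypothetical helper, not part of OpenMx: reload one dataset by its index
# and refit it with the default compute plan.
refitSingleIndex <- function(model, loadStep, index) {
  # Step 1: run a plan that only loads the dataset for this index.
  # After mxRun returns, the model's MxData holds that dataset.
  loaded <- mxRun(mxModel(model, mxComputeLoop(list(loadStep), i = index)))

  # Step 2: swap in the default compute plan and fit as usual.
  defaultPlan <- omxDefaultComputePlan(modelName = loaded$name)
  mxRun(mxModel(loaded, defaultPlan))
}
```

A call might look like refitSingleIndex(snpModel, mxComputeLoadData("snpModel", column = "snp", path = "snps_by_row.csv"), index = 7L), where the model, column, and file names are again hypothetical.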
See Also
mxComputeLoadMatrix, mxComputeCheckpoint, mxRun, omxDefaultComputePlan