R: Load columns into an MxData object

mxComputeLoadData {OpenMx}

R Documentation

Load columns into an MxData object

Description

Usage

mxComputeLoadData(
  dest,
  column,
  method = c("csv", "data.frame"),
  ...,
  path = c(),
  originalDataIsIndexOne = FALSE,
  byrow = TRUE,
  row.names = c(),
  col.names = c(),
  skip.rows = 0,
  skip.cols = 0,
  verbose = 0L,
  cacheSize = 100L,
  checkpointMetadata = TRUE,
  na.strings = c("NA"),
  observed = NULL,
  rowFilter = c()
)

Arguments

`dest`	character. The name of the model into which the columns will be loaded.
`column`	character vector. The names of the columns to replace.
`method`	character. The name of the conduit used to load the columns.
`...`	Not used. Forces remaining arguments to be specified by name.
`path`	character. The path to the file containing the data.
`originalDataIsIndexOne`	logical. Whether to use the initial data for index 1.
`byrow`	logical. Whether the data columns are stored in rows.
`row.names`	optional integer. Column containing the row names.
`col.names`	optional integer. Row containing the column names.
`skip.rows`	integer. Number of rows to skip before reading data.
`skip.cols`	integer. Number of columns to skip before reading data.
`verbose`	integer. Level of run-time diagnostic output. Set to zero to disable.
`cacheSize`	integer. How many columns to cache per scan through the data. Only used when byrow=FALSE.
`checkpointMetadata`	logical. Whether to add per-record metadata to the checkpoint.
`na.strings`	character vector. A vector of strings that denote a missing value.
`observed`	data frame. The reservoir of data for `method='data.frame'`.
`rowFilter`	logical vector. Whether to skip the source row.

Details

The purpose of this compute step is to help quickly perform many similar analyses. For example, given a sample of people with a few million single-nucleotide polymorphisms (SNPs) per person, we could fit a separate model for each SNP by iterating over the SNP data.

The column names given in the column parameter must already exist in the model's MxData object. Pre-existing data is assumed to be a placeholder and is not used unless originalDataIsIndexOne is set to TRUE.
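A compute plan along the following lines could iterate over a file of SNP columns. This is a minimal sketch, not a definitive recipe: the model name "oneSNP", the variable names, the file name "snps.csv" and the choice of three iterations are all hypothetical, and the surrounding steps (mxComputeLoop, mxComputeSetOriginalStarts, mxComputeGradientDescent, mxComputeCheckpoint) are one plausible arrangement rather than a required one. The sketch only builds the plan; mxRun is not called because the data file does not exist here.

```r
library(OpenMx)

# Placeholder data. The column 'snp' must already exist in the model's
# MxData object so that the load step has something to replace.
# All names below are hypothetical.
placeholder <- data.frame(phenotype = rnorm(100), snp = rnorm(100))

model <- mxModel(
  "oneSNP", type = "RAM",
  manifestVars = c("snp", "phenotype"),
  mxData(placeholder, type = "raw"),
  mxPath(from = "one", to = c("snp", "phenotype")),
  mxPath(from = c("snp", "phenotype"), arrows = 2, values = 1),
  mxPath(from = "snp", to = "phenotype", values = 0, labels = "beta")
)

# One pass of the loop per index: load a column, reset the starting
# values, optimize, and record the result.
plan <- mxComputeLoop(list(
  mxComputeLoadData("oneSNP", column = "snp",
                    method = "csv",
                    path = "snps.csv",   # hypothetical file
                    byrow = TRUE),
  mxComputeSetOriginalStarts(),
  mxComputeGradientDescent(),
  mxComputeCheckpoint(toReturn = TRUE)
), i = 1:3)

# Attach the plan to the model; mxRun(model) would then execute it.
model <- mxModel(model, plan)
```

Because originalDataIsIndexOne is left at FALSE in this sketch, the random placeholder values are never analysed; they only reserve the column names.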

For method='csv', the highest-performance arrangement is byrow=TRUE because each column is stored as a single chunk (one row) on disk and can be loaded directly. For byrow=FALSE, the data requires transposition: loading a single column of observed data means reading through the whole file, which can be slow for large files. To amortize the cost of transposition, cacheSize columns are loaded on every pass through the file.
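The difference between the two layouts can be illustrated in base R without OpenMx. The sketch below uses hypothetical toy data and writes a comma-separated file purely for illustration; the delimiter that method='csv' expects is not stated on this page, so the file format here is an assumption. It shows that when each column is stored as one line, a single column can be retrieved by reading a single line.

```r
# Hypothetical toy data: 4 people (rows) by 3 SNPs (columns).
people_by_snp <- matrix(
  c(0, 1, 2, 1,    # snp1
    2, 2, 0, 1,    # snp2
    1, 0, 0, 2),   # snp3
  nrow = 4, ncol = 3,
  dimnames = list(paste0("person", 1:4), paste0("snp", 1:3))
)

# Layout matching the byrow=TRUE idea: transpose so that every line of
# the file holds one complete column of observed data.
snp_by_person <- t(people_by_snp)

path <- tempfile(fileext = ".csv")
write.table(snp_by_person, path, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Fetching the second SNP touches only the second line of the file.
second <- scan(path, sep = ",", skip = 1, nlines = 1, quiet = TRUE)

# In the untransposed layout, the same values are spread over every line,
# so the whole file would have to be read to assemble one column.
stopifnot(isTRUE(all.equal(second, unname(people_by_snp[, "snp2"]))))
```

With the untransposed layout, one full scan of the file per column would be needed, which is the cost that loading cacheSize columns per pass is meant to spread out.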

After mxRun returns, the dest mxData object will contain the most recently loaded data. Hence, any single analysis of a series can be reproduced by issuing mxComputeLoadData with the single index associated with a particular dataset, replacing the compute plan with something like omxDefaultComputePlan, and then passing the model back through mxRun. This can be a helpful approach when investigating unexpected results.
