prepare_data {bigstep}    R Documentation

Data preparation

Description

Create an object of class big, which is required to perform the selection procedure.

Usage

prepare_data(
  y,
  X,
  type = "linear",
  candidates = NULL,
  Xadd = NULL,
  na = NULL,
  maxp = 1e+06,
  verbose = TRUE
)

Arguments

y

a numeric vector containing the dependent (target) variable.

X

a numeric matrix or an object of class big.matrix. The columns of X should contain the independent variables (predictors).

type

a string, the type of regression model you want to fit. Must be one of "linear", "logistic", "poisson".

candidates

a numeric vector, columns from X which will be used in the selection procedure. The order is important. If NULL, every column will be used.

Xadd

a numeric matrix, additional variables which will always be included in the model during the selection procedure (they will not be removed in any step). If NULL, Xadd will contain only a column of ones (the intercept). If you specify Xadd, a column of ones will be added automatically (the intercept cannot be excluded).

na

a logical. Are there any missing values in X? If NULL, X will be checked (this can take some time if X is big, so it is reasonable to set it explicitly).

maxp

a numeric. The matrix X will be split into parts with maxp elements. This does not change the results, but it is necessary if your computer does not have enough RAM. Set it to a lower value if you still run out of memory.

verbose

a logical. Set to FALSE if you do not want to see any information during the selection procedure.
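
The following is a minimal sketch of how type, candidates and Xadd might be combined. The simulated data and the names y_bin, age and data_logit are illustrative assumptions, not part of the package.

```r
library(bigstep)

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), ncol = 4)

# Simulated 0/1 response for a logistic model (illustrative data only)
y_bin <- rbinom(n, size = 1, prob = plogis(X[, 2]))

# Hypothetical extra covariate that should stay in the model at every step
age <- matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "age"))

data_logit <- prepare_data(
  y_bin, X,
  type = "logistic",
  candidates = c(3, 1, 2),  # consider only these columns of X, in this order
  Xadd = age,               # the intercept column is added automatically
  verbose = FALSE
)
class(data_logit)
```

Because the order of candidates matters, c(3, 1, 2) and c(1, 2, 3) describe different inputs to the selection procedure even though they name the same columns.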

Details

The function automatically removes observations which have missing values in y. Type browseVignettes("bigstep") for more details.
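
As a small illustration of the behaviour described above, the sketch below passes a response vector with missing entries. The data are simulated and the object name data_na is an assumption made for this example.

```r
library(bigstep)

set.seed(2)
X <- matrix(rnorm(40), ncol = 4)
y <- X[, 1] + rnorm(10)

# Two observations with a missing response (introduced for illustration)
y[c(3, 7)] <- NA

# According to the description above, these observations are dropped
# automatically, so no manual filtering of y or X is done here
data_na <- prepare_data(y, X, verbose = FALSE)
class(data_na)
```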

Value

An object of class big.

Examples

X <- matrix(rnorm(20), ncol = 4)
y <- X[, 2] + rnorm(5)
data <- prepare_data(y, X)
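
A further sketch for a wider matrix, assuming X is known to contain no missing values so that the check can be skipped with na = FALSE. The dimensions, the maxp value and the name data_wide are arbitrary choices for illustration.

```r
library(bigstep)

set.seed(3)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), ncol = p)
y <- X[, 5] - X[, 10] + rnorm(n)

# na = FALSE skips the scan of X for missing values;
# maxp = 1e4 asks for X to be processed in parts of at most 1e4 elements
data_wide <- prepare_data(y, X, na = FALSE, maxp = 1e4, verbose = FALSE)
class(data_wide)
```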


[Package bigstep version 1.1.1 Index]