| Split {sharp} | R Documentation |
Splitting observations into non-overlapping sets
Description
Generates a list of length(tau) non-overlapping sets of observation
IDs.
Usage
Split(data, family = NULL, tau = c(0.5, 0.25, 0.25))
Arguments
data |
vector or matrix of data. In regression, this should be the outcome data. |
family |
type of regression model. This argument is defined as in
|
tau |
vector of the proportions of observations to assign to each of the sets. |
Details
With categorical outcomes (i.e. when the family argument is set to
"binomial", "multinomial" or "cox"), the split is stratified: within
each of the sets, the proportion of observations from each category is
representative of that in the full sample.
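The idea of a balanced split can be illustrated with a minimal base R sketch. This is not the code used by Split(): the helper name stratified_split and its rounding rule are assumptions made for illustration only. It shuffles the IDs within each category and cuts them according to the cumulative proportions in tau, so that each set receives roughly the same share of every category.

```r
# Hypothetical helper (NOT the sharp implementation): stratified split in base R.
stratified_split <- function(y, tau = c(0.5, 0.25, 0.25)) {
  stopifnot(abs(sum(tau) - 1) < 1e-8)
  # One (initially empty) vector of observation IDs per set
  ids <- replicate(length(tau), integer(0), simplify = FALSE)
  for (level in unique(y)) {
    # Shuffle the observation IDs belonging to this category
    idx <- which(y == level)
    idx <- idx[sample.int(length(idx))]
    # Cut points so that set k gets roughly tau[k] of this category
    cuts <- round(cumsum(tau) * length(idx))
    starts <- c(0, cuts[-length(cuts)]) + 1
    for (k in seq_along(tau)) {
      if (cuts[k] >= starts[k]) {
        ids[[k]] <- c(ids[[k]], idx[starts[k]:cuts[k]])
      }
    }
  }
  lapply(ids, sort)
}

# Toy binary outcome: 80 observations in category 0 and 20 in category 1
set.seed(1)
y <- rep(c(0, 1), times = c(80, 20))
ids <- stratified_split(y)

# Set sizes and share of category 1 within each set
sapply(ids, length)
sapply(ids, function(x) mean(y[x]))
```

With this toy outcome, every category count divides evenly, so each set keeps exactly the 20% share of category 1 seen in the full sample. With counts that do not divide evenly, the rounding step makes the shares only approximately equal.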
Value
A list with length(tau) elements, each containing a set of observation
IDs. The sets do not overlap.
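As a rough sketch of how such a list might be used downstream, the base R snippet below builds a stand-in list by hand (so it does not depend on sharp), checks that the sets do not overlap, and uses the IDs to subset the rows of a matrix. The objects shuffled, xdata and subsets are hypothetical names introduced for illustration.

```r
# Stand-in for the returned value: a list of integer ID vectors,
# built by hand here rather than by calling Split()
set.seed(1)
n <- 20
shuffled <- sample.int(n)
ids <- list(shuffled[1:10], shuffled[11:15], shuffled[16:20])

# Non-overlapping: no ID appears in more than one set
anyDuplicated(unlist(ids)) == 0

# Use each set of IDs to subset the rows of a (hypothetical) predictor matrix
xdata <- matrix(rnorm(n * 3), nrow = n)
subsets <- lapply(ids, function(x) xdata[x, , drop = FALSE])
sapply(subsets, nrow)
```

The drop = FALSE argument keeps each subset a matrix even if a set contains a single observation.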
Examples
# Load the package providing Split() and SimulateRegression()
library(sharp)

# Splitting into 3 sets
simul <- SimulateRegression()
ids <- Split(data = simul$ydata)
lapply(ids, length)
# Balanced splits with respect to a binary variable
simul <- SimulateRegression(family = "binomial")
ids <- Split(data = simul$ydata, family = "binomial")
# Count the observations from each category within each set
lapply(ids, FUN = function(x) {
  table(simul$ydata[x, ])
})