r_prepare_data {eclust} | R Documentation |
Prepare data for regression routines
Description
This function outputs the X matrix and Y vector in the format expected by
regression packages such as mgcv, caret, and glmnet.
Usage
r_prepare_data(data, response = "Y", exposure = "E", probe_names)
Arguments
data |
the data frame that contains the response, the exposure, and the genes, CpG sites, or covariates. The columns must be named. |
response |
the column name of the response in the data argument |
exposure |
the column name of the exposure in the data argument |
probe_names |
the column names of the genes, CpG sites, or covariates |
Value
a list of length 5:
- X
the X matrix
- Y
the response vector
- E
the exposure vector
- main_effect_names
the names of the main effects including the exposure
- interaction_names
the names of the interaction effects
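The shape of the returned list can be illustrated with base R alone. The sketch below is not the package's implementation. It assumes that X contains each probe, the exposure, and every probe-by-exposure product, and it builds such a matrix with model.matrix on simulated data. The helper sketch_prepare and the data frame toy are hypothetical names introduced for illustration.

```r
# Hypothetical stand-in for r_prepare_data(), written with base R only.
# Assumption: X holds the main effects (probes + exposure) and the
# probe:exposure interactions, with no intercept column.
sketch_prepare <- function(data, response = "Y", exposure = "E", probe_names) {
  data <- as.data.frame(data)
  stopifnot(response %in% colnames(data),
            exposure %in% colnames(data),
            all(probe_names %in% colnames(data)))
  # Build a formula of the form ~ (g1 + g2 + ...) * E - 1
  rhs <- paste0("(", paste(probe_names, collapse = " + "), ") * ", exposure, " - 1")
  X <- stats::model.matrix(stats::as.formula(paste("~", rhs)), data = data)
  # Interaction columns are the ones model.matrix labels with a colon
  interaction_names <- grep(":", colnames(X), value = TRUE)
  list(X = X,
       Y = data[[response]],
       E = data[[exposure]],
       main_effect_names = setdiff(colnames(X), interaction_names),
       interaction_names = interaction_names)
}

set.seed(1)
toy <- data.frame(g1 = rnorm(10), g2 = rnorm(10),
                  E = rep(0:1, each = 5), Y = rnorm(10))
res <- sketch_prepare(toy, probe_names = c("g1", "g2"))
names(res)
dim(res$X)              # 10 rows, 5 columns
res$main_effect_names   # "g1" "g2" "E"
res$interaction_names   # "g1:E" "g2:E"
```

With two probes and one exposure, the sketch yields three main-effect columns and two interaction columns, which matches the five-element list described above.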
Examples
data("tcgaov")
tcgaov[1:5,1:6, with = FALSE]
Y <- log(tcgaov[["OS"]])
E <- tcgaov[["E"]]
genes <- as.matrix(tcgaov[,-c("OS","rn","subtype","E","status"),with = FALSE])
trainIndex <- drop(caret::createDataPartition(Y, p = 0.5, list = FALSE, times = 1))
testIndex <- setdiff(seq_len(length(Y)),trainIndex)
## Not run:
cluster_res <- r_cluster_data(data = genes,
                              response = Y,
                              exposure = E,
                              train_index = trainIndex,
                              test_index = testIndex,
                              cluster_distance = "tom",
                              eclust_distance = "difftom",
                              measure_distance = "euclidean",
                              clustMethod = "hclust",
                              cutMethod = "dynamic",
                              method = "average",
                              nPC = 1,
                              minimum_cluster_size = 50)
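pc_eclust_interaction <- r_prepare_data(data = cbind(cluster_res$clustersAddon$PC,
                                                     survival = Y[trainIndex],
                                                     subtype = E[trainIndex]),
                                        response = "survival", exposure = "subtype",
                                        # probe_names has no default, so pass the PC column names
                                        probe_names = colnames(cluster_res$clustersAddon$PC))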
names(pc_eclust_interaction)
dim(pc_eclust_interaction$X)
pc_eclust_interaction$main_effect_names
pc_eclust_interaction$interaction_names
## End(Not run)
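The X and Y elements are intended to be handed to a regression routine. The following is a minimal sketch of one such hand-off, assuming the glmnet package is installed. The list prepared is a simulated stand-in that mimics the X and Y elements, not output from r_prepare_data, and the column names pc1 to pc3 are hypothetical.

```r
# Simulated stand-in for the X and Y elements of the returned list
set.seed(123)
n <- 100
E <- rbinom(n, 1, 0.5)
pcs <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("pc", 1:3)))
X <- cbind(pcs, E = E, pcs * E)                 # main effects, then interactions
colnames(X)[5:7] <- paste0("pc", 1:3, ":E")
prepared <- list(X = X,
                 Y = drop(pcs[, 1] + 2 * pcs[, 1] * E + rnorm(n)))

if (requireNamespace("glmnet", quietly = TRUE)) {
  # Lasso with cross-validation, using glmnet's defaults (alpha = 1, nfolds = 10)
  cv_fit <- glmnet::cv.glmnet(x = prepared$X, y = prepared$Y)
  # Coefficients at the lambda that minimises the cross-validated error
  print(coef(cv_fit, s = "lambda.min"))
}
```

The requireNamespace guard keeps the sketch from failing when glmnet is not available. The same X and Y could be passed to other matrix-based fitting routines in the same way.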
[Package eclust version 0.1.0 Index]