R: Prepare data for regression routines

r_prepare_data {eclust}

R Documentation

Prepare data for regression routines

Description

This function outputs the X matrix and the Y response in the format expected by regression packages such as mgcv, caret, and glmnet.

Usage

r_prepare_data(data, response = "Y", exposure = "E", probe_names)

Arguments

`data`	the data frame that contains the response, the exposure, and the genes, CpG sites, or covariates. The columns should be labelled.
`response`	the column name of the response in the `data` argument
`exposure`	the column name of the exposure in the `data` argument
`probe_names`	the column names of the genes, CpG sites, or covariates
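
As a minimal sketch of how the arguments above fit together, the following builds a small labelled data frame and derives the column names to pass in. The data frame `toy` and its columns (`survival`, `subtype`, `geneA`, `geneB`, `geneC`) are hypothetical and are not part of the package; the final call is left commented out because it requires eclust to be installed.

```r
# Hypothetical toy data: one response, one binary exposure, three probes.
set.seed(42)
n <- 8
toy <- data.frame(
  survival = rexp(n),                    # response column
  subtype  = rep(c(0, 1), times = n / 2), # exposure column
  geneA    = rnorm(n),
  geneB    = rnorm(n),
  geneC    = rnorm(n)
)

response <- "survival"
exposure <- "subtype"

# Treat every remaining labelled column as a probe.
probe_names <- setdiff(colnames(toy), c(response, exposure))

# All names handed to the function should be column names of the data.
stopifnot(all(c(response, exposure, probe_names) %in% colnames(toy)))

# Call shape following the Usage section (requires the eclust package):
# prepared <- r_prepare_data(data = toy, response = response,
#                            exposure = exposure, probe_names = probe_names)
```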

Value

A list of length 5:

X: the X matrix
Y: the response vector
E: the exposure vector
main_effect_names: the names of the main effects including the exposure
interaction_names: the names of the interaction effects
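
To make the shape of this list concrete, here is a base R sketch that assembles a list with the same five element names from toy data. It assumes that X holds one column per probe, one for the exposure, and one per probe-by-exposure interaction, with no intercept; that layout, the formula, and the names `toy`, `probe1`, and `probe2` are illustrative assumptions and not necessarily what `r_prepare_data` does internally.

```r
# Hypothetical toy data: 6 samples, a binary exposure E, two probes.
set.seed(1)
toy <- data.frame(
  Y      = rnorm(6),
  E      = rep(c(0, 1), each = 3),
  probe1 = rnorm(6),
  probe2 = rnorm(6)
)
probe_names <- c("probe1", "probe2")

# Assumed layout: probe main effects, E, and probe:E interactions, no intercept.
f <- as.formula(paste0("~ (", paste(probe_names, collapse = " + "), ") * E - 1"))
X <- model.matrix(f, data = toy)

# Interaction columns are the ones model.matrix labels with a colon.
interaction_names <- grep(":", colnames(X), value = TRUE)
main_effect_names <- setdiff(colnames(X), interaction_names)

sketch <- list(
  X = X,
  Y = toy$Y,
  E = toy$E,
  main_effect_names = main_effect_names,
  interaction_names = interaction_names
)
```

With this layout, `sketch$X` and `sketch$Y` could be handed to a matrix-based fitting routine, and the two name vectors make it easy to pick out main-effect or interaction coefficients afterwards.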

Examples

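data("tcgaov")
# preview the first rows and columns (tcgaov is a data.table)
tcgaov[1:5, 1:6, with = FALSE]
# response: log overall survival; exposure: column E
Y <- log(tcgaov[["OS"]])
E <- tcgaov[["E"]]
# keep only the gene expression columns
genes <- as.matrix(tcgaov[, -c("OS", "rn", "subtype", "E", "status"), with = FALSE])
# split the samples 50/50 into training and test sets
trainIndex <- drop(caret::createDataPartition(Y, p = 0.5, list = FALSE, times = 1))
testIndex <- setdiff(seq_along(Y), trainIndex)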

## Not run: 
cluster_res <- r_cluster_data(data = genes,
                              response = Y,
                              exposure = E,
                              train_index = trainIndex,
                              test_index = testIndex,
                              cluster_distance = "tom",
                              eclust_distance = "difftom",
                              measure_distance = "euclidean",
                              clustMethod = "hclust",
                              cutMethod = "dynamic",
                              method = "average",
                              nPC = 1,
                              minimum_cluster_size = 50)

pc_eclust_interaction <- r_prepare_data(data = cbind(cluster_res$clustersAddon$PC,
                                                     survival = Y[trainIndex],
                                                     subtype = E[trainIndex]),
                                        response = "survival", exposure = "subtype")
names(pc_eclust_interaction)
dim(pc_eclust_interaction$X)
pc_eclust_interaction$main_effect_names
pc_eclust_interaction$interaction_names

## End(Not run)

[Package eclust version 0.1.0 Index]