betaclust {betaclust}    R Documentation

The betaclust wrapper function

Description

A family of model-based clustering techniques to identify methylation states in beta-valued DNA methylation data.

Usage

betaclust(
  data,
  M = 3,
  N,
  R,
  model_names = "K..",
  model_selection = "BIC",
  parallel_process = FALSE,
  seed = NULL
)

Arguments

data

A data frame of dimension C × NR containing methylation values for C CpG sites from R sample types collected from N patients. Columns are grouped by sample type, with the patients listed in the same order within each sample type: Sample1_Patient1, Sample1_Patient2, Sample2_Patient1, Sample2_Patient2, etc.
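
The column ordering described above can be checked or rebuilt before calling the function. The following is a minimal sketch in base R, not part of the package: the helper expected_columns and the Sample/Patient naming scheme are assumptions made for illustration, and real data will carry its own column names.

```r
# Hypothetical helper (not part of betaclust): returns column names in the
# order described above, with sample types varying slowest and patients fastest.
expected_columns <- function(N, R) {
  paste0("Sample", rep(seq_len(R), each = N),
         "_Patient", rep(seq_len(N), times = R))
}

cols <- expected_columns(N = 4, R = 2)

# Toy data frame (random values, illustration only) whose columns start out
# grouped by patient rather than by sample type.
set.seed(1)
C <- 5
toy <- as.data.frame(matrix(runif(C * 4 * 2), nrow = C))
names(toy) <- paste0("Sample", rep(1:2, times = 4),
                     "_Patient", rep(1:4, each = 2))

# Reorder the columns so that they are grouped by sample type.
toy_ordered <- toy[, cols]
print(names(toy_ordered))
```

Selecting columns by name in this way leaves the values untouched and only changes the order in which the columns appear.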

M

Number of methylation states to be identified in a DNA sample type.

N

Number of patients in the study.

R

Number of sample types collected from each patient for the study.

model_names

Models to fit, chosen from K.., KN. and K.R (default = "K.."). See Details.

model_selection

Information criterion used for model selection. Options are AIC, BIC or ICL (default = BIC).

parallel_process

If TRUE, the models are fitted in parallel for increased computational efficiency. The default is FALSE due to package testing limitations.

seed

Seed to allow for reproducibility (default = NULL).

Details

This is a wrapper function that can fit all three models (K.., KN., K.R) in a single call.

The K.. and KN. models are used to analyse a single DNA sample type (R = 1) and cluster the C CpG sites into K clusters, which represent the different methylation states in that sample type. As each CpG site can belong to any of the M = 3 methylation states (hypomethylation, hemimethylation and hypermethylation), the default is K = M = 3. The thresholds between methylation states are objectively inferred from the clustering solution.

The K.R model is used to analyse R independent sample types collected from N patients, where each sample contains C CpG sites. It clusters the CpG sites into K = M^R clusters to identify the differentially methylated CpG (DMC) sites between the R DNA sample types.
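
To see where K = M^R comes from, the possible combinations of methylation states across sample types can be enumerated. This is an illustrative sketch in base R only. The state labels, and the reading of a cluster as "differential" when its states differ between sample types, are assumptions made here for illustration and are not taken from the package output.

```r
M <- 3
R <- 2

# Labels for the M = 3 methylation states named in the text above.
states <- c("hypo", "hemi", "hyper")

# One row per combination of states across the R sample types.
combos <- expand.grid(rep(list(states), R), stringsAsFactors = FALSE)
names(combos) <- paste0("Sample", seq_len(R))

# The number of combinations equals K = M^R.
K <- nrow(combos)
stopifnot(K == M^R)

# Assumed interpretation: flag combinations whose state changes between
# sample types.
combos$differential <- apply(
  combos[, seq_len(R), drop = FALSE], 1,
  function(x) length(unique(x)) > 1
)
print(combos)
```

With M = 3 and R = 2 this gives 9 combinations, of which 6 have different states in the two sample types.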

Value

The function returns an object of the betaclust class.

References

Silva, R., Moran, B., Russell, N.M., Fahey, C., Vlajnic, T., Manecksha, R.P., Finn, S.P., Brennan, D.J., Gallagher, W.M., Perry, A.S.: Evaluating liquid biopsies for methylomic profiling of prostate cancer. Epigenetics 15(6-7), 715-727 (2020). doi:10.1080/15592294.2020.1712876.

Majumdar, K., Silva, R., Perry, A.S., Watson, R.W., Murphy, T.B., Gormley, I.C.: betaclust: a family of mixture models for beta valued DNA methylation data. arXiv [stat.ME] (2022). doi:10.48550/ARXIV.2211.01938.

See Also

beta_k

beta_kn

beta_kr

pca.methylation.data

plot.betaclust

summary.betaclust

threshold

Examples


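library(betaclust)

# Settings: seed, number of methylation states, patients and sample types
my.seed <- 190
M <- 3
N <- 4
R <- 2

# Fit all three models to the first 30 CpG sites of pca.methylation.data
# (columns 2:9 give the N * R = 8 data columns) and select by BIC
data_output <- betaclust(pca.methylation.data[1:30, 2:9], M, N, R,
            model_names = c("K..", "KN.", "K.R"), model_selection = "BIC",
            parallel_process = FALSE, seed = my.seed)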



[Package betaclust version 1.0.3 Index]