betaclust {betaclust} | R Documentation |

A family of model-based clustering techniques to identify methylation states in beta-valued DNA methylation data.

```
betaclust(
data,
M = 3,
N,
R,
model_names = "K..",
model_selection = "BIC",
parallel_process = FALSE,
seed = NULL
)
```

`data` |
A dataframe of dimension |

`M` |
Number of methylation states to be identified in a DNA sample. |

`N` |
Number of patients in the study. |

`R` |
Number of samples collected from each patient for the study. |

`model_names` |
Models to run, chosen from K.., KN. and K.R (default = K..). See details. |

`model_selection` |
Information criterion used for model selection. Options are AIC, BIC or ICL (default = BIC). |

`parallel_process` |
If TRUE, the models are fitted in parallel for increased computational efficiency. The default is FALSE due to package testing limitations. |

`seed` |
Seed to allow for reproducibility (default = NULL). |
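The three criteria accepted by `model_selection` are standard in mixture modelling. The sketch below shows how they are commonly computed from a fitted model. It is an illustration only: the function `information_criteria` and its arguments `n_par` and `n_obs` are hypothetical names that are not part of betaclust, and the "smaller is better" sign convention is an assumption that may differ from the package's internal one.

```r
# Hypothetical helper (not part of betaclust): common textbook forms of
# AIC, BIC and ICL, assuming smaller values indicate a better model.
information_criteria <- function(llk, n_par, n_obs, z) {
  aic <- -2 * llk + 2 * n_par
  bic <- -2 * llk + n_par * log(n_obs)
  # Entropy of the posterior probabilities; zero entries contribute nothing
  entropy <- -sum(z[z > 0] * log(z[z > 0]))
  # ICL adds a penalty for uncertain cluster assignments
  icl <- bic + 2 * entropy
  c(AIC = aic, BIC = bic, ICL = icl)
}

# Made-up inputs: four CpG sites assigned to three clusters with certainty
z_hard <- diag(3)[c(1, 2, 3, 1), ]
ic <- information_criteria(llk = -120, n_par = 8, n_obs = 100, z = z_hard)
```

With hard assignments the entropy term is zero, so ICL and BIC coincide; the two diverge as the posterior probabilities become less decisive.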

This is a wrapper function which can be used to fit all three models (K.., KN., K.R) within a single call.

The K.. and KN. models are used to analyse a single DNA sample (`R = 1`) and cluster the `C` CpG sites into the `K` clusters which represent the different methylation states in a DNA sample. As each CpG site can belong to any of the `M=3` methylation states (hypomethylation, hemimethylation and hypermethylation), the default is `K=M=3`. The thresholds between methylation states are objectively inferred from the clustering solution.
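To make the idea of a three-state beta mixture concrete, the following base R sketch simulates beta values for a single sample, computes posterior cluster probabilities under known parameters, and locates a threshold between adjacent states. Everything here is an assumption for illustration: the parameter values are invented rather than estimated, and taking the threshold as the point where adjacent weighted densities cross is one plausible reading of "inferred from the clustering solution", not necessarily the rule betaclust applies.

```r
# Invented parameters for hypo-, hemi- and hypermethylated states
alpha <- c(2, 20, 30)      # first shape parameter per cluster
delta <- c(30, 20, 2)      # second shape parameter per cluster
tau   <- c(0.4, 0.2, 0.4)  # mixing proportions

# Simulate beta values for C CpG sites from the mixture
set.seed(1)
C <- 500
state <- sample(1:3, C, replace = TRUE, prob = tau)
beta_vals <- rbeta(C, alpha[state], delta[state])

# Posterior probability of each site belonging to each cluster
dens <- sapply(1:3, function(k) tau[k] * dbeta(beta_vals, alpha[k], delta[k]))
z <- dens / rowSums(dens)
classification <- max.col(z, ties.method = "first")

# Assumed threshold rule: crossing point of adjacent weighted densities,
# searched between the means of the two components
crossing <- function(k) {
  f <- function(x) {
    tau[k] * dbeta(x, alpha[k], delta[k]) -
      tau[k + 1] * dbeta(x, alpha[k + 1], delta[k + 1])
  }
  lower <- alpha[k] / (alpha[k] + delta[k])
  upper <- alpha[k + 1] / (alpha[k + 1] + delta[k + 1])
  uniroot(f, lower = lower, upper = upper)$root
}
thresholds <- sapply(1:2, crossing)
```

In an actual analysis the shape parameters and proportions would be estimated by the fitted model rather than fixed in advance.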

The K.R model is used to analyse `R` independent samples collected from `N` patients, where each sample contains `C` CpG sites, and cluster the dataset into `K=M^R` clusters to identify the differentially methylated CpG (DMC) sites between the `R` DNA samples.
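The count `K=M^R` can be read as the number of ways to assign one of the `M` methylation states to each of the `R` samples. The sketch below enumerates these combinations in base R. The state labels and the `differential` flag are illustrative assumptions: the fitted model returns unlabelled clusters, and treating a cluster as a DMC candidate when its states differ between samples is an interpretation, not output of the package.

```r
M <- 3
R <- 2
states <- c("hypo", "hemi", "hyper")  # illustrative labels for the M states

# One row per combination of states across the R samples
combos <- expand.grid(rep(list(states), R), stringsAsFactors = FALSE)
names(combos) <- paste0("sample_", seq_len(R))

K <- nrow(combos)  # equals M^R

# Assumed reading: combinations whose state changes between samples
differential <- apply(combos, 1, function(s) length(unique(s)) > 1)
```

For `M = 3` and `R = 2` this gives nine combinations, six of which involve a change of state between the two samples.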

The function returns an object of the `betaclust` class which contains the following values:

information_criterion - The information criterion used to select the optimal model.

ic_output - The information criterion value calculated for each model.

optimal_model - The model selected as optimal.

function_call - The parameters passed as arguments to the function `betaclust`.

K - The number of clusters identified using the beta mixture models.

C - The number of CpG sites analysed using the beta mixture models.

N - The number of patients analysed using the beta mixture models.

R - The number of samples analysed using the beta mixture models.

optimal_model_results - Information from the optimal model. Specifically,

cluster_size - The total number of CpG sites in each of the K clusters.

llk - A vector containing the log-likelihood value at each step of the EM algorithm.

alpha - This contains the first shape parameter for the beta mixture model.

delta - This contains the second shape parameter for the beta mixture model.

tau - The proportion of CpG sites in each cluster.

z - A matrix of dimension `C \times K` containing the posterior probability of each CpG site belonging to each of the `K` clusters.

classification - The classification corresponding to z, i.e. map(z).

uncertainty - The uncertainty of each CpG site's clustering.

thresholds - Threshold points calculated under the K.. or the KN. model.
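The relationship between `z`, `classification`, `uncertainty` and `cluster_size` can be illustrated with a small hand-made matrix. This is a sketch under assumptions: the values of `z` are invented, and defining uncertainty as one minus the largest posterior probability follows the convention used in model-based clustering software such as mclust, which may not match the package's exact definition.

```r
# Invented posterior probabilities for four CpG sites and K = 3 clusters
z <- matrix(c(0.90, 0.08, 0.02,
              0.10, 0.70, 0.20,
              0.05, 0.15, 0.80,
              0.60, 0.30, 0.10),
            ncol = 3, byrow = TRUE)

# map(z): assign each site to its most probable cluster
classification <- max.col(z, ties.method = "first")

# Assumed definition: uncertainty is 1 minus the largest posterior probability
uncertainty <- 1 - apply(z, 1, max)

# Number of sites assigned to each of the K clusters
cluster_size <- tabulate(classification, nbins = ncol(z))
```

The fourth site shows why uncertainty is worth inspecting: it is assigned to cluster 1, but with a posterior probability of only 0.60.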

Silva, R., Moran, B., Russell, N.M., Fahey, C., Vlajnic, T., Manecksha, R.P., Finn, S.P., Brennan, D.J., Gallagher, W.M., Perry, A.S.: Evaluating liquid biopsies for methylomic profiling of prostate cancer. Epigenetics 15(6-7), 715-727 (2020). doi: 10.1080/15592294.2020.1712876.

Majumdar, K., Silva, R., Perry, A.S., Watson, R.W., Murphy, T.B., Gormley, I.C.: betaclust: a family of mixture models for beta valued DNA methylation data. arXiv [stat.ME] (2022). doi: 10.48550/ARXIV.2211.01938.

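```
# Load the package before calling betaclust()
library(betaclust)

my.seed <- 190
M <- 3  # number of methylation states
N <- 4  # number of patients
R <- 2  # number of samples per patient
# Fit all three models to the first 30 CpG sites (columns 2 to 9, i.e. N * R = 8 columns)
data_output <- betaclust(pca.methylation.data[1:30, 2:9], M, N, R,
                         model_names = c("K..", "KN.", "K.R"), model_selection = "BIC",
                         parallel_process = FALSE, seed = my.seed)
```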

[Package *betaclust* version 1.0.0 Index]