R: Fit the K.R Model

beta_kr {betaclust}

R Documentation

Fit the K.R Model

Description

A beta mixture model for identifying differentially methylated CpG sites between R DNA sample types collected from N patients.

Usage

beta_kr(data, M = 3, N, R, parallel_process = FALSE, seed = NULL)

Arguments

`data`	A data frame of dimension `C \times NR` containing methylation values for `C` CpG sites from `R` sample types collected from `N` patients. Columns are grouped by sample type, with patients in the same order within each group, i.e. Sample1_Patient1, Sample1_Patient2, Sample2_Patient1, Sample2_Patient2, etc.
`M`	Number of methylation states to be identified.
`N`	Number of patients in the study.
`R`	Number of sample types collected from each patient for study.
`parallel_process`	If TRUE, the models are processed in parallel for increased computational efficiency. The default is FALSE because of package testing limitations.
`seed`	Seed to allow for reproducibility (default = NULL).
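
The column ordering expected for `data` can be sketched in base R. This is a minimal illustration, not package code: the toy values of N and R and the helper `cols_for_sample` are hypothetical and chosen only to reproduce the ordering described above.

```r
# Hypothetical toy dimensions, chosen only for illustration.
N <- 2  # patients
R <- 2  # sample types

# Columns are grouped by sample type, with patients varying fastest.
layout <- data.frame(
  column  = seq_len(N * R),
  sample  = rep(seq_len(R), each = N),
  patient = rep(seq_len(N), times = R)
)
layout$label <- paste0("Sample", layout$sample, "_Patient", layout$patient)
print(layout$label)

# Hypothetical helper: column indices that hold sample type r.
cols_for_sample <- function(r, N) ((r - 1) * N + 1):(r * N)
print(cols_for_sample(2, N))
```

Under this layout, the columns for sample type r form one contiguous block of N columns, which is a quick way to check a data frame before passing it to beta_kr.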

Details

The K.R model allows identification of the differentially methylated CpG sites between the R DNA sample types collected from each of N patients. As each CpG site in a DNA sample can belong to one of M methylation states, there can be K = M^R methylation state changes between R DNA sample types. The shape parameters vary for each DNA sample type but are constrained to be equal across patients. An initial clustering using k-means is performed to identify K clusters. The resulting clustering solution is provided as starting values to the Expectation-Maximisation (EM) algorithm. A digamma approximation is used to obtain the maximised parameters in the M-step.
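
The count K = M^R and the k-means starting step can be illustrated with a short base R sketch. It is not the package's internal code: the simulated data, the number of CpG sites C, the `changed` column, and the way starting mixing proportions are derived from the k-means labels are all assumptions made for illustration.

```r
set.seed(1)
M <- 3    # methylation states per sample type
R <- 2    # sample types
N <- 4    # patients
C <- 200  # hypothetical number of CpG sites

# One row per combination of a state for each sample type.
states <- expand.grid(rep(list(seq_len(M)), R))
names(states) <- paste0("Sample", seq_len(R))
K <- nrow(states)  # equals M^R, here 9

# A combination involves a change when the states differ across sample types.
states$changed <- apply(states[, seq_len(R), drop = FALSE], 1,
                        function(s) length(unique(s)) > 1)

# Simulated methylation values in (0, 1), arranged as a C x NR matrix.
toy <- matrix(rbeta(C * N * R, shape1 = 2, shape2 = 2), nrow = C)

# k-means with K centres gives an initial hard clustering; cluster
# proportions are one possible (assumed) starting value for mixing proportions.
km <- kmeans(toy, centers = K, nstart = 5, iter.max = 50)
tau_init <- tabulate(km$cluster, nbins = K) / C
```

With M = 3 and R = 2, three of the nine combinations keep the same state in both sample types, so six involve a change of state.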

Value

A list containing:

cluster_size - The total number of CpG sites in each of the K clusters.
llk - A vector containing the log-likelihood value at each step of the EM algorithm.
alpha - The first shape parameter for the beta mixture model.
delta - The second shape parameter for the beta mixture model.
tau - The estimated mixing proportion for each cluster.
z - A matrix of dimension `C \times K` containing the posterior probability of each CpG site belonging to each of the K clusters.
classification - The classification corresponding to z, i.e. map(z).
uncertainty - The uncertainty of each CpG site's clustering.
DM - The AUC and WD metrics for distribution similarity in each cluster.
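
The relationship between z, classification, uncertainty and cluster_size can be sketched on a toy posterior matrix. The matrix below is hypothetical, and defining uncertainty as one minus the largest posterior probability is an assumption borrowed from the convention used in mclust, not something this page states.

```r
# Hypothetical posterior matrix for C = 4 CpG sites and K = 3 clusters.
z <- matrix(c(0.90, 0.05, 0.05,
              0.20, 0.70, 0.10,
              0.10, 0.30, 0.60,
              0.50, 0.40, 0.10),
            ncol = 3, byrow = TRUE)

# Maximum a posteriori (MAP) cluster for each CpG site.
classification <- apply(z, 1, which.max)

# Assumed definition: one minus the largest posterior probability per site.
uncertainty <- 1 - apply(z, 1, max)

# Number of CpG sites assigned to each cluster.
cluster_size <- tabulate(classification, nbins = ncol(z))
```

Sites with a posterior close to 1 for a single cluster have uncertainty near 0, while a site split almost evenly between clusters, such as the last row, has high uncertainty.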

Examples

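library(betaclust)

my.seed <- 190
M <- 3  # number of methylation states
N <- 4  # number of patients
R <- 2  # number of sample types per patient

# Rows 1:30 select 30 CpG sites; columns 2:9 supply the N * R = 8 sample columns.
data_output <- beta_kr(pca.methylation.data[1:30, 2:9], M, N, R,
                       parallel_process = FALSE, seed = my.seed)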

[Package betaclust version 1.0.3 Index]