beta_kr {betaclust} | R Documentation |
Fit the K.R Model
Description
A beta mixture model for identifying differentially methylated CpG sites between R DNA sample types collected from N patients.
Usage
beta_kr(data, M = 3, N, R, parallel_process = FALSE, seed = NULL)
Arguments
data |
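A dataframe of dimension C \times NR containing the methylation values of C CpG sites for each of the R sample types collected from N patients. |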
M |
Number of methylation states to be identified. |
N |
Number of patients in the study. |
R |
Number of sample types collected from each patient for study. |
parallel_process |
If TRUE, the models are fitted in parallel for increased computational efficiency. The default is FALSE due to package testing limitations. |
seed |
Seed to allow for reproducibility (default = NULL). |
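The example on this page passes 8 columns of data with N = 4 and R = 2, which suggests that data is expected to have N * R columns. The sketch below is a hypothetical pre-flight check built on that assumption; check_kr_input is not part of betaclust, and the requirement that values lie strictly between 0 and 1 is assumed from the support of the beta distribution.

```r
# Hypothetical helper (not a betaclust function): sanity-check the input
# before calling beta_kr(). Assumes one column per patient/sample-type pair.
check_kr_input <- function(data, N, R) {
  stopifnot(is.data.frame(data), ncol(data) == N * R)
  vals <- as.matrix(data)
  # Beta densities are defined on (0, 1); missing values are ignored here.
  stopifnot(is.numeric(vals), all(vals > 0 & vals < 1, na.rm = TRUE))
  invisible(TRUE)
}

# Toy data standing in for real methylation values: 30 CpG sites, 8 samples.
set.seed(1)
toy <- as.data.frame(matrix(runif(30 * 8, 0.01, 0.99), nrow = 30))
check_kr_input(toy, N = 4, R = 2)
```

A check like this fails early when N and R do not match the number of columns supplied.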
Details
The K.R model allows identification of the differentially methylated CpG sites between the R DNA sample types collected from each of N patients.
As each CpG site in a DNA sample can belong to one of M methylation states, there can be K = M^R methylation state changes between R DNA sample types.
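To make the count K = M^R concrete, the following sketch enumerates every combination of methylation states across the sample types. The state labels are hypothetical and chosen only for illustration; the package does not necessarily name or order the clusters this way.

```r
# Enumerate all combinations of M states across R sample types.
M <- 3
R <- 2
states <- c("hypo", "hemi", "hyper")  # hypothetical labels for the M states

# One row per combination, one column per sample type.
combos <- expand.grid(rep(list(states), R), stringsAsFactors = FALSE)
names(combos) <- paste0("sample_type_", seq_len(R))

K <- nrow(combos)
print(combos)
print(K)
```

Rows where the state is the same in every column correspond to sites whose state does not change between sample types; the remaining rows correspond to a change in state.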
The shape parameters vary for each DNA sample type but are constrained to be equal for each patient. An initial clustering using k-means is performed to identify K clusters. The resulting clustering solution is provided as starting values to the Expectation-Maximisation algorithm. A digamma approximation is used to obtain the maximised parameters in the M-step.
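The initialisation and E-step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the columns are assumed to be grouped by sample type, and the starting shape parameters are obtained by the method of moments (the page does not say how starting values are derived from the k-means solution). The M-step with its digamma approximation is not shown.

```r
set.seed(1)
C <- 60; N <- 4; R <- 2; M <- 3
K <- M^R

# Simulated methylation values; assumed layout: the N patients of sample
# type 1, followed by the N patients of sample type 2.
x <- matrix(rbeta(C * N * R, shape1 = 2, shape2 = 2), nrow = C)
type_of_col <- rep(seq_len(R), each = N)

# Initial clustering into K groups with k-means.
km <- kmeans(x, centers = K, nstart = 5)
tau <- as.numeric(table(factor(km$cluster, levels = seq_len(K)))) / C

# Starting shape parameters: one (alpha, delta) pair per cluster and sample
# type, shared across patients (method of moments; an assumption).
alpha <- delta <- matrix(NA_real_, K, R)
for (k in seq_len(K)) {
  for (r in seq_len(R)) {
    v <- as.vector(x[km$cluster == k, type_of_col == r])
    mu <- mean(v)
    s2 <- mean((v - mu)^2)
    common <- mu * (1 - mu) / s2 - 1
    alpha[k, r] <- mu * common
    delta[k, r] <- (1 - mu) * common
  }
}

# E-step: posterior probability of each CpG site belonging to each cluster,
# computed on the log scale for numerical stability.
log_dens <- sapply(seq_len(K), function(k) {
  a <- rep(alpha[k, type_of_col], each = C)  # shape1 for every cell of x
  d <- rep(delta[k, type_of_col], each = C)  # shape2 for every cell of x
  log(tau[k]) + rowSums(matrix(dbeta(x, a, d, log = TRUE), nrow = C))
})
z <- exp(log_dens - apply(log_dens, 1, max))
z <- z / rowSums(z)
```

Here z plays the role of the C \times K posterior probability matrix, and alpha and delta are K \times R matrices because the shape parameters differ by sample type but not by patient.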
Value
A list containing:
cluster_size - The total number of CpG sites in each of the K clusters.
llk - A vector containing the log-likelihood value at each step of the EM algorithm.
alpha - The first shape parameter for the beta mixture model.
delta - The second shape parameter for the beta mixture model.
tau - The estimated mixing proportion for each cluster.
z - A matrix of dimension C \times K containing the posterior probability of each CpG site belonging to each of the K clusters.
classification - The classification corresponding to z, i.e. map(z).
uncertainty - The uncertainty of each CpG site's clustering.
DM - The AUC and WD metrics for distribution similarity in each cluster.
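The relationship between z, classification, uncertainty and cluster_size can be illustrated with a small hand-made posterior matrix. The sketch assumes that map(z) assigns each site to its most probable cluster and that uncertainty is one minus the largest posterior probability, which is the usual convention in model-based clustering; the page itself does not give the formula.

```r
# Toy posterior matrix: 3 CpG sites (rows) and 3 clusters (columns).
z <- matrix(c(0.90, 0.05, 0.05,
              0.20, 0.70, 0.10,
              0.40, 0.35, 0.25),
            nrow = 3, byrow = TRUE)

# Most probable cluster for each site (assumed meaning of map(z)).
classification <- max.col(z, ties.method = "first")

# Assumed definition: 1 minus the largest posterior probability per site.
uncertainty <- 1 - apply(z, 1, max)

# Number of sites assigned to each cluster.
cluster_size <- tabulate(classification, nbins = ncol(z))
```

In this toy case the third site is assigned to cluster 1 but with high uncertainty, since its largest posterior probability is only 0.40.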
See Also
Examples
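library(betaclust)

# Fit the K.R model to the first 30 rows of the example dataset.
# Columns 2:9 give 8 = N * R columns of methylation values.
my.seed <- 190
M <- 3  # number of methylation states
N <- 4  # number of patients
R <- 2  # number of sample types per patient
data_output <- beta_kr(pca.methylation.data[1:30, 2:9], M, N, R,
                       parallel_process = FALSE, seed = my.seed)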