R: Calculate the conditional probability of belonging to each...

probaPost {HTSCluster}

R Documentation

Calculate the conditional probability of belonging to each cluster in a Poisson mixture model

Description

This function computes the conditional probabilities t_{ik} that an observation i arises from the k^{\mathrm{th}} component for the current value of the mixture parameters.

Usage

probaPost(y, g, conds, pi, s, lambda)

Arguments

`y`	(n x q) matrix of observed counts for n observations and q variables
`g`	Number of clusters
`conds`	Vector of length q defining the condition (treatment group) for each variable (column) in `y`
`pi`	Vector of length `g` containing the current estimate of `\hat{\boldsymbol{\pi}}`
`s`	Vector of length q containing the estimates for the normalized library size parameters for each of the q variables in `y`
`lambda`	(d x `g`) matrix containing the current estimate `\boldsymbol{\lambda}`, where d is the number of conditions (treatment groups)

Value

`t`	(n x `g`) matrix made up of the conditional probability of each observation belonging to each of the `g` clusters

Note

If all values of t_{ik} are 0 (or nearly zero), the observation is assigned with probability one to belong to the cluster with the closest mean (in terms of the Euclidean distance from the observation). To avoid calculation difficulties, extreme values of t_{ik} are smoothed, such that those smaller than 1e-10 or larger than 1-1e-10 are set equal to 1e-10 and 1-1e-10, respectively.

Author(s)

Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

Examples


set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
s <- colSums(y) / sum(y)     ## TC estimate of lib size

## Run the PMM-II model for g = 3
## "TC" library size estimate, EM algorithm

run <- PoisMixClus(y, g = 3, norm = "TC",
 	conds = conds) 
pi.est <- run$pi
lambda.est <- run$lambda

## Calculate the conditional probability of belonging to each cluster
proba <- probaPost(y, g = 3, conds = conds, pi = pi.est, s = s, 
	lambda = lambda.est)

## head(round(proba,2))

[Package HTSCluster version 2.0.11 Index]