estim.mix {imp4p} | R Documentation |
Estimation of a mixture model of MCAR and MNAR values in each column of a data matrix.
Description
This function allows estimating a mixture model of MCAR and MNAR values in each column of data sets similar to the ones which can be studied in MS-based quantitative proteomics. Such data matrices contain intensity values of identified peptides.
Usage
estim.mix(tab, tab.imp, conditions, x.step.mod=150, x.step.pi=150,
nb.rei=200)
Arguments
tab |
A data matrix containing numeric and missing values. Each column of this matrix is assumed to correspond to an experimental sample, and each row to an identified peptide. |
tab.imp |
A matrix where the missing values of |
conditions |
A vector of factors indicating the biological condition to which each column (experimental sample) belongs. |
x.step.mod |
The number of points in the intervals used for estimating the cumulative distribution functions of the mixing model in each column. |
x.step.pi |
The number of points in the intervals used for estimating the proportion of MCAR values in each column. |
nb.rei |
The number of initializations of the minimization algorithm used to estimate the proportion of MCAR values (see Details). |
Details
This function aims to estimate the following mixture model in each column:
F_{tot}(x)=\pi_{na}\times F_{na}(x)+(1-\pi_{na})\times F_{obs}(x)
F_{na}(x)=\pi_{mcar}\times F_{tot}(x)+(1-\pi_{mcar})\times F_{mnar}(x)
where \pi_{na}
is the proportion of missing values, \pi_{mcar}
is the proportion of MCAR values, F_{tot}
is the cumulative distribution function (cdf) of the complete values, F_{na}
is the cdf of the missing values, F_{obs}
is the cdf of the observed values, and F_{mnar}
is the cdf of the MNAR values.
To estimate this model, a first step consists to compute a rough estimate of F_{na}
by assuming that all missing values are MCAR (thanks to the argument tab.imp
). This rough estimate is noted \hat{F}_{na}
.
In a second step, the proportion of MCAR values is estimated. To do so, the ratio
\hat{\pi}(x)=(1-\hat{F}_{na}(x))/(1-\hat{F}_{tot}(x))
is computed for different x
, where
\hat{F}_{tot}(x)=\pi_{na}\times \hat{F}_{na}(x)+(1-\pi_{na})\times \hat{F}_{obs}(x)
with \hat{F}_{obs}
the empirical cdf of the observed values.
Next, the following minimization is performed:
\min_{1>k>0,a>0,d>0}f(k,a,d)
where
f(k,a,d)=\sum_x \frac{1}{s(x)^2}\times [\hat{\pi}(x)-k-(1-k)\frac{\exp(-a\times [x-lower_x]^d)}{1-\hat{F}_{tot}(x)}]^2
where s(x)^2
is an estimate of the asymptotic variance of \hat{\pi}(x)
, lower_x
is an estimate of the minimum of the complete values. To perform this minimization, the function optim
with the method "L-BFGS-B" is used. Because it is function of its initialization, it is possible to reinitialize a number of times the minimisation algorithm with the argument nb.rei
: the parameters leading to the lowest minimum are next kept.
Once k
, a
and d
are estimated, one can use several methods to estimate \pi_{mcar}
: it is estimated
by k
;
Value
A list composed of:
abs.pi |
A numeric matrix containing the intervals used for estimating the ratio
in each column. |
pi.init |
A numeric matrix containing the estimated ratios
where |
var.pi.init |
A numeric matrix containing the estimated asymptotic variances of |
trend.pi.init |
A numeric matrix containing the estimated trend of the model used in the minimization algorithm. |
abs.mod |
A numeric vector containing the interval used for estimating the mixture models in each column. |
pi.na |
A numeric vector containing the proportions of missing values in each column. |
F.na |
A numeric matrix containing the estimated cumulative distribution functions of missing values in each column on the interval |
F.tot |
A numeric matrix containing the estimated cumulative distribution functions of complete values in each column on the interval |
F.obs |
A numeric matrix containing the estimated cumulative distribution functions of observed values in each column on the interval |
pi.mcar |
A numeric vector containing the estimations of the proportion of MCAR values in each column. |
MinRes |
A numeric matrix containing the three parameters of the model used in the minimization algorithm (three first rows), and the value of minimized function. |
Author(s)
Quentin Giai Gianetto <quentin2g@yahoo.fr>
See Also
Examples
#Simulating data
res.sim=sim.data(nb.pept=2000,nb.miss=600);
#Imputation of missing values with a MCAR-devoted algorithm: here the slsa algorithm
dat.slsa=impute.slsa(tab=res.sim$dat.obs,conditions=res.sim$condition,repbio=res.sim$repbio);
#Estimation of the mixture model
res=estim.mix(tab=res.sim$dat.obs, tab.imp=dat.slsa, conditions=res.sim$condition);