estim.mix {imp4p} | R Documentation |
Estimation of a mixture model of MCAR and MNAR values in each column of a data matrix.
Description
This function estimates a mixture model of MCAR and MNAR values in each column of data sets similar to those studied in MS-based quantitative proteomics. Such data matrices contain intensity values of identified peptides.
Usage
estim.mix(tab, tab.imp, conditions, x.step.mod=150, x.step.pi=150,
nb.rei=200)
Arguments
tab |
A data matrix containing numeric and missing values. Each column of this matrix is assumed to correspond to an experimental sample, and each row to an identified peptide. |
tab.imp |
A matrix where the missing values of tab have been imputed under the assumption that they are all MCAR. |
conditions |
A vector of factors indicating the biological condition to which each column (experimental sample) belongs. |
x.step.mod |
The number of points in the intervals used for estimating the cumulative distribution functions of the mixture model in each column. |
x.step.pi |
The number of points in the intervals used for estimating the proportion of MCAR values in each column. |
nb.rei |
The number of initializations of the minimization algorithm used to estimate the proportion of MCAR values (see Details). |
Details
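This function aims to estimate the following mixture model in each column:
$$F_{tot}(x) = \pi_{na} \times F_{na}(x) + (1 - \pi_{na}) \times F_{obs}(x)$$
$$F_{na}(x) = \pi_{mcar} \times F_{tot}(x) + (1 - \pi_{mcar}) \times F_{mnar}(x)$$
where $\pi_{na}$ is the proportion of missing values,
$\pi_{mcar}$ is the proportion of MCAR values,
$F_{tot}$ is the cumulative distribution function (cdf) of the complete values,
$F_{na}$ is the cdf of the missing values,
$F_{obs}$ is the cdf of the observed values, and
$F_{mnar}$ is the cdf of the MNAR values.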
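The decomposition of the complete values into missing and observed values, $F_{tot}(x) = \pi_{na} F_{na}(x) + (1 - \pi_{na}) F_{obs}(x)$, can be checked numerically. The sketch below uses base R only and does not call imp4p. The simulated intensities, the logistic missingness mechanism, and all variable names (`x_complete`, `is_na`, `grid`, ...) are assumptions made for illustration.

```r
set.seed(1)
n <- 5000
# Hypothetical complete log-intensities for one sample (one column)
x_complete <- rnorm(n, mean = 25, sd = 2)
# Assumed missingness mechanism: low intensities are more likely to be missing
p_miss <- plogis(-(x_complete - 23))
is_na <- runif(n) < p_miss

pi_na <- mean(is_na)                 # proportion of missing values
F_tot <- ecdf(x_complete)            # cdf of the complete values
F_na  <- ecdf(x_complete[is_na])     # cdf of the missing values
F_obs <- ecdf(x_complete[!is_na])    # cdf of the observed values

# Compare F_tot(x) with pi_na * F_na(x) + (1 - pi_na) * F_obs(x) on a grid
grid <- seq(min(x_complete), max(x_complete), length.out = 150)
mix <- pi_na * F_na(grid) + (1 - pi_na) * F_obs(grid)
max_abs_diff <- max(abs(F_tot(grid) - mix))
```

With empirical cdfs the two sides agree up to floating-point error. In real data the missing values are not available, which is why the function needs the rough estimate built from tab.imp.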
To estimate this model, the first step computes a rough estimate of $F_{na}$ by assuming that all missing values are MCAR (using the argument tab.imp). This rough estimate is denoted $\hat{F}_{na}$.
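In a second step, the proportion of MCAR values is estimated. To do so, the ratio
$$\hat{\pi}(x) = \frac{1 - \hat{F}_{na}(x)}{1 - \hat{F}_{tot}(x)}$$
is computed for different $x$, where
$$\hat{F}_{tot}(x) = \pi_{na} \times \hat{F}_{na}(x) + (1 - \pi_{na}) \times \hat{F}_{obs}(x)$$
with $\hat{F}_{obs}$ the empirical cdf of the observed values.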
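As an illustration of this second step, here is a minimal sketch that evaluates a ratio of the form $(1 - \hat{F}_{na}(x)) / (1 - \hat{F}_{tot}(x))$ on a grid of points. It is not the package's code: the form of the ratio is assumed from the model above, the data are simulated, and `obs`, `imp`, and the 95% cut-off of the grid are hypothetical choices.

```r
set.seed(2)
obs <- rnorm(700, mean = 26, sd = 2)  # hypothetical observed values of one column
imp <- rnorm(300, mean = 25, sd = 2)  # hypothetical values imputed under the MCAR assumption
pi_na <- length(imp) / (length(imp) + length(obs))  # proportion of missing values

F_na_hat  <- ecdf(imp)  # rough estimate of the cdf of the missing values
F_obs_hat <- ecdf(obs)  # empirical cdf of the observed values
F_tot_hat <- function(x) pi_na * F_na_hat(x) + (1 - pi_na) * F_obs_hat(x)

# Grid of 150 points, stopping before the upper tail where 1 - F_tot_hat(x) reaches 0
upper_x <- as.numeric(quantile(c(obs, imp), 0.95))
x <- seq(min(c(obs, imp)), upper_x, length.out = 150)
ratio <- (1 - F_na_hat(x)) / (1 - F_tot_hat(x))
```

The grid length of 150 mirrors the default of x.step.pi. Stopping the grid below the maximum avoids a division by zero in the extreme upper tail.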
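Next, the following minimization is performed:
$$\min_{1>k>0,\ a>0,\ d>0} f(k,a,d)$$
where
$$f(k,a,d) = \sum_x \frac{1}{s(x)^2} \times \left[ \hat{\pi}(x) - k - (1-k) \times \frac{\exp(-a \times [x - lower_x]^d)}{1 - \hat{F}_{tot}(x)} \right]^2$$
Here $s(x)^2$ is an estimate of the asymptotic variance of $\hat{\pi}(x)$, and $lower_x$ is an estimate of the minimum of the complete values. To perform this minimization, the function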
optim with the method "L-BFGS-B" is used. Because the result depends on the initialization, the minimization algorithm can be reinitialized several times with the argument nb.rei: the parameters leading to the lowest minimum are then kept.
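The restart strategy can be sketched as follows. This is a hypothetical base R example, not the code of estim.mix: the objective `f`, the bounds, and the uniform random starting points are assumptions chosen only to show how several runs of optim with "L-BFGS-B" are compared.

```r
set.seed(3)
# Hypothetical objective with several local minima (stands in for the real criterion)
f <- function(p) sum((p - c(0.3, 1, 2))^2) + 0.5 * sum(sin(5 * p)^2)
lower <- c(0, 0, 0)
upper <- c(1, 5, 5)
nb_rei <- 20  # number of random restarts (plays the role of nb.rei)

best <- NULL
for (i in seq_len(nb_rei)) {
  # Random starting point inside the box constraints
  init <- runif(3, min = lower, max = upper)
  fit <- optim(init, f, method = "L-BFGS-B", lower = lower, upper = upper)
  # Keep the run that reaches the lowest value of the objective
  if (is.null(best) || fit$value < best$value) best <- fit
}
best$par    # parameters of the best run
best$value  # lowest minimum found
```

More restarts reduce the risk of keeping a poor local minimum, at the cost of a proportionally longer run time.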
Once k, a and d are estimated, one can use several methods to estimate $\pi_{mcar}$: it is estimated
by
;
Value
A list composed of:
abs.pi |
A numeric matrix containing the intervals used for estimating the ratio $(1 - \hat{F}_{na}(x)) / (1 - \hat{F}_{tot}(x))$ in each column. |
pi.init |
A numeric matrix containing the estimated ratios $(1 - \hat{F}_{na}(x)) / (1 - \hat{F}_{tot}(x))$, where $x$ belongs to abs.pi. |
var.pi.init |
A numeric matrix containing the estimated asymptotic variances of the ratios in pi.init. |
trend.pi.init |
A numeric matrix containing the estimated trend of the model used in the minimization algorithm. |
abs.mod |
A numeric vector containing the interval used for estimating the mixture models in each column. |
pi.na |
A numeric vector containing the proportions of missing values in each column. |
F.na |
A numeric matrix containing the estimated cumulative distribution functions of missing values in each column on the interval abs.mod. |
F.tot |
A numeric matrix containing the estimated cumulative distribution functions of complete values in each column on the interval abs.mod. |
F.obs |
A numeric matrix containing the estimated cumulative distribution functions of observed values in each column on the interval abs.mod. |
pi.mcar |
A numeric vector containing the estimations of the proportion of MCAR values in each column. |
MinRes |
A numeric matrix containing the three parameters of the model used in the minimization algorithm (first three rows) and the value of the minimized function. |
Author(s)
Quentin Giai Gianetto <quentin2g@yahoo.fr>
See Also
Examples
#Simulating data
res.sim=sim.data(nb.pept=2000,nb.miss=600);
#Imputation of missing values with an MCAR-devoted algorithm: here the slsa algorithm
dat.slsa=impute.slsa(tab=res.sim$dat.obs,conditions=res.sim$condition,repbio=res.sim$repbio);
#Estimation of the mixture model
res=estim.mix(tab=res.sim$dat.obs, tab.imp=dat.slsa, conditions=res.sim$condition);