IMIX {IMIX}

R Documentation

IMIX

Description

Fits a multivariate mixture model framework, selects the best model, and applies an adaptive procedure for FDR control. The input is summary statistics (z scores or p values) for two or three data types.
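The model-selection step can be sketched in base R. This is a minimal illustration assuming the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n (k free parameters, n genes). The log-likelihoods and parameter counts are made-up numbers, and `select_model` is a hypothetical helper, not a function exported by IMIX.

```r
# Hypothetical helper: pick the model with the smallest information criterion.
select_model <- function(loglik, n_par, n, method = c("BIC", "AIC")) {
  method <- match.arg(method)
  # BIC penalizes each free parameter by log(n); AIC by 2
  penalty <- if (method == "BIC") log(n) else 2
  ic <- -2 * loglik + penalty * n_par
  list(ic = ic, selected = names(which.min(ic)))
}

# Made-up values for illustration only (not IMIX output)
loglik <- c(IMIX_ind = -1520.3, IMIX_cor_twostep = -1512.8, IMIX_cor = -1510.9)
n_par  <- c(IMIX_ind = 11, IMIX_cor_twostep = 17, IMIX_cor = 25)

select_model(loglik, n_par, n = 1000, method = "BIC")$selected
# [1] "IMIX_ind"
select_model(loglik, n_par, n = 1000, method = "AIC")$selected
# [1] "IMIX_cor_twostep"
```

With these made-up values BIC, whose per-parameter penalty grows with n, favours the simplest model, while AIC favours IMIX_cor_twostep; this is why the choice of model_selection_method can change the selected model.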

Usage

IMIX(
  data_input,
  data_type = c("p", "z"),
  mu_ini = NULL,
  sigma_ini = NULL,
  p_ini = NULL,
  tol = 1e-06,
  maxiter = 1000,
  seed = 10,
  ini.ind = TRUE,
  model = c("all", "IMIX_ind", "IMIX_cor_twostep", "IMIX_cor_restrict", "IMIX_cor"),
  model_selection_method = c("BIC", "AIC"),
  alpha = 0.2,
  verbose = FALSE,
  sort_label = TRUE
)

Arguments

`data_input`	An n x d data frame or matrix of summary statistics (z scores or p values), where n is the number of genes and d is the number of data types. Each row is a gene and each column is a data type.
`data_type`	Whether the input data are p values ("p") or z scores ("z"); default is p values
`mu_ini`	Initial values for the means of the two components in each data type under the independent mixture model. A vector of length 2*d, where d is the number of data types, ordered as null then alternative for each data type: for example, if d=3, (null_1,alternative_1,null_2,alternative_2,null_3,alternative_3).
`sigma_ini`	Initial values for the standard deviations of the two components in each data type. A vector of length 2*d, where d is the number of data types, in the same order as mu_ini: for example, if d=3, (null_1,alternative_1,null_2,alternative_2,null_3,alternative_3).
`p_ini`	Initial values for the mixing proportions of the two components in each data type. A vector of length 2*d, where d is the number of data types, in the same order as mu_ini: for example, if d=3, (null_1,alternative_1,null_2,alternative_2,null_3,alternative_3).
`tol`	The convergence criterion. Convergence is declared when the observed-data log-likelihood increases by less than tol between iterations; default is 1e-06
`maxiter`	The maximum number of iterations, default is 1000
`seed`	Random seed passed to set.seed(), default is 10
`ini.ind`	Whether to use the parameters estimated by IMIX_ind as initial values for the other IMIX models, default is TRUE
`model`	Which model(s) to fit, default is "all"
`model_selection_method`	Information criterion used for model selection, AIC or BIC, default is BIC
`alpha`	Prespecified nominal level for global FDR control, default is 0.2
`verbose`	Whether to print the full log-likelihood for each iteration, default is FALSE
`sort_label`	Whether to sort the component labels in case the labels switched after convergence of the initial values, default is TRUE. If users set it to FALSE, they may need to check the converged means of the IMIX model of interest to identify the true component labels, and perform the adaptive FDR control separately to obtain accurate results
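The interleaved layout required by mu_ini, sigma_ini, and p_ini can be built from separate null and alternative vectors. This is a minimal sketch; `interleave_init` is a hypothetical helper, not an IMIX function. It relies on R storing matrices column by column, so flattening a 2 x d matrix whose first row holds the null values and second row the alternative values gives (null_1, alternative_1, ..., null_d, alternative_d).

```r
# Hypothetical helper: interleave per-data-type null/alternative values.
interleave_init <- function(null, alternative) {
  stopifnot(length(null) == length(alternative))
  # Row 1 = null values, row 2 = alternative values; as.vector() reads the
  # 2 x d matrix column by column: null_1, alt_1, null_2, alt_2, ...
  as.vector(rbind(null, alternative))
}

d <- 3  # number of data types
mu_input    <- interleave_init(null = rep(0, d),   alternative = rep(3, d))
sigma_input <- interleave_init(null = rep(1, d),   alternative = rep(1, d))
p_input     <- interleave_init(null = rep(0.5, d), alternative = rep(0.5, d))
mu_input
# [1] 0 3 0 3 0 3
```

With d = 2 the same helper reproduces the `mu_input <- c(0,3,0,3)` used in the examples.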

Value

A list of IMIX results with the following components:

`IMIX_ind`	Results of IMIX_ind, assuming all data types are independent
`IMIX_cor_twostep`	Results of IMIX_cor_twostep; by default the mean is the value estimated by IMIX_ind. To use a different mean, call the function IMIX_cor_twostep directly and specify the mean
`IMIX_cor`	Results of IMIX_cor
`IMIX_cor_restrict`	Results of IMIX_cor_restrict
`AIC/BIC`	The AIC and BIC values of all fitted models
`Selected Model`	The model with the smallest AIC or BIC value, according to the function input "model_selection_method" (BIC by default)
`significant_genes_with_FDRcontrol`	The output for each gene, ordered by component based on FDR control and, within each component, by local FDR. "localFDR" is 1 minus the posterior probability of the component with the maximum posterior probability for that gene; "class_withoutFDRcontrol" is the component assigned by maximum posterior probability; "class_FDRcontrol" is the component assigned by the across-data-type FDR control at level alpha
`estimatedFDR`	The estimated marginal FDR value for each component starting from component 2 (component 1 is the global null)
`alpha`	Prespecified nominal level for the across-data-type FDR control
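The relationship between the maximum-posterior classes, "localFDR", and the alpha-level control can be sketched as follows. The sketch assumes an n x K matrix of posterior probabilities with component 1 as the global null, and uses the common cumulative-mean rule: within each non-null component, keep the largest set of genes with the smallest local FDRs whose average local FDR is at most alpha. Whether IMIX implements its adaptive procedure exactly this way is an assumption, and `classify_with_fdr` is a hypothetical helper.

```r
# Hypothetical helper. post: n x K posterior probabilities, column 1 = global null.
classify_with_fdr <- function(post, alpha = 0.2) {
  # Maximum-posterior class and its local FDR (1 - that posterior)
  class_noFDR <- max.col(post, ties.method = "first")
  localFDR <- 1 - post[cbind(seq_len(nrow(post)), class_noFDR)]
  class_FDR <- rep(1L, nrow(post))  # start every gene in the global null
  est_fdr <- numeric(0)
  for (k in seq_len(ncol(post))[-1]) {  # non-null components only
    idx <- which(class_noFDR == k)
    if (length(idx) == 0) next
    ord <- idx[order(localFDR[idx])]
    # Running mean of sorted local FDRs is nondecreasing
    cum_mean <- cumsum(localFDR[ord]) / seq_along(ord)
    n_keep <- sum(cum_mean <= alpha)
    if (n_keep > 0) {
      class_FDR[ord[seq_len(n_keep)]] <- k
      est_fdr[as.character(k)] <- cum_mean[n_keep]  # estimated FDR for component k
    }
  }
  list(localFDR = localFDR, class_withoutFDRcontrol = class_noFDR,
       class_FDRcontrol = class_FDR, estimatedFDR = est_fdr)
}

post <- rbind(c(0.90, 0.05, 0.05),
              c(0.02, 0.95, 0.03),
              c(0.10, 0.85, 0.05),
              c(0.30, 0.60, 0.10),
              c(0.05, 0.05, 0.90))
res <- classify_with_fdr(post, alpha = 0.12)
res$class_FDRcontrol
# [1] 1 2 2 1 3
```

Here gene 4 is assigned to component 2 by maximum posterior probability but returns to the global null once the alpha-level control is applied.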

References

Ziqiao Wang and Peng Wei. 2020. “IMIX: a multivariate mixture model approach to association analysis through multi-omics data integration.” Bioinformatics. <doi:10.1093/bioinformatics/btaa1001>.

Tatiana Benaglia, Didier Chauveau, David R. Hunter, and Derek Young. 2009. “mixtools: An R Package for Analyzing Finite Mixture Models.” Journal of Statistical Software 32 (6): 1–29. https://www.jstatsoft.org/v32/i06/.

Examples

# A toy example
data("data_p")
set.seed(10)
data <- data_p[sample(1:1000,200,replace = FALSE),]
mu_input <- c(0,3,0,3)
sigma_input <- rep(1,4)
p_input <- rep(0.5,4)
test <- IMIX(data_input = data, data_type = "p", mu_ini = mu_input, sigma_ini = sigma_input,
             p_ini = p_input, alpha = 0.1, model_selection_method = "BIC",
             sort_label = FALSE, model = "IMIX_ind")


# Details of this example can be found in the GitHub vignette
# First load the data
data("data_p")

# Specify initial values (this step could be omitted)
mu_input <- c(0,3,0,3)
sigma_input <- rep(1,4)
p_input <- rep(0.5,4)

# Fit IMIX model
test1 <- IMIX(data_input = data_p, data_type = "p", mu_ini = mu_input, sigma_ini = sigma_input,
              p_ini = p_input, alpha = 0.1, model_selection_method = "AIC")

# Results
# Print the estimated across-data-type FDR for each component
test1$estimatedFDR

# The AIC and BIC values for each model
test1$`AIC/BIC` 

# The best fitted model selected by AIC
test1$`Selected Model` 

# The output of IMIX_cor_twostep
str(test1$IMIX_cor_twostep) 

# The output of genes with local FDR values and classified components
dim(test1$significant_genes_with_FDRcontrol)
head(test1$significant_genes_with_FDRcontrol)

[Package IMIX version 1.1.5 Index]