R: Selects genes using the EMMIXgene algorithm.

select_genes {EMMIXgene}

R Documentation

Selects genes using the EMMIXgene algorithm.

Description

Follows the gene selection methodology of G. J. McLachlan, R. W. Bean, and D. Peel, "A mixture model-based approach to the clustering of microarray expression data", Bioinformatics, Volume 18, Issue 3, 1 March 2002, Pages 413–422, https://doi.org/10.1093/bioinformatics/18.3.413

Usage

select_genes(
  dat,
  filename,
  random_starts = 4,
  max_it = 100,
  ll_thresh = 8,
  min_clust_size = 8,
  tol = 1e-04,
  start_method = "both",
  three = FALSE
)

Arguments

`dat`	A matrix or data frame containing gene expression data. Rows are genes and columns are samples. One of dat and filename must be supplied.
`filename`	Name of a file containing gene expression data, either a .csv file or a space-separated .dat file. Rows are genes and columns are samples. One of dat and filename must be supplied.
`random_starts`	The number of random initializations used per gene when fitting mixtures of t-distributions. Initialization uses k-means by default.
`max_it`	The maximum number of iterations per mixture fit. Default value is 100.
`ll_thresh`	The difference in -2 log lambda used as a threshold for selecting between g=1 and g=2 for each gene. Default value is 8, which was chosen arbitrarily in the original paper.
`min_clust_size`	The minimum number of observations per cluster used when fitting mixtures of t-distributions for each gene. Default value is 8.
`tol`	Tolerance value used for detecting convergence of EMMIX fits.
`start_method`	The method used to initialize each fit. Default value is "both". Use "random" for purely random starts.
`three`	Also test g=2 vs g=3 where appropriate. Defaults to FALSE.
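
As a rough illustration of how ll_thresh is described above, the sketch below computes -2 log lambda from two maximized log-likelihoods per gene and compares it with the default threshold of 8. The log-likelihood values and gene names are made up for illustration, and the "greater than" comparison is an assumption about how the threshold is applied. The sketch ignores min_clust_size and does not call EMMIXgene itself.

```r
# Hypothetical maximized log-likelihoods for three genes (illustrative values,
# not output from EMMIXgene).
loglik_g1 <- c(gene_a = -120.4, gene_b = -98.7, gene_c = -150.2)
loglik_g2 <- c(gene_a = -110.1, gene_b = -97.9, gene_c = -149.0)

# -2 log lambda: twice the gain in log-likelihood when moving from g = 1 to g = 2.
minus2_log_lambda <- 2 * (loglik_g2 - loglik_g1)

ll_thresh <- 8  # default value from the Usage section

# Assumed rule: prefer g = 2 only when the statistic exceeds the threshold.
g <- ifelse(minus2_log_lambda > ll_thresh, 2L, 1L)
print(data.frame(minus2_log_lambda, g))
```

Under these made-up values only gene_a clears the threshold, so it is the only gene assigned g = 2.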

Value

An EMMIXgene object containing:

`stat`	The difference in log-likelihood for g=1 and g=2 for each gene (or for g=2 and g=3 where relevant).
`g`	The selected number of components for each gene.
`it`	The number of iterations for each gene's selected fit.
`selected`	An indicator of each gene's selected status.
`ranks`	Selected gene IDs ranked by stat.
`genes`	A dataframe of selected genes.
`all_genes`	The full input data: dat or the contents of filename.
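
The components listed above can be combined into a short table of selected genes. The sketch below is a minimal example under stated assumptions: summarise_selection is a hypothetical helper, not part of EMMIXgene, and it assumes that stat, g, and selected are per-gene vectors of equal length that can be read with `$`. It runs on a mock list with made-up values rather than on real select_genes() output.

```r
# Hypothetical helper: tabulate the selected genes, largest statistic first.
# Assumes `stat`, `g`, and `selected` are parallel per-gene vectors.
summarise_selection <- function(sel) {
  keep <- which(as.logical(sel$selected))
  out <- data.frame(gene = keep, stat = sel$stat[keep], g = sel$g[keep])
  out[order(out$stat, decreasing = TRUE), , drop = FALSE]
}

# Mock object with made-up values, standing in for select_genes() output.
mock_sel <- list(
  stat = c(20.6, 1.6, 12.3, 2.4),
  g = c(2, 1, 2, 1),
  selected = c(TRUE, FALSE, TRUE, FALSE)
)

top <- summarise_selection(mock_sel)
print(top)
```

If the real object behaves like a list, the same helper could be applied to it directly, but that should be checked against the installed package version.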

Examples

# Only run on the first 100 genes for speed
alon_sel <- select_genes(alon_data[seq_len(100), ])

[Package EMMIXgene version 0.1.4 Index]