select_genes {EMMIXgene}    R Documentation
Selects genes using the EMMIXgene algorithm.
Description
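Follows the gene selection methodology of G. J. McLachlan, R. W. Bean and D. Peel, "A mixture model-based approach to the clustering of microarray expression data", Bioinformatics, Volume 18, Issue 3, 1 March 2002, Pages 413–422, https://doi.org/10.1093/bioinformatics/18.3.413. For each gene, mixtures of t-distributions with one and two components are fitted, and the likelihood ratio statistic -2 log lambda is compared against ll_thresh to choose between g = 1 and g = 2.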
Usage
select_genes(
dat,
filename,
random_starts = 4,
max_it = 100,
ll_thresh = 8,
min_clust_size = 8,
tol = 1e-04,
start_method = "both",
three = FALSE
)
Arguments
dat
A matrix or data frame containing gene expression data, with genes as rows and samples as columns. One of dat or filename must be supplied.
filename
Name of a file containing gene expression data, either a .csv file or a space-separated .dat file, with genes as rows and samples as columns. One of dat or filename must be supplied.
random_starts
The number of random initializations used per gene when fitting mixtures of t-distributions. Default value is 4. Initialization uses k-means by default (see start_method).
max_it
The maximum number of iterations per mixture fit. Default value is 100.
ll_thresh
The threshold on the -2 log lambda statistic used to choose between g = 1 and g = 2 for each gene. Default value is 8, which was chosen arbitrarily in the original paper.
min_clust_size
The minimum number of observations per cluster used when fitting mixtures of t-distributions for each gene. Default value is 8.
tol
Tolerance used to detect convergence of the EMMIX fits. Default value is 1e-04.
start_method
Method used to initialize the mixture fits. Default value is "both"; set to "random" to use purely random starts.
three
If TRUE, also test g = 2 versus g = 3 where appropriate. Defaults to FALSE.
Value
An EMMIXgene object containing:
stat
The log-likelihood difference between the g = 1 and g = 2 fits for each gene (or between the g = 2 and g = 3 fits, where relevant).
g
The selected number of components for each gene.
it
The number of iterations used by the selected fit for each gene.
selected
An indicator of whether each gene was selected.
ranks
The IDs of the selected genes, ranked by stat.
genes
A data frame of the selected genes.
all_genes
The input data: dat, or the contents of filename.
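The relationship between stat, selected and ranks can be sketched in base R on toy values. This is a simplified, hypothetical reconstruction rather than the package's code: it assumes a gene is selected when its statistic exceeds ll_thresh (ignoring the cluster-size check) and that "ranked by stat" means from largest to smallest statistic.

```r
# Toy statistics, for illustration only (not output from select_genes)
stat <- c(gene1 = 3.1, gene2 = 12.4, gene3 = 9.7, gene4 = 25.0)
ll_thresh <- 8

# Simplifying assumption: a gene is selected when its statistic exceeds the threshold
selected <- stat > ll_thresh

# Indices of the selected genes, ordered from largest to smallest statistic
ranks <- which(selected)[order(stat[selected], decreasing = TRUE)]
ranks  # gene4 = 4, gene2 = 2, gene3 = 3
```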
Examples
# Run on the first 100 genes only, for speed
alon_sel <- select_genes(alon_data[seq_len(100), ])