R: Gene selection and filter function

gene_selection {GSSTDA}

R Documentation

Gene selection and filter function

Description

Selects genes and calculates the values of the filter function. After fitting a Cox proportional hazards model to each gene, this function selects genes according to both their variability within the database and their relationship with survival. The values of the filter function are then calculated for each patient using the selected genes. The filter function summarises each patient's vector as a single value and takes into account the survival association of each gene. Specifically, each element of the patient's column vector is weighted by the Z score obtained for that gene in the Cox proportional hazards model, and the filter value is the Lp norm of the resulting vector, optionally raised to a power k.
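
The weighting and norm described above can be sketched in base R. This is only an illustration: the helper name `lp_filter_sketch` and its arguments `z`, `p` and `k` are hypothetical and not part of the GSSTDA interface, and the package's own implementation may differ in detail.

```r
# Hypothetical helper (not a GSSTDA function): Z-weighted Lp norm per patient.
# x: genes-by-patients matrix; z: one Cox Z score per gene (one per row of x).
lp_filter_sketch <- function(x, z, p = 2, k = 1) {
  stopifnot(nrow(x) == length(z), p >= 1)
  # z is recycled down each column, so row i of x is scaled by z[i].
  weighted <- x * z
  # Lp norm of each column (patient), then raised to the power k.
  norms <- colSums(abs(weighted)^p)^(1 / p)
  norms^k
}

# Toy input: 2 genes (rows), 2 patients (columns).
x <- matrix(c(3, 0, 4, 1), nrow = 2)
z <- c(1, -2)
filter_values <- lp_filter_sketch(x, z)  # patient 1: 3, patient 2: sqrt(20)
```

With `p = 2` and `k = 1` this is the Euclidean length of the weighted vector; larger `k` stretches the differences between patients.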

Usage

gene_selection(data_object, gen_select_type, percent_gen_select, na.rm = TRUE)

Arguments

`data_object`	Object with the following elements:
	`full_data`: Input matrix whose columns correspond to the patients and whose rows correspond to the genes.
	`survival_time`: Numerical vector of the same length as the number of columns of `full_data`. The patients must be in the same order as in `full_data`. For patients whose sample is pathological, give the time between disease diagnosis and the event (death, relapse or other). If the event has not occurred, give the time until the end of follow-up. Patients whose sample is from healthy tissue must have an `NA` value.
	`survival_event`: Numerical vector of the same length as the number of columns of `full_data`. The patients must be in the same order as in `full_data`. For patients with a pathological sample, indicate whether the event has occurred (1) or not (0). Only these values are valid, and healthy patients must have an `NA` value.
	`case_tag`: Character vector of the same length as the number of columns of `full_data`. The patients must be in the same order as in `full_data`. It indicates, for each patient, whether the sample is from pathological or healthy tissue. One value must denote healthy samples and another value must denote pathological samples; only these two values may appear in the vector. The user is then asked which of the two values denotes healthy samples.
`gen_select_type`	How to select the genes to be used in the mapper. With the "Abs" option, the genes with the highest absolute value are chosen. With the "Top_Bot" option, half of the selected genes are those with the highest value (positive value, i.e. worst survival prognosis) and the other half are those with the lowest value (negative value, i.e. best prognosis). Default: "Top_Bot".
`percent_gen_select`	Percentage (from 0 to 100) of genes to select for use in the mapper. Default: 10.
`na.rm`	`logical`. If `TRUE`, `NA` rows are omitted. If `FALSE`, an error occurs if there are `NA` rows. Default: `TRUE`.
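
The two selection modes can be illustrated with a small base R sketch. The helper `select_genes_sketch`, the named vector `score` and the rounding rule (rounding the gene count up, and giving the extra gene to the top half when the count is odd) are assumptions made for illustration; they are not taken from the GSSTDA source.

```r
# Hypothetical helper (not a GSSTDA function): choose genes from a named score vector.
select_genes_sketch <- function(score, percent = 10, type = c("Top_Bot", "Abs")) {
  type <- match.arg(type)
  n <- ceiling(length(score) * percent / 100)   # assumed rounding rule
  if (type == "Abs") {
    # Largest absolute values, regardless of sign.
    idx <- order(abs(score), decreasing = TRUE)[seq_len(n)]
  } else {
    # Half from the top (most positive) and half from the bottom (most negative).
    half <- ceiling(n / 2)
    ord <- order(score, decreasing = TRUE)
    idx <- c(head(ord, half), tail(ord, n - half))
  }
  names(score)[idx]
}

score <- c(g1 = 5, g2 = -4, g3 = 0.5, g4 = -0.1, g5 = 3,
           g6 = -6, g7 = 1, g8 = -1, g9 = 2, g10 = -2)
abs_genes <- select_genes_sketch(score, percent = 40, type = "Abs")
top_bot_genes <- select_genes_sketch(score, percent = 40, type = "Top_Bot")
```

In this toy case "Abs" keeps three negative-scoring genes out of four when they dominate in magnitude, whereas "Top_Bot" always balances the two tails.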

Value

A gene_selection_object. It contains:

the full_data without NA values (data)
the cox_all_matrix (a matrix with the results of fitting the proportional hazards models: the regression coefficients, the hazard ratios, the standard errors of the coefficients, the Z values (coef/se_coef) and the p-value for each Z value)
a vector with the names of the selected genes
the matrix of disease components with only the rows of the selected genes (genes_disease_component)
and the vector of the values of the filter function.
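
A table with the same kind of columns as cox_all_matrix can be produced with `survival::coxph`, fitting one univariate model per gene. The sketch below is an assumption about how such a table might be built, not the package's own code; `cox_table_sketch` is a hypothetical name, the data are simulated, and only pathological samples (no `NA` survival values) are assumed to be passed in.

```r
library(survival)

# Hypothetical helper (not a GSSTDA function): one univariate Cox fit per gene (row of x).
cox_table_sketch <- function(x, time, event) {
  rows <- lapply(seq_len(nrow(x)), function(i) {
    fit <- coxph(Surv(time, event) ~ x[i, ])
    # Columns: coef, exp(coef), se(coef), z, Pr(>|z|)
    summary(fit)$coefficients[1, ]
  })
  out <- do.call(rbind, rows)
  rownames(out) <- rownames(x)
  out
}

# Simulated toy data: 3 genes, 30 patients.
set.seed(1)
x <- matrix(rnorm(3 * 30), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), NULL))
time <- rexp(30)
event <- rbinom(30, 1, 0.7)
tab <- cox_table_sketch(x, time, event)
```

The `z` column equals `coef` divided by `se(coef)`, which is the Z value used to weight genes in the filter function.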

Examples


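# Assumes full_data, survival_time, survival_event and case_tag are
# already available in the session.
data_object <- list("full_data" = full_data, "survival_time" = survival_time,
                    "survival_event" = survival_event, "case_tag" = case_tag)
class(data_object) <- "data_object"
# Select 10% of the genes with the top/bottom option.
gene_selection_obj <- gene_selection(data_object,
                                     gen_select_type = "top_bot",
                                     percent_gen_select = 10)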
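
The example relies on four inputs that already satisfy the constraints listed under Arguments. As a rough sketch, simulated inputs of the right shape could be built as follows; all values, the labels "NT" and "T", and the dimensions are invented for illustration. The call to `gene_selection()` is left out here because the function asks the user interactively which `case_tag` value denotes healthy samples.

```r
set.seed(42)
n_genes <- 20
n_patients <- 12

# Genes in rows, patients in columns.
full_data <- matrix(rnorm(n_genes * n_patients), nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)),
                                    paste0("patient", seq_len(n_patients))))

# Exactly two labels (hypothetical): "NT" = healthy tissue, "T" = pathological.
case_tag <- rep(c("NT", "T"), times = c(4, 8))
is_disease <- case_tag == "T"

# Healthy samples carry NA for both survival vectors.
survival_time <- ifelse(is_disease, round(rexp(n_patients, rate = 1 / 24), 1), NA)
survival_event <- ifelse(is_disease, rbinom(n_patients, 1, 0.6), NA)

# Checks mirroring the constraints described under Arguments.
stopifnot(length(survival_time) == ncol(full_data),
          length(survival_event) == ncol(full_data),
          length(case_tag) == ncol(full_data),
          length(unique(case_tag)) == 2,
          all(survival_event[is_disease] %in% c(0, 1)),
          all(is.na(survival_time[!is_disease])),
          all(is.na(survival_event[!is_disease])))

data_object <- list("full_data" = full_data, "survival_time" = survival_time,
                    "survival_event" = survival_event, "case_tag" = case_tag)
class(data_object) <- "data_object"
```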

[Package GSSTDA version 1.0.0 Index]