dsga {GSSTDA}    R Documentation

Disease-Specific Genomic Analysis

Description

Disease-Specific Genomic Analysis (dsga). This analysis, developed by Nicolau et al., computes the "disease component" of an expression matrix. Linear models are used to remove the part of the data that is considered normal (healthy), so that only the component due to the disease is kept. It is intended as a preprocessing step before other techniques such as classification or clustering. For more information, see "Disease-specific genomic analysis: identifying the signature of pathologic biology" (doi: 10.1093/bioinformatics/btm033).
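The linear-model step described above can be illustrated with a minimal base-R sketch. It assumes a matrix normal_space whose columns span the healthy state (in dsga this is built from the normal tissue samples) and removes from each patient's profile its least-squares fit on those columns, keeping the residual. The helper disease_component_sketch() and the toy data are hypothetical; this is not the GSSTDA source code, and the steps that produce the normal space are omitted.

```r
# disease_component_sketch() is a hypothetical helper, not a GSSTDA function.
# Each column (patient) of full_data is fitted by least squares on the columns
# of normal_space (no intercept); the residual is kept as the disease component.
disease_component_sketch <- function(full_data, normal_space) {
  qr.resid(qr(normal_space), full_data)
}

set.seed(1)
n_genes <- 50
normal_space <- matrix(rnorm(n_genes * 3), nrow = n_genes)  # 3 "normal" directions
# Toy data: all 4 patients start inside the normal space; patient 4 then gets
# an extra disease-specific shift in genes 1-5.
full_data <- normal_space %*% matrix(rnorm(3 * 4), nrow = 3)
full_data[1:5, 4] <- full_data[1:5, 4] + 5
dc <- disease_component_sketch(full_data, normal_space)
round(colSums(dc^2), 6)  # only patient 4 has a clearly non-zero disease component
```

Because the residuals are orthogonal to every column of normal_space, a sample that is fully explained by the healthy state ends up with a disease component of (numerically) zero.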

Usage

dsga(
  full_data,
  survival_time,
  survival_event,
  case_tag,
  control_tag = NA,
  gamma = NA,
  na.rm = TRUE
)

Arguments

full_data

Input matrix whose columns correspond to the patients and rows to the genes.

survival_time

Numeric vector with one entry per column of full_data, with patients in the same order as in full_data. For patients with a pathological sample, give the time between disease diagnosis and the event (death, relapse or other); if the event has not occurred, give the time until the end of follow-up. Patients whose sample comes from healthy tissue must have an NA value.

survival_event

Numeric vector with one entry per column of full_data, with patients in the same order as in full_data. For patients with a pathological sample, indicate whether the event has occurred (1) or not (0); these are the only valid values. Patients with a healthy sample must have an NA value.

case_tag

Character vector with one entry per column of full_data, with patients in the same order as in full_data. For each patient it indicates whether the sample comes from pathological or healthy tissue: one value marks healthy samples and another marks pathological samples. Exactly two distinct values are allowed. If control_tag is NA (the default), the user will be asked which of the two values denotes healthy samples.

control_tag

Tag that identifies healthy samples in case_tag, e.g. "T".
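The requirements on survival_time, survival_event, case_tag and control_tag can be summarised as a few checks. The sketch below builds hypothetical toy inputs for six patients and validates them with check_dsga_inputs(), a hypothetical helper that is not part of GSSTDA and only encodes the rules stated above. It additionally assumes that pathological samples must not have a missing survival time.

```r
# Hypothetical toy inputs: 6 patients, 3 healthy and 3 pathological.
set.seed(42)
full_data <- matrix(rnorm(20 * 6), nrow = 20,
                    dimnames = list(paste0("gene", 1:20), paste0("patient", 1:6)))
case_tag    <- c("healthy", "healthy", "healthy", "tumour", "tumour", "tumour")
control_tag <- "healthy"
# Healthy patients carry NA in both survival vectors.
survival_time  <- c(NA, NA, NA, 12.5, 30, 7)
survival_event <- c(NA, NA, NA, 1, 0, 1)

# check_dsga_inputs() is hypothetical: it only restates the documented rules.
check_dsga_inputs <- function(full_data, survival_time, survival_event,
                              case_tag, control_tag) {
  n <- ncol(full_data)
  stopifnot(
    length(survival_time) == n,       # one entry per patient (column)
    length(survival_event) == n,
    length(case_tag) == n,
    length(unique(case_tag)) == 2,    # exactly two tags in total
    control_tag %in% case_tag
  )
  healthy <- case_tag == control_tag
  stopifnot(
    all(is.na(survival_time[healthy])),            # healthy: NA time
    all(is.na(survival_event[healthy])),           # healthy: NA event
    !anyNA(survival_time[!healthy]),               # assumption, see lead-in
    all(survival_event[!healthy] %in% c(0, 1))     # pathological: 0 or 1 only
  )
  invisible(TRUE)
}

check_dsga_inputs(full_data, survival_time, survival_event, case_tag, control_tag)
```

With GSSTDA installed, the same objects could then be passed as dsga(full_data, survival_time, survival_event, case_tag, control_tag = control_tag); giving control_tag explicitly should avoid the question described under case_tag.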

gamma

Magnitude of the noise assumed in the flattened data matrix used to build the Healthy State Model. If NA (the default), the noise magnitude is treated as unknown.
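As an illustration of how a noise magnitude can drive the construction of a low-rank model, the sketch below denoises a matrix by keeping only the singular values above a threshold. It assumes the Gavish-Donoho optimal hard threshold and treats gamma as the per-entry noise standard deviation; both are assumptions, since the exact rule dsga applies for a given gamma is not described here. denoise_sketch() is hypothetical.

```r
# denoise_sketch() is hypothetical. Assumption: Gavish-Donoho optimal hard
# thresholding of singular values, with gamma read as the noise standard deviation.
denoise_sketch <- function(m, gamma = NA) {
  s <- svd(m)
  n_big <- max(dim(m))
  beta <- min(dim(m)) / n_big  # aspect ratio, 0 < beta <= 1
  if (is.na(gamma)) {
    # Unknown noise: scale the median singular value by the polynomial
    # approximation omega(beta).
    omega <- 0.56 * beta^3 - 0.95 * beta^2 + 1.82 * beta + 1.43
    tau <- omega * median(s$d)
  } else {
    # Known noise level: tau = lambda(beta) * sqrt(n) * gamma.
    lambda <- sqrt(2 * (beta + 1) +
                     8 * beta / ((beta + 1) + sqrt(beta^2 + 14 * beta + 1)))
    tau <- lambda * sqrt(n_big) * gamma
  }
  keep <- s$d > tau
  # Rebuild the matrix from the retained singular triplets only.
  s$u[, keep, drop = FALSE] %*% diag(s$d[keep], nrow = sum(keep)) %*%
    t(s$v[, keep, drop = FALSE])
}

# Toy example: a rank-2 signal plus small Gaussian noise.
set.seed(7)
signal <- 3 * matrix(rnorm(100 * 2), nrow = 100) %*% matrix(rnorm(2 * 20), nrow = 2)
noisy  <- signal + matrix(rnorm(100 * 20, sd = 0.1), nrow = 100)
denoised <- denoise_sketch(noisy)
qr(denoised)$rank
```

With gamma = NA the threshold is estimated from the data (median singular value), which mirrors the "noise magnitude unknown" case; supplying a value fixes the threshold directly.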

na.rm

Logical. If TRUE (the default), rows containing NA values are omitted. If FALSE, an error is raised when any row contains NA values.
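A minimal sketch of the na.rm behaviour, assuming that a row counts as an NA row when it contains at least one NA value. drop_na_rows() is hypothetical and not part of GSSTDA.

```r
# drop_na_rows() is hypothetical: it mimics the documented na.rm semantics.
drop_na_rows <- function(full_data, na.rm = TRUE) {
  has_na <- rowSums(is.na(full_data)) > 0  # rows with at least one NA
  if (any(has_na) && !na.rm) {
    stop("full_data contains rows with NA values")
  }
  full_data[!has_na, , drop = FALSE]
}

# 3 genes x 2 patients; gene 2 has a missing value for patient 1.
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)
drop_na_rows(m)
```

Using drop = FALSE keeps the result a matrix even when only one gene survives the filter.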

Value

A dsga object. It contains: full_data without NA values; the label designating healthy samples (control_tag); the case_tag vector without NA values; survival_event; survival_time; the matrix of the normal space (the linear space generated from the normal tissue samples); and the matrix of disease components (the transformed full_data matrix from which the normal component has been removed).

Examples


dsga_obj <- dsga(full_data,  survival_time, survival_event, case_tag)

[Package GSSTDA version 1.0.0 Index]