R: Methods for selecting genes

geneSelection {Patterns}

R Documentation

Methods for selecting genes

Description

Selection of differentially expressed genes.

Usage

## S4 method for signature 'omics_array,omics_array,numeric'
geneSelection(
  x,
  y,
  tot.number,
  data_log = TRUE,
  wanted.patterns = NULL,
  forbidden.patterns = NULL,
  peak = NULL,
  alpha = 0.05,
  Design = NULL,
  lfc = 0
)

## S4 method for signature 'list,list,numeric'
geneSelection(
  x,
  y,
  tot.number,
  data_log = TRUE,
  alpha = 0.05,
  cont = FALSE,
  lfc = 0,
  f.asso = NULL,
  return.diff = FALSE
)

## S4 method for signature 'omics_array,numeric'
genePeakSelection(
  x,
  peak,
  y = NULL,
  data_log = TRUE,
  durPeak = c(1, 1),
  abs_val = TRUE,
  alpha_diff = 0.05
)

Arguments

`x`	either a omics_array object or a list of omics_array objects. In the first case, the omics_array object represents the stimulated measurements. In the second case, the control unstimulated data (if present) should be the first element of the list.
`y`	either a omics_array object or a list of strings. In the first case, the omics_array object represents the stimulated measurements. In the second case, the list is the way to specify the contrast: First element: condition, condition&time or pattern. The condition specification is used when the overall is to compare two conditions. The condition&time specification is used when comparing two conditions at two precise time points. The pattern specification allows to decide which time point should be differentially expressed. Second element: a vector of length 2. The two conditions which should be compared. If a condition is used as control, it should be the first element of the vector. However, if this control is not measured throught time, the option cont=TRUE should be used. Third element: depends on the first element. It is no needed if condition has been specified. If condition&time has been specified, then this is a vector containing the time point at which the comparison should be done. If pattern has been specified, then this is a vector of 0 and 1 of length T, where T is the number of time points. The time points with desired differential expression are provided with 1.
`tot.number`	an integer. The number of selected genes. If tot.number <0 all differentially genes are selected. If tot.number > 1, tot.number is the maximum of diffenrtially genes that will be selected. If 0<tot.number<1, tot.number represents the proportion of diffenrentially genes that are selected.
`data_log`	logical (default to TRUE); should data be logged ?
`wanted.patterns`	a matrix with wanted patterns [only for geneSelection].
`forbidden.patterns`	a matrix with forbidden patterns [only for geneSelection].
`peak`	interger. At which time points measurements should the genes be selected [optionnal for geneSelection].
`alpha`	float; the risk level. Default to 'alpha=0.05'
`Design`	the design matrix of the experiment. Defaults to 'NULL'.
`lfc`	log fold change value used in limma's 'topTable'. Defaults to 0.
`cont`	use contrasts. Defaults to 'FALSE'.
`f.asso`	function used to assess the association between the genes. The default value 'NULL' implies the use of the usual 'mean' function.
`return.diff`	[FALSE] if TRUE then the function returns the stimulated expression of the differentially expressed genes
`durPeak`	vector of size 2 (default to c(1,1)) ; the first elements gives the length of the peak at the left, the second at the right. [only for genePeakSelection]
`abs_val`	logical (default to TRUE) ; should genes be selected on the basis of their absolute value expression ? [only for genePeakSelection]
`alpha_diff`	float; the risk level

Value

A omics_array object.

Author(s)

Frédéric Bertrand , Myriam Maumy-Bertrand.

Examples



 if(require(CascadeData)){
	data(micro_US)
	micro_US<-as.omics_array(micro_US,time=c(60,90,210,390),subject=6)
	data(micro_S)
	micro_S<-as.omics_array(micro_S,time=c(60,90,210,390),subject=6)

  #Basically, to find the 50 more significant expressed genes you will use:
  Selection_1<-geneSelection(x=micro_S,y=micro_US,
  tot.number=50,data_log=TRUE)
  summary(Selection_1)
  
  #If we want to select genes that are differentially 
  #at time t60 or t90 :
  Selection_2<-geneSelection(x=micro_S,y=micro_US,tot.number=30,
  wanted.patterns=
  rbind(c(0,1,0,0),c(1,0,0,0),c(1,1,0,0)))
  summary(Selection_2)

  #To select genes that have a differential maximum of expression at a specific time point.
  
  Selection_3<-genePeakSelection(x=micro_S,y=micro_US,peak=1,
  abs_val=FALSE,alpha_diff=0.01)
  summary(Selection_3)
  }

if(require(CascadeData)){
data(micro_US)
micro_US<-as.omics_array(micro_US,time=c(60,90,210,390),subject=6)
data(micro_S)
micro_S<-as.omics_array(micro_S,time=c(60,90,210,390),subject=6)
#Genes with differential expression at t1
Selection1<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(1,0,0,0)))
#Genes with differential expression at t2
Selection2<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(0,1,0,0)))
#Genes with differential expression at t3
Selection3<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(0,0,1,0)))
#Genes with differential expression at t4
Selection4<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(0,0,0,1)))
#Genes with global differential expression 
Selection5<-geneSelection(x=micro_S,y=micro_US,20)

#We then merge these selections:
Selection<-unionOmics(list(Selection1,Selection2,Selection3,Selection4,Selection5))
print(Selection)

#Prints the correlation graphics Figure 4:
summary(Selection,3)

##Uncomment this code to retrieve geneids.
#library(org.Hs.eg.db)
#
#ff<-function(x){substr(x, 1, nchar(x)-3)}
#ff<-Vectorize(ff)
#
##Here is the function to transform the probeset names to gene ID.
#
#library("hgu133plus2.db")
#
#probe_to_id<-function(n){  
#x <- hgu133plus2SYMBOL
#mp<-mappedkeys(x)
#xx <- unlist(as.list(x[mp]))
#genes_all = xx[(n)]
#genes_all[is.na(genes_all)]<-"unknown"
#return(genes_all)
#}
#Selection@name<-probe_to_id(Selection@name)
  }

[Package Patterns version 1.5 Index]