sim.data {penalizedSVM}

R Documentation

Simulation of microarray data

Description

Simulates 'n' samples. Each sample has 'ng' genes, of which only 'nsg' are called significant and influence the class labels. The remaining '(ng - nsg)' genes are called balanced. All gene ratios are drawn from a multivariate normal distribution. Blocks of highly correlated genes can optionally be created.

Usage

sim.data(n = 256, ng = 1000, nsg = 100,
		 p.n.ratio = 0.5, 
		 sg.pos.factor= 1, sg.neg.factor= -1,
		 # correlation info:
		 corr = FALSE, corr.factor = 0.8,
		 # block info:
		 blocks = FALSE, n.blocks = 6, nsg.block = 1, ng.block = 5, 
		 seed = 123, ...)

Arguments

`n`	number of samples; logistic regression works well if `n > 200`
`ng`	number of genes
`nsg`	number of significant genes
`p.n.ratio`	proportion of positive genes among the significant genes (default 0.5); e.g. 0.8 with `nsg = 10` gives 8 positive and 2 negative significant genes, as in the Examples
`sg.pos.factor`	impact factor of positive significant genes on the classification (default 1)
`sg.neg.factor`	impact factor of negative significant genes on the classification (default -1)
`corr`	are the genes correlated with each other? (default FALSE). See Details
`corr.factor`	correlation factor for genes, between 0 and 1 (default 0.8)
`blocks`	are blocks of highly correlated genes allowed? (default FALSE)
`n.blocks`	number of blocks
`nsg.block`	number of significant genes per block
`ng.block`	number of genes per block
`seed`	seed for the random number generator
`...`	additional argument(s)
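
The block arguments can be pictured with a small base R sketch. It is not the package's internal code: it only illustrates a block-diagonal correlation matrix in which every pair of genes inside a block shares the constant correlation `corr.factor`, as the Examples describe. Treating genes from different blocks as uncorrelated is an assumption made here, and the name `block.corr` is hypothetical.

```r
# Illustrative values taken from the defaults shown in Usage
n.blocks    <- 6
ng.block    <- 5
corr.factor <- 0.8

# One block: constant off-diagonal correlation, unit diagonal
block <- matrix(corr.factor, nrow = ng.block, ncol = ng.block)
diag(block) <- 1

# Block-diagonal matrix: copies of `block` along the diagonal, zeros elsewhere
# (zero correlation between blocks is an assumption of this sketch)
block.corr <- kronecker(diag(n.blocks), block)

dim(block.corr)        # 30 30
block.corr[1:5, 1:5]   # the first block
block.corr[1, 6]       # 0: genes 1 and 6 lie in different blocks
```

With `n.blocks = 6` and `ng.block = 5`, the blocks cover 30 genes in total; how the remaining genes are handled is not specified by this sketch.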

Details

If no blocks are defined (n.blocks=0 or blocks=FALSE) and corr=TRUE, a covariance matrix is created for all genes, with correlation decreasing as the distance between gene indices grows: cov(i,j) = cov(j,i) = corr.factor^|i-j|
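
A minimal base R sketch of such a decaying covariance matrix follows. It reads the exponent as the absolute index distance |i-j|, which the stated symmetry cov(i,j) = cov(j,i) implies; that reading is an assumption. The helper `decaying.cov` is hypothetical and not part of penalizedSVM, and the sampling step uses a Cholesky factor rather than whatever routine the package calls internally.

```r
# Hypothetical helper (not part of penalizedSVM):
# covariance that decays with the distance between gene indices
decaying.cov <- function(p, corr.factor = 0.8) {
  idx <- seq_len(p)
  corr.factor^abs(outer(idx, idx, "-"))
}

Sigma <- decaying.cov(5, corr.factor = 0.8)
round(Sigma[1, ], 3)
# 1.000 0.800 0.640 0.512 0.410

# Draw n samples from N(0, Sigma) using base R only
set.seed(123)
n <- 1000
R <- chol(Sigma)                      # upper triangular, t(R) %*% R equals Sigma
z <- matrix(rnorm(n * 5), nrow = n)   # n x p independent standard normals
x <- t(z %*% R)                       # genes in rows, samples in columns

# Empirical correlation between neighbouring genes, close to corr.factor
cor(t(x))[1, 2]
```

The result `x` follows the layout described under Value, with genes in rows and samples in columns.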

Value

`x`	matrix of simulated data. Genes in rows and samples in columns
`y`	named vector of class labels
`seed`	seed used for the simulation

Author(s)

Wiebke Werft, Natalia Becker

Examples


my.seed<-123

# 1. simulate 20 samples, with 100 genes in each. Only three genes
# (nsg = 3) have an impact on the class labels.
# All genes are assumed to be i.i.d. 
train<-sim.data(n = 20, ng = 100, nsg = 3, corr=FALSE, seed=my.seed )
str(train)

# 2. change the proportion between positive and negative significant genes 
#(from 0.5 to 0.8)
train<-sim.data(n = 20, ng = 100, nsg = 10, p.n.ratio = 0.8,  seed=my.seed )
rownames(train$x)[1:15]
# [1] "pos1" "pos2" "pos3" "pos4" "pos5" "pos6" "pos7" "pos8"
# [9] "neg1" "neg2" "bal1" "bal2" "bal3" "bal4" "bal5"

# 3. assume correlation within the positive significant genes,
# the negative significant genes and the 'balanced' genes separately.
train<-sim.data(n = 20, ng = 100, nsg = 10, corr=TRUE, seed=my.seed )
#cor(t(train$x[1:15,]))

# 4. add 6 blocks of 5 genes each and only one significant gene per block.
# All genes in a block are correlated with a constant correlation factor,
# corr.factor=0.8
train<-sim.data(n = 20, ng = 100, nsg = 6, corr=TRUE, corr.factor=0.8,
			 blocks=TRUE, n.blocks=6, nsg.block=1, ng.block=5, seed=my.seed )
str(train)
# first block
#cor(t(train$x[1:5,]))
# second block
#cor(t(train$x[6:10,]))

[Package penalizedSVM version 1.1.4 Index]