| sim.data {penalizedSVM} | R Documentation | 
Simulation of microarray data
Description
Simulation of 'n' samples. Each sample has 'ng' genes, of which only 'nsg' are called significant and have an influence on the class labels. All other '(ng - nsg)' genes are called balanced. All gene ratios are drawn from a multivariate normal distribution. It is possible to create blocks of highly correlated genes.
Usage
sim.data(n = 256, ng = 1000, nsg = 100,
		 p.n.ratio = 0.5, 
		 sg.pos.factor= 1, sg.neg.factor= -1,
		 # correlation info:
		 corr = FALSE, corr.factor = 0.8,
		 # block info:
		 blocks = FALSE, n.blocks = 6, nsg.block = 1, ng.block = 5, 
		 seed = 123, ...)
Arguments
n | 
  number of samples, logistic regression works well if   | 
ng | 
 number of genes  | 
nsg | 
 number of significant genes  | 
p.n.ratio | 
 proportion of positive genes among the significant genes; the remaining significant genes are negative (default 0.5)  | 
sg.pos.factor | 
 impact factor of positive significant genes on the classification (default 1)  | 
sg.neg.factor | 
 impact factor of negative significant genes on the classification (default -1)  | 
corr | 
 are the genes correlated with each other? (default FALSE). See Details  | 
corr.factor | 
 correlation factor for genes, between 0 and 1 (default 0.8)  | 
blocks | 
 are blocks of highly correlated genes allowed? (default FALSE)  | 
n.blocks | 
 number of blocks  | 
nsg.block | 
 number of significant genes per block  | 
ng.block | 
 number of genes per block  | 
seed | 
 seed  | 
... | 
 additional argument(s)  | 
Details
If no blocks are defined (n.blocks=0 or blocks=FALSE) and corr=TRUE, a covariance matrix is created for all genes, with the correlation decreasing as the distance between gene indices grows: cov(i,j) = cov(j,i) = corr.factor^|i-j|
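The decaying covariance structure described above can be sketched in base R. This is only an illustration under the assumption that the exponent is the absolute index distance |i-j|. The helper name decaying_cov is hypothetical and is not part of penalizedSVM, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of penalizedSVM): builds an ng x ng
# covariance matrix whose entries decay as corr.factor^|i - j|.
decaying_cov <- function(ng, corr.factor = 0.8) {
  idx <- seq_len(ng)
  # outer() gives the matrix of index differences i - j;
  # abs() makes the exponent, and hence the matrix, symmetric.
  corr.factor^abs(outer(idx, idx, "-"))
}

sigma <- decaying_cov(ng = 5, corr.factor = 0.8)
isSymmetric(sigma)  # TRUE
sigma[1, 3]         # 0.8^2 = 0.64
```

Neighbouring genes get covariance corr.factor, genes two positions apart get corr.factor^2, and so on, while the diagonal stays at 1.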
Value
x | 
 matrix of simulated data. Genes in rows and samples in columns  | 
y | 
 named vector of class labels  | 
seed | 
 seed  | 
Author(s)
Wiebke Werft, Natalia Becker
See Also
Examples
my.seed<-123
# 1. simulate 20 samples with 100 genes each. Only the first three genes
# (nsg = 3) have an impact on the class labels.
# All genes are assumed to be i.i.d. 
train<-sim.data(n = 20, ng = 100, nsg = 3, corr=FALSE, seed=my.seed )
# str() prints the structure itself; wrapping it in print() would add a stray NULL
str(train)
# 2. change the proportion between positive and negative significant genes 
#(from 0.5 to 0.8)
train<-sim.data(n = 20, ng = 100, nsg = 10, p.n.ratio = 0.8,  seed=my.seed )
rownames(train$x)[1:15]
#  [1] "pos1" "pos2" "pos3" "pos4" "pos5" "pos6" "pos7" "pos8" 
#  [9] "neg1" "neg2" "bal1" "bal2" "bal3" "bal4" "bal5"
# 3. assume correlation among the positive significant genes, 
# the negative significant genes and the 'balanced' genes separately. 
train<-sim.data(n = 20, ng = 100, nsg = 10, corr=TRUE, seed=my.seed )
#cor(t(train$x[1:15,]))
# 4. add 6 blocks of 5 genes each, with only one significant gene per block.
# All genes within a block are correlated with the constant correlation factor
# corr.factor=0.8
train<-sim.data(n = 20, ng = 100, nsg = 6, corr=TRUE, corr.factor=0.8,
			 blocks=TRUE, n.blocks=6, nsg.block=1, ng.block=5, seed=my.seed )
# str() prints the structure itself; wrapping it in print() would add a stray NULL
str(train)
# first block
#cor(t(train$x[1:5,]))
# second block
#cor(t(train$x[6:10,]))