normalizeThis {wrMisc} | R Documentation |
Normalize Data In Various Modes
Description
Generic normalization of 'dat' (by columns), multiple methods may be applied. The choice of normalization procedures must be done with care, plotting the data before and after normalization may be critical to understandig the initial data structure and the effect of the procedure applied. Inappropriate methods chosen may render interpretation of (further) results incorrect.
Usage
normalizeThis(
dat,
method = "mean",
refLines = NULL,
refGrp = NULL,
mode = "proportional",
trimFa = NULL,
minQuant = NULL,
sparseLim = 0.4,
nCombin = 3,
omitNonAlignable = FALSE,
maxFact = 10,
quantFa = NULL,
expFa = NULL,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
Arguments
dat |
matrix or data.frame of data to get normalized |
method |
(character) may be "mean","median","NULL","none", "trimMean", "rowNormalize", "slope", "exponent", "slope2Sections", "vsn"; When |
refLines |
(NULL or numeric) allows to consider only specific lines of 'dat' when determining normalization factors (all data will be normalized) |
refGrp |
Only the columns indicated will be used as reference, default all columns (integer or colnames) |
mode |
(character) may be "proportional", "additive";
decide if normalizatio factors will be applies as multiplicative (proportional) or additive; for log2-omics data |
trimFa |
(numeric, length=1) additional parameters for trimmed mean |
minQuant |
(numeric) only used with |
sparseLim |
(integer) only used with |
nCombin |
(NULL or integer) only used with |
omitNonAlignable |
(logical) only used with |
maxFact |
(numeric, length=2) only used with |
quantFa |
(numeric, length=2) additional parameters for quantiles to use with method='slope' |
expFa |
(numeric, length=1) additional parameters for method='exponent' |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
Details
In most cases of treating 'Omics'-data one works with the hypothesis that there are no global changes in the structure of all data/columns
Under this htpothesis it is very common to assume the the median (via the argument method
) of all samples (ie columns) should remain constant.
For examples samples/columns with less signal will be considered as having received 'accidentally' less material (eg due to the imprecision when transfering very small amounts of liquid samples).
In consequence, a sample having received only 95
Thus, all measures will be multiplied by 1/0.95 (apr 1.053) to compensate for supposed lack of staring material.
With the analysis of 'Omics'-data it is very common to work with data on log-scale.
In this case the argument mode
should be set to additive
, since adding a constant factor to log-data corresponds to a multiplicative factor on regular scale
Please note that (at this point) the methods 'slope', 'exponent', 'slope2Sections' and 'vsn' don't distinguish between additive and proportional modes, but take take the data 'as is'
(you may look at the original documenation for more details, see exponNormalize
, adjBy2ptReg
, justvsn
).
Normalization using method="rowNormalize"
runs rowNormalize
from this package.
In this case, the working hypothesis is, that all values in each row are expected to be the same.
This method could be applied when all series of values (ie columns) are replicate measurements of the same sample.
THere is also an option for treating sparse data (see argument sparseLim
), which may, hovere, consume much more comptational ressources,
in particular, when the value nCombin
is low (compared to the number of samples/columns).
Normalization using method="vsn"
runs justvsn
from vsn
(this requires a minimum of 42 rows of input-data and having the Bioconductor package vsn installed).
Note : Depending on the procedure chosen, the normalized data may appear on a different scale.
Value
This function returns a matrix of normalized data (same dimensions as input)
See Also
rowNormalize
, exponNormalize
, adjBy2ptReg
, justvsn
Examples
set.seed(2015); rand1 <- round(runif(300)+rnorm(300,0,2),3)
dat1 <- cbind(ser1=round(100:1+rand1[1:100]), ser2=round(1.2*(100:1+rand1[101:200])-2),
ser3=round((100:1 +rand1[201:300])^1.2-3))
dat1 <- cbind(dat1, ser4=round(dat1[,1]^seq(2,5,length.out=100)+rand1[11:110],1))
dat1[dat1 <1] <- NA
summary(dat1)
dat1[c(1:5,50:54,95:100),]
no1 <- normalizeThis(dat1, refGrp=1:3, meth="mean")
no2 <- normalizeThis(dat1, refGrp=1:3, meth="trimMean", trim=0.4)
no3 <- normalizeThis(dat1, refGrp=1:3, meth="median")
no4 <- normalizeThis(dat1, refGrp=1:3, meth="slope", quantFa=c(0.2,0.8))
dat1[c(1:10,91:100),]
cor(dat1[,3],rowMeans(dat1[,1:2],na.rm=TRUE), use="complete.obs") # high
cor(dat1[,4],rowMeans(dat1[,1:2],na.rm=TRUE), use="complete.obs") # bad
cor(dat1[c(1:10,91:100),4],rowMeans(dat1[c(1:10,91:100),1:2],na.rm=TRUE),use="complete.obs")
cor(dat1[,3],rowMeans(dat1[,1:2],na.rm=TRUE)^ (1/seq(2,5,length.out=100)),use="complete.obs")