R: Binning of z-scores and estimation of the probabilities in...

ztobins {repfdr}

R Documentation

Binning of z-scores and estimation of the probabilities in each bin for the null and non-null states.

Description

For each study, the function discretizes the z-scores into bins and estimates the probabilities in each bin for the null and non-null states.

The function can optionally produce diagnostic plots of the model fit (disabled by default). Inspect these for misfit between the model and the data before passing the function output to repfdr. The diagnostic plots are described under the `plot.diagnostics` argument below.
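
As a rough illustration of the discretization described above, the following base-R sketch bins the z-scores of one simulated study and computes the probability of each bin under a standard normal null. It is a minimal sketch under stated assumptions, not the internal code of `ztobins`: the equally spaced breaks, the number of breaks, and the names `n_breaks`, `bin_index`, `null_prob` and `obs_prob` are all hypothetical.

```r
# Illustrative sketch only; not the internal code of ztobins.
set.seed(1)
z <- rnorm(1000)                       # hypothetical z-scores for one study

n_breaks <- 21                         # assumed: 21 break points give 20 bins
breaks <- seq(min(z), max(z), length.out = n_breaks)

# Bin index (1..20) of every z-score; rightmost.closed puts max(z) in the last bin.
bin_index <- findInterval(z, breaks, rightmost.closed = TRUE)

# Probability of each bin under the N(0,1) null, renormalized over the covered range.
null_prob <- diff(pnorm(breaks))
null_prob <- null_prob / sum(null_prob)

# Observed proportion of z-scores per bin, for comparison with the null.
obs_prob <- tabulate(bin_index, nbins = n_breaks - 1) / length(z)
```

Comparing `obs_prob` with `null_prob` bin by bin gives a rough numerical counterpart to the histogram diagnostic: for purely null data the two vectors should be close.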

Usage

ztobins(zmat, n.association.status = 3, n.bins = 120, type = 0, df = 7,
                    central.prop = 0.5,
                    pi0=NULL,plot.diagnostics = FALSE,
                    trim.z=FALSE,trim.z.upper = 8,trim.z.lower = -8,
                    force.bin.number = FALSE,
                    pi.using.plugin = FALSE, pi.plugin.lambda = 0.05)

Arguments

`zmat`	Matrix of z-scores of the features (in rows) in each study (columns).
`n.association.status`	Either 2 for no-association/association, or 3 for no-association/negative-association/positive-association.
`n.bins`	Number of bins in the discretization of the z-score axis (the number of bins is `n.bins - 1`). If the number of z-scores per study is small, the function sets `n.bins` to a number lower than the default of 120 (approximately the square root of the number of z-scores). To override this cap and create a sparse discretization of the data, set `force.bin.number = TRUE`.
`type`	Type of fitting used for f; 0 is a natural spline, 1 is a polynomial, in either case with degrees of freedom `df` (so total degrees of freedom including the intercept is `df+1`).
`df`	Degrees of freedom for fitting the estimated density f(z).
`central.prop`	Central proportion of the z-scores used as the zero-assumption region for estimating pi0.
`pi0`	Controls the estimation of the proportion of null hypotheses. The default value is `NULL` (automatic estimation of pi0) for every study. Alternatively, supply a vector of values between 0 and 1, with length equal to the number of studies (columns of `zmat`). These values will be used for pi0.
`plot.diagnostics`	If set to `TRUE`, shows diagnostic plots of the density estimation for each study. The first plot is a histogram of the counts in each bin (displayed as white bars), with the fitted density in green. Pink bars represent the observed count in each bin minus the number of null hypotheses expected by the model (truncated at zero). Red and orange dashed lines represent the estimated densities of the non-null distributions fitted by the spline. A blue dashed line represents the density component of z-scores for null SNPs, N(0,1). The second plot is the normal Q-Q plot of the z-scores, converted to the normal scale using `qnorm`. In a valid graph, the points coincide with the displayed linear fit. A misfit with the linear fit can indicate either a null distribution that is not standard normal (a problem) or an extreme number of non-null p-values (the signal is not sparse; the output is still valid). A black dashed line marks the expected fit for the standard normal distribution (with a single black dot at the (0,0) point). If the linear fit of the Q-Q plot (red line) does not match the dashed black line, the null distribution of the data is not standard normal. Investigate any misfit in these two plots before using the output in `repfdr`. Default value is `FALSE`.
`trim.z`	If set to `TRUE`, z-scores above `trim.z.upper` or below `trim.z.lower` will be trimmed at their respective limits. Default value is `FALSE`.
`trim.z.upper`	Upper bound for trimming z-scores. Default value is 8.
`trim.z.lower`	Lower bound for trimming z-scores. Default value is -8.
`force.bin.number`	Set to `TRUE` to allow a discretization with `n.bins > sqrt(nrow(zmat))`.
`pi.using.plugin`	Logical flag indicating whether the proportion of null hypotheses should be estimated using the plug-in estimator (default is `FALSE`). The plug-in estimator is `(sum(Pvalues > pi.plugin.lambda) + 1)/(m * (1-pi.plugin.lambda))` where `m` is the number of p-values.
`pi.plugin.lambda`	Parameter used for estimation of proportion of null hypotheses, for one sided tests. Default value is 0.05. This should be set to the type 1 error used for hypothesis testing.
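
The trimming and plug-in arguments above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the page does not say how z-scores are converted to p-values, so the right-tail conversion below is an assumption, as are the cap at 1 and the names `z_trimmed`, `p_values` and `pi0_plugin`.

```r
# Illustrative sketch only; not the internal code of ztobins.
set.seed(1)
z <- c(rnorm(950), rnorm(50, mean = 9, sd = 0.5))   # hypothetical z-scores

# Trimming as described for trim.z: clamp values to [trim.z.lower, trim.z.upper].
trim_lower <- -8
trim_upper <- 8
z_trimmed <- pmax(pmin(z, trim_upper), trim_lower)

# Assumption: one-sided (right-tail) p-values; the page does not state the conversion.
p_values <- pnorm(z_trimmed, lower.tail = FALSE)

# Plug-in estimator as written in the pi.using.plugin description.
lambda <- 0.05
m <- length(p_values)
pi0_plugin <- (sum(p_values > lambda) + 1) / (m * (1 - lambda))

# Assumption: cap at 1 so the result is a valid proportion.
pi0_plugin <- min(pi0_plugin, 1)
```

The estimator counts the p-values above `lambda`, which should be mostly nulls, and rescales by the fraction of null p-values expected above `lambda` under uniformity, `1 - lambda`.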

Details

This utility function produces the first two arguments of the main function repfdr, namely `pdf.binned.z` and `binned.z.mat`.

Value

A list with:

`pdf.binned.z`	A 3-dimensional array which contains for each study (first dimension), the probabilities of a z-score to fall in the bin (second dimension), under each hypothesis status (third dimension). The third dimension can be of size 2 or 3, depending on the number of association states: if the association can be either null or only in one direction, the dimension is 2; if the association can be either null, or positive, or negative, the dimension is 3.
`binned.z.mat`	A matrix of the bin numbers for each of the z-scores (rows) in each study (columns).
`breaks.matrix`	A matrix with `n.bins + 1` rows and `ncol(zmat)` columns, giving the chosen discretization for each study. Values are the breaks between bins. The first and last values are the edges of the outermost bins.
`df`	Number of degrees of freedom, used for spline fitting of density.
`proportions`	Matrix with `n.association.status` rows, and `ncol(zmat)` columns, giving the estimated proportion of each component, for each study.
`PlotWarnings`	Vector of size `ncol(zmat)`, holding the warnings given for each study (available here, in the plots for each study, and printed to the console). If no warnings were given for a study, the value is `NA`.
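
To show how `binned.z.mat` and `breaks.matrix` relate, the sketch below looks up the z-score interval of the bin a feature fell into. The two matrices are small hand-built stand-ins, not real `ztobins` output, and the helper `bin_interval` is hypothetical. It assumes that bin `k` of a study lies between rows `k` and `k + 1` of that study's column in `breaks.matrix`.

```r
# Hypothetical stand-ins for ztobins output, built by hand for illustration.
breaks.matrix <- cbind(study1 = c(-3, -1, 1, 3),
                       study2 = c(-4, -2, 0, 4))
binned.z.mat <- cbind(study1 = c(1, 3, 2),
                      study2 = c(2, 2, 3))

# Interval (lower, upper) of the bin that a feature fell into in a given study.
# Assumption: bin k spans rows k and k + 1 of the study's breaks column.
bin_interval <- function(feature, study) {
  b <- binned.z.mat[feature, study]
  c(lower = breaks.matrix[b, study], upper = breaks.matrix[b + 1, study])
}

bin_interval(2, 1)   # feature 2 in study 1 fell into bin 3
```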

Examples


# Simulated example using both the central proportion estimator
# and the plug-in estimator for the proportion of null hypotheses:

set.seed(1)
p = 10000
p1 = 300
z1 = (rnorm(p))
z2 = (rnorm(p))
temp = rnorm(p1, 3.5,0.5)
z1[1:p1] = temp + rnorm(p1,0,0.2)
z2[1:p1] = temp + rnorm(p1,0,0.2)

z1.abs = abs(z1)
z2.abs = abs(z2)
plot(z1,z2)
hist(z1)
hist(z2)

zmat.example = cbind(z1,z2)

ztobins.res = ztobins(zmat.example,
                      plot.diagnostics = TRUE)
ztobins.res$proportions

ztobins.res.plugin.estimator = ztobins(zmat.example,
                           pi.using.plugin = TRUE,
                           plot.diagnostics = TRUE)

ztobins.res.plugin.estimator$proportions

## Not run: 

# three association states case (H in {-1,0,1}):
download.file('http://www.math.tau.ac.il/~ruheller/repfdr_RData/zmat.RData',destfile = "zmat.RData")
load(file = "zmat.RData")

input.to.repfdr3 <- ztobins(zmat, 3, df = 15)
pbz    <- input.to.repfdr3$pdf.binned.z
bz     <- input.to.repfdr3$binned.z.mat

# two association states case (H in {0,1}):
data(zmat_sim)

input.to.repfdr <- ztobins(zmat_sim, 2, n.bins = 100, plot.diagnostics = TRUE)
pbz_sim    <- input.to.repfdr$pdf.binned.z
bz_sim     <- input.to.repfdr$binned.z.mat

## End(Not run)

[Package repfdr version 1.2.3 Index]