ptycho {ptycho} | R Documentation |
Sample From Posterior Distributions
Description
Generate MCMC samples from posterior distribution. Two interfaces are
provided: ptycho
generates samples for one design matrix and response
matrix while ptycho.all
runs in batch an object generated by
createData
.
Usage
ptycho(X, y, initStates, groups = NULL,
tau.min = 0.01, tau.max = 10, tau.sd = (tau.max - tau.min)/4,
doGPrior = TRUE, doDetPrior = FALSE, prob.varadd = 0.5,
isOmegaFixed = FALSE, omega = NULL, omega.grp = NULL,
probs.grp = NULL, rho.alpha = 10, rho.lambda = rho.alpha,
only.means = FALSE, nburn = 0, nthin = 1, nSavePerChain,
parallel.chains=FALSE, random.seed=NULL)
ptycho.all(data, across=c("none","traits","sites"), doGrpIndicator,
dir.out, nreplicates=NULL, parallel.replicates=FALSE,
doSetSeed=TRUE, ncolumns=NULL, ...)
Arguments
X |
|
y |
|
initStates |
List containing initial states for chains. Each state is a list with components:
|
groups |
To combine information across variants, list containing
Otherwise, |
tau.min , tau.max |
Endpoints of uniform prior distribution on |
tau.sd |
Standard deviation of the Metropolis-Hastings proposal
distribution for |
doGPrior |
Logical indicating whether to use the g-prior for effect sizes |
doDetPrior |
Unsupported; use default value |
prob.varadd |
If |
isOmegaFixed |
Logical indicating whether |
omega |
If
If |
omega.grp |
If |
probs.grp |
Vector containing the probabilities that the
Metropolis-Hastings proposal will add, leave unchanged, or remove,
respectively, a group. If |
rho.alpha , rho.lambda |
Parameters for the Gamma prior distribution on
|
only.means |
If logical, specifies whether to return samples or the running means of the samples. Can also be a vector containing the iterations (after the burn-in interval) at which to save the means. |
nburn |
Number of MCMC samples to make before starting to save samples or to compute means |
nthin |
Interval between saved samples; default value 1 saves all
samples. Unused if |
nSavePerChain |
If |
parallel.chains |
Logical indicating whether to run chains in parallel; see Details. |
random.seed |
Random seed to pass to chain iterator; if |
data |
Data in format output by |
across |
Whether to combine information across traits, sites, or neither |
doGrpIndicator |
Whether to use priors that incorporate |
dir.out |
Directory to which to |
nreplicates |
Vector of replicates to run; if |
parallel.replicates |
Logical indicating whether to run replicates in parallel; see Details. |
doSetSeed |
If |
ncolumns |
Scalar. If |
... |
Additional arguments passed to |
Details
These functions run MCMC sampling from the posterior of the linear regression
models using hierarchical priors described in Stell and Sabatti (2015). The
function ptycho.all
is a wrapper of ptycho
to simplify running
the simulation experiments over many replicates. These functions determine
which priors to use as follows:
Standard spike and slab priors that do not combine information (basic)
Forptycho.all
, argumentacross
is “none” anddoGrpIndicator
isFALSE
.
Forptycho
, argumenty
has one column,groups
isNULL
, andindic.grp
isNULL
or missing in each entry ofinitStates
.Combine information across traits (Across Traits)
Forptycho.all
, argumentacross
is “traits” anddoGrpIndicator
isTRUE
.
Forptycho
, argumenty
hasp > 1
columns,groups
isNULL
, andindic.grp
is a logical vector of lengthp
in each entry ofinitStates
.Combine information across variants (Across Sites)
Forptycho.all
, argumentacross
is “sites” anddoGrpIndicator
isTRUE
.
Forptycho
, argumenty
has one column,groups
specifies how to combine information, andindic.grp
in each entry ofinitStates
is a logical vector of the same length asgroups$group2var
.Combine information across traits incorrectly (Unadjusted)
Forptycho.all
, argumentacross
is “traits” anddoGrpIndicator
isFALSE
.
Forptycho
, argumenty
hasp > 1
columns,groups
isNULL
, andindic.grp
isNULL
in each entry ofinitStates
.
This prior does not properly correct for multiple hypothesis testing and is only included because it is needed to reproduce results in Stell and Sabatti (2015).
Combining information across both phenotypes and variants is planned for a
future release. These functions perform some checks for compatibility of
X
, y
, groups
, and initStates
; but invalid input
could lead to unpredictable behavior. Singular X
can result in an error;
even strongly correlated covariates can cause difficulties as described by Stell
and Sabatti (2015).
The simplest way to run the simulations in Stell and Sabatti (2015) is, for example,
data <- createPubData("pleiotropy") ptycho.all(data=data, across="traits", doGrpIndicator=TRUE, dir.out="/path/to/output/dir/", only.means=50000*(1:10), nburn=10000) ptycho.all(data=data, across="sites", doGrpIndicator=TRUE, dir.out="/path/to/another/dir/", groups=createGroupsSim(G=10, ncol(data$X)), only.means=50000*(1:10), nburn=10000)
With these calls, the replicates run sequentially and so do the chains of the MCMC sampler; the results will be reproducible because the random seed is set for each replicate.
Parallelization is implemented via the foreach package. The user must not only have it installed but also an appropriate parallel backend, which must be registered. To run chains in parallel using the doMC, for example,
data(ptychoIn) G <- 2; p <- ncol(ptychoIn$X) groups <- createGroupsSim(G, p) state <- list(list(indic.grp=rep(FALSE,G), indic.var=matrix(FALSE,nrow=p,ncol=1), tau=1), list(indic.grp=rep(TRUE,G), indic.var=matrix(TRUE,nrow=p,ncol=1), tau=1)) require(doMC) registerDoMC(length(state)) ptychoOut <- ptycho(X=ptychoIn$X, y=ptychoIn$replicates[[1]]$y, groups=groups, initStates=state, only.means=100*seq_len(5), parallel.chains=TRUE)
The results would not be reproducible, however, even if one set the random seed
before calling ptycho
. For reproducible results, pass the random seed
in the call to ptycho
, which requires that the doRNG package is
also installed. Running the chains in parallel when calling ptycho.all
also requires the option parallel.chains=TRUE
, which uses doRNG
unless doSetSeed=FALSE
. By default, one of the chains starts with all
variants in the model, so that chain takes much longer to run than do the other
chains. Consequently, when running multiple replicates via ptycho.all
,
much greater time savings can be achieved by running the replicates in parallel
with, for example,
data <- createPubData("pleiotropy") require(doMC) registerDoMC(8) ptycho.all(data=data, across="traits", doGrpIndicator=TRUE, dir.out="/path/to/output/dir/", only.means=50000*(1:10), nburn=10000)
In this case, the default behavior of reproducible results does not require doRNG because the seed is set after each parallel worker is created.
We conclude this description with a discussion of the running time of the MCMC
sampler. Our actual data has 5335 subjects, 764 variants and three traits. An
mcmc.list
containing 50,000 samples for each of four chains can take
about 5~GB. Running chains in parallel, it takes less than an hour (on a Linux
computer with 2.6 GHz processors) to perform 510,000 samples per chain. The run
time depends primarily on the number of entries that are TRUE
in the
sampled indic.var
matrices; increasing this will increase run times. A
chain that initially has all entries of indic.var
set to TRUE
will
take longer than one where the model is initially empty. Priors that inflate
the posterior expectation of indic.var[j,k]
(such as combining
information across responses without using indic.grp
) will also take
longer.
Value
The results of ptycho.all
are written to files by save
.
For priors that use only one response, the output for replicate r
and
column c
will be written to ‘rpl<r>col<c>.Rdata’ in the directory
specified by dir.out
. For priors that use multiple responses,
ptycho
is called only once for each replicate, and the file name will
be ‘rpl<r>col1.Rdata’. The object in each such file has the name
smpl
and is the value of a call to ptycho
. The format of these
objects depends upon the argument only.means
. In all cases, however,
it has attribute params
set to a list containing most of the arguments
in the call to ptycho
.
If only.means
is FALSE
, then ptycho
returns an
mcmc.list
whose length is the same as the length of
initStates
. Each entry in this list is an mcmc
object
with nSavePerChain
rows and a column for each entry of indic.var
and indic.grp
plus a column for tau
.
Otherwise, ptycho
returns an object of class ptycho
, which is
actually a matrix. The matrix has a column for each sampled indicator
variable, for tau
and its square (so that its variance can be
computed), and for the chain and iteration numbers. If only.means
is
TRUE
, then each row contains the means of the samples in one chain and
there will be length(initStates) * nSavePerChain
rows.
If
only.means
is a vector, then there will be
length(initStates) * length(only.means)
rows.
Author(s)
Laurel Stell and Chiara Sabatti
Maintainer: Laurel Stell <lstell@stanford.edu>
References
Stell, L. and Sabatti, C. (2015) Genetic variant selection: learning across traits and sites, arXiv:1504.00946.
See Also
createData
for simulating input data.
checkConvergence
and PosteriorStatistics for analyzing
output of ptycho
.
Data describes tinysim
in example below as well as an object
created with ptycho
.
Examples
data(tinysim)
# Use replicate 4.
X <- tinysim$X; p <- ncol(X); nr <- 4
# COMBINE INFORMATION ACROSS RESPONSES
Y <- tinysim$replicates[[nr]]$y; q <- ncol(Y)
# Run 2 chains.
state <- list(list(indic.grp=rep(FALSE,p),
indic.var=matrix(FALSE,nrow=p,ncol=q), tau=1),
list(indic.grp=rep(TRUE,p),
indic.var=matrix(TRUE,nrow=p,ncol=q), tau=1))
# In each chain, discard first 10 burn-in samples, then generate
# 100 samples and save running means after every 20 samples.
smpl.ph <- ptycho(X=X, y=Y, initStates=state, only.means=20*(1:5),
nburn=10)
# COMBINE INFORMATION ACROSS VARIANTS
# Use two groups of variants.
G <- 2; groups <- createGroupsSim(G, p)
# Run 2 chains.
state <- list(list(indic.grp=rep(FALSE,G),
indic.var=matrix(FALSE,nrow=p,ncol=1), tau=1),
list(indic.grp=rep(TRUE,G),
indic.var=matrix(TRUE,nrow=p,ncol=1), tau=1))
# Use response 3.
y <- tinysim$replicates[[nr]]$y[,3,drop=FALSE]
smpl.var <- ptycho(X=X, y=y, groups=groups, initStates=state,
only.means=c(20*(1:5)), nburn=10, nthin=1)