| stan {rstan} | R Documentation | 
Fit a model with Stan
Description
Fit a model defined in the Stan modeling language and
return the fitted result as an instance of stanfit.
Usage
stan(file, model_name = "anon_model", model_code = "", fit = NA,
  data = list(), pars = NA,
  chains = 4, iter = 2000, warmup = floor(iter/2), thin = 1,
  init = "random", seed = sample.int(.Machine$integer.max, 1),
  algorithm = c("NUTS", "HMC", "Fixed_param"), 
  control = NULL, sample_file = NULL, diagnostic_file = NULL,
  save_dso = TRUE, verbose = FALSE, include = TRUE,
  cores = getOption("mc.cores", 1L),
  open_progress = interactive() && !isatty(stdout()) &&
                  !identical(Sys.getenv("RSTUDIO"), "1"),
  ...,
  boost_lib = NULL, eigen_lib = NULL
  )
Arguments
| file | The path to the Stan program to use.
 A model may also be specified directly as a character string using the
 The  | 
| model_code | A character string either containing the model definition or the name of
a character string object in the workspace. This argument is used only
if arguments  | 
| fit | An instance of S4 class  | 
| model_name | A string to use as the name of the model; defaults
to "anon_model". | 
| data | A named  | 
| pars | A character vector specifying parameters of interest to be saved.
The default is to save all parameters from the model.
If  | 
| include | Logical scalar defaulting to TRUE. | 
| iter | A positive integer specifying the number of iterations for each chain (including warmup). The default is 2000. | 
| warmup | A positive integer specifying the number of warmup (aka burnin)
iterations per chain. If step-size adaptation is on (which it is by default),
this also controls the number of iterations for which adaptation is run (and
hence these warmup samples should not be used for inference). The number of
warmup iterations should be smaller than iter; the default is floor(iter/2). | 
| chains | A positive integer specifying the number of Markov chains. The default is 4. | 
| cores | The number of cores to use when executing the Markov chains in parallel.
The default is to use the value of the mc.cores option, or 1 if that option is not set. | 
| thin | A positive integer specifying the period for saving samples. The default is 1, which is usually the recommended value. Unless your posterior distribution takes up too much memory we do not recommend thinning as it throws away information. The tradition of thinning when running MCMC stems primarily from the use of samplers that require a large number of iterations to achieve the desired effective sample size. Because of the efficiency (effective samples per second) of Hamiltonian Monte Carlo, rarely should this be necessary when using Stan. | 
| init | Specification of initial values for all or some parameters.
Can be the digit  
 When specifying initial values via a  | 
| seed | The seed for random number generation. The default is generated
from 1 to the maximum integer supported by R on the machine. Even if
multiple chains are used, only one seed is needed, with other chains having
seeds derived from that of the first chain to avoid dependent samples.
When a seed is specified by a number,  Using R's  | 
| algorithm | One of the sampling algorithms that are implemented in Stan.
The default and preferred algorithm is  | 
| sample_file | An optional character string providing the name of a file.
If specified the draws for all parameters and other saved quantities
will be written to the file. If not provided, files are not created.
When the folder specified is not writable,  | 
| diagnostic_file | An optional character string providing the name of a file.
If specified the diagnostics data for all parameters will be written
to the file. If not provided, files are not created. When the folder specified
is not writable,  | 
| save_dso | Logical, with default TRUE. | 
| verbose | 
 | 
| control | A named  
 In addition, algorithm HMC (called 'static HMC' in Stan) and NUTS share the following parameters: 
 For algorithm NUTS, we can also set: 
 For algorithm HMC, we can also set: 
 For  
 | 
| open_progress | Logical scalar that only takes effect if
 | 
| ... | Other optional parameters: 
 
 
 
 
 
 Deprecated:  
 | 
| boost_lib | The path for an alternative version of the Boost C++ to use instead of the one in the BH package. | 
| eigen_lib | The path for an alternative version of the Eigen C++ library to the one in RcppEigen. | 
Details
The stan function does all of the work of fitting a Stan model and
returning the results as an instance of stanfit. The steps are
roughly as follows:
- Translate the Stan model to C++ code. (stanc)
- Compile the C++ code into a binary shared object, which is loaded into the current R session (an object of S4 class stanmodel is created). (stan_model)
- Draw samples and wrap them in an object of S4 class stanfit. (sampling)
The returned object can be used with methods such as print,
summary, and plot to inspect and retrieve the results of
the fitted model.
stan can also be used to sample again from a fitted model under
different settings (e.g., different iter, data, etc.) by
using the fit argument to specify an existing stanfit object.
In this case, the compiled C++ code for the model is reused.
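The three steps listed in this section can also be run one call at a time. The following is a minimal sketch, assuming the rstan package is installed. The helper name fit_in_steps is hypothetical and is not part of rstan; stanc, stan_model, and sampling are the functions named in the list of steps.

```r
# Hypothetical helper (not part of rstan): run the three steps that
# stan() performs, one call at a time.
fit_in_steps <- function(model_code, ...) {
  if (!requireNamespace("rstan", quietly = TRUE)) {
    stop("the rstan package is required")
  }
  # Step 1: translate the Stan program to C++.
  translated <- rstan::stanc(model_code = model_code)
  # Step 2: compile the C++ code; the result is a stanmodel object.
  compiled <- rstan::stan_model(stanc_ret = translated)
  # Step 3: draw samples; the result is a stanfit object.
  rstan::sampling(compiled, ...)
}

## Not run:
# fit <- fit_in_steps("parameters { real y; } model { y ~ normal(0, 1); }",
#                     chains = 2, iter = 500)
# Sampling again with different settings reuses the compiled model:
# fit_more <- rstan::stan(fit = fit, iter = 2000)
## End(Not run)
```

Splitting the steps is mainly useful when the same compiled model is sampled many times; for a single fit, calling stan directly is simpler.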
Value
An object of S4 class stanfit. However, if cores > 1
and there is an error for any of the chains, then the error(s) are printed. If
all chains have errors and an error occurs before or during sampling, the returned
object does not contain samples. But the compiled binary object for the
model is still included, so we can reuse the returned object for another
sampling.
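One way to check whether a returned object actually holds samples is sketched below. It assumes the mode slot of S4 class stanfit, where 0 is taken to indicate that sampling was done; the helper name contains_samples is hypothetical and not part of rstan, and scode refers to the model string defined in the Examples section.

```r
# Hypothetical helper (not part of rstan): TRUE when the object reports
# sampling mode. Assumes the `mode` slot of class stanfit, where 0 means
# that sampling was done.
contains_samples <- function(fit) {
  isTRUE(fit@mode == 0L)
}

## Not run:
# fit <- stan(model_code = scode, cores = 2)
# if (!contains_samples(fit)) {
#   # The compiled model is still attached, so it can be reused.
#   fit <- stan(fit = fit, cores = 1)
# }
## End(Not run)
```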
Passing data to Stan
The data passed to stan are preprocessed before being passed to Stan.
If data is not a character vector, the data block of the Stan program
is parsed and R objects of the same name are searched starting from the
calling environment. Then, if data is list-like but not a data.frame
the elements of data take precedence. This behavior is similar to how
a formula is evaluated by the lm function when data is
supplied. In general, each R object being passed to Stan should be either a numeric
vector (including the special case of a 'scalar') or a numeric array (matrix).
The first exception is that an element can be a logical vector: TRUE's
are converted to 1 and FALSE's to 0.
An element can also be a data frame or a specially structured list (see
details below), both of which will be converted into arrays in the
preprocessing.  Using a specially structured list is not
encouraged though it might be convenient sometimes; and when in doubt, just
use arrays.
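When the data are held as a list of matrices, they can be turned into a plain array by hand before calling stan. The sketch below uses base R only; stack_matrices is a hypothetical helper name, and the layout (first index runs over the list elements) is an assumption that matches data declared in Stan as an array of I matrices with J rows and K columns.

```r
# Hypothetical helper: stack a list of I matrices (each J x K) into one
# I x J x K numeric array.
stack_matrices <- function(mats) {
  stopifnot(length(mats) > 0, all(vapply(mats, is.matrix, logical(1))))
  jk <- dim(mats[[1]])
  out <- array(NA_real_, dim = c(length(mats), jk))
  for (i in seq_along(mats)) {
    # Every element must have the same J x K shape.
    stopifnot(identical(dim(mats[[i]]), jk))
    out[i, , ] <- mats[[i]]
  }
  out
}

y2_list <- list(matrix(1:6, nrow = 2), matrix(7:12, nrow = 2))
y2 <- stack_matrices(y2_list)
dim(y2)  # I = 2, J = 2, K = 3
```

The resulting array can then be placed in the data list under the name used in the data block, for example list(y2 = y2).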
This preprocessing for each element mainly includes the following:
- Change the type of the data from double to integer if no accuracy is lost. The main reason is that, by default, R uses double as the data type, such as in a <- 3. Stan will not read a value stored as real into data declared as int, but it will read a value stored as int into data declared as real.
- Check if there is NA in the data. Unlike BUGS, Stan does not allow missing data. Any NA values in supplied data will cause the function to stop and report an error.
- Check data types. Stan allows only numeric data, that is, doubles, integers, and arrays of these. Data of other types (for example, characters and factors) are not passed to Stan. 
- Check whether there are objects in the data list with duplicated names. Duplicated names, if found, will cause the function to stop and report an error. 
- Check whether the names of objects in the data list are legal Stan names. If illegal names are found, it will stop and report an error. See (Cmd)Stan's manual for the rules of variable names. 
- When an element is of type data.frame, it will be converted to matrix by function data.matrix.
- When an element is of type list, the intent is to make it easier to pass data for variables declared in Stan code such as "vector[J] y1[I]" and "matrix[J,K] y2[I]". Using the latter as an example, we can use a list for y2 if the list has "I" elements, each of which is an array (matrix) of dimension "J*K". However, it is not possible to pass a list for data declared such as "vector[K] y3[I,J]"; the only way to pass it is to use an array with dimension "I*J*K". In addition, technically a data.frame in R is also a list, but it should not be used for this purpose since a data.frame will be converted to a matrix as described above.
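The per-element checks in this list can be approximated in base R as follows. This is only an illustrative sketch of the described behavior, not the code rstan runs internally; prep_element is a hypothetical helper name, and the checks for duplicated and illegal names are left out.

```r
# Hypothetical helper: mimic the described preprocessing for one element.
prep_element <- function(x) {
  # Data frames are converted to matrices by data.matrix.
  if (is.data.frame(x)) x <- data.matrix(x)
  # Logical values become 1 (TRUE) and 0 (FALSE); dimensions are kept.
  if (is.logical(x)) storage.mode(x) <- "integer"
  # Non-numeric data (characters, factors, ...) are not passed on.
  if (!is.numeric(x)) return(NULL)
  # Missing values are not allowed.
  if (anyNA(x)) stop("NA values are not allowed in data passed to Stan")
  # Doubles become integers only when no accuracy is lost.
  if (is.double(x) && all(x == round(x)) &&
      all(abs(x) <= .Machine$integer.max)) {
    storage.mode(x) <- "integer"
  }
  x
}

typeof(prep_element(3))      # stored as integer after conversion
typeof(prep_element(3.5))    # stays double
prep_element(c(TRUE, FALSE)) # converted to 1 and 0
```

Using storage.mode<- rather than as.integer keeps the dim attribute, so matrices and arrays retain their shape.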
Stan treats a vector of length 1 in R as a scalar.  So technically
if, for example, "array[1] real y;" is defined in the data block, an array
such as "y = array(1.0, dim = 1)" in R should be used. This
is also the case for specifying initial values since the same
underlying approach for reading data from R in Stan is used, in which
a vector of length 1 is treated as a scalar.
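The difference between the two R objects is visible in their dim attribute. The short base R sketch below illustrates this; the names y_scalar, y_array, stan_data, and init_one are chosen here for illustration only.

```r
# A plain length-1 vector has no dim attribute and is treated as a scalar.
y_scalar <- 1.0
is.null(dim(y_scalar))

# An explicit one-dimensional array of length 1 carries dim = 1.
y_array <- array(1.0, dim = 1)
dim(y_array)

# Data list for a data block that declares "array[1] real y;".
stan_data <- list(y = y_array)

# The same form applies to initial values for a length-1 array parameter.
init_one <- list(y = array(0.5, dim = 1))
```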
In general, the higher the optimization level, which can be set in a Makevars file, the faster the generated binary code for the model runs. However, a higher optimization level comes at the cost of longer compile times for the C++ code.
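As a rough sketch, the optimization flag can be appended to a Makevars file from R. Several details here are assumptions: set_opt_level is a hypothetical helper, the default location ~/.R/Makevars applies to typical Unix-like setups, and the name of the flag variable (CXX14FLAGS below) depends on the R and rstan versions in use. The demonstration writes to a temporary file so that no real configuration is touched.

```r
# Hypothetical helper: append an optimization flag to a Makevars file.
# `flag_var` is an assumption; which variable the compiler reads (for
# example CXX14FLAGS or CXX17FLAGS) depends on the installed versions.
set_opt_level <- function(level = 3L, flag_var = "CXX14FLAGS",
                          makevars = file.path(Sys.getenv("HOME"),
                                               ".R", "Makevars")) {
  dir.create(dirname(makevars), showWarnings = FALSE, recursive = TRUE)
  line <- sprintf("%s = -O%d", flag_var, as.integer(level))
  cat(line, "\n", sep = "", file = makevars, append = TRUE)
  invisible(line)
}

# Demonstration against a temporary file.
tmp <- tempfile("Makevars")
set_opt_level(3L, makevars = tmp)
readLines(tmp)
```

A change to the optimization level only affects models compiled afterwards; already compiled models keep the code they were built with.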
References
The Stan Development Team. Stan Modeling Language User's Guide and Reference Manual. https://mc-stan.org.
The Stan Development Team. CmdStan Interface User's Guide. https://mc-stan.org.
See Also
- The package vignettes for an example of fitting a model and accessing the contents of stanfit objects (https://mc-stan.org/rstan/articles/).
- stanc for translating model code in Stan modeling language to C++, sampling for sampling, and stanfit for the fitted results.
- as.array.stanfit and extract for extracting samples from stanfit objects.
Examples
## Not run: 
#### example 1
library(rstan)
scode <- "
parameters {
  array[2] real y;
}
model {
  y[1] ~ normal(0, 1);
  y[2] ~ double_exponential(0, 2);
}
"
fit1 <- stan(model_code = scode, iter = 10, verbose = FALSE)
print(fit1)
fit2 <- stan(fit = fit1, iter = 10000, verbose = FALSE)
## using as.array on the stanfit object to get samples
a2 <- as.array(fit2)
## extract samples as a list of arrays
e2 <- extract(fit2, permuted = FALSE)
#### example 2
#### the result of this example is included in the package
excode <- '
  transformed data {
    array[20] real y;
    y[1] = 0.5796;  y[2] = 0.2276;   y[3]  = -0.2959;
    y[4] = -0.3742; y[5] = 0.3885;   y[6]  = -2.1585;
    y[7] = 0.7111;  y[8] = 1.4424;   y[9]  = 2.5430;
    y[10] = 0.3746; y[11] = 0.4773;  y[12] = 0.1803;
    y[13] = 0.5215; y[14] = -1.6044; y[15] = -0.6703;
    y[16] = 0.9459; y[17] = -0.382;  y[18] = 0.7619;
    y[19] = 0.1006; y[20] = -1.7461;
  }
  parameters {
    real mu;
    real<lower=0, upper=10> sigma;
    array[3] vector[2] z;
    real<lower=0> alpha;
  }
  model {
    y ~ normal(mu, sigma);
    for (i in 1:3)
      z[i] ~ normal(0, 1);
    alpha ~ exponential(2);
  }
'
exfit <- stan(model_code = excode, save_dso = FALSE, iter = 500)
print(exfit)
plot(exfit)
## End(Not run)
## Not run: 
## examples of specifying argument `init` for function stan
## define a function to generate initial values that can
## be fed to function stan's argument `init`
# function form 1 without arguments
initf1 <- function() {
  list(mu = 1, sigma = 4, z = array(rnorm(6), dim = c(3,2)), alpha = 1)
}
# function form 2 with an argument named `chain_id`
initf2 <- function(chain_id = 1) {
  # cat("chain_id =", chain_id, "\n")
  list(mu = 1, sigma = 4, z = array(rnorm(6), dim = c(3,2)), alpha = chain_id)
}
# generate a list of lists to specify initial values
n_chains <- 4
init_ll <- lapply(1:n_chains, function(id) initf2(chain_id = id))
exfit0 <- stan(model_code = excode, init = initf1)
stan(fit = exfit0, init = initf2)
stan(fit = exfit0, init = init_ll, chains = n_chains)
## End(Not run)