fit_dfa {bayesdfa}  R Documentation 
Fit a Bayesian DFA
fit_dfa(
  y = y,
  num_trends = 1,
  varIndx = NULL,
  scale = c("zscore", "center", "none"),
  iter = 2000,
  chains = 4,
  thin = 1,
  control = list(adapt_delta = 0.99, max_treedepth = 20),
  nu_fixed = 101,
  est_correlation = FALSE,
  estimate_nu = FALSE,
  estimate_trend_ar = FALSE,
  estimate_trend_ma = FALSE,
  estimate_process_sigma = FALSE,
  equal_process_sigma = TRUE,
  sample = TRUE,
  data_shape = c("wide", "long"),
  obs_covar = NULL,
  pro_covar = NULL,
  z_bound = NULL,
  z_model = c("dfa", "proportion"),
  trend_model = c("rw", "spline", "gp"),
  n_knots = NULL,
  knot_locs = NULL,
  par_list = NULL,
  family = "gaussian",
  verbose = FALSE,
  gp_theta_prior = c(3, 1),
  expansion_prior = FALSE,
  ...
)
y 
A matrix of data to fit (wide format), or a data frame when data_shape = "long". See data_shape.
num_trends 
Number of trends to fit. 
varIndx 
Indices indicating which timeseries should have shared variances. 
scale 
Character string, used to standardize data. Can be "zscore" to center and standardize data, "center" to only center data, or "none". Defaults to "zscore".
iter 
Number of iterations in Stan sampling, defaults to 2000. 
chains 
Number of chains in Stan sampling, defaults to 4. 
thin 
Thinning rate in Stan sampling, defaults to 1. 
control 
A list of options to pass to Stan sampling. Defaults to list(adapt_delta = 0.99, max_treedepth = 20).
nu_fixed 
Student-t degrees of freedom parameter. If specified as greater than 100, a normal random walk is used instead of a random walk with a t-distribution. Defaults to 101.
est_correlation 
Logical. Whether to estimate correlations in the observation error matrix. Defaults to FALSE.
estimate_nu 
Logical. Estimate the Student-t degrees of freedom parameter? Defaults to FALSE.
estimate_trend_ar 
Logical. Estimate AR(1) parameters on DFA trends? Defaults to FALSE, in which case AR(1) parameters are set to 1.
estimate_trend_ma 
Logical. Estimate MA(1) parameters on DFA trends? Defaults to FALSE, in which case MA(1) parameters are set to 0.
estimate_process_sigma 
Logical. Whether or not to estimate process error sigma. Defaults to FALSE. If not estimated, sigma is fixed at 1, as in conventional DFAs.
equal_process_sigma 
Logical. If process sigma is estimated, whether to estimate a single shared value across trends (default) or a separate value for each trend.
sample 
Logical. Should the model be sampled from? Defaults to TRUE.
data_shape 
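Format of y. Either "wide" (default), a matrix with one row per time series and one column per time step, or "long", a data frame with columns "obs", "ts", and "time".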
obs_covar 
Optional data frame with 4 named columns ("time", "timeseries", "covariate", "value"), representing: (1) time, (2) the time series affected, (3) the covariate number for models with more than one covariate affecting each time series, and (4) the value of the covariate.
pro_covar 
Optional data frame with 4 named columns ("time", "trend", "covariate", "value"), representing: (1) time, (2) the trend affected, (3) the covariate number for models with more than one covariate affecting each trend, and (4) the value of the covariate.
z_bound 
Optional hard constraints for estimated factor loadings. This really only applies to models with 1 trend. Passed in as a 2-element vector representing the lower and upper bound, e.g. c(0, 100) to constrain loadings to be positive.
z_model 
Optional argument allowing for elements of Z to be constrained to be proportions (each time series modeled as a mixture of trends). Arguments can be "dfa" (default) or "proportion" 
trend_model 
Optional argument to change the model of the underlying latent trend. By default this is set to 'rw', where the trend is modeled as a random walk, as in conventional DFA. Alternative options are 'spline', where B-splines are used to model the trends, or 'gp', where Gaussian predictive processes are used. If models other than 'rw' are used, there are some key points. First, the MA and AR parameters on these models will be turned off. Second, for B-splines the process_sigma becomes an optional scalar on the spline coefficients, and is turned off by default. Third, the number of knots can be specified (more knots = more wiggliness, and n_knots < N). For models with more than one trend, each trend has its own spline coefficients estimated, though the knot locations are assumed shared. If knots aren't specified, the default is round(N/3).
n_knots 
The number of knots for the B-spline or Gaussian predictive process models. Optional, defaults to round(N/3).
knot_locs 
Locations of knots (optional), defaults to uniform spacing between 1 and N 
par_list 
A vector of parameter names of variables to be estimated by Stan. If NULL, this will default to c("x", "Z", "sigma", "log_lik", "psi", "xstar") for most models, though if AR / MA or Student-t models are used, additional parameters will be monitored. If you want to use diagnostic tools in rstan, including moment_matching, you will need to pass in a larger list. Setting this argument to "all" will monitor all parameters, enabling the use of diagnostic functions but making the fitted models a lot larger to store. Finally, this argument may be a custom character vector of parameters to monitor, e.g. c("x", "sigma").
family 
String describing the observation model. Default is "gaussian", but included options are "gamma", "lognormal", negative binomial ("nbinom2"), "poisson", or "binomial". The binomial family is assumed to have a logit link, the gaussian family an identity link, and the rest a log link.
verbose 
Whether to print iterations and information from Stan, defaults to FALSE. 
gp_theta_prior 
A 2-element vector controlling the prior on the Gaussian process parameter in cov_exp_quad. This prior is a half-Student-t prior, with the first element of gp_theta_prior being the degrees of freedom (nu), and the second element being the standard deviation.
expansion_prior 
Logical. Defaults to FALSE; if TRUE, uses the parameter expansion prior of Ghosh & Dunson (2009).
... 
Any other arguments to pass to 
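To make the scale options concrete, the following base R sketch shows what "center" and "zscore" do to each time series of a wide-format matrix (time series in rows, as in the examples for this function). It is an illustration only: it assumes standardization is applied per time series, it does not call fit_dfa, and the helper name scale_rows is hypothetical rather than part of bayesdfa.

```r
# Hypothetical helper (not part of bayesdfa): standardize each row
# (time series) of a wide-format matrix.
scale_rows <- function(y, scale = c("zscore", "center", "none")) {
  scale <- match.arg(scale)
  if (scale == "none") {
    return(y)
  }
  # Subtracting a length-nrow vector recycles down columns,
  # so row i has its own mean removed.
  centered <- y - rowMeans(y, na.rm = TRUE)
  if (scale == "center") {
    return(centered)
  }
  # "zscore": additionally divide each row by its standard deviation.
  centered / apply(y, 1, sd, na.rm = TRUE)
}

set.seed(1)
y <- matrix(rnorm(60, mean = 5, sd = 2), nrow = 3) # 3 series, 20 time steps

z <- scale_rows(y, "zscore")
cen <- scale_rows(y, "center")

rowMeans(z)       # approximately 0 for every series
apply(z, 1, sd)   # approximately 1 for every series
apply(cen, 1, sd) # unchanged from apply(y, 1, sd)
```

Centering removes only the mean of each series, whereas z-scoring also puts all series on a common variance.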
Note that there is nothing restricting the loadings and trends from being inverted (i.e. multiplied by -1) for a given chain. Therefore, if you fit multiple chains, the package will attempt to determine which chains need to be inverted using the function find_inverted_chains().
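The reason a chain can be inverted is that flipping the sign of both the loadings and the trends leaves their product, and hence the fitted values, unchanged. The base R sketch below demonstrates this with made-up values; the objects Z and x are illustrative stand-ins for a loadings matrix and a trend, not output from fit_dfa.

```r
set.seed(1)
Z <- matrix(rnorm(3), nrow = 3, ncol = 1) # loadings: 3 time series, 1 trend
x <- matrix(cumsum(rnorm(20)), nrow = 1)  # one random-walk trend, 20 time steps

pred <- Z %*% x                # 3 x 20 matrix of predicted values
pred_flipped <- (-Z) %*% (-x)  # both loadings and trend multiplied by -1

all.equal(pred, pred_flipped)  # TRUE: the data cannot distinguish the two
```

Because both configurations fit the data equally well, different chains may settle on opposite signs, which is why chains are aligned before they are combined.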
plot_loadings plot_trends rotate_trends find_swans
set.seed(42)
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
# only 1 chain and 50 iterations used so example runs quickly:
m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1)

## Not run:
# example of observation error covariates
set.seed(42)
obs_covar <- expand.grid("time" = 1:20, "timeseries" = 1:3, "covariate" = 1)
obs_covar$value <- rnorm(nrow(obs_covar), 0, 0.1)
m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1, obs_covar = obs_covar)

# example of process error covariates
pro_covar <- expand.grid("time" = 1:20, "trend" = 1:2, "covariate" = 1)
pro_covar$value <- rnorm(nrow(pro_covar), 0, 0.1)
m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1, num_trends = 2, pro_covar = pro_covar)

# example of long format data
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
obs <- c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ])
long <- data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)), "time" = rep(1:20, 3))
m <- fit_dfa(y = long, data_shape = "long", iter = 50, chains = 1)

# example of long format data with obs covariates
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
obs <- c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ])
long <- data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)), "time" = rep(1:20, 3))
obs_covar <- expand.grid("time" = 1:20, "timeseries" = 1:3, "covariate" = 1:2)
obs_covar$value <- rnorm(nrow(obs_covar), 0, 0.1)
m <- fit_dfa(y = long, data_shape = "long", iter = 50, chains = 1, obs_covar = obs_covar)

# example of model with Z constrained to be proportions and wide format data
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
m <- fit_dfa(y = s$y_sim, z_model = "proportion", iter = 50, chains = 1)

# example of model with Z constrained to be proportions and long format data
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
obs <- c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ])
long <- data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)), "time" = rep(1:20, 3))
m <- fit_dfa(y = long, data_shape = "long", z_model = "proportion", iter = 50, chains = 1)

# example of B-spline model with wide format data
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1, trend_model = "spline", n_knots = 10)

# example of Gaussian predictive process model with wide format data
s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1, trend_model = "gp", n_knots = 5)
## End(Not run)
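The long-format examples build the data frame by hand for exactly three series. As a sketch of how the same layout could be produced for a wide matrix of any size, the base R code below converts a matrix with time series in rows into a data frame with the "obs", "ts", and "time" columns used in those examples. The matrix y_wide is simulated placeholder data rather than output of sim_dfa, so the snippet runs without bayesdfa installed.

```r
set.seed(42)
# Placeholder wide-format data: 3 time series (rows) by 20 time steps (columns).
y_wide <- matrix(rnorm(60), nrow = 3)

n_ts <- nrow(y_wide)
n_time <- ncol(y_wide)

long <- data.frame(
  # t() then as.vector() stacks series 1, then series 2, and so on,
  # matching c(y[1, ], y[2, ], y[3, ]) in the examples.
  obs = as.vector(t(y_wide)),
  ts = rep(seq_len(n_ts), each = n_time),
  time = rep(seq_len(n_time), times = n_ts)
)

head(long)
```

The resulting data frame could then be passed as y together with data_shape = "long", as in the examples.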