visitation_model {VisitorCounts} | R Documentation |
Visitation Model
Description
Fits a time series model that uses social media posts, together with the popularity of the social media platform, to model visitation to recreational sites.
Usage
visitation_model(
onsite_usage,
popularity_proxy = NULL,
suspected_periods = c(12, 6, 4, 3),
proportion_of_variance_type = c("leave_out_first", "total"),
max_proportion_of_variance = 0.995,
log_ratio_cutoff = 0.2,
window_length = "auto",
num_trend_components = 2,
criterion = c("cross-correlation", "MSE", "rank"),
possible_lags = -36:36,
leave_off = 6,
estimated_change = 0,
order_of_polynomial_approximation = 7,
order_of_derivative = 1,
ref_series = NULL,
constant = 0,
beta = "estimate",
slope = 0,
is_input_logged = FALSE,
spline = FALSE,
parameter_estimates = c("joint", "separate"),
omit_trend = TRUE,
trend = c("linear", "none", "estimated"),
...
)
Arguments
onsite_usage |
A vector which stores monthly on-site usage for a particular social media platform and recreational site. |
popularity_proxy |
A vector which stores a time series which may be used as a proxy for the monthly popularity of social media over time. The length of |
suspected_periods |
A vector which stores the suspected periods in descending order of importance. The default option is c(12, 6, 4, 3), corresponding to periods of 12, 6, 4, and 3 months when observations are monthly. |
proportion_of_variance_type |
A character string specifying the option for choosing the maximum number of eigenvalues based on the proportion of total variance explained. If "leave_out_first" is chosen, then the contribution made by the first eigenvector is ignored; otherwise, if "total" is chosen, then the contribution made by all the eigenvectors is considered. |
max_proportion_of_variance |
A numeric specifying the proportion of total variance explained using the method specified in |
log_ratio_cutoff |
A numeric specifying the threshold for the deviation between the estimated period and candidate periods in suspected_periods. The default option is 0.2, which means that if the absolute log ratio between the estimated and candidate period is within 0.2 (approximately a 20 percent difference), then the estimated period is deemed equal to the candidate period. |
window_length |
A character string or positive integer specifying the window length for the SSA estimation. If "auto" is chosen, then the algorithm automatically selects the window length by taking a multiple of 12 which does not exceed half the length of |
num_trend_components |
A positive integer specifying the number of eigenvectors to be chosen for describing the trend in SSA. The default option is 2. This is relevant only when |
criterion |
A character string specifying the criterion for estimating the lag in |
possible_lags |
A numeric vector specifying all the candidate lags for |
leave_off |
A positive integer specifying the number of observations to be left off when estimating the lag. The default option is 6. This is relevant only when |
estimated_change |
A numeric specifying the estimated change in the visitation trend. The default option is 0, implying no change in the trend. |
order_of_polynomial_approximation |
A numeric specifying the order of the polynomial approximation of the difference between time series used in |
order_of_derivative |
A numeric specifying the order of derivative for the approximated difference between lagged |
ref_series |
A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of |
constant |
A numeric specifying the constant term (beta0) in the model. This constant is understood as the mean log adjusted monthly visitation relative to the base month. The default option is 0, implying that the (logged) |
beta |
A numeric or a character string specifying the seasonality adjustment factor (beta1). The default option is "estimate", in which case, it is estimated by using the Fisher's z-transformed lag-12 autocorrelation. Even if an actual value is supplied, if |
slope |
A numeric specifying the slope coefficient (beta2) in the model. This constant is applicable only when |
is_input_logged |
A Boolean describing whether the |
spline |
A Boolean specifying whether or not to use a smoothing spline for the lag estimation. This is relevant only when |
parameter_estimates |
A character string specifying how to estimate the beta and constant parameters when a reference series is supplied. Both options use least squares estimates: "separate" uses the differenced series to estimate beta separately from the constant, while "joint" estimates both from the non-differenced, detrended series. |
omit_trend |
This is obsolete and is left only for compatibility. In other words, |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
... |
Additional arguments to be passed onto the smoothing spline ( |
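As a rough illustration of two of the argument rules described above, the following base R sketch matches an estimated period against suspected_periods using the absolute log ratio, and picks an "auto" window length as a multiple of 12. The helpers match_period() and auto_window() are hypothetical names, not functions of VisitorCounts, and choosing the largest admissible multiple of 12 is an assumption; the package's internal logic may differ.

```r
# Hypothetical helper (not part of VisitorCounts): return the first candidate
# period whose absolute log ratio to the estimated period is within the cutoff.
match_period <- function(estimated_period,
                         suspected_periods = c(12, 6, 4, 3),
                         log_ratio_cutoff = 0.2) {
  log_ratios <- abs(log(estimated_period / suspected_periods))
  hits <- which(log_ratios <= log_ratio_cutoff)
  if (length(hits) == 0) return(NA_real_)
  suspected_periods[hits[1]]
}

# Hypothetical helper: assumed reading of window_length = "auto" as the
# largest multiple of 12 that does not exceed half the series length.
auto_window <- function(series_length) {
  12 * floor((series_length / 2) / 12)
}

match_period(11.2)  # |log(11.2 / 12)| is about 0.07, so this matches 12
match_period(9)     # no candidate lies within the cutoff, so NA
auto_window(150)    # half the length is 75, giving a window of 72
```

Because candidates are checked in the order given, an estimated period close to two candidates resolves to whichever is listed first, which is one way to read "descending order of importance".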
Value
visitation_fit |
A vector storing the fitted values of the visitation model. |
differenced_fit |
A vector storing differenced fitted values of visitation model. (Equal to |
constant |
A numeric storing the estimated constant term used in the model (beta0). |
beta |
A numeric storing the estimated seasonality adjustment factor (beta1). |
slope |
A numeric storing the estimated slope coefficient used in the model (beta2). |
proxy_decomposition |
A "decomposition" object representing the automatic decomposition obtained from |
time_series_decomposition |
A "decomposition" object representing the automatic decomposition obtained from |
forecasts_needed |
An integer representing the number of forecasts of |
lag_estimate |
A list storing both the MSE-based and the rank-based estimates of the lag. |
criterion |
A string; one of "cross-correlation", "MSE", or "rank", specifying the method used to select the appropriate lag. |
ref_series |
The reference series, if one was supplied. |
omit_trend |
Whether the trend was treated as 0 in the model. This is obsolete and is left only for compatibility. |
trend |
The trend used in the model. |
call |
The model call. |
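The beta value is described under Arguments as being estimated, by default, from the Fisher's z-transformed lag-12 autocorrelation. The base R sketch below computes only those two ingredients on simulated data. How the package turns them into beta1 is not stated here, so no such mapping is attempted, and the variable names are illustrative.

```r
set.seed(1)

# Simulated monthly series with a 12-month cycle (illustrative data only).
months <- 1:120
usage <- sin(2 * pi * months / 12) + rnorm(length(months), sd = 0.2)

# Sample autocorrelation at lags 0 to 12; element 13 corresponds to lag 12.
acf_values <- acf(usage, lag.max = 12, plot = FALSE)$acf
r12 <- acf_values[13]

# Fisher's z-transformation is the inverse hyperbolic tangent.
z12 <- atanh(r12)

c(lag12_autocorrelation = r12, fisher_z = z12)
```

A strongly seasonal series gives a lag-12 autocorrelation close to 1, and the z-transformation stretches values near 1 onto an unbounded scale.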
See Also
See predict.visitation_model for forecast methods, estimate_lag for details on the lag estimation, and auto_decompose for details on the automatic decomposition of time series using singular spectrum analysis (SSA). See the package Rssa for details regarding singular spectrum analysis.
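To give a feel for what a "cross-correlation" criterion over a set of candidate lags involves, here is a minimal base R sketch using stats::ccf() on simulated data. It is not the implementation behind estimate_lag; the series, the candidate set, and the variable names are assumptions made for illustration.

```r
set.seed(42)

# Build two series from the same noise so that y is x delayed by 3 steps.
n <- 120
base_noise <- rnorm(n + 3)
x <- base_noise[4:(n + 3)]
y <- base_noise[1:n]

# Candidate lags, playing the role of a (much shorter) possible_lags.
candidate_lags <- -6:6

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t].
cc <- ccf(x, y, lag.max = 12, plot = FALSE)
lags <- as.vector(cc$lag)
correlations <- as.vector(cc$acf)

# Keep only candidate lags and pick the one with the largest absolute value.
keep <- lags %in% candidate_lags
best_lag <- lags[keep][which.max(abs(correlations[keep]))]
best_lag
```

The selected lag sits three steps from zero, and its sign follows the ccf() convention noted in the comment.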
Examples
### load data --------------------
library(VisitorCounts)
data("park_visitation")
data("flickr_userdays")
park <- "YELL" # Yellowstone National Park
park_rows <- park_visitation$park == park
pud_ts <- ts(park_visitation[park_rows, ]$pud, start = 2005, frequency = 12)
nps_ts <- ts(park_visitation[park_rows, ]$nps, start = 2005, frequency = 12)
### fit three models ---------------
# linear trend, parameters estimated jointly from the reference series
vm_pud_linear <- visitation_model(onsite_usage = pud_ts,
                                  ref_series = nps_ts,
                                  parameter_estimates = "joint",
                                  trend = "linear")
# no reference series; trend estimated using the popularity proxy
vm_pud_only <- visitation_model(onsite_usage = pud_ts,
                                popularity_proxy = flickr_userdays,
                                trend = "estimated")
# reference series supplied, beta estimated separately, no trend
vm_ref_series <- visitation_model(onsite_usage = pud_ts,
                                  popularity_proxy = flickr_userdays,
                                  ref_series = nps_ts,
                                  parameter_estimates = "separate",
                                  possible_lags = -36:36,
                                  trend = "none")
### visualize fit ------------------
plot(vm_pud_linear, ylim = c(-3,3), difference = TRUE)
lines(diff(nps_ts), col = "red")
plot(vm_pud_only, ylim = c(-3,3), difference = TRUE)
lines(diff(nps_ts), col = "red")
plot(vm_ref_series, ylim = c(-3,3), difference = TRUE)
lines(diff(nps_ts), col = "red")