decompose_proxy {VisitorCounts}    R Documentation

Decompose Popularity Proxy

Description

Decomposes the popularity proxy time series into trend and seasonality components.

Usage

decompose_proxy(
  onsite_usage,
  popularity_proxy = NULL,
  suspected_periods = c(12, 6, 4, 3),
  proportion_of_variance_type = c("leave_out_first", "total"),
  max_proportion_of_variance = 0.995,
  log_ratio_cutoff = 0.2,
  window_length = "auto",
  num_trend_components = 2,
  criterion = c("cross-correlation", "MSE", "rank"),
  possible_lags = -36:36,
  leave_off = 6,
  estimated_change = 0,
  order_of_polynomial_approximation = 7,
  order_of_derivative = 1,
  ref_series = NULL,
  constant = 0,
  beta = "estimate",
  slope = 0,
  is_input_logged = FALSE,
  spline = FALSE,
  parameter_estimates = c("separate", "joint"),
  omit_trend = TRUE,
  trend = c("linear", "none", "estimated"),
  onsite_usage_decomposition,
  ...
)

Arguments

onsite_usage

A vector which stores monthly on-site usage for a particular social media platform and recreational site.

popularity_proxy

A vector storing a time series that may be used as a proxy for the monthly popularity of social media over time. The length of popularity_proxy must be the same as that of onsite_usage. The default option is NULL, in which case no proxy needs to be supplied. Note that this vector cannot contain any zeros.

suspected_periods

A vector storing the suspected periods in descending order of importance. The default option is c(12, 6, 4, 3), corresponding to 12, 6, 4, and 3 months if observations are monthly.

proportion_of_variance_type

A character string specifying the option for choosing the maximum number of eigenvalues based on the proportion of total variance explained. If "leave_out_first" is chosen, then the contribution made by the first eigenvector is ignored; otherwise, if "total" is chosen, then the contribution made by all the eigenvectors is considered.

max_proportion_of_variance

A numeric specifying the proportion of total variance explained using the method specified in proportion_of_variance_type. The default option is 0.995.

log_ratio_cutoff

A numeric specifying the threshold for the deviation between the estimated period and candidate periods in suspected_periods. The default option is 0.2, which means that if the absolute log ratio between the estimated and candidate period is within 0.2 (approximately a 20 percent difference), then the estimated period is deemed equal to the candidate period.
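
The matching rule described for log_ratio_cutoff can be sketched as follows. This is only an illustration: match_period is a hypothetical helper, not a function exported by the package, and both the use of "<=" at the boundary and the choice of the first matching candidate are assumptions.

```r
# Hypothetical helper (not part of VisitorCounts): match an estimated period
# to the first candidate whose absolute log ratio is within the cutoff.
match_period <- function(estimated_period,
                         suspected_periods = c(12, 6, 4, 3),
                         log_ratio_cutoff = 0.2) {
  log_ratios <- abs(log(estimated_period / suspected_periods))
  within_cutoff <- log_ratios <= log_ratio_cutoff
  if (!any(within_cutoff)) {
    return(NA_real_)  # no candidate is close enough
  }
  # Assumption: earlier candidates take priority, since suspected_periods
  # is given in descending order of importance.
  suspected_periods[which(within_cutoff)[1]]
}

match_period(11.2)  # |log(11.2 / 12)| is about 0.069, so 12 is matched
match_period(8.5)   # no candidate lies within the cutoff, so NA is returned
```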

window_length

A character string or positive integer specifying the window length for the SSA estimation. If "auto" is chosen, then the algorithm automatically selects the window length by taking a multiple of 12 which does not exceed half the length of onsite_usage. The default option is "auto".
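
As a rough illustration of the "auto" rule, the sketch below takes the largest multiple of 12 that does not exceed half the series length. The helper auto_window_length is hypothetical, and the assumption that the largest such multiple is chosen is not confirmed by this page.

```r
# Hypothetical helper (not part of VisitorCounts): largest multiple of
# `period` that does not exceed half of the series length `n`.
auto_window_length <- function(n, period = 12) {
  period * floor((n / 2) / period)
}

auto_window_length(156)  # half of 156 is 78; largest multiple of 12 is 72
auto_window_length(60)   # half of 60 is 30; largest multiple of 12 is 24
```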

num_trend_components

A positive integer specifying the number of eigenvectors to be chosen for describing the trend in SSA. The default option is 2. This is relevant only when trend is "estimated".

criterion

A character string specifying the criterion for estimating the lag in popularity_proxy. If "cross-correlation" is chosen, the lag that maximizes the correlation coefficient between the lagged popularity_proxy and onsite_usage is selected. If "MSE" is chosen, the lag is selected by identifying the lagged popularity_proxy whose derivative is closest to that of onsite_usage in terms of mean squared error. If "rank" is chosen, the squared errors of the derivatives are first ranked, and the lag that minimizes the mean rank is selected.
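
The "cross-correlation" criterion can be illustrated with base R alone. The sketch below is not the package's implementation: best_lag_by_correlation is a hypothetical helper, the sign convention for the lag is an assumption, and leave_off is ignored.

```r
# Hypothetical helper (not part of VisitorCounts): for each candidate lag,
# align proxy[t - lag] with usage[t] over the overlapping observations and
# keep the lag with the largest correlation coefficient.
best_lag_by_correlation <- function(usage, proxy, possible_lags = -36:36) {
  n <- length(usage)
  idx <- seq_len(n)
  correlations <- sapply(possible_lags, function(lag) {
    shifted_idx <- idx - lag
    keep <- shifted_idx >= 1 & shifted_idx <= n
    if (sum(keep) < 3) {
      return(NA_real_)  # too little overlap to compute a correlation
    }
    cor(usage[idx[keep]], proxy[shifted_idx[keep]])
  })
  possible_lags[which.max(correlations)]
}

# Simulated data: usage[t] equals proxy[t - 5] by construction.
set.seed(1)
z <- cumsum(rnorm(125))
proxy <- z[6:125]
usage <- z[1:120]
best_lag_by_correlation(usage, proxy)
```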

possible_lags

A numeric vector specifying all the candidate lags for popularity_proxy. The default option is -36:36. This is relevant only when trend is "estimated".

leave_off

A positive integer specifying the number of observations to be left off when estimating the lag. The default option is 6. This is relevant only when trend is "estimated".

estimated_change

A numeric specifying the estimated change in the visitation trend. The default option is 0, implying no change in the trend.

order_of_polynomial_approximation

A numeric specifying the order of the polynomial approximation of the difference between time series used in estimate_lag. The default option is 7, the seventh-degree polynomial. This is relevant only when trend is "estimated".

order_of_derivative

A numeric specifying the order of derivative for the approximated difference between lagged popularity_proxy and onsite_usage. The default option is 1, the first derivative. This is relevant only when trend is "estimated".

ref_series

A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of onsite_usage.

constant

A numeric specifying the constant term (beta0) in the model. This constant is understood as the mean log adjusted monthly visitation relative to the base month. The default option is 0, implying that the (logged) onsite_usage does not require any constant shift, which is unusual. If ref_series is supplied, the constant is overwritten by the least squares estimate.

beta

A numeric or a character string specifying the seasonality adjustment factor (beta1). The default option is "estimate", in which case it is estimated using Fisher's z-transformed lag-12 autocorrelation. If ref_series is supplied, any value given here is overwritten by the least squares estimate.
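
For reference, the quantity named above, Fisher's z-transformation of the lag-12 autocorrelation, can be computed in base R as sketched below. The simulated series is an assumption made for illustration, and how the package turns this quantity into the final beta estimate is not described on this page.

```r
# Simulated monthly series with an annual cycle (illustrative data only).
set.seed(42)
months <- 1:120
x <- sin(2 * pi * months / 12) + rnorm(120, sd = 0.3)

# Sample autocorrelations up to lag 12; element 1 is lag 0,
# so the lag-12 autocorrelation is element 13.
acf_values <- acf(x, lag.max = 12, plot = FALSE)$acf
r12 <- acf_values[13]

# Fisher's z-transformation: atanh(r) = 0.5 * log((1 + r) / (1 - r)).
z12 <- atanh(r12)
```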

slope

A numeric specifying the slope coefficient (beta2) in the model. This constant is applicable only when trend is set to "linear". The default option is 0, implying that the linear trend is absent.

is_input_logged

A Boolean describing whether the onsite_usage, ref_series, and popularity_proxy are in the log scale. The default option is FALSE, in which case the inputs will be assumed to not be logged and will be logged before making forecasts. Setting it to TRUE will assume the inputs are logged.

spline

A Boolean specifying whether or not to use a smoothing spline for the lag estimation. This is relevant only when trend is "estimated".

parameter_estimates

A character string specifying how to estimate the beta and constant parameters should a reference series be supplied. Both options use least squares estimates, but "separate" indicates that the differenced series should be used to estimate beta separately from the constant, while "joint" indicates that both should be estimated together using the non-differenced, detrended series.

omit_trend

This argument is obsolete and is retained only for compatibility. It is a Boolean specifying whether or not to consider the trend component to be 0. The default option is TRUE, in which case the trend component is 0. If it is set to FALSE, the trend is estimated using data. Any option chosen in trend overrides omit_trend; omit_trend is used only if trend is NULL.

trend

A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to omit_trend being TRUE and FALSE, respectively. If NULL, then it follows the value specified in omit_trend.

onsite_usage_decomposition

A "decomposition" class object containing decomposition data for the onsite usage time series (output from auto_decompose).

...

Additional arguments to be passed onto the smoothing spline (smooth.spline).

Value

proxy_decomposition

A "decomposition" object representing the automatic decomposition obtained from popularity_proxy (see auto_decompose).

lagged_proxy_trend_and_forecasts_window

A 'ts' object storing the potentially lagged popularity proxy trend and any forecasts needed due to the lag.

ts_trend_window

A 'ts' object storing the trend component of the onsite social media usage. This trend component is potentially truncated to match available popularity proxy data.

ts_seasonality_window

A 'ts' object storing the seasonality component of the onsite social media usage. This seasonality component is potentially truncated to match available popularity proxy data.

latest_starttime

A 'tsp' attribute of a 'ts' object representing the latest of the two start times of the potentially lagged popularity proxy and the onsite social media usage.

endtime

A 'tsp' attribute of a 'ts' object representing the time of the final onsite usage observation.

forecasts_needed

An integer representing the number of forecasts of popularity_proxy needed to obtain all fitted values. Negative values indicate extra observations which may be useful for predictions.

lag_estimate

A list storing both the MSE-based and rank-based estimates for the lag.


[Package VisitorCounts version 2.0.0 Index]