simulation_model9 {fdaoutlier}    R Documentation

Convenience function for generating functional data

Description

Periodic functions with outliers of different amplitude. The main model is of the form

$$X_i(t) = a_{1i}\sin \pi + a_{2i}\cos\pi + e_i(t),$$

with contamination model of the form

$$X_i(t) = (b_{1i}\sin\pi + b_{2i}\cos\pi)(1-u_i) + (c_{1i}\sin\pi + c_{2i}\cos\pi)u_i + e_i(t),$$

where $t \in [0,1]$ and $\pi \in [0, 2\pi]$; $a_{1i}$ and $a_{2i}$ follow a uniform distribution on an interval $[a_1, a_2]$; $b_{1i}$ and $b_{2i}$ follow a uniform distribution on an interval $[b_1, b_2]$; $c_{1i}$ and $c_{2i}$ follow a uniform distribution on an interval $[c_1, c_2]$; $u_i$ follows a Bernoulli distribution; and $e_i(t)$ is a Gaussian process with zero mean and covariance function of the form

$$\gamma(s,t) = \alpha\exp\{-\beta|t-s|^\nu\}.$$
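
As a rough illustration of this covariance function, the base-R sketch below evaluates $\gamma$ on an equally spaced grid and draws a few zero-mean Gaussian process paths from it. The helper name cov_fun, the grid tt, and the Cholesky-based sampling are assumptions made for illustration; they are not taken from the fdaoutlier source.

```r
# Hypothetical helper (not part of fdaoutlier): evaluates
# gamma(s, t) = alpha * exp(-beta * |t - s|^nu) for all pairs of grid points.
cov_fun <- function(tt, alpha = 1, beta = 1, nu = 1) {
  alpha * exp(-beta * abs(outer(tt, tt, "-"))^nu)
}

p <- 50
tt <- seq(0, 1, length.out = p)        # assumed equally spaced grid on [0, 1]
sigma <- cov_fun(tt)                   # p x p covariance matrix

set.seed(1)
R <- chol(sigma)                       # upper triangular, t(R) %*% R equals sigma
z <- matrix(rnorm(5 * p), nrow = 5)    # 5 rows of independent N(0, 1) draws
e <- z %*% R                           # each row is one zero-mean path with covariance sigma
```

With the defaults cov_alpha = cov_beta = cov_nu = 1, the diagonal of sigma is 1 and the correlation decays exponentially with the distance between grid points.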

Please see the simulation models vignette with vignette("simulation_models", package = "fdaoutlier") for more details.
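
The contamination model above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: it omits the noise term, and it assumes that the sine and cosine are evaluated at 2 * pi * t over an equally spaced grid, which the description does not state explicitly. All variable names are hypothetical.

```r
set.seed(1)
n_out <- 5                              # number of outlying curves (illustrative)
p <- 50
tt <- seq(0, 1, length.out = p)
theta <- 2 * pi * tt                    # ASSUMPTION: argument of sin/cos sweeps [0, 2*pi]

kprob <- 0.5                            # P(u_i = 1)
bi <- c(1.5, 2.5)                       # interval for the b coefficients
ci <- c(9, 10.5)                        # interval for the c coefficients

u  <- rbinom(n_out, size = 1, prob = kprob)   # Bernoulli switch per curve
b1 <- runif(n_out, bi[1], bi[2])
b2 <- runif(n_out, bi[1], bi[2])
c1 <- runif(n_out, ci[1], ci[2])
c2 <- runif(n_out, ci[1], ci[2])

# u_i = 0 selects the b amplitudes, u_i = 1 selects the c amplitudes
amp_sin <- b1 * (1 - u) + c1 * u
amp_cos <- b2 * (1 - u) + c2 * u

# Noise-free outlier curves, one per row
outlier_mean <- outer(amp_sin, sin(theta)) + outer(amp_cos, cos(theta))
```

Under the default intervals, each outlier has either a smaller amplitude (from bi) or a larger amplitude (from ci) than the main-model curves drawn from ai = c(3, 8).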

Usage

simulation_model9(
  n = 100,
  p = 50,
  outlier_rate = 0.05,
  kprob = 0.5,
  ai = c(3, 8),
  bi = c(1.5, 2.5),
  ci = c(9, 10.5),
  cov_alpha = 1,
  cov_beta = 1,
  cov_nu = 1,
  deterministic = TRUE,
  seed = NULL,
  plot = FALSE,
  plot_title = "Simulation Model 9",
  title_cex = 1.5,
  show_legend = TRUE,
  ylabel = "",
  xlabel = "gridpoints"
)

Arguments

n

The number of curves to generate. Set to 100 by default.

p

The number of evaluation points of the curves. Curves are usually generated over the interval $[0, 1]$. Set to 50 by default.

outlier_rate

A value in $[0, 1]$ indicating the proportion of outliers. A value of 0.06 means that about 6% of the observations will be outliers; whether the count is fixed or random depends on the parameter deterministic. Set to 0.05 by default.

kprob

The probability $P(u_i = 1)$. Set to 0.5 by default.

ai

A vector of two values giving the endpoints $a_1$ and $a_2$ of the interval from which $a_{1i}$ and $a_{2i}$ are drawn in the main model. Set to c(3, 8) by default.

bi

A vector of two values giving the endpoints $b_1$ and $b_2$ of the interval from which $b_{1i}$ and $b_{2i}$ are drawn in the contamination model. Set to c(1.5, 2.5) by default.

ci

A vector of two values giving the endpoints $c_1$ and $c_2$ of the interval from which $c_{1i}$ and $c_{2i}$ are drawn in the contamination model. Set to c(9, 10.5) by default.

cov_alpha

A value indicating the coefficient of the exponential function of the covariance matrix, i.e., the $\alpha$ in the covariance function. Set to 1 by default.

cov_beta

A value indicating the coefficient of the terms inside the exponential function of the covariance matrix, i.e., the $\beta$ in the covariance function. Set to 1 by default.

cov_nu

A value indicating the power to which to raise the terms inside the exponential function of the covariance matrix, i.e., the $\nu$ in the covariance function. Set to 1 by default.

deterministic

A logical value. If TRUE, the function always returns round(n*outlier_rate) outliers, so the number of outliers is constant. If FALSE, the number of outliers is determined using n Bernoulli trials with probability outlier_rate, so the number of outliers returned is random. TRUE by default.

seed

A seed to set for reproducibility. NULL by default in which case a seed is not set.

plot

A logical value indicating whether to plot the data. FALSE by default.

plot_title

Title of the plot if plot is TRUE.

title_cex

Numerical value indicating the size of the plot title relative to the device default. Set to 1.5 by default. Ignored if plot = FALSE.

show_legend

A logical value indicating whether to add a legend to the plot if plot = TRUE.

ylabel

The label of the y-axis. Set to "" by default.

xlabel

The label of the x-axis if plot = TRUE. Set to "gridpoints" by default.
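
The difference between the two settings of deterministic can be sketched in base R as follows. This only mirrors the counting rule described for that argument; how the package chooses which rows become outliers is not shown here, and the variable names are hypothetical.

```r
n <- 100
outlier_rate <- 0.05

# deterministic = TRUE: the count is fixed at round(n * outlier_rate)
n_fixed <- round(n * outlier_rate)

# deterministic = FALSE: one Bernoulli trial per curve, so the count varies
set.seed(42)
flags <- rbinom(n, size = 1, prob = outlier_rate)
n_random <- sum(flags)                  # random, with expectation n * outlier_rate
```

With the default values, the fixed count is 5, while the random count fluctuates around 5 from one call to the next.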

Value

A list containing:

data

a matrix of size n by p containing the simulated data set

true_outliers

a vector of integers indicating the row index of the outliers in the generated data.

Examples

dt <- simulation_model9(plot = TRUE)
dim(dt$data)
dt$true_outliers
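
A further hedged example, using only arguments listed under Usage, checks the documented behaviour of deterministic and seed. It runs only if fdaoutlier is installed; the comments state what the documentation above leads one to expect rather than verified output, and the object names are hypothetical.

```r
has_pkg <- requireNamespace("fdaoutlier", quietly = TRUE)

if (has_pkg) {
  dt1 <- fdaoutlier::simulation_model9(n = 100, p = 50, outlier_rate = 0.05,
                                       deterministic = TRUE, seed = 123)
  dt2 <- fdaoutlier::simulation_model9(n = 100, p = 50, outlier_rate = 0.05,
                                       deterministic = TRUE, seed = 123)

  print(dim(dt1$data))                    # documented as an n by p matrix
  print(length(dt1$true_outliers))        # documented as round(n * outlier_rate)
  print(identical(dt1$data, dt2$data))    # same seed, so the same data is expected
}
```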

[Package fdaoutlier version 0.2.1 Index]