R: Synthetic Data Generation for the Basic Unit-Level SAE Model

makedata {rsae}

R Documentation

Synthetic Data Generation for the Basic Unit-Level SAE Model

Description

This function generates synthetic data (possibly contaminated by outliers) for the basic unit-level SAE model.

Usage

makedata(seed = 1024, intercept = 1, beta = 1, n = 4, g = 20, areaID = NULL,
         ve = 1, ve.contam = 41, ve.epsilon = 0, vu = 1, vu.contam = 41,
         vu.epsilon = 0)

Arguments

`seed`	`[integer]` seed value used in `set.seed` (default: `seed = 1024`).
`intercept`	`[numeric]` or `[NULL]` value of the intercept of the fixed-effects model or `NULL` for a model without intercept (default: `intercept = 1`).
`beta`	`[numeric vector]` value of the fixed-effect coefficients (without intercept; default: `beta = 1`). For each given coefficient, a vector of realizations is drawn from the standard normal distribution.
`n`	`[integer]` number of units per area in balanced-data setups (default: `n = 4`).
`g`	`[integer]` number of areas (default: `g = 20`).
`areaID`	`[integer vector]` or `[NULL]`. To generate synthetic unbalanced data, call `makedata` with a vector whose elements are area identifiers. This vector should contain a series of (integer-valued) area IDs. The number of areas is set equal to the number of unique IDs.
`ve`	`[numeric]` nonnegative value of model/residual variance.
`ve.contam`	`[numeric]` nonnegative value of model variance of the outlier part in a mixture distribution (Tukey-Huber-type contamination model) `e = (1-h)N(0, ve) + hN(0, ve.contam)`.
`ve.epsilon`	`[numeric]` value in `[0,1]` that defines the relative number of outliers (i.e., epsilon or h in the contamination mixture distribution). Typically, it takes values between 0 and 0.5 (but it is not restricted to this interval).
`vu`	`[numeric]` value of the (area-level) random-effect variance.
`vu.contam`	`[numeric]` nonnegative value of the (area-level) random-effect variance of the outlier part in the contamination mixture distribution.
`vu.epsilon`	`[numeric]` value in `[0,1]` that defines the relative number of outliers in the contamination mixture distribution of the (area-level) random effects.
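
As a minimal sketch of how an unbalanced design might be specified through `areaID`, the following base-R snippet builds an identifier vector with unequal area sizes. The area sizes are arbitrary illustrative values, the snippet assumes that the vector holds one ID per unit, and the commented-out call assumes that the rsae package is attached.

```r
# Hypothetical area sizes n_i for 4 areas (illustrative values only)
area_sizes <- c(3, 5, 2, 6)

# Assumption: one integer ID per unit, so area i appears n_i times
areaID <- rep(seq_along(area_sizes), times = area_sizes)

# The number of areas equals the number of unique IDs
g <- length(unique(areaID))

# Units per area, recovered from the ID vector
n_i <- as.vector(table(areaID))

# With rsae attached, the vector would be passed as:
# model <- makedata(areaID = areaID)
```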

Details

Let y_i denote the area-specific n_i-vector of the response variable for area i = 1,..., g. Define an (n_i \times p)-matrix X_i of realizations from the standard normal distribution, N(0,1), and let \beta denote a p-vector of regression coefficients. The y_i are then drawn from the law y_i \sim N(X_i\beta, v_e I_i + v_u J_i), where v_e and v_u are the variances of the model error and the area-level random effect, respectively, and I_i and J_i denote the (n_i \times n_i) identity matrix and matrix of ones, respectively. Equivalently, at the unit level, y_{i,j} = x_{i,j}^T\beta + u_i + e_{i,j} with independent u_i \sim N(0, v_u) and e_{i,j} \sim N(0, v_e).
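
The data-generating law above can be sketched in base R by drawing the area-level random effects and the unit-level errors separately, which yields the covariance v_e I_i + v_u J_i within each area. This is an illustration under stated assumptions, not the package's internal code: the object names (`sim`, `X`, `u`, `e`) are hypothetical, and the result is not expected to reproduce the output of `makedata()` for the same seed.

```r
set.seed(1024)

g <- 20          # number of areas
n <- 4           # units per area (balanced case)
intercept <- 1
beta <- 1        # p = 1 regression coefficient
ve <- 1          # model-error variance v_e
vu <- 1          # random-effect variance v_u

areaID <- rep(seq_len(g), each = n)

# Covariates: one column of N(0, 1) draws per coefficient
X <- matrix(rnorm(g * n * length(beta)), ncol = length(beta))
colnames(X) <- paste0("x", seq_along(beta))

# Area-level random effects u_i and unit-level errors e_{i,j}
u <- rnorm(g, mean = 0, sd = sqrt(vu))
e <- rnorm(g * n, mean = 0, sd = sqrt(ve))

# Response: fixed part, plus the area effect shared within an area, plus error
y <- intercept + as.vector(X %*% beta) + u[areaID] + e

sim <- data.frame(areaID = areaID, y = y, X)
```

Indexing `u[areaID]` repeats each area effect for all units of that area, which is what induces the within-area correlation v_u / (v_u + v_e).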

In addition, we allow the distributions of the model error e_{i,j} and the area-level random effect u_i to be contaminated (cf. Stahel and Welsh, 1997). Specifically, the laws of e_{i,j} and u_i are replaced by Tukey-Huber contamination mixtures:

e_{i,j} \sim (1-\epsilon^{ve})N(0,v_e) + \epsilon^{ve}N(0, v_e^{\epsilon})
u_{i} \sim (1-\epsilon^{vu})N(0,v_u) + \epsilon^{vu}N(0, v_u^{\epsilon})

where \epsilon^{ve} and \epsilon^{vu} (arguments ve.epsilon and vu.epsilon) regulate the degree of contamination, and v_e^{\epsilon} and v_u^{\epsilon} (arguments ve.contam and vu.contam) define the variances of the contamination parts of the mixture distributions.
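
One way to draw from such a mixture is to pick the component for each draw with a Bernoulli indicator and then sample from the corresponding normal distribution. The helper `rcontam` below is hypothetical (it is not part of rsae), and whether the package selects outliers in this way is an assumption of the sketch.

```r
# Draw n values from (1 - eps) * N(0, v) + eps * N(0, v_contam).
# rcontam is a hypothetical helper, not an rsae function.
rcontam <- function(n, v, v_contam, eps) {
  stopifnot(v >= 0, v_contam >= 0, eps >= 0, eps <= 1)
  # TRUE where the draw comes from the contamination part
  is_outlier <- runif(n) < eps
  # Standard deviation of each draw depends on its mixture component
  sd_i <- sqrt(ifelse(is_outlier, v_contam, v))
  rnorm(n, mean = 0, sd = sd_i)
}

set.seed(1)
e <- rcontam(1e5, v = 1, v_contam = 41, eps = 0.1)

# Theoretical mixture variance: (1 - eps) * v + eps * v_contam = 0.9 + 4.1 = 5
var(e)
```

Because both components have mean zero, the variance of the mixture is the weighted average of the component variances, so the sample variance should be close to 5 here.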

Four different contamination setups are possible:

no contamination (i.e., ve.epsilon = vu.epsilon = 0),
contaminated model error (i.e., ve.epsilon != 0 and vu.epsilon = 0),
contaminated random effect (i.e., ve.epsilon = 0 and vu.epsilon != 0),
both contaminated (i.e., ve.epsilon != 0 and vu.epsilon != 0).
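
The four setups above could be requested as follows. The contamination fraction of 0.1 and the names of the list elements are illustrative choices; the calls use only arguments listed under Usage and are run only if the rsae package is installed.

```r
# Argument lists for the four setups (0.1 is an illustrative fraction)
setups <- list(
  none  = list(ve.epsilon = 0,   vu.epsilon = 0),
  error = list(ve.epsilon = 0.1, vu.epsilon = 0),
  area  = list(ve.epsilon = 0,   vu.epsilon = 0.1),
  both  = list(ve.epsilon = 0.1, vu.epsilon = 0.1)
)

# Generate one synthetic data set per setup, if rsae is available
if (requireNamespace("rsae", quietly = TRUE)) {
  models <- lapply(setups, function(args) do.call(rsae::makedata, args))
}
```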

Value

An instance of the class saemodel.

References

Schoch, T. (2012). Robust Unit-Level Small Area Estimation: A Fast Algorithm for Large Datasets. Austrian Journal of Statistics 41, 243–265. doi:10.17713/ajs.v41i4.1548

Stahel, W. A. and A. Welsh (1997). Approaches to robust estimation in the simplest variance components model. Journal of Statistical Planning and Inference 57, 295–319. doi:10.1016/S0378-3758(96)00050-X

Examples

# generate a model with synthetic data
model <- makedata()
model

# summary of the model
summary(model)

[Package rsae version 0.3 Index]