genSimData {nonprobsvy}

R Documentation

Simulation data

Description

Generates simulated data following the simulation design of Chen, Li & Wu (2020), Section 5.

Usage

genSimData(N = 10000, n = 1000)

Arguments

`N`	`integer`, population size; default 10000
`n`	`integer`, expected size of the big data (non-probability) sample, i.e. the value that sum(rho) is calibrated to; default 1000

Value

genSimData returns a data.frame with the following columns:

x0 – intercept
x1 – the first variable, based on z1
x2 – the second variable, based on z2 and x1
x3 – the third variable, based on z3, x1 and x2
x4 – the fourth variable, based on z4, x1, x2 and x3
y30 – \(y\) generated from the model \(y = 2 + x_1 + x_2 + x_3 + x_4 + \sigma \varepsilon\), with \(\sigma\) chosen so that cor(y, y_hat) = 0.30, where y_hat = 2 + x1 + x2 + x3 + x4 is the linear predictor
y60 – as y30, but with \(\sigma\) chosen so that cor(y, y_hat) = 0.60
y80 – as y30, but with \(\sigma\) chosen so that cor(y, y_hat) = 0.80
rho – true propensity scores for the big data sample, such that sum(rho) = n
srs – inclusion probabilities for the random (probability) sample, such that max(srs)/min(srs) = 50
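The calibration conditions above can be written out explicitly. The formula for \(\sigma\) needs only that \(\varepsilon\) has unit variance and is independent of x1 to x4, so that cor(y, y_hat) depends on \(\sigma\) alone. The forms used for rho and srs are assumptions taken from the design in Chen, Li & Wu (2020) as we read it (a logistic propensity with slopes 0.1, 0.2, 0.1, 0.2, and srs proportional to a size measure \(c + x_3\)); they are not read from the package source. Here \(r\) denotes the target correlation (0.30, 0.60 or 0.80).

```latex
\begin{aligned}
\mu &= 2 + x_1 + x_2 + x_3 + x_4, \qquad y = \mu + \sigma\varepsilon, \\
\operatorname{cor}(y,\mu) &= \frac{\operatorname{Var}(\mu)}{\operatorname{sd}(\mu)\sqrt{\operatorname{Var}(\mu)+\sigma^2}}
  = \frac{\operatorname{sd}(\mu)}{\sqrt{\operatorname{Var}(\mu)+\sigma^2}} = r
  \;\Longrightarrow\; \sigma = \operatorname{sd}(\mu)\sqrt{\frac{1}{r^2}-1}, \\
&\text{so } \sigma \approx 3.18\,\operatorname{sd}(\mu)\ (r=0.30),\quad
  \sigma = \tfrac{4}{3}\operatorname{sd}(\mu)\ (r=0.60),\quad
  \sigma = 0.75\,\operatorname{sd}(\mu)\ (r=0.80); \\
\sum_{i=1}^{N} \operatorname{logit}^{-1}\!\left(\theta_0 + 0.1x_{1i} + 0.2x_{2i} + 0.1x_{3i} + 0.2x_{4i}\right) &= n
  \quad \text{(assumed model; the left side is increasing in } \theta_0\text{, so } \theta_0 \text{ is unique)}, \\
\frac{c + \max_i x_{3i}}{c + \min_i x_{3i}} = 50
  &\;\Longrightarrow\; c = \frac{\max_i x_{3i} - 50\,\min_i x_{3i}}{49}
  \quad \text{(assumed size measure } c + x_{3i}\text{)}.
\end{aligned}
```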

Author(s)

Łukasz Chrostowski, Maciej Beręsewicz

References

Chen, Y., Li, P., & Wu, C. (2020). Doubly Robust Inference With Nonprobability Survey Samples. Journal of the American Statistical Association, 115(532), 2011–2021. doi:10.1080/01621459.2019.1677241

Examples

## generate data with N=20000 and n=2000
genSimData(N = 20000, n = 2000)

## generate data where the big data sample is almost as large as the population (n close to N)
genSimData(N = 10000, n = 9000)
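To see how the properties listed under Value can be met, here is a minimal base R sketch that reproduces the design without the package. It is an illustration under stated assumptions, not the genSimData implementation: `sim_sketch()` and its `n_prob` argument are hypothetical names; the distributions of z1 to z4, the logistic propensity slopes and the size measure c + x3 behind srs are assumed from Chen, Li & Wu (2020), Section 5, and may differ from what nonprobsvy actually does. The root search for the propensity intercept requires n < N.

```r
## Hypothetical sketch, NOT the nonprobsvy source code.
## Assumed (from Chen, Li & Wu 2020, Section 5): distributions of z1-z4,
## propensity slopes 0.1, 0.2, 0.1, 0.2, and srs proportional to c0 + x3.
sim_sketch <- function(N = 10000, n = 1000, n_prob = 500) {
  # latent variables (assumed distributions)
  z1 <- rbinom(N, 1, 0.5)
  z2 <- runif(N, 0, 2)
  z3 <- rexp(N, 1)
  z4 <- rchisq(N, df = 4)
  # covariates built sequentially from the z's and the earlier x's
  x1 <- z1
  x2 <- z2 + 0.3 * x1
  x3 <- z3 + 0.2 * (x1 + x2)
  x4 <- z4 + 0.1 * (x1 + x2 + x3)
  mu <- 2 + x1 + x2 + x3 + x4  # linear predictor (y_hat)
  # sigma = sd(mu) * sqrt(1 / r^2 - 1) targets cor(y, mu) = r
  make_y <- function(r) mu + sd(mu) * sqrt(1 / r^2 - 1) * rnorm(N)
  # logistic propensity; intercept solved numerically so that sum(rho) = n
  # (needs n < N so the function changes sign on the interval)
  eta <- 0.1 * x1 + 0.2 * x2 + 0.1 * x3 + 0.2 * x4
  theta0 <- uniroot(function(t) sum(plogis(t + eta)) - n,
                    interval = c(-50, 50), tol = 1e-10)$root
  # size measure c0 + x3, with c0 chosen so that max/min = 50
  c0 <- (max(x3) - 50 * min(x3)) / 49
  size <- c0 + x3
  data.frame(x0 = 1, x1, x2, x3, x4,
             y30 = make_y(0.30), y60 = make_y(0.60), y80 = make_y(0.80),
             rho = plogis(theta0 + eta),
             # rescaled to sum to the hypothetical n_prob; max/min is unchanged
             srs = n_prob * size / sum(size))
}

set.seed(2020)
d <- sim_sketch(N = 10000, n = 1000)
sum(d$rho)                                 # close to n = 1000
max(d$srs) / min(d$srs)                    # 50
with(d, cor(y80, 2 + x1 + x2 + x3 + x4))   # close to 0.80
```

Solving for the intercept with uniroot() works because sum(rho) increases monotonically in it. Because σ is calibrated with the sample sd(μ) and ε is drawn afterwards, the target correlations hold only approximately in any single draw.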

[Package nonprobsvy version 0.1.0]