genSimData {nonprobsvy} | R Documentation
Simulation data
Description
Generate simulated data according to Chen, Li & Wu (2020), section 5.
Usage
genSimData(N = 10000, n = 1000)
Arguments
N | population size (default 10000)
n | expected size of the big data (nonprobability) sample (default 1000)
Value
genSimData returns a data.frame with the following columns:
x0 – intercept
x1 – the first variable based on z1
x2 – the second variable based on z2 and x1
x3 – the third variable based on z3 and x1 and x2
x4 – the fourth variable based on z4 and x1, x2 and x3
\(y30\) – \(y\) generated from the model \(y=2+x1+x2+x3+x4+\sigma \cdot \varepsilon\), with \(\sigma\) chosen so that cor(y, y_hat) = 0.30
\(y60\) – \(y\) generated from the model \(y=2+x1+x2+x3+x4+\sigma \cdot \varepsilon\), with \(\sigma\) chosen so that cor(y, y_hat) = 0.60
\(y80\) – \(y\) generated from the model \(y=2+x1+x2+x3+x4+\sigma \cdot \varepsilon\), with \(\sigma\) chosen so that cor(y, y_hat) = 0.80
rho – true propensity scores for the big data sample, such that sum(rho) = n
srs – inclusion probabilities for the random sample, such that max(srs)/min(srs) = 50
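The three constraints in the list above (a target cor(y, y_hat), sum(rho) = n, and max(srs)/min(srs) = 50) can each be met by a simple calibration step. The following is a minimal sketch of one way to do this, not the code used by genSimData. The covariate distributions, all coefficients, the logistic form of rho, the choice of x3 as the size variable for srs, the sample size n_A, and the helper sigma_for are assumptions made for illustration only, and y_hat is taken to be the linear predictor.

```r
set.seed(123)
N   <- 10000  # population size
n   <- 1000   # target expected size of the big data sample
n_A <- 500    # hypothetical size of the random sample (assumption)

# Illustrative covariates: distributions and coefficients are assumptions,
# only the dependence structure (x2 on z2 and x1, etc.) follows the list above.
z1 <- rbinom(N, 1, 0.5)
z2 <- runif(N, 0, 2)
z3 <- rexp(N, 1)
z4 <- rchisq(N, 4)
x1 <- z1
x2 <- z2 + 0.3 * x1
x3 <- z3 + 0.2 * (x1 + x2)
x4 <- z4 + 0.1 * (x1 + x2 + x3)

# 1. sigma for a target correlation r between y and the linear predictor eta.
#    With eps independent of eta and Var(eps) = 1:
#    cor(y, eta)^2 = var(eta) / (var(eta) + sigma^2)
#    => sigma = sd(eta) * sqrt(1 / r^2 - 1)
eta <- 2 + x1 + x2 + x3 + x4
sigma_for <- function(r) sd(eta) * sqrt(1 / r^2 - 1)  # hypothetical helper
eps <- rnorm(N)
y30 <- eta + sigma_for(0.30) * eps
y60 <- eta + sigma_for(0.60) * eps
y80 <- eta + sigma_for(0.80) * eps

# 2. rho: logistic propensity scores (assumed form); the intercept is solved
#    numerically so that the scores sum to n.
lin <- 0.1 * x1 + 0.2 * x2 + 0.1 * x3 + 0.2 * x4
theta0 <- uniroot(function(t) sum(plogis(t + lin)) - n,
                  interval = c(-20, 20), tol = 1e-10)$root
rho <- plogis(theta0 + lin)

# 3. srs: probabilities proportional to c0 + x3 (assumed size variable).
#    Solving (c0 + max(x3)) / (c0 + min(x3)) = ratio for c0 gives:
ratio <- 50
c0  <- (max(x3) - ratio * min(x3)) / (ratio - 1)
srs <- n_A * (c0 + x3) / sum(c0 + x3)

sim <- data.frame(x0 = 1, x1, x2, x3, x4, y30, y60, y80, rho, srs)
```

In a finite population the sample correlations only approximate their targets, whereas sum(rho) and max(srs)/min(srs) hit theirs up to numerical tolerance.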
Author(s)
Łukasz Chrostowski, Maciej Beręsewicz
References
Chen, Y., Li, P., & Wu, C. (2020). Doubly Robust Inference With Nonprobability Survey Samples. Journal of the American Statistical Association, 115(532), 2011–2021. doi:10.1080/01621459.2019.1677241
Examples
## generate data with N=20000 and n=2000
genSimData(N = 20000, n = 2000)
## generate data when the big data sample is almost as large as N
genSimData(N = 10000, n = 9000)