sim_pop_data {SPARRAfairness}    R Documentation
sim_pop_data
Description
Simulates population data with a reasonably realistic joint distribution of covariates and hospital admission.
Usage
sim_pop_data(
npop,
coef_adjust = 4,
offset = 1,
vcor = NULL,
coefs = c(2, 1, 0, 5, 3, 0, 0),
seed = 12345,
incl_id = TRUE,
incl_reason = TRUE
)
Arguments
npop: Population size (number of individuals to simulate).
coef_adjust: Inverse scale for all (true) coefficients (default 4). Lower values make hospital admissions more predictable from covariates.
offset: Offset for the logistic model (default 1). Higher values give a lower overall prevalence of admission.
vcor: A valid 5x5 correlation matrix giving the correlations between variables (default NULL). If NULL, default values that roughly reflect realistic data are used.
coefs: Coefficients for the effects of age, male sex, non-white ethnicity, number of previous admissions, and deprivation decile on hospital admission (default c(2, 1, 0, 5, 3, 0, 0), as shown in Usage). All coefficients are divided by coef_adjust.
seed: Random seed (default 12345).
incl_id: Whether to include an ID column (default TRUE).
incl_reason: Whether to include a column giving the reason for admission (default TRUE).
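To illustrate how coefs, coef_adjust and offset could interact, here is a minimal base-R sketch of a logistic admission model. It assumes, without confirmation from this page, that the linear predictor is the covariates times coefs / coef_adjust minus offset; the covariate matrix X and the helper admission_prob() are hypothetical and are not part of SPARRAfairness.

```r
# Hypothetical sketch: one way coefs, coef_adjust and offset might combine in
# a logistic model for admission. This is NOT the package's internal code.
set.seed(12345)

n <- 1000
# Hypothetical standardised covariates (columns stand in for age, sexM, ...)
X <- matrix(rnorm(n * 5), nrow = n, ncol = 5)
coefs <- c(2, 1, 0, 5, 3)

# Assumed model: logit(P(admission)) = X %*% (coefs / coef_adjust) - offset
admission_prob <- function(X, coefs, coef_adjust = 4, offset = 1) {
  eta <- drop(X %*% (coefs / coef_adjust)) - offset
  plogis(eta)
}

p_default     <- admission_prob(X, coefs)
p_high_offset <- admission_prob(X, coefs, offset = 3)
p_low_adjust  <- admission_prob(X, coefs, coef_adjust = 2)

# Simulated binary outcome under the default settings
target <- rbinom(n, size = 1, prob = p_default)

mean(p_default)      # average admission probability
mean(p_high_offset)  # lower: a larger offset reduces prevalence
```

Under this assumed form, raising offset shifts every linear predictor down (lower prevalence), while lowering coef_adjust spreads the linear predictor out, so covariates separate admitted from non-admitted individuals more sharply.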
Details
Simulates data on the following variables for each individual in the population:
- Age (age)
- Sex (sexM: 1 if male)
- Race/ethnicity (raceNW: 1 if non-white ethnicity)
- Number of previous hospital admissions (PrevAdm)
- Deprivation decile (SIMD: 1 most deprived, 10 least deprived; NOTE: opposite to English IMD)
- Urban-rural residence status (urban_rural: 1 for rural)
- Mainland-island residence status (mainland_island: 1 for island)
- Hospital admission (target: 1/TRUE if admitted to hospital in the year following the prediction date)
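A correlation matrix such as vcor can induce dependence between variables of different types (continuous, count, decile). The sketch below uses a Gaussian copula, which is one common technique; this page does not say that sim_pop_data works this way. The 3x3 matrix, the marginal choices (uniform ages, Poisson counts) and the column subset are all hypothetical.

```r
# Hypothetical sketch: correlated mixed-type variables from a correlation
# matrix via a Gaussian copula. This is NOT the package's internal code.
set.seed(12345)

n <- 5000
# Hypothetical 3x3 correlation matrix for (age, PrevAdm, SIMD)
vcor <- matrix(c( 1.0,  0.4, -0.2,
                  0.4,  1.0, -0.3,
                 -0.2, -0.3,  1.0), nrow = 3, byrow = TRUE)

# Rows of Z %*% chol(vcor) are multivariate normal with correlation vcor
Z <- matrix(rnorm(n * 3), nrow = n) %*% chol(vcor)
U <- pnorm(Z)  # uniform margins, dependence structure preserved

sim <- data.frame(
  age     = round(18 + 72 * U[, 1]),          # ages 18 to 90
  PrevAdm = qpois(U[, 2], lambda = 0.5),      # count of previous admissions
  SIMD    = pmin(floor(10 * U[, 3]) + 1, 10)  # deciles 1 to 10
)

round(cor(sim), 2)
```

The observed correlations are attenuated relative to vcor because the margins are discrete or transformed, but their signs follow the latent matrix.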
An ID column can optionally be added.
An admission reason can optionally be included for individuals with target = 1. These admission reasons
roughly correspond to the first letters of ICD-10 categories and can correspond either to an
admission or to a death. Admission reasons are simulated from a non-constant multinomial distribution
that varies across age/sex/ethnicity/urban-rural/mainland-island/PrevAdm values in a randomly chosen way.
However, the distributions of admission reasons are not chosen to reflect real distributions, and
systematic differences in how common each admission type is across categories are not intended to
be realistic.
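The idea of a multinomial distribution that "varies across covariate values in a randomly chosen way" can be sketched as a multinomial logit with randomly drawn effects. This is an illustration under stated assumptions, not the package's implementation: the reason labels, covariates and random effect matrix B are hypothetical.

```r
# Hypothetical sketch: admission reasons drawn from a multinomial distribution
# whose probabilities depend on covariates through randomly chosen effects.
# This is NOT the package's internal code; labels are illustrative only.
set.seed(12345)

n <- 2000
reasons <- c("A", "C", "I", "J", "K", "Death")  # illustrative labels

# Hypothetical covariates for individuals with target = 1
X <- cbind(intercept = 1,
           age  = runif(n, 18, 90) / 90,
           sexM = rbinom(n, 1, 0.5))

# Randomly chosen effect of each covariate on each reason (log scale)
B <- matrix(rnorm(ncol(X) * length(reasons)), nrow = ncol(X))

# Softmax: each individual gets their own probability vector over reasons
eta   <- X %*% B
probs <- exp(eta) / rowSums(exp(eta))

# Draw one reason per individual from their own distribution
reason <- apply(probs, 1, function(p) sample(reasons, 1, prob = p))
table(reason)
```

Because B is random rather than fitted, differences in reason frequencies between, say, men and women exist but carry no real-world meaning, matching the caveat above.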
Value
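A data frame of npop simulated individuals containing the variables described in Details, with roughly realistic values, plus the optional ID and admission-reason columns.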
Examples
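library(SPARRAfairness)

# Simulate data for 10,000 individuals
dat <- sim_pop_data(10000)

# Correlations between the first seven columns
cor(dat[, 1:7])

# See the package vignette for further examples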