R: sim_pop

sim_pop_data {SPARRAfairness}

R Documentation

sim_pop_data

Description

Simulates population data with a reasonably realistic joint distribution

Usage

sim_pop_data(
  npop,
  coef_adjust = 4,
  offset = 1,
  vcor = NULL,
  coefs = c(2, 1, 0, 5, 3, 0, 0),
  seed = 12345,
  incl_id = TRUE,
  incl_reason = TRUE
)

Arguments

`npop`	population size
`coef_adjust`	inverse scale for all (true) coefficients (default 4): lower means that hospital admissions are more predictable from covariates.
`offset`	offset for logistic model (default 1): higher means a lower overall prevalence of admission
`vcor`	a valid 5x5 correlation matrix (default NULL), giving the correlation between variables. If NULL, a default correlation structure that roughly represents realistic data is used.
`coefs`	coefficients for the effect on hospital admission of age, male sex, non-white ethnicity, number of previous admissions, and deprivation decile. The default shown in Usage is c(2, 1, 0, 5, 3, 0, 0). Each coefficient is divided by coef_adjust.
`seed`	random seed (default 12345)
`incl_id`	include an ID column (default TRUE)
`incl_reason`	include a column indicating reason for admission (default TRUE)
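
As a rough illustration of how `coef_adjust` and `offset` interact, the sketch below assumes a plain logistic model in which stand-in covariates are multiplied by `coefs / coef_adjust` and the offset is subtracted from the linear predictor. The helper `admission_prob` and the random covariate matrix are hypothetical, and the package's internal code may differ.

```r
# Hypothetical sketch; not the package's internal implementation.
# Assumption: linear predictor = x %*% (coefs / coef_adjust) - offset.
admission_prob <- function(x, coefs, coef_adjust = 4, offset = 1) {
  linear_predictor <- as.vector(x %*% (coefs / coef_adjust)) - offset
  plogis(linear_predictor)  # inverse logit
}

set.seed(1)
x <- matrix(rnorm(1000 * 7), nrow = 1000, ncol = 7)  # stand-in covariates
coefs <- c(2, 1, 0, 5, 3, 0, 0)

p_default <- admission_prob(x, coefs)
p_high_offset <- admission_prob(x, coefs, offset = 2)
p_low_adjust <- admission_prob(x, coefs, coef_adjust = 1)

mean(p_default)      # overall prevalence under the defaults
mean(p_high_offset)  # lower: a higher offset reduces prevalence
sd(p_low_adjust)     # wider spread than sd(p_default): more predictable outcomes
```

Under this assumption, dividing by a smaller `coef_adjust` pushes the probabilities towards 0 or 1, which is why admissions become easier to predict from the covariates.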

Details

Simulates data for a range of people for the variables

Age (age)
Sex (sexM; 1 if male)
Race/ethnicity (raceNW: 1 if non-white ethnicity)
Number of previous hospital admissions (PrevAdm)
Deprivation decile (SIMD: 1 most deprived, 10 least deprived. NOTE - opposite to English IMD)
Urban-rural residence status (urban_rural: 1 for rural)
Mainland-island residence status (mainland_island: 1 for island)
Hospital admission (target: 1/TRUE if admitted to hospital in year following prediction date)

An ID column can optionally be added.

An admission reason can optionally be included for individuals with target=1. These admission reasons roughly correspond to the first letters of ICD10 categories, and can correspond to either an admission or a death. Admission reasons are simulated with a non-constant multinomial distribution which varies across age/sex/ethnicity/urban-rural/mainland-island/PrevAdm values in a randomly chosen way. However, the distributions of admission reasons are not chosen to reflect real distributions, nor are systematic changes in the frequency of admission types across categories intended to appear realistic.
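
One way to picture a non-constant multinomial distribution is sketched below: each covariate stratum gets its own randomly chosen probability vector over reason codes, and each admitted individual's reason is drawn from the vector for their stratum. The strata, the reason letters, and the helper `sample_reason` are hypothetical, and only two covariates are used for brevity; this is not the package's own simulation code.

```r
# Hypothetical sketch; the package's own simulation may differ.
set.seed(1)
reasons <- LETTERS[1:5]  # stand-ins for ICD10 first letters
strata <- expand.grid(sexM = 0:1, urban_rural = 0:1)

# One randomly chosen probability vector per stratum (each row sums to 1).
raw <- matrix(rexp(nrow(strata) * length(reasons)), nrow = nrow(strata))
probs <- raw / rowSums(raw)
colnames(probs) <- reasons

# Draw a single reason from the probability vector of the matching stratum.
sample_reason <- function(sexM, urban_rural) {
  row <- which(strata$sexM == sexM & strata$urban_rural == urban_rural)
  sample(reasons, size = 1, prob = probs[row, ])
}

# Draw reasons for a few admitted individuals (target = 1).
admitted <- data.frame(sexM = c(0, 1, 1), urban_rural = c(1, 0, 1))
admitted$reason <- mapply(sample_reason, admitted$sexM, admitted$urban_rural)
admitted
```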

Value

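A data frame with one row per simulated individual, containing the variables listed in Details and, optionally, an ID column and an admission reason column.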

Examples


# Simulate a population of 10,000 individuals
dat <- sim_pop_data(10000)

# Correlation matrix of the first seven columns
cor(dat[, 1:7])

# See vignette
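
Before passing a custom `vcor`, it may help to confirm that the matrix is a valid correlation matrix. The sketch below builds a hypothetical 5x5 matrix with an assumed common correlation of 0.2 and checks symmetry, a unit diagonal, and positive semi-definiteness. The helper `is_valid_cor` is not part of the package.

```r
# Hypothetical helper; not part of SPARRAfairness.
is_valid_cor <- function(m, tol = 1e-8) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&
    isSymmetric(m) &&
    all(abs(diag(m) - 1) < tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > -tol)
}

# Example: 5x5 matrix with an assumed common correlation of 0.2.
vcor <- matrix(0.2, nrow = 5, ncol = 5)
diag(vcor) <- 1

is_valid_cor(vcor)
```

A matrix that passes this check could then be supplied through the `vcor` argument, for example `sim_pop_data(10000, vcor = vcor)`.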

[Package SPARRAfairness version 0.0.0.1 Index]