sim_pop_data {SPARRAfairness}    R Documentation

sim_pop_data

Description

Simulates population data with a reasonably realistic joint distribution

Usage

sim_pop_data(
  npop,
  coef_adjust = 4,
  offset = 1,
  vcor = NULL,
  coefs = c(2, 1, 0, 5, 3, 0, 0),
  seed = 12345,
  incl_id = TRUE,
  incl_reason = TRUE
)

Arguments

npop

population size

coef_adjust

inverse scale for all (true) coefficients (default 4): lower means that hospital admissions are more predictable from covariates.

offset

offset for logistic model (default 1): higher means a lower overall prevalence of admission

vcor

a valid 5x5 correlation matrix (default NULL), giving the correlations between variables. If NULL, a built-in matrix that roughly represents realistic data is used.

coefs

coefficients for the effects of age, male sex, non-white ethnicity, number of previous admissions, and deprivation decile on hospital admission. Default c(2, 1, 0, 5, 3, 0, 0), as shown in Usage. All coefficients are divided by coef_adjust.

seed

random seed (default 12345)

incl_id

include an ID column (default TRUE)

incl_reason

include a column indicating the reason for admission (default TRUE)
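
To make the roles of coef_adjust and offset concrete, the following is a minimal base R sketch of a logistic model in which coefficients are divided by coef_adjust and offset is subtracted from the linear predictor. It is an illustration under stated assumptions, not the package's internal code: the covariate matrix X, the helper admission_prob, and the sign convention for offset are all hypothetical, chosen only to match the descriptions above.

```r
# Illustrative sketch only: NOT the internal code of sim_pop_data.
set.seed(1)
n <- 5000

# Hypothetical standardised covariates standing in for the five variables
X <- matrix(rnorm(n * 5), nrow = n)
coefs <- c(2, 1, 0, 5, 3)

# Assumed model: logit(p) = X %*% (coefs / coef_adjust) - offset
admission_prob <- function(X, coefs, coef_adjust = 4, offset = 1) {
  plogis(as.vector(X %*% (coefs / coef_adjust)) - offset)
}

p_default     <- admission_prob(X, coefs)
p_high_offset <- admission_prob(X, coefs, offset = 3)
p_low_adjust  <- admission_prob(X, coefs, coef_adjust = 1)

# A higher offset lowers the overall prevalence
mean(p_default)
mean(p_high_offset)

# A lower coef_adjust pushes probabilities towards 0 or 1,
# so the outcome is more predictable from the covariates
sd(p_default)
sd(p_low_adjust)
```

Under these assumptions, raising offset shifts every individual's probability down, while lowering coef_adjust widens the spread of probabilities.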

Details

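Simulates data for a range of people for the variables listed under Arguments (age, sex, ethnicity, number of previous admissions, and deprivation decile), together with the urban-rural and mainland-island variables and the binary outcome target referred to below.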

Can optionally add an ID column.

Optionally includes an admission reason for samples with target=1. These admission reasons roughly correspond to the first letters of ICD10 categories, and can correspond to either an admission or a death. Admission reasons are simulated from a multinomial distribution whose probabilities vary across age/sex/ethnicity/urban-rural/mainland-island/PrevAdm values in a randomly chosen way. The distributions of admission reasons are not, however, chosen to reflect real distributions, nor are systematic differences in the frequency of admission types across categories intended to appear realistic.
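
The idea of a multinomial distribution whose probabilities differ between groups in a randomly chosen way can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: it uses a single hypothetical grouping variable (male), placeholder reason codes, and uniform random weights, all of which are assumptions made for the example.

```r
# Illustrative sketch only: NOT the internal code of sim_pop_data.
set.seed(42)
n <- 1000

# Hypothetical stand-ins for simulated columns
male   <- rbinom(n, 1, 0.5)   # grouping variable
target <- rbinom(n, 1, 0.2)   # 1 = admitted

# Placeholder reason codes; not real ICD10 frequencies
reasons <- LETTERS[1:5]

# One randomly chosen probability vector per group (rows: male = 0, male = 1)
probs <- t(sapply(0:1, function(g) {
  w <- runif(length(reasons))
  w / sum(w)
}))

# Draw a reason only for individuals with target == 1; others stay NA
reason <- rep(NA_character_, n)
idx <- which(target == 1)
reason[idx] <- vapply(
  idx,
  function(i) sample(reasons, 1, prob = probs[male[i] + 1, ]),
  character(1)
)

# Reason counts by group among admitted individuals
table(male[idx], reason[idx])
```

Because the probability vectors are drawn at random, the resulting group differences carry no real-world meaning, which mirrors the caveat in the paragraph above.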

Value

data frame of simulated population data with a reasonably realistic joint distribution.

Examples


# Simulate data
dat <- sim_pop_data(10000)

# Correlation matrix of the first seven columns
cor(dat[, 1:7])

# See vignette
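
Since vcor must be a valid 5x5 correlation matrix, it may help to check a candidate matrix before passing it in. The sketch below is a hedged base R example: the helper is_valid_cor and the matrix entries are hypothetical and are not part of SPARRAfairness, and no assumption is made about which variable each row or column corresponds to.

```r
# Hypothetical 5x5 correlation matrix; the values are illustrative only
vc <- diag(5)
vc[1, 4] <- vc[4, 1] <- 0.3
vc[4, 5] <- vc[5, 4] <- -0.2

# Hypothetical helper: square, symmetric, unit diagonal, entries in [-1, 1],
# and positive semi-definite (no negative eigenvalues beyond tolerance)
is_valid_cor <- function(m, tol = 1e-8) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&
    isSymmetric(m) &&
    all(abs(diag(m) - 1) < tol) &&
    all(abs(m) <= 1 + tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values >= -tol)
}

is_valid_cor(vc)

# If the package is installed, the matrix could then be supplied as:
# dat <- sim_pop_data(1000, vcor = vc)
```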

[Package SPARRAfairness version 0.0.0.1 Index]