R: Generate ordinal categorical data

genOrdCat {simstudy}

R Documentation

Generate ordinal categorical data

Description

Adds ordinal categorical data to an existing data set. Correlation between the new variables can be specified either with a correlation matrix (`corMatrix`) or with `rho` and `corstr`.
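
As a rough illustration of the two ways to specify correlation, the base R sketch below builds the matrices that the `corstr` options listed under Arguments conventionally denote. The helpers `build_corr` and `is_pos_def` are hypothetical and not part of simstudy; the sketch uses the textbook definitions of compound symmetry and AR(1) and assumes, without confirmation from this page, that the package constructs its matrix the same way.

```r
# Hypothetical helper (not a simstudy function): correlation matrix for
# n variables under the textbook definition of each structure.
build_corr <- function(n, rho, corstr = c("ind", "cs", "ar1")) {
  corstr <- match.arg(corstr)
  idx <- seq_len(n)
  switch(corstr,
    ind = diag(n),                                       # identity matrix
    cs  = { m <- matrix(rho, n, n); diag(m) <- 1; m },   # same rho off the diagonal
    ar1 = rho^abs(outer(idx, idx, "-"))                  # rho^|i - j|
  )
}

# Hypothetical helper: TRUE if every eigenvalue is strictly positive.
is_pos_def <- function(m) {
  all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

cs5 <- build_corr(5, 0.125, "cs")
ar5 <- build_corr(5, 0.125, "ar1")

is_pos_def(cs5)
is_pos_def(ar5)

# A strongly negative rho under compound symmetry is not positive definite
# for five variables, because 1 + (5 - 1) * rho is negative.
is_pos_def(build_corr(5, -0.5, "cs"))
```

A matrix built this way could in principle be passed as `corMatrix` instead of supplying `rho` and `corstr`, provided it is symmetric and positive definite.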

Usage

genOrdCat(
  dtName,
  adjVar = NULL,
  baseprobs,
  catVar = "cat",
  asFactor = TRUE,
  idname = "id",
  prefix = "grp",
  rho = 0,
  corstr = "ind",
  corMatrix = NULL,
  npVar = NULL,
  npAdj = NULL
)

Arguments

`dtName`	Name of complete data set
`adjVar`	Adjustment variable name in dtName - determines logistic shift. This is specified assuming a cumulative logit link.
`baseprobs`	Baseline probability expressed as a vector or matrix of probabilities. The values (per row) must sum to <= 1. If `rowSums(baseprobs) < 1`, an additional category is added with probability `1 - rowSums(baseprobs)`. The number of rows represents the number of new categorical variables. The number of columns represents the number of possible responses - if a particular variable has fewer possible responses, assign zero probability to the non-relevant columns.
`catVar`	Name of the new categorical field. Defaults to "cat". Can be a character vector with a name for each new variable defined via `baseprobs`. Will be overridden by `prefix` if more than one variable is defined and `length(catVar) == 1`.
`asFactor`	If `asFactor == TRUE` (default), new field is returned as a factor. If `asFactor == FALSE`, new field is returned as an integer.
`idname`	Name of the id column in `dtName`.
`prefix`	A string. The names of the new variables will be a concatenation of the prefix and a sequence of integers indicating the variable number.
`rho`	Correlation coefficient, -1 < rho < 1. Use if corMatrix is not provided.
`corstr`	Correlation structure of the correlation matrix defined by `rho`. Options include "ind" for an independence structure, "cs" for a compound symmetry structure, and "ar1" for an autoregressive structure. Use if `corMatrix` is not provided.
`corMatrix`	Correlation matrix can be entered directly. It must be symmetrical and positive definite. It is not a required field; if a matrix is not provided, then a structure and correlation coefficient rho must be specified. (The matrix created via `rho` and `corstr` must also be positive definite.)
`npVar`	Vector of variable names that indicate which variables are to violate the proportionality assumption.
`npAdj`	Matrix with a row for each npVar and a column for each category. Each value represents the deviation from the proportional odds assumption on the logistic scale.
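
To make the `baseprobs` and `adjVar` descriptions concrete, here is a minimal base R sketch of a cumulative logit shift. The helper `shift_probs` is hypothetical and is not how simstudy is implemented. In particular, the sign convention (the adjustment is subtracted from each cumulative logit) is an assumption that should be checked against the package before relying on it.

```r
# Hypothetical helper (not a simstudy function): category probabilities
# after shifting every cumulative logit by z.
shift_probs <- function(baseprobs, z = 0) {
  stopifnot(all(baseprobs >= 0), sum(baseprobs) <= 1 + 1e-8)
  # If the probabilities sum to less than 1, add a remainder category.
  if (sum(baseprobs) < 1 - 1e-8) {
    baseprobs <- c(baseprobs, 1 - sum(baseprobs))
  }
  k <- length(baseprobs)
  cum <- cumsum(baseprobs)[-k]          # P(Y <= j) for j = 1, ..., k - 1
  # Assumed sign convention: subtract z on the logit scale.
  shifted <- plogis(qlogis(cum) - z)
  diff(c(0, shifted, 1))                # back to category probabilities
}

probs <- c(0.40, 0.25, 0.15)

p0 <- shift_probs(probs, z = 0)    # baseline, with a fourth category of 0.20
p1 <- shift_probs(probs, z = 1.2)  # shifted distribution

round(rbind(p0, p1), 3)
```

Under the assumed convention, a positive `z` moves probability mass toward the higher categories; with the opposite convention the direction would reverse.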

Value

Original data.table with added categorical field.

Examples

library(simstudy)

# Ordinal Categorical Data ----

def1 <- defData(
  varname = "male",
  formula = 0.45, dist = "binary", id = "idG"
)
def1 <- defData(def1,
  varname = "z",
  formula = "1.2*male", dist = "nonrandom"
)
def1

## Generate data

set.seed(20)

dx <- genData(1000, def1)

probs <- c(0.40, 0.25, 0.15)

dx <- genOrdCat(dx,
  adjVar = "z", idname = "idG", baseprobs = probs,
  catVar = "grp"
)
dx

# Correlated Ordinal Categorical Data ----

baseprobs <- matrix(c(
  0.2, 0.1, 0.1, 0.6,
  0.7, 0.2, 0.1, 0,
  0.5, 0.2, 0.3, 0,
  0.4, 0.2, 0.4, 0,
  0.6, 0.2, 0.2, 0
),
nrow = 5, byrow = TRUE
)

set.seed(333)
dT <- genData(1000)

dX <- genOrdCat(dT,
  adjVar = NULL, baseprobs = baseprobs,
  prefix = "q", rho = .125, corstr = "cs", asFactor = FALSE
)
dX

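# Reshape to long format and compute the observed response proportions
# for each new variable
dM <- data.table::melt(dX, id.vars = "id")
dProp <- dM[, prop.table(table(value)), by = variable]

# The first variable has four possible responses; the other four have three,
# because their fourth category has zero probability and is never observed
dProp[, response := c(1:4, 1:3, 1:3, 1:3, 1:3)]

# One row per variable, one column per response
data.table::dcast(dProp, variable ~ response,
  value.var = "V1", fill = 0
)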

# proportional odds assumption violated

d1 <- defData(varname = "rx", formula = "1;1", dist = "trtAssign")
d1 <- defData(d1, varname = "z", formula = "0 - 1.2*rx", dist = "nonrandom")

dd <- genData(1000, d1)

baseprobs <- c(.4, .3, .2, .1)
npAdj <- c(0, 1, 0, 0)

dn <- genOrdCat(
  dtName = dd, adjVar = "z",
  baseprobs = baseprobs,
  npVar = "rx", npAdj = npAdj
)
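
The proportional odds assumption in the last example can be illustrated without the package. The base R sketch below is a hypothetical illustration, not simstudy's implementation: the helpers `cum_logit` and `probs_from_logits` are made up for this purpose, the sign convention (adjustments subtracted on the logit scale) is assumed, and the deviation vector `dev` has one entry per threshold, which may differ from how `genOrdCat` interprets `npAdj`.

```r
# Hypothetical helpers (not simstudy functions).
# Cumulative log odds log(P(Y <= j) / P(Y > j)) for j = 1, ..., k - 1.
cum_logit <- function(p) qlogis(cumsum(p)[-length(p)])

# Category probabilities implied by a vector of cumulative logits.
probs_from_logits <- function(lg) diff(c(0, plogis(lg), 1))

baseprobs <- c(0.4, 0.3, 0.2, 0.1)
base_lg <- cum_logit(baseprobs)

z <- -1.2          # shift for rx = 1, as in z = 0 - 1.2 * rx
dev <- c(0, 1, 0)  # hypothetical deviation at the second threshold only

# Proportional odds: the same shift at every threshold.
p_po <- probs_from_logits(base_lg - z)

# Non-proportional odds: the shift differs at the second threshold.
p_npo <- probs_from_logits(base_lg - z - dev)

# Difference in cumulative log odds relative to baseline, by threshold.
cum_logit(p_po) - base_lg    # approximately 1.2, 1.2, 1.2
cum_logit(p_npo) - base_lg   # approximately 1.2, 0.2, 1.2
```

When the difference is the same at every threshold, a single odds ratio describes the group effect; when it varies, as in the second case, the assumption is violated.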

[Package simstudy version 0.8.1 Index]