blrm {rmsb} | R Documentation |

## Bayesian Binary and Ordinal Logistic Regression

### Description

Uses `rstan` with pre-compiled Stan code, or `cmdstan`, to get posterior draws of parameters from a binary logistic or proportional odds semiparametric ordinal logistic model. The Stan code internally uses the QR decomposition of the design matrix so that highly collinear columns of the matrix do not hinder the posterior sampling. The parameters are transformed back to the original scale before returning results to R. Design matrix columns are centered before running Stan, so Stan diagnostic output will have the intercept terms shifted, but the results of `blrm()` for intercepts are for the original uncentered data. The only prior distributions for regression betas are normal with mean zero. Priors are specified on contrasts so that they can be specified on a meaningful scale and so that more complex patterns can be imposed. Parameters that are not involved in any contrasts in `pcontrast` or `npcontrast` have non-informative priors. Contrasts are automatically converted to the QR space used in Stan code.
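The effect of centering and QR orthogonalization can be illustrated outside of Stan. The following is a minimal base-R sketch, not the code used by `blrm()`: it uses ordinary least squares as a stand-in for the Bayesian model, and the simulated variables `x1`, `x2`, and `y` are assumptions made for illustration only. It shows that two nearly collinear columns become uncorrelated after the transformation, and that coefficients estimated on the QR scale map back to the original scale.

```r
# Minimal base-R sketch; ordinary least squares stands in for the Stan model.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)        # nearly collinear with x1
X  <- cbind(x1, x2)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

# Center the columns (this only shifts the intercept), then orthogonalize
Xc  <- scale(X, center = TRUE, scale = FALSE)
qrX <- qr(Xc)
Q   <- qr.Q(qrX)                      # orthonormal columns
R   <- qr.R(qrX)                      # upper triangular, Xc = Q %*% R

cor_orig <- cor(X)[1, 2]              # close to 1
cor_qr   <- cor(Q)[1, 2]              # essentially 0

# Fit on the Q scale, then map coefficients back: beta = R^{-1} theta
theta     <- coef(lm(y ~ Q))[-1]
beta_back <- as.vector(backsolve(R, theta))
beta_orig <- unname(coef(lm(y ~ X))[-1])
all.equal(beta_back, beta_orig, tolerance = 1e-6)
```

Because the columns of `Q` are uncorrelated, a sampler working on that scale does not have to explore a long, narrow posterior ridge; the back-transformation recovers the slopes on the original scale.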

### Usage

```
blrm(
  formula,
  ppo = NULL,
  cppo = NULL,
  data = environment(formula),
  subset,
  na.action = na.delete,
  priorsdppo = rep(100, pppo),
  iprior = 0,
  conc = 1/(0.8 + 0.35 * max(k, 3)),
  ascale = 1,
  psigma = 1,
  rsdmean = if (psigma == 1) 0 else 1,
  rsdsd = 1,
  normcppo = FALSE,
  pcontrast = NULL,
  npcontrast = NULL,
  backend = c("rstan", "cmdstan"),
  iter = 2000,
  warmup = iter/2,
  chains = 4,
  refresh = 0,
  progress = if (refresh > 0) "stan-progress.txt" else "",
  x = TRUE,
  y = TRUE,
  loo = n <= 1000,
  ppairs = NULL,
  method = c("both", "sampling", "optimizing"),
  inito = if (length(ppo)) 0 else "random",
  inits = inito,
  standata = FALSE,
  file = NULL,
  debug = FALSE,
  sampling.args = NULL,
  ...
)
```

### Arguments

`formula` |
a R formula object that can use |

`ppo` |
formula specifying the model predictors for which proportional odds is not assumed |

`cppo` |
a function that if present causes a constrained partial PO model to be fit. The function specifies the values in the Gamma vector in Peterson and Harrell (1990) equation (6). Sometimes to make posterior sampling better behaved, the function should be scaled and centered. This is done by wrapping |

`data` |
a data frame; defaults to using objects from the calling environment |

`subset` |
a logical vector or integer subscript vector specifying which subset of data should be used |

`na.action` |
default is |

`priorsdppo` |
vector of prior standard deviations for non-proportional odds parameters. The last element is the only one for which the SD corresponds to the original data scale. This only applies to the unconstrained PPO model. |

`iprior` |
specifies whether to use a Dirichlet distribution for the cell probabilities, which induces a more complex prior distribution for the intercepts ( |

`conc` |
the Dirichlet distribution concentration parameter for the prior distribution of cell probabilities at covariate means. The default is the reciprocal of 0.8 + 0.35 max(k, 3) where k is the number of Y categories. The default is chosen to make the posterior mean of the intercepts more closely match the MLE. For optimizing, the concentration parameter is always 1.0 to obtain results very close to the MLE for providing the posterior mode. |

`ascale` |
scale parameter for the t-distribution for priors for the intercepts if |

`psigma` |
defaults to 1 for a half-t distribution with 4 d.f., location parameter |

`rsdmean` |
the assumed mean of the prior distribution of the standard deviation of random effects. When |

`rsdsd` |
applies only to |

`normcppo` |
set to |

`pcontrast` |
a list specifying contrasts that are to be given Gaussian prior distributions. The predictor combinations specified in |

`npcontrast` |
like |

`backend` |
set to |

`iter` |
number of posterior samples per chain for |

`warmup` |
number of warmup iterations to discard. Default is |

`chains` |
number of separate chains to run |

`refresh` |
see |

`progress` |
see |

`x` |
set to |

`y` |
set to |

`loo` |
set to |

`ppairs` |
set to a file name to run |

`method` |
set to |

`inito` |
initial value for optimization. The default is the |

`inits` |
initial value for sampling, defaults to |

`standata` |
set to |

`file` |
a file name for a |

`debug` |
set to |

`sampling.args` |
a list containing parameters to pass to |

`...` |
passed to |
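
The default for `conc` shown in Usage, `1/(0.8 + 0.35 * max(k, 3))`, can be tabulated for a few numbers of Y categories. This is a small sketch for orientation only; the helper name `default_conc` is hypothetical and not part of `rmsb`.

```r
# Hypothetical helper reproducing the default expression for `conc` from Usage
default_conc <- function(k) 1 / (0.8 + 0.35 * max(k, 3))

ks <- c(2, 3, 5, 10)
conc_table <- data.frame(k = ks, conc = sapply(ks, default_conc))
conc_table
```

Because of `max(k, 3)`, binary outcomes (k = 2) get the same default as k = 3, and the default decreases as the number of categories grows.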

### Details

The partial proportional odds model of Peterson and Harrell (1990) is implemented, and is invoked when the user specifies a second model formula as the `ppo` argument. This formula has no left-hand-side variable, and its right-hand-side variables are a subset of those in `formula`, specifying the predictors for which the proportional odds assumption is relaxed.

The Peterson and Harrell (1990) constrained partial proportional odds model is also implemented, and is usually preferred to the above unconstrained PPO model because it adds a vector of coefficients instead of a matrix of coefficients. In the constrained PPO model the user provides a function `cppo` that computes a score for all observed values of the dependent variable. For example, with a discrete ordinal outcome `cppo` may return a value of 1.0 for a specific value of Y and zero otherwise. That will result in a departure from the proportional odds assumption for just that one level of Y. The value returned by `cppo` at the lowest Y value is never used in any case.
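
A `cppo` function of the kind described here can be sketched as follows. The outcome coding (`0:3`), the chosen level (Y = 2), and the variable and data names in the commented-out call (`y`, `age`, `sex`, `d`) are all assumptions for illustration; only the `formula`, `ppo`, `cppo`, and `data` arguments come from the Usage section.

```r
# Hypothetical constraint: depart from proportional odds only at Y = 2
cppo <- function(y) as.numeric(y == 2)

# Evaluate the score at each level of an assumed 4-level outcome coded 0:3
yvals  <- 0:3
scores <- cppo(yvals)
scores   # 0 0 1 0

# Not run (requires rmsb and a data frame `d` with these hypothetical columns):
# f <- blrm(y ~ age + sex, ppo = ~ age, cppo = cppo, data = d)
```

With this choice the model adds one extra coefficient per `ppo` predictor, applied only at Y = 2, rather than a separate coefficient for every level of Y.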

`blrm()` also handles single-level hierarchical random effects models for the case when there are repeated measurements per subject, which are reflected as random intercepts, and a different experimental model that allows for AR(1) serial correlation within subject. For both setups, a `cluster` term in the model signals the existence of subject-specific random effects.

When using the `cmdstan` backend, `cmdstanr` will need to compile the Stan code once per computer, only recompiling the code when the Stan source code changes. By default the compiled code is stored in directory `.rmsb` under your home directory. Use `options(rmsbdir=)` to specify a different location. You should set `rmsbdir` to a project-specific location if you want to archive code for old projects.
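
Setting a project-specific location might look like the sketch below. The directory path is hypothetical (a temporary directory is used here so the snippet runs anywhere; a real project would use a path inside the project tree), and creating the directory beforehand is a precaution rather than a documented requirement.

```r
# Hypothetical project-specific directory for compiled Stan code
cache_dir <- file.path(tempdir(), "myproject", "stan-cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Tell rmsb where to store compiled code instead of ~/.rmsb
options(rmsbdir = cache_dir)
getOption("rmsbdir")
```

Placing these lines at the top of a project's analysis script keeps the compiled code alongside the project that used it.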

If you want to run MCMC sampling even when no inputs or Stan code have changed, i.e., to use a different random number seed for the sampling process, remove the `file` before running `blrm`.
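
Forcing a fresh run by removing the stored file can be sketched as below. The file name and its `.rds` extension are assumptions for illustration, as are the model and data names in the commented-out call; only the `file` argument itself comes from the Usage section.

```r
# Hypothetical path previously passed as the `file` argument to blrm()
fit_file <- file.path(tempdir(), "blrm-fit.rds")

# Remove any stored results so that the next call samples again
if (file.exists(fit_file)) file.remove(fit_file)

# Not run (requires rmsb and a data frame `d` with hypothetical columns):
# f <- blrm(y ~ x, data = d, file = fit_file)
```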

See here and here for multiple examples with results.

### Value

An `rms` fit object of class `blrm`, `rmsb`, `rms` that also contains `rstan` or `cmdstanr` results under the name `rstan`. In the `rstan` results, which are also used to produce diagnostics, the intercepts are shifted because of the centering of columns of the design matrix done by `blrm()`. With `method='optimizing'` a class-less list is returned with these elements: `coefficients` (MLEs), `beta` (non-intercept parameters on the QR decomposition scale), `deviance` (-2 log likelihood), `return_code` (see `rstan::optimizing()`), and, if you specified `hessian=TRUE` to `blrm()`, the Hessian matrix. To learn about the scaling of orthogonalized QR design matrix columns, look at the `xqrsd` object in the returned object. This is the vector of SDs for all the columns of the transformed matrix. The returned element `sampling_time` is the elapsed time for running posterior samplers, in seconds. This will be just a little more than the time for running one CPU core for one chain.

### Author(s)

Frank Harrell and Ben Goodrich

### See Also

`print.blrm()`, `blrmStats()`, `stanDx()`, `stanGet()`, `coef.rmsb()`, `vcov.rmsb()`, `print.rmsb()`

### Examples

```
## Not run:
getHdata(titanic3)
dd <- datadist(titanic3); options(datadist='dd')
f <- blrm(survived ~ (rcs(age, 5) + sex + pclass)^2, data=titanic3)
f # model summary using print.blrm
coef(f) # compute posterior mean parameter values
coef(f, 'median') # compute posterior median values
stanDx(f) # print basic Stan diagnostics
s <- stanGet(f) # extract rstan object from fit
plot(s, pars=f$betas) # Stan posteriors for beta parameters
stanDxplot(s) # Stan diagnostic plots by chain
blrmStats(f) # more details about predictive accuracy measures
ggplot(Predict(...)) # standard rms output
summary(f, ...) # invokes summary.rms
contrast(f, ...) # contrast.rms computes HPD intervals
plot(nomogram(f, ...)) # plot nomogram using posterior mean parameters
# Fit a random effects model to handle multiple observations per
# subject ID using cmdstan
# options(rmsb.backend='cmdstan')
f <- blrm(outcome ~ rcs(age, 5) + sex + cluster(id), data=mydata)
## End(Not run)
```

*rmsb* version 1.1-0