R: Fitting a Probabilistic Index Model

pim {pim}

R Documentation

Fitting a Probabilistic Index Model

Description

This function fits a probabilistic index model, also known as PIM. It can be used to fit standard PIMs, as well as many different flavours of models that can be reformulated as a pim. The most general models are implemented, but the flexible formula interface allows you to specify a wide variety of different models.

Usage

pim(
  formula,
  data,
  link = c("logit", "probit", "identity"),
  compare = if (model == "marginal") "all" else "unique",
  model = c("difference", "marginal", "regular", "customized"),
  na.action = getOption("na.action"),
  weights = NULL,
  keep.data = FALSE,
  ...
)

Arguments

`formula`	An object of class `formula` (or one that can be coerced to that class): A symbolic description of the model to be fitted. The details of model specification are given under 'Details'.
`data`	an optional data frame, list or environment that contains the variables in the model. Objects that can be coerced by `as.data.frame` can be used too.
`link`	a character vector with a single value that determines the used link function. Possible values are "logit", "probit" and "identity". The default is "logit".
`compare`	a character vector with a single value that describes how the model compares observations. It can take the values "unique" or "all". Alternatively you can pass a matrix with two columns. Each row represents the rownumbers in the original data frame that should be compared to eachother. See Details.
`model`	a single character value with possible values "difference" (the default), "marginal", "regular" or "customized". If the formula indicates a customized model (by the use of `L()` or `R()`), this parameter is set automatically to "customized". Currently, only the options "difference", "marginal" and "customized" are implemented.
`na.action`	the name of a function which indicates what should happen when the data contains NAs. The default is set by the `na.action` setting of `options`, and is `na.fail` when unset.
`weights`	Currently not implemented.
`keep.data`	a logical value indicating whether the model matrix should be saved in the object. Defaults to `FALSE`. See Details.
`...`	extra parameters sent to `pim.fit`

Details

PIMs are based on a set of pseudo-observations constructed from the comparison between a range of possible combinations of 2 observations. We call the set of pseudo observations poset in the context of this package.

By default, this poset takes every unique combination of 2 observations (compare = "unique"). You can either use a character value, or use a matrix or list to identify the set of observation pairs that have to be used as pseudo-observations. Note that the matrix and list should be either nameless, or have the (col)names 'L' and 'R'. If any other names are used, these are ignored and the first column/element is considered to be 'L'. See also new.pim.poset.

It's possible to store the model matrix and psuedo responses in the resulting object. By default this is not done (keep.data = FALSE) as this is less burden on the memory and the pim.formula object contains all information to reconstruct both the model matrix and the pseudo responses. If either the model matrix or the pseudo responses are needed for further calculations, setting keep.data to TRUE might reduce calculation time for these further calculations.

Value

An object of class pim. See pim-class for more information.

The enhanced formula interface

In case you want to fit a standard PIM, you can specify the model in mostly the same way as for lm. There's one important difference: a PIM has by default no intercept. To add an intercept, use + 1 in the formula.

Next to this, you can use the functions L and R in a formula to indicate which part of the poset you refer to. Remember a poset is essentially a matrix-like object with indices refering to the pseudo-observations. Using L() and R() you can define exactly how the pseudo-observations fit in the model. Keep in mind that any calculation done with these functions, has to be wrapped in a call to I(), just like you would do in any other formula interface.

You don't have to specify the model though. If you choose the option model = 'difference', every variable in the formula will be interpreted as I(R(x) - L(x)). If you use the option model = 'marginal', every variable will be interpreted as R(X).

If you don't specify any special function (i.e. L, R, P or PO), the lefthand side of the formula is defined as PO(y). The function PO calculates pseudo observations; it is 1 if the value of the dependent variable for the observation from the L-poset is smaller than, 0 if it is larger than and 0.5 if it is equal to the value for value from the R-poset (see also PO)

Examples

data('FEVData')
# The most basic way to use the function
Model <- pim(FEV~ Smoke*Sex , data=FEVData)

# A model with intercept
# The argument xscalm is passed to nleqslv via pim.fit and estimator.nleqslv
# By constructing the estimator functions wisely, you can control most of
# the fitting process from the pim() function.
data('EngelData')
Model2 <- pim(foodexp ~ income + 1, data=EngelData,
   compare="all",
   xscalm = 'auto')

# A marginal model
# It makes sense to use the identity link in combination with the
# score estimator for the variance-covariance matrix
data('DysData')
Model3 <- pim(SPC_D2 ~ out, data = DysData,
  model = 'marginal', link = 'identity',
  vcov.estim = score.vcov)

# A Model using logical comparisons, this is also possible!
# Model the chance that both observations have a different
# outcome in function of whether they had a different Chemo treatment
Model6 <- pim(P(L(out) != R(out)) ~ I(L(Chemo) != R(Chemo)),
   data=DysData,
   compare="all")

# Implementation of the friedman test in the context of a pim
# warpbreaks data where we consider tension as a block
# To do so, you provide the argument compare with a custom
# set of comparisons
data(warpbreaks)
wb <- aggregate(warpbreaks$breaks,
                by = list(w = warpbreaks$wool,
                          t = warpbreaks$tension),
                FUN = mean)
comp <- expand.grid(1:nrow(wb), 1:nrow(wb))
comp <- comp[wb$t[comp[,1]] == wb$t[comp[,2]],] # only compare within blocks
m <- pim(x ~ w, data = wb, compare = comp, link = "identity",  vcov.estim = score.vcov)
summary(m)
friedman.test(x ~ w | t, data = wb)
## Not run: 
# This illustrates how a standard model is actually built in a pim contex
Model4 <- pim(PO(L(Height),R(Height)) ~ I(R(Age) - L(Age)) + I(R(Sex) - L(Sex)),
data=FEVData,
estim = "estimator.BB")
# is the same as
Model5 <- pim(Height ~ Age + Sex, data = FEVData, estim = "estimator.BB")
summary(Model4)
summary(Model5)

## End(Not run)

[Package pim version 2.0.2 Index]