D_regularized {multid}    R Documentation

Multivariate group difference estimation with regularized binomial regression

Description

Estimates a multivariate difference between two groups with regularized binomial regression (ridge, lasso, or elastic net, fitted with cv.glmnet).

Usage

D_regularized(
  data,
  mv.vars,
  group.var,
  group.values,
  alpha = 0.5,
  nfolds = 10,
  s = "lambda.min",
  type.measure = "deviance",
  rename.output = TRUE,
  out = FALSE,
  size = NULL,
  fold = FALSE,
  fold.var = NULL,
  pcc = FALSE,
  auc = FALSE,
  pred.prob = FALSE,
  prob.cutoffs = seq(0, 1, 0.2),
  append.data = FALSE
)

Arguments

data

A data frame, or a list containing two data frames (regularization and estimation data, in that order).

mv.vars

Character vector. Variable names in the multivariate variable set.

group.var

The name of the group variable.

group.values

Vector of length 2, group values (e.g. c("male", "female") or c(0, 1)).

alpha

Alpha value for the penalty function, ranging from 0 to 1: 0 = ridge regression, 1 = lasso, values in between = elastic net. Default 0.5.

nfolds

Number of folds used for obtaining lambda (range from 3 to n-1, default 10).

s

Which lambda value is used for predicted values? Either "lambda.min" (default) or "lambda.1se".

type.measure

Which measure is used during cross-validation. Default "deviance".

rename.output

Logical. Should the output values be renamed according to the group.values? Default TRUE.

out

Logical. Should results and predictions be calculated on an out-of-bag data set? Default FALSE.

size

Integer. Number of cases per group in the regularization data. Default 1/4 of cases.

fold

Logical. Is regularization applied across sample folds with separate predictions for each fold? (Default FALSE, see details)

fold.var

Character string. Name of the fold variable. (default NULL)

pcc

Logical. Include probabilities of correct classification? Default FALSE.

auc

Logical. Include area under the receiver operating characteristic curve (AUC)? Default FALSE.

pred.prob

Logical. Include table of predicted probabilities? Default FALSE.

prob.cutoffs

Vector. Cutoffs for table of predicted probabilities. Default seq(0,1,0.20).

append.data

Logical. If TRUE, the data are appended to the predicted variables. Default FALSE.

Details

fold = TRUE will apply manually defined data folds (supplied with fold.var) for regularization and obtain estimates for each fold separately. This can be a good solution, for example, when the data are clustered within countries. In such a case, the cross-validation procedure is applied across countries.
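
A group difference can only be estimated within a fold that contains cases from both groups, so it can help to cross-tabulate fold by group before using fold = TRUE. The following is a minimal base R sketch; the data frame d, its columns, and the threshold of 20 cases are illustrative assumptions, not part of the package.

```r
# Simulated data: two groups spread over five hypothetical folds
set.seed(1)
d <- data.frame(
  sex = sample(c("male", "female"), 500, replace = TRUE),
  fold = sample(LETTERS[1:5], 500, replace = TRUE)
)

# Number of cases per group within each fold
fold_counts <- table(d$fold, d$sex)
print(fold_counts)

# Names of folds where either group has fewer than 20 cases
# (the threshold is arbitrary and only for illustration)
small_folds <- rownames(fold_counts)[apply(fold_counts, 1, min) < 20]
print(small_folds)
```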

out = TRUE will use separate data partitions for regularization and estimation. That is, the cross-validation procedure is first applied within the regularization set, and the weights obtained are then used in the estimation data partition. The size of the regularization set is defined with size. When used with fold = TRUE, size means size within a fold.
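
The partitioning idea can be sketched in base R as follows. This illustrates the concept of drawing size cases per group for regularization and keeping the rest for estimation; it is not the package's internal code, and the object names (dat, reg_data, est_data) are hypothetical.

```r
set.seed(2)
# Hypothetical data: 60 cases in each of two groups
dat <- data.frame(
  group = rep(c("a", "b"), each = 60),
  x1 = rnorm(120),
  x2 = rnorm(120)
)
size <- 15  # cases per group in the regularization set

# Draw `size` row indices from each group
reg_rows <- unlist(lapply(
  split(seq_len(nrow(dat)), dat$group),
  function(idx) sample(idx, size)
))

reg_data <- dat[reg_rows, ]   # would be used to fit the penalized model
est_data <- dat[-reg_rows, ]  # held out for estimating the difference

# Group sizes in each partition
print(table(reg_data$group))
print(table(est_data$group))
```

According to the description of the data argument, a pair of data frames like this can also be supplied directly as a list, with the regularization data first and the estimation data second.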

For more details on these options, please refer to the vignette and README of the multid package.

Value

D

Multivariate descriptive statistics and differences.

pred.dat

A data.frame with predicted values.

cv.mod

Regularized regression model from cv.glmnet.

P.table

Table of predicted probabilities by cutoffs.
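
As a rough illustration of a table of predicted probabilities by cutoffs, the base R sketch below bins hypothetical probabilities at prob.cutoffs and tabulates row proportions per group. It assumes the table reports, for each group, the share of cases falling between consecutive cutoffs; the exact layout of the package's P.table may differ, and the simulated values are not model output.

```r
set.seed(3)
# Hypothetical predicted probabilities for two groups
preds <- data.frame(
  group = rep(c("female", "male"), each = 100),
  pred.prob = c(rbeta(100, 2, 4), rbeta(100, 4, 2))
)
prob.cutoffs <- seq(0, 1, 0.2)

# Bin the probabilities at the cutoffs
bins <- cut(preds$pred.prob, breaks = prob.cutoffs, include.lowest = TRUE)

# Proportion of each group's cases falling into each bin (rows sum to 1)
p_table <- prop.table(table(preds$group, bins), margin = 1)
print(round(p_table, 2))
```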

References

Lönnqvist, J. E., & Ilmarinen, V. J. (2021). Using a continuous measure of genderedness to assess sex differences in the attitudes of the political elite. Political Behavior, 43, 1779–1800. doi:10.1007/s11109-021-09681-2

Ilmarinen, V. J., Vainikainen, M. P., & Lönnqvist, J. E. (2023). Is there a g-factor of genderedness? Using a continuous measure of genderedness to assess sex differences in personality, values, cognitive ability, school grades, and educational track. European Journal of Personality, 37, 313–337. doi:10.1177/08902070221088155

See Also

cv.glmnet

Examples

D_regularized(
  data = iris[iris$Species == "setosa" | iris$Species == "versicolor", ],
  mv.vars = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  group.var = "Species", group.values = c("setosa", "versicolor")
)$D

# out-of-bag predictions
D_regularized(
  data = iris[iris$Species == "setosa" | iris$Species == "versicolor", ],
  mv.vars = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  group.var = "Species", group.values = c("setosa", "versicolor"),
  out = TRUE, size = 15, pcc = TRUE, auc = TRUE
)$D

# separate sample folds
# generate data for 10 groups
set.seed(34246)
n1 <- 100
n2 <- 10
d <-
  data.frame(
    sex = sample(c("male", "female"), n1 * n2, replace = TRUE),
    fold = sample(x = LETTERS[1:n2], size = n1 * n2, replace = TRUE),
    x1 = rnorm(n1 * n2),
    x2 = rnorm(n1 * n2),
    x3 = rnorm(n1 * n2)
  )

# Fit and predict with same data
D_regularized(
  data = d,
  mv.vars = c("x1", "x2", "x3"),
  group.var = "sex",
  group.values = c("female", "male"),
  fold.var = "fold",
  fold = TRUE,
  rename.output = TRUE
)$D

# Out-of-bag data for each fold
D_regularized(
  data = d,
  mv.vars = c("x1", "x2", "x3"),
  group.var = "sex",
  group.values = c("female", "male"),
  fold.var = "fold",
  size = 17,
  out = TRUE,
  fold = TRUE,
  rename.output = TRUE
)$D
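
# Ridge penalty with the more heavily penalized lambda.1se, plus a table
# of predicted probabilities. This is a sketch that uses only arguments
# documented above; the simulated data and the object names (d2, res)
# are illustrative. The call is skipped when multid is not installed.
set.seed(5)
n <- 400
d2 <- data.frame(
  sex = rep(c("female", "male"), each = n / 2),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
# shift x1 for one group so that there is a difference to estimate
d2$x1 <- d2$x1 + ifelse(d2$sex == "male", 0.5, 0)

if (requireNamespace("multid", quietly = TRUE)) {
  res <- multid::D_regularized(
    data = d2,
    mv.vars = c("x1", "x2", "x3"),
    group.var = "sex",
    group.values = c("female", "male"),
    alpha = 0,
    s = "lambda.1se",
    pred.prob = TRUE,
    prob.cutoffs = seq(0, 1, 0.25)
  )
  print(res$D)
  print(res$P.table)
}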

[Package multid version 1.0.0 Index]