R: vClus

vClus {qreport}

R Documentation

vClus

Description

Make Variable Clustering Quarto Report Section

Usage

vClus(
  d,
  exclude = NULL,
  corrmatrix = FALSE,
  redundancy = FALSE,
  spc = FALSE,
  trans = FALSE,
  rexclude = NULL,
  fracmiss = 0.2,
  maxlevels = 10,
  minprev = 0.05,
  imputed = NULL,
  horiz = FALSE,
  label = "fig-varclus",
  print = TRUE,
  redunargs = NULL,
  spcargs = NULL,
  transaceargs = NULL,
  transacefile = NULL,
  spcfile = NULL
)

Arguments

`d`	a data frame or table
`exclude`	formula or vector of character strings containing variables to exclude from analysis
`corrmatrix`	set to `TRUE` to use `Hmisc::plotCorrM()` to depict a Spearman rank correlation matrix.
`redundancy`	set to `TRUE` to run `Hmisc::redun()` on non-excluded variables
`spc`	set to `TRUE` to run `Hmisc::princmp()` with `method='sparse'` to do a sparse principal component analysis
`trans`	set to `TRUE` to run `Hmisc::transace()` to transform each predictor before running redundancy or principal components analysis. `transace` is run on the stacked filled-in data if `imputed` is given.
`rexclude`	extra variables to exclude from `transace` transformation-finding, redundancy analysis, and sparse principal components (formula or character vector)
`fracmiss`	if the fraction of `NA`s for a variable exceeds this, the variable will not be included
`maxlevels`	if the number of distinct values for a categorical variable exceeds this, the variable will be dropped
`minprev`	the minimum proportion of non-missing observations in a category for a binary variable to be retained, and the minimum relative frequency of a category before it will be combined with other small categories
`imputed`	an object created by `Hmisc::aregImpute()` or `mice::mice()` that contains information from multiple imputation. When it is given, `vClus` creates all the filled-in datasets, stacks them into one tall dataset, and passes that dataset to `Hmisc::redun()` or `Hmisc::princmp()` so that `NA`s can be handled efficiently in redundancy analysis and sparse principal components, i.e., without excluding partial records. Variable clustering and the correlation matrix are already efficient because they use pairwise deletion of `NA`s.
`horiz`	set to `TRUE` to draw the dendrogram horizontally
`label`	figure label for Quarto
`print`	set to `FALSE` to keep `dataframeReduce` from reporting details
`redunargs`	a `list()` of other arguments passed to `Hmisc::redun()`
`spcargs`	a `list()` of other arguments passed to `Hmisc::princmp()`
`transaceargs`	a `list()` of other arguments passed to `Hmisc::transace()`
`transacefile`	similar to `spcfile` and can be used when `trans=TRUE`
`spcfile`	a character string specifying an `.rds` R binary file to hold the results of sparse principal component analysis. Using `Hmisc::runifChanged()`, if the file name is specified and no inputs have changed since the last run, the result is read from the file. Otherwise a new run is made and the file is recreated if `spcfile` is specified. This is done because sparse principal components can take several minutes to run on large files.
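
The screening rules described for `fracmiss` and `maxlevels` can be illustrated with a minimal base R sketch. The helper `screen_vars()` and the toy data frame are hypothetical and are not part of qreport or Hmisc; the actual screening is done by `dataframeReduce`, which also applies the `minprev` rule that this sketch leaves out.

```r
# Hypothetical helper (not part of qreport or Hmisc): drops variables using
# simplified versions of the fracmiss and maxlevels rules described above.
screen_vars <- function(d, fracmiss = 0.2, maxlevels = 10) {
  keep <- vapply(d, function(x) {
    # Rule 1: drop the variable if its fraction of NAs exceeds fracmiss
    if (mean(is.na(x)) > fracmiss) return(FALSE)
    # Rule 2: drop a categorical variable with more than maxlevels distinct values
    if ((is.factor(x) || is.character(x)) &&
        length(unique(x[!is.na(x)])) > maxlevels) return(FALSE)
    TRUE
  }, logical(1))
  d[, keep, drop = FALSE]
}

# Toy data with 12 rows
d <- data.frame(
  age  = c(50, 61, NA, 47, 58, 66, 39, 72, 55, 60, 44, 63),         # 1/12 missing
  chol = c(NA, NA, NA, 210, 180, NA, 250, 199, 230, 205, 190, 220), # 4/12 missing
  id   = paste0("p", 1:12),                                         # 12 distinct values
  sex  = rep(c("f", "m"), 6)                                        # 2 distinct values
)

names(screen_vars(d))
# "age" "sex"
```

Here `chol` is dropped because one third of its values are missing, and `id` is dropped because it has more than 10 distinct values.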

Details

Draws a variable clustering dendrogram and optionally graphically depicts a correlation matrix. See this for an example. Uses Hmisc::varclus().
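
The idea behind the dendrogram can be sketched in base R, assuming squared Spearman correlations as the similarity measure and complete-linkage clustering. This is an approximation for illustration only, not the `Hmisc::varclus()` implementation, and the choice of `mtcars` columns is arbitrary.

```r
# Base R sketch of variable clustering (an illustration of the idea, not the
# Hmisc::varclus() code).
d <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
d$mpg[c(3, 10)] <- NA  # inject NAs to show pairwise deletion

# Spearman rank correlations; each pair of variables uses all rows in which
# both variables are non-missing
rho <- cor(d, method = "spearman", use = "pairwise.complete.obs")

# Similarity is rho^2; hclust() needs a dissimilarity, so use 1 - rho^2
hc <- hclust(as.dist(1 - rho^2), method = "complete")

# Variables joined low in the tree are strongly related to each other
plot(hc, main = "", xlab = "", sub = "", ylab = "1 - squared Spearman rho")
```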

Value

Makes Quarto tabs and prints output. Nothing is returned unless spc=TRUE or trans=TRUE is used, in which case a list with components princmp and/or transace is returned. These components can be passed to the special print and plot methods for spc or to ggplot_transace. That way the user can put scree plots and PC loading plots in separate code chunks that use different figure sizes.
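
A sketch of how the returned list might be used across Quarto chunks follows. It assumes a data frame named `mydata` (a placeholder, as in the Examples section), a hypothetical cache file name `spc.rds`, and that the `plot` method for `princmp` objects accepts `which = "scree"` and `which = "loadings"`; check the Hmisc documentation before relying on these argument values. The guard makes the code a no-op when qreport or `mydata` is unavailable.

```r
# Only run when qreport is installed and the placeholder data frame exists
can_run <- requireNamespace("qreport", quietly = TRUE) && exists("mydata")

if (can_run) {
  # Chunk 1: make the report section; keep the sparse PCA result and cache it
  r <- qreport::vClus(mydata, spc = TRUE, spcfile = "spc.rds")

  # Chunk 2 (could use its own fig-height / fig-width): scree plot
  plot(r$princmp, which = "scree")

  # Chunk 3 (a different figure size): PC loadings
  plot(r$princmp, which = "loadings")
}
```

With `trans=TRUE`, the `transace` component could be passed to `ggplot_transace` in the same way.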

Author(s)

Frank Harrell

Examples

## Not run: 
vClus(mydata, exclude=.q(country, city))

## End(Not run)

[Package qreport version 1.0-1 Index]