vClus {qreport}    R Documentation

vClus

Description

Make Variable Clustering Quarto Report Section

Usage

vClus(
  d,
  exclude = NULL,
  corrmatrix = FALSE,
  redundancy = FALSE,
  spc = FALSE,
  trans = FALSE,
  rexclude = NULL,
  fracmiss = 0.2,
  maxlevels = 10,
  minprev = 0.05,
  imputed = NULL,
  horiz = FALSE,
  label = "fig-varclus",
  print = TRUE,
  redunargs = NULL,
  spcargs = NULL,
  transaceargs = NULL,
  transacefile = NULL,
  spcfile = NULL
)

Arguments

d

a data frame or table

exclude

formula or vector of character strings containing variables to exclude from analysis

corrmatrix

set to TRUE to use Hmisc::plotCorrM() to depict a Spearman rank correlation matrix.

redundancy

set to TRUE to run Hmisc::redun() on non-excluded variables

spc

set to TRUE to run Hmisc::princmp() with method='sparse' to do a sparse principal component analysis

trans

set to TRUE to run Hmisc::transace() to transform each predictor before running redundancy or principal components analysis. transace is run on the stacked filled-in data if imputed is given.

rexclude

extra variables to exclude from transace transformation-finding, redundancy analysis, and sparse principal components (formula or character vector)

fracmiss

if the fraction of NAs for a variable exceeds this, the variable will not be included

maxlevels

if the maximum number of distinct values for a categorical variable exceeds this, the variable will be dropped

minprev

the minimum proportion of non-missing observations in a category for a binary variable to be retained, and the minimum relative frequency of a category before it will be combined with other small categories

imputed

an object created by Hmisc::aregImpute() or mice::mice() that contains information from multiple imputation. When given, vClus creates all the filled-in datasets, stacks them into one tall dataset, and passes that dataset to Hmisc::redun() or Hmisc::princmp(). This lets NAs be handled efficiently in redundancy analysis and sparse principal components, i.e., without excluding partial records. Variable clustering and the correlation matrix are already efficient because they use pairwise deletion of NAs.

horiz

set to TRUE to draw the dendrogram horizontally

label

figure label for Quarto

print

set to FALSE to prevent Hmisc::dataframeReduce() from reporting details

redunargs

a list() of other arguments passed to Hmisc::redun()

spcargs

a list() of other arguments passed to Hmisc::princmp()

transaceargs

a list() of other arguments passed to Hmisc::transace()

transacefile

similar to spcfile, but for the results of Hmisc::transace(); can be used when trans=TRUE

spcfile

a character string specifying an .rds R binary file to hold the results of sparse principal component analysis. Using Hmisc::runifChanged(), if the file name is specified and no inputs have changed since the last run, the result is read from the file. Otherwise a new run is made and the file is recreated if spcfile is specified. This is done because sparse principal components can take several minutes to run on large files.
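The caching behaviour described for spcfile can be illustrated with a minimal base-R sketch. This is not how Hmisc::runifChanged() is implemented; the helper run_if_changed() and the stand-in slow_analysis() are hypothetical names introduced only for illustration, and the sketch detects changes by comparing the stored inputs with identical().

```r
# Hypothetical helper (not part of Hmisc or qreport): rerun `fun` only
# when `inputs` differ from those saved alongside the previous result
run_if_changed <- function(file, inputs, fun) {
  if (file.exists(file)) {
    saved <- readRDS(file)
    if (identical(saved$inputs, inputs)) return(saved$result)  # reuse stored result
  }
  result <- fun(inputs)
  saveRDS(list(inputs = inputs, result = result), file)        # (re)create the file
  result
}

calls <- 0
slow_analysis <- function(inp) {   # stand-in for a long-running analysis
  calls <<- calls + 1              # count how often the analysis really runs
  prcomp(inp$data, scale. = TRUE)$sdev
}

f  <- tempfile(fileext = ".rds")
r1 <- run_if_changed(f, list(data = mtcars[, 1:4]), slow_analysis)  # computed
r2 <- run_if_changed(f, list(data = mtcars[, 1:4]), slow_analysis)  # read from file
r3 <- run_if_changed(f, list(data = mtcars[, 1:5]), slow_analysis)  # inputs changed, rerun
calls
```

In this sketch the analysis runs for the first and third calls only; the second call returns the stored result because its inputs match those saved in the file.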

Details

Draws a variable clustering dendrogram and optionally graphically depicts a correlation matrix. See this for an example. Uses Hmisc::varclus().
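As a rough illustration of the ideas on this page, the following base-R sketch screens variables by fraction missing and then clusters them on squared Spearman correlations computed with pairwise deletion of NAs. It is only an approximation under stated assumptions: it does not call Hmisc::varclus() or Hmisc::dataframeReduce(), the choice of 1 - rho^2 as the distance and of complete linkage is an assumption of the sketch, and the built-in airquality data stand in for a user's data frame.

```r
# Base-R sketch only; not the code used by vClus() or Hmisc::varclus()
d <- airquality            # built-in data frame that contains NAs
fracmiss <- 0.2            # same value as the vClus() default

# Drop variables whose fraction of NAs exceeds fracmiss
fmiss <- vapply(d, function(x) mean(is.na(x)), numeric(1))
kept  <- d[, fmiss <= fracmiss, drop = FALSE]

# Spearman rank correlations using pairwise deletion of NAs
rho <- cor(kept, method = "spearman", use = "pairwise.complete.obs")

# Cluster on 1 - rho^2 so that strongly correlated variables join early
hc <- hclust(as.dist(1 - rho^2), method = "complete")
plot(hc, xlab = "", sub = "", ylab = "1 - squared Spearman rho")
```

Here Ozone is dropped because about 24% of its values are missing, while Solar.R is kept and contributes to each correlation through the observations it shares with the other variable.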

Value

Makes Quarto tabs and prints output. Nothing is returned unless spc=TRUE or trans=TRUE is used, in which case a list with components princmp and/or transace is returned. These components can be passed to special print and plot methods for spc or to ggplot_transace, so the user can put scree plots and PC loading plots in separate code chunks that use different figure sizes.

Author(s)

Frank Harrell

See Also

Hmisc::varclus(), Hmisc::plotCorrM(), Hmisc::dataframeReduce(), Hmisc::redun(), Hmisc::princmp(), Hmisc::transace()

Examples

## Not run: 
vClus(mydata, exclude=.q(country, city))

## End(Not run)

[Package qreport version 1.0-1 Index]