R: Views of a multidimensional dataset, ranked by their...

findviews_to_predict {findviews}

R Documentation

Views of a multidimensional dataset, ranked by their prediction power.

Description

findviews_to_predict detects groups of mutually dependent columns, ranks them by predictive power, and plots them with Shiny and ggplot.

Usage

findviews_to_predict(target, data, view_size_max = NULL,
  clust_method = "complete", ...)

Arguments

`target`	Name of the variable to be predicted.
`data`	Data frame or matrix to be processed.
`view_size_max`	Maximum number of columns in the views. If set to `NULL`, findviews uses `log2(ncol(data))`, rounded upwards and capped at 5.
`clust_method`	Character describing a clustering method, used internally by `hclust`. Example values are "complete", "single" or "average".
`...`	Optional Shiny parameters, used in Shiny's `runApp` function.
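
As an illustration of the `view_size_max` default described above, the rule can be read as the following base R expression. This is a sketch based only on the argument description; `default_view_size` is a hypothetical helper, not a function exported by findviews.

```r
# Hypothetical helper: the default view size as described for view_size_max,
# i.e. log2 of the number of columns, rounded upwards and capped at 5.
default_view_size <- function(data) {
  min(ceiling(log2(ncol(data))), 5)
}

default_view_size(mtcars)  # mtcars has 11 columns, so this gives 4
```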

Details

The function findviews_to_predict takes a data set and a target variable as input. It detects clusters of statistically dependent columns in the data set (i.e., views) and ranks those groups according to how well they predict the target variable.

To detect the views, findviews_to_predict relies on findviews. To evaluate the predictive power of a view, it uses the mutual information between the columns of the view, taken jointly, and the target variable. Internally, findviews_to_predict discretizes all the continuous variables with equi-width binning.
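
The scoring idea can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helpers `equiwidth_bin`, `mutual_info` and `view_score` are hypothetical, the number of bins (`n_bins = 4`) is an assumption, and `data` is assumed to be a data frame without missing values.

```r
# Hypothetical helper: equal-width discretization of a numeric vector.
# Non-numeric columns are simply converted to factors.
equiwidth_bin <- function(x, n_bins = 4) {
  if (!is.numeric(x)) return(factor(x))
  cut(x, breaks = n_bins, include.lowest = TRUE)
}

# Hypothetical helper: mutual information (in bits) between two discrete variables.
mutual_info <- function(a, b) {
  counts <- table(a, b)
  joint <- counts / sum(counts)            # joint probabilities
  indep <- outer(rowSums(joint), colSums(joint))  # product of the marginals
  nz <- joint > 0                          # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log2(joint[nz] / indep[nz]))
}

# Hypothetical helper: score a view (a set of column names) against the target.
view_score <- function(data, view, target, n_bins = 4) {
  binned <- lapply(data[view], equiwidth_bin, n_bins = n_bins)
  joint_view <- interaction(binned, drop = TRUE)  # one symbol per bin combination
  mutual_info(joint_view, equiwidth_bin(data[[target]], n_bins))
}

# Compare two candidate views of mtcars as predictors of mpg.
view_score(mtcars, c("disp", "hp"), "mpg")
view_score(mtcars, c("qsec", "gear"), "mpg")
```

A higher score means that knowing the binned values of the view's columns removes more uncertainty about the binned target. With few rows and many bin combinations, such estimates tend to be inflated, so the scores are best used for relative comparison between views of similar size.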

Note: findviews_to_predict removes the column to be predicted (the target column) from the dataset before it creates the column groups. Hence, the views it returns may differ from those returned by calling findviews directly on the dataset.
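
To see why removing the target column can change the groups, the sketch below clusters the columns of mtcars with and without `mpg`. It is an illustration under stated assumptions, not the algorithm used by findviews: `group_columns` is a hypothetical helper, the dissimilarity `1 - |correlation|` and the fixed number of groups `k = 3` are assumptions, and only the use of `hclust` with a `clust_method` comes from the documentation above.

```r
# Hypothetical helper: group columns by hierarchical clustering.
# Assumes all columns are numeric; dissimilarity is 1 - |Pearson correlation|.
group_columns <- function(data, k = 3, clust_method = "complete") {
  dissim <- as.dist(1 - abs(cor(data)))
  tree <- hclust(dissim, method = clust_method)
  split(colnames(data), cutree(tree, k = k))
}

target <- "mpg"
# Drop the target column before grouping, as the note above describes.
predictors <- mtcars[, setdiff(names(mtcars), target), drop = FALSE]

group_columns(predictors)  # groups formed without the target column
group_columns(mtcars)      # groups formed with the target column included
```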

Examples

## Not run: 
findviews_to_predict('mpg', mtcars)
findviews_to_predict('mpg', mtcars, view_size_max = 4)

## End(Not run)

[Package findviews version 0.1.3 Index]