findviews_to_predict {findviews}    R Documentation

Views of a multidimensional dataset, ranked by their predictive power.

Description

findviews_to_predict detects groups of mutually dependent columns, ranks them by predictive power, and plots them with Shiny and ggplot.

Usage

findviews_to_predict(target, data, view_size_max = NULL,
  clust_method = "complete", ...)

Arguments

target

Name of the variable to be predicted.

data

Data frame or matrix to be processed.

view_size_max

Maximum number of columns in the views. If set to NULL, findviews uses log2(ncol(data)), rounded upwards and capped at 5.

clust_method

Character string naming the clustering method, used internally by hclust. Example values are "complete", "single" or "average".

...

Optional Shiny parameters, used in Shiny's runApp function.
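
As an illustration of the default described for view_size_max, the rule can be sketched in base R. The helper default_view_size is hypothetical and only mirrors the documented formula; it is not taken from the package source, and whether the package counts the target column in ncol(data) is an assumption here.

```r
# Hypothetical helper mirroring the documented default for view_size_max:
# log2 of the number of columns, rounded upwards and capped at 5.
# Assumption: all columns of `data` are counted, including the target.
default_view_size <- function(data) {
  min(ceiling(log2(ncol(data))), 5)
}

# mtcars has 11 columns: log2(11) is about 3.46, which rounds up to 4.
default_view_size(mtcars)
```

With this rule, any dataset with more than 16 columns would reach the cap of 5.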

Details

The function findviews_to_predict takes a data set and a target variable as input. It detects clusters of statistically dependent columns in the data set (i.e., views) and ranks those groups according to how well they predict the target variable.

To detect the views, findviews_to_predict relies on findviews. To evaluate their predictive power, it uses the mutual information between the joint distribution of the columns and that of the target variable. Internally, findviews_to_predict discretizes all the continuous variables with equi-width binning.
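
The scoring idea described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helpers equi_width_bin and mutual_information, the choice of four bins, and the example view c("wt", "hp") are all assumptions made for the example.

```r
# Hypothetical helper: equi-width binning splits the range of x into
# n_bins intervals of equal width. Assumes x is numeric and not constant.
equi_width_bin <- function(x, n_bins = 4) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Hypothetical helper: mutual information (in bits) between two
# discrete variables, computed from their empirical joint distribution.
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)
  px <- rowSums(joint)
  py <- colSums(joint)
  nonzero <- joint > 0  # skip empty cells to avoid log2(0)
  sum(joint[nonzero] * log2(joint[nonzero] / outer(px, py)[nonzero]))
}

# Score one candidate view against the target 'mpg'.
view <- c("wt", "hp")                       # assumed view, for illustration
binned_view <- lapply(mtcars[view], equi_width_bin)
joint_view <- interaction(binned_view, drop = TRUE)  # joint distribution of the view
binned_target <- equi_width_bin(mtcars$mpg)
score <- mutual_information(joint_view, binned_target)
```

A higher score means the binned view carries more information about the binned target; ranking views amounts to sorting them by this score.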

Note: findviews_to_predict removes the column to be predicted (the target column) from the dataset before it creates the column groups. Hence, the views it returns may differ from those returned by calling findviews directly on the dataset.
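
To make the effect of removing the target concrete, here is a rough sketch in base R of grouping the remaining columns with hclust. It assumes absolute Pearson correlation as a stand-in dependency measure and an arbitrary cut into three groups; the measure and grouping rule actually used by findviews may differ.

```r
# Drop the target column before grouping, as the note describes.
target <- "mpg"
predictors <- mtcars[, setdiff(names(mtcars), target)]

# Assumption: 1 - |correlation| serves as the dissimilarity between columns.
dissimilarity <- as.dist(1 - abs(cor(predictors)))

# Hierarchical clustering with the same method name as clust_method's default.
tree <- hclust(dissimilarity, method = "complete")

# Arbitrary cut into three groups of columns, for illustration only.
groups <- cutree(tree, k = 3)
views <- split(names(groups), groups)
views
```

Because 'mpg' takes no part in the clustering, it cannot appear in any group, and the grouping of the other columns is not influenced by their dependency on it.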

Examples

## Not run: 
findviews_to_predict('mpg', mtcars)
findviews_to_predict('mpg', mtcars, view_size_max = 4)

## End(Not run)


[Package findviews version 0.1.3 Index]