univariate {scorecardModelUtils}    R Documentation
Univariate analysis of variables
Description
The function performs a univariate analysis of the variables and returns the result as a dataframe. The univariate statistics include the minimum, maximum, mean, median, number of distinct values, variable type, count of null values, percentage of null values, maximum population percentage among all classes/values, and correlation with the target. It also returns the names of the character and numerical variables, along with the names of variables whose population concentration at a single class/value exceeds a threshold.
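To make the listed statistics concrete, the following base R sketch computes a comparable summary table. It is a hypothetical illustration (the name `univariate_sketch` and its column names are assumptions), not the scorecardModelUtils implementation; in particular, it assumes that the maximum population percentage counts missing values as their own class and that correlation is computed only for numeric variables.

```r
# Hypothetical sketch: NOT the scorecardModelUtils implementation.
# Builds one summary row per non-target variable.
univariate_sketch <- function(base, target) {
  vars <- setdiff(names(base), target)
  rows <- lapply(vars, function(v) {
    x <- base[[v]]
    is_num <- is.numeric(x)
    # Frequency of each value/class; NA counted as its own class (assumption)
    freq <- table(x, useNA = "ifany")
    data.frame(
      variable    = v,
      type        = if (is_num) "numeric" else "character",
      min         = if (is_num) min(x, na.rm = TRUE) else NA_real_,
      max         = if (is_num) max(x, na.rm = TRUE) else NA_real_,
      mean        = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
      median      = if (is_num) median(x, na.rm = TRUE) else NA_real_,
      distinct    = length(unique(x[!is.na(x)])),
      null_count  = sum(is.na(x)),
      null_pct    = 100 * mean(is.na(x)),
      # Share of rows held by the most frequent value/class, in percent
      max_pop_pct = 100 * max(freq) / length(x),
      # Correlation with the 0/1 target, numeric variables only (assumption)
      corr_target = if (is_num) cor(x, base[[target]], use = "complete.obs") else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage on the same data as the Examples section
set.seed(1)
data <- iris
data$Species <- as.character(data$Species)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
tab <- univariate_sketch(data, "Y")
print(tab)
```

For iris, Species has three equally sized classes, so its maximum population percentage is about 33.3%.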
Usage
univariate(base, target, threshold)
Arguments
base        input dataframe
target      column/field name of the target variable, passed as a string (the target must be of 0/1 type)
threshold   sparsity threshold, provided as a decimal fraction (e.g. 0.95)
Value
The function returns an object of class "univariate" which is a list containing the following components:
univar_table       univariate summary of the variables
num_var_name       array of column names of numerical variables
char_var_name      array of column names of categorical variables
sparse_var_name    array of column names where the population concentration at a single class or value is more than the sparsity threshold
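The split into numerical, categorical and sparse variables can be sketched in base R as below. This is a hypothetical illustration (the function `split_vars_sketch` is an assumption, not part of scorecardModelUtils); it assumes that any non-numeric column counts as categorical and that a variable is sparse when its most frequent value/class holds strictly more than `threshold` of the rows.

```r
# Hypothetical sketch: NOT the scorecardModelUtils implementation.
split_vars_sketch <- function(base, target, threshold) {
  vars <- setdiff(names(base), target)
  is_num <- vapply(base[vars], is.numeric, logical(1))
  # Fraction of rows taken by the most frequent value/class of each variable
  max_share <- vapply(vars, function(v) {
    freq <- table(base[[v]], useNA = "ifany")
    max(freq) / nrow(base)
  }, numeric(1))
  list(
    num_var_name    = vars[is_num],
    char_var_name   = vars[!is_num],
    sparse_var_name = vars[max_share > threshold]  # strictly "more than"
  )
}

# Column "a" has 80% of its rows at the value 1, so it is sparse at 0.7
df <- data.frame(a = c(1, 1, 1, 1, 2),
                 b = c("x", "y", "z", "x", "y"),
                 Y = c(0, 1, 0, 1, 0),
                 stringsAsFactors = FALSE)
res <- split_vars_sketch(df, target = "Y", threshold = 0.7)
print(res)
```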
Author(s)
Arya Poddar <aryapoddar290990@gmail.com>
Examples
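library(scorecardModelUtils)
# Use iris with the factor Species converted to character
data <- iris
data$Species <- as.character(data$Species)
# Random 0/1 target; set a seed so the result is reproducible
set.seed(123)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
univariate_list <- univariate(base = data, target = "Y", threshold = 0.95)
# Inspect each component of the returned list
univariate_list$univar_table
univariate_list$num_var_name
univariate_list$char_var_name
univariate_list$sparse_var_name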