univariate {scorecardModelUtils}    R Documentation

Univariate analysis of variables

Description

The function returns a univariate analysis of the variables as a data frame. The univariate statistics include the minimum, maximum, mean, median, number of distinct values, variable type, count of null values, percentage of null values, maximum population percentage among all classes/values, and correlation with the target. It also returns the names of the character and numerical variables, along with the names of variables whose population concentration at a single class/value exceeds a threshold.
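
To make the listed statistics concrete, here is a minimal base R sketch of how they could be computed for a single column. It is not the package's implementation: the helper name describe_column, the output column names, and the choice to treat only NA as a null value are assumptions made for illustration.

```r
# Hypothetical helper (not part of scorecardModelUtils): summarise one column.
describe_column <- function(x, target) {
  is_num <- is.numeric(x)
  n_null <- sum(is.na(x))             # assumption: "null" means NA
  shares <- table(x) / length(x)      # share of all rows taken by each non-NA value
  data.frame(
    var_type     = if (is_num) "numeric" else "character",
    min          = if (is_num) min(x, na.rm = TRUE) else NA,
    max          = if (is_num) max(x, na.rm = TRUE) else NA,
    mean         = if (is_num) mean(x, na.rm = TRUE) else NA,
    median       = if (is_num) median(x, na.rm = TRUE) else NA,
    distinct     = length(unique(x[!is.na(x)])),
    null_count   = n_null,
    null_percent = 100 * n_null / length(x),
    max_pop_pct  = 100 * max(as.numeric(shares)),
    corr_target  = if (is_num) cor(x, target, use = "complete.obs") else NA
  )
}

# Usage on a numeric column with an alternating 0/1 target
describe_column(iris$Sepal.Length, target = rep(0:1, length.out = nrow(iris)))
```

Applying such a helper to every column and binding the rows would give a table similar in spirit to the one described above.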

Usage

univariate(base, target, threshold)

Arguments

base

input dataframe

target

name of the target column, passed as a string; the target variable must be binary (0/1)

threshold

sparsity threshold, given as a decimal fraction (for example, 0.95)
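
The argument constraints above can be checked before calling the function. The sketch below shows one way to do this in base R. The helper check_univariate_inputs is hypothetical, and restricting the threshold to the interval (0, 1] is an assumption drawn from the "decimal/fraction" wording rather than documented behaviour.

```r
# Hypothetical pre-flight check (not part of scorecardModelUtils).
check_univariate_inputs <- function(base, target, threshold) {
  stopifnot(
    is.data.frame(base),
    is.character(target), length(target) == 1,
    target %in% names(base),             # target column must exist
    all(base[[target]] %in% c(0, 1)),    # binary target; NA fails this check
    is.numeric(threshold), length(threshold) == 1,
    threshold > 0, threshold <= 1        # assumption: a fraction, not a percentage
  )
  invisible(TRUE)
}

d <- data.frame(x = c(5.1, 4.9, 6.2), Y = c(0, 1, 0))
check_univariate_inputs(d, "Y", 0.95)
```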

Value

The function returns an object of class "univariate" which is a list containing the following components:

univar_table

univariate summary of variables

num_var_name

array of column names of numerical type variables

char_var_name

array of column names of categorical type variables

sparse_var_name

array of column names where population concentration at a class or value is more than the sparsity threshold
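
As an illustration of what a sparsity check of this kind could look like, the base R sketch below flags columns whose most frequent value covers more than the threshold share of rows. The helper find_sparse_vars is hypothetical, and counting NA as its own class is an assumption; the package may define concentration differently.

```r
# Hypothetical helper (not part of scorecardModelUtils).
find_sparse_vars <- function(base, threshold) {
  # Share of rows taken by the most frequent value in each column
  max_share <- vapply(base, function(x) {
    max(table(x, useNA = "ifany")) / length(x)   # assumption: NA is its own class
  }, numeric(1))
  names(max_share)[max_share > threshold]
}

d <- data.frame(a = c(1, 1, 1, 1, 2),
                b = c("x", "y", "x", "y", "z"),
                stringsAsFactors = FALSE)
find_sparse_vars(d, threshold = 0.75)   # "a": the value 1 covers 80% of rows
```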

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Examples

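library(scorecardModelUtils)

# Build a sample dataset with one character column and a random 0/1 target
data <- iris
data$Species <- as.character(data$Species)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)

# Run the univariate analysis with a sparsity threshold of 0.95
univariate_list <- univariate(base = data, target = "Y", threshold = 0.95)
univariate_list$univar_table     # univariate summary of variables
univariate_list$num_var_name     # names of numerical variables
univariate_list$char_var_name    # names of categorical variables
univariate_list$sparse_var_name  # names of variables above the sparsity threshold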

[Package scorecardModelUtils version 0.0.1.0 Index]