smbinning.eda {smbinning}    R Documentation
Exploratory Data Analysis (EDA)
Description
It shows basic statistics for each characteristic in a data frame. The report includes:
Field: Field name.
Type: Factor, numeric, integer, other.
Recs: Number of records.
Miss: Number of missing records.
Min: Minimum value.
Q25: First quartile. It splits off the lowest 25% of data from the highest 75%.
Q50: Median or second quartile. It cuts the data set in half.
Avg: Average value.
Q75: Third quartile. It splits off the lowest 75% of data from the highest 25%.
Max: Maximum value.
StDv: Standard deviation of a sample.
Neg: Number of negative values.
Pos: Number of positive values.
OutLo: Number of low outliers. Records below Q25-1.5*IQR, where IQR=Q75-Q25.
OutHi: Number of high outliers. Records above Q75+1.5*IQR, where IQR=Q75-Q25.
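The outlier rule above can be sketched in base R. This is only an illustration of the Q25-1.5*IQR and Q75+1.5*IQR fences, not the package's internal code: the vector x is made-up data, the variable names are hypothetical, and quantile() is used with its default settings, which may differ from the quartile method smbinning.eda applies.

```r
# Hypothetical numeric characteristic with one missing and one extreme value
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 50, NA)

# Quartiles, ignoring missing records (default quantile type is an assumption)
q25 <- quantile(x, 0.25, na.rm = TRUE, names = FALSE)
q75 <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
iqr <- q75 - q25

# Fences as described for OutLo and OutHi
lower_fence <- q25 - 1.5 * iqr
upper_fence <- q75 + 1.5 * iqr

# Counts of records outside the fences
out_lo <- sum(x < lower_fence, na.rm = TRUE)
out_hi <- sum(x > upper_fence, na.rm = TRUE)

cat("OutLo:", out_lo, " OutHi:", out_hi, "\n")
```

With this sample, Q25 is 4 and Q75 is 8, so the fences are -2 and 14, and only the value 50 counts as a high outlier.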
Usage
smbinning.eda(df, rounding = 3, pbar = 1)
Arguments
df: A data frame.
rounding: Optional parameter that sets the number of decimal places shown in the output table. Default is 3.
pbar: Optional parameter that turns the progress bar on or off. Default is 1.
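As a rough sketch of how the optional arguments might be combined, the call below asks for one decimal place and passes pbar = 0. That pbar = 0 suppresses the progress bar is an assumption based on the default of 1, and is not stated explicitly here. The requireNamespace() guard and the name res are illustrative additions so the snippet does not fail when the package is not installed.

```r
# Run only if the smbinning package is available
if (requireNamespace("smbinning", quietly = TRUE)) {
  library(smbinning)
  # One decimal place; pbar = 0 is assumed to turn the progress bar off
  res <- smbinning.eda(smbsimdf1, rounding = 1, pbar = 0)
  print(head(res$eda))
} else {
  res <- NULL
  message("Package 'smbinning' is not installed; skipping example.")
}
```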
Value
The command smbinning.eda returns a list with two data frames: eda, which lists
each characteristic with basic statistics such as extreme values and quartiles,
and edapct, which lists percentages of missing values and outliers, among others.
Examples
# Load package and its data
library(smbinning)

# Example: Exploratory data analysis of dataset
# Run the analysis once and reuse the result for both tables
result <- smbinning.eda(smbsimdf1, rounding = 3)
result$eda     # Table with basic statistics
result$edapct  # Table with basic percentages