ExpNumStat {SmartEDA} | R Documentation |
Summary statistics for numerical variables
Description
Provides summary statistics for all numerical variables. The function automatically scans each variable and selects only numeric/integer variables. If a target variable is supplied, the function also summarises the relationship between the target variable and each independent variable.
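As a rough illustration of the scan described above, the following base R sketch picks out the numeric and integer columns of a data frame. It is not the package's own code; the helper name numeric_columns and the data frame example_df are hypothetical, and the sketch assumes a data frame rather than a matrix.

```r
# Hypothetical helper (not part of SmartEDA): names of numeric/integer columns.
numeric_columns <- function(df) {
  # is.numeric() is TRUE for both double and integer columns, FALSE for factors
  is_num <- vapply(df, is.numeric, logical(1))
  names(df)[is_num]
}

# Small made-up data frame with mixed column types.
example_df <- data.frame(
  id    = 1:4,
  score = c(2.5, 3.1, NA, 4.8),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

numeric_columns(example_df)
```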
Usage
ExpNumStat(
data,
by = "A",
gp = NULL,
Qnt = NULL,
Nlim = 10,
MesofShape = 2,
Outlier = FALSE,
round = 3,
weight = NULL,
dcast = FALSE,
val = NULL
)
Arguments
data |
dataframe or matrix |
by |
grouping level: "A" (summary statistics for all observations), "G" (summary statistics by group) or "GA" (summary statistics by group and overall) |
gp |
target variable if any, default NULL |
Qnt |
quantiles to compute, default NULL. For example, c(0.25, 0.75) returns the 25th and 75th percentiles |
Nlim |
numeric variable limit (default value is 10, which means only variables of type numeric/integer with more than 10 unique values are considered) |
MesofShape |
Measures of shapes (Skewness and kurtosis). |
Outlier |
Calculate the lower hinge, upper hinge and number of outliers |
round |
round off (number of decimal places) |
weight |
a vector of weights; its length must be equal to the length of data |
dcast |
fast dcast from data.table |
val |
Name of the column whose values will be filled to cast (see the Details section for the list of column names) |
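To make the Qnt and Nlim arguments more concrete, here is a minimal base R sketch of the two ideas on mtcars. It assumes that "more than Nlim unique values" is a strict comparison, and it uses quantile() with its default settings, which may not match the quantile method used inside ExpNumStat. The names nlim, n_unique and keep are illustrative only.

```r
# Qnt = c(0.25, 0.75) asks for the 25th and 75th percentiles;
# base R's quantile() computes the same kind of statistic.
q <- quantile(mtcars$mpg, probs = c(0.25, 0.75), na.rm = TRUE)
q

# Nlim: keep numeric columns with more than `nlim` distinct values.
# `nlim`, `n_unique` and `keep` are illustrative names, not SmartEDA objects.
nlim <- 10
n_unique <- vapply(mtcars, function(x) length(unique(x)), integer(1))
keep <- names(mtcars)[n_unique > nlim]
keep
```

Columns such as cyl or gear, which take only a handful of distinct values, fall below the limit in this sketch and would be left out of the summary.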
Details
Column descriptions:
- Vname is the variable name
- Group is the target variable
- TN is the total sample (including NA observations)
- nNeg is the total number of negative observations
- nPos is the total number of positive observations
- nZero is the total number of zero observations
- NegInf is the negative infinite count
- PosInf is the positive infinite count
- NA_value is the count of NA (missing) values
- Per_of_Missing is the percentage of missing values
- Min is the minimum value
- Max is the maximum value
- Mean is the average value
- Median is the median value
- SD is the standard deviation
- CV is the coefficient of variation, (SD/mean)*100
- IQR is the interquartile range
- Qnt is the quantile values
- MesofShape is the skewness and kurtosis
- Outlier is the number of outliers
- Cor is the correlation between the target and independent variables
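The following base R sketch shows how several of the columns above could be computed for a single numeric vector. It is an illustration under stated assumptions, not the SmartEDA implementation: describe_vector is a hypothetical helper, infinite values are counted as negative or positive as well as infinite, Per_of_Missing is taken as a percentage of TN, and Mean, SD, CV and IQR are computed on finite values only.

```r
# Hypothetical helper: a few of the columns listed above, for one numeric vector.
# This is a base R sketch, not the SmartEDA implementation.
describe_vector <- function(x) {
  finite_x <- x[is.finite(x)]  # drops NA, NaN, Inf and -Inf
  c(
    TN             = length(x),                 # total sample, NA included
    nNeg           = sum(x < 0, na.rm = TRUE),  # assumption: -Inf counts as negative
    nPos           = sum(x > 0, na.rm = TRUE),  # assumption: Inf counts as positive
    nZero          = sum(x == 0, na.rm = TRUE),
    NegInf         = sum(x == -Inf, na.rm = TRUE),
    PosInf         = sum(x == Inf, na.rm = TRUE),
    NA_value       = sum(is.na(x)),
    Per_of_Missing = 100 * mean(is.na(x)),      # assumption: percent of TN
    Mean           = mean(finite_x),
    SD             = sd(finite_x),
    CV             = 100 * sd(finite_x) / mean(finite_x),  # (SD/mean)*100
    IQR            = IQR(finite_x)
  )
}

x <- c(-2, 0, 1, 3, 5, NA, Inf)
describe_vector(x)
```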
Value
summary statistics for numeric independent variables
Summary by:
- Only overall level
- Only group level
- Both overall and group level
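The three summary levels can be pictured with a single statistic. The base R sketch below computes the mean of mpg in mtcars per gear group, overall, and both stacked together. It only illustrates the idea; the label "All" and the object names are assumptions and need not match what ExpNumStat returns.

```r
# Group level: mean of mpg within each level of gear.
by_group <- tapply(mtcars$mpg, mtcars$gear, mean)

# Overall level: mean of mpg across all rows ("All" is an illustrative label).
overall <- c(All = mean(mtcars$mpg))

# Both levels: stack the group rows and the overall row.
both <- c(by_group, overall)
both
```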
See Also
Examples
library(SmartEDA)

# Descriptive summary of numeric variables by group (target variable: gear)
ExpNumStat(mtcars, by = "G", gp = "gear", Qnt = c(0.1, 0.2), MesofShape = 2,
           Outlier = TRUE, round = 3)

# Descriptive summary of numeric variables, overall
ExpNumStat(mtcars, by = "A", gp = "gear", Qnt = c(0.1, 0.2), MesofShape = 2,
           Outlier = TRUE, round = 3)

# Descriptive summary of numeric variables, overall and by group
ExpNumStat(mtcars, by = "GA", gp = "gear", Qnt = seq(0, 1, 0.1), MesofShape = 1,
           Outlier = TRUE, round = 2)

# Summary of a specific statistic (here IQR) for all numeric variables
ExpNumStat(mtcars, by = "GA", gp = "gear", Qnt = c(0.1, 0.2), MesofShape = 2,
           Outlier = FALSE, round = 2, dcast = TRUE, val = "IQR")

# Weighted summary statistics
ExpNumStat(mtcars, by = "GA", gp = "gear", Qnt = c(0.1, 0.2), MesofShape = 2,
           Outlier = FALSE, round = 2, dcast = TRUE, val = "IQR", weight = "wt")