ExpCustomStat {SmartEDA}    R Documentation
Customized summary statistics
Description
Creates a table of descriptive statistics. The output is a data frame containing descriptive statistics for all input variables at each level, or each combination of levels, of the categorical/group variables. While running the analysis, the user can also filter the data either for individual variables or across the whole data set.
Usage
ExpCustomStat(
data,
Cvar = NULL,
Nvar = NULL,
stat = NULL,
gpby = TRUE,
filt = NULL,
dcast = FALSE,
value = NULL
)
Arguments
data
Data frame or matrix.
Cvar
Qualitative (categorical) variables used to stratify/subgroup the data or to run categorical summaries.
Nvar
Quantitative variables on which to compute summary statistics.
stat
Descriptive statistics to compute. Any base R summary function can be used, such as 'mean', 'median', 'max', 'min', 'sum', 'IQR', 'sd' and 'var', as well as quantiles given as 'p0.1', 'p0.2', etc. Two additional statistics are available: 'PS' (percentage of shares) and 'Prop' (column percentage).
gpby
Default is TRUE. If TRUE, the summary is created for each combination of levels of the categorical variables listed in Cvar. If the summary is required separately at each categorical variable level, set this option to FALSE.
filt
Filter applied to the data while computing the summary statistics. A filter can apply across the whole data set or to individual variables. For multiple filters, separate the conditions with '^'. Ex: with Nvar = c("X1","X2","X3","X4"), suppose we need to exclude data with X1>900 for variable X1, X2==10 for X2 and Gender!='Male' for X3, and keep all data for X4. Then filt should be filt = "X1>900^X2==10^Gender!='Male'^all" or filt = "X1>900^X2==10^Gender!='Male'^ ^". To keep all data for any of the variables listed in Nvar, specify ^all^ or ^ ^ (a single space) at that variable's position.
dcast
Logical. If TRUE, the summary table is reshaped using the fast dcast from data.table.
value
If dcast is TRUE, the name of the variable whose values should be spread across the columns.
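The '^' syntax of filt pairs one condition with each variable in Nvar, by position. The sketch below illustrates that pairing in base R. The helper split_filt is hypothetical (it is not part of SmartEDA) and only mirrors the syntax described for filt; it does not claim to reproduce how ExpCustomStat parses the string internally.

```r
# Hypothetical helper (not part of SmartEDA): split a '^'-separated filt
# string into one condition per variable in Nvar. "all" or a single space
# means "no filter" and is returned as NA.
split_filt <- function(filt, nvar) {
  conds <- trimws(strsplit(filt, "^", fixed = TRUE)[[1]])
  if (length(conds) > length(nvar)) {
    stop("more conditions than variables in nvar")
  }
  # strsplit() drops a trailing empty piece, so pad with NA up to length(nvar)
  length(conds) <- length(nvar)
  conds[is.na(conds) | conds %in% c("", "all")] <- NA
  setNames(conds, nvar)
}

nvar <- c("X1", "X2", "X3", "X4")
split_filt("X1>900^X2==10^Gender!='Male'^all", nvar)
split_filt("X1>900^X2==10^Gender!='Male'^ ^", nvar)
```

Both calls return the same named vector: one condition each for X1, X2 and X3, and NA (no filter) for X4.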
Details
Filtering specific values from the numeric variables
Case1: Exclude special codes or outlier values, such as 999, 9999 or 888, from each selected variable.
E.g.: x = c(23,24,34,999,12,12,23,999,45) and y = c(1,3,4,999,0,999,0,8,999,0)
After excluding 999:
x = c(23,24,34,12,12,23,45)
y = c(1,3,4,0,0,8,0)
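Because x and y have different lengths, each variable has to be cleaned on its own. The following base R sketch shows the exclusion step for the example vectors; drop_codes is a hypothetical helper written for illustration, not a SmartEDA function.

```r
# Example vectors from Case1 (different lengths, so cleaned separately)
x <- c(23, 24, 34, 999, 12, 12, 23, 999, 45)
y <- c(1, 3, 4, 999, 0, 999, 0, 8, 999, 0)

# Hypothetical helper: drop special codes (add 9999, 888, ... as needed)
drop_codes <- function(v, codes = c(999)) v[!(v %in% codes)]

cleaned <- lapply(list(x = x, y = y), drop_codes)
cleaned$x  # 23 24 34 12 12 23 45
cleaned$y  # 1 3 4 0 0 8 0
sapply(cleaned, mean)
```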
Case2: Summarise the data with selected descriptive statistics, such as 'mean' and 'median', or 'sum' and 'var'.
Case3: Aggregate the data with different statistics by group (group-by).
Case4: Reshape the summary statistics (see the dcast and value arguments).
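Cases 2 to 4 combine into one pattern: apply each selected statistic within each group, then reshape so that every statistic becomes a column. The base R sketch below shows that pattern for mpg by gear in mtcars. It uses stats::aggregate and stats::reshape rather than ExpCustomStat or data.table's dcast, so the column names and layout are assumptions of the sketch, not the function's actual output format.

```r
data(mtcars)
stats <- c("mean", "median", "sum")

# Cases 2/3: apply each selected statistic to mpg within each level of gear
long <- do.call(rbind, lapply(stats, function(s) {
  agg <- aggregate(mpg ~ gear, data = mtcars, FUN = match.fun(s))
  data.frame(gear = agg$gear, stat = s, value = agg$mpg)
}))

# Case 4: reshape so that each statistic becomes its own column
wide <- reshape(long, idvar = "gear", timevar = "stat", direction = "wide")
wide
```

The long table has one row per (gear, statistic) pair; the wide table has one row per gear with columns value.mean, value.median and value.sum.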
The complete functionality of the 'ExpCustomStat' function is described, with example code, in the package vignette.
Value
A data frame of summary statistics. Usage of this function is described in the package's user guide vignette.
Examples
## Selected summary statistics 'Count', 'sum' and 'PS' (percentage of shares)
## for the disp and mpg variables by vs, am and gear
ExpCustomStat(mtcars, Cvar=c("vs","am","gear"), Nvar = c("disp","mpg"),
stat = c("Count","sum","PS"), gpby = TRUE, filt = NULL)
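## Count, sum and variance of disp and mpg by gear, with filt = "am==1"
ExpCustomStat(mtcars, Cvar=c("gear"), Nvar = c("disp","mpg"),
stat = c("Count","sum","var"), gpby = TRUE, filt = "am==1")
## Count, sum, mean and median of disp and mpg by gear, with filt = "am==1"
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp","mpg"),
stat = c("Count","sum","mean","median"), gpby = TRUE, filt = "am==1")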
## Count and five-number summary (min, p0.25, median, p0.75, max)
## of the disp and mpg variables by gear
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp", "mpg"),
stat = c("Count",'min','p0.25','median','p0.75','max'), gpby = TRUE)
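'PS' and 'Prop' are named above but not defined in detail. Assuming that PS is each group's percentage of the variable's overall sum and that Prop is each group's percentage of the rows (a column percentage), the base R sketch below computes Count, sum, PS and Prop for disp by gear. These definitions are assumptions, not taken from the SmartEDA source; comparing the result with ExpCustomStat(mtcars, Cvar = "gear", Nvar = "disp", stat = c("Count","sum","PS","Prop")) is one way to check them.

```r
# Assumed definitions, for illustration only (not SmartEDA's source):
#   PS   = 100 * group sum / overall sum of the numeric variable
#   Prop = 100 * group row count / total row count
by_gear <- split(mtcars$disp, mtcars$gear)
count <- vapply(by_gear, length, integer(1))
total <- vapply(by_gear, sum, numeric(1))

out <- data.frame(
  gear  = names(by_gear),
  Count = unname(count),
  sum   = unname(total),
  PS    = unname(100 * total / sum(mtcars$disp)),
  Prop  = unname(100 * count / nrow(mtcars))
)
out
```

Under these definitions both the PS and the Prop columns sum to 100 across the gear levels.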