
validate {igate}

R Documentation

Validates results after using `igate` or `categorical.igate`.

Description

Takes a new data frame for validation, together with the causes and control bands obtained from `igate` or `categorical.igate`, and returns all observations that fall within these control bands.

Usage

validate(validation_df, target, causes, results_df, type = NULL)

Arguments

`validation_df`	Data frame to be used for validation. It is recommended to use a different data frame from the one used in `igate`/`categorical.igate`; the same data frame can be used if only a sanity check of the results is intended. This data frame must contain the `target` variable as well as all the causes determined by `igate`/`categorical.igate`.
`target`	Target variable that was used in `igate` or `categorical.igate`.
`causes`	Causes determined by `igate` or `categorical.igate`. If you saved the results of `igate`/ `categorical.igate` in an object `results`, simply use `results$Causes` here.
`results_df`	The data frame containing the results of `igate` or `categorical.igate`.
`type`	The type of igate analysis that was performed: either `"continuous"` or `"categorical"`. If not provided, the function tries to guess the type from the type of `validation_df[[target]]`.
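The exact guessing rule is not spelled out on this page. A minimal sketch of one plausible rule follows, assuming that a numeric target means `"continuous"` and a factor, character or logical target means `"categorical"`. `guess_type` is a hypothetical helper for illustration, not a function exported by igate.

```r
# Hypothetical helper (not part of igate): guess the igate type from the
# class of the target column, assuming numeric -> "continuous" and
# factor/character/logical -> "categorical".
guess_type <- function(validation_df, target) {
  y <- validation_df[[target]]
  if (is.numeric(y)) {
    "continuous"
  } else if (is.factor(y) || is.character(y) || is.logical(y)) {
    "categorical"
  } else {
    stop("cannot guess type for a target of class ", class(y)[1])
  }
}

guess_type(iris, "Sepal.Length")  # "continuous"
guess_type(iris, "Species")       # "categorical"
```

Under a rule like this, a categorical target stored as integer codes would be classified as continuous, so passing `type` explicitly is the safer choice in that case.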

Details

If `Good_Count` or `Bad_Count` in the second data frame is very low for a cause, that cause excludes many observations from the first data frame. Consider re-running `validate` with this cause removed from `causes`.
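One way to pick the cause to drop is to take the cause whose smaller band count is lowest. The sketch below works on a toy data frame shaped like the second data frame described under Value. Its counts are illustrative placeholders, not real igate output, and the selection rule is an assumption, not the package's own method.

```r
# Toy stand-in for the second data frame returned by validate; the counts
# are illustrative placeholders, not real igate output.
counts <- data.frame(
  Cause      = c("Sepal.Width", "Petal.Length", "Petal.Width"),
  Good_Count = c(40L, 3L, 35L),
  Bad_Count  = c(38L, 30L, 2L),
  stringsAsFactors = FALSE
)

# Treat a cause as being as restrictive as the smaller of its two counts.
restrictiveness  <- pmin(counts$Good_Count, counts$Bad_Count)
most_restrictive <- counts$Cause[which.min(restrictiveness)]
remaining_causes <- setdiff(counts$Cause, most_restrictive)

most_restrictive  # "Petal.Width"
remaining_causes  # "Sepal.Width" "Petal.Length"
```

`remaining_causes` could then be passed as `causes` to a second `validate` call, assuming `validate` only evaluates the bands of the causes it is given.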

Value

A list of three data frames is returned. The first data frame contains those observations in `validation_df` that fall into *all* the good control bands, or into *all* the bad control bands, specified in `results_df`. Its columns are `target`, then one column for each of the causes, and a new column `expected_quality`, which is `"good"` if the observation falls into all the good control bands and `"bad"` if it falls into all the bad control bands.
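The selection behind the first data frame can be sketched in base R. This is a minimal illustration, not the package's implementation. The band layout and the column names `Good_Lower`, `Good_Upper`, `Bad_Lower` and `Bad_Upper` are hypothetical and are not claimed to match the real `results_df`. The band values are chosen arbitrarily for `iris`.

```r
# Hypothetical control bands for two causes; the column names are
# illustrative and need not match the real results_df produced by igate.
bands <- data.frame(
  Cause      = c("Petal.Length", "Petal.Width"),
  Good_Lower = c(1.0, 0.1), Good_Upper = c(2.0, 0.6),
  Bad_Lower  = c(5.0, 1.8), Bad_Upper  = c(7.0, 2.5),
  stringsAsFactors = FALSE
)

# TRUE for each row of df whose value lies in [lower, upper] for every cause;
# missing values count as outside the band.
in_all_bands <- function(df, bands, lower_col, upper_col) {
  keep <- rep(TRUE, nrow(df))
  for (i in seq_len(nrow(bands))) {
    x <- df[[bands$Cause[i]]]
    keep <- keep & !is.na(x) & x >= bands[[lower_col]][i] & x <= bands[[upper_col]][i]
  }
  keep
}

# Keep rows that fall into all good bands or all bad bands, then label them.
select_expected <- function(df, target, bands) {
  good <- in_all_bands(df, bands, "Good_Lower", "Good_Upper")
  bad  <- in_all_bands(df, bands, "Bad_Lower", "Bad_Upper")
  hit  <- good | bad
  out  <- df[hit, c(target, bands$Cause), drop = FALSE]
  out$expected_quality <- ifelse(good[hit], "good", "bad")
  out
}

selected <- select_expected(iris, "Sepal.Length", bands)
table(selected$expected_quality)
```

If good and bad bands overlapped, this sketch would label a row falling into both as `"good"`.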

The second data frame has three columns:

`Cause`	Each of the `causes`.
`Good_Count`	The number of observations in `validation_df` that fall into the good band of this cause, considered on its own.
`Bad_Count`	The number of observations in `validation_df` that fall into the bad band of this cause, considered on its own.
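A base R sketch of how per-cause counts of this kind can be computed is shown below. The band data frame and its column names are hypothetical and for illustration only. Each cause is checked on its own, so each count is at least as large as the number of rows that fall into all the good (or all the bad) bands.

```r
# Hypothetical control bands; column names are illustrative, not igate's.
bands <- data.frame(
  Cause      = c("Petal.Length", "Petal.Width"),
  Good_Lower = c(1.0, 0.1), Good_Upper = c(2.0, 0.6),
  Bad_Lower  = c(5.0, 1.8), Bad_Upper  = c(7.0, 2.5),
  stringsAsFactors = FALSE
)

# For each cause on its own, count rows in its good band and in its bad band;
# missing values count as outside the band.
count_per_cause <- function(df, bands) {
  n_in <- function(i, lower_col, upper_col) {
    x <- df[[bands$Cause[i]]]
    sum(!is.na(x) & x >= bands[[lower_col]][i] & x <= bands[[upper_col]][i])
  }
  idx <- seq_len(nrow(bands))
  data.frame(
    Cause      = bands$Cause,
    Good_Count = vapply(idx, n_in, integer(1), "Good_Lower", "Good_Upper"),
    Bad_Count  = vapply(idx, n_in, integer(1), "Bad_Lower", "Bad_Upper"),
    stringsAsFactors = FALSE
  )
}

counts <- count_per_cause(iris, bands)
counts
```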

The third data frame summarizes the first one. If `type = "continuous"`, it has three columns:

`expected_quality`	Either `"good"` or `"bad"`.
`max_target`	Maximum value of `target` among the observations with good (respectively bad) expected quality.
`min_target`	Minimum value of `target` among the observations with good (respectively bad) expected quality.
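For a continuous target, a summary of this form can be sketched with `split()`. The input below is a hypothetical stand-in for the first data frame. `Sepal.Length` is the target, and `expected_quality` is assigned by an illustrative `Petal.Length` rule, not by igate.

```r
# Hypothetical stand-in for the first data frame: Sepal.Length is the target
# and expected_quality comes from an illustrative Petal.Length rule.
quality <- ifelse(iris$Petal.Length <= 2, "good",
                  ifelse(iris$Petal.Length >= 5, "bad", NA))
selected <- data.frame(
  Sepal.Length     = iris$Sepal.Length,
  expected_quality = quality,
  stringsAsFactors = FALSE
)[!is.na(quality), ]

# One row per expected quality with the range of the target.
summarise_continuous <- function(sel, target) {
  groups <- split(sel[[target]], sel$expected_quality)
  data.frame(
    expected_quality = names(groups),
    max_target       = unname(vapply(groups, max, numeric(1))),
    min_target       = unname(vapply(groups, min, numeric(1))),
    stringsAsFactors = FALSE
  )
}

summary_df <- summarise_continuous(selected, "Sepal.Length")
summary_df
```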

If `type = "categorical"`, it has the following three columns:

`expected_quality`	Either `"good"` or `"bad"`.
`Category`	The categories of the observations with good (respectively bad) expected quality.
`Frequency`	How often the respective `Category` appears among the observations with good (respectively bad) expected quality.
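For a categorical target, a frequency summary can be sketched with `table()`. This sketch assumes one row per (`expected_quality`, `Category`) pair, which is only one possible layout. The input is a hypothetical stand-in for the first data frame, with `Species` as the target and an illustrative `Petal.Width` rule for `expected_quality`.

```r
# Hypothetical stand-in for the first data frame: Species is the target and
# expected_quality comes from an illustrative Petal.Width rule.
quality <- ifelse(iris$Petal.Width <= 0.6, "good",
                  ifelse(iris$Petal.Width >= 1.8, "bad", NA))
selected <- data.frame(
  Species          = iris$Species,
  expected_quality = quality,
  stringsAsFactors = FALSE
)[!is.na(quality), ]

# One row per (expected_quality, Category) pair that actually occurs.
summarise_categorical <- function(sel, target) {
  tab <- as.data.frame(
    table(expected_quality = sel$expected_quality, Category = sel[[target]]),
    responseName = "Frequency", stringsAsFactors = FALSE
  )
  tab <- tab[tab$Frequency > 0, , drop = FALSE]
  rownames(tab) <- NULL
  tab
}

freq_df <- summarise_categorical(selected, "Species")
freq_df
```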

Examples

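# resultsIris is assumed to hold the results of an earlier igate() run on iris with target "Sepal.Length"
validate(iris, target = "Sepal.Length", causes = resultsIris$Causes, results_df = resultsIris)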

[Package igate version 0.3.3 Index]