validate {igate} | R Documentation |
Validates results after using igate or categorical.igate.
Description
Takes a new data frame to be used for validation, together with the causes and control bands
obtained from igate or categorical.igate, and returns all those observations that fall
within these control bands.
Usage
validate(validation_df, target, causes, results_df, type = NULL)
Arguments
validation_df |
Data frame to be used for validation. It is recommended to use
a different data frame from the one used in |
target |
Target variable that was used in |
causes |
Causes determined by |
results_df |
The data frame containing the results of |
type |
The type of igate that was performed: either |
Details
If a value of Good_Count or Bad_Count is very low in the second data frame, it means that
this cause is excluding a lot of observations from the first data frame. Consider
re-running validate with this cause removed from causes.
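One way to act on this advice is sketched below in base R. The counts table is made up to stand in for the second returned data frame, and the threshold of 5 is an arbitrary assumption, since this page gives no cutoff for "very low".

```r
# Made-up stand-in for the second data frame returned by validate
counts <- data.frame(
  Cause      = c("Petal.Length", "Petal.Width", "Sepal.Width"),
  Good_Count = c(48, 50, 3),
  Bad_Count  = c(41, 37, 29),
  stringsAsFactors = FALSE
)

# Assumed cutoff for "very low"; choose one that suits the size of your data
threshold <- 5

# Causes whose good or bad band selects fewer observations than the cutoff
too_strict <- counts$Cause[counts$Good_Count < threshold |
                           counts$Bad_Count < threshold]

# Remaining causes, in their original order
causes_kept <- setdiff(counts$Cause, too_strict)
```

causes_kept could then be passed as the causes argument when re-running validate.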
Value
A list of three data frames is returned. The first data frame contains those observations
in validation_df that fall into *all* the good control bands or *all* the bad control bands
specified in results_df. The columns are target, then one column for each of the causes,
and a new column expected_quality, which is "good" if the observation falls into all the
good control bands and "bad" if it falls into all the bad control bands.
The second data frame has three columns:
Cause | Each of the causes . |
Good_Count | If we selected all those observations that fall into the good band of this cause, how many observations would we select? |
Bad_Count | If we selected all those observations that fall into the bad band of this cause, how many observations would we select? |
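The logic behind the first two data frames can be sketched in base R as follows. This is an illustration only, not the package's implementation: the bands table, its column names, the band limits, and the choice of inclusive bounds are all assumptions made for the example.

```r
# Hypothetical control bands; the column names are illustrative and are
# not claimed to match the columns of an actual igate results_df.
bands <- data.frame(
  cause      = c("Petal.Length", "Petal.Width"),
  good_lower = c(1.0, 0.1),
  good_upper = c(1.9, 0.6),
  bad_lower  = c(4.5, 1.4),
  bad_upper  = c(6.9, 2.5),
  stringsAsFactors = FALSE
)
target <- "Sepal.Length"

# TRUE for rows inside the i-th band of the given kind ("good" or "bad");
# inclusive bounds are an assumption of this sketch
in_band <- function(df, bands, i, kind) {
  x <- df[[bands$cause[i]]]
  x >= bands[[paste0(kind, "_lower")]][i] &
    x <= bands[[paste0(kind, "_upper")]][i]
}

# TRUE for rows that lie inside every band of the given kind
in_all_bands <- function(df, bands, kind) {
  Reduce(`&`, lapply(seq_len(nrow(bands)),
                     function(i) in_band(df, bands, i, kind)))
}

good <- in_all_bands(iris, bands, "good")
bad  <- in_all_bands(iris, bands, "bad")
keep <- good | bad

# First data frame: target, causes, and the expected quality
first_df <- iris[keep, c(target, bands$cause)]
first_df$expected_quality <- ifelse(good[keep], "good", "bad")

# Second data frame: per-cause counts, each band considered on its own
second_df <- data.frame(
  Cause      = bands$cause,
  Good_Count = sapply(seq_len(nrow(bands)),
                      function(i) sum(in_band(iris, bands, i, "good"))),
  Bad_Count  = sapply(seq_len(nrow(bands)),
                      function(i) sum(in_band(iris, bands, i, "bad"))),
  stringsAsFactors = FALSE
)
```

Because each row of second_df looks at a single band, its counts are never smaller than the number of observations that pass all bands of that kind together.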
The third data frame summarizes the first data frame. If type = "continuous", it has
three columns:
expected_quality | Either "good" or "bad". |
max_target | The maximum value of target among the observations with "good" expected quality or, respectively, "bad" expected quality. |
min_target | The minimum value of target among the observations with "good" expected quality or, respectively, "bad" expected quality. |
If type = "categorical", it has the following three columns:
expected_quality | Either "good" or "bad". |
Category | A list of the categories of the observations with good or, respectively, bad expected quality. |
Frequency | A count of how often the respective Category appears among the observations with good or bad expected quality. |
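The two summary layouts described above can be reproduced with base R as a rough sketch. The input data frames below are made up to stand in for the first returned data frame, and the code is not the package's own implementation.

```r
# Made-up stand-in for the first data frame with a continuous target
first_df <- data.frame(
  Sepal.Length     = c(5.1, 4.9, 6.3, 7.1, 5.0),
  expected_quality = c("good", "good", "bad", "bad", "good"),
  stringsAsFactors = FALSE
)

# Continuous case: maximum and minimum of the target per expected quality
max_by <- tapply(first_df$Sepal.Length, first_df$expected_quality, max)
min_by <- tapply(first_df$Sepal.Length, first_df$expected_quality, min)
continuous_summary <- data.frame(
  expected_quality = names(max_by),
  max_target       = as.vector(max_by),
  min_target       = as.vector(min_by[names(max_by)]),
  stringsAsFactors = FALSE
)

# Made-up stand-in for the first data frame with a categorical target
cat_df <- data.frame(
  Species          = c("setosa", "setosa", "virginica", "versicolor", "virginica"),
  expected_quality = c("good", "good", "bad", "bad", "bad"),
  stringsAsFactors = FALSE
)

# Categorical case: how often each category occurs per expected quality
categorical_summary <- as.data.frame(
  table(expected_quality = cat_df$expected_quality, Category = cat_df$Species),
  responseName = "Frequency",
  stringsAsFactors = FALSE
)

# Drop combinations that never occur
categorical_summary <- categorical_summary[categorical_summary$Frequency > 0, ]
```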
Examples
validate(iris, target = "Sepal.Length", causes = resultsIris$Causes, results_df = resultsIris)