validate_get_twcv {synr}    R Documentation
Check if color data are valid and get TWCV
Description
Checks whether the passed color data are valid, i.e. plentiful and varied enough according to the passed validation criteria. This function is normally only used indirectly, through 'Participant$check_valid_get_twcv()' or 'ParticipantGroup$get_valid_twcv()'.
Usage
validate_get_twcv(
color_matrix,
dbscan_eps = 20,
dbscan_min_pts = 4,
max_var_tight_cluster = 150,
max_prop_single_tight_cluster = 0.6,
safe_num_clusters = 3,
safe_twcv = 250
)
Arguments
color_matrix
An n-by-3 numerical matrix in which each row corresponds to a single point in 3D color space.
dbscan_eps
One-element numerical vector: radius of the 'epsilon neighborhood' used when applying DBSCAN clustering.
dbscan_min_pts
One-element numerical vector: minimum number of points required in the epsilon neighborhood of a core point (the core point itself included).
max_var_tight_cluster
One-element numerical vector: maximum variance for a cluster to be considered 'tight-knit'.
max_prop_single_tight_cluster
One-element numerical vector: maximum proportion of all points allowed to fall within a single 'tight-knit' cluster (if this threshold is exceeded, the data are classified as invalid).
safe_num_clusters
One-element numerical vector: minimum number of clusters that guarantees validity, provided no single 'tight-knit' cluster holds too many points.
safe_twcv
One-element numerical vector: minimum total within-cluster variance (TWCV) score that guarantees validity, provided no single 'tight-knit' cluster holds too many points.
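The 'dbscan_eps' and 'dbscan_min_pts' arguments together determine which points DBSCAN treats as core points. The base R sketch below illustrates that definition only. 'find_core_points' is a hypothetical helper, not part of synr or 'dbscan', and it assumes Euclidean distance with neighbors counted at a distance less than or equal to 'eps'.

```r
# Hypothetical helper for illustration (not part of synr or 'dbscan'):
# flags DBSCAN core points. A point is a core point if at least `min_pts`
# points, counting the point itself, lie within distance `eps` of it
# (assumed here: Euclidean distance, neighbors at distance <= eps).
find_core_points <- function(color_matrix, eps, min_pts) {
  # Pairwise Euclidean distances; the diagonal is 0, so every point
  # always lies inside its own epsilon neighborhood.
  d <- as.matrix(dist(color_matrix))
  neighbor_counts <- unname(rowSums(d <= eps))
  neighbor_counts >= min_pts
}

pts <- rbind(
  c(0, 0, 0), c(5, 0, 0), c(0, 5, 0),  # three points close together
  c(100, 100, 100)                     # one isolated point
)
find_core_points(pts, eps = 20, min_pts = 3)
# [1]  TRUE  TRUE  TRUE FALSE
find_core_points(pts, eps = 20, min_pts = 4)
# [1] FALSE FALSE FALSE FALSE
```

With 'min_pts' set to 3, each of the three nearby points counts itself plus its two neighbors and so qualifies; raising it to 4 (the default for 'dbscan_min_pts') leaves no core points in this tiny example.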
Value
A list with components:
valid
One-element logical vector: TRUE if the data are valid, FALSE otherwise.
reason_invalid
One-element character vector giving the reason the data were classified as invalid; empty if 'valid' is TRUE.
twcv
One-element numeric vector giving the TWCV, or NA if it cannot be calculated.
num_clusters
One-element numeric vector giving the number of identified clusters that count toward the tally compared with 'safe_num_clusters', or NA if it cannot be calculated.
Details
This function relies on the DBSCAN algorithm, as implemented in the R package 'dbscan', to cluster the color points. For further information on the 'dbscan_eps' and 'dbscan_min_pts' parameters, and on DBSCAN itself, see the 'dbscan' documentation. Once clustering is done, the passed validation criteria are applied:
If too high a proportion of all color points (more than 'max_prop_single_tight_cluster') falls within a single 'tight-knit' cluster (one with a variance less than or equal to 'max_var_tight_cluster'), the data are always classified as invalid.
If the first criterion is passed and the points form at least 'safe_num_clusters' clusters, the data are always classified as valid.
If the first criterion is passed and the total within-cluster variance (TWCV) score is greater than or equal to 'safe_twcv', the data are always classified as valid.
This means that data can be classified as valid either by forming at least 'safe_num_clusters' clusters, or by forming fewer clusters whose points are spread relatively far apart within each cluster.
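The validation criteria can be sketched as a small R function operating on precomputed cluster summaries. This is a hypothetical illustration, not synr's implementation: 'apply_validation_criteria' and its summary arguments are invented for this example, and two behaviors are assumptions, namely that data meeting neither 'safe' threshold are classified as invalid, and that an NA tally or TWCV counts as not meeting its threshold.

```r
# Hypothetical sketch of the validation criteria (not synr's implementation).
# Inputs are precomputed cluster summaries:
#   cluster_sizes: number of points in each cluster
#   cluster_vars:  variance of each cluster
#   num_clusters:  cluster tally compared with safe_num_clusters (may be NA)
#   twcv:          total within-cluster variance (may be NA)
apply_validation_criteria <- function(cluster_sizes, cluster_vars,
                                      num_clusters, twcv,
                                      max_var_tight_cluster = 150,
                                      max_prop_single_tight_cluster = 0.6,
                                      safe_num_clusters = 3,
                                      safe_twcv = 250) {
  props <- cluster_sizes / sum(cluster_sizes)
  tight <- cluster_vars <= max_var_tight_cluster
  # Criterion 1: one tight-knit cluster holding too many points -> invalid.
  if (any(tight & props > max_prop_single_tight_cluster)) {
    return(FALSE)
  }
  # Criteria 2 and 3: meeting either 'safe' threshold is enough.
  # isTRUE() makes an NA value fail its threshold (assumption).
  if (isTRUE(num_clusters >= safe_num_clusters) || isTRUE(twcv >= safe_twcv)) {
    return(TRUE)
  }
  # Assumption: data meeting neither safe threshold are classified invalid.
  FALSE
}

# 80% of points in one tight cluster: invalid regardless of the other criteria.
apply_validation_criteria(c(80, 20), c(50, 400), num_clusters = 2, twcv = 300)
# Three loose clusters: valid via the cluster tally.
apply_validation_criteria(c(40, 30, 30), c(200, 200, 200), num_clusters = 3, twcv = 100)
# Two clusters with widely spread points: valid via the TWCV.
apply_validation_criteria(c(50, 50), c(300, 300), num_clusters = 2, twcv = 300)
```

The summary values in these calls are illustrative inputs only; they are not derived from any real color data.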
The DBSCAN 'noise' cluster counts toward the cluster tally (compared with 'safe_num_clusters') only if it contains at least 'dbscan_min_pts' points. Points in the noise cluster are, however, always included in other calculations, e.g. the TWCV.
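The noise rule for the cluster tally can be sketched as follows. 'count_clusters' is a hypothetical helper, not synr's code; it assumes cluster labels follow the 'dbscan' package convention, in which 0 marks noise points.

```r
# Hypothetical sketch (not synr's code) of the cluster tally compared with
# 'safe_num_clusters'. Labels follow the 'dbscan' package convention,
# in which 0 marks noise points.
count_clusters <- function(cluster_labels, dbscan_min_pts = 4) {
  # Every non-noise label is one regular cluster.
  n_regular <- length(setdiff(unique(cluster_labels), 0))
  n_noise <- sum(cluster_labels == 0)
  # The noise "cluster" only counts if it holds at least dbscan_min_pts points.
  n_regular + as.integer(n_noise >= dbscan_min_pts)
}

labels <- c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0)
count_clusters(labels)        # 3 noise points: not counted, tally is 2
count_clusters(c(labels, 0))  # 4 noise points: counted, tally is 3
```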
See Also
point_3d_variance for single-cluster variance, total_within_cluster_variance for TWCV.
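To give a rough sense of the quantities those functions compute, the sketch below implements a 3D cluster variance and a TWCV in base R. These are not synr's point_3d_variance or total_within_cluster_variance: the definitions used here (variance as the mean squared Euclidean distance to the centroid, TWCV as an unweighted sum of per-cluster variances, and noise points treated as one extra group) are assumptions made for illustration, and the package's own formulas may differ, for example in how clusters are weighted.

```r
# Illustration only: these are NOT synr's point_3d_variance() or
# total_within_cluster_variance(). Both definitions below are assumptions.

# Assumed cluster variance: mean squared Euclidean distance of the points
# from their centroid.
variance_3d_sketch <- function(points) {
  centroid <- colMeans(points)
  # sweep() subtracts the centroid from every row.
  mean(rowSums(sweep(points, 2, centroid)^2))
}

# Assumed TWCV: sum of per-cluster variances. Noise points (label 0) form
# one more group here, so they are always included in the total.
twcv_sketch <- function(color_matrix, cluster_labels) {
  groups <- split(seq_len(nrow(color_matrix)), cluster_labels)
  sum(vapply(groups, function(idx) {
    variance_3d_sketch(color_matrix[idx, , drop = FALSE])
  }, numeric(1)))
}

m <- rbind(
  c(0, 0, 0), c(2, 0, 0),      # cluster 1, variance 1
  c(10, 0, 0), c(10, 4, 0),    # cluster 2, variance 4
  c(50, 50, 50)                # a single noise point, variance 0
)
twcv_sketch(m, c(1, 1, 2, 2, 0))
# [1] 5
```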