R: Statistical k-means Test

plr_kmeans_test {PVplr}

R Documentation

Statistical k-means Test

Description

The method fits a linear model for each day, identifies outliers, and performs 2-means clustering on the slopes. If the mean of the lower cluster is significantly less than the mean of the higher cluster (the higher mean is scaled by `mean_ratio` for this comparison) and the lower cluster makes up less than 25% of the data, that cluster is identified as soiled and returned. Otherwise, the outlier points are identified as soiled and returned.
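The decision rule described above can be sketched in base R. This is an illustration only and not the PVplr source: the function name `flag_soiled_days`, the column names `day`, `irrad`, and `power`, the `max_share` argument, and the use of the boxplot rule for outliers are all assumptions made for the example.

```r
# Illustrative sketch only: NOT the PVplr implementation.
# All function, argument, and column names here are hypothetical.
flag_soiled_days <- function(df, mean_ratio = 0.7, max_share = 0.25) {
  # One linear model per day; keep the slope of power against irradiance.
  slopes <- sapply(split(df, df$day), function(d) {
    unname(coef(lm(power ~ irrad, data = d))[2])
  })

  # Outliers by the boxplot rule (assumed here; the package may differ).
  outlier_days <- names(slopes)[slopes %in% boxplot.stats(slopes)$out]

  # 2-means clustering on the daily slopes.
  km <- kmeans(slopes, centers = 2, nstart = 10)
  low <- which.min(km$centers)
  high <- which.max(km$centers)
  low_days <- names(slopes)[km$cluster == low]

  # Accept the lower cluster as soiled only if it is clearly lower and small.
  clearly_lower <- km$centers[low] < mean_ratio * km$centers[high]
  small_enough <- length(low_days) / length(slopes) < max_share

  if (clearly_lower && small_enough) low_days else outlier_days
}

# Toy data: 20 days, the first 3 with a halved slope to mimic soiling.
set.seed(1)
days <- rep(1:20, each = 10)
irrad <- rep(seq(200, 1100, by = 100), times = 20)
power <- ifelse(days <= 3, 0.5, 1.0) * irrad + rnorm(length(irrad), sd = 5)
toy <- data.frame(day = days, irrad = irrad, power = power)

flagged <- flag_soiled_days(toy)
```

In this toy data the low-slope days form a small, well-separated cluster, so the clustering branch is taken rather than the outlier fallback.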

Usage

plr_kmeans_test(
  df,
  var_list,
  mean_ratio = 0.7,
  plot = FALSE,
  file_path,
  file_name,
  set_cutoff = FALSE
)

Arguments

`df`	A data frame containing PV data. It should first be cleaned with `plr_cleaning`.
`var_list`	A list of the dataframe's standard variable names, obtained from the output of `plr_variable_check`.
`mean_ratio`	Scales the mean of the higher cluster before it is compared with the mean of the lower cluster. Higher values make it more likely that the lower cluster is identified as soiled; lower values make it less likely. Values should range from 0 to 1.
`plot`	optional; Boolean; whether to return the boxplot that the method generates to identify outliers.
`file_path`	optional; location to store the boxplot if plot is set to TRUE. It is needed only if you wish to save the plot, not to display it.
`file_name`	optional; name of file to save boxplot if plot is set to TRUE.
`set_cutoff`	Defaults to FALSE; pass a numeric value to cut off all slopes less than that value. This bypasses the outlier and clustering calculations entirely and removes the slope values you believe to be soiled.
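As a rough illustration of how `mean_ratio` and `set_cutoff` act on slope values, the following base R sketch uses made-up numbers. The cluster means, slopes, and variable names are hypothetical, and the comparison is an assumed reading of the argument descriptions rather than code taken from the package.

```r
# Hypothetical values for illustration; not produced by PVplr.
low_mean <- 0.6
high_mean <- 1.0

# A larger mean_ratio raises the threshold, so the lower cluster
# is more likely to be judged soiled.
soiled_at_0.5 <- low_mean < 0.5 * high_mean
soiled_at_0.7 <- low_mean < 0.7 * high_mean

# A numeric cutoff skips clustering: every slope below it is flagged.
slopes <- c(d1 = 0.98, d2 = 0.55, d3 = 1.02, d4 = 0.61, d5 = 0.99)
cutoff <- 0.7
below_cutoff <- names(slopes)[slopes < cutoff]
```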

Value

The method returns a data frame containing the values that should be removed. To discard them from your data, filter them out, for example with dplyr::filter().
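A minimal sketch of discarding the flagged values, assuming the returned data frame shares a key column with the input. The data frames `pv` and `soiled` and the column `day` are hypothetical stand-ins; check the actual columns of the returned data frame before filtering.

```r
# Hypothetical stand-ins: `pv` for the cleaned input data and
# `soiled` for the data frame returned by the test.
pv <- data.frame(
  day = c(1, 1, 2, 2, 3, 3),
  power = c(210, 405, 120, 230, 205, 410)
)
soiled <- pv[pv$day == 2, ]

# Keep only rows whose key does not appear in the flagged set.
# With dplyr, the same step is dplyr::filter(pv, !(day %in% soiled$day)).
kept <- pv[!(pv$day %in% soiled$day), ]
```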

[Package PVplr version 0.1.2 Index]