plr_kmeans_test {PVplr}    R Documentation

Statistical k-means Test

Description

The method fits a linear model for each day, identifies outliers among the daily slopes, and performs 2-means clustering on the slopes. If the mean of the lower cluster is significantly less than the mean of the higher cluster (the higher mean is scaled by mean_ratio for this comparison) and the lower cluster constitutes less than 25% of the data, the lower cluster is identified as soiled and returned. Otherwise, the outlier points are identified as soiled and returned.
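The decision rule described above can be sketched in base R. This is only an illustration on synthetic data, not the PVplr source: the column names (day, x, y), the use of boxplot.stats() for outlier detection, and the plain scaled-mean comparison standing in for "significantly less" are all assumptions.

```r
# Hypothetical sketch of the decision rule; not the PVplr implementation.
set.seed(1)

# Synthetic data: 40 days, 10 readings per day; days 1-6 have a reduced slope.
days <- rep(1:40, each = 10)
x <- rep(seq(0.1, 1, length.out = 10), times = 40)
true_slope <- ifelse(days <= 6, 0.5, 1.0)
toy <- data.frame(day = days, x = x,
                  y = true_slope * x + rnorm(length(x), sd = 0.01))

# 1. Fit one linear model per day and keep its slope.
slopes <- sapply(split(toy, toy$day),
                 function(d) unname(coef(lm(y ~ x, data = d))[2]))

# 2. Flag box-plot outliers among the daily slopes (assumed outlier rule).
is_outlier <- slopes %in% boxplot.stats(slopes)$out

# 3. Cluster the slopes into two groups.
km <- kmeans(slopes, centers = 2, nstart = 10)
low <- which.min(km$centers)
high <- which.max(km$centers)
in_low <- km$cluster == low

# 4. Compare the lower mean with the scaled higher mean and check cluster size.
mean_ratio <- 0.7
low_is_soiled <- km$centers[low] < mean_ratio * km$centers[high] &&
  mean(in_low) < 0.25

# Return the lower cluster if it qualifies, otherwise the outliers.
soiled_days <- if (low_is_soiled) names(slopes)[in_low] else names(slopes)[is_outlier]
```

In this toy data the low-slope days make up 15% of all days and their mean slope is well below 0.7 times the higher mean, so the lower cluster is the set that gets returned.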

Usage

plr_kmeans_test(
  df,
  var_list,
  mean_ratio = 0.7,
  plot = FALSE,
  file_path,
  file_name,
  set_cutoff = FALSE
)

Arguments

df

A data frame containing PV data. Should be 'cleaned' by plr_cleaning.

var_list

A list of the data frame's standard variable names, obtained from the output of plr_variable_check.

mean_ratio

Scales the mean of the higher cluster before it is compared with the mean of the lower cluster. Higher values make it more likely that the lower cluster is identified as soiled; lower values make it less likely. Values should range from 0 to 1.

plot

optional; Boolean; whether to return the boxplot that the method generates to identify outliers.

file_path

optional; location to store the boxplot if plot is set to TRUE. This is needed only if you wish to save the plot, not merely to generate it.

file_name

optional; name of the file in which to save the boxplot if plot is set to TRUE.

set_cutoff

Defaults to FALSE; pass a numeric value to cut off all slopes less than that value. This bypasses the outlier and clustering calculations entirely and removes the slope values you believe to be soiled.
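As a rough illustration of what a numeric cutoff does, the following base R sketch flags every slope below a chosen value. The data frame, its column names (day, slope), and the slope values are made up for the example and are not taken from PVplr.

```r
# Hypothetical daily slopes; the values are invented for illustration.
slopes <- data.frame(day = 1:6,
                     slope = c(0.98, 1.01, 0.55, 0.97, 0.60, 1.02))

set_cutoff <- 0.8  # assumed numeric cutoff chosen by the user

# With a numeric cutoff, slopes below it are flagged directly;
# no outlier detection or clustering is involved.
flagged <- slopes[slopes$slope < set_cutoff, ]
```

Here days 3 and 5 fall below the cutoff and would be the ones flagged.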

Value

The method returns a data frame containing the values that should be removed. If you want to discard them from your data, try using dplyr::filter().
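One way to discard the returned rows is sketched below in base R, with the dplyr equivalent shown as a comment. The structure of the returned data frame is not documented here, so the key column (day) and both data frames are hypothetical stand-ins.

```r
# Hypothetical PV data and a hypothetical result from plr_kmeans_test().
pv <- data.frame(day = rep(1:4, each = 2),
                 power = c(5.1, 5.0, 2.9, 3.0, 5.2, 5.1, 5.0, 4.9))
soiled <- pv[pv$day == 2, ]  # stand-in for the returned data frame

# Keep only the rows whose day does not appear in the soiled set.
kept <- pv[!(pv$day %in% soiled$day), ]

# dplyr equivalent (assumes dplyr is installed):
# kept <- dplyr::filter(pv, !(day %in% soiled$day))
```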


[Package PVplr version 0.1.2 Index]