remove_empty_features {SmartMeterAnalytics}    R Documentation

Removes variables that carry no usable information from a list of data.frame columns

Description

Takes a list of variable names and drops those whose columns contain only NA values, contain a large portion of NA values, or (for numeric columns) have zero bandwidth. The remaining variable names are returned.

Usage

remove_empty_features(
  all.features,
  dataset,
  percentage_NA_allowed = NA,
  bandwidth = (.Machine$double.eps^0.5),
  verbose = FALSE
)

Arguments

all.features

a character vector with all column names of dataset that should be considered by the function

dataset

the dataset as a data.frame

percentage_NA_allowed

the percentage of missing values per variable that is tolerated without removing the feature. All features whose share of NA values exceeds this level are excluded.

bandwidth

the length of the interval that the values of a numeric variable must exceed for the variable not to be removed. By default, the square root of .Machine$double.eps is used.

verbose

logical; if TRUE, a debug message is printed whenever a variable is removed from the list (uses the futile.logger package)
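
To give a feeling for the default of the bandwidth argument shown under Usage, the following snippet evaluates it and compares it with the spread of two small vectors. It is only an illustration: measuring the spread as diff(range(x)) is an assumption about how "length of the interval" is computed, and the vectors flat and varied are made up.

```r
# Default from the Usage section: the square root of machine epsilon
bandwidth <- .Machine$double.eps^0.5
print(bandwidth)  # roughly 1.49e-08 on IEEE 754 double precision

# Made-up example vectors (hypothetical meter readings)
flat   <- c(230, 230, 230)        # no variation at all
varied <- c(229.8, 230.0, 230.4)  # small but real variation

# Assumption: the "interval length" is max(x) - min(x)
flat_kept   <- diff(range(flat)) > bandwidth
varied_kept <- diff(range(varied)) > bandwidth
print(c(flat = flat_kept, varied = varied_kept))
```

Under this reading, a constant column falls below the threshold while any column with visible variation passes it.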

Details

The function checks the portion of NA values in every given column. If the share of NA or Inf values exceeds percentage_NA_allowed, the column name is removed from the variable set. In addition, all numeric variables are checked for their bandwidth; those with almost zero bandwidth are removed.
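
The two checks described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not the package source: the name sketch_remove_empty_features, the reading of percentage_NA_allowed as a fraction between 0 and 1, the handling of the NA default (only columns without any usable value are dropped), and the use of diff(range(x)) as bandwidth are all assumptions. The verbose logging is left out.

```r
# Hypothetical sketch of the described checks; NOT the package implementation.
sketch_remove_empty_features <- function(all.features, dataset,
                                         percentage_NA_allowed = NA,
                                         bandwidth = .Machine$double.eps^0.5) {
  keep <- vapply(all.features, function(f) {
    x <- dataset[[f]]

    # Count NA in every column and Inf in numeric columns
    missing <- is.na(x)
    if (is.numeric(x)) missing <- missing | is.infinite(x)
    share_missing <- mean(missing)

    # Assumption: with the NA default, only fully missing columns are dropped;
    # otherwise the threshold is treated as a fraction between 0 and 1
    if (is.na(percentage_NA_allowed)) {
      if (share_missing == 1) return(FALSE)
    } else if (share_missing > percentage_NA_allowed) {
      return(FALSE)
    }

    # Numeric columns must span an interval longer than `bandwidth`
    if (is.numeric(x)) {
      usable <- x[!missing]
      if (length(usable) == 0 || diff(range(usable)) <= bandwidth) {
        return(FALSE)
      }
    }
    TRUE
  }, logical(1))

  all.features[keep]
}

# Made-up data set to exercise each rule
df <- data.frame(
  consumption = c(1.2, 3.4, NA, 2.2),   # some NA, real variation
  constant    = c(5, 5, 5, 5),          # zero bandwidth
  mostly_na   = c(NA, NA, NA, 1),       # 75 percent missing
  empty       = rep(NA_real_, 4),       # only NA
  label       = c("a", "b", "a", NA),   # non-numeric, no bandwidth check
  stringsAsFactors = FALSE
)

kept <- sketch_remove_empty_features(names(df), df,
                                     percentage_NA_allowed = 0.5)
print(kept)
```

In this sketch, the missing-value check runs first and the bandwidth check only sees the finite, non-missing values, so a column consisting of NA plus a single repeated value is also treated as empty.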

Value

a character vector with the names of the variables that are not considered empty

Author(s)

Konstantin Hopf konstantin.hopf@uni-bamberg.de

See Also

naInf_omit, replaceNAsFeatures


[Package SmartMeterAnalytics version 1.0.3 Index]