Dataset {D2MCS} R Documentation

Simple Dataset handler.

Description

Creates a valid simple dataset object.

Methods

Public methods


Method new()

Method for initializing the object arguments during runtime.

Usage
Dataset$new(
  filepath,
  header = TRUE,
  sep = ",",
  skip = 0,
  normalize.names = FALSE,
  string.as.factor = FALSE,
  ignore.columns = NULL
)
Arguments
filepath

The name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'.

header

A logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: 'header' is set to 'TRUE' if and only if the first row contains one fewer field than the number of columns.

sep

The field separator character. Values on each line of the file are separated by this character.

skip

Defines the number of header lines that should be skipped.

normalize.names

A logical value indicating whether the column names should be automatically renamed to ensure R compatibility.

string.as.factor

A logical value indicating if character columns should be converted to factors (default = FALSE).

ignore.columns

Specify the columns from the input file that should be ignored.
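
The arguments above mirror familiar base R reading options. The sketch below is a base R illustration only, not the Dataset implementation: it assumes header, sep and skip behave like their read.csv counterparts, that normalize.names produces syntactically valid names (as make.names does), and that ignore.columns drops the listed columns. The file contents and variable names are made up for the example.

```r
# Write a tiny semicolon-separated file with one leading line to skip.
path <- tempfile(fileext = ".csv")
writeLines(c("exported by some tool",
             "id;my value;label",
             "1;0.5;yes",
             "2;1.5;no"), path)

# header, sep and skip as in the Usage block; strings stay as characters
# (the analogue of string.as.factor = FALSE).
raw <- read.csv(path, header = TRUE, sep = ";", skip = 1,
                stringsAsFactors = FALSE, check.names = FALSE)

# Assumed analogue of normalize.names = TRUE: make names valid in R.
normalized <- make.names(names(raw), unique = TRUE)

# Assumed analogue of ignore.columns: drop the first column.
kept <- raw[, -1, drop = FALSE]

print(names(raw))
print(normalized)
print(names(kept))
```

With D2MCS installed, the corresponding call would presumably be Dataset$new(filepath = path, sep = ";", skip = 1, normalize.names = TRUE); the accepted form of ignore.columns (names or indices) is not stated here and should be checked against the package.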


Method getColumnNames()

Gets the names of the columns comprising the dataset.

Usage
Dataset$getColumnNames()
Returns

A character vector with the names of the columns.


Method getDataset()

Gets the full dataset.

Usage
Dataset$getDataset()
Returns

A data.frame with all the loaded information.


Method getNcol()

Obtains the number of columns present in the dataset.

Usage
Dataset$getNcol()
Returns

An integer of length 1 or NULL.


Method getNrow()

Obtains the number of rows present in the dataset.

Usage
Dataset$getNrow()
Returns

An integer of length 1 or NULL.
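
The "integer of length 1 or NULL" return value of getNcol() and getNrow() matches what base R's ncol() and nrow() return. The sketch below shows that base R behaviour only; that the Dataset methods delegate to these functions is an assumption, not something this page states.

```r
df <- data.frame(id = 1:3, score = c(0.2, 0.8, 0.5))

# On a data.frame, both functions return a single integer.
n_col <- ncol(df)
n_row <- nrow(df)

# On an object without dimensions (for example NULL), both return NULL.
no_col <- ncol(NULL)
no_row <- nrow(NULL)

print(c(n_col, n_row))
print(is.null(no_col) && is.null(no_row))
```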


Method getRemovedColumns()

Gets the columns that were removed or ignored.

Usage
Dataset$getRemovedColumns()
Returns

A list containing the names of the removed columns.


Method cleanData()

Removes data.frame columns matching some criterion.

Usage
Dataset$cleanData(remove.funcs = NULL, remove.na = TRUE, remove.const = FALSE)
Arguments
remove.funcs

A vector of functions used to define which columns must be removed.

remove.na

A logical value indicating whether columns containing NA values should be removed.

remove.const

A logical value indicating whether columns with constant values should be removed.
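
To make the three arguments concrete, here is a minimal base R sketch of criterion-based column removal. It is not the cleanData() implementation. It assumes each function in remove.funcs takes one column and returns TRUE when that column should be dropped; drop_columns, has_na, is_const and mostly_zero are hypothetical names introduced for the example.

```r
df <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 5, 6),   # contains an NA
  k = c(7, 7, 7),    # constant
  z = c(0, 0, 1)     # mostly zeros
)

has_na      <- function(col) anyNA(col)
is_const    <- function(col) length(unique(col)) == 1
mostly_zero <- function(col) mean(col == 0, na.rm = TRUE) > 0.5  # custom criterion

# Hypothetical helper: drop every column flagged by at least one criterion.
drop_columns <- function(data, remove.funcs = NULL,
                         remove.na = TRUE, remove.const = FALSE) {
  funcs <- if (is.function(remove.funcs)) list(remove.funcs) else as.list(remove.funcs)
  if (remove.na)    funcs <- c(funcs, list(has_na))
  if (remove.const) funcs <- c(funcs, list(is_const))
  flagged <- vapply(data, function(col) {
    any(vapply(funcs, function(f) isTRUE(f(col)), logical(1)))
  }, logical(1))
  list(data = data[, !flagged, drop = FALSE], removed = names(data)[flagged])
}

res_default <- drop_columns(df)
res_const   <- drop_columns(df, remove.const = TRUE)
res_custom  <- drop_columns(df, remove.funcs = c(mostly_zero))

print(res_default$removed)
print(res_const$removed)
print(res_custom$removed)
```

The helper returns the removed names alongside the data, which is one way the information reported by getRemovedColumns() could be tracked.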


Method removeColumns()

Applies the cleanData function to a specific set of columns.

Usage
Dataset$removeColumns(
  columns,
  remove.funcs = NULL,
  remove.na = FALSE,
  remove.const = FALSE
)
Arguments
columns

Set of columns (numeric or character) to which the removal operation should be applied.

remove.funcs

A vector of functions used to define which columns must be removed.

remove.na

A logical value indicating whether columns containing NA values should be removed.

remove.const

A logical value indicating whether columns with constant values should be removed.
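
The difference from cleanData() is scope: only the listed columns are candidates for removal. The base R sketch below illustrates that idea under the assumption that columns outside the set are left untouched even if they match a criterion. It applies the NA and constant checks explicitly (both default to FALSE in the Usage block above), and all variable names are illustrative.

```r
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6), k = c(7, 7, 7))

# Columns may be given by name or by position, e.g. c(2, 3).
columns <- c("b", "k")
idx <- if (is.character(columns)) match(columns, names(df)) else columns

# Evaluate the criteria on the candidate columns only.
candidate  <- df[, idx, drop = FALSE]
drop_na    <- vapply(candidate, anyNA, logical(1))
drop_const <- vapply(candidate, function(col) length(unique(col)) == 1, logical(1))

to_drop <- names(candidate)[drop_na | drop_const]
cleaned <- df[, setdiff(names(df), to_drop), drop = FALSE]

print(to_drop)
print(names(cleaned))
```

Column a keeps its NA here because it was not part of the candidate set.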


Method createPartitions()

Creates a k-fold partition of the initial dataset.

Usage
Dataset$createPartitions(
  num.folds = NULL,
  percent.folds = NULL,
  class.balance = NULL
)
Arguments
num.folds

A numeric value specifying the number of folds (partitions).

percent.folds

A numeric vector with the percentage of instances contained in each fold.

class.balance

A logical value indicating if class balance should be kept.
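
As an illustration of what a class-balanced k-fold partition looks like, here is a minimal base R sketch. It is not the createPartitions() algorithm: it assumes that keeping class balance means stratifying, so each fold receives roughly the same share of every class. The toy data, the class column name and the seed are made up, and percent.folds is not covered.

```r
set.seed(42)
df <- data.frame(
  x     = rnorm(20),
  class = rep(c("pos", "neg"), times = c(8, 12))
)
num.folds <- 4

# Assign fold ids class by class so every fold gets its share of each class.
fold_id <- integer(nrow(df))
for (lvl in unique(df$class)) {
  rows <- which(df$class == lvl)
  rows <- rows[sample.int(length(rows))]            # shuffle within the class
  fold_id[rows] <- rep_len(seq_len(num.folds), length(rows))
}

# One vector of row indices per fold.
folds <- split(seq_len(nrow(df)), fold_id)

pos_per_fold <- vapply(folds, function(r) sum(df$class[r] == "pos"), numeric(1))
print(lengths(folds))
print(pos_per_fold)
```

With 8 positive and 12 negative instances and 4 folds, every fold ends up with 5 rows, 2 of them positive, regardless of the shuffle.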


Method createSubset()

Creates a Subset for testing or classification purposes. A target class should be provided for testing purposes.

Usage
Dataset$createSubset(
  num.folds = NULL,
  opts = list(remove.na = TRUE, remove.const = FALSE),
  class.index = NULL,
  positive.class = NULL
)
Arguments
num.folds

A numeric defining the number of folds that should be used to build the Subset.

opts

A list with optional parameters. Valid options are remove.na (removes columns with NA values) and remove.const (ignores columns with constant values).

class.index

A numeric value identifying the column representing the target class.

positive.class

Defines the positive class value.

Returns

A Subset object.
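
The following base R sketch shows one plausible reading of how the arguments combine; it is not the createSubset() implementation and does not build a real Subset object. It assumes num.folds selects previously created folds by index, class.index is the position of the target column, and opts$remove.na drops feature columns containing NA. The data and fold assignment are invented for the example.

```r
df <- data.frame(
  f1     = c(0.1, 0.4, 0.3, 0.9, 0.7, 0.2),
  f2     = c(NA, 1, 2, 3, 4, 5),
  target = c("yes", "no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)
folds <- list(c(1, 4), c(2, 5), c(3, 6))   # row indices of three folds

num.folds      <- c(1, 2)                  # assumed: folds chosen by index
class.index    <- 3
positive.class <- "yes"

# Gather the rows of the selected folds.
rows  <- sort(unlist(folds[num.folds]))
chunk <- df[rows, , drop = FALSE]

# Separate the target class, listing the positive class first.
labels       <- chunk[[class.index]]
class_values <- factor(labels,
                       levels = c(positive.class,
                                  setdiff(unique(labels), positive.class)))

# Separate the features and apply the assumed remove.na behaviour.
features <- chunk[, -class.index, drop = FALSE]
features <- features[, !vapply(features, anyNA, logical(1)), drop = FALSE]

print(names(features))
print(levels(class_values))
```

Column f2 is dropped because the selected rows include its NA; selecting folds 2 and 3 instead would keep it.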


Method createTrain()

Creates a set for training purposes. A class should be defined to guarantee full compatibility with supervised models.

Usage
Dataset$createTrain(
  class.index,
  positive.class,
  num.folds = NULL,
  opts = list(remove.na = TRUE, remove.const = FALSE)
)
Arguments
class.index

A numeric value identifying the column representing the target class.

positive.class

Defines the positive class value.

num.folds

A numeric defining the number of folds that should be used to build the Trainset.

opts

A list with optional parameters. Valid options are remove.na (removes columns with NA values) and remove.const (ignores columns with constant values).

Returns

A Trainset object.

See Also

HDDataset


[Package D2MCS version 1.0.1 Index]