sparsify {mltools}    R Documentation

Sparsify

Description

Convert a data.table object into a sparse matrix (with the same number of rows).

Usage

sparsify(dt, sparsifyNAs = FALSE, naCols = "none")

Arguments

dt

A data.table object

sparsifyNAs

Should NAs be converted to 0s and sparsified?

naCols
  • "none" Don't generate columns to identify NA values.

  • "identify" For each column of dt with an NA value, generate a column in the sparse matrix with 1s indicating NAs. Columns will be named like "color_NA".

  • "efficient" For each column of dt with an NA value, generate a column in the sparse matrix with 1s indicating either NAs or non-NAs, whichever is more memory efficient. Columns will be named like "color_NA" or "color_NotNA".
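The "efficient" option can be pictured with a small sketch that uses only the Matrix package. This is not the mltools implementation: na_indicator is a hypothetical helper, and the tie-breaking rule (ties go to the "_NA" column) is an assumption made for illustration.

```r
library(Matrix)

# Hypothetical helper (not part of mltools): build a one-column sparse
# indicator for a vector, flagging whichever of NA / non-NA is rarer so that
# fewer 1s need to be stored.
na_indicator <- function(x, name) {
  is_na <- is.na(x)
  # Assumption: when the counts are equal, prefer the "_NA" column.
  use_na <- sum(is_na) <= sum(!is_na)
  flagged <- if (use_na) which(is_na) else which(!is_na)
  sparseMatrix(
    i = flagged,
    j = rep(1L, length(flagged)),
    x = rep(1, length(flagged)),
    dims = c(length(x), 1L),
    dimnames = list(NULL, paste0(name, if (use_na) "_NA" else "_NotNA"))
  )
}

realCol <- c(NA, 2, NA, NA)
ind <- na_indicator(realCol, "realCol")
colnames(ind)   # "realCol_NotNA": one non-NA is cheaper to flag than three NAs
length(ind@x)   # 1 stored entry
```

With three NAs and one observed value, flagging the non-NA row stores a single 1 instead of three, which is the trade-off the "efficient" option describes.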

Details

Converts a data.table object to a sparse matrix (class "dgCMatrix"). Requires the Matrix package. All sparsified (unstored) entries are assumed to take on the value 0/FALSE.
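To make the 0/FALSE assumption concrete, the sketch below uses only the Matrix package, not sparsify itself. It shows that a "dgCMatrix" stores nonzero entries only, and that an NA counts as a stored entry unless it is first replaced by 0, which is the effect sparsifyNAs = TRUE is described as having. The variable names are illustrative.

```r
library(Matrix)

x <- c(0, NA, 3, 0)

# Keep the NA: it is stored explicitly alongside the nonzero value 3.
m_keep <- Matrix(x, ncol = 1, sparse = TRUE)

# Replace the NA with 0 first: only the value 3 is stored.
x0 <- x
x0[is.na(x0)] <- 0
m_drop <- Matrix(x0, ncol = 1, sparse = TRUE)

class(m_keep)      # "dgCMatrix"
length(m_keep@x)   # 2 stored entries (NA and 3)
length(m_drop@x)   # 1 stored entry (3)
```

Any entry not stored reads back as 0, so converting NAs to 0 saves memory at the cost of no longer distinguishing missing values from true zeros.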

Data Type | Description & NA handling

numeric | If sparsifyNAs = FALSE, only 0s will be sparsified. If sparsifyNAs = TRUE, 0s and NAs will be sparsified.

factor (unordered) | Each level will generate a sparsified binary column. Column names are feature_level, e.g. "color_red", "color_blue".

factor (ordered) | Levels are converted to numeric, 1 to NLevels. If sparsifyNAs = FALSE, NAs will remain as NAs. If sparsifyNAs = TRUE, NAs will be sparsified.

logical | TRUE and FALSE values will be converted to 1s and 0s. If sparsifyNAs = FALSE, only FALSEs will be sparsified. If sparsifyNAs = TRUE, FALSEs and NAs will be sparsified.
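The factor rules in the table can be sketched by hand with the Matrix package. This is an illustration of the encoding described above, not the code mltools runs; the variables color and size are made up, and skipping NA rows in the one-hot step is an assumption.

```r
library(Matrix)

# Unordered factor: one binary column per level, named feature_level.
color <- factor(c("red", "blue", "red"))
lv <- levels(color)                      # "blue" "red"
keep <- which(!is.na(color))             # assumption: NA rows get no 1s
onehot <- sparseMatrix(
  i = keep,
  j = as.integer(color)[keep],
  x = rep(1, length(keep)),
  dims = c(length(color), length(lv)),
  dimnames = list(NULL, paste0("color_", lv))
)
colnames(onehot)   # "color_blue" "color_red"

# Ordered factor: levels map to their integer position, 1 to NLevels.
size <- factor(c("S", "L", "M"), levels = c("S", "M", "L"), ordered = TRUE)
as.integer(size)   # 1 3 2
```

The column order sparsify actually produces may differ from this sketch; only the naming pattern and the level-to-integer mapping are taken from the table.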

Examples

library(data.table)
library(Matrix)
library(mltools)  # provides sparsify()

dt <- data.table(
  intCol=c(1L, NA_integer_, 3L, 0L),
  realCol=c(NA, 2, NA, NA),
  logCol=c(TRUE, FALSE, TRUE, FALSE),
  ofCol=factor(c("a", "b", NA, "b"), levels=c("a", "b", "c"), ordered=TRUE),
  ufCol=factor(c("a", NA, "c", "b"), ordered=FALSE)
)

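# Defaults: sparsifyNAs = FALSE, naCols = "none"
sparsify(dt)
# Treat NAs as 0s and sparsify them
sparsify(dt, sparsifyNAs=TRUE)
# Add an indicator column flagging the NAs in realCol
sparsify(dt[, list(realCol)], naCols="identify")
# Add an NA or NotNA indicator column, whichever is more memory efficient
sparsify(dt[, list(realCol)], naCols="efficient")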

[Package mltools version 0.3.5 Index]