makeX {glmnet} | R Documentation

## Convert a data frame to a data matrix with one-hot encoding

### Description

Converts a data frame to a data matrix suitable for input to `glmnet`.
Factors are converted to dummy matrices via "one-hot" encoding. Options deal
with missing values and sparsity.

### Usage

```
makeX(train, test = NULL, na.impute = FALSE, sparse = FALSE, ...)
```

### Arguments

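| Argument | Description |
|---|---|
| `train` | Required argument. A data frame consisting of vectors, matrices and factors. |
| `test` | Optional argument. A data frame matching 'train', for use as testing data. |
| `na.impute` | Logical, default `FALSE`. If `TRUE`, missing values are replaced with means computed from the training data. |
| `sparse` | Logical, default `FALSE`. If `TRUE`, the result is returned in sparse matrix format. |
| `...` | Additional arguments, currently unused. |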

### Details

The main purpose of this function is to convert factors to dummy matrices via
"one-hot" encoding. Supplying both 'train' and 'test' is useful when some
factor levels are missing from either one. A factor with k levels produces a
submatrix in which only a fraction 1/k of the entries are nonzero, so for
large k the `sparse=TRUE` option can be helpful: a large matrix is still
returned, but it is stored in sparse matrix format. Finally, the function can
deal with missing data. The current version has the option to replace missing
observations with the mean from the training data. For dummy submatrices,
these are the mean proportions at each level.
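
As a rough illustration of why encoding 'train' and 'test' together matters, the following base-R sketch stacks two small data frames, one-hot encodes them against the pooled factor levels, and splits the result again. The toy data, the variable names, and the use of `model.matrix` are assumptions made for illustration; this is not necessarily how `makeX` is implemented internally.

```r
# Hypothetical toy data: 'test' contains a level ("c") that 'train' lacks.
train <- data.frame(num = c(1.5, 2.0, 0.5),
                    grp = c("a", "b", "a"),
                    stringsAsFactors = TRUE)
test  <- data.frame(num = c(3.0, 1.0),
                    grp = c("c", "a"),
                    stringsAsFactors = TRUE)

# Stack the frames; rbind() on data frames pools the factor levels.
both <- rbind(train, test)

# contrasts = FALSE gives a full indicator block: one column per level.
ctr   <- list(grp = contrasts(both$grp, contrasts = FALSE))
x_all <- model.matrix(~ . - 1, data = both, contrasts.arg = ctr)

# Split back into training and testing rows; both share the same columns.
n_train <- nrow(train)
x_train <- x_all[seq_len(n_train), , drop = FALSE]
x_test  <- x_all[-seq_len(n_train), , drop = FALSE]

# Each row has exactly one nonzero entry in the k-column dummy block,
# so the fraction of nonzero entries in that block is 1/k (here k = 3).
dummy_cols   <- c("grpa", "grpb", "grpc")
frac_nonzero <- mean(x_all[, dummy_cols] != 0)
```

Because the levels are pooled before encoding, `x_train` still gets a `grpc` column (all zeros), so the two matrices line up column by column.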

### Value

If only 'train' was provided, the function returns a matrix 'x'. If missing values were imputed, this matrix has an attribute containing its column means (before imputation). If 'test' was provided as well, a list with two components is returned: 'x' and 'xtest'.
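
The mean imputation and the stored column means can be pictured with a small base-R sketch. Everything here is an assumption for illustration: the matrices are made up, `impute_with` is a hypothetical helper, and the attribute name `train_means` is invented (the attribute name that `makeX` actually uses is not stated above).

```r
# Hypothetical one-hot matrices with missing entries.
# A missing factor value shows up as NA across its whole dummy block.
cols <- c("num", "grpa", "grpb")
x_train <- matrix(c(1, 3, NA, 5,
                    1, 0, 1, NA,
                    0, 1, 0, NA),
                  nrow = 4, dimnames = list(NULL, cols))
x_test <- matrix(c(NA, 2,
                   0, NA,
                   1, NA),
                 nrow = 2, dimnames = list(NULL, cols))

# Column means of the training rows only, ignoring missing values.
# For the dummy columns these are the observed proportions of each level.
train_means <- colMeans(x_train, na.rm = TRUE)

# Hypothetical helper: replace each NA with the mean of its column.
impute_with <- function(x, means) {
  idx <- which(is.na(x), arr.ind = TRUE)
  x[idx] <- means[idx[, "col"]]
  x
}

x_train_imp <- impute_with(x_train, train_means)
x_test_imp  <- impute_with(x_test, train_means)  # test uses training means

# Keep the pre-imputation means alongside the matrix (invented attribute name).
attr(x_train_imp, "train_means") <- train_means
```

In this sketch the imputed dummy block of a row with a missing factor value holds the level proportions from the training data, so it still sums to one.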

### Author(s)

Trevor Hastie

Maintainer: Trevor Hastie hastie@stanford.edu

### See Also

`glmnet`

### Examples

```
set.seed(101)
### Single data frame
X = matrix(rnorm(20), 10, 2)
X3 = sample(letters[1:3], 10, replace = TRUE)
X4 = sample(LETTERS[1:3], 10, replace = TRUE)
df = data.frame(X, X3, X4)
makeX(df)
makeX(df, sparse = TRUE)
### Single data frame with missing values
Xn = X
Xn[3, 1] = NA
Xn[5, 2] = NA
X3n = X3
X3n[6] = NA
X4n = X4
X4n[9] = NA
dfn = data.frame(Xn, X3n, X4n)
makeX(dfn)
makeX(dfn, sparse = TRUE)
makeX(dfn, na.impute = TRUE)
makeX(dfn, na.impute = TRUE, sparse = TRUE)
### Test data as well
X = matrix(rnorm(10), 5, 2)
X3 = sample(letters[1:3], 5, replace = TRUE)
X4 = sample(LETTERS[1:3], 5, replace = TRUE)
dft = data.frame(X, X3, X4)
makeX(df, dft)
makeX(df, dft, sparse = TRUE)
### Missing data in test as well
Xn = X
Xn[3, 1] = NA
Xn[5, 2] = NA
X3n = X3
X3n[1] = NA
X4n = X4
X4n[2] = NA
dftn = data.frame(Xn, X3n, X4n)
makeX(dfn, dftn)
makeX(dfn, dftn, sparse = TRUE)
makeX(dfn, dftn, na.impute = TRUE)
makeX(dfn, dftn, sparse = TRUE, na.impute = TRUE)
```

[Package *glmnet* version 4.1-8]