| train {recosystem} | R Documentation |
Training a Recommender Model
Description
This method is a member function of class "RecoSys"
that trains a recommender model. It reads from a training data source and
creates a model file at the specified location. The model file contains
the information needed for prediction.
The common usage of this method is
r = Reco()
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list())
Arguments
r | Object returned by Reco(). |
train_data | An object of class "DataSource" that describes the source of training data, typically returned by function data_file(), data_memory(), or data_matrix(). |
out_model | Path to the model file that will be created. If passing NULL, the model will be stored in memory. |
opts | A number of parameters and options for the model training. See section Parameters and Options for details. |
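A partial opts list is enough, since every option has a default. The sketch below shows one way such a list could be combined with the defaults listed under Parameters and Options. The names default_train_opts and complete_train_opts are hypothetical and are not part of recosystem. Rejecting unknown option names is a choice made for this sketch, not documented package behavior.

```r
# Hypothetical helper, not part of recosystem: fill in the documented
# defaults and check the one constraint stated on this page.
default_train_opts <- list(
  loss = "l2", dim = 10L,
  costp_l1 = 0, costp_l2 = 0.1,
  costq_l1 = 0, costq_l2 = 0.1,
  lrate = 0.1, niter = 20L,
  nthread = 1L, nbin = 20L,
  nmf = FALSE, verbose = TRUE
)

complete_train_opts <- function(opts = list()) {
  # Strictness about unknown names is an assumption of this sketch
  unknown <- setdiff(names(opts), names(default_train_opts))
  if (length(unknown) > 0) {
    stop("unknown option(s): ", paste(unknown, collapse = ", "))
  }
  # User-supplied values override the defaults
  full <- modifyList(default_train_opts, opts)
  # Documented constraint: nbin must be greater than nthread
  if (full$nbin <= full$nthread) {
    stop("nbin must be greater than nthread")
  }
  full
}

opts <- complete_train_opts(list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01))
str(opts)
```

The resulting list has the same shape as the opts argument passed to r$train() in the Examples section.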
Parameters and Options
The opts argument is a list that can supply any of the following parameters:
loss: Character string, the loss function. Default is "l2", see below for details.
dim: Integer, the number of latent factors. Default is 10.
costp_l1: Numeric, L1 regularization parameter for user factors. Default is 0.
costp_l2: Numeric, L2 regularization parameter for user factors. Default is 0.1.
costq_l1: Numeric, L1 regularization parameter for item factors. Default is 0.
costq_l2: Numeric, L2 regularization parameter for item factors. Default is 0.1.
lrate: Numeric, the learning rate, which can be thought of as the step size in gradient descent. Default is 0.1.
niter: Integer, the number of iterations. Default is 20.
nthread: Integer, the number of threads for parallel computing. Default is 1.
nbin: Integer, the number of bins. Must be greater than nthread. Default is 20.
nmf: Logical, whether to perform non-negative matrix factorization. Default is FALSE.
verbose: Logical, whether to show detailed information. Default is TRUE.
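To illustrate how dim, lrate, costp_l2, costq_l2, and niter interact, here is a toy stochastic gradient descent loop for matrix factorization with squared loss and L2 penalties, written in base R. It is a sketch only: the functions sgd_mf and rmse are hypothetical, user and item indices are assumed to start at 1, and the package's actual solver (LIBMF, see References) is parallel and uses its own learning-rate schedule.

```r
# Toy SGD for matrix factorization (illustration only, not recosystem code).
sgd_mf <- function(user, item, rating, dim = 10, lrate = 0.1,
                   costp_l2 = 0.1, costq_l2 = 0.1, niter = 20) {
  n_user <- max(user)
  n_item <- max(item)
  # Small random starting values for user factors P and item factors Q
  P <- matrix(runif(n_user * dim, 0, 0.1), n_user, dim)
  Q <- matrix(runif(n_item * dim, 0, 0.1), n_item, dim)
  for (it in seq_len(niter)) {
    # Visit the observed ratings in random order
    for (k in sample(length(rating))) {
      u <- user[k]
      i <- item[k]
      p_old <- P[u, ]
      err <- rating[k] - sum(p_old * Q[i, ])
      # Gradient step of size lrate, with L2 shrinkage on each factor
      P[u, ] <- p_old + lrate * (err * Q[i, ] - costp_l2 * p_old)
      Q[i, ] <- Q[i, ] + lrate * (err * p_old - costq_l2 * Q[i, ])
    }
  }
  list(P = P, Q = Q)
}

# Root mean squared error of the fitted factors on the given ratings
rmse <- function(fit, user, item, rating) {
  pred <- rowSums(fit$P[user, , drop = FALSE] * fit$Q[item, , drop = FALSE])
  sqrt(mean((rating - pred)^2))
}

set.seed(123)  # the algorithm is randomized
user   <- c(1, 1, 2, 2, 3, 3)
item   <- c(1, 2, 1, 3, 2, 3)
rating <- c(5, 3, 4, 1, 2, 5)
fit0 <- sgd_mf(user, item, rating, dim = 2, niter = 0)
fit  <- sgd_mf(user, item, rating, dim = 2, niter = 200, lrate = 0.05,
               costp_l2 = 0.01, costq_l2 = 0.01)
rmse(fit0, user, item, rating)  # error before any training
rmse(fit, user, item, rating)   # error after training
```

In this sketch, a larger lrate takes bigger steps per rating, while larger costp_l2 and costq_l2 pull the factors more strongly toward zero.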
The loss option may take the following values:
For real-valued matrix factorization,
"l2": Squared error (L2-norm)
"l1": Absolute error (L1-norm)
"kl": Generalized KL-divergence
For binary matrix factorization,
"log": Logarithmic error
"squared_hinge": Squared hinge loss
"hinge": Hinge loss
For one-class matrix factorization,
"row_log": Row-oriented pair-wise logarithmic loss
"col_log": Column-oriented pair-wise logarithmic loss
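For orientation, the real-valued and binary losses listed above can be written per entry in their usual textbook forms, as sketched below in base R. The function names are hypothetical. The sketch assumes binary labels coded as -1 and +1, and this page does not state whether the underlying library scales or offsets these expressions. The pairwise one-class losses are left out because they depend on pairs of entries.

```r
# Textbook per-entry losses (illustration only, not recosystem code).
# r is the observed value, r_hat the model's prediction.
loss_l2 <- function(r, r_hat) (r - r_hat)^2
loss_l1 <- function(r, r_hat) abs(r - r_hat)
# Generalized KL divergence; assumes r > 0 and r_hat > 0
loss_kl <- function(r, r_hat) r * log(r / r_hat) - r + r_hat

# Binary losses; the label y is assumed to be -1 or +1
loss_log <- function(y, r_hat) log1p(exp(-y * r_hat))
loss_hinge <- function(y, r_hat) pmax(0, 1 - y * r_hat)
loss_squared_hinge <- function(y, r_hat) pmax(0, 1 - y * r_hat)^2

loss_l2(3, 2.5)
loss_hinge(1, 0.5)
```

Each function is vectorized, so it can be applied to whole vectors of observed and predicted values at once.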
Author(s)
Yixuan Qiu <https://statr.me>
References
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.
W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.
See Also
$tune(), $output(), $predict()
Examples
## Training model from a data file
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
train_data = data_file(train_set)
r = Reco()
set.seed(123) # This is a randomized algorithm
# The model will be saved to a file
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)
## Training model from data in memory
train_df = read.table(train_set, sep = " ", header = FALSE)
train_data = data_memory(train_df[, 1], train_df[, 2], rating = train_df[, 3])
set.seed(123)
# The model will be stored in memory
r$train(train_data, out_model = NULL,
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)
## Training model from data in a sparse matrix
if(require(Matrix))
{
    mat = Matrix::sparseMatrix(i = train_df[, 1], j = train_df[, 2], x = train_df[, 3],
                               repr = "T", index1 = FALSE)
    train_data = data_matrix(mat)
    r$train(train_data, out_model = NULL,
            opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1))
}