train {recosystem} | R Documentation |
Training a Recommender Model
Description
This method is a member function of class "RecoSys" that trains a
recommender model. It reads from a training data source and
creates a model file at the specified location. The model file contains
the information needed for prediction.
The common usage of this method is
r = Reco()
r$train(train_data, out_model = file.path(tempdir(), "model.txt"), opts = list())
Arguments
r
Object returned by Reco().
train_data
An object of class "DataSource" that describes the source of training data, typically returned by a function such as data_file(), data_memory(), or data_matrix().
out_model
Path to the model file that will be created. If passing NULL, the model will be stored in memory.
opts
A number of parameters and options for the model training. See section Parameters and Options for details.
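As an illustration of how opts relates to the documented defaults, the following base R sketch merges a few user-supplied values into a full list of the defaults given in the Parameters and Options section. The names default_opts and user_opts are hypothetical and are not objects exported by recosystem; in practice only the entries that differ from the defaults would be passed to $train().

```r
# Defaults copied from the Parameters and Options section of this page.
# `default_opts` is a hypothetical helper list, not part of recosystem.
default_opts <- list(
  loss     = "l2",
  dim      = 10L,
  costp_l1 = 0,
  costp_l2 = 0.1,
  costq_l1 = 0,
  costq_l2 = 0.1,
  lrate    = 0.1,
  niter    = 20L,
  nthread  = 1L,
  nbin     = 20L,
  nmf      = FALSE,
  verbose  = TRUE
)

# Values the caller wants to change (hypothetical choices)
user_opts <- list(dim = 20L, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 4L)

# Overlay the user values on top of the defaults
opts <- modifyList(default_opts, user_opts)

# Documented constraint: nbin must be greater than nthread
stopifnot(opts$nbin > opts$nthread)

str(opts)
```

Passing user_opts directly as the opts argument should be equivalent, assuming that omitted entries fall back to the defaults listed on this page.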
Parameters and Options
The opts argument is a list that can supply any of the following parameters:
loss
Character string, the loss function. Default is "l2", see below for details.
dim
Integer, the number of latent factors. Default is 10.
costp_l1
Numeric, L1 regularization parameter for user factors. Default is 0.
costp_l2
Numeric, L2 regularization parameter for user factors. Default is 0.1.
costq_l1
Numeric, L1 regularization parameter for item factors. Default is 0.
costq_l2
Numeric, L2 regularization parameter for item factors. Default is 0.1.
lrate
Numeric, the learning rate, which can be thought of as the step size in gradient descent. Default is 0.1.
niter
Integer, the number of iterations. Default is 20.
nthread
Integer, the number of threads for parallel computing. Default is 1.
nbin
Integer, the number of bins. Must be greater than nthread. Default is 20.
nmf
Logical, whether to perform non-negative matrix factorization. Default is FALSE.
verbose
Logical, whether to show detailed information. Default is TRUE.
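To show where dim, lrate, niter, costp_l2 and costq_l2 enter the computation, here is a minimal base R sketch of plain single-threaded stochastic gradient descent for squared-error matrix factorization. It is not the LIBMF algorithm that recosystem wraps (there is no blocking across threads, no learning-rate schedule, and no L1 term), and constant factors in the gradient are absorbed into lrate. The function sgd_mf and the toy ratings are hypothetical and exist only for illustration.

```r
# Toy (user, item, rating) triples with 1-based indices; made-up data
ratings <- data.frame(
  user   = c(1, 1, 2, 2, 3, 3, 4, 4),
  item   = c(1, 2, 1, 3, 2, 3, 1, 2),
  rating = c(5, 3, 4, 1, 2, 5, 4, 3)
)

# Hypothetical helper, not part of recosystem
sgd_mf <- function(ratings, dim = 10, costp_l2 = 0.1, costq_l2 = 0.1,
                   lrate = 0.1, niter = 20) {
  n_user <- max(ratings$user)
  n_item <- max(ratings$item)
  # One row of `dim` latent factors per user (P) and per item (Q)
  P <- matrix(runif(n_user * dim, 0, 0.5), n_user, dim)
  Q <- matrix(runif(n_item * dim, 0, 0.5), n_item, dim)
  rmse <- numeric(niter)
  for (it in seq_len(niter)) {
    # Visit the observed ratings in random order
    for (k in sample(nrow(ratings))) {
      u <- ratings$user[k]
      v <- ratings$item[k]
      p <- P[u, ]
      q <- Q[v, ]
      err <- ratings$rating[k] - sum(p * q)
      # Gradient step of size `lrate`, with L2 shrinkage on each factor
      P[u, ] <- p + lrate * (err * q - costp_l2 * p)
      Q[v, ] <- q + lrate * (err * p - costq_l2 * q)
    }
    pred <- rowSums(P[ratings$user, , drop = FALSE] *
                    Q[ratings$item, , drop = FALSE])
    rmse[it] <- sqrt(mean((ratings$rating - pred)^2))
  }
  list(P = P, Q = Q, rmse = rmse)
}

set.seed(123)  # The algorithm is randomized
fit <- sgd_mf(ratings, dim = 2, costp_l2 = 0.01, costq_l2 = 0.01,
              lrate = 0.05, niter = 200)
# Training RMSE after the first and the last pass
print(fit$rmse[c(1, 200)])
```

Larger dim gives the model more capacity, larger costp_l2 and costq_l2 shrink the factors more strongly, and lrate and niter trade off speed against stability of the updates.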
The loss option may take the following values:
For real-valued matrix factorization,
"l2"
Squared error (L2-norm)
"l1"
Absolute error (L1-norm)
"kl"
Generalized KL-divergence
For binary matrix factorization,
"log"
Logarithmic error
"squared_hinge"
Squared hinge loss
"hinge"
Hinge loss
For one-class matrix factorization,
"row_log"
Row-oriented pair-wise logarithmic loss
"col_log"
Column-oriented pair-wise logarithmic loss
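For orientation, the per-entry losses named above can be written in their usual textbook forms as the following base R sketch. These definitions are assumptions for illustration: the exact scaling used inside the underlying library may differ, the binary losses assume labels coded as -1 and +1, and the function names are hypothetical. The pair-wise one-class losses are omitted because they compare pairs of entries rather than score a single one.

```r
# r: observed real value, y: binary label in {-1, +1}, p: model prediction

# Real-valued matrix factorization
loss_l2 <- function(r, p) (r - p)^2                 # "l2": squared error
loss_l1 <- function(r, p) abs(r - p)                # "l1": absolute error
loss_kl <- function(r, p) r * log(r / p) - r + p    # "kl": needs r > 0, p > 0

# Binary matrix factorization
loss_log           <- function(y, p) log(1 + exp(-y * p))   # "log"
loss_hinge         <- function(y, p) pmax(0, 1 - y * p)     # "hinge"
loss_squared_hinge <- function(y, p) pmax(0, 1 - y * p)^2   # "squared_hinge"

# A rating of 4 predicted as 3.5
print(c(l2 = loss_l2(4, 3.5), l1 = loss_l1(4, 3.5), kl = loss_kl(4, 3.5)))

# A negative label predicted with a positive score of 0.5
print(c(log = loss_log(-1, 0.5), hinge = loss_hinge(-1, 0.5),
        squared_hinge = loss_squared_hinge(-1, 0.5)))
```

The squared and absolute errors differ mainly in how strongly they penalize large residuals, while the generalized KL-divergence only makes sense for positive data, which pairs naturally with the nmf option.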
Author(s)
Yixuan Qiu <https://statr.me>
References
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.
W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.
See Also
$tune(), $output(), $predict()
Examples
## Training model from a data file
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
train_data = data_file(train_set)
r = Reco()
set.seed(123) # This is a randomized algorithm
# The model will be saved to a file
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)
## Training model from data in memory
train_df = read.table(train_set, sep = " ", header = FALSE)
train_data = data_memory(train_df[, 1], train_df[, 2], rating = train_df[, 3])
set.seed(123)
# The model will be stored in memory
r$train(train_data, out_model = NULL,
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)
## Training model from data in a sparse matrix
if(require(Matrix))
{
    mat = Matrix::sparseMatrix(i = train_df[, 1], j = train_df[, 2], x = train_df[, 3],
                               repr = "T", index1 = FALSE)
    train_data = data_matrix(mat)
    r$train(train_data, out_model = NULL,
            opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1))
}