| ModelGenerator {wordpredictor} | R Documentation |
Generates n-gram models from a text file
Description
It provides a method for generating n-gram models. The n-gram models may be customized by specifying data cleaning and tokenization options.
Details
It provides a method that generates an n-gram model. The n-gram model may be customized by specifying the data cleaning and tokenization options.
The data cleaning options include removal of punctuation, stop words, extra space, non-dictionary words and bad words. The tokenization options include n-gram number and word stemming.
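To make the cleaning and tokenization steps concrete, the following is a minimal base R sketch of how text lines can be cleaned and turned into n-grams. It is not the package's implementation: the helper make_ngrams and the toy input are hypothetical, and only three cleaning steps (lower-casing, punctuation removal, extra space removal) are shown.

```r
# Toy input; stands in for lines read from the input text file
lines <- c("The quick brown fox.", "The quick  red fox jumps!")

# Minimal cleaning: lower-case, strip punctuation, collapse extra spaces
clean <- tolower(lines)
clean <- gsub("[[:punct:]]", "", clean)
clean <- trimws(gsub("\\s+", " ", clean))

# Hypothetical helper: build all n-grams of size n from one cleaned line
make_ngrams <- function(line, n) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  if (length(words) < n) {
    return(character(0))
  }
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Collect the bigrams from every line and count how often each occurs
bigrams <- unlist(lapply(clean, make_ngrams, n = 2))
bigram_freq <- sort(table(bigrams), decreasing = TRUE)
print(bigram_freq)
```

A frequency table of this kind is the raw material of an n-gram model; the actual model produced by ModelGenerator may store more than simple counts.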
Super class
wordpredictor::Base -> ModelGenerator
Methods
Public methods
Inherited methods
Method new()
It initializes the current object. It is used to set the maximum n-gram number, sample size, input file name, data cleaner options, tokenization options and verbose option.
Usage
ModelGenerator$new(
  name = NULL,
  desc = NULL,
  fn = NULL,
  df = NULL,
  n = 4,
  ssize = 0.3,
  dir = ".",
  dc_opts = list(),
  tg_opts = list(),
  ve = 0
)
Arguments
name: The model name.
desc: The model description.
fn: The model file name.
df: The name of the input text file. It should be the short file name, and the file should be present in the data directory.
n: The n-gram size of the model.
ssize: The sample size as a proportion of the input file.
dir: The directory containing the input and output files.
dc_opts: The data cleaner options.
tg_opts: The token generator options.
ve: The level of detail in the information messages.
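The ssize argument is a proportion rather than a line count. The base R sketch below shows one way a proportion can be turned into a sample of input lines. It assumes random sampling of whole lines, which the documentation above does not confirm; all variable names are hypothetical.

```r
# Fixed seed so the sketch is reproducible
set.seed(42)

# Toy input of 100 lines; stands in for the contents of the input file
all_lines <- paste("line", seq_len(100))

# Proportion of the input to keep (the default of ssize shown in Usage)
ssize <- 0.3

# Convert the proportion to a line count and sample without replacement,
# keeping the sampled lines in their original order
n_keep <- round(length(all_lines) * ssize)
keep_idx <- sort(sample(length(all_lines), n_keep))
sampled <- all_lines[keep_idx]

length(sampled)
```

With 100 input lines and ssize = 0.3, the sample holds 30 lines. A value close to 1, such as the 0.99 used in the examples, keeps nearly the whole file.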
Method generate_model()
It generates the model using the parameters passed to the object's constructor. It generates an n-gram model file and saves it to the model directory.
Usage
ModelGenerator$generate_model()
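The model file name used in the examples ends in .RDS, which suggests the model is stored in R's serialized object format. The sketch below shows the general save and restore round trip with base R. The toy data frame is a hypothetical stand-in; the real structure of the saved model object is not described here.

```r
# Hypothetical stand-in for a generated model: n-grams with their counts
toy_model <- data.frame(
  ngram = c("the quick", "quick brown"),
  freq = c(2L, 1L),
  stringsAsFactors = FALSE
)

# Save the object to an RDS file in a temporary directory
model_path <- file.path(tempdir(), "toy-model.RDS")
saveRDS(toy_model, model_path)

# Read the object back and check that it survived the round trip
restored <- readRDS(model_path)
round_trip_ok <- identical(toy_model, restored)

# Remove the temporary file
unlink(model_path)
```

The same readRDS call could be pointed at a file written by generate_model() to inspect what was saved, assuming that file is a standard RDS file.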
Examples
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("input.txt")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# ModelGenerator class object is created
mg <- ModelGenerator$new(
    name = "default-model",
    desc = "1 MB size and default options",
    fn = "def-model.RDS",
    df = "input.txt",
    n = 4,
    ssize = 0.99,
    dir = ed,
    dc_opts = list(),
    tg_opts = list(),
    ve = ve
)
# The n-gram model is generated
mg$generate_model()
# The test environment is removed. Comment out the line below to keep
# the generated files so they can be inspected
em$td_env()
Method clone()
The objects of this class are cloneable with this method.
Usage
ModelGenerator$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
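The difference between a shallow and a deep clone only shows up when an object holds other R6 objects in its fields. The sketch below illustrates this with two hypothetical toy classes, Inner and Outer, built with the R6 package; it does not use ModelGenerator itself and says nothing about which fields that class holds.

```r
library(R6)

# Hypothetical toy classes: Outer holds an Inner object in a field
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer",
  public = list(
    inner = NULL,
    initialize = function() {
      self$inner <- Inner$new()
    }
  )
)

original <- Outer$new()
shallow <- original$clone()          # shares the same Inner object
deep <- original$clone(deep = TRUE)  # gets its own copy of Inner

# Changing the original's Inner object affects only the shallow clone
original$inner$value <- 2
shallow_value <- shallow$inner$value
deep_value <- deep$inner$value
```

After the assignment, shallow_value is 2 because the shallow clone still points at the original's Inner object, while deep_value stays 1.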