ModelGenerator {wordpredictor} | R Documentation
Generates n-gram models from a text file
Description
It provides a method for generating n-gram models. The n-gram models may be customized by specifying data cleaning and tokenization options.
Details
It provides a method that generates an n-gram model. The model may be customized through the data cleaning and tokenization options.
The data cleaning options include removal of punctuation, stop words, extra spaces, non-dictionary words and bad words. The tokenization options include the n-gram size and word stemming.
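To make the n-gram size concrete, the following base R sketch splits a sentence into overlapping word n-grams. It is only an illustration of the idea: the helper make_ngrams is hypothetical and is not part of wordpredictor, whose own tokenizer may clean and split text differently.

```r
# Hypothetical helper (not part of wordpredictor). It returns all word
# n-grams of size n from a character vector of words
make_ngrams <- function(words, n) {
  # No n-gram can be formed if there are fewer than n words
  if (length(words) < n) {
    return(character(0))
  }
  # Each n-gram starts at position i and spans n consecutive words
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# The sentence is split into words on single spaces
words <- strsplit("the quick brown fox jumps", " ", fixed = TRUE)[[1]]
# The 2-grams and 4-grams are generated
bigrams <- make_ngrams(words, 2)
fourgrams <- make_ngrams(words, 4)
print(bigrams)
print(fourgrams)
```

A sentence of five words yields four 2-grams and two 4-grams, so a larger n produces fewer but longer sequences from the same text.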
Super class
wordpredictor::Base
-> ModelGenerator
Methods
Public methods
ModelGenerator$new()
ModelGenerator$generate_model()
ModelGenerator$clone()
Method new()
It initializes the current object. It sets the maximum n-gram size, the sample size, the input file name, the data cleaner options, the tokenization options and the verbosity level.
Usage
ModelGenerator$new(
  name = NULL,
  desc = NULL,
  fn = NULL,
  df = NULL,
  n = 4,
  ssize = 0.3,
  dir = ".",
  dc_opts = list(),
  tg_opts = list(),
  ve = 0
)
Arguments
name
The model name.
desc
The model description.
fn
The model file name.
df
The name of the input text file. It should be the short file name, not a full path, and the file should be present in the data directory.
n
The n-gram size of the model.
ssize
The sample size as a proportion of the input file.
dir
The directory containing the input and output files.
dc_opts
The data cleaner options.
tg_opts
The token generator options.
ve
The level of detail in the information messages.
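As a rough sketch of how dc_opts and tg_opts might be filled in, the lists below switch on the cleaning steps named in the Details section. The option names used here (remove_punct, remove_stop, remove_extra_space, remove_non_dict, remove_bad and stem_words) are assumptions that are not stated on this page; they should be checked against the documentation of the data cleaner and token generator classes before use.

```r
# Assumed data cleaner option names. They are not confirmed by this
# page and should be verified against the data cleaner documentation
dc_opts <- list(
  remove_punct = TRUE,       # punctuation is removed
  remove_stop = FALSE,       # stop words are kept
  remove_extra_space = TRUE, # extra spaces are removed
  remove_non_dict = TRUE,    # non-dictionary words are removed
  remove_bad = TRUE          # bad words are removed
)
# Assumed token generator option name. Word stemming is turned off
tg_opts <- list(stem_words = FALSE)

# The structure of the option lists is displayed
str(dc_opts)
str(tg_opts)

# The lists would then be passed to the constructor, for example:
# mg <- ModelGenerator$new(
#   name = "custom-model", desc = "custom cleaning options",
#   fn = "custom-model.RDS", df = "input.txt", n = 4, ssize = 0.3,
#   dir = ".", dc_opts = dc_opts, tg_opts = tg_opts, ve = 0
# )
```

The Usage section shows empty lists as the defaults for dc_opts and tg_opts, so presumably only the options that differ from the package defaults need to be given.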
Method generate_model()
It generates the model using the parameters passed to the object's constructor. It generates an n-gram model file and saves it to the model directory.
Usage
ModelGenerator$generate_model()
Examples
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("input.txt")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code

# ModelGenerator class object is created
mg <- ModelGenerator$new(
  name = "default-model",
  desc = "1 MB size and default options",
  fn = "def-model.RDS",
  df = "input.txt",
  n = 4,
  ssize = 0.99,
  dir = ed,
  dc_opts = list(),
  tg_opts = list(),
  ve = ve
)
# The n-gram model is generated
mg$generate_model()

# The test environment is removed. Comment out the line below so the
# files generated by the function can be viewed
em$td_env()
Method clone()
The objects of this class are cloneable with this method.
Usage
ModelGenerator$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
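The difference between a shallow and a deep clone can be sketched with two toy R6 classes. The Counter and Holder classes below are hypothetical and are not part of wordpredictor. The sketch assumes the R6 package is installed and shows the general R6 behaviour: a shallow clone shares any fields that are themselves R6 objects, while a deep clone copies them.

```r
library(R6)

# Hypothetical class holding a simple counter value
Counter <- R6Class("Counter", public = list(count = 0))

# Hypothetical class with a field that is itself an R6 object
Holder <- R6Class("Holder",
  public = list(
    inner = NULL,
    initialize = function() {
      self$inner <- Counter$new()
    }
  )
)

# The original object and its two clones are created
original <- Holder$new()
shallow <- original$clone()
deep <- original$clone(deep = TRUE)

# The nested object of the original is modified
original$inner$count <- 5

# The shallow clone shares the nested object, the deep clone does not
print(shallow$inner$count)
print(deep$inner$count)
```

After the update, the shallow clone reports 5 because it points to the same nested object, while the deep clone still reports 0.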
Examples
## ------------------------------------------------
## Method `ModelGenerator$generate_model`
## ------------------------------------------------
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("input.txt")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# ModelGenerator class object is created
mg <- ModelGenerator$new(
name = "default-model",
desc = "1 MB size and default options",
fn = "def-model.RDS",
df = "input.txt",
n = 4,
ssize = 0.99,
dir = ed,
dc_opts = list(),
tg_opts = list(),
ve = ve
)
# The n-gram model is generated
mg$generate_model()
# The test environment is removed. Comment out the line below so the
# files generated by the function can be viewed
em$td_env()