ModelGenerator {wordpredictor}

R Documentation

Generates n-gram models from a text file

Description

The ModelGenerator class provides a method for generating n-gram models from a text file. The models may be customized by specifying data cleaning and tokenization options.

Details

The model is generated by the generate_model() method, using the options passed to the constructor. The data cleaning options are given by the dc_opts argument and the tokenization options by the tg_opts argument.

The data cleaning options include removal of punctuation, stop words, extra white space, non-dictionary words and bad words. The tokenization options include the n-gram size and word stemming.
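To make the cleaning and tokenization options more concrete, the following base R sketch applies three cleaning steps (punctuation removal, stop word removal and extra white space removal) to one line of text and then splits it into n-grams. The helpers clean_text() and make_ngrams(), the lower-casing step and the tiny stop word list are hypothetical illustrations, not functions or defaults of wordpredictor, whose own cleaning rules may differ. Stemming, dictionary filtering and bad word filtering are omitted.

```r
# Hypothetical helpers for illustration only; they are not part of
# wordpredictor and use a tiny, made-up stop word list.
clean_text <- function(x, remove_punct = TRUE, remove_stop = TRUE,
                       remove_extra_space = TRUE,
                       stop_words = c("the", "a", "an", "of", "and")) {
  # Lower-case first (an assumption of this sketch)
  x <- tolower(x)
  # Remove punctuation characters
  if (remove_punct) x <- gsub("[[:punct:]]", "", x)
  # Remove stop words, matched as whole words only
  if (remove_stop) {
    pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  # Collapse runs of white space and trim both ends
  if (remove_extra_space) x <- trimws(gsub("\\s+", " ", x))
  x
}

# Split a cleaned line into overlapping n-grams of size n
make_ngrams <- function(x, n) {
  words <- strsplit(trimws(x), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  vapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

cleaned <- clean_text("The cat sat on the mat, and the dog barked!")
cleaned                   # "cat sat on mat dog barked"
trigrams <- make_ngrams(cleaned, 3)
trigrams                  # "cat sat on" "sat on mat" "on mat dog" "mat dog barked"
```

Here the words of an n-gram are joined with a space; the separator used inside the package's model files is not documented on this page.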

Super class

wordpredictor::Base -> ModelGenerator

Methods

Method `new()`

It initializes the current object. It sets the model name and description, the output model file name, the input file name, the maximum n-gram size, the sample size, the directory, the data cleaner options, the token generator options and the verbosity level.

Usage

ModelGenerator$new(
  name = NULL,
  desc = NULL,
  fn = NULL,
  df = NULL,
  n = 4,
  ssize = 0.3,
  dir = ".",
  dc_opts = list(),
  tg_opts = list(),
  ve = 0
)

Arguments

name: The model name.
desc: The model description.
fn: The name of the output model file.
df: The name of the input text file. It should be a short file name rather than a full path, and the file should be present in the directory given by dir.
n: The n-gram size of the model.
ssize: The sample size, as a proportion of the input file (for example, 0.3 for 30 percent).
dir: The directory containing the input and output files.
dc_opts: The data cleaner options.
tg_opts: The token generator options.
ve: The level of detail in the information messages.
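Because ssize is a proportion, a value of 0.3 keeps roughly 30 percent of the input. The sketch below illustrates this by sampling whole lines, which is an assumption: this page does not say whether the package samples by line or by some other unit. sample_lines() is a hypothetical helper written for illustration, not part of wordpredictor.

```r
# Hypothetical helper for illustration; it assumes sampling by line, which
# may not be how wordpredictor draws its sample.
sample_lines <- function(lines, ssize, seed = NULL) {
  # ssize is a proportion, so it must lie in (0, 1]
  stopifnot(is.numeric(ssize), ssize > 0, ssize <= 1)
  if (!is.null(seed)) set.seed(seed)
  # Number of lines to keep, at least one
  k <- max(1L, round(length(lines) * ssize))
  # Draw line indices without replacement and keep the original order
  lines[sort(sample.int(length(lines), k))]
}

lines <- sprintf("line %d", 1:10)
s <- sample_lines(lines, ssize = 0.3, seed = 1)
length(s)  # 3
```

Keeping the sampled lines in their original order preserves the word sequences inside each line, which is what later n-gram extraction depends on.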

Method `generate_model()`

It generates the model using the parameters passed to the object's constructor. It creates an n-gram model file named by fn and saves it in the directory given by dir.
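As a rough picture of what generating an n-gram model involves, the sketch below counts all 1-grams up to n-grams in a set of lines and saves the frequency tables as an RDS file named fn inside dir. build_ngram_model() is hypothetical: its cleaning is reduced to lower-casing and punctuation removal, and the structure of the saved object (a list of frequency tables) is an assumption that need not match the files written by generate_model().

```r
# Hypothetical sketch for illustration; not the wordpredictor implementation.
build_ngram_model <- function(lines, n, dir, fn) {
  # Minimal cleaning: lower-case and drop punctuation
  lines <- tolower(gsub("[[:punct:]]", "", lines))
  words_per_line <- lapply(strsplit(trimws(lines), "\\s+"),
                           function(w) w[nzchar(w)])
  model <- vector("list", n)
  for (k in seq_len(n)) {
    # All k-grams from every line; in this sketch n-grams never span lines
    grams <- unlist(lapply(words_per_line, function(w) {
      if (length(w) < k) return(character(0))
      vapply(seq_len(length(w) - k + 1),
             function(i) paste(w[i:(i + k - 1)], collapse = " "),
             character(1))
    }))
    # Frequency table of the k-grams, most frequent first
    model[[k]] <- sort(table(grams), decreasing = TRUE)
  }
  # Save the model as an RDS file named fn inside dir
  path <- file.path(dir, fn)
  saveRDS(model, path)
  invisible(path)
}

lines <- c("The cat sat.", "The cat ran.")
path <- build_ngram_model(lines, n = 2, dir = tempdir(), fn = "demo-model.RDS")
model <- readRDS(path)
as.integer(model[[2]]["the cat"])  # 2
```

Storing one table per n-gram order keeps lookups for each order separate, which is a common layout for back-off style word prediction, though the package may organize its model differently.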

Usage

ModelGenerator$generate_model()

Examples

# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("input.txt")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code

# ModelGenerator class object is created
mg <- ModelGenerator$new(
    name = "default-model",
    desc = "1 MB size and default options",
    fn = "def-model.RDS",
    df = "input.txt",
    n = 4,
    ssize = 0.99,
    dir = ed,
    dc_opts = list(),
    tg_opts = list(),
    ve = ve
)
# The n-gram model is generated
mg$generate_model()

# The test environment is removed. Comment out the line below so that
# the files generated by the function can be viewed
em$td_env()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

ModelGenerator$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
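ModelGenerator appears to be an R6 class, judging by its $new() and $clone(deep) methods, so this sketch assumes the standard R6 clone semantics. With deep = FALSE, fields that hold other R6 objects are shared between the original and the copy. With deep = TRUE, those objects are cloned as well. The Counter and Holder classes are toy examples, not part of wordpredictor, and the sketch needs the R6 package.

```r
library(R6)

# Toy classes for illustration only; they are not part of wordpredictor.
Counter <- R6Class("Counter", public = list(
  count = 0,
  add = function() {
    self$count <- self$count + 1
    invisible(self)
  }
))

Holder <- R6Class("Holder", public = list(
  counter = NULL,
  # Create the Counter in initialize() so each Holder gets its own
  initialize = function() {
    self$counter <- Counter$new()
  }
))

a <- Holder$new()
shallow <- a$clone()           # deep = FALSE: the Counter object is shared
deep <- a$clone(deep = TRUE)   # deep = TRUE: the Counter object is cloned too

a$counter$add()
shallow$counter$count  # 1, same Counter object as a
deep$counter$count     # 0, independent copy
```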

[Package wordpredictor version 0.0.3 Index]