smoothers {kgrams} | R Documentation
k-gram Probability Smoothers
Description
Information on available k-gram continuation probability smoothers.
List of smoothers currently supported by kgrams
- "ml": Maximum Likelihood estimate (Markov 1913).
- "add_k": Add-k smoothing (Dale and Laplace 1995; Lidstone 1920; Johnson 1932; Jeffreys 1998).
- "abs": Absolute discounting (Ney and Essen 1991).
- "wb": Witten-Bell smoothing (Bell et al. 1990; Witten and Bell 1991).
- "kn": Interpolated Kneser-Ney (Kneser and Ney 1995; Chen and Goodman 1999).
- "mkn": Interpolated modified Kneser-Ney (Chen and Goodman 1999).
- "sbo": Stupid Backoff (Brants et al. 2007).
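To make the difference between the first two entries concrete, here is a minimal base R sketch of maximum likelihood and add-k estimates for bigram continuation probabilities. It is an illustration under stated assumptions, not the kgrams implementation: the toy corpus and the helper names `p_ml` and `p_add_k` are hypothetical, the vocabulary is just the set of observed tokens, and special tokens such as sentence boundaries or unknown words are ignored.

```r
# Toy corpus, already tokenized (hypothetical data, for illustration only)
tokens <- c("a", "b", "b", "a", "b", "a")
vocab <- sort(unique(tokens))
V <- length(vocab)

# Bigram counts: rows are contexts, columns are continuation words
prev_tok <- factor(head(tokens, -1), levels = vocab)
next_tok <- factor(tail(tokens, -1), levels = vocab)
counts <- table(context = prev_tok, word = next_tok)

# Maximum likelihood: count(context, word) / count(context)
p_ml <- function(word, context) {
  unname(counts[context, word] / sum(counts[context, ]))
}

# Add-k: (count(context, word) + k) / (count(context) + k * V)
p_add_k <- function(word, context, k = 1) {
  unname((counts[context, word] + k) / (sum(counts[context, ]) + k * V))
}

p_ml("a", "a")     # 0: the bigram "a a" never occurs in the toy corpus
p_add_k("a", "a")  # 0.25: add-k moves some probability mass to unseen bigrams
```

The maximum likelihood estimate assigns zero probability to any unseen continuation, which is the problem the remaining smoothers address in different ways.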
Usage
smoothers()
info(smoother)
Arguments
smoother    a string. Code name of a probability smoother.
Value
smoothers() returns a character vector, the list of code names of probability smoothers available in kgrams.
info(smoother) returns NULL (invisibly) and prints some information on the selected smoothing technique.
Author(s)
Valerio Gherardi
References
Bell TC, Cleary JG, Witten IH (1990).
Text compression.
Prentice-Hall, Inc.
Brants T, Popat AC, Xu P, Och FJ, Dean J (2007).
“Large Language Models in Machine Translation.”
In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 858–867.
https://aclanthology.org/D07-1090/.
Chen SF, Goodman J (1999).
“An empirical study of smoothing techniques for language modeling.”
Computer Speech & Language, 13(4), 359–394.
Dale AI, Laplace P (1995).
Philosophical essay on probabilities.
Springer.
Jeffreys H (1998).
The theory of probability.
OUP Oxford.
Johnson WE (1932).
“Probability: The deductive and inductive problems.”
Mind, 41(164), 409–423.
Kneser R, Ney H (1995).
“Improved backing-off for M-gram language modeling.”
In 1995 International Conference on Acoustics, Speech, and Signal Processing, 1, 181–184.
Lidstone GJ (1920).
“Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities.”
Transactions of the Faculty of Actuaries, 8, 182–192.
Markov AA (1913).
“Essai d'une Recherche Statistique Sur le Texte du Roman Eugene Oneguine.”
Bull. Acad. Imper. Sci. St. Petersburg, 7.
Ney H, Essen U (1991).
“On smoothing techniques for bigram-based natural language modelling.”
In Acoustics, Speech, and Signal Processing, IEEE International Conference on, 825–828.
IEEE Computer Society.
Witten IH, Bell TC (1991).
“The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression.”
IEEE Transactions on Information Theory, 37(4), 1085–1094.
Examples
# List available smoothers
smoothers()
# Get information on smoother "kn", i.e. Interpolated Kneser-Ney
info("kn")
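Because smoothers() returns the code names as a character vector, the two functions can be combined to print the information for every available smoother. The sketch below assumes the kgrams package is installed and uses only the two functions documented on this page.

```r
library(kgrams)

# Print the information available for each supported smoother
for (s in smoothers()) {
  info(s)
}
```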