weighting_functions {svs}    R Documentation
Weighting Functions
Description
Local and global weighting functions.
Usage
lw_tf(x)
lw_raw(x)
lw_log(x)
lw_bin(x)
gw_idf(x)
gw_idf_alt(x)
gw_gfidf(x)
gw_nor(x)
gw_ent(x)
gw_bin(x)
gw_raw(x)
Arguments
x
A numeric matrix.
Details
There are many local and global weighting functions. In this package, local weighting functions are prefixed with lw_ and global weighting functions with gw_, so users can define their own weighting functions.
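As a minimal sketch of the naming convention, a user-defined local weighting function could look like the following. The name lw_sqrt and the lookup via get() are hypothetical and only illustrate how a prefixed name can be resolved; how svs itself locates these functions is not shown in this help page.

```r
# Hypothetical user-defined local weighting function (not part of svs):
# square-root damping of every cell, following the lw_ prefix convention.
lw_sqrt <- function(x) sqrt(x)

# Small document-term matrix: documents as rows, terms as columns.
x <- matrix(c(0, 1, 4,
              9, 0, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# Assumption: a function can be looked up from its short name by pasting
# the prefix in front of it. This is an illustration, not the svs mechanism.
short_name <- "sqrt"
weight_fun <- get(paste0("lw_", short_name), mode = "function")

weighted <- weight_fun(x)
weighted
```

The result has the same dimensions and dimnames as the input, which matches the "numeric matrix in, numeric matrix out" contract described for the built-in functions.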
Local weighting functions (i.e. weighting every cell in the matrix):
lw_tf
Term frequency: f(x) = x.
lw_raw
Raw frequency, which is the same as the term frequency: f(x) = x.
lw_log
Logarithm: f(x) = log(x + 1).
lw_bin
Binary: f(x) = 1 if x > 0 and 0 otherwise.
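The local formulas above can be sketched in base R as follows. The my_lw_* names are hypothetical stand-ins written from the formulas in this page; they are not the svs sources and may differ from them in details such as the returned class.

```r
# Small document-term matrix: documents as rows, terms as columns.
x <- matrix(c(0, 1, 3,
              2, 0, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# Hypothetical re-implementations of the cell-wise formulas.
my_lw_tf  <- function(x) x            # f(x) = x
my_lw_log <- function(x) log(x + 1)   # f(x) = log(x + 1)
my_lw_bin <- function(x) (x > 0) * 1  # f(x) = 1 if x > 0 and 0 otherwise

tf_x  <- my_lw_tf(x)
log_x <- my_lw_log(x)
bin_x <- my_lw_bin(x)

log_x
bin_x
```

Each function operates on every cell independently, so the shape of the matrix is preserved.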
Global weighting functions, weighting the columns of the matrix (hence, these weighting functions work according to expectation for a document-term matrix, i.e. with the documents as the rows and the terms as the columns):
gw_idf
Inverse document frequency: f(x) = log(nrow(x) / n + 1), where n is the number of rows in which the column is greater than 0.
gw_idf_alt
Alternative definition of the inverse document frequency: f(x) = log(nrow(x) / n) + 1, where n is the number of rows in which the column is greater than 0.
gw_gfidf
Global frequency multiplied by inverse document frequency: f(x) = colSums(x) / n, where n is the number of rows in which the column is greater than 0.
gw_nor
Normal(ized) frequency: f(x) = x / colSums(x^2).
gw_ent
Entropy: f(x) = 1 + the relative Shannon entropy.
gw_bin
Binary: f(x) = 1.
gw_raw
Raw, which is the same as binary: f(x) = 1.
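Three of the column-wise formulas above can be sketched in base R as follows. The code computes one weight per column directly from the formulas in this page; the variable names are hypothetical, and the shape in which svs returns the weights (the Value section states a numeric matrix) is not reproduced here.

```r
# Small document-term matrix: documents as rows, terms as columns.
x <- matrix(c(0, 1, 3,
              2, 0, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# n = number of rows (documents) in which each column (term) is > 0.
n <- colSums(x > 0)

# One weight per column, following the formulas given above.
idf     <- log(nrow(x) / n + 1)   # gw_idf formula
idf_alt <- log(nrow(x) / n) + 1   # gw_idf_alt formula
gfidf   <- colSums(x) / n         # gw_gfidf formula

idf
idf_alt
gfidf
```

With this definition, a term that occurs in every document (column c here) receives an idf weight of log(2) rather than 0, while the alternative definition gives it a weight of 1.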
Value
A numeric matrix.
See Also
Examples
SndT_Fra <- read.table(system.file("extdata", "SndT_Fra.txt", package = "svs"),
   header = TRUE, sep = "\t", quote = "\"", encoding = "UTF-8",
   stringsAsFactors = FALSE)
tab_SndT_Fra <- table(SndT_Fra)
lw_log(tab_SndT_Fra)
gw_idf(tab_SndT_Fra)