R: Calculate Lexical Specificity Score

specificities {textometry}

R Documentation

Calculate Lexical Specificity Score

Description

Calculate the specificity score (also called association or surprise score) of a word occurring f times or more in a sub-corpus of t words, given that it occurs F times in total in a whole corpus of T words.
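To illustrate the quantity described above: under the hypergeometric model used in Lafon (1980), the probability of observing f or more occurrences in the sub-corpus can be computed with base R's `phyper`. The sketch below uses made-up counts, and the variable names are illustrative only. It shows the underlying probability, not necessarily the exact value or scaling that `specificities()` returns.

```r
# Hypothetical counts, chosen only for illustration
F_total <- 50     # occurrences of the word in the whole corpus
T_total <- 10000  # size of the whole corpus, in words
t_sub   <- 1000   # size of the sub-corpus, in words
f_sub   <- 12     # occurrences of the word in the sub-corpus

# P(X >= f) where X is hypergeometric: t_sub words drawn without
# replacement from T_total words, F_total of which are the word of interest.
# phyper(q, m, n, k, lower.tail = FALSE) gives P(X > q), hence q = f - 1.
p_upper <- phyper(f_sub - 1, F_total, T_total - F_total, t_sub,
                  lower.tail = FALSE)

# Cross-check by summing the point probabilities directly
p_check <- sum(dhyper(f_sub:min(F_total, t_sub), F_total,
                      T_total - F_total, t_sub))

print(p_upper)
stopifnot(isTRUE(all.equal(p_upper, p_check)))
```

A small probability here means the word is more frequent in the sub-corpus than chance alone would suggest. The names `F` and `T` are avoided in the code because R treats them as aliases for `FALSE` and `TRUE`.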

Usage

specificities(lexicaltable, types=NULL, parts=NULL)

Arguments

`lexicaltable`	a complete lexical table, i.e. a numeric matrix where each row represents a word and each column a part of the corpus. Each cell gives the frequency of the given word in the corresponding part of the corpus.
`types`	rows (words) for which the specificity score must be calculated. If `NULL`, the score is calculated for every row. If `types` is a character vector, it gives the row names to use (an error is thrown if `lexicaltable` has no row names). If `types` is an integer vector, it gives the row indices to use.
`parts`	columns (parts) for which the specificity score must be calculated. If `NULL`, the score is calculated for every column. If `parts` is a character vector, it gives the column names to use (an error is thrown if `lexicaltable` has no column names). If `parts` is an integer vector, it gives the column indices to use.
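The two selection styles for `types` and `parts` can be sketched as follows. The toy matrix `lt` and its row and column names are hypothetical, invented for this example. The call is guarded so that the snippet still runs when the textometry package is not installed.

```r
# Hypothetical toy lexical table: 3 words (rows) x 3 parts (columns)
lt <- matrix(c(10,  2,  3,
                4, 20,  1,
               30, 25, 40),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("alpha", "beta", "gamma"),
                             c("P1", "P2", "P3")))

if (requireNamespace("textometry", quietly = TRUE)) {
  # Select rows and columns by name ...
  by_name <- textometry::specificities(lt,
                                       types = c("alpha", "beta"),
                                       parts = c("P1", "P3"))

  # ... or select the same rows and columns by integer index
  by_index <- textometry::specificities(lt,
                                        types = 1:2,
                                        parts = c(1L, 3L))

  print(dim(by_name))
  print(by_name)
}
```

Selecting by name needs a table with row or column names, as noted above, so the toy matrix sets `dimnames` explicitly.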

Value

Returns a matrix with nrow(lexicaltable) rows and ncol(lexicaltable) columns (fewer if types or parts is given), each cell giving the specificity score of the corresponding word in the corresponding part.

Author(s)

Matthieu Decorde, Serge Heiden, Sylvain Loiseau, Lise Vaudor

References

Lafon P. (1980) Sur la variabilité de la fréquence des formes dans un corpus, Mots, 1, pp. 127–165. https://www.persee.fr/doc/mots_0243-6450_1980_num_1_1_1008

Examples

library(textometry)

# Load the example lexical table and score every word in every part
data(robespierre)
spe <- specificities(robespierre)

# Report the counts and the score for the word 'peuple' in part 'D4'
string <- paste("The word %s appears f=%d times in a sub-corpus of t=%d words,",
                " given a total frequency of F=%d in the robespierre corpus made",
                " of T=%d words. The corresponding specificity score is %f", sep="")
print(sprintf(string,
              'peuple',
              robespierre['peuple', 'D4'],
              colSums(robespierre)['D4'],
              rowSums(robespierre)['peuple'],
              sum(robespierre),
              spe['peuple', 'D4']))

[Package textometry version 0.1.6 Index]