R: Semantic neighborhood density

SND {LSAfun}

R Documentation

Semantic neighborhood density

Description

Returns the semantic neighborhood of a target, together with the semantic neighborhood size and density.

Usage

SND(x,n=NA,threshold=3.5,tvectors=tvectors)

Arguments

`x`	a character vector of `length(x) = 1`, or a numeric vector of `length(x) = ncol(tvectors)`, i.e. with the same dimensionality as the semantic space
`n`	if specified as a numeric, determines the size of the neighborhood as the `n` nearest words to `x`. If `n=NA` (default), the semantic neighborhood will be determined according to a similarity threshold (see `threshold`)
`threshold`	specifies the similarity threshold that determines if a word is counted as a neighbor for `x`, following the method by Buchanan et al. (2001) (see `Details` below)
`tvectors`	the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)

Details

There are two principal approaches to determining the semantic neighborhood of a target word:

Set an a priori size of the semantic neighborhood to a fixed value n (e.g., Marelli & Baroni, 2015). The n closest words to the target word are counted as its semantic neighbors. The semantic neighborhood size is then necessarily n; the semantic neighborhood density is the mean similarity between these neighbors and the target word (see also plausibility).
Determine the semantic neighborhood based on a similarity threshold; all words whose similarity to the target word exceeds this threshold are counted as its semantic neighbors (e.g., Buchanan, Westbury, & Burgess, 2001). First, the similarity between the target word and all words in the semantic space is computed. These similarities are then transformed into z-scores. Traditionally, the threshold is set to z = 3.5 (e.g., Buchanan, Westbury, & Burgess, 2001).

If a single target word is used as x, this target word itself (which always has a similarity of 1 to itself) is excluded from these computations, so that it cannot be counted as its own neighbor.
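
The two approaches described above can be sketched in base R on a toy semantic space. This is an illustration only, not the package's implementation: the matrix `toy_space` and the helper functions `cosine_to_all`, `snd_fixed_n` and `snd_threshold` are hypothetical names introduced here, cosine is assumed as the similarity measure, and for the threshold approach the density is assumed to be the mean similarity of the identified neighbors (the text above states this only for the fixed-n approach).

```r
# Toy semantic space: 6 words in 3 dimensions (made-up values, illustration only)
toy_space <- matrix(
  c(1.0, 0.2, 0.1,
    0.9, 0.3, 0.1,
    0.8, 0.1, 0.3,
    0.1, 1.0, 0.2,
    0.2, 0.9, 0.1,
    0.1, 0.2, 1.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("cat", "dog", "fox", "tea", "cup", "sky"), NULL)
)

# Cosine similarity between the target word and every other word;
# the target itself is dropped so it cannot be its own neighbor
cosine_to_all <- function(target, space) {
  v <- space[target, ]
  sims <- as.vector(space %*% v) / (sqrt(rowSums(space^2)) * sqrt(sum(v^2)))
  names(sims) <- rownames(space)
  sims[names(sims) != target]
}

# Approach 1: fixed neighborhood size n; density = mean similarity of the n nearest words
snd_fixed_n <- function(target, n, space) {
  sims <- sort(cosine_to_all(target, space), decreasing = TRUE)
  nb <- head(sims, n)
  list(neighbors = nb, n_size = length(nb), SND = mean(nb))
}

# Approach 2: z-transform the similarities and keep words above the threshold
# (assumption: density is again the mean similarity of the retained neighbors)
snd_threshold <- function(target, threshold, space) {
  sims <- cosine_to_all(target, space)
  z <- (sims - mean(sims)) / sd(sims)
  nb <- sort(sims[z > threshold], decreasing = TRUE)
  list(neighbors = nb, n_size = length(nb),
       SND = if (length(nb) > 0) mean(nb) else NA_real_)
}

res_n <- snd_fixed_n("cat", n = 2, space = toy_space)
res_t <- snd_threshold("cat", threshold = 0.5, space = toy_space)
```

The toy example uses a threshold of 0.5 rather than 3.5 because, with only five candidate words, no z-score can come close to 3.5; the traditional value is only meaningful in a space with many words.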

Value

A list of three elements:

neighbors: A named numeric vector of all identified neighbors, with the names being these neighbors and the values their similarity to x
n_size: The number of neighbors as a numeric
SND: The semantic neighborhood density (SND) as a numeric
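
As a brief illustration of working with the returned list, the sketch below assumes that the LSAfun package is installed and uses the `wonderland` space from the Examples. The object names `res`, `combined` and `res_vec` are introduced here for illustration only.

```r
library(LSAfun)
data(wonderland)

# Fixed-size neighborhood: the 20 nearest words to "cheshire"
res <- SND("cheshire", n = 20, tvectors = wonderland)

res$neighbors   # named numeric vector: neighbor words and their similarity to x
res$n_size      # number of neighbors
res$SND         # semantic neighborhood density

# x may also be a numeric vector of length ncol(tvectors),
# here the sum of two word vectors taken from the space
combined <- wonderland["alice", ] + wonderland["cheshire", ]
res_vec <- SND(combined, n = 10, tvectors = wonderland)
res_vec$SND
```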

Author(s)

Fritz Guenther

References

Buchanan, L., Westbury, C., & Burgess, C. (2001). Characterizing semantic space: Neighborhood effects in word recognition. Psychonomic Bulletin & Review, 8, 531-544.

Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122, 485-515.

Examples

data(wonderland)

SND("cheshire",n=20,tvectors=wonderland)

SND("alice",threshold=2,tvectors=wonderland)

[Package LSAfun version 0.7.1 Index]