R: Proximity Matrix Between Words

word_proximity {qdap}

R Documentation

Proximity Matrix Between Words

Description

word_proximity - Generate proximity measures to ascertain a mean distance measure between word uses.

Usage

word_proximity(
  text.var,
  terms,
  grouping.var = NULL,
  parallel = TRUE,
  cores = parallel::detectCores()/2
)

## S3 method for class 'word_proximity'
weight(x, type = "scale", ...)

Arguments

`text.var`	The text variable.
`terms`	A vector of quoted terms.
`grouping.var`	The grouping variables. Default `NULL` generates one proximity matrix for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
`parallel`	logical. If `TRUE`, attempts to run the function on multiple cores. Note that this may not give a speed boost if you have only one core or if the data set is small, because the cluster takes time to create.
`cores`	The number of cores to use if `parallel = TRUE`. Default is half the number of available cores.
`x`	An object to be weighted.
`type`	A weighting type, one of: c(`"scale_log"`, `"scale"`, `"rev_scale"`, `"rev_scale_log"`, `"log"`, `"sqrt"`, `"scale_sqrt"`, `"rev_sqrt"`, `"rev_scale_sqrt"`). The sections of the weight type name (i.e. `A_B_C` where `A`, `B`, and `C` are sections) determine which actions occur: `log` will use `log`, `sqrt` will use `sqrt`, and `scale` will standardize the values. `rev` will multiply by -1 to flip the sign, which enables a comparison similar to correlations rather than distances.
`...`	ignored.
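
The way the sections of a `type` name combine can be sketched in base R. This is a minimal illustration, not qdap's implementation: `apply_weight` is a hypothetical helper, and both the order of operations (transform, then standardize, then reverse) and standardizing over all values of the matrix at once are assumptions.

```r
# Hypothetical helper (not part of qdap): apply the actions named by the
# underscore-separated sections of `type` to a numeric matrix `m`.
apply_weight <- function(m, type = "scale") {
  parts <- strsplit(type, "_", fixed = TRUE)[[1]]
  # Assumed order: transform first, then standardize, then reverse the sign.
  if ("log" %in% parts)  m <- log(m)   # note: log(0) is -Inf
  if ("sqrt" %in% parts) m <- sqrt(m)
  if ("scale" %in% parts) {
    v <- as.vector(m)
    # Assumption: standardize over all matrix values together.
    m <- (m - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
  }
  if ("rev" %in% parts) m <- -m
  m
}

m <- matrix(c(1, 4, 9, 16), nrow = 2)
apply_weight(m, "rev_sqrt")        # square roots with the sign flipped
apply_weight(m, "rev_scale_sqrt")  # standardized square roots, sign flipped
```

With a `rev` type, larger values indicate words that occur closer together, which is what makes the result read more like a correlation than a distance.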

Details

Note that row names are the first word and column names are the second comparison word. The values for Word A compared to Word B will not be the same as Word B compared to Word A. This is because, unlike a true distance measure, word_proximity's matrix is asymmetrical. word_proximity computes the distance by taking each sentence position at which Word A occurs and comparing it to the nearest sentence position at which Word B occurs.
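
The asymmetry can be seen in a small base R sketch of the computation described above. It is an illustration under stated assumptions, not qdap's code: `mean_nearest_distance` is a hypothetical helper, and the inputs are assumed to be the sentence indices at which each word occurs.

```r
# Hypothetical helper (not part of qdap): for every sentence position of
# word A, find the distance to the nearest position of word B, then average.
mean_nearest_distance <- function(pos_a, pos_b) {
  if (length(pos_a) == 0 || length(pos_b) == 0) return(NA_real_)
  nearest <- vapply(pos_a, function(a) min(abs(a - pos_b)), numeric(1))
  mean(nearest)
}

pos_a <- c(1, 2, 10)  # sentences containing word A (made-up positions)
pos_b <- c(3)         # sentences containing word B (made-up positions)

mean_nearest_distance(pos_a, pos_b)  # A compared to B: (2 + 1 + 7) / 3
mean_nearest_distance(pos_b, pos_a)  # B compared to A: 1
```

Word A's occurrence in sentence 10 is far from any use of Word B, which raises the A-to-B value, while the single use of Word B has a use of Word A right next to it.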

Value

Returns a list of matrices of proximity measures, expressed as the average number of sentences between words (defaults to scaled).

Note

The `terms` argument is character sensitive. Spacing is an important way to grab specific words and requires careful thought. Using "read" will find the words "bread", "read", "reading", and "ready". If you want to search for just the word "read" you'd supply a vector of c(" read ", " reads", " reading", " reader").
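
The effect of spacing can be reproduced with plain substring matching in base R. This sketch assumes the terms behave like fixed substrings matched against space-padded text; it uses only `grepl` and `paste0`, not qdap itself.

```r
words <- c("bread", "read", "reading", "ready")

# An unpadded term matches anywhere inside a word.
unpadded <- grepl("read", words, fixed = TRUE)

# Padding both the text and the term with spaces restricts the match
# to the whole word.
padded_words <- paste0(" ", words, " ")
padded <- grepl(" read ", padded_words, fixed = TRUE)

unpadded  # TRUE for all four words
padded    # TRUE only for "read"
```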

Examples

## Not run: 
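## Build a term vector from the debate word list, dropping common stopwords
wrds <- word_list(pres_debates2012$dialogue, 
    stopwords = c("it's", "that's", Top200Words))
wrds2 <- tolower(sort(wrds$rfswl[[1]][, 1]))

## Proximity among all terms across the whole dialogue, plotted with
## different weightings
(x <- with(pres_debates2012, word_proximity(dialogue, wrds2)))
plot(x)
plot(weight(x))
plot(weight(x, "rev_scale_log"))

## Proximity computed separately for each person
(x2 <- with(pres_debates2012, word_proximity(dialogue, wrds2, person)))

## The spaces around `terms` are important:
## spaste() pads each term with a leading and trailing space
(x3 <- with(DATA, word_proximity(state, spaste(qcv(the, i)))))
(x4 <- with(DATA, word_proximity(state, qcv(the, i))))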

## End(Not run)

[Package qdap version 2.4.6 Index]