| eval.similarity.correlation {wordspace} | R Documentation | 
Evaluate DSM on Correlation with Similarity Ratings (wordspace)
Description
Performs evaluation by comparing the distances (or similarities) computed by a DSM with (typically human) word similarity ratings.
Well-known examples are the noun pair ratings collected by Rubenstein & Goodenough (1965; RG65) and Finkelstein et al. (2002; WordSim353).
The quality of the DSM predictions is measured by Spearman rank correlation rho.
Usage
eval.similarity.correlation(task, M, dist.fnc=pair.distances,
                            details=FALSE, format=NA, taskname=NA,
                            word1.name="word1", word2.name="word2", score.name="score",
                            ...)
Arguments
| task | a data frame containing word pairs (usually in columns  | 
| M | a scored DSM matrix, passed to  | 
| dist.fnc | a callback function used to compute distances or similarities between word pairs.
It will be invoked with character vectors containing the components of the word pairs as first and second argument,
the DSM matrix  | 
| details | if  | 
| format | if the task definition specifies POS-disambiguated lemmas in CWB/Penn format, they can automatically be transformed into some other notation conventions; see  | 
| taskname | optional row label for the short report ( | 
| ... | any further arguments are passed to  | 
| word1.name | the name of the column of  | 
| word2.name | the name of the column of  | 
| score.name | the name of the column of  | 
Details
DSM distances are computed for all word pairs and compared with similarity ratings from the gold standard. As an evaluation criterion, Spearman rank correlation between the DSM and gold standard scores is computed. The function also reports a confidence interval for Pearson correlation, which is only meaningful if the relationship between the two scores is approximately linear; this may require a suitable transformation of the DSM distances.
NB: Since the correlation between similarity ratings and DSM distances will usually be negative, the evaluation report omits minus signs on the correlation coefficients.
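The two correlation measures described above can be illustrated with base R alone. The following is a minimal sketch on made-up toy data (the vectors gold and dsm.dist are hypothetical and not taken from any data set in the package); it is not the function's actual implementation.

```r
# Toy data, invented for illustration: gold-standard similarity ratings
# and DSM distances for eight hypothetical word pairs.
gold     <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.1, 0.6, 0.2)
dsm.dist <- c(0.21, 0.35, 0.30, 0.52, 0.61, 0.58, 0.83, 0.90)

# Spearman rank correlation: negative here, because high similarity
# ratings go together with small distances.
sp  <- cor.test(gold, dsm.dist, method = "spearman")
rho <- unname(sp$estimate)

# Pearson correlation together with its 95% confidence interval.
pe <- cor.test(gold, dsm.dist, method = "pearson")
r  <- unname(pe$estimate)
ci <- as.numeric(pe$conf.int)

# An evaluation report that omits minus signs shows absolute values.
cat("rho =", abs(rho), " p =", sp$p.value, "\n")
cat("r   =", abs(r), " (signed 95% CI:", ci[1], "to", ci[2], ")\n")
```

The raw coefficients are negative; taking the absolute value corresponds to the omitted minus signs in the report.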
With the default dist.fnc, the distance values can optionally be transformed through an arbitrary function specified in the transform argument (see pair.distances for details).
Examples include transform=log (esp. for neighbour rank as a distance measure) 
and transform=function (x) 1/(1+x) (in order to transform distances into similarities).
Note that Spearman rank correlation is not affected by any monotonic transformation (a decreasing transformation merely flips its sign, which the report omits), so the main evaluation results
will remain unchanged.
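The invariance under monotonic transformations can be checked directly in base R. This sketch reuses invented toy vectors (gold and dsm.dist are hypothetical) and applies the two transformations mentioned above by hand rather than through the transform argument.

```r
# Toy data, invented for illustration.
gold     <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.1, 0.6, 0.2)
dsm.dist <- c(0.21, 0.35, 0.30, 0.52, 0.61, 0.58, 0.83, 0.90)

# Spearman rho on raw distances, log-transformed distances,
# and distances converted into similarities.
rho.raw <- cor(gold, dsm.dist, method = "spearman")
rho.log <- cor(gold, log(dsm.dist), method = "spearman")
rho.sim <- cor(gold, 1 / (1 + dsm.dist), method = "spearman")

# log() is increasing, so rho is identical; 1/(1+x) is decreasing,
# so only the sign flips and the absolute value is the same.
stopifnot(isTRUE(all.equal(rho.raw, rho.log)))
stopifnot(isTRUE(all.equal(rho.raw, -rho.sim)))

# Pearson correlation, by contrast, does change under transformation.
r.raw <- cor(gold, dsm.dist)
r.log <- cor(gold, log(dsm.dist))
cat("Pearson r, raw:", r.raw, " log-transformed:", r.log, "\n")
```

This is why a transformation matters only for the Pearson coefficient and its confidence interval, not for the main evaluation result.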
If one or both words of a pair are not found in the DSM, the distance is set to a fixed value 10% above the
maximum of all other DSM distances, or 10% below the minimum in the case of similarity values.
This is done in order to avoid numerical and visualization problems with Inf values;
the particular value used does not affect the rank correlation coefficient.
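The treatment of missing pairs can be sketched as follows. The data are invented, and "10% above the maximum" is interpreted here as the maximum plus 10% of the observed range of distances; that interpretation is an assumption of this sketch and may differ from what the function does internally.

```r
# Toy data, invented for illustration; Inf marks pairs for which
# at least one word is not found in the DSM.
gold     <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.1, 0.6, 0.2)
dsm.dist <- c(0.21, Inf, 0.30, 0.52, 0.61, Inf, 0.83, 0.90)

is.missing <- !is.finite(dsm.dist)
d.max <- max(dsm.dist[!is.missing])
d.min <- min(dsm.dist[!is.missing])

# ASSUMPTION: surrogate value = maximum + 10% of the observed range.
surrogate <- d.max + 0.1 * (d.max - d.min)
filled    <- replace(dsm.dist, is.missing, surrogate)

# Any other value above the maximum yields the same ranks,
# and therefore the same Spearman rho.
other     <- replace(dsm.dist, is.missing, 2 * d.max)
rho.fill  <- cor(gold, filled, method = "spearman")
rho.other <- cor(gold, other, method = "spearman")
stopifnot(isTRUE(all.equal(rho.fill, rho.other)))

cat("missing pairs:", sum(is.missing), " rho:", abs(rho.fill), "\n")
```

For similarity values the same idea applies in mirror image, with a surrogate below the minimum.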
With the default dist.fnc callback, additional arguments method and p can be used to select 
a distance measure (see dist.matrix for details); rank=TRUE can be specified in order to 
use neighbour rank as a measure of semantic distance.
Value
The default short report (details=FALSE) is a data frame with a single row and the following columns:
| rho | (absolute value of) Spearman rank correlation coefficient  | 
| p.value | p-value indicating evidence for a significant correlation | 
| missing | number of word pairs for which one or both words are not found in the DSM | 
| r | (absolute value of) Pearson correlation coefficient  | 
| r.lower | lower bound of confidence interval for Pearson correlation | 
| r.upper | upper bound of confidence interval for Pearson correlation | 
The detailed report (details=TRUE) is a copy of the original task data with two additional columns:
| distance | distance calculated by the DSM for each word pair, possibly transformed (numeric) | 
| missing | whether word pair is missing from the DSM (logical) | 
In addition, the short report is appended to the data frame as an attribute "eval.result", 
and the optional taskname value as attribute "taskname".  The data frame is marked as an
object of class eval.similarity.correlation, for which suitable print
and plot methods are defined.
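As a usage sketch, the attributes of a detailed report can be inspected as shown below. It assumes that the wordspace package is installed and relies only on the data sets, arguments, column names and attribute names mentioned on this page; the code is skipped if the package is not available.

```r
# Only run if the wordspace package is installed.
if (requireNamespace("wordspace", quietly = TRUE)) {
  library(wordspace)

  # Detailed report: copy of the task data plus extra columns.
  res <- eval.similarity.correlation(RG65, DSM_Vectors,
                                     details = TRUE, taskname = "RG65")

  # The short report is attached as an attribute.
  short <- attr(res, "eval.result")
  print(short)
  print(attr(res, "taskname"))

  # Word pairs with the smallest DSM distances, and number of missing pairs.
  print(head(res[order(res$distance), ]))
  cat("missing pairs:", sum(res$missing), "\n")
}
```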
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert)
References
Finkelstein, Lev, Gabrilovich, Evgeniy, Matias, Yossi, Rivlin, Ehud, Solan, Zach, Wolfman, Gadi, and Ruppin, Eytan (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116–131.
Rubenstein, Herbert and Goodenough, John B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.
See Also
Suitable gold standard data sets in this package: RG65, WordSim353
Support functions: pair.distances, convert.lemma
Plotting and printing evaluation results: plot.eval.similarity.correlation, print.eval.similarity.correlation
Examples
eval.similarity.correlation(RG65, DSM_Vectors)
## Not run: 
plot(eval.similarity.correlation(RG65, DSM_Vectors, details=TRUE))
## End(Not run)