KrennPPV {corpora}    R Documentation

German PP-Verb collocation candidates annotated by Brigitte Krenn (2000)

Description

This data set lists 5102 frequent combinations of verbs and prepositional phrases (PP) extracted from a German newspaper corpus. The collocational status of each PP-verb combination was manually annotated by Brigitte Krenn (2000). In addition, pre-computed scores of several standard association measures are provided.

The KrennPPV candidate set forms part of the data used in the evaluation study of Evert & Krenn (2005).
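The manual annotations and pre-computed scores lend themselves to ranking-based evaluation. The following is a minimal sketch of n-best precision, i.e. the proportion of true collocations among the n highest-scoring candidates. The helper name `nbest_precision` and the toy data frame are hypothetical; only the column names `is.colloc` and `t.score` come from this data set, and the sketch is not necessarily the exact procedure used by Evert & Krenn (2005).

```r
# Hypothetical helper: precision among the n highest-scoring candidates.
nbest_precision <- function(scores, is_tp, n) {
  stopifnot(length(scores) == length(is_tp), n >= 1, n <= length(scores))
  ranked <- order(scores, decreasing = TRUE)  # best candidates first
  mean(is_tp[ranked[seq_len(n)]])             # share of TRUE among the top n
}

# Toy stand-in using two KrennPPV column names; the values are made up.
toy <- data.frame(
  is.colloc = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  t.score   = c(9.1, 7.4, 6.8, 3.2, 1.5)
)

nbest_precision(toy$t.score, toy$is.colloc, n = 2)  # 0.5
nbest_precision(toy$t.score, toy$is.colloc, n = 3)  # 2/3
```

With the real data, the analogous call would be `nbest_precision(KrennPPV$t.score, KrennPPV$is.colloc, n = 100)`. Tied scores are kept in row order in this sketch; a careful evaluation may want to handle ties differently.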

Usage


KrennPPV
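Because the data set is an ordinary data frame, base R is enough for a first look. The sketch below counts the candidates in each annotation category. `annotation_counts` is a hypothetical helper, and the toy rows only mimic the logical columns documented under Format; they are not real annotations.

```r
# Hypothetical helper: count candidates per annotation category.
annotation_counts <- function(df) {
  c(candidates = nrow(df),
    colloc     = sum(df$is.colloc),
    SVC        = sum(df$is.SVC),
    figur      = sum(df$is.figur))
}

# Made-up rows with the documented logical columns (illustration only).
toy <- data.frame(
  is.colloc = c(TRUE, TRUE, FALSE, FALSE),
  is.SVC    = c(TRUE, FALSE, FALSE, FALSE),
  is.figur  = c(FALSE, TRUE, FALSE, FALSE)
)
annotation_counts(toy)

# If the corpora package is installed, apply the same helper to the real data.
if (requireNamespace("corpora", quietly = TRUE)) {
  data("KrennPPV", package = "corpora")
  print(annotation_counts(KrennPPV))
}
```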

Format

A data frame with 5102 rows and the following columns:

PP:

the prepositional phrase, represented by preposition and lemma of the nominal head (character). Preposition-article fusion is indicated by a + sign. For example, the prepositional phrase im letzten Jahr would appear as in:Jahr in the data set.

verb:

the verb lemma (character). Separated particle verbs have been recombined.

is.colloc:

whether the PP-verb combination is a lexical collocation (logical)

is.SVC:

whether a PP-verb collocation is a support verb construction (logical)

is.figur:

whether a PP-verb collocation is a figurative expression (logical)

freq:

co-occurrence frequency of the PP-verb combination within clauses (integer)

MI:

Mutual Information association measure

Dice:

Dice coefficient association measure

z.score:

z-score association measure

t.score:

t-score association measure

chisq:

chi-squared association measure (without Yates' continuity correction)

chisq.corr:

chi-squared association measure (with Yates' continuity correction)

log.like:

log-likelihood association measure

Fisher:

Fisher's exact test as an association measure (negative logarithm of one-sided p-value)

See Evert (2008) and http://www.collocations.de/AM/ for details on these association measures.
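For orientation, several of the simpler measures can be written in terms of the observed co-occurrence frequency O and the expected frequency E = f1 * f2 / N under independence, following the standard definitions discussed in Evert (2008). The sketch below uses made-up frequencies. The function name, the argument names (`O`, `f1`, `f2`, `N`), and the base-2 logarithm for MI are assumptions, so the results need not match the pre-computed columns exactly.

```r
# Hypothetical helper: simple association measures from frequency counts.
#   O  = co-occurrence frequency of the pair
#   f1 = marginal frequency of the first component (e.g. the PP)
#   f2 = marginal frequency of the second component (e.g. the verb)
#   N  = sample size (total number of pair tokens)
simple_measures <- function(O, f1, f2, N) {
  E <- f1 * f2 / N                 # expected frequency under independence
  c(MI      = log2(O / E),         # base-2 logarithm is an assumption here
    Dice    = 2 * O / (f1 + f2),
    z.score = (O - E) / sqrt(E),
    t.score = (O - E) / sqrt(O))
}

# Made-up counts chosen so that E = 1.
simple_measures(O = 16, f1 = 100, f2 = 100, N = 10000)
```

The chi-squared, log-likelihood, and Fisher measures need the full 2x2 contingency table rather than just O and E, so they are left out of this sketch.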

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)

References

Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, chapter 58, pages 1212–1248. Mouton de Gruyter, Berlin, New York.

Evert, Stefan and Krenn, Brigitte (2005). Using small random samples for the manual evaluation of statistical association measures. Computer Speech and Language, 19(4), 450–466.

Krenn, Brigitte (2000). The Usual Suspects: Data-Oriented Models for the Identification and Representation of Lexical Collocations, volume 7 of Saarbrücken Dissertations in Computational Linguistics and Language Technology. DFKI & Universität des Saarlandes, Saarbrücken, Germany.


[Package corpora version 0.6 Index]