| serbianLex {ndl} | R Documentation |
Serbian lexicon with 1187 prime-target pairs.
Description
The 1187 prime-target pairs and their lexical properties used in the simulation study of Experiment 1 of Baayen et al. (2011).
Usage
data(serbianLex)
Format
A data frame with 1187 observations on the following 14 variables:
TargetA factor specifying the target noun form
PrimeA factor specifying the prime noun form
PrimeLemmaA factor specifying the lemma of the prime
TargetLemmaA factor specifying the target lemma
LengthA numeric vector with the length in letters of the target
WeightedREA numeric vector with the weighted relative entropy of the prime and target inflectional paradigms
NormLevenshteinDistA numeric vector with the normalized Levenshtein distance of prime and target forms
TargetLemmaFreqA numeric vector with log frequency of the target lemma
PrimeSurfFreqA numeric vector with log frequency of the prime form
PrimeConditionA factor with prime conditions, levels:
DD,DSSD,SSCosineSimA numeric vector with the cosine similarity of prime and target vector space semantics
IsMascA vector of logicals,
TRUEif the noun is masculine.TargetGenderA factor with the gender of the target, levels:
f,m, andnTargetCaseA factor specifying the case of the target noun, levels:
acc,dat,nomMeanLogObsRTMean log-transformed observed reaction time
References
Baayen, R. H., Milin, P., Filipovic Durdevic, D., Hendrix, P. and Marelli, M. (2011), An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118, 438-482.
Examples
# calculate the weight matrix for the full set of Serbian nouns
data(serbian)
serbian$Cues <- orthoCoding(serbian$WordForm, grams=2)
serbian$Outcomes <- serbian$LemmaCase
sw <- estimateWeights(cuesOutcomes=serbian)
# calculate the meaning activations for all unique word forms
desiredItems <- unique(serbian["Cues"])
desiredItems$Outcomes <- ""
activations <- estimateActivations(desiredItems, sw)$activationMatrix
rownames(activations) <- unique(serbian[["WordForm"]])
activations <- activations + abs(min(activations))
activations[1:5,1:6]
# calculate simulated latencies for the experimental materials
data(serbianLex)
syntax <- c("acc", "dat", "gen", "ins", "loc", "nom", "Pl", "Sg")
we <- 0.4 # compound cue weight
strengths <- rep(0, nrow(serbianLex))
for(i in 1:nrow(serbianLex)) {
target <- serbianLex$Target[i]
prime <- serbianLex$Prime[i]
targetLemma <- as.character(serbianLex$TargetLemma[i])
primeLemma <- as.character(serbianLex$PrimeLemma[i])
targetOutcomes <- c(targetLemma, primeLemma, syntax)
primeOutcomes <- c(targetLemma, primeLemma, syntax)
p <- activations[target, targetOutcomes]
q <- activations[prime, primeOutcomes]
strengths[i] <- sum((q^we)*(p^(1-we)))
}
serbianLex$SimRT <- -strengths
lengthPenalty <- 0.3
serbianLex$SimRT2 <- serbianLex$SimRT +
(lengthPenalty * (serbianLex$Length>5))
cor.test(serbianLex$SimRT, serbianLex$MeanLogObsRT)
cor.test(serbianLex$SimRT2, serbianLex$MeanLogObsRT)
serbianLex.lm <- lm(SimRT2 ~ Length + WeightedRE*IsMasc +
NormLevenshteinDist + TargetLemmaFreq +
PrimeSurfFreq + PrimeCondition, data=serbianLex)
summary(serbianLex.lm)