read.pangloss {interlineaR} | R Documentation |
Read a file in the format used in the pangloss collection
Description
The pangloss collection (http://lacito.vjf.cnrs.fr/pangloss/index_en.html) is a large collection of interlinearized texts.
Usage
read.pangloss(url, DOI = NULL, get.texts = TRUE, get.sentences = TRUE,
get.words = TRUE, get.morphemes = TRUE)
Arguments
url |
a length-one character vector giving the URL of the document to be imported |
DOI |
a unique identifier |
get.texts |
logical; should the 'texts' data.frame be included in the result? |
get.sentences |
logical; should the 'sentences' data.frame be included in the result? |
get.words |
logical; should the 'words' data.frame be included in the result? |
get.morphemes |
logical; should the 'morphemes' data.frame be included in the result? |
Value
A list with up to four slots, one per unit type, named "texts", "sentences", "words", and "morphemes". Each slot contains a data frame in which each row describes one occurrence of the corresponding unit.
References
http://lacito.vjf.cnrs.fr/pangloss/index_en.html
Examples
path <- system.file("exampleData", "FOURMI.xml", package="interlineaR")
corpus <- read.pangloss(path)
head(corpus$morphemes)
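The get.* arguments in the Usage section can be used to import only some of the unit levels. The following is a minimal sketch, assuming the interlineaR package is installed and that a slot is simply left out of the result when its flag is FALSE; the variable name morphemes_only is illustrative only.

```r
# Assumes interlineaR is installed; FOURMI.xml is the example file used above.
library(interlineaR)
path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")

# Skip the text-, sentence- and word-level tables; request morphemes only.
morphemes_only <- read.pangloss(path,
                                get.texts = FALSE,
                                get.sentences = FALSE,
                                get.words = FALSE)

names(morphemes_only)           # slots actually returned
nrow(morphemes_only$morphemes)  # number of morpheme occurrences
```

Restricting the import this way may be convenient when only one level of analysis is needed, since the returned list then holds fewer data frames.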
[Package interlineaR version 1.0 Index]