R: Read a file in the format used in the pangloss collection

read.pangloss {interlineaR}

R Documentation

Read a file in the format used in the pangloss collection

Description

Reads a file in the format used in the pangloss collection (http://lacito.vjf.cnrs.fr/pangloss/index_en.html), a large collection of interlinearized texts, and returns its content as a list of data frames.

Usage

read.pangloss(url, DOI = NULL, get.texts = TRUE, get.sentences = TRUE,
  get.words = TRUE, get.morphemes = TRUE)

Arguments

`url`	a character vector of length one with the URL of the document to be imported
`DOI`	a unique identifier
`get.texts`	should the 'texts' data.frame be included in the result?
`get.sentences`	should the 'sentences' data.frame be included in the result?
`get.words`	should the 'words' data.frame be included in the result?
`get.morphemes`	should the 'morphemes' data.frame be included in the result?
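
As an illustration of the get.* flags, the sketch below imports only the sentence and morpheme tables. It assumes the interlineaR package is installed and reuses the example file shipped with it. Going by the argument descriptions, setting a flag to FALSE is expected to leave the corresponding data frame out of the result.

```r
library(interlineaR)

# Example Pangloss file shipped with the package
path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")

# Skip the 'texts' and 'words' tables; keep the defaults (TRUE) for the others
corpus <- read.pangloss(path, get.texts = FALSE, get.words = FALSE)

# List the slots that were actually returned
names(corpus)
```

Turning off tables that are not needed keeps the result smaller when only one level of analysis is of interest.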

Value

A list with up to four slots corresponding to different units, named "texts", "sentences", "words", and "morphemes". Each slot contains a data frame in which each row describes one occurrence of the corresponding unit.
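
A minimal sketch of how the returned list might be inspected, assuming the interlineaR package is installed and using its bundled example file. The variable name row_counts is illustrative only, and no column names are assumed, since this page does not list them.

```r
library(interlineaR)

path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")
corpus <- read.pangloss(path)

# Names of the slots present in the result
names(corpus)

# Number of rows (one per occurrence of the unit) in each data frame
row_counts <- vapply(corpus, nrow, integer(1))
row_counts

# Column structure of one of the tables
str(corpus$sentences)
```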

References

http://lacito.vjf.cnrs.fr/pangloss/index_en.html

Examples

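library(interlineaR)
# Path to the example Pangloss file shipped with the package
path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")
corpus <- read.pangloss(path)
# First rows of the 'morphemes' data frame
head(corpus$morphemes)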

[Package interlineaR version 1.0 Index]