read.pangloss {interlineaR}    R Documentation

Read a file in the format used in the pangloss collection

Description

Reads a document in the XML format used by the Pangloss collection (http://lacito.vjf.cnrs.fr/pangloss/index_en.html), a large collection of interlinearized texts.

Usage

read.pangloss(url, DOI = NULL, get.texts = TRUE, get.sentences = TRUE,
  get.words = TRUE, get.morphemes = TRUE)

Arguments

url

a length-one character vector with the URL of the document to be imported (the example in the Examples section passes a local file path instead)

DOI

a unique identifier

get.texts

should the 'texts' data.frame be included in the result?

get.sentences

should the 'sentences' data.frame be included in the result?

get.words

should the 'words' data.frame be included in the result?

get.morphemes

should the 'morphemes' data.frame be included in the result?
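
The get.* arguments can be combined to import only some units. The following is a minimal sketch, assuming the package's bundled FOURMI.xml example file is installed and that a unit switched off with FALSE is simply left out of the returned list (the latter is an assumption drawn from the argument descriptions, not a confirmed behaviour):

```r
# Sketch only: runs the import when interlineaR is installed, otherwise does nothing.
if (requireNamespace("interlineaR", quietly = TRUE)) {
  path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")

  # Ask only for the 'texts' and 'sentences' data frames.
  corpus <- interlineaR::read.pangloss(
    path,
    get.words = FALSE,
    get.morphemes = FALSE
  )

  # List the units that were actually returned.
  print(names(corpus))
}
```

Skipping units that are not needed may shorten the import of large documents.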

Value

a list with up to four slots, one per unit, named "texts", "sentences", "words" and "morphemes". Each slot contains a data frame in which each row describes one occurrence of the corresponding unit.
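
One way to get an overview of the returned list is to count the rows of each data frame. This is a hedged sketch that assumes the bundled FOURMI.xml example file is available and relies only on the structure described above (a named list of data frames); it makes no assumption about column names:

```r
# Sketch only: runs the import when interlineaR is installed, otherwise does nothing.
if (requireNamespace("interlineaR", quietly = TRUE)) {
  path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")
  corpus <- interlineaR::read.pangloss(path)

  # Number of occurrences (rows) recorded for each unit.
  sizes <- sapply(corpus, nrow)
  print(sizes)

  # Column layout of one unit; str() works on any data frame.
  str(corpus$sentences)
}
```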

References

http://lacito.vjf.cnrs.fr/pangloss/index_en.html

Examples

path <- system.file("exampleData", "FOURMI.xml", package="interlineaR")
corpus <- read.pangloss(path)
head(corpus$morphemes)

[Package interlineaR version 1.0 Index]