R: Parse a dictionary in XML LIFT (Lexicon Interchange FormaT)...

read.lift {interlineaR}

R Documentation

Parse a dictionary in the XML LIFT (Lexicon Interchange FormaT) vocabulary and turn it into a set of data frames

Description

The dictionary is turned into a list of up to four data frames: "entries", "senses", "examples" and "relations". The data frames point to each other through IDs, following a relational data model.

Usage

read.lift(file, vernacular.languages, analysis.languages = "en",
  get.entry = TRUE, get.sense = TRUE, get.example = TRUE,
  get.relation = TRUE, entry.fields = available.entry.fields(),
  sense.fields = available.sense.fields(),
  example.fields = available.example.fields(),
  relation.fields = available.relation.fields(), simplify = FALSE,
  sep = ";")

Arguments

`file`	character vector of length one: the path to a LIFT XML document.
`vernacular.languages`	character vector: the code(s) of the vernacular language(s).
`analysis.languages`	character vector: the code(s) of the analysis language(s) used in the glosses and analyses.
`get.entry`	logical length-1 vector: include the entries table in the result?
`get.sense`	logical length-1 vector: include the senses table in the result?
`get.example`	logical length-1 vector: include the examples table in the result?
`get.relation`	logical length-1 vector: include the relations table in the result?
`entry.fields`	character vector: names of the fields to be included in the entries table. See available.entry.fields() for the complete list of the available fields.
`sense.fields`	character vector: names of the fields to be included in the senses table. See available.sense.fields() for the complete list of the available fields.
`example.fields`	character vector: names of the fields to be included in the examples table. See available.example.fields() for the complete list of the available fields.
`relation.fields`	character vector: names of the fields to be included in the relations table. See available.relation.fields() for the complete list of the available fields.
`simplify`	logical length-1 vector: if TRUE, columns containing only empty values are removed from all data frames.
`sep`	character vector: the character used to join multiple notes in the same language.

Details

"Field" in this document denote a piece of information in LIFT, such as the "gloss" in a sense or "citation form" of an entry. A field may correspond to several columns in the resulting data frame, since fields are multilingual. "gloss" is an analysis field, thus if two analysis.languages are declared, for instance "en" and "fr", then two columns will be present, gloss.en and gloss.fr, in the senses data frame. The "citation form" field, on the other hand, is an vernacular language field, thus if several vernacular fields are declared, several form columns will be present in the entries data frame.

Value

A list with up to four slots named "entries", "senses", "examples" and "relations", each slot containing a data frame.
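
Since the tables refer to each other through IDs, they can presumably be combined with ordinary data frame joins. The sketch below uses a small mock list rather than real read.lift output, so that it runs without the package. The column names entry.id, sense.id and form, and all the values, are hypothetical placeholders; the actual ID column names should be checked in the data frames returned by read.lift.

```r
# Mock data standing in for the result of read.lift(). The column names
# "entry.id", "sense.id" and "form" are hypothetical, as are all the values.
dictionary <- list(
  entries = data.frame(entry.id = c("e1", "e2"),
                       form = c("form 1", "form 2"),
                       stringsAsFactors = FALSE),
  senses = data.frame(sense.id = c("s1", "s2", "s3"),
                      entry.id = c("e1", "e1", "e2"),
                      gloss.en = c("gloss A", "gloss B", "gloss C"),
                      stringsAsFactors = FALSE)
)

# Slots are accessed by name, like any other list element.
print(names(dictionary))

# Join each sense to its parent entry through the shared ID column.
senses.with.entries <- merge(dictionary$senses, dictionary$entries,
                             by = "entry.id")
print(senses.with.entries)
```

With real data, the by argument of merge would be replaced by whichever ID column the senses and entries data frames actually share.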

References

http://code.google.com/p/lift-standard

Examples

path <- system.file("exampleData", "tuwariDictionary.lift", package="interlineaR")
dictionary <- read.lift(path, vernacular.languages="tww")

# Reduce the size of the data frames by filtering to columns actually containing something...
dictionary <- read.lift(path, vernacular.languages="tww", simplify=TRUE)

# Get information in the different analysis languages used in the document (English and Tok Pisin)
dictionary <- read.lift(path, vernacular.languages="tww", analysis.languages=c("en", "tpi"))

# Restrict to the entries and senses data frames, and explicitly ask for some fields:
dictionary <- read.lift(
  path,
  vernacular.languages="tww",
  get.example=FALSE,
  get.relation=FALSE,
  entry.fields=c("lexical-unit", "morph-type"),
  sense.fields=c("grammatical-info.value", "gloss", "definition",
                 "semantic-domain-ddp4", "grammatical-info.traits")
)

[Package interlineaR version 1.0 Index]