R: Reading different versions of linguistic multialignments.

read_align {qlcData}

R Documentation

Reading different versions of linguistic multialignments.

Description

Multialignments of strings are a central step in historical linguistics (quite similar to multialignments in bioinformatics). There is no consensus (yet) about the file structure for multialignments in linguistics. Currently, this function reads various flavours of multialignment and tries to harmonize them into a common internal R structure.

Usage

read_align(file, flavor)

Arguments

`file`	Multialignment to be read
`flavor`	Flavour of the multialignment file. Currently two flavours are implemented: `"PAD"` and `"BDPA"`

Details

The flavor "PAD" refers to the Phonetische Atlas Deutschlands, which provides multialignments for German dialects. The flavor "BDPA" refers to the Benchmark Database for Phonetic Alignments.
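
As a minimal sketch of how the two arguments fit together, the wrapper below checks the `flavor` value and the file path before delegating to `read_align`. The helper name `read_align_checked` and the placeholder path are assumptions made for illustration; they are not part of qlcData.

```r
# Hypothetical helper (not part of qlcData): validate inputs, then delegate.
read_align_checked <- function(file, flavor = c("PAD", "BDPA")) {
  # match.arg() stops with an error unless flavor is "PAD" or "BDPA"
  flavor <- match.arg(flavor)
  if (!file.exists(file)) {
    stop("File not found: ", file)
  }
  if (!requireNamespace("qlcData", quietly = TRUE)) {
    stop("Package 'qlcData' is required but not installed.")
  }
  qlcData::read_align(file, flavor = flavor)
}

# Usage with a placeholder path; replace it with a real alignment file.
# pad <- read_align_checked("path/to/alignment-file", flavor = "PAD")
# str(pad, max.level = 1)
```

Validating the flavour up front gives a clear error message for a misspelled value instead of relying on whatever the reader does with an unknown flavour.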

Value

Multialignment files often contain several different kinds of information. An attempt is made to turn the data into a list with the following elements:

`meta`	: Metadata
`align`	: The actual alignments as a dataframe. When IDs are present in the original file, they are used as rownames. Some attempt is made to add useful column names.
`doculects`	: The rows of the alignment normally are some kind of doculects ("languages", "dialects"). However, because these doculects might occur more than once (when two different, but cognate words from a language are included), these names are not used as rownames of `$align`, but presented separately here.
`annotations`	: The columns of a multialignment can have annotations, e.g. metathesis or orthographic standard. These annotations are saved here as a dataframe with the same number of columns as the `$align` dataframe. The name of the annotation is put in the rownames.
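
The structure described above can be illustrated with a hand-built toy list. This is only a sketch: all values, IDs, and column names are invented, and it assumes that `doculects` is a character vector parallel to the rows of `$align`, which the description suggests but does not state explicitly.

```r
# Toy alignment: three aligned words, one column per alignment position.
# The rownames stand in for IDs from the original file (invented here).
align <- data.frame(
  C1 = c("t", "d", "t"),
  C2 = c("a", "a", "o"),
  C3 = c("k", "-", "k"),
  row.names = c("id1", "id2", "id3"),
  stringsAsFactors = FALSE
)

# One annotation row with the same columns as the alignment;
# the annotation name goes into the rownames.
annotations <- data.frame(
  C1 = "t", C2 = "a", C3 = "g",
  row.names = "STANDARD",
  stringsAsFactors = FALSE
)

# Hand-built stand-in for the list described above (not real output).
mock <- list(
  meta        = list(dataset = "toy example"),
  align       = align,
  # "DialectA" occurs twice, so it could not serve as a unique rowname
  doculects   = c("DialectA", "DialectB", "DialectA"),
  annotations = annotations
)

# Select all alignment rows that belong to one doculect.
dialect_a_rows <- mock$align[mock$doculects == "DialectA", ]

# The annotations line up with the alignment column by column.
same_width <- ncol(mock$annotations) == ncol(mock$align)
```

Keeping the doculect names in a separate vector is what makes the logical indexing in `dialect_a_rows` work even when a doculect contributes more than one row.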

Author(s)

Michael Cysouw <cysouw@mac.com>

References

BDPA is available at https://alignments.lingpy.org. PAD is available at https://github.com/cysouw/PAD/

[Package qlcData version 0.3 Index]