read_align {qlcData}    R Documentation
Reading different versions of linguistic multialignments.
Description
Multialignment of strings is a central step in historical linguistics (quite similar to multiple alignment in bioinformatics). There is no consensus (yet) about the file structure for multialignments in linguistics. Currently, this function reads various flavors of multialignment files and tries to harmonize the resulting internal R structure.
Usage
read_align(file, flavor)
Arguments
file: The multialignment file to be read.
flavor: The flavor (format) of the file. Currently two flavors are implemented: "PAD" and "BDPA" (see Details).
Details
The flavor "PAD" refers to the Phonetische Atlas Deutschlands, which provides multialignments for German dialects. The flavor "BDPA" refers to the Benchmark Database for Phonetic Alignments.
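The following is a minimal usage sketch, assuming qlcData is installed and that read_align() accepts the two flavor names listed above. The wrapper read_many_aligns() is hypothetical and not part of qlcData; it only shows how the flavor argument might be validated before several files of the same flavor are read in one call.

```r
# Hypothetical helper (NOT part of qlcData): read several multialignment files
# of the same flavor. Assumes read_align(file, flavor) as documented above.
read_many_aligns <- function(files, flavor = c("BDPA", "PAD")) {
  # match.arg() rejects anything other than the two documented flavors
  # and defaults to the first choice ("BDPA") when flavor is not given
  flavor <- match.arg(flavor)
  if (!requireNamespace("qlcData", quietly = TRUE)) {
    stop("package 'qlcData' is required")
  }
  # parse each file; the result is a list with one multialignment per file
  result <- lapply(files, qlcData::read_align, flavor = flavor)
  names(result) <- basename(files)
  result
}
```

A call such as read_many_aligns(c("file1.txt", "file2.txt"), flavor = "PAD") would then return a list named by file; the file names here are placeholders. Checking the flavor up front makes a typo fail immediately instead of partway through a batch of files.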
Value
Multialignment files often contain many different kinds of information. read_align attempts to turn the data into a list with the following elements:
meta: Metadata.
align: The actual alignments as a dataframe. When IDs are present in the original file, they are used as rownames. Some attempt is made to add useful column names.
doculects: The rows of the alignment normally represent some kind of doculect ("languages", "dialects"). However, because a doculect might occur more than once (when two different but cognate words from one language are included), these names are not used as rownames of the alignment.
annotations: The columns of a multialignment can have annotations, e.g. metathesis or orthographic standard. These annotations are saved here as a dataframe with the same number of columns as the alignment.
Author(s)
Michael Cysouw <cysouw@mac.com>
References
BDPA is available at https://alignments.lingpy.org. PAD is available at https://github.com/cysouw/PAD/