
admix_label_cols {popkin}

R Documentation

Label ancestries based on best match to individual labels

Description

Returns a label for each ancestry (column) of an admixture matrix, chosen as the subpopulation label that best matches that ancestry. More specifically, the admixture proportions are first averaged over all individuals (rows) within each subpopulation, and each ancestry is then assigned the subpopulation label for which its average admixture proportion is highest. If two or more ancestries match the same label, they are made unique by appending their order of appearance (if the label is "A", the first column that matches it is labeled "A1", the next one "A2", etc.).
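The averaging and best-match steps described above can be sketched in base R as follows. This is a minimal illustration, not popkin's actual implementation. The function name `best_label_sketch` is hypothetical, ties are assumed to resolve to the first maximum (the behavior of `which.max`), and the uniquifying step is left out, so repeated labels are returned as-is.

```r
# Hypothetical sketch (not popkin's source): label each ancestry (column of Q)
# by the subpopulation whose average admixture proportion is highest.
best_label_sketch <- function(Q, labs) {
    # sum the rows of Q within each subpopulation (one row per sorted label)
    sums <- rowsum(Q, labs)
    # number of individuals per subpopulation, in the same order as `sums`
    counts <- as.vector(table(labs)[rownames(sums)])
    # average admixture proportions: row i of `sums` divided by counts[i]
    Q_avg <- sums / counts
    # for each ancestry (column), pick the subpopulation with the highest
    # average; which.max returns the first maximum in case of ties
    rownames(Q_avg)[apply(Q_avg, 2, which.max)]
}

# toy input: two individuals from "P", one from "R"
Q_toy <- rbind(
    c(0.9, 0.1),
    c(0.8, 0.2),
    c(0.3, 0.7)
)
best_label_sketch(Q_toy, c("P", "P", "R"))
# returns c("P", "R")
```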

Usage

admix_label_cols(Q, labs)

Arguments

`Q`	The admixture proportions matrix, with individuals along the rows and ancestries along the columns.
`labs`	Subpopulation labels for individuals (rows of `Q`).
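Before calling `admix_label_cols`, a caller might check that the two arguments are consistent with each other. The sketch below is hypothetical (`check_admix_inputs_sketch` is not part of popkin) and assumes that each row of `Q` holds one individual's admixture proportions, which should be non-negative and sum to 1; the tolerance `tol` is an arbitrary choice.

```r
# Hypothetical pre-flight checks (not part of popkin), assuming each row of Q
# holds one individual's admixture proportions.
check_admix_inputs_sketch <- function(Q, labs, tol = 1e-6) {
    stopifnot(
        is.matrix(Q),
        is.numeric(Q),
        length(labs) == nrow(Q),          # one label per individual (row)
        all(Q >= 0),                      # proportions are non-negative
        all(abs(rowSums(Q) - 1) < tol)    # each row sums to 1 (within tol)
    )
    invisible(TRUE)
}

Q_ok <- rbind(c(0.2, 0.8), c(0.6, 0.4))
check_admix_inputs_sketch(Q_ok, c("X", "Y"))  # passes silently
```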

Value

The best label assignments for the ancestries (columns of `Q`), made unique by appended indexes if two or more ancestries share the same label.
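The renaming of repeated labels can be sketched in base R as follows. This is a hypothetical helper (`make_labels_unique_sketch` is not a popkin function) that follows the rule stated in the Description: labels that occur once are kept as-is, and every occurrence of a repeated label gets an index in order of appearance.

```r
# Hypothetical helper (not part of popkin): append an index to labels that
# occur more than once, numbering them in order of appearance.
make_labels_unique_sketch <- function(best) {
    counts <- table(best)
    repeated <- names(counts)[counts > 1]
    for (lab in repeated) {
        idx <- which(best == lab)
        best[idx] <- paste0(lab, seq_along(idx))
    }
    best
}

make_labels_unique_sketch(c("A", "B", "A", "C", "A"))
# returns c("A1", "B", "A2", "C", "A3")
```

Base R's `make.unique()` is not a drop-in substitute here: it leaves the first occurrence unchanged and appends `.1`, `.2`, ... to later ones (for example `"A"`, `"A.1"`), rather than numbering every occurrence as `"A1"`, `"A2"`.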

Examples

library(popkin)

# toy admixture matrix with labels for individuals (rows) that match well with ancestries (columns)
Q <- matrix(
    c(
        0.1, 0.8, 0.1,
        0.1, 0.7, 0.2,
        0.0, 0.4, 0.6,
        0.0, 0.3, 0.7,
        0.9, 0.0, 0.1
    ),
    nrow = 5,
    ncol = 3,
    byrow = TRUE
)
labs <- c('X', 'X', 'Y', 'Y', 'Z')

# to calculate matches and save as column names, do this:
colnames( Q ) <- admix_label_cols( Q, labs )

# expected column names: c('Z', 'X', 'Y')