mat_scramble {gscramble}    R Documentation

Scramble a matrix of genotype data

Description

This function assumes that M is a matrix with L rows (one per marker) and 2 * N columns, where N is the number of individuals. The data can be permuted in two ways. In the first, obtained with preserve_haplotypes = FALSE, the positions of missing data within the matrix are held constant, but all non-missing sites within a row (i.e. all gene copies at a locus) are scrambled amongst the samples. In the second, obtained with preserve_haplotypes = TRUE, only the columns are permuted. This preserves any haplotypes present in the data, and should only be used if haplotypes have been inferred in the individuals.
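
The two permutation modes described above can be illustrated with a minimal base-R sketch. It is not the gscramble implementation; the helper names scramble_within_rows and scramble_columns and the toy matrix are assumptions made for illustration only.

```r
# Illustrative sketch only; not the gscramble source code.
set.seed(1)

# Toy data: 3 markers (rows) x 4 gene copies (columns), i.e. 2 diploid individuals
M <- matrix(
  c("A", "C", NA,
    "A", "T", "G",
    "G", "C", "G",
    NA,  "T", "T"),
  nrow = 3
)

# Mode 1 (analogous to preserve_haplotypes = FALSE): within each row, shuffle
# the non-missing entries among the non-missing positions; NAs stay in place.
scramble_within_rows <- function(M) {
  out <- M
  for (i in seq_len(nrow(M))) {
    ok <- which(!is.na(M[i, ]))
    # index through sample.int() so a single non-missing entry is handled safely
    out[i, ok] <- M[i, ok[sample.int(length(ok))]]
  }
  out
}

# Mode 2 (analogous to preserve_haplotypes = TRUE): permute whole columns,
# so each column (haplotype) stays intact.
scramble_columns <- function(M) {
  M[, sample.int(ncol(M)), drop = FALSE]
}

S_rows <- scramble_within_rows(M)
S_cols <- scramble_columns(M)
```

In the first mode every row keeps its own set of alleles and its missing-data pattern; in the second, every column of the result is an unchanged column of the input.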

Usage

mat_scramble(
  M,
  preserve_haplotypes = FALSE,
  row_groups = NULL,
  preserve_individuals = FALSE
)

Arguments

M

a matrix with L rows (number of markers) and 2 * N columns, where N is the number of individuals. Missing data must be coded as NA.

preserve_haplotypes

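logical indicating whether haplotypes should be preserved. If TRUE, only whole columns are permuted; if FALSE, the gene copies at each locus are scrambled amongst the samples.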

row_groups

if not NULL, must be a list of indexes of adjacent rows that are all in the same group. For example: list(1:10, 11:15, 16:30). The groups should be in order and complete, so that together they cover every row of M. In practice, these should correspond to the indexes of markers on different chromosomes.
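
One way to build such a list is sketched below, assuming a hypothetical character vector chrom that gives the chromosome of each marker in the same order as the rows of M. The vector and its labels are made up for illustration.

```r
# Hypothetical per-marker chromosome labels, in the same order as the rows of M
chrom <- c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2", "chr2")

# Setting the factor levels to order of first appearance keeps the groups in row order
row_groups <- unname(split(seq_along(chrom), factor(chrom, levels = unique(chrom))))

# Concatenating the groups must give 1..L: this confirms they are adjacent,
# in order, and complete
stopifnot(identical(unlist(row_groups), seq_along(chrom)))
```

The check fails if markers from the same chromosome are not stored in adjacent rows, in which case the rows of M would need to be sorted by chromosome first.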

preserve_individuals

logical indicating whether the genes within each individual should stay together. The examples below also pass the string "BY_CHROM" to preserve individuals by chromosome.

Details

An additional way of permuting is available: if preserve_individuals = TRUE, then entire individuals are permuted. If preserve_haplotypes = FALSE, then the gene copies at each locus are randomly ordered within each individual before the individuals are permuted. If preserve_haplotypes = TRUE, then that initial permutation is not done. preserve_haplotypes = TRUE should only be used if the individuals are phased and that phasing is represented in how the genotypes are stored in the matrix.
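
The behaviour described above can be sketched in base R as follows. This is not the gscramble implementation: the function permute_individuals is hypothetical, it ignores missing data and row_groups, and it assumes that columns 2i - 1 and 2i hold the two gene copies of individual i.

```r
# Illustrative sketch only; not the gscramble source code.
# Assumption: columns 2i - 1 and 2i hold the two gene copies of individual i.
set.seed(1)

permute_individuals <- function(M, preserve_haplotypes = FALSE) {
  N <- ncol(M) / 2
  out <- M
  if (!preserve_haplotypes) {
    # At each locus, randomly swap (or keep) the two gene copies within each individual
    for (i in seq_len(N)) {
      cols <- c(2 * i - 1, 2 * i)
      swap <- sample(c(TRUE, FALSE), nrow(M), replace = TRUE)
      out[swap, cols] <- M[swap, rev(cols)]
    }
  }
  # Move individuals around as intact pairs of columns
  new_order <- sample.int(N)
  out[, as.vector(rbind(2 * new_order - 1, 2 * new_order)), drop = FALSE]
}

# Toy matrix: 3 markers x 2 individuals, entries named individual.marker.copy
M <- matrix(
  paste(rep(1:2, each = 6), rep(1:3, 4), rep(c("a", "b"), each = 3), sep = "."),
  nrow = 3
)
S <- permute_individuals(M)
S_phased <- permute_individuals(M, preserve_haplotypes = TRUE)
```

In S, each pair of columns still contains alleles from a single individual, but the "a" and "b" copies may have traded places at some loci. In S_phased, every column is an unchanged column of M.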

Value

This function returns a matrix of the same dimensions and storage.mode as the input, M; however, the elements have been permuted according to the options specified by the user.

Examples

# make a matrix with alleles named as I.M.g, where I is individual
# number, M is marker number, and g is either "a" or "b" depending
# on which gene copy in the diploid it is.  4 indivs and 7 markers...
Mat <- matrix(
 paste(
   rep(1:4, each = 7 * 2),
   rep(1:7, 4 * 2),
   rep(c("a", "b"), each = 7),
   sep = "."
 ),
 nrow = 7
)

# without preserving haplotypes
S1 <- mat_scramble(Mat)

# preserving haplotypes with markers 1-7 all on one chromosome
S2 <- mat_scramble(Mat, preserve_haplotypes = TRUE)

# preserving haplotypes with markers 1-3 on one chromosome and 4-7 on another
S3 <- mat_scramble(Mat, preserve_haplotypes = TRUE, row_groups = list(1:3, 4:7))

# preserving individuals, but not haplotypes, with two chromosomes
S4 <- mat_scramble(Mat, row_groups = list(1:3, 4:7), preserve_individuals = TRUE)

# preserving individuals by chromosome, but not haplotypes, with two chromosomes
S5 <- mat_scramble(Mat, row_groups = list(1:3, 4:7), preserve_individuals = "BY_CHROM")

# preserving individuals by chromosome, and preserving haplotypes, with two chromosomes
S6 <- mat_scramble(Mat, row_groups = list(1:3, 4:7),
                 preserve_individuals = "BY_CHROM", preserve_haplotypes = TRUE)

[Package gscramble version 1.0.1 Index]