snp.recode {ASRgenomics}    R Documentation

Recodes the molecular matrix M for downstream analyses

Description

Reads molecular data coded as bi-allelic nucleotide bases (AA, AG, GG, CC, etc.) and recodes them as 0, 1, 2 and NA for use in downstream analyses.

Usage

snp.recode(
  M = NULL,
  map = NULL,
  marker = NULL,
  ref = NULL,
  alt = NULL,
  recoding = c("ATGCto012"),
  na.string = NA,
  rename.markers = TRUE,
  message = TRUE
)

Arguments

M

A character matrix with SNP data in full form (n x p), with n individuals and p markers. Individual and marker names are assigned to rownames and colnames, respectively. Data in the matrix are coded as AA, AG, GG, CC, etc. (default = NULL).

map

(Optional) A data frame with the map information, with p rows. If NULL, a dummy map is generated that assumes a single chromosome and sequential marker positions, and that includes the reference and alternative alleles (default = NULL).

marker

A character indicating the name of the column in data frame map with the identification of markers. This is mandatory if map is provided (default = NULL).

ref

A character indicating the name of the column in map containing the reference allele for recoding. If absent, the conversion is based on the major (most frequent) allele. A genotype carrying two copies of the allele specified in ref is coded as 2. This is mandatory if map is provided (default = NULL).

alt

A character indicating the name of the column in map containing the alternative allele for recoding. If absent, it is inferred from the data. A genotype carrying two copies of the allele specified in alt is coded as 0 (default = NULL).

recoding

A character indicating the recoding option to be performed. Currently, only recoding from nucleotide bases (AA, AG, ...) to allele counts is available ("ATGCto012") (default = "ATGCto012").

na.string

A character that is interpreted as a missing value (default = NA).

rename.markers

If TRUE, marker names (as provided in M) are expanded to store the reference and alternative alleles, for example from AX-88234566 to AX-88234566_C_A. Unidentified alleles are written as 0 (default = TRUE).

message

If TRUE, diagnostic messages are printed on screen (default = TRUE).
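The coding convention described for ref and alt (two copies of the reference allele give 2, two copies of the alternative give 0) can be illustrated with a small base R sketch. The helpers count_ref() and major_allele() are hypothetical names introduced here for illustration only. They are not part of ASRgenomics and do not claim to reproduce how snp.recode works internally, in particular for ties or malformed entries.

```r
# Hypothetical helper: count copies of the reference allele in
# two-base genotypes such as "AA" or "AT". Missing genotypes stay NA.
count_ref <- function(genotypes, ref) {
  first  <- substr(genotypes, 1, 1)
  second <- substr(genotypes, 2, 2)
  as.integer(first == ref) + as.integer(second == ref)
}

# Hypothetical helper: most frequent base in one marker column.
# Assumption: ties are broken alphabetically (first maximum in the table).
major_allele <- function(genotypes) {
  bases <- unlist(strsplit(genotypes[!is.na(genotypes)], ""))
  names(which.max(table(bases)))
}

# One marker scored on five individuals (made-up values).
g <- c("AA", "AT", "TT", NA, "AA")

# With no reference allele supplied, fall back to the major allele.
ref <- major_allele(g)
counts <- count_ref(g, ref)
counts
```

Here a homozygote for the major allele is counted as 2, a heterozygote as 1, and a homozygote for the other allele as 0, which matches the convention stated in the argument descriptions.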

Value

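A list with the following two elements:

Mrecode

The molecular matrix M recoded to 0, 1, 2 and NA.

map

The data frame with the map information, including the reference and alternative alleles.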

Examples

# Create bi-allelic base data set.
Mnb <- matrix(c(
  "A-",  NA, "GG",   "CC",   "AT",   "CC",   "AA",   "AA",
  "AAA", NA, "GG",   "AC",   "AT",   "CG",   "AA",   "AT",
  "AA",  NA, "GG",   "CC",   "AA",   "CG",   "AA",   "AA",
  "AA",  NA, "GG",   "AA",   "AA",    NA,    "AA",   "AA",
  "AT",  NA, "GG",   "AA",   "TT",   "CC",   "AT",   "TT",
  "AA",  NA,   NA,   "CC",    NA,    "GG",   "AA",   "AA",
  "AA",  NA,   NA,   "CC",   "TT",   "CC",   "AA",   "AT",
  "TT",  NA, "GG",   "AA",   "AA",   "CC",   "AA",   "AA"),
  ncol = 8, byrow = TRUE, dimnames = list(paste0("ind", 1:8),
                                       paste0("m", 1:8)))
Mnb

# Recode without map (but map is created).
Mr <- snp.recode(M = Mnb, na.string = NA)
Mr$Mrecode
Mr$map

# Create map.
mapnb <- data.frame(
  marker = paste0("m", 1:8),
  reference = c("A", "T", "G", "C", "T", "C", "A", "T"),
  alternative = c("T", "G", "T", "A", "A", "G", "T", "A")
)
mapnb

# Recode with map without alternative allele.
Mr <- snp.recode(M = Mnb, map = mapnb, marker = "marker", ref = "reference",
           na.string = NA, rename.markers = TRUE)
Mr$Mrecode
Mr$map

# Notice that the alternative allele is in the map as a regular variable,
# but in the names it is inferred from data (which might be 0 (missing)).

# Recode with map with alternative allele.
Mr <- snp.recode(M = Mnb, map = mapnb, marker = "marker",
           ref = "reference", alt = "alternative",
           na.string = NA, rename.markers = TRUE)
Mr$Mrecode
Mr$map # Now the alternative is also on the names.

# We can also recode without renaming the markers.
Mr <- snp.recode(M = Mnb, map = mapnb, marker = "marker", ref = "reference",
           na.string = NA, rename.markers = FALSE)
Mr$Mrecode
Mr$map # Marker names are kept as provided in M.
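
The example matrix Mnb contains entries such as "A-" and "AAA" that are not two-base genotypes. How snp.recode treats such entries is not stated on this page, so it may be useful to locate them before recoding. The following is a minimal base R sketch on a small made-up matrix with the same layout as Mnb. The object names M and bad are illustrative only.

```r
# Small character matrix mimicking the layout of Mnb (made-up values).
M <- matrix(c("A-",  "GG",
              "AAA", "AT",
              "AA",  NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("ind", 1:3), paste0("m", 1:2)))

# TRUE where an entry is non-missing and is not exactly two of A, C, G, T.
bad <- !is.na(M) & !grepl("^[ACGT]{2}$", M)

# Row and column positions of the non-conforming entries.
which(bad, arr.ind = TRUE)

# The offending values themselves.
M[bad]
```

Entries flagged this way can then be inspected, corrected, or set to missing before calling snp.recode, depending on the data source.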


[Package ASRgenomics version 1.1.4 Index]