codeMarkers {statgenGWAS} | R Documentation |
Code and impute markers
Description
codeMarkers codes markers in a gData object and optionally performs imputation of missing values as well.
The function performs the following steps:
1. Replace strings in naStrings by NA.
2. Remove genotypes with a fraction of missing values higher than nMissGeno.
3. Remove SNPs with a fraction of missing values higher than nMiss.
4. Recode SNPs to numerical values.
5. Remove SNPs with a minor allele frequency lower than MAF.
6. Optionally remove duplicate SNPs.
7. Optionally impute missing values.
8. Repeat steps 5 and 6 if missing values are imputed.
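The filtering and recoding steps (1 to 5) can be illustrated with a minimal base-R sketch. This is not the statgenGWAS implementation: the helper name sketchCodeMarkers is hypothetical, genotypes are assumed to be two-character strings such as "AA" or "AB", and the numerical coding is assumed to be the number of copies of the minor allele. MAC, keep, duplicate removal and imputation are left out.

```r
## Hypothetical helper, for illustration only; not part of statgenGWAS.
sketchCodeMarkers <- function(markers, naStrings = NA, nMissGeno = 1,
                              nMiss = 1, MAF = 0) {
  ## Step 1: replace strings in naStrings by NA.
  markers[markers %in% naStrings] <- NA
  ## Step 2: remove genotypes (rows) with too many missing values.
  markers <- markers[rowMeans(is.na(markers)) <= nMissGeno, , drop = FALSE]
  ## Step 3: remove SNPs (columns) with too many missing values.
  markers <- markers[, colMeans(is.na(markers)) <= nMiss, drop = FALSE]
  ## Step 4: recode each SNP to the number of copies of its minor allele
  ## (assumed coding; ties are resolved by taking the first allele).
  coded <- apply(markers, 2, function(snp) {
    alleles <- unlist(strsplit(snp[!is.na(snp)], split = ""))
    counts <- table(alleles)
    minor <- names(counts)[which.min(counts)]
    vapply(strsplit(snp, split = ""), function(g) {
      if (anyNA(g)) NA_real_ else as.numeric(sum(g == minor))
    }, numeric(1))
  })
  coded <- matrix(coded, nrow = nrow(markers), dimnames = dimnames(markers))
  ## Step 5: remove SNPs with a minor allele frequency lower than MAF.
  freq <- colMeans(coded, na.rm = TRUE) / 2
  maf <- pmin(freq, 1 - freq)
  coded[, !is.na(maf) & maf >= MAF, drop = FALSE]
}

## Small usage example: "N" is treated as missing, SNP3 is monomorphic.
m <- matrix(c("AA", "AB", "BB", "N",
              "AA", "AA", "AB", "AA",
              "BB", "BB", "BB", "BB"),
            nrow = 4,
            dimnames = list(paste0("IND", 1:4), paste0("SNP", 1:3)))
res <- sketchCodeMarkers(m, naStrings = "N", MAF = 0.1)
res
```

In this sketch the monomorphic SNP3 has a minor allele frequency of 0 and is dropped by the MAF filter, while the "N" entry stays NA because no imputation is done.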
Usage
codeMarkers(
gData,
refAll = "minor",
nMissGeno = 1,
nMiss = 1,
MAF = NULL,
MAC = NULL,
removeDuplicates = TRUE,
keep = NULL,
impute = TRUE,
imputeType = c("random", "fixed", "beagle"),
fixedValue = NULL,
naStrings = NA,
verbose = FALSE
)
Arguments
gData |
An object of class gData. |
refAll |
A character string indicating the reference allele used when
recoding markers. |
nMissGeno |
A numerical value between 0 and 1. Genotypes with a
fraction of missing values higher than nMissGeno will be removed. |
nMiss |
A numerical value between 0 and 1. SNPs with a fraction of
missing values higher than nMiss will be removed. |
MAF |
A numerical value between 0 and 1. SNPs with a Minor Allele
Frequency (MAF) below this value will be removed. Only one of MAF and MAC may be specified. |
MAC |
A numerical value. SNPs with a Minor Allele Count (MAC) below this
value will be removed. Only one of MAF and MAC may be specified. |
removeDuplicates |
Should duplicate SNPs be removed? |
keep |
A vector of SNPs that should never be removed in the whole process. |
impute |
Should imputation of missing values be done? |
imputeType |
A character string indicating what kind of imputation of
missing values should be done, one of "random", "fixed" or "beagle". |
fixedValue |
A numerical value used for replacing missing values in
case imputeType is "fixed". |
naStrings |
A character vector of strings to be treated as NA. |
verbose |
Should a summary of the performed steps be printed? |
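To make the impute, imputeType and fixedValue arguments more concrete, here is a hedged base-R sketch of two imputation styles on an already coded marker matrix. The helper sketchImpute is hypothetical and not part of statgenGWAS. The "fixed" branch follows the description of fixedValue. The "random" branch assumes that a missing value is drawn from the observed values of the same SNP, which is an assumption for illustration and may differ from what the package does. Beagle imputation is not sketched.

```r
## Hypothetical helper, for illustration only; not part of statgenGWAS.
sketchImpute <- function(coded, imputeType = c("random", "fixed"),
                         fixedValue = NULL) {
  imputeType <- match.arg(imputeType)
  if (imputeType == "fixed") {
    ## Replace every missing value by the same fixed value.
    stopifnot(is.numeric(fixedValue), length(fixedValue) == 1)
    coded[is.na(coded)] <- fixedValue
    return(coded)
  }
  ## Assumed meaning of "random": per SNP, draw replacements from the
  ## observed (non-missing) values of that SNP.
  for (j in seq_len(ncol(coded))) {
    miss <- is.na(coded[, j])
    obs <- coded[!miss, j]
    if (any(miss) && length(obs) > 0) {
      idx <- sample.int(length(obs), size = sum(miss), replace = TRUE)
      coded[miss, j] <- obs[idx]
    }
  }
  coded
}

## Small usage example on a 3 x 2 coded matrix with two missing values.
coded <- matrix(c(0, 1, NA, 2, NA, 2), nrow = 3)
fixed <- sketchImpute(coded, imputeType = "fixed", fixedValue = 1)
set.seed(1)
random <- sketchImpute(coded, imputeType = "random")
```

Drawing from the observed values keeps imputed entries within the range already present for that SNP, whereas a fixed value can introduce a code the SNP never showed.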
Value
A copy of the input gData object with markers replaced by coded and imputed markers.
References
S R Browning and B L Browning (2007) Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am J Hum Genet 81:1084-1097. doi:10.1086/521987
Examples
## Create markers
markers <- matrix(c(
  "AA", "AB", "AA", "BB", "BA", "AB", "AA", "AA", NA, "AA",
  "AA", "AA", "BB", "BB", "AA", "AA", "BB", "AA", NA, "AA",
  "AA", "BA", "AB", "BB", "AB", "AB", "AA", "BB", NA, "AA",
  "AA", "AA", "BB", "BB", "AA", "AA", "AA", "AA", NA, "AA",
  "AA", "AA", "BB", "BB", "AA", "BB", "BB", "BB", "AB", "AA",
  "AA", "AA", "BB", "BB", "AA", NA, "BB", "AA", NA, "AA",
  "AB", "AB", "BB", "BB", "BB", "AA", "BB", "BB", NA, "AB",
  "AA", "AA", NA, "BB", NA, "AA", "AA", "AA", "AA", "AA",
  "AA", NA, NA, "BB", "BB", "BB", "BB", "BB", "AA", "AA",
  "AA", NA, "AA", "BB", "BB", "BB", "AA", "AA", NA, "AA"),
  ncol = 10, byrow = TRUE,
  dimnames = list(paste0("IND", 1:10), paste0("SNP", 1:10)))
## create object of class 'gData'.
gData <- createGData(geno = markers)
## Code markers by minor allele, no imputation.
gDataCoded1 <- codeMarkers(gData = gData, impute = FALSE)
## Code markers by reference alleles, impute missings by fixed value.
gDataCoded2 <- codeMarkers(gData = gData,
                           refAll = rep(x = c("A", "B"), times = 5),
                           impute = TRUE, imputeType = "fixed",
                           fixedValue = 1)
## Code markers by minor allele, impute by random value.
gDataCoded3 <- codeMarkers(gData = gData, impute = TRUE,
                           imputeType = "random")