genotype {genetics} | R Documentation |
Genotype or Haplotype Objects.
Description
genotype creates a genotype object.
haplotype creates a haplotype object.
is.genotype returns TRUE if x is of class genotype.
is.haplotype returns TRUE if x is of class haplotype.
as.genotype attempts to coerce its argument into an object of class genotype.
as.genotype.allele.count converts allele counts (0,1,2) into genotype pairs ("A/A", "A/B", "B/B").
as.haplotype attempts to coerce its argument into an object of class haplotype.
nallele returns the number of alleles in an object of class genotype.
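The allele-count conversion can be pictured with a small base-R sketch. It only illustrates the documented correspondence between counts (0,1,2) and pairs ("A/A", "A/B", "B/B"); it is not the package's implementation, and the names counts, pairs, and geno are made up for this illustration.

```r
# Illustration only: base R, no genetics package required.
alleles <- c("A", "B")
counts  <- c(0, 1, 2, NA, 1)   # assumed input: copies of the second allele

# Lookup table indexed by count + 1: 0 -> "A/A", 1 -> "A/B", 2 -> "B/B"
pairs <- c(paste(alleles[1], alleles[1], sep = "/"),
           paste(alleles[1], alleles[2], sep = "/"),
           paste(alleles[2], alleles[2], sep = "/"))

# A missing count stays missing because indexing with NA yields NA
geno <- pairs[counts + 1]
geno
```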
Usage
genotype(a1, a2=NULL, alleles=NULL, sep="/", remove.spaces=TRUE,
reorder = c("yes", "no", "default", "ascii", "freq"),
allow.partial.missing=FALSE, locus=NULL,
genotypeOrder=NULL)
haplotype(a1, a2=NULL, alleles=NULL, sep="/", remove.spaces=TRUE,
reorder="no", allow.partial.missing=FALSE, locus=NULL,
genotypeOrder=NULL)
is.genotype(x)
is.haplotype(x)
as.genotype(x, ...)
## S3 method for class 'allele.count'
as.genotype(x, alleles=c("A","B"), ... )
as.haplotype(x, ...)
## S3 method for class 'genotype'
print(x, ...)
nallele(x)
Arguments
x |
either an object of class |
a1 , a2 |
vector(s) or matrix containing two alleles for each individual. See details, below. |
alleles |
names (and order if |
sep |
character separator or column number used to divide
alleles when |
remove.spaces |
logical indicating whether spaces and tabs will be removed from a1 and a2 before processing. |
reorder |
how should alleles within an individual be reordered.
If |
allow.partial.missing |
logical indicating whether one allele is
permitted to be missing. When set to |
locus |
object of class locus, gene, or marker, holding information about the source of this genotype. |
genotypeOrder |
character vector of genotype/haplotype names, used by other functions to sort genotypes/haplotypes in the desired order |
... |
optional arguments |
Details
Genotype objects hold information on which gene or marker alleles were observed for different individuals. For each individual, two alleles are recorded.
The genotype class considers the stored alleles to be unordered, i.e., "C/T" is equivalent to "T/C". The haplotype class considers the order of the alleles to be significant so that "C/T" is distinct from "T/C".
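The difference between unordered and ordered allele pairs can be sketched in base R. The helper canonical below is hypothetical and not part of the genetics package; it merely sorts the two alleles so that both spellings of an unordered pair compare equal, which is the behaviour described above for the genotype class.

```r
# Hypothetical helper (illustration only): put each allele pair into a
# canonical order so that "T/C" and "C/T" become the same string.
canonical <- function(x, sep = "/") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(p) paste(sort(p), collapse = sep),
         character(1))
}

# Unordered (genotype-like) comparison: order is ignored
unordered_equal <- canonical("T/C") == canonical("C/T")

# Ordered (haplotype-like) comparison: order matters
ordered_equal <- "T/C" == "C/T"
```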
When calling genotype or haplotype:
If only a1 is provided and is a character vector, it is assumed that each element encodes both alleles. In this case, if sep is a character string, a1 is assumed to be coded as "Allele1<sep>Allele2". If sep is a numeric value, it is assumed that character locations 1:sep contain allele 1 and that remaining locations contain allele 2.
If a1 is a matrix, it is assumed that column 1 contains allele 1 and column 2 contains allele 2.
If a1 and a2 are both provided, each is assumed to contain one allele value so that the genotype for an individual is obtained by paste(a1,a2,sep="/").
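The two ways of splitting a single character vector can be sketched in base R. The function split_alleles is a hypothetical helper written for this illustration; it follows the rules described above but is not the code used inside genotype or haplotype.

```r
# Hypothetical helper (illustration only): split each element of a1 into
# two alleles using either a character separator or a numeric position.
split_alleles <- function(a1, sep = "/") {
  if (is.numeric(sep)) {
    # Characters 1:sep hold allele 1, the remaining characters hold allele 2
    allele1 <- substr(a1, 1, sep)
    allele2 <- substr(a1, sep + 1, nchar(a1))
  } else {
    # Input is coded as "Allele1<sep>Allele2"
    parts   <- strsplit(a1, sep, fixed = TRUE)
    allele1 <- vapply(parts, function(p) p[1], character(1))
    allele2 <- vapply(parts, function(p) p[2], character(1))
  }
  cbind(allele1, allele2)
}

by_char <- split_alleles(c("C/T", "T/T"), sep = "/")  # character separator
by_pos  <- split_alleles(c("CT", "TT"), sep = 1)      # numeric position
```

Both calls yield the same two-column layout that the matrix form of a1 expects: allele 1 in column 1 and allele 2 in column 2.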
If remove.spaces is TRUE (the default), any whitespace contained in a1 and a2 is removed when the genotypes are created. If whitespace is used as the separator (e.g. "C C", "C T", ...), be sure to set remove.spaces to FALSE.
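Why a whitespace separator conflicts with whitespace removal can be seen in a short base-R sketch. The gsub call below is an assumption about what stripping spaces and tabs amounts to, not the package's own code, and the names raw and stripped are invented for the illustration.

```r
# Illustration only: whitespace-separated genotypes
raw <- c("C C", "C T", "T T")

# Assumed equivalent of whitespace removal: drop spaces and tabs
stripped <- gsub("[ \t]", "", raw)

# After stripping, the " " separator no longer occurs in any element
separator_left <- grepl(" ", stripped, fixed = TRUE)
```

With the separator gone, the two alleles can no longer be told apart by splitting on " ", which is why remove.spaces should be FALSE for such input.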
When the alleles are explicitly specified using the alleles argument, all potential alleles not present in the list will be converted to NA.
NOTE: genotype assumes that the order of the alleles is not important (e.g., "A/C" == "C/A"). Use class haplotype if order is significant.
If genotypeOrder=NULL (the default setting), then expectedGenotypes is used to get the standard sorting order. Only unique values in genotypeOrder are used, which in turn means that the first occurrence prevails. When genotypeOrder is given some genotype names, but not all that appear in the data, the rest (those in the data and possible combinations based on allele variants) are automatically added at the end of genotypeOrder. This puts "missing" genotype names at the end of the sort order. This feature is especially useful when there are many allele variants, particularly for haplotypes. See examples.
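The completion rule for a partial genotypeOrder can be sketched in base R. This is only an illustration of the described behaviour (duplicates dropped, first occurrence kept, unlisted names appended at the end); the vectors requested and possible are made-up inputs, and the exact order in which the package appends the remaining names is an assumption here.

```r
# Illustration only: a partial ordering with a repeated entry
requested <- c("D/I", "D/D", "D/I")

# Assumed set of names present in the data or possible from the alleles
possible <- c("D/D", "D/I", "I/D", "I/I")

# Keep unique values; the first occurrence of each name prevails
ord <- unique(requested)

# Append every name that was not listed, so "missing" names sort last
ord <- c(ord, setdiff(possible, ord))
ord
```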
Value
The genotype class extends "factor" and haplotype extends genotype. Both classes have the following attributes:
levels |
character vector of possible genotype/haplotype values
stored coded by |
allele.names |
character vector of possible alleles. For a SNP, these might be c("A","T"). For a variable-length dinucleotide repeat this might be c("136","138","140","148"). |
allele.map |
matrix encoding how the factor levels correspond to
alleles. See the source code to |
genotypeOrder |
character, genotype/haplotype names in a defined order that can be used for sorting in various functions. Note that this slot stores both ordered and unordered genotypes, i.e. "A/B" and "B/A". |
Author(s)
Gregory R. Warnes greg@warnes.net and Friedrich Leisch.
See Also
HWE.test, allele, homozygote, heterozygote, carrier, summary.genotype, allele.count, sort.genotype, genotypeOrder, locus, gene, marker, and %in% for the default %in% method
Examples
# several examples of genotype data in different formats
example.data <- c("D/D","D/I","D/D","I/I","D/D",
"D/D","D/D","D/D","I/I","")
g1 <- genotype(example.data)
g1
example.data2 <- c("C-C","C-T","C-C","T-T","C-C",
"C-C","C-C","C-C","T-T","")
g2 <- genotype(example.data2,sep="-")
g2
example.nosep <- c("DD", "DI", "DD", "II", "DD",
"DD", "DD", "DD", "II", "")
g3 <- genotype(example.nosep,sep="")
g3
example.a1 <- c("D", "D", "D", "I", "D", "D", "D", "D", "I", "")
example.a2 <- c("D", "I", "D", "I", "D", "D", "D", "D", "I", "")
g4 <- genotype(example.a1,example.a2)
g4
example.mat <- cbind(a1=example.a1, a2=example.a2)
g5 <- genotype(example.mat)
g5
example.data5 <- c("D / D","D / I","D / D","I / I",
"D / D","D / D","D / D","D / D",
"I / I","")
g5 <- genotype(example.data5,remove.spaces=TRUE)
g5
# show how genotype and haplotype differ
data1 <- c("C/C", "C/T", "T/C")
data2 <- c("C/C", "T/C", "T/C")
test1 <- genotype( data1 )
test2 <- genotype( data2 )
test3 <- haplotype( data1 )
test4 <- haplotype( data2 )
test1==test2
test3==test4
test1=="C/T"
test1=="T/C"
test3=="C/T"
test3=="T/C"
## also
test1
test1
test3
test1
test1
test3
test3
## "Messy" example
m3 <- c("D D/\t D D","D\tD/ I", "D D/ D D","I/ I",
"D D/ D D","D D/ D D","D D/ D D","D D/ D D",
"I/ I","/ ","/I")
genotype(m3)
summary(genotype(m3))
m4 <- c("D D","D I","D D","I I",
"D D","D D","D D","D D",
"I I"," "," I")
genotype(m4,sep=1)
genotype(m4,sep=" ",remove.spaces=FALSE)
summary(genotype(m4,sep=" ",remove.spaces=FALSE))
m5 <- c("DD","DI","DD","II",
"DD","DD","DD","DD",
"II"," "," I")
genotype(m5,sep=1)
haplotype(m5,sep=1,remove.spaces=FALSE)
g5 <- genotype(m5,sep="")
h5 <- haplotype(m5,sep="")
heterozygote(g5)
homozygote(g5)
carrier(g5,"D")
g5[9:10] <- haplotype(m4,sep=" ",remove.spaces=FALSE)[1:2]
g5
g5[9:10]
allele(g5[9:10],1)
allele(g5,1)[9:10]
# drop unused alleles
g5[9:10,drop=TRUE]
h5[9:10,drop=TRUE]
# Convert allele.counts into genotype
x <- c(0,1,2,1,1,2,NA,1,2,1,2,2,2)
g <- as.genotype.allele.count(x, alleles=c("C","T") )
g
# Use of genotypeOrder
example.data <- c("D/D","D/I","I/D","I/I","D/D",
"D/D","D/I","I/D","I/I","")
summary(genotype(example.data))
genotypeOrder(genotype(example.data))
summary(genotype(example.data, genotypeOrder=c("D/D", "I/I", "D/I")))
summary(genotype(example.data, genotypeOrder=c( "D/I")))
summary(haplotype(example.data, genotypeOrder=c( "I/D", "D/I")))
example.data <- genotype(example.data)
genotypeOrder(example.data) <- c("D/D", "I/I", "D/I")
genotypeOrder(example.data)