deSilvaFreq {polysat} | R Documentation |
Estimate Allele Frequencies with EM Algorithm
Description
This function uses the method of De Silva et al. (2005) to estimate allele frequencies under polysomic inheritance with a known selfing rate.
Usage
deSilvaFreq(object, self, samples = Samples(object),
loci = Loci(object), initNull = 0.15,
initFreq = simpleFreq(object[samples, loci]),
tol = 1e-08, maxiter = 1e4)
Arguments
object |
A |
self |
A number between 1 and 0, indicating the rate of selfing. |
samples |
An optional character vector indicating a subset of samples to use in the calculation. |
loci |
An optional character vector indicating a subset of loci for which to calculate allele frequencies. |
initNull |
A single value or numeric vector indicating initial frequencies to use for the null allele at each locus. |
initFreq |
A data frame containing allele frequencies (for non-null loci) to use
for initialization. This needs to be in the same format as the output
of |
tol |
The tolerance level for determining when the results have converged.
Where |
maxiter |
The maximum number of iterations that will be performed for each locus and population. |
Details
Most of the SAS code from the supplementary material of De Silva
et al.
(2005) is translated directly into the R code for this function. The
SIMSAMPLE (or CreateRandomSample in the SAS code) function is omitted
so that the actual allelic phenotypes from the dataset can be used
instead of simulated phenotypes. deSilvaFreq
loops through each locus and population, and in each loop tallies the
number of alleles and sets up matrices using GENLIST, PHENLIST, RANMUL, SELFMAT,
and CONVMAT as described in the paper.
Frequencies of each allelic phenotype are then tallied
across all samples in that population with non-missing data at the
locus. Initial allele
frequencies for that population and locus are then extraced from
initFreq
and adjusted according to initNull
. The EM
iteration then begins for that population and locus, as described in the
paper (EXPECTATION, GPROBS, and MAXIMISATION).
Each repetition of the EM algorithm includes an expectation and maximization step. The expectation step uses allele frequencies and the selfing rate to calculate expected genotype frequencies, then uses observed phenotype frequencies and expected genotype frequencies to estimate genotype frequencies for the population. The maximization step uses the estimated genotype frequencies to calculate a new set of allele frequencies. The process is repeated until allele frequencies converge.
In addition to returning a data frame of allele frequencies,
deSilvaFreq
also prints to the console the number of EM
repetitions used for each population and locus. When each locus and
each population is begun, a message is printed to the console so that
the user can monitor the progress of the computation.
Value
A data frame containing the estimated allele frequencies. The row names
are population names from PopNames(object)
. The first column
shows how many genomes each population has. All other columns represent
alleles (including one null allele per locus). These column names are
the locus name and allele name separated by a period.
Note
It is possible to exceed memory limits for R if a locus has too many
alleles in a population (e.g. 15 alleles in a tetraploid if the memory
limit is 1535 Mb, see memory.limit
).
De Silva et al. mention that their estimation method could be extended to the case of disomic inheritence. A method for disomic inheritence is not implemented here, as it would require knowledge of which alleles belong to which isoloci.
De Silva et al. also suggest a means of estimating the selfing rate with a least-squares method. Using the notation in the source code, this would be:
lsq <- smatt %*% EP - rvec
self <- as.vector((t(EP - rvec) %*% lsq)/(t(lsq) %*% lsq))
However, in my experimentation with this calculation, it sometimes yields selfing rates greater than one. For this reason, it is not implemented here.
Author(s)
Lindsay V. Clark
References
De Silva, H. N., Hall, A. J., Rikkerink, E., and Fraser, L. G. (2005) Estimation of allele frequencies in polyploids under certain patterns of inheritance. Heredity 95, 327–334
See Also
simpleFreq
, write.freq.SPAGeDi
,
GENLIST
Examples
## Not run:
## An example with a long run time due to the number of alleles
# create a dataset for this example
mygen <- new("genambig", samples=c(paste("A", 1:100, sep=""),
paste("B", 1:100, sep="")),
loci=c("loc1", "loc2"))
PopNames(mygen) <- c("PopA", "PopB")
PopInfo(mygen) <- c(rep(1, 100), rep(2, 100))
mygen <- reformatPloidies(mygen, output="one")
Ploidies(mygen) <- 4
Usatnts(mygen) <- c(2, 2)
Description(mygen) <- "An example for allele frequency calculation."
# create some genotypes at random for this example
for(s in Samples(mygen)){
Genotype(mygen, s, "loc1") <- sample(seq(120, 140, by=2),
sample(1:4, 1))
}
for(s in Samples(mygen)){
Genotype(mygen, s, "loc2") <- sample(seq(130, 156, by=2),
sample(1:4, 1))
}
# make one genotype missing
Genotype(mygen, "B4", "loc2") <- Missing(mygen)
# view the dataset
summary(mygen)
viewGenotypes(mygen)
# calculate the allele frequencies if the rate of selfing is 0.2
myfrequencies <- deSilvaFreq(mygen, self=0.2)
# view the results
myfrequencies
## End(Not run)
## An example with a shorter run time, for checking that the funciton
## is working. Genotype simulation is also a bit more realistic here.
# Create a dataset for the example.
mygen <- new("genambig", samples=paste("A", 1:100, sep=""), loci="loc1")
PopNames(mygen) <- "PopA"
PopInfo(mygen) <- rep(1, 100)
mygen <- reformatPloidies(mygen, output="one")
Ploidies(mygen) <- 4
Usatnts(mygen) <- 2
for(s in Samples(mygen)){
alleles <- unique(sample(c(122,124,126,0), 4, replace=TRUE,
prob = c(0.3, 0.2, 0.4, 0.1)))
Genotype(mygen, s, "loc1") <- alleles[alleles != 0]
if(length(Genotype(mygen, s, "loc1"))==0)
Genotype(mygen, s, "loc1") <- Missing(mygen)
}
# We have created a random mating populations with four alleles
# including one null. The allele frequencies are given in the
# 'prob' argument.
# Estimate allele frequencies
myfreq <- deSilvaFreq(mygen, self=0.01)
myfreq