R: Compute ALD.

compute.ALD {asymLD}

R Documentation

Compute ALD.

Description

A function to compute asymmetric Linkage Disequilibrium measures (ALD) for polymorphic genetic data. These measures are identical to the correlation measure (r) for bi-allelic data.

Usage

compute.ALD(dat, tolerance = 0.01)

Arguments

dat

A data.frame with 5 required variables (having the names listed below):

`haplo.freq`	A numeric vector of haplotype frequencies.
`locus1`	A character vector identifying the first locus.
`locus2`	A character vector identifying the second locus.
`allele1`	A character vector identifying the allele at locus 1.
`allele2`	A character vector identifying the allele at locus 2.
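
As an illustration of the layout described above, the following sketch builds a minimal input data.frame for a single locus pair and checks that the five required variables are present. The frequencies, locus names and allele names are hypothetical values chosen for illustration only; the call to compute.ALD is skipped if asymLD is not installed.

```r
# Hypothetical two-locus haplotype frequencies (illustrative values, not real data).
dat <- data.frame(
  haplo.freq = c(0.40, 0.10, 0.20, 0.30),
  locus1     = "L1",
  locus2     = "L2",
  allele1    = c("a1", "a1", "a2", "a2"),
  allele2    = c("b1", "b2", "b1", "b2"),
  stringsAsFactors = FALSE
)

# Check that the five required variables are present before calling the function.
required <- c("haplo.freq", "locus1", "locus2", "allele1", "allele2")
missing.cols <- setdiff(required, names(dat))
stopifnot(length(missing.cols) == 0, is.numeric(dat$haplo.freq))

# Only call the package function if asymLD is available.
if (requireNamespace("asymLD", quietly = TRUE)) {
  print(asymLD::compute.ALD(dat))
}
```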

tolerance

A threshold for the sum of the haplotype frequencies. If the sum of the haplotype frequencies is greater than 1 + tolerance or less than 1 - tolerance, an error is returned. The default is 0.01.

Value

The return value is a data.frame with the following components:

`locus1`	The name of the first locus.
`locus2`	The name of the second locus.
`F.1`	Homozygosity (expected under HWP) for locus 1.
`F.1.2`	Conditional homozygosity* for locus1 given locus2.
`F.2`	Homozygosity (expected under HWP) for locus 2.
`F.2.1`	Conditional homozygosity* for locus2 given locus1.
`ALD.1.2`	Asymmetric LD for locus1 given locus2.
`ALD.2.1`	Asymmetric LD for locus2 given locus1.

*Overall weighted haplotype-specific homozygosity for one locus given the other: the first locus given the second for `F.1.2`, and the second locus given the first for `F.2.1`.
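
The relationship between the components above can be sketched as follows. This is not the package's implementation. It assumes the published definition of ALD as sqrt((F.1.2 - F.1) / (1 - F.1)), with F.1.2 computed as the sum over haplotypes of f_ij^2 / f_j, and it assumes each allele pair appears on exactly one row. The helper name ald.sketch and the frequencies are hypothetical.

```r
# Hypothetical helper for one locus pair; not a function exported by asymLD.
ald.sketch <- function(dat) {
  f  <- dat$haplo.freq / sum(dat$haplo.freq)   # normalize to sum to 1
  p1 <- tapply(f, dat$allele1, sum)            # allele frequencies at locus 1
  p2 <- tapply(f, dat$allele2, sum)            # allele frequencies at locus 2
  F.1 <- sum(p1^2)                             # homozygosity expected under HWP
  F.2 <- sum(p2^2)
  # Conditional homozygosity: sum over haplotypes of f_ij^2 / f_j (assumed form)
  F.1.2 <- sum(f^2 / p2[as.character(dat$allele2)])
  F.2.1 <- sum(f^2 / p1[as.character(dat$allele1)])
  # max(0, .) guards against tiny negative values from rounding
  c(F.1 = F.1, F.1.2 = F.1.2, F.2 = F.2, F.2.1 = F.2.1,
    ALD.1.2 = sqrt(max(0, F.1.2 - F.1) / (1 - F.1)),
    ALD.2.1 = sqrt(max(0, F.2.1 - F.2) / (1 - F.2)))
}

# Illustrative bi-allelic frequencies (not real data).
dat <- data.frame(
  haplo.freq = c(0.40, 0.10, 0.20, 0.30),
  allele1    = c("A", "A", "a", "a"),
  allele2    = c("B", "b", "B", "b"),
  stringsAsFactors = FALSE
)
res <- ald.sketch(dat)

# Ordinary correlation measure for the same 2 x 2 table.
p.A <- 0.40 + 0.10
p.B <- 0.40 + 0.20
r <- (0.40 - p.A * p.B) / sqrt(p.A * (1 - p.A) * p.B * (1 - p.B))
print(res)
print(r)
```

For these bi-allelic frequencies both ALD values from the sketch agree with r, in line with the statement in the Description.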

Details

A warning message is given if the sum of the haplotype frequencies is greater than 1.01 or less than 0.99 (regardless of the tolerance setting). The haplotype frequencies that are passed to the function are normalized within the function to sum to 1.0 by dividing each frequency by the sum of the passed frequencies.
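
The checks and the normalization described above might look roughly like the sketch below. The helper name check.and.normalize, the message wording and the order of the checks are assumptions made for illustration; the package's internal code may differ.

```r
# Hypothetical helper mirroring the documented behaviour; not the package's code.
check.and.normalize <- function(freq, tolerance = 0.01) {
  total <- sum(freq)
  # Error if the sum falls outside 1 +/- tolerance.
  if (abs(total - 1) > tolerance) {
    stop("sum of haplotype frequencies (", total, ") is outside 1 +/- ", tolerance)
  }
  # Warning at the fixed 0.01 threshold, regardless of the tolerance setting.
  if (abs(total - 1) > 0.01) {
    warning("sum of haplotype frequencies (", total, ") is outside [0.99, 1.01]")
  }
  # Normalize by dividing each frequency by the sum of the passed frequencies.
  freq / total
}

freq <- c(0.41, 0.10, 0.20, 0.31)   # illustrative values; sums to 1.02

# With a looser tolerance the frequencies are accepted (with a warning) and rescaled.
norm <- suppressWarnings(check.and.normalize(freq, tolerance = 0.05))
print(sum(norm))

# With the default tolerance the same input is rejected.
err <- tryCatch(check.and.normalize(freq), error = function(e) conditionMessage(e))
print(err)
```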

Examples

library(asymLD)

# An example using haplotype frequencies from Wilson (2010)
data(hla.freqs)
hla.a_b <- hla.freqs[hla.freqs$locus1=="A" & hla.freqs$locus2=="B",]
compute.ALD(hla.a_b)
hla.freqs$locus <- paste(hla.freqs$locus1, hla.freqs$locus2, sep="-")
compute.ALD(hla.freqs[hla.freqs$locus=="C-B",])
# Note: additional columns on the input data.frame (e.g., "locus" above) are allowed,
# but are ignored by the function.

# An example using genotype data from the haplo.stats package
require(haplo.stats)
data(hla.demo)
geno <- hla.demo[,5:8]  #DPB-DPA
label <- unique(gsub("\\.a(1|2)$", "", colnames(geno)))  # strip the ".a1"/".a2" suffix
label <- paste("HLA*",label,sep="")
keep <- !apply(is.na(geno) | geno==0, 1, any)
em.keep  <- haplo.em(geno=geno[keep,], locus.label=label)
hapfreqs.df <- cbind(em.keep$haplotype, em.keep$hap.prob)
#format dataframe for ALD function
names(hapfreqs.df)[dim(hapfreqs.df)[2]] <- "haplo.freq"
names(hapfreqs.df)[1] <- "allele1"
names(hapfreqs.df)[2] <- "allele2"
hapfreqs.df$locus1 <- label[1]
hapfreqs.df$locus2 <- label[2]
head(hapfreqs.df)
compute.ALD(hapfreqs.df)
# Note that there is substantially less variability (higher ALD) for HLA*DPA1
# conditional on HLA*DPB1 than for HLA*DPB1 conditional on HLA*DPA1, indicating
# that the overall variation for DPA1 is relatively low given specific DPB1 alleles.

# An example using SNP data where results are symmetric and equal to the ordinary
# correlation measure (r)
data(snp.freqs)
snps <- c("rs1548306", "rs6923504", "rs4434496", "rs7766854")
compute.ALD(snp.freqs[snp.freqs$locus1==snps[2] & snp.freqs$locus2==snps[3],])

snp.freqs$locus <- paste(snp.freqs$locus1, snp.freqs$locus2, sep="-")
by(snp.freqs,list(snp.freqs$locus),compute.ALD)

# SNP1 & SNP2 : the r correlation & ALD measures are equivalent due to symmetry for
# bi-allelic SNPs
p.AB <- snp.freqs$haplo.freq[1]
p.Ab <- snp.freqs$haplo.freq[2]
p.aB <- snp.freqs$haplo.freq[3]
p.ab <- snp.freqs$haplo.freq[4]
p.A <- p.AB + p.Ab
p.B <- p.AB + p.aB
r.squared <- (p.AB - p.A*p.B)^2 / (p.A*(1-p.A)*p.B*(1-p.B))
sqrt(r.squared) #the r correlation measure
compute.ALD(snp.freqs[snp.freqs$locus1==snps[1] & snp.freqs$locus2==snps[2],])

[Package asymLD version 0.1 Index]