HGDP.bedassle.data {BEDASSLE} | R Documentation |
The Eurasian subset of the HGDP dataset used in example BEDASSLE analyses
Description
Allelic counts, sample sizes, geographic distances, ecological distances, and population metadata for the 38 human populations used in example BEDASSLE analyses. The data are a subset of the Human Genome Diversity Panel (HGDP) dataset.
Usage
data(HGDP.bedassle.data)
Format
The format is: List of 7
 $ allele.counts        : int [1:38, 1:1000] 12 16 5 17 4 14 20 5 34 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:38] "Adygei" "Basque" "Italian" "French" ...
  .. ..$ : chr [1:1000] "rs13287637" "rs17792496" "rs1968588" ...
 $ sample.sizes         : int [1:38, 1:1000] 34 48 24 56 30 50 56 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:38] "Adygei" "Basque" "Italian" "French" ...
  .. ..$ : chr [1:1000] "rs13287637" "rs17792496" "rs1968588" ...
 $ GeoDistance          : num [1:38, 1:38] 0 1.187 0.867 1.101 1.247 ...
 $ EcoDistance          : num [1:38, 1:38] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:38] "1" "2" "3" "4" ...
  .. ..$ : chr [1:38] "1" "2" "3" "4" ...
 $ number.of.populations: int 38
 $ number.of.loci       : int 1000
 $ hgdp.metadata        :'data.frame': 38 obs. of 3 variables:
  ..$ Population: chr [1:38] "Adygei" "Basque" "Italian" ...
  ..$ Latitude  : chr [1:38] "44" "43" "46" "46" ...
  ..$ Longitude : chr [1:38] "39" "0" "10" "2" ...
Details
- allele.counts
A matrix of allelic count data, for which nrow = the number of populations and ncol = the number of bi-allelic loci sampled. Each cell gives the number of times allele ‘1’ is observed in a given population at a given locus. The choice of which allele is allele ‘1’ is arbitrary, but must be consistent across all populations at a locus.
- sample.sizes
A matrix of sample sizes, for which nrow = the number of populations and ncol = the number of bi-allelic loci sampled (i.e., the dimensions of sample.sizes must match those of allele.counts). Each cell gives the number of chromosomes successfully genotyped at a given locus in a given population.
- GeoDistance
Pairwise geographic distance (D_{i,j}). This may be Euclidean or, if the geographic scale of sampling merits it, great-circle distance. In this dataset, it is great-circle distance.
- EcoDistance
Pairwise ecological distance(s) (E_{i,j}), which may be continuous (e.g., difference in elevation) or binary (same or opposite side of some hypothesized barrier to gene flow). In this dataset, the ecological distance is binary, indicating whether a pair of populations occurs on the same side or on opposite sides of the Himalayas.
- number.of.populations
The number of populations in the analysis. This should equal nrow(allele.counts). In this dataset, 38 populations are sampled.
- number.of.loci
The number of loci in the analysis. This should equal ncol(allele.counts). In this dataset, 1000 loci are sampled.
- hgdp.metadata
A data frame of metadata on the populations included in the analysis, with three columns:
Population (population name)
Latitude
Longitude
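The structural relationships described in this section (matching matrix dimensions, and the two scalar counts agreeing with those dimensions) can be checked with a few lines of base R. The sketch below is illustrative only: check_bedassle_list is a hypothetical helper, not a BEDASSLE function, and toy is an invented two-population, three-locus list that merely borrows the element names shown under Format.

```r
# Hypothetical helper (NOT part of BEDASSLE): verifies the structural
# relationships described in Details for a list shaped like HGDP.bedassle.data.
check_bedassle_list <- function(d) {
  k <- nrow(d$allele.counts)  # number of populations
  stopifnot(
    # counts and sample sizes must have identical dimensions
    identical(dim(d$allele.counts), dim(d$sample.sizes)),
    # copies of allele '1' cannot exceed the chromosomes genotyped
    all(d$allele.counts <= d$sample.sizes),
    # both distance matrices are population-by-population
    identical(dim(d$GeoDistance), c(k, k)),
    identical(dim(d$EcoDistance), c(k, k)),
    # the scalar counts agree with the matrix dimensions
    d$number.of.populations == k,
    d$number.of.loci == ncol(d$allele.counts)
  )
  invisible(TRUE)
}

# Invented toy data: 2 populations, 3 loci (values are made up).
toy <- list(
  allele.counts = matrix(c(3L, 5L, 0L, 8L, 2L, 6L), nrow = 2),
  sample.sizes  = matrix(c(10L, 12L, 10L, 12L, 8L, 12L), nrow = 2),
  GeoDistance   = matrix(c(0, 1.5, 1.5, 0), nrow = 2),
  EcoDistance   = matrix(c(0, 1, 1, 0), nrow = 2),
  number.of.populations = 2L,
  number.of.loci = 3L
)

check_bedassle_list(toy)  # passes silently; stops with an error otherwise
```

With BEDASSLE installed, the same helper could presumably be applied to the real list after data(HGDP.bedassle.data); that use is an assumption and has not been run here.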
Source
Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics 2008.
Li et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 2008.
References
Bradburd, G.S., Ralph, P.L., and Coop, G.M. Disentangling the effects of geographic and ecological isolation on genetic differentiation. Evolution 2013.
Examples
## See MCMC, MCMC_BB, calculate.pairwise.Fst,
## calculate.all.pairwise.Fst, and Covariance for usage.
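
As a complement to the function references above, here is a minimal base-R sketch of two common first steps with a list of this shape: turning counts into sample allele frequencies, and converting the character Latitude and Longitude columns to numeric. The toy list is invented for illustration (population and locus names are placeholders); it only mimics the element names and types shown under Format, so nothing here depends on BEDASSLE being installed.

```r
# Invented toy data mimicking the structure of HGDP.bedassle.data.
nms <- list(c("popA", "popB"), c("snp1", "snp2"))
toy <- list(
  allele.counts = matrix(c(3L, 6L, 0L, 9L), nrow = 2, dimnames = nms),
  sample.sizes  = matrix(c(10L, 12L, 10L, 12L), nrow = 2, dimnames = nms),
  hgdp.metadata = data.frame(
    Population = c("popA", "popB"),
    Latitude   = c("44", "43"),
    Longitude  = c("39", "0"),
    stringsAsFactors = FALSE
  )
)

# Sample frequency of allele '1' in each population at each locus:
# copies observed divided by chromosomes genotyped (element-wise).
freqs <- toy$allele.counts / toy$sample.sizes

# Coordinates are stored as character strings, so convert them
# before doing any arithmetic or plotting.
lat <- as.numeric(toy$hgdp.metadata$Latitude)
lon <- as.numeric(toy$hgdp.metadata$Longitude)
```

Element-wise division works because the two matrices share dimensions, as required in Details; a locus with a sample size of zero would yield NaN and would need separate handling.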