
HGDP.bedassle.data {BEDASSLE}

R Documentation

The Eurasian subset of the HGDP dataset used in example BEDASSLE analyses

Description

The allelic counts, sample sizes, geographic distances, ecological distances, and population metadata from the 38 human populations used in example BEDASSLE analyses, subsetted from the Human Genome Diversity Panel (HGDP) dataset.

Usage

data(HGDP.bedassle.data)

Format

The format is: List of 7

$ allele.counts : int [1:38, 1:1000] 12 16 5 17 4 14 20 5 34 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:38] "Adygei" "Basque" "Italian" "French" ...
.. ..$ : chr [1:1000] "rs13287637" "rs17792496" "rs1968588" ...
$ sample.sizes : int [1:38, 1:1000] 34 48 24 56 30 50 56 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:38] "Adygei" "Basque" "Italian" "French" ...
.. ..$ : chr [1:1000] "rs13287637" "rs17792496" "rs1968588" ...
$ GeoDistance : num [1:38, 1:38] 0 1.187 0.867 1.101 1.247 ...
$ EcoDistance : num [1:38, 1:38] 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:38] "1" "2" "3" "4" ...
.. ..$ : chr [1:38] "1" "2" "3" "4" ...
$ number.of.populations: int 38
$ number.of.loci : int 1000
$ hgdp.metadata :'data.frame': 38 obs. of 3 variables:
..$ Population: chr [1:38] "Adygei" "Basque" "Italian" ...
..$ Latitude : chr [1:38] "44" "43" "46" "46" ...
..$ Longitude : chr [1:38] "39" "0" "10" "2" ...

Details

allele.counts

A matrix of allelic count data, with nrow = the number of populations and ncol = the number of bi-allelic loci sampled. Each cell gives the number of times allele ‘1’ is observed in that population at that locus. The choice of which allele is allele ‘1’ is arbitrary, but must be consistent across all populations at a locus.

sample.sizes

A matrix of sample sizes, with nrow = the number of populations and ncol = the number of bi-allelic loci sampled (i.e., the dimensions of sample.sizes must match those of allele.counts). Each cell gives the number of chromosomes successfully genotyped at that locus in that population.
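As a minimal sketch of how these two matrices relate, the sample frequency of allele ‘1’ is the element-wise ratio of allele.counts to sample.sizes. The helper name allele_frequencies is hypothetical, and the toy matrices are made up for illustration (they are not the HGDP data); with the package loaded, HGDP.bedassle.data$allele.counts and HGDP.bedassle.data$sample.sizes could presumably be passed instead.

```r
# Hypothetical helper (not part of BEDASSLE): sample allele frequencies.
allele_frequencies <- function(allele.counts, sample.sizes) {
  # Both matrices must be populations x loci with identical dimensions.
  stopifnot(identical(dim(allele.counts), dim(sample.sizes)))
  # A count can never exceed the number of chromosomes genotyped.
  stopifnot(all(allele.counts <= sample.sizes))
  allele.counts / sample.sizes
}

# Toy data (made up): 2 populations x 3 loci, filled column by column.
toy.counts <- matrix(c(12L, 16L, 5L, 17L, 4L, 14L), nrow = 2,
                     dimnames = list(c("popA", "popB"),
                                     c("snp1", "snp2", "snp3")))
toy.sizes <- matrix(c(34L, 48L, 24L, 56L, 30L, 50L), nrow = 2,
                    dimnames = dimnames(toy.counts))

toy.freqs <- allele_frequencies(toy.counts, toy.sizes)
```

Cells with a sample size of zero would yield NaN here, and missing values would trip the second check, so real data may need those cases handled first.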

GeoDistance

Pairwise geographic distance (D_{i,j}). This may be Euclidean, or, if the geographic scale of sampling merits it, great-circle distance. In the case of this dataset, it is great-circle distance.
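For illustration, a pairwise great-circle distance matrix can be sketched from latitude and longitude with the haversine formula. This is not necessarily how the stored GeoDistance matrix was computed: the helper name great_circle_matrix and the Earth radius of 6371 km are assumptions, and the units of the stored matrix are not stated here, so the kilometre values from this sketch should not be expected to match it directly. The four coordinate pairs are the first entries shown under Format.

```r
# Hypothetical helper (not part of BEDASSLE): pairwise great-circle
# distances by the haversine formula. lat and lon are decimal degrees;
# radius is an assumed mean Earth radius in kilometres.
great_circle_matrix <- function(lat, lon, radius = 6371) {
  # Coordinates may arrive as character strings, so coerce first.
  lat <- as.numeric(lat) * pi / 180
  lon <- as.numeric(lon) * pi / 180
  dlat <- outer(lat, lat, "-")
  dlon <- outer(lon, lon, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # Clamp at 1 to guard against rounding error before asin().
  2 * radius * asin(pmin(sqrt(a), 1))
}

# First four populations listed under Format: Adygei, Basque, Italian, French.
lat <- c("44", "43", "46", "46")
lon <- c("39", "0", "10", "2")
geo.km <- great_circle_matrix(lat, lon)
```

The result is a symmetric matrix with zeros on the diagonal, which is the shape a pairwise distance matrix D_{i,j} should have.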

EcoDistance

Pairwise ecological distance(s) (E_{i,j}), which may be continuous (e.g., difference in elevation) or binary (same or opposite side of some hypothesized barrier to gene flow). In this case, the ecological distance is binary, representing whether a pair of populations occurs on the same side, or on opposite sides, of the Himalayas.
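A binary ecological distance of this kind can be sketched from a vector that labels each population by its side of the barrier: the distance is 0 for a pair with the same label and 1 otherwise. The helper name binary_barrier_distance and the labels below are made up for illustration; they are not the actual HGDP assignments.

```r
# Hypothetical helper (not part of BEDASSLE): binary ecological distance
# from a per-population label giving the side of a barrier.
binary_barrier_distance <- function(side) {
  # TRUE where labels differ; multiplying by 1 converts to 0/1.
  d <- outer(side, side, FUN = "!=") * 1
  dimnames(d) <- list(names(side), names(side))
  d
}

# Made-up labels for four hypothetical populations.
side <- c(pop1 = "west", pop2 = "west", pop3 = "east", pop4 = "east")
toy.eco <- binary_barrier_distance(side)
```

A continuous ecological distance, such as a difference in elevation, could be built the same way by taking absolute differences instead of an inequality test.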

number.of.populations

The number of populations in the analysis. This should be equal to nrow(allele.counts). In this dataset, there are 38 populations sampled.

number.of.loci

The number of loci in the analysis. This should be equal to ncol(allele.counts). In this dataset, there are 1000 loci sampled.
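The relationships between the stored counts and the matrix dimensions can be checked directly. The sketch below uses a hypothetical helper, check_bedassle_list, and a made-up toy list with the same element names as this dataset; it is an illustration and not a function provided by the package.

```r
# Hypothetical helper (not part of BEDASSLE): verify that a list laid out
# like HGDP.bedassle.data is internally consistent.
check_bedassle_list <- function(x) {
  stopifnot(
    x$number.of.populations == nrow(x$allele.counts),
    x$number.of.loci == ncol(x$allele.counts),
    identical(dim(x$allele.counts), dim(x$sample.sizes)),
    # Distance matrices must be square, one row and column per population.
    all(dim(x$GeoDistance) == x$number.of.populations),
    all(dim(x$EcoDistance) == x$number.of.populations)
  )
  invisible(TRUE)
}

# Toy list (made up): 3 populations, 5 loci.
toy <- list(
  allele.counts = matrix(1L, nrow = 3, ncol = 5),
  sample.sizes = matrix(10L, nrow = 3, ncol = 5),
  GeoDistance = matrix(0, 3, 3),
  EcoDistance = matrix(0, 3, 3),
  number.of.populations = 3L,
  number.of.loci = 5L
)
check_bedassle_list(toy)
```

The helper stops with an error at the first failed condition and returns TRUE invisibly otherwise.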

hgdp.metadata

This data frame contains the metadata on the populations included in the analysis, including:

Population name
Latitude
Longitude
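As the Format section shows, Latitude and Longitude are stored as character strings, so they generally need to be converted before any arithmetic. A minimal sketch, using a made-up data frame with the same column names and the first four rows shown under Format:

```r
# Toy stand-in for hgdp.metadata (first four rows shown under Format).
toy.metadata <- data.frame(
  Population = c("Adygei", "Basque", "Italian", "French"),
  Latitude = c("44", "43", "46", "46"),
  Longitude = c("39", "0", "10", "2"),
  stringsAsFactors = FALSE
)

# Convert the coordinate columns from character to numeric.
toy.metadata$Latitude <- as.numeric(toy.metadata$Latitude)
toy.metadata$Longitude <- as.numeric(toy.metadata$Longitude)
```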

Source

Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics 2008.
Li et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 2008.

References

Bradburd, G.S., Ralph, P.L., and Coop, G.M. Disentangling the effects of geographic and ecological isolation on genetic differentiation. Evolution 2013.

Examples

## see MCMC, MCMC_BB, calculate.pairwise.Fst,
## calculate.all.pairwise.Fst, and Covariance for usage.

[Package BEDASSLE version 1.6.1 Index]