GeneticSubsetter {GeneticSubsetter}R Documentation

Genetic Subsetter

Description

This package contains a set of tools that can be used to select a subset from a larger population, using genetic data. Two criteria are used to identify subsets, in seperate functions: Expected Heterozygosity (HET; called PIC in earlier versions and in the paper describing this package) and the Mean of Transformed Kinships (MTK).

Details

When selecting subsets of genotypes, two factors are important to consider: the criteria by which to judge subsets, and the method used to identfy the set of genotypes that best fit that criteria. Two criteria are Expected Heterozygosity (HET) and the Mean of Transformed Kinships (MTK). Tests suggest that of these two criteria, Expected Heterozygosity is better if the resulting subset will be used for rare-trait discovery, while MTK is better if the resulting subset will be used for genome-wide association scanning (Graebner et al. 2015). To reach subsets with a high Expected Heterozygosity or MTK, CoreSetter systematically removes genotypes from the full set, creating a full ranking of genotype's contributions to their respective criteria. When the HET criterion is selected, CoreSetter uses the same method and criteria described in Munoz-Amatrain et al. (2014), except CoreSetter uses a more computationally efficient approach, and CoreSetter can consider heterozygous markers. Alternatively, CoreSetterCombined works to systematically improve a user-defined number of random subsets via single-genotype replacements, until no replacement can increase the selected criteria. This later method generally returns subsets with a higher Heterozyosity or MTK, but are subset-size specific, take more time to compute, and will not always return identical results in subsequent runs.

Author(s)

Ryan C. Graebner <ryan.graebner@gmail.com> and Alfonso Cuesta-Marcos

References

Graebner RC, Hayes PM, Hagerty CH, Cuesta-Marcos A (2016) A comparison of polymorphism information content and mean of transformed kinships as criteria for selection informative subsets of barley (Hordeum vulgare L. s. l) from the USDA Barley Core Collection. Genet Resour Crop Evol 63:477-482. Munoz-Amatrain M, Cuesta-Marcos A, Endelman JB, Comadran J, Bonman JM (2014) The USDA barley core collection: genetic diversity, population structure, and potential for genome-wide association studies. PloS One 9:e94688.

Examples

data("genotypes")
CoreSetter(genotypes,criterion="HET",save=colnames(genotypes)[c(1,5,9)])

[Package GeneticSubsetter version 0.8 Index]