consecutiveRUNS.run {detectRUNS}R Documentation

Main function to detect genomic RUNS (ROHom/ROHet) using the consecutive method

Description

This is the main detectRUNS function to scan the genome for runs (of homozygosity or heterozygosity) using the consecutive method (Marras et al. 2015, Animal Genetics 46(2):110-121). All parameters to detect runs (e.g. minimum n. of SNP, max n. of missing genotypes, max n. of opposite genotypes etc.) are specified here. Input data are in the ped/map Plink format (https://www.cog-genomics.org/plink/1.9/input#ped)

Usage

consecutiveRUNS.run(genotypeFile, mapFile, ROHet = FALSE,
  maxOppRun = 0, maxMissRun = 0, minSNP = 15, minLengthBps = 1000,
  maxGap = 10^6)

Arguments

genotypeFile

genotype (.ped) file path

mapFile

map file (.map) file path

ROHet

should we look for ROHet or ROHom? (default = FALSE)

maxOppRun

max n. of opposite genotype SNPs in the run (default = 0)

maxMissRun

max n. of missing SNPs in the run (default = 0)

minSNP

minimum n. of SNP in a RUN (default = 15)

minLengthBps

minimum length of run in bps (defaults to 1000 bps = 1 kbps)

maxGap

max distance between consecutive SNP in a window to be still considered a potential run (defaults to 10^6)

Details

This function scans the genome (diploid) for runs using the consecutive method. This is a wrapper function for many component functions that handle the input data (ped/map files), performs internal conversions, accepts parameters specifications, selects the statistical method to detect runs (sliding windows, consecutive loci) and whether runs of homozygosity (RoHom) or of heterozygosity (RoHet) are looked for.

In the ped file, the groups samples belong to can be specified (first column). This is important if comparisons between human ethnic groups or between animal breeds or plant varieties or biological populations are to be performed. Also, if cases and controls are to be compared, this is the place where this information needs to be specified.

This function returns a data frame with all runs detected in the dataset. This data frame can then be written out to a csv file. The data frame is, in turn, the input for other functions of the detectRUNS package that create plots and produce statistics of the results (see plot and statistic functions in this manual, and/or refer to the vignette of detectRUNS).

Value

A dataframe with RUNs of Homozygosity or Heterozygosity in the analysed dataset. The returned dataframe contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: size of the run)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")
# calculating runs with consecutive run approach
## Not run: 
# skipping runs calculation
runs <- consecutiveRUNS.run(genotypeFile, mapFile, minSNP = 15, ROHet = FALSE,
maxOppRun = 0, maxMissRun = 0, maxGap=10^6,
minLengthBps = 100000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.consecutive.csv", package="detectRUNS")
colClasses <- c(rep("character", 3), rep("numeric", 4)  )
runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE,
colClasses = colClasses)


[Package detectRUNS version 0.9.6 Index]