R: Select markers for the pairwise scan.

select_markers_for_pairscan {cape}

R Documentation

Select markers for the pairwise scan.

Description

This function selects markers for the pairwise scan. Because Cape is computationally intensive, pairscans should not be run on large numbers of markers. As a rule of thumb, scanning 1500 markers in a population of 500 individuals takes about 24 hours without the kinship correction. The kinship correction increases the run time, so users may wish to reduce the number of markers further when applying it.
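To see why the marker count matters so much, it can help to count the marker pairs a pairwise scan has to test. The sketch below uses only base R; `n_pairs` is a hypothetical helper, not part of cape, and the assumption that run time grows roughly in proportion to the number of pairs is ours, not a documented property of the package.

```r
# Hypothetical helper (not part of cape): number of unordered marker pairs.
n_pairs <- function(n_markers) choose(n_markers, 2)

n_pairs(1500)  # 1124250 pairs
n_pairs(750)   # 280875 pairs, about a quarter as many
```

Under that assumption, halving the number of markers cuts the work to roughly one quarter.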

Usage

select_markers_for_pairscan(
  data_obj,
  singlescan_obj,
  geno_obj,
  specific_markers = NULL,
  num_alleles = 50,
  peak_density = 0.5,
  window_size = NULL,
  tolerance = 5,
  plot_peaks = FALSE,
  verbose = FALSE,
  pdf_filename = "Peak.Plots.pdf"
)

Arguments

`data_obj`	a `Cape` object
`singlescan_obj`	a singlescan object from `singlescan`.
`geno_obj`	a genotype object
`specific_markers`	A vector of marker names specifying which markers should be selected. If NULL, the function uses main effect size to select markers.
`num_alleles`	The target number of markers to select if using main effect size
`peak_density`	The fraction of markers to select under each peak exceeding the current threshold. Should be set higher for populations with low LD and lower for populations with high LD. Defaults to 0.5, corresponding to 50% of markers selected under each peak.
`window_size`	The number of markers to use in a smoothing window when calculating main effect peaks. If NULL, the window size is selected automatically based on the number of markers with consecutive rises and falls of main effect size.
`tolerance`	The allowable deviation from the target marker number, in number of markers. For example, if you ask the function to select 100 markers and set the tolerance to 5, the algorithm will stop when it has selected between 95 and 105 markers.
`plot_peaks`	Whether to plot the singlescan peaks identified by `bin_curve`. This can be helpful in determining whether the window_size and peak_density parameters are optimal for the population.
`verbose`	Whether progress should be printed to the screen
`pdf_filename`	If plot_peaks is TRUE, this argument specifies the filename to which the peaks are plotted.
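
The stopping window described for `tolerance` can be written as a one-line check. This is an illustrative sketch only; `within_tolerance` is a hypothetical name and does not exist in cape.

```r
# Hypothetical check (not part of cape): is the selected count close enough
# to the target, given the allowed deviation?
within_tolerance <- function(n_selected, target, tolerance) {
  abs(n_selected - target) <= tolerance
}

within_tolerance(c(94, 95, 100, 105, 106), target = 100, tolerance = 5)
# FALSE  TRUE  TRUE  TRUE FALSE
```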

Details

This function can select markers either from a pre-defined list supplied in the argument specific_markers or based on their main effect size.

To select markers based on main effect size, this function first identifies effect score peaks using an automated peak detection algorithm. It finds the peaks rising above a starting threshold and samples markers within each peak based on the user-defined sampling density peak_density. Setting peak_density to 0.5 will result in 50% of the markers in a given peak being sampled uniformly at random. Sampling reduces the redundancy among linked markers tested in the pairscan. If LD is relatively low in the population, this density can be increased to 1 to include all markers under a peak. If LD is high, the density can be decreased to reduce redundancy further.
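
The within-peak sampling step might look roughly like the sketch below. It is not cape's implementation: `sample_peak_markers` is a hypothetical helper, the marker names are made up, and rounding the count up with `ceiling` is an assumption.

```r
# Hypothetical helper (not part of cape): keep a fraction of the markers
# under one peak, sampled uniformly at random, preserving marker order.
sample_peak_markers <- function(peak_markers, peak_density = 0.5) {
  # Assumption: round the number of markers to keep upward.
  n_keep <- ceiling(length(peak_markers) * peak_density)
  # sample.int() avoids the sample() pitfall when the input has length 1.
  keep_idx <- sort(sample.int(length(peak_markers), n_keep))
  peak_markers[keep_idx]
}

set.seed(1)
peak <- paste0("marker_", 1:10)           # made-up marker names
picked <- sample_peak_markers(peak, 0.5)  # five of the ten markers
all_markers <- sample_peak_markers(peak, 1)  # density 1 keeps every marker
```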

The algorithm compares the number of markers sampled to the target defined by the user in the argument num_alleles. If fewer than the target have been selected, the threshold is lowered, and the process is repeated until the target number of alleles has been selected (plus or minus the number set in tolerance).
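
The threshold-lowering loop can be sketched in base R as follows. This is a simplified illustration, not the package's algorithm: it treats every marker at or above the threshold as lying under a peak (no smoothing or peak detection), and the function name `select_by_threshold`, the `step` argument, and the simulated effect sizes are all assumptions made for the example.

```r
# Hypothetical sketch (not part of cape): lower a threshold until enough
# markers have been sampled to come within `tolerance` of the target.
select_by_threshold <- function(effects, target, tolerance = 5,
                                peak_density = 0.5, step = 0.01) {
  threshold <- max(effects)
  repeat {
    above <- which(effects >= threshold)
    n_keep <- ceiling(length(above) * peak_density)
    # Stop once enough markers are available, or once the threshold has
    # reached the smallest effect and no further markers can be added.
    if (n_keep >= target - tolerance || threshold <= min(effects)) break
    # Assumption: lower the threshold by a fixed fraction of the effect range.
    threshold <- threshold - step * diff(range(effects))
  }
  picked <- above[sort(sample.int(length(above), n_keep))]
  list(markers = picked, threshold = threshold)
}

set.seed(42)
effects <- abs(rnorm(1000))  # simulated main effect sizes
res <- select_by_threshold(effects, target = 50, tolerance = 5)
length(res$markers)
```

Because the threshold drops in fixed steps, this sketch can overshoot the upper end of the tolerance window; it only guarantees that the lower end is reached when enough markers exist.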

If the number of target alleles exceeds the number of markers genotyped, all alleles will be selected automatically.

Value