| read_map {skater} | R Documentation | 
Read genetic map file
Description
This function reads in the content from a genetic map file to translate physical distance to genetic units (i.e. cM). Regardless of the source, the input file must be sex-averaged and in a tab-separated "Plink" format (documentation) with the following four columns and no header (i.e. no column names):
- Chromosome 
- Identifier (ignored in - read_map())
- Length (genetic length within the physical position boundary) 
- Position (physical position boundary) 
The columns must be in the order above. Note that only the first, third, and fourth columns are used in the function.
Usage
read_map(file)
Arguments
| file | Input file path | 
Details
The genetic map could come from different sources. One source is the HapMap map distributed by the Browning Lab (documentation). If this map file is used, the non-sex chromosomes can be downloaded and concatenated to a single file as follows:
wget https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh37.map.zip unzip plink.GRCh37.map.zip cat *chr[0-9]*GRCh37.map | sort -k1,1 -k4,4 --numeric-sort > plink.allchr.GRCh37.map
Another source is a sex-specific map ("bherer") originally published by Bherer et al and recommended by the developers of ped-sim for simulating IBD segments (documentation). To retrieve and prep this map file for simulation:
# Get the refined genetic map and extract
wget --no-check-certificate https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination/raw/master/Refined_genetic_map_b37.tar.gz
tar xvfpz Refined_genetic_map_b37.tar.gz
# Format for ped-sim as per https://github.com/williamslab/ped-sim#map-file-
printf "#chr\tpos\tmale_cM\tfemale_cM\n" > sexspec.pedsim.map
for chr in {1..22}; do
  paste Refined_genetic_map_b37/male_chr$chr.txt Refined_genetic_map_b37/female_chr$chr.txt \
    | awk -v OFS="\t" 'NR > 1 && $2 == $6 {print $1,$2,$4,$8}' \
    | sed 's/^chr//' >> sexspec.pedsim.map;
done
# Clean up
rm -rf Refined_genetic_map_b37*
After this, the sexspec.pedsim.map file is ready for use in simulation. However, it must be averaged and reformatted to "Plink format" to use here:
cat sexspec.pedsim.map | grep -v "^#" | awk -v OFS="\t" '{print $1,".",($3+$4)/2,$2}' > sexspec-avg.plink.map
#' The genetic maps created above are in the tens of megabytes size range. This is trivial to store for most systems but a reduced version would increase portability and ease testing. This "minimum viable genetic map" could be used for testing and as installed package data in an R package for example analysis. Read more about minimum viable genetic maps at:
- Blog post: https://hapi-dna.org/2020/11/minimal-viable-genetic-maps/ 
- Github repo with python code: https://github.com/williamslab/min_map 
The code as written below reduces the averaged sex-specific genetic map from 833776 to 28726 positions (~30X reduction!).
# Get minmap script from github
wget https://raw.githubusercontent.com/williamslab/min_map/main/min_map.py
# Create empty minmap
echo -n > sexspec-avg-min.plink.map
# For each autosome...
for chr in {1..22}; do
  echo "Working on chromosome $chr..."
  # First pull out just one chromosome
  grep "^${chr}[[:space:]]" sexspec-avg.plink.map > tmp.${chr}
  # Run the python script on that chromosome.
  # The genetic map column is 3rd column (2nd in 0-start). Physical position is last column (3 in 0-based)
  python3 min_map.py -mapfile tmp.${chr} -chr ${chr} -genetcol 2 -physcol 3 -noheader -error 0.05
  # Strip out the header and reformat back to plink format, and append to minmap file
  cat min_viable_map${chr}.txt | grep -v "^#" | awk -v OFS="\t" '{print $1,".",$4,$2}' >> sexspec-avg-min.plink.map
  # Clean up
  rm -f min_viable_map${chr}.txt tmp.${chr}
done
This averaged version of the Bherer sex-specific map, reduced to a minimum viable genetic map with at most 5% error, in Plink format, is available as installed package data (see examples). This is useful for testing code, but the full genetic map should be used for most analysis operations.
Value
A tibble containing 3 columns:
- chr (chromosome) 
- value (genetic length within the physical position boundary) 
- bp (physical position boundary) 
References
https://www.cog-genomics.org/plink/1.9/formats#map
https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/
https://github.com/williamslab/ped-sim#map-file
https://www.nature.com/articles/ncomms14994
https://www.nature.com/articles/ncomms14994
https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination
Examples
gmapfile <- system.file("extdata", "sexspec-avg-min.plink.map", package="skater", mustWork=TRUE)
gmap <- read_map(gmapfile)