MapMarkers {Map2NCBI} | R Documentation |
Mapping SNP Markers to Closest Genomic Feature
Description
MapMarkers allows the user to map the supplied DNA markers (primarily SNP markers) to the genomic feature in closest proximity, based on the feature list generated using the GetGeneList function or a properly formatted feature list (see the Value section).
Usage
MapMarkers(features, markers, nAut, other = c("X"), savefiles = TRUE, destfile)
Arguments
features |
This is the table or matrix in the current R session to which the marker list will be mapped. If using the |
markers |
This is the table or matrix in the current R session that will provide marker map information to use for the function. See |
nAut |
The number of autosomes in the species. This should reflect the total number of autosomes in the species, not the number of autosomes in the marker file. |
other |
The sex chromosomes or other genomic information available (e.g., for eukaryotes this could include mitochondrial DNA). These must be specified inside quotation marks. If sex chromosomes or other genomic information is not provided in the marker file, set other=FALSE. |
savefiles |
Default is TRUE. When TRUE, the final marker file with genomic feature information is saved in the destfile location as "MappedMarkers.txt", and any markers that cannot be mapped due to lack of feature information are saved as "NotMapped.txt". Must be either TRUE or FALSE. |
destfile |
This is the path to the folder in which files will be saved; it must be specified using quotation marks (e.g., |
Details
The MapMarkers function processes each chromosome individually, searching for the feature that falls closest to each marker based on the map information included. Map positions of the markers must match the assembly used in the feature list. Once the closest feature has been found, the marker and feature information are saved together by binding the marker map file (which includes a minimum of 3 columns) to the feature list columns provided (20 columns if using the GetGeneList function, or a minimum of 4 columns if formatting the list yourself). The function also adds 2 columns, described in the Value section, that give the distance of the marker from the feature and a category that groups markers by their proximity to the feature.
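The search described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper nearest_feature() and the toy data are hypothetical, and the distance is taken to the nearer end of the feature, set to zero when the marker lies inside it.

```r
# Toy feature list and marker map using the required column names.
features <- data.frame(
  FeatureName = c("geneA", "geneB", "geneC"),
  chromosome  = c("1", "1", "2"),
  start       = c(1000, 50000, 2000),
  end         = c(5000, 60000, 3000),
  stringsAsFactors = FALSE
)
markers <- data.frame(
  Marker     = c("snp1", "snp2", "snp3"),
  chromosome = c(1, 1, 2),
  position   = c(3000, 48000, 10000),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of Map2NCBI): closest feature on the
# marker's chromosome and its distance in base pairs.
nearest_feature <- function(chr, pos, features) {
  f <- features[features$chromosome == as.character(chr), ]
  # Distance is 0 inside the feature, else the gap to the nearer end.
  d <- ifelse(pos >= f$start & pos <= f$end, 0,
              pmin(abs(pos - f$start), abs(pos - f$end)))
  best <- which.min(d)
  data.frame(FeatureName = f$FeatureName[best], Distance = d[best],
             stringsAsFactors = FALSE)
}

# Look up every marker, then bind the hits to the marker map.
hits <- do.call(rbind, Map(nearest_feature, markers$chromosome,
                           markers$position,
                           MoreArgs = list(features = features)))
mapped <- cbind(markers, hits)
print(mapped)
```

The sketch assumes every marker's chromosome has at least one feature; the real function handles unmapped markers separately (see the savefiles argument).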
Value
1) Format for the feature list if not generated using the GetGeneList function:
FeatureName |
The name of the feature provided. Column heading name can be changed, but should be included to identify the feature once the |
chromosome |
The chromosome on which the genomic feature is located. The column heading must be given this name. If including sex chromosomes or other genomic information, label them with letters or an abbreviation (e.g., "X"). |
start |
The start position of the genomic feature based on the build used. The column heading name must be given this name. This used to be called "chr_start" in version 1.1 of this package. |
end |
The end or stop position of the genomic feature based on the build used. The column heading name must be given this name. This used to be called "chr_stop" in version 1.1 of this package. |
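As a rough illustration of the format above, a self-made feature list could be built as follows. The feature names and coordinates are invented for this sketch; only the column names chromosome, start and end, and the older chr_start/chr_stop names from version 1.1, come from this documentation.

```r
# Hypothetical feature list written with the version 1.1 column names.
my_features <- data.frame(
  FeatureName = c("geneA", "geneB", "geneX"),
  chromosome  = c("1", "1", "X"),
  chr_start   = c(1000, 50000, 2000),
  chr_stop    = c(5000, 60000, 3000),
  stringsAsFactors = FALSE
)

# Rename the old headings to the names required now.
names(my_features)[names(my_features) == "chr_start"] <- "start"
names(my_features)[names(my_features) == "chr_stop"]  <- "end"

# Check that the required headings are present (names are case sensitive).
missing_cols <- setdiff(c("chromosome", "start", "end"), names(my_features))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```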
2) Format for the marker map file:
Marker |
Name of the marker. Be aware of R language and its restrictions. The name of this column heading can be changed to something else. |
chromosome |
The chromosome to which the marker is mapped. This column name is required and must match exactly. Values must be numeric. If including sex chromosomes or other genomic information, assign a number to each, in the same order as they are listed in the other=c() statement (e.g., if the X and Y chromosomes are labeled 30 and 31, respectively, use other=c("X","Y")). The function will automatically align each letter with the correct number as long as they are listed in the order specified. |
position |
The base pair position of the marker based on the map build used. This build must also match the build from which you generated the genomic features using the |
NOTE: The order of the columns in both files is not important, but correct column heading names are essential. R is case sensitive, so make sure headings match exactly unless otherwise noted. Other columns may be included but will not be used by the function. Any columns included in this file will be returned with the final marker file after the MapMarkers function is completed.
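The numeric coding of the chromosome column can be prepared along the following lines. This is a sketch under an assumption: the non-autosomal entries are numbered consecutively after the autosomes (nAut + 1, nAut + 2, ...), which agrees with the X = 30, Y = 31 example for a species with 29 autosomes but is not stated as a rule here. The marker names and positions are invented.

```r
nAut  <- 29            # number of autosomes in the species
other <- c("X", "Y")   # same order as passed to MapMarkers

# Hypothetical raw marker map with chromosome labels as text.
raw_map <- data.frame(
  Marker     = c("snp1", "snp2", "snp3"),
  chromosome = c("1", "X", "Y"),
  position   = c(3000, 48000, 10000),
  stringsAsFactors = FALSE
)

# Lookup table: autosomes keep their number; "other" entries are
# assumed to follow on as nAut + 1, nAut + 2, ...
chr_code <- c(setNames(seq_len(nAut), seq_len(nAut)),
              setNames(nAut + seq_along(other), other))

raw_map$chromosome <- unname(chr_code[raw_map$chromosome])
print(raw_map)
```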
3) Additional columns included in the output file of the MapMarkers function:
Distance |
The base pair distance of the marker from the closest feature identified. If the marker is located inside the feature, the distance is set to zero. |
Inside? |
The category into which the marker and feature pair falls. This is based on the distance between the marker and the closest feature, which is broken into 11 categories described in the next section. |
4) Categories that are included in the "Inside?" column:
Yes,_Inside_Gene |
Marker is located in the closest feature. |
Marker_is_<=_2500_bp_Before_Feature |
The closest feature is located after the marker position and is within 2,500 base pairs (bp). |
Marker_is_<=_2500_bp_After_Feature |
The closest feature is located before the marker position and is within 2,500 bp. |
Marker_is_>_2500_bp_<=5000_bp_Before_Feature |
The closest feature is located after the marker position and is between 2,500 bp and 5,000 bp from the marker. |
Marker_is_>_2500_bp_<=5000_bp_After_Feature |
The closest feature is located before the marker position and is between 2,500 bp and 5,000 bp from the marker. |
Marker_is_>_5000_bp_<=25000_bp_Before_Feature |
The closest feature is located after the marker position and is between 5,000 bp and 25,000 bp from the marker. |
Marker_is_>_5000_bp_<=25000_bp_After_Feature |
The closest feature is located before the marker position and is between 5,000 bp and 25,000 bp from the marker. |
Nearest_feature_is_>_25,000_bp_before_marker |
The closest feature is located before the marker position and is more than 25,000 bp from the marker. |
Nearest_feature_is_>_25,000_bp_after_marker |
The closest feature is located after the marker position and is more than 25,000 bp from the marker. |
Nearest_feature_is_>_1_Mb_before_marker |
The closest feature is located before the marker position and is more than 1,000,000 bp (1 Mb) from the marker. |
Nearest_feature_is_>_1_Mb_after_marker |
The closest feature is located after the marker position and is more than 1,000,000 bp (1 Mb) from the marker. |
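The binning behind these categories can be approximated as below. This is a hedged sketch, not the package's code: classify_marker() is a hypothetical helper, the inclusive upper bounds follow the "<=" in the labels, the 1 Mb upper bound of the "> 25,000 bp" bin is inferred from the existence of the "> 1 Mb" categories, and the label strings are written as read from the list above, so they should be checked against real MapMarkers output before being relied on.

```r
# Hypothetical helper: assign a category from the marker-to-feature
# distance (bp) and from whether the marker lies before the feature.
classify_marker <- function(distance, marker_before) {
  # Bins: 0 | (0, 2500] | (2500, 5000] | (5000, 25000] | (25000, 1e6] | > 1e6
  bin <- cut(distance, breaks = c(-Inf, 0, 2500, 5000, 25000, 1e6, Inf),
             labels = FALSE)
  near <- c(NA, "<=_2500_bp", ">_2500_bp_<=5000_bp",
            ">_5000_bp_<=25000_bp", NA, NA)[bin]
  far  <- c(NA, NA, NA, NA, ">_25,000_bp", ">_1_Mb")[bin]
  # "Marker_is_..." labels are worded from the marker's side;
  # "Nearest_feature_is_..." labels are worded from the feature's side.
  side     <- ifelse(marker_before, "Before", "After")
  far_side <- ifelse(marker_before, "after", "before")
  ifelse(bin == 1, "Yes,_Inside_Gene",
         ifelse(bin <= 4,
                paste0("Marker_is_", near, "_", side, "_Feature"),
                paste0("Nearest_feature_is_", far, "_", far_side, "_marker")))
}

categories <- classify_marker(
  distance      = c(0, 2500, 2501, 30000, 2e6),
  marker_before = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)
print(categories)
```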
Note
For issues or problems with this function, please contact Lauren Hanna at Lauren.Hanna@ndsu.edu.
Author(s)
Lauren L. Hulsman Hanna and David G. Riley
References
Hulsman Hanna, L. L., and D. G. Riley. 2014. Mapping genomic markers to closest feature using the R package Map2NCBI. Livest. Sci. 162:59-65. doi:10.1016/j.livsci.2014.01.019
See Also
Function: GetGeneList
Examples
#Example 1: Step 1 would normally be to run the "GetGeneList"
#function. As that step is interactive, a Bos taurus feature
#list has been generated and is available in the data folder,
#along with a subset of marker information from BTA 1. Use
#the following code to run this example:
library(Map2NCBI)
data(GeneList_BTA1)
data(Example10MarkerFile)
Example1 <- MapMarkers(GeneList_BTA1, Example10MarkerFile,
                       nAut = 29, other = "X", savefiles = FALSE)
#Note: this example does not save the output to the working
#directory; the result is returned to the "Example1"
#variable.