locus_pred {HaploCatcher} | R Documentation |
Haplotype Prediction: Using Trained Models to Make Predictions
Description
Haplotype Prediction: Using Trained Models to Make Predictions
Usage
locus_pred(locus_train_results, geno_mat, genotypes_to_predict)
Arguments
locus_train_results |
This object is a the results object from the "locus_train" function. |
geno_mat |
This is a genotypic matrix with genotypes of individuals you have genotyped and characterized for the locus/QTL/gene of interest and individuals that have only been genotyped with a genome wide marker panel. It is important to note that the markers in the genome wide panel must be shared between training population and the population you wish to forward predict. In the case of genotyping-by-sequencing markers, the training and test populations should be discovered and produced together into a genotype file. All marker platforms, however, are compatable, as long as the training and forward prediction population share the same markers genome-wide. |
genotypes_to_predict |
This is a character vector of genotypes in the geno_mat which the user wishes to predict. If this object contains names in the training population, they will be omitted in the results to avoid bias. |
Value
a data frame with two columns: genotype names and predictions.
Examples
#set seed for reproducible sampling
set.seed(022294)
#read in the genotypic data matrix
data("geno_mat")
#read in the marker information
data("marker_info")
#read in the gene compendium file
data("gene_comp")
#Note: in practice you would have something like a gene file
#that does not contain any lines you are trying to predict.
#However, this is for illustrative purposes on how to run the function
#sample data in the gene_comp file to make a traning population
train<-gene_comp[gene_comp$FullSampleName %in%
sample(gene_comp$FullSampleName,
round(length(gene_comp$FullSampleName)*0.8),0),]
#pull vector of names, not in the train, for forward prediction
test<-gene_comp[!gene_comp$FullSampleName
%in% train$FullSampleName,
"FullSampleName"]
#run the function with hets
fit<-locus_train(geno_mat=geno_mat, #the genotypic matrix
gene_file=train, #the gene compendium file
gene_name="sst1_solid_stem", #the name of the gene
marker_info=marker_info, #the marker information file
chromosome="3B", #name of the chromosome
ncor_markers=2, #number of markers to retain
n_neighbors=3, #number of neighbors
include_hets=FALSE, #include hets in the model
verbose = FALSE, #allows for text and graph output
set_seed = 022294, #sets a seed for reproduction of results
models = "knn") #sets what models are requested
#predict the lines in the test population
pred<-locus_pred(locus_train_results=fit,
geno_mat=geno_mat,
genotypes_to_predict=test)
#see predictions
head(pred)