logisticRidgeGenotypesPredict {ridge} | R Documentation |
Predict fitted probabilities from genome-wide SNP data based on a file of coefficients
Description
Predict fitted probabilities from genome-wide SNP data based on a file of coefficients. Genotypes and fitted coefficients are provided as filenames, allowing the computation of fitted probabilities when SNP data are too large to be read into R.
Usage
logisticRidgeGenotypesPredict(genotypesfilename, betafilename,
phenotypesfilename = NULL, verbose = FALSE)
Arguments
genotypesfilename |
character string: path to file containing SNP genotypes coded 0, 1,
2. See |
betafilename |
character string: path to file containing fitted coefficients. See |
phenotypesfilename |
(optional) character string: path to file in which to write out the
fitted probabilities. See |
verbose |
Logical: If |
Value
A vector of fitted probabilities, the same length as the number of individuals whose data are in genotypesfilename. If phenotypesfilename is supplied, the fitted probabilities are also written there.
Input file formats
- genotypesfilename:
A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated.
- betafilename:
Two columns: the first column contains SNP names in the same order as in genotypesfilename, and the second column contains the fitted coefficients. If the coefficients include an intercept, the first row of betafilename should contain it, with the name Intercept in the first column. An intercept labelled in this way will be used appropriately in predicting the phenotypes. SNP names must match those in genotypesfilename. The format of betafilename is that of the output of logisticRidgeGenotypes, meaning logisticRidgeGenotypesPredict can be used to predict using coefficients fitted using logisticRidgeGenotypes (see the example).
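The two input formats above imply a simple calculation: a linear predictor built from the intercept and the SNP coefficients, passed through the inverse logit. The following base R code is a minimal sketch of that calculation on tiny made-up files, not the package's implementation. The file contents, the whitespace delimiter, and all variable names are assumptions made for illustration, and the whole genotype matrix is read into memory here, which the package function is designed to avoid.

```r
# Illustrative only: tiny made-up files in the formats described above.
# Whitespace-delimited columns are an assumption of this sketch.
geno_path <- tempfile(fileext = ".txt")
beta_path <- tempfile(fileext = ".dat")

# Genotypes: header row of SNP names, then one row per individual (0/1/2).
writeLines(c("snp1 snp2 snp3",
             "0 1 2",
             "1 0 0",
             "2 2 1"), geno_path)

# Coefficients: name in column 1, value in column 2, Intercept in row 1.
writeLines(c("Intercept -0.5",
             "snp1 0.2",
             "snp2 -0.1",
             "snp3 0.4"), beta_path)

geno <- as.matrix(read.table(geno_path, header = TRUE))
beta <- read.table(beta_path, header = FALSE,
                   col.names = c("name", "coef"),
                   stringsAsFactors = FALSE)

# Split off the intercept if the first row is labelled Intercept.
has_intercept <- beta$name[1] == "Intercept"
intercept <- if (has_intercept) beta$coef[1] else 0
snp_beta <- if (has_intercept) beta[-1, ] else beta

# SNP names must match between the two files, in the same order.
stopifnot(identical(colnames(geno), snp_beta$name))

# Linear predictor, then inverse logit to obtain fitted probabilities.
eta <- intercept + drop(geno %*% snp_beta$coef)
fitted_prob <- plogis(eta)

unlink(c(geno_path, beta_path))
```

Here fitted_prob has one entry per row of the genotype file, matching the description of the returned vector under Value.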
Output file format
Whether or not phenotypesfilename is provided, fitted probabilities are returned to the R workspace. If phenotypesfilename is provided, the fitted probabilities are also written to the specified file.
- phenotypesfilename:
One column, containing fitted probabilities, one individual per row.
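To illustrate how fitted probabilities can be computed without holding the full genotype matrix in memory, and what the one-column output looks like, here is a hedged base R sketch that reads the genotype file one individual at a time. predict_streaming is a hypothetical helper written for this example and is not part of the ridge package. The whitespace delimiter, the coefficient values, and the file contents are likewise assumptions.

```r
# Hypothetical helper, not part of the ridge package: reads one individual
# per iteration, so only a single row of genotypes is in memory at a time.
predict_streaming <- function(geno_path, intercept, snp_beta, out_path) {
  con <- file(geno_path, open = "r")
  on.exit(close(con))

  # The header row holds the SNP names, one per coefficient.
  snp_names <- scan(text = readLines(con, n = 1), what = "", quiet = TRUE)
  stopifnot(length(snp_names) == length(snp_beta))

  probs <- numeric(0)
  while (length(line <- readLines(con, n = 1)) > 0) {
    if (!nzchar(trimws(line))) next          # skip blank lines
    g <- scan(text = line, quiet = TRUE)     # genotypes coded 0, 1, 2
    probs <- c(probs, plogis(intercept + sum(g * snp_beta)))
  }

  # One column, one individual per row, no header and no row names.
  write.table(probs, out_path, row.names = FALSE, col.names = FALSE)
  probs
}

# Usage on a tiny made-up genotype file with made-up coefficients.
geno_path <- tempfile(fileext = ".txt")
out_path  <- tempfile(fileext = ".txt")
writeLines(c("snp1 snp2 snp3",
             "0 1 2",
             "1 0 0",
             "2 2 1"), geno_path)

p <- predict_streaming(geno_path, intercept = -0.5,
                       snp_beta = c(0.2, -0.1, 0.4), out_path = out_path)

# Read the written file back: a single column of fitted probabilities.
written <- scan(out_path, quiet = TRUE)
unlink(c(geno_path, out_path))
```

The returned vector p and the values read back from the file agree up to the precision used when writing.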
Author(s)
Erika Cule
References
A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
See Also
logisticRidgeGenotypes for model fitting. linearRidgeGenotypes and linearRidgeGenotypesPredict for corresponding functions to fit and predict on SNP data with continuous outcomes.
Examples
## Not run:
genotypesfile <- system.file("extdata","GenBin_genotypes.txt",package = "ridge")
phenotypesfile <- system.file("extdata","GenBin_phenotypes.txt",package = "ridge")
betafile <- tempfile(pattern = "beta", fileext = ".dat")
beta_logisticRidgeGenotypes <- logisticRidgeGenotypes(genotypesfilename = genotypesfile,
                                                      phenotypesfilename = phenotypesfile,
                                                      betafilename = betafile)
pred_phen_geno <- logisticRidgeGenotypesPredict(genotypesfilename = genotypesfile,
                                                betafilename = betafile)
## compare to output of logisticRidge
data(GenBin) ## Same data as in GenBin_genotypes.txt and GenBin_phenotypes.txt
beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
pred_phen <- predict(beta_logisticRidge, type="response")
print(cbind(pred_phen_geno, pred_phen))
## Delete the temporary betafile
unlink(betafile)
## End(Not run)