R: Fits linear ridge regression models for genome-wide SNP data.

linearRidgeGenotypes {ridge}

R Documentation

Fits linear ridge regression models for genome-wide SNP data.

Description

Fits linear ridge regression models for genome-wide SNP data. The SNP genotypes are not read into R; instead, file names are passed to the code directly, enabling the analysis of genome-wide scale SNP data sets.

Usage

linearRidgeGenotypes(genotypesfilename, phenotypesfilename, lambda = -1,
                     thinfilename = NULL, betafilename = NULL, approxfilename = NULL,
                     permfilename = NULL, intercept = TRUE, verbose = FALSE)

Arguments

`genotypesfilename`	character string: path to file containing SNP genotypes coded 0, 1, 2. See `Input file formats`.
`phenotypesfilename`	character string: path to file containing phenotypes. See `Input file formats`.
`lambda`	(optional) shrinkage parameter. If not provided, the default (`-1`) requests automatic choice of the shrinkage parameter using the method of Cule & De Iorio (2012).
`thinfilename`	(optional) character string: path to file containing three columns: SNP name, chromosome and SNP position. See `Input file formats` and `Details`.
`betafilename`	(optional) character string: path to file where the output will be written. See `Output file formats`.
`approxfilename`	(optional) character string: path to file where the approximate test p-values will be written. Approximate p-values are not computed unless this argument is given. Approximate p-values are computed using the method of Cule et al. (2011). See `Output file formats`.
`permfilename`	(optional) character string: path to file where the permutation test p-values will be written. Permutation test p-values are not computed unless this argument is given. See `Warning` and `Output file formats`.
`intercept`	Logical: Should the ridge regression model be fitted with an intercept? (Defaults to `TRUE`)
`verbose`	Logical: If `TRUE`, additional information is printed to the R output as the code runs. Defaults to `FALSE`.

Details

If thinfilename is supplied, and the shrinkage parameter lambda is being computed automatically based on the data, then this file is used to thin the SNP data by SNP position. If this file is not supplied, SNPs are thinned automatically based on the number of SNPs.

Value

The vector of fitted ridge regression coefficients. If betafilename is given, the fitted coefficients are written to this file as well as being returned. If approxfilename and/or permfilename are given, results of approximate test p-values and/or permutation test p-values are written to the files given in their arguments.

Input file formats

genotypesfilename:: A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated. Invariant SNPs in the data cause an error; remove these from the file before calling the function.
phenotypesfilename:: A single column of phenotypes with the individuals in the same order as those in the file genotypesfilename.
thinfilename:: (optional) Three columns and the same number of rows as there are SNPs in the file genotypesfilename, one row per SNP. First column: SNP names (must match names in genotypesfilename); second column: chromosome; third column: SNP position in BP.
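
As a rough illustration of the layouts described above, the following sketch writes a small simulated data set to temporary files. The whitespace delimiter, the absence of a header row in the phenotype and thinning files, and all object names (`geno`, `pheno`, `thin`) are assumptions made for this example, not documented behaviour; compare against the files shipped in `extdata` before relying on them.

```r
## Simulate a small data set: 20 individuals, 5 SNPs coded 0, 1, 2.
set.seed(1)
n <- 20
p <- 5
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n,
               dimnames = list(NULL, paste0("SNP", seq_len(p))))
pheno <- rnorm(n)

## Missing values are not accommodated, and codes must be 0, 1 or 2.
stopifnot(!anyNA(geno), all(geno %in% 0:2))

## Invariant SNPs cause an error, so drop them before writing.
variant <- apply(geno, 2, function(x) length(unique(x)) > 1)
geno <- geno[, variant, drop = FALSE]

## Hypothetical thinning table: SNP name, chromosome, position in BP.
thin <- data.frame(snp = colnames(geno),
                   chr = 1,
                   pos = seq(1000, by = 5000, length.out = ncol(geno)))

genofile  <- tempfile(fileext = ".txt")
phenofile <- tempfile(fileext = ".txt")
thinfile  <- tempfile(fileext = ".txt")

## Genotypes: header row of SNP names, then one row per individual.
write.table(geno, genofile, quote = FALSE, row.names = FALSE, col.names = TRUE)
## Phenotypes: a single column, individuals in the same order as the genotypes.
write.table(pheno, phenofile, quote = FALSE, row.names = FALSE, col.names = FALSE)
## Thinning file: one row per SNP (assumed here to have no header row).
write.table(thin, thinfile, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting paths could then be passed as `genotypesfilename`, `phenotypesfilename` and `thinfilename`.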

Output file formats

All output files are optional. Whether or not betafilename is provided, fitted coefficients are returned to the R workspace. If betafilename is provided, fitted coefficients are also written to the file specified.

betafilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If intercept = TRUE (the default) then the first row is the fitted intercept (with the name Intercept in the first column).
approxfilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is approximate p-values.
permfilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is permutation p-values.
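
The two-column layout above can be read back into R with `read.table`. The sketch below uses a hand-written stand-in for a coefficient file, with made-up values; it assumes whitespace-separated columns and no header row, neither of which is stated above, so adjust `sep` and `header` to match the files actually produced.

```r
## Stand-in for a file written via betafilename (values are made up).
betafile <- tempfile(fileext = ".txt")
writeLines(c("Intercept 0.50", "SNP1 0.12", "SNP2 -0.03"), betafile)

## Assumes whitespace-separated columns and no header row.
beta <- read.table(betafile, header = FALSE,
                   col.names = c("name", "coefficient"),
                   stringsAsFactors = FALSE)

## With intercept = TRUE the first row holds the intercept; split it off.
has_intercept <- beta$name[1] == "Intercept"
intercept <- if (has_intercept) beta$coefficient[1] else NA_real_
snp_beta <- if (has_intercept) beta[-1, ] else beta
```

Files written via approxfilename and permfilename have the same two-column shape, without an intercept row, so the same `read.table` call should apply under the same assumptions.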

Warning

When data are large, the permutation test p-values may take a very long time to compute. It is recommended not to request permutation test p-values (using the argument permfilename) when data are large.
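
One way to act on this warning is to decide whether to pass permfilename based on the number of SNPs, counted from the header row without loading the data. This is only a sketch: `count_snps` is a hypothetical helper, the threshold is arbitrary, and a whitespace-delimited header is assumed.

```r
## Hypothetical helper: count SNPs from the header row without loading the data.
count_snps <- function(genotypesfilename) {
  header <- readLines(genotypesfilename, n = 1)
  length(strsplit(trimws(header), "[[:space:]]+")[[1]])
}

## Stand-in genotype file with three SNPs and two individuals.
genofile <- tempfile(fileext = ".txt")
writeLines(c("SNP1 SNP2 SNP3", "0 1 2", "1 0 1"), genofile)

max_snps_for_perm <- 1000  # arbitrary threshold, chosen only for illustration
permfile <- if (count_snps(genofile) <= max_snps_for_perm) {
  tempfile(fileext = ".txt")
} else {
  NULL  # the default: permutation test p-values are not computed
}
```

The value of `permfile` could then be passed as the `permfilename` argument.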

Author(s)

Erika Cule

References

Significance testing in ridge regression for genetic data. Cule, E. et al. (2011) BMC Bioinformatics, 12:372

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

Examples

## Not run: 
    genotypesfile <- system.file("extdata", "GenCont_genotypes.txt", package = "ridge")
    phenotypesfile <- system.file("extdata", "GenCont_phenotypes.txt", package = "ridge")
    beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
                                                      phenotypesfilename = phenotypesfile)
    ## compare to output of linearRidge
    data(GenCont) ## Same data as in GenCont_genotypes.txt and GenCont_phenotypes.txt
    beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
    cbind(round(coef(beta_linearRidge), 6), beta_linearRidgeGenotypes)

## End(Not run)

[Package ridge version 3.3 Index]