| evian {evian} | R Documentation |
Evidential analysis for genetic data using regression models
Description
Calculates the likelihood intervals for genetic association in a genomic region of interest. Covariates can be accommodated.
Usage
evian(data, bim, xcols = NULL, ycol = NULL, covariateCol = NULL,
formula = NULL, robust = FALSE, model='additive', m=200,
bse = 5, lolim = NULL, hilim = NULL, kcutoff = c(8,32,100,1000),
multiThread = 1, family='gaussian',plinkCC=F)
Arguments
data |
a data frame that includes a column for the response variable, one or multiple columns of genotype data (coded as |
bim |
a data frame with six columns representing chromosome, SNP ID, genetic distance, base pair position, effective allele, and reference allele, i.e. data from a file in PLINK binary format (bim). No header is assumed, but the ordering of the columns must follow the standard bim file format. |
ycol |
numeric; column index in the |
xcols |
numeric vector; the column range in the |
covariateCol |
numeric or numeric vector; optional argument specifying which columns represent covariates. If left as |
formula |
string; this is an alternative way of specifying model rather than using |
robust |
logical; default |
model |
a string that specifies the mode of inheritance parameterization: |
m |
numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function. |
bse |
numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at |
lolim |
numeric; the lower limit for the grid or the minimum value of the regression parameter |
hilim |
numeric; the upper limit for the grid or the maximum value of the regression parameter |
kcutoff |
numeric or numeric vector; default = |
multiThread |
numeric; number of threads to use for parallel computing. |
family |
the link function for |
plinkCC |
logical; specifies how case/control status is coded. Cases/controls are coded 1/0 if FALSE, and 2/1 if TRUE. |
Details
evian is the main function called to calculate the 1/k likelihood intervals for the additive, dominant, recessive, or overdominance genotypic models. This function calls calculateEvianMLE in parallel to calculate the likelihood for each SNP. The calculation details can be found in calculateEvianMLE.
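As a rough illustration of what a 1/k likelihood interval is, the following base R sketch profiles the likelihood of one simulated SNP in a Gaussian linear model and reads off the intervals. It does not call evian or calculateEvianMLE. The simulated data, the helper li, and the symmetric grid around the estimate are assumptions made for illustration only.

```r
set.seed(1)
n <- 200
g <- rbinom(n, 2, 0.3)        # simulated genotype, additive coding 0/1/2
y <- 0.2 * g + rnorm(n)       # simulated quantitative response

fit <- lm(y ~ g)
est <- coef(fit)[["g"]]
se  <- summary(fit)$coefficients["g", "Std. Error"]

# Assumed grid: m points spanning est +/- bse standard errors
m <- 200
bse <- 5
grid <- seq(est - bse * se, est + bse * se, length.out = m)

# Profile log-likelihood: hold the SNP effect fixed at b via an offset
# and re-estimate the remaining (nuisance) parameters
prof_ll <- vapply(grid, function(b) {
  as.numeric(logLik(lm(y ~ 1 + offset(b * g))))
}, numeric(1))

# Standardize so that the maximum over the grid equals 1
std_lik <- exp(prof_ll - max(prof_ll))

# 1/k likelihood interval: grid values whose standardized likelihood is >= 1/k
li <- function(k) range(grid[std_lik >= 1 / k])
li(8)
li(32)
```

Larger values of k give wider intervals, since the threshold 1/k on the standardized likelihood is lower.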
The input for the data and bim arguments can be obtained from PLINK files: data is expected to follow the PLINK format produced by the --recodeA option, and bim can be obtained directly from a PLINK binary format file. Note that if covariates are to be included, they are expected to be appended to the data file, with a header for each covariate.
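A minimal sketch of loading the two inputs with base R is shown here. The file contents are tiny made-up examples written to temporary files so that the code runs on its own; in practice the paths would point to your own PLINK output. The column names assigned to bim are hypothetical labels chosen for readability, not names required by evian.

```r
# Made-up .bim content: six whitespace-separated columns, no header
bim_path <- tempfile(fileext = ".bim")
writeLines(c("1\trs1\t0\t1000\tA\tG",
             "1\trs2\t0\t2000\tC\tT"), bim_path)
bim <- read.table(bim_path, header = FALSE, stringsAsFactors = FALSE)
names(bim) <- c("chr", "snp", "dist", "bp", "a1", "a2")  # hypothetical labels

# Made-up additive-recode (.raw) content: header row, six leading
# sample columns, then one column per SNP
raw_path <- tempfile(fileext = ".raw")
writeLines(c("FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_C",
             "f1 i1 0 0 1 0.5 0 1",
             "f2 i2 0 0 2 -1.2 2 NA"), raw_path)
raw <- read.table(raw_path, header = TRUE, stringsAsFactors = FALSE)

# Column indices that could then be passed as ycol and xcols
ycol  <- which(names(raw) == "PHENOTYPE")
xcols <- (ycol + 1):ncol(raw)
```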
The statistical model can be specified in two ways: by column index, through the xcols, ycol, and covariateCol arguments, or through the formula argument, which accepts a formula written as for the formula argument of the R glm function. We recommend the xcols, ycol, and covariateCol arguments in most scenarios, as they are easier to supply and work for all the cases that we have considered so far. The formula argument cannot detect non-rsID variants as parameters of interest, and is suggested only when a single variant is of interest and its rsID is known in advance. Because profileLikelihood can only accommodate a scalar parameter, if multiple rsID variants are supplied through the formula option, only the first one is treated as the parameter of interest.
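To make the correspondence between the two styles concrete, the sketch below builds a glm-style formula for a single SNP from column indices using base R. The data frame and its column names are invented for illustration, and the exact string that evian expects for its formula argument is not shown here, so treat this only as a picture of the underlying regression model.

```r
set.seed(2)
# Invented toy data: response, one covariate, one SNP with an rsID-style name
df <- data.frame(PHENOTYPE = rnorm(50),
                 AGE       = rnorm(50),
                 rs123_A   = rbinom(50, 2, 0.3))

# Index-based specification for this toy data frame
ycol <- 1
covariateCol <- 2
xcol <- 3

# Equivalent glm-style formula: response ~ SNP + covariates
f <- reformulate(names(df)[c(xcol, covariateCol)], response = names(df)[ycol])
fit <- glm(f, data = df, family = gaussian())
coef(fit)
```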
Parallel computing is available through the multiThread argument. This parallelization uses the foreach and doMC packages and will typically reduce computation time significantly. Because of this dependency, parallelization is not available on Windows, where doMC is not supported.
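The general pattern of running one task per SNP and row-combining the results can be sketched as follows. This is not the package's internal code: one_snp is a hypothetical stand-in for the per-SNP computation, and the sketch falls back to a sequential loop when foreach or doMC is not installed or the platform is Windows.

```r
# Hypothetical per-SNP worker standing in for the real likelihood computation
one_snp <- function(i) data.frame(snp = paste0("rs", i), value = i^2)

have_parallel <- .Platform$OS.type != "windows" &&
  requireNamespace("foreach", quietly = TRUE) &&
  requireNamespace("doMC", quietly = TRUE)

if (have_parallel) {
  library(foreach)
  doMC::registerDoMC(cores = 2)   # comparable in spirit to multiThread = 2
  res <- foreach(i = 1:4, .combine = rbind) %dopar% one_snp(i)
} else {
  # Sequential fallback producing the same row-combined result
  res <- do.call(rbind, lapply(1:4, one_snp))
}
res
```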
Value
This function outputs the row-combined results from calculateEvianMLE for each of the SNPs included in the data/bim files. The exact output for each SNP can be found in the calculateEvianMLE documentation.
Note
When lolim/hilim is NOT defined, the boundaries of the beta grid are determined by bse, either the default bse=5 or a value defined by the user. Otherwise, the user can define the exact beta grid boundaries using lolim/hilim.
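The choice between the two ways of setting grid boundaries can be pictured with a small helper. The function beta_grid is hypothetical and not part of the package. It assumes that the bse-based limits are the estimate plus or minus bse standard errors and that m is the number of grid points; the package's exact rule may differ.

```r
# Hypothetical helper: explicit limits win, otherwise fall back to bse
beta_grid <- function(est, se, m = 200, bse = 5, lolim = NULL, hilim = NULL) {
  lo <- if (is.null(lolim)) est - bse * se else lolim
  hi <- if (is.null(hilim)) est + bse * se else hilim
  seq(lo, hi, length.out = m)
}

g_default <- beta_grid(est = 0.1, se = 0.02)                  # bse-based limits
g_manual  <- beta_grid(est = 0.1, se = 0.02,
                       lolim = -0.4, hilim = 0.4)             # user-set limits
range(g_default)
range(g_manual)
```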
In some cases the beta grid (set using bse or lolim/hilim) may need to be widened substantially (bse as large as 15) if covariates are present in the formula. The current function handles this automatically, but finding the appropriate range adds to the computation time. Estimation may become inaccurate with a large number of correlated covariates, which is a known limitation of profile likelihoods.
See Also
Examples
data(evian_linear_raw)
data(evian_linear_bim)
rst1=evian(data=evian_linear_raw, bim=evian_linear_bim, xcols=10:ncol(evian_linear_raw),
ycol=6, covariateCol=c(5,7:9), robust=FALSE, model="additive", m=200, lolim=-0.4,
hilim=0.4, kcutoff = c(32,100), multiThread=1,family='gaussian',plinkCC=FALSE)