lfmm_ridge {lfmm} | R Documentation |
LFMM least-squares estimates with ridge penalty
Description
This function computes regularized least squares estimates for latent factor mixed models using a ridge penalty.
Usage
lfmm_ridge(
Y,
X,
K,
lambda = 1e-05,
algorithm = "analytical",
it.max = 100,
relative.err.min = 1e-06
)
Arguments
Y |
a response variable matrix with n rows and p columns. Each column corresponds to a distinct response variable (e.g., SNP genotype, gene expression level, beta-normalized methylation profile, etc). Response variables must be encoded as numeric. |
X |
an explanatory variable matrix with n rows and d columns. Each column corresponds to a distinct explanatory variable (eg. phenotype, exposure, outcome). Explanatory variables must be encoded as numeric variables. |
K |
an integer for the number of latent factors in the regression model. |
lambda |
a numeric value for the regularization parameter. |
algorithm |
exact (analytical) algorithm or numerical algorithm. The exact algorithm is based on the global minimum of the loss function and computation is very fast. The numerical algorithm converges toward a local minimum of the loss function. The exact method should preferred. The numerical method is for very large n. |
it.max |
an integer value for the number of iterations for the numerical algorithm. |
relative.err.min |
a numeric value for a relative convergence error. Test whether the numerical algorithm converges or not (numerical algorithm only). |
Details
The algorithm minimizes the following penalized least-squares criterion
L(U, V, B) = \frac{1}{2} ||Y - U V^{T} - X B^T||_{F}^2
+ \frac{\lambda}{2} ||B||^{2}_{2} ,
where Y is a response data matrix, X contains all explanatory variables, U denotes the score matrix, V is the loading matrix, B is the (direct) effect size matrix, and lambda is a regularization parameter.
The response variable matrix Y and the explanatory variable are centered.
Value
an object of class lfmm
with the following attributes:
U the latent variable score matrix with dimensions n x K,
V the latent variable axis matrix with dimensions p x K,
B the effect size matrix with dimensions p x d.
Author(s)
Kevin Caye, Basile Jumentier, Olivier Francois
References
Caye, K., B. Jumentier, J. Lepeule, and O. François, 2019 LFMM 2: fast and accurate inference of gene-environment associations in genome-widestudies. Mol. Biol. Evol. 36: 852–860.https://doi.org/10.1093/molbev/msz008
Examples
library(lfmm)
## a GWAS example with Y = SNPs and X = phenotype
data(example.data)
Y <- example.data$genotype[, 1:10000]
X <- example.data$phenotype
## Fit an LFMM with K = 6 factors
mod.lfmm <- lfmm_ridge(Y = Y,
X = X,
K = 6)
## Perform association testing using the fitted model:
pv <- lfmm_test(Y = Y,
X = X,
lfmm = mod.lfmm,
calibrate = "gif")
## Manhattan plot with causal loci shown
pvalues <- pv$calibrated.pvalue
plot(-log10(pvalues), pch = 19,
cex = .2, col = "grey", xlab = "SNP")
points(example.data$causal.set[1:5],
-log10(pvalues)[example.data$causal.set[1:5]],
type = "h", col = "blue")
## An EWAS example with Y = methylation data and X = exposure
Y <- skin.exposure$beta.value
X <- as.numeric(skin.exposure$exposure)
## Fit an LFMM with 2 latent factors
mod.lfmm <- lfmm_ridge(Y = Y,
X = X,
K = 2)
## Perform association testing using the fitted model:
pv <- lfmm_test(Y = Y,
X = X,
lfmm = mod.lfmm,
calibrate = "gif")
## Manhattan plot with true associations shown
pvalues <- pv$calibrated.pvalue
plot(-log10(pvalues),
pch = 19,
cex = .3,
xlab = "Probe",
col = "grey")
causal.set <- seq(11, 1496, by = 80)
points(causal.set,
-log10(pvalues)[causal.set],
col = "blue")