R: LFMM least-squares estimates with ridge penalty

lfmm_ridge {lfmm}

R Documentation

LFMM least-squares estimates with ridge penalty

Description

This function computes regularized least squares estimates for latent factor mixed models using a ridge penalty.

Usage

lfmm_ridge(
  Y,
  X,
  K,
  lambda = 1e-05,
  algorithm = "analytical",
  it.max = 100,
  relative.err.min = 1e-06
)

Arguments

`Y`	a response variable matrix with n rows and p columns. Each column corresponds to a distinct response variable (e.g., SNP genotype, gene expression level, beta-normalized methylation profile, etc). Response variables must be encoded as numeric.
`X`	an explanatory variable matrix with n rows and d columns. Each column corresponds to a distinct explanatory variable (eg. phenotype, exposure, outcome). Explanatory variables must be encoded as numeric variables.
`K`	an integer for the number of latent factors in the regression model.
`lambda`	a numeric value for the regularization parameter.
`algorithm`	exact (analytical) algorithm or numerical algorithm. The exact algorithm is based on the global minimum of the loss function and computation is very fast. The numerical algorithm converges toward a local minimum of the loss function. The exact method should preferred. The numerical method is for very large n.
`it.max`	an integer value for the number of iterations for the numerical algorithm.
`relative.err.min`	a numeric value for a relative convergence error. Test whether the numerical algorithm converges or not (numerical algorithm only).

Details

The algorithm minimizes the following penalized least-squares criterion

L(U, V, B) = \frac{1}{2} ||Y - U V^{T} - X B^T||_{F}^2 + \frac{\lambda}{2} ||B||^{2}_{2} ,

where Y is a response data matrix, X contains all explanatory variables, U denotes the score matrix, V is the loading matrix, B is the (direct) effect size matrix, and lambda is a regularization parameter.

The response variable matrix Y and the explanatory variable are centered.

Value

an object of class lfmm with the following attributes:

U the latent variable score matrix with dimensions n x K,
V the latent variable axis matrix with dimensions p x K,
B the effect size matrix with dimensions p x d.

Author(s)

Kevin Caye, Basile Jumentier, Olivier Francois

References

Caye, K., B. Jumentier, J. Lepeule, and O. François, 2019 LFMM 2: fast and accurate inference of gene-environment associations in genome-widestudies. Mol. Biol. Evol. 36: 852–860.https://doi.org/10.1093/molbev/msz008

Examples


library(lfmm)

## a GWAS example with Y = SNPs and X = phenotype
data(example.data)
Y <- example.data$genotype[, 1:10000]
X <- example.data$phenotype

## Fit an LFMM with K = 6 factors
mod.lfmm <- lfmm_ridge(Y = Y, 
                       X = X, 
                       K = 6)

## Perform association testing using the fitted model:
pv <- lfmm_test(Y = Y, 
                X = X, 
                lfmm = mod.lfmm, 
                calibrate = "gif")

## Manhattan plot with causal loci shown

pvalues <- pv$calibrated.pvalue
plot(-log10(pvalues), pch = 19, 
     cex = .2, col = "grey", xlab = "SNP")
points(example.data$causal.set[1:5], 
      -log10(pvalues)[example.data$causal.set[1:5]], 
       type = "h", col = "blue")


## An EWAS example with Y = methylation data and X = exposure
Y <- skin.exposure$beta.value
X <- as.numeric(skin.exposure$exposure)

## Fit an LFMM with 2 latent factors
mod.lfmm <- lfmm_ridge(Y = Y,
                       X = X, 
                       K = 2)
                       
## Perform association testing using the fitted model:
pv <- lfmm_test(Y = Y, 
                X = X,
                lfmm = mod.lfmm, 
                calibrate = "gif")
                
## Manhattan plot with true associations shown
pvalues <- pv$calibrated.pvalue
plot(-log10(pvalues), 
     pch = 19, 
     cex = .3,
     xlab = "Probe",
     col = "grey")
     
causal.set <- seq(11, 1496, by = 80)
points(causal.set, 
      -log10(pvalues)[causal.set], 
       col = "blue")

[Package lfmm version 1.1 Index]