lgmr_model {baldur} | R Documentation |
Baldur has the option to use the Latent Gamma Regression Model
(LGMR). This model attempts to estimate local Mean-Variance (M-V) trend for
each peptide (or protein, PTM, etc.). To this end is describes the M-V
trend as a regression with a common link function and a latent one. While
there may not be a latent trend in real data, it allows for a mathematical
description that can estimate the local trends of each peptide. The model
describes the standard deviation (S) as a gamma random variable with mean
dependency described by the sample mean (\bar{y}
):
\boldsymbol{S}\sim\Gamma(\alpha,\beta),\quad\boldsymbol{\beta}=\frac{\alpha}{\boldsymbol{\mu}}
\boldsymbol{\mu}=\exp(\gamma_0-\gamma_{\bar{y}}\,f(\boldsymbol{\bar{y}}))
+ \kappa
\exp(\boldsymbol{\theta}(\gamma_{0L}-\gamma_{\bar{y}L}\,f(\boldsymbol{\bar{y}})),\quad
f(\boldsymbol{x})=\frac{\boldsymbol{x}-\mu_{\bar{y}}}{\sigma_{\bar{y}}}
Here, \exp(\gamma_0-\gamma_{\bar{y}}f(\boldsymbol{\bar{y}}))
is the common trend of the
regression. Then, the mixing variable, \boldsymbol{\theta}
, describes
the proportion of the mixture that each peptide has, and \kappa
is
just some small constant such that when \theta
is zero the latent
term is small.
Next we will describe the hierarchical prior setup for the
regression variables.
For \alpha
baldur
uses a half-Cauchy as a weakly informative prior:
\alpha\sim\text{Half-Cauchy}(25)
For the latent contribution to the i:th peptide, \theta_i
, has an uniform distribution:
\theta_i\sim\mathcal{U}(0, 1)
Further, it can be seen that Baldur assumes
that S always decreases (on average) with the sample mean. To this end, both
slopes (\gamma_{\bar{y}},\gamma_{\bar{y}L}
) are defined on the real
positive line. Hence, we used Half-Normal (HN)
distributions to describe the slope parameters:
\gamma_{\bar{y}}\sim
HN(1)
\gamma_{\bar{y}L}\sim HN(1)
For the intercepts, we assume a
standard normal prior for the common intercept. Then, we use a skew-normal to
model the latent intercept. The reason for this is two-fold, one,
\kappa
will decrease the value of the latent term and, two, to push the
latent trend upwards. The latter reason is so that the latent intercept is
larger than the common and so that the priors prioritize a shift in intercept
over a increase in slope.
For the intercepts, Baldur uses a standard normal prior for the common intercept.
\gamma_0\sim\mathcal{N}(0,1)
While for the latent trend, it uses a skew-normal (SN) to push up the second
trend and to counteract the shrinkage of \kappa
.
\gamma_{0L}\sim\mathcal{SN}(2,15,35)
A stanmodel
that can be used in fit_lgmr.
The Stan
code for this model is given by:
lgmr_model S4 class stanmodel 'lgmr_model' coded as follows: functions { // mu vector reg_function( vector x, vector theta, real I, real I_L, real S, real S_L, int N ) { vector[N] exp_beta = .001*exp(theta .* (I_L - S_L*x)); exp_beta += exp(I - S*x); return exp_beta; } } data { int<lower=0> N; // Number of observations vector<lower=0>[N] y; // sd vector[N] x; // mean } transformed data { real v_y = variance(y); // for NRMSE vector[N] x_star = (x - mean(x))/sd(x); // f(y_bar) vector[3] lb = [0, 0, negative_infinity()]'; // Boundaries normal coefs. } parameters { real<lower = 0> alpha; // Shape vector<lower = lb>[3] eta; // For S, S_L, I vector<lower = 0, upper = 1>[N] theta; // Mixture parameter real I_L; // Intercept Latent } transformed parameters { real<lower = 0> S = eta[1]; // Slope common real<lower = 0> S_L = eta[2]; // Slope Latent real I = eta[3]; // Intercep common } model { //Priors alpha ~ cauchy(0, 25); eta ~ std_normal(); I_L ~ skew_normal(2, 15, 35); theta ~ beta(1, 1); { vector[N] exp_beta = reg_function(x_star, theta, I, I_L, S, S_L, N); // Likelihood y ~ gamma(alpha, alpha ./ exp_beta); } } generated quantities { // NRMSE calculations real nrmse; { vector[N] se = reg_function(x_star, theta, I, I_L, S, S_L, N); se -= y; se = square(se); nrmse = mean(se) / v_y; } nrmse = sqrt(nrmse); }