pmnorm {mnorm}R Documentation

Probabilities of (conditional) multivariate normal distribution

Description

This function calculates and differentiates probabilities of (conditional) multivariate normal distribution.

Usage

pmnorm(
  lower,
  upper,
  given_x = numeric(),
  mean = numeric(),
  sigma = matrix(),
  given_ind = numeric(),
  n_sim = 1000L,
  method = "default",
  ordering = "mean",
  log = FALSE,
  grad_lower = FALSE,
  grad_upper = FALSE,
  grad_sigma = FALSE,
  grad_given = FALSE,
  is_validation = TRUE,
  control = NULL,
  n_cores = 1L,
  marginal = NULL,
  grad_marginal = FALSE,
  grad_marginal_prob = FALSE
)

Arguments

lower

numeric vector representing lower integration limits. If lower is a matrix then each row determines new limits. Negative infinite values are allowed while positive infinite values are prohibited.

upper

numeric vector representing upper integration limits. If upper is a matrix then each row determines new limits. Positive infinite values are allowed while negative infinite values are prohibited.

given_x

numeric vector which i-th element corresponds to the given value of the given_ind[i]-th element (component) of multivariate normal vector. If given_x is numeric matrix then it's rows are such vectors of given values.

mean

numeric vector representing expectation of multivariate normal vector (distribution).

sigma

positively defined numeric matrix representing covariance matrix of multivariate normal vector (distribution).

given_ind

numeric vector representing indexes of multivariate normal vector which are conditioned at values given by given_x argument.

n_sim

positive integer representing the number of draws from Richtmyer sequence in GHK algorithm. More draws provide more accurate results by the cost of additional computational burden. Alternative types of sequences could be provided via random_sequence argument.

method

string representing the method to be used to calculate multivariate normal probabilities. Possible options are "GHK" and "Gassmann" recommended for high dimensional (more than 5) and low dimensional probabilities correspondingly. Additional option "default" selects optimal method depending on the number of dimensions. See 'Details' for additional information.

ordering

string representing the method to be used to order the integrals. See Details section below.

log

logical; if TRUE then probabilities (or densities) p are given as log(p) and derivatives will be given respect to log(p).

grad_lower

logical; if TRUE then the vector of partial derivatives of the probability will be calculated respect to each element of lower. If lower is a matrix then gradients will be estimated for each row of lower.

grad_upper

logical; if TRUE then the vector of partial derivatives of the probability will be calculated respect to each element of upper. If upper is a matrix then gradients will be estimated for each row of upper.

grad_sigma

logical; if TRUE then the vector of partial derivatives (gradient) of the probability will be calculated respect to each element of sigma. If lower and upper are matrices then gradients will be estimated for each row of these matrices.

grad_given

logical; if TRUE then the vector of partial derivatives of the density function will be calculated respect to each element of given_x. If given_x is a matrix then gradients will be estimated for each row of given_x.

is_validation

logical value indicating whether input arguments should be validated. Set it to FALSE to get performance boost (default value is TRUE).

control

a list of control parameters. See Details.

n_cores

positive integer representing the number of CPU cores used for parallel computing. Currently it is not recommended to set n_cores > 1 if vectorized arguments include less then 100000 elements.

marginal

list such that marginal[[i]] represents parameters of marginal distribution of the i-th component of the random vector and names(marginal)[i] is a name of this distribution. If names(marginal)[i] = "logistic" or names(marginal)[i] = "normal" then marginal[[i]] should be an empty vector or NULL. If names(marginal)[i] = "student" then marginal[[i]] should contain a single element representing degrees of freedom. If names(marginal)[i] = "PGN" or names(marginal)[i] = "hpa" then marginal[[i]] should be a numeric vector which elements correspond to pc argument of phpa0.

grad_marginal

logical; if TRUE then the vector of partial derivatives (gradient) of probability will be calculated respect to each parameter of marginal distributions i.e. respect to each element of marginal. The gradient respect to the parameters of the i-th marginal distribution will be stored in the grad_marginal[[i]] output matrix which j-th column stores derivatives respect to marginal[[i]][j].

grad_marginal_prob

logical; if TRUE then the vector of partial derivatives (gradient) of probability will be calculated respect to each cumulative marginal probability of marginal distributions.

Details

Consider notations from the Details sections of cmnorm and dmnorm. The function calculates probabilities of the form:

P\left(x^{(l)}\leq X_{I_{d}}\leq x^{(u)}|X_{I_{g}}=x^{(g)}\right)

where x^{(l)} and x^{(u)} are lower and upper integration limits respectively i.e. lower and upper correspondingly. Also x^{(g)} represents given_x. Note that lower and upper should be matrices of the same size. Also given_x should have the same number of rows as lower and upper.

To calculate bivariate probabilities the function applies the method of Drezner and Wesolowsky described in A. Genz (2004). In contrast to the classical implementation of this method the function applies Gauss-Legendre quadrature with 30 sample points to approximate integral (1) of A. Genz (2004). Classical implementations of this method use up to 20 points but requires some additional transformations of (1). During preliminary testing it has been found that approach with 30 points provides similar accuracy being slightly faster because of better vectorization capabilities.

To calculate trivariate probabilities the function uses Drezner method following formula (14) of A. Genz (2004). Similarly to bivariate case 30 points are used in Gauss-Legendre quadrature.

The function may apply the method of Gassmann (2003) for estimation of m>3 dimensional normal probabilities. It uses matrix 5 representation of Gassmann (2003) and 30 points of Gauss-Legendre quadrature.

For m-variate probabilities, where m > 1, the function may apply GHK algorithm described in section 4.2 of A. Genz and F. Bretz (2009). The implementation of GHK is based on deterministic Halton sequence with n_sim draws and uses variable reordering suggested in section 4.1.3 of A. Genz and F. Bretz (2009). The ordering algorithm may be determined via ordering argument. Available options are "NO", "mean" (default), and "variance".

Univariate probabilities are always calculated via standard approach so in this case method will not affect the output. If method = "Gassmann" then the function uses fast (aforementioned) algorithms for bivariate and trivariate probabilities or the method of Gassmann for m>3 dimensional probabilities. If method = "GHK" then GHK algorithm is used. If method = "default" then "Gassmann" is used for bivariate and trivariate probabilities while "GHK" is used for m>3 dimensional probabilities. During future updates "Gassmann" may become a default method for calculation of the 4-5 dimensional probabilities.

We are going to provide alternative estimation algorithms during future updates. These methods will be available via method argument.

The function is optimized to perform much faster when all upper integration limits upper are finite while all lower integration limits lower are negative infinite. The derivatives could be also calculated much faster when some integration limits are infinite.

For simplicity of notations further let's consider unconditioned probabilities. Derivatives respect to conditioned components are similar to those mentioned in Details section of dmnorm. We also provide formulas for m \geq 3. But the function may calculate derivatives for m \leq 2 using some simplifications of the formulas mentioned below.

If grad_upper is TRUE then function additionally estimates the gradient respect to upper:

\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial x^{(u)}_{i}}= P\left(x^{(l)}_{(-i)}\leq X_{(-i)}\leq x^{(u)}_{(-i)}| X_{i}=x^{(u)}_{i}\right) f_{X_{i}}\left(x^{(u)}_{i};\mu_{i},\Sigma_{i,i}\right)

If grad_lower is TRUE then function additionally estimates the gradient respect to lower:

\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial x^{(l)}_{i}}= -P\left(x^{(l)}_{(-i)}\leq X_{(-i)}\leq x^{(u)}_{(-i)}| X_{i}=x^{(l)}_{i}\right) f_{X_{i}}\left(x^{(l)}_{i};\mu_{i},\Sigma_{i,i}\right)

If grad_sigma is TRUE then function additionally estimates the gradient respect to sigma. For i\ne j the function calculates derivatives respect to the covariances:

\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial \Sigma_{i, j}}=

=P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}| X_{i}=x^{(u)}_{i}, X_{j}=x^{(u)}_{j}\right) f_{X_{i}, X_{j}}\left(x^{(u)}_{i}, x^{(u)}_{j}; \mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) -

-P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}| X_{i}=x^{(l)}_{i}, X_{j}=x^{(u)}_{j}\right) f_{X_{i}, X_{j}}\left(x^{(l)}_{i}, x^{(u)}_{j}; \mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) -

-P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}| X_{i}=x^{(u)}_{i}, X_{j}=x^{(l)}_{j}\right) f_{X_{i}, X_{j}}\left(x^{(u)}_{i}, x^{(l)}_{j}; \mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) +

+P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}| X_{i}=x^{(l)}_{i}, X_{j}=x^{(l)}_{j}\right) f_{X_{i}, X_{j}}\left(x^{(l)}_{i}, x^{(l)}_{j}; \mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right)

Note that if some of integration limits are infinite then some elements of this equation converge to zero which highly simplifies the calculations.

Derivatives respect to variances are calculated using derivatives respect to covariances and integration limits:

\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial \Sigma_{i, i}}=

-\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial x^{(l)}_{i}} \frac{x^{(l)}_{i}}{2\Sigma_{i, i}} -\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial x^{(u)}_{i}} \frac{x^{(u)}_{i}}{2\Sigma_{i, i}}-

-\sum\limits_{j\ne i}\frac{\partial P\left(x^{(l)}\leq X\leq x^{(u)}\right)}{\partial \Sigma_{i, j}} \frac{\Sigma_{i, j}}{2\Sigma_{i, i}}

If grad_given is TRUE then function additionally estimates the gradient respect to given_x using formulas similar to those described in Details section of dmnorm.

More details on abovementioned differentiation formulas could be found in the appendix of E. Kossova and B. Potanin (2018).

If marginal is not empty then Gaussian copula will be used instead of a classical multivariate normal distribution. Without loss of generality let's assume that \mu is a vector of zeros and introduce some additional notations:

q_{i}^{(u)} = \Phi^{-1}\left(P_{i}\left(\frac{x_{i}^{(u)}}{\sigma_{i}}\right)\right)

q_{i}^{(l)} = \Phi^{-1}\left(P_{i}\left(\frac{x_{i}^{(l)}}{\sigma_{i}}\right)\right)

where \Phi(.)^{-1} is a quantile function of a standard normal distribution and P_{i} is a cumulative distribution function of the standartized (to zero mean and unit variance) marginal distribution which name and parameters are defined by names(marginal)[i] and marginal[[i]] correspondingly. For example if marginal[i] = "logistic" then:

P_{i}(t) = \frac{1}{1+e^{-\pi t / \sqrt{3}}}

Let's denote by X^{*} random vector which is distributed with Gaussian (its covariance matrix is \Sigma) copula and marginals defined by marginal. Then probabilities for these random vector are calculated as follows:

P\left(x^{(l)}\leq X^{*}\leq x^{(u)}\right) = P\left(\sigma q^{(l)}\leq X\leq \sigma q^{(u)}\right) = P_{0}\left(\sigma q^{(l)}, \sigma q^{(u)}\right)

where q^{(l)} = (q_{1}^{(l)},...,q_{m}^{(l)}), q^{(u)} = (q_{1}^{(u)},...,q_{m}^{(u)}) and \sigma = (\sqrt{\Sigma_{1, 1}},...,\sqrt{\Sigma_{m, m}}). Therefore probabilities of X^{*} may be calculated using probabilities of multivariate normal random vector X (with the same covariance matrix) by substituting lower and upper integration limits x^{(l)} and x^{(u)} with \sigma q^{(l)} and \sigma q^{(u)} correspondingly. Therefore differentiation formulas are similar to those mentioned above and will be automatically adjusted if marginal is not empty (including conditional probabilities).

Argument control is a list with the following input parameters:

Value

This function returns an object of class "mnorm_pmnorm".

An object of class "mnorm_pmnorm" is a list containing the following components:

If log is TRUE then prob is a log-probability so output grad_lower, grad_upper, grad_sigma and grad_given are calculated respect to the log-probability.

Output grad_lower and grad_upper are Jacobian matrices which rows are gradients of the probabilities calculated for each row of lower and upper correspondingly. Similarly grad_given is a Jacobian matrix respect to given_x.

Output grad_sigma is a 3D array such that grad_sigma[i, j, k] is a partial derivative of the probability function respect to the sigma[i, j] estimated for k-th observation.

Output grad_marginal is a list such that grad_marginal[[i]] is a Jacobian matrice which rows are gradients of the probabilities calculated for each row of lower and upper correspondingly respect to the elements of marginal_par[[i]].

References

Genz, A. (2004), Numerical computation of rectangular bivariate and trivariate normal and t-probabilities, Statistics and Computing, 14, 251-260.

Genz, A. and Bretz, F. (2009), Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195. Springer-Verlag, Heidelberg.

E. Kossova, B. Potanin (2018). Heckman method and switching regression model multivariate generalization. Applied Econometrics, vol. 50, pages 114-143.

H. I. Gassmann (2003). Multivariate Normal Probabilities: Implementing an Old Idea of Plackett's. Journal of Computational and Graphical Statistics, vol. 12 (3), pages 731-752.

Examples

# Consider multivariate normal vector:
# X = (X1, X2, X3, X4, X5) ~ N(mean, sigma)

# Prepare multivariate normal vector parameters
  # expected value
mean <- c(-2, -1, 0, 1, 2)
n_dim <- length(mean)
  # correlation matrix
cor <- c(   1,  0.1,  0.2,   0.3,  0.4,
          0.1,    1, -0.1,  -0.2, -0.3,
          0.2, -0.1,    1,   0.3,  0.2,
          0.3, -0.2,  0.3,     1, -0.05,
          0.4, -0.3,  0.2, -0.05,     1)
cor <- matrix(cor, ncol = n_dim, nrow = n_dim, byrow = TRUE)
  # covariance matrix
sd_mat <- diag(c(1, 1.5, 2, 2.5, 3))
sigma <- sd_mat %*% cor %*% t(sd_mat)

# Estimate probability:
# P(-3 < X1 < 1, -2.5 < X2 < 1.5, -2 < X3 < 2, -1.5 < X4 < 2.5, -1 < X5 < 3)
lower <- c(-3, -2.5, -2, -1.5, -1)
upper <- c(1, 1.5, 2, 2.5, 3)
p.list <- pmnorm(lower = lower, upper = upper,
                 mean = mean, sigma = sigma)
p <- p.list$prob
print(p)

# Additionally estimate a probability
lower.1 <- c(-Inf, 0, -Inf, 1, -Inf)
upper.1 <- c(Inf, Inf, 3, 4, 5)
lower.mat <- rbind(lower, lower.1)
upper.mat <- rbind(upper, upper.1)
p.list.1 <- pmnorm(lower = lower.mat, upper = upper.mat,
                   mean = mean, sigma = sigma)
p.1 <- p.list.1$prob
print(p.1)

# Estimate the probabilities
# P(-1 < X1 < 1, -3 < X3 < 3, -5 < X5 < 5 | X2 = -2, X4 = 4)
lower.2 <- c(-1, -3, -5)
upper.2 <- c(1, 3, 5)
given_ind <- c(2, 4)
given_x <- c(-2, 4)
p.list.2 <- pmnorm(lower = lower.2, upper = upper.2,
                   mean = mean, sigma = sigma,
                   given_ind = given_ind, given_x = given_x)
p.2 <- p.list.2$prob
print(p.2)

# Additionally estimate the probability
# P(-Inf < X1 < 1, -3 < X3 < Inf, -Inf < X5 < Inf | X2 = 4, X4 = -2)
lower.3 <- c(-Inf, -3, -Inf)
upper.3 <- c(1, Inf, Inf)
given_x.1 <- c(-2, 4)
lower.mat.2 <- rbind(lower.2, lower.3)
upper.mat.2 <- rbind(upper.2, upper.3)
given_x.mat <- rbind(given_x, given_x.1)
p.list.3 <- pmnorm(lower = lower.mat.2, upper = upper.mat.2,
                   mean = mean, sigma = sigma,
                   given_ind = given_ind, given_x = given_x.mat)
p.3 <- p.list.3$prob
print(p.3)

# Estimate the gradient of previous probabilities respect various arguments
p.list.4 <- pmnorm(lower = lower.mat.2, upper = upper.mat.2,
                   mean = mean, sigma = sigma,
                   given_ind = given_ind, given_x = given_x.mat,
                   grad_lower = TRUE, grad_upper = TRUE,
                   grad_sigma = TRUE, grad_given = TRUE)
p.4 <- p.list.4$prob
print(p.4)
# Gradient respect to 'lower'
grad_lower <- p.list.4$grad_lower
   # for the first probability
print(grad_lower[1, ])
   # for the second probability
print(grad_lower[2, ])
# Gradient respect to 'upper'
grad_upper <- p.list.4$grad_upper
   # for the first probability
print(grad_upper[1, ])
   # for the second probability
print(grad_upper[2, ])
# Gradient respect to 'given_x'
grad_given <- p.list.4$grad_given
   # for the first probability
print(grad_given[1, ])
   # for the second probability
print(grad_given[2, ])
# Gradient respect to 'sigma'
grad_given <- p.list.4$grad_given
   # for the first probability
print(grad_given[1, ])
   # for the second probability
print(grad_given[2, ])


# Compare analytical gradients from the previous example with
# their numeric (forward difference) analogues for the first probability
n_dependent <- length(lower.2)
n_given <- length(given_x)
n_dim <- n_dependent + n_given
delta <- 1e-6
grad_lower.num <- rep(NA, n_dependent)
grad_upper.num <- rep(NA, n_dependent)
grad_given.num <- rep(NA, n_given)
grad_sigma.num <- matrix(NA, nrow = n_dim, ncol = n_dim)
for (i in 1:n_dependent)
{
  # respect to lower
  lower.delta <- lower.2
  lower.delta[i] <- lower.2[i] + delta
  p.list.delta <- pmnorm(lower = lower.delta, upper = upper.2,
                         given_x = given_x,
                         mean = mean, sigma = sigma,
                         given_ind = given_ind)
  grad_lower.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
  # respect to upper
  upper.delta <- upper.2
  upper.delta[i] <- upper.2[i] + delta
  p.list.delta <- pmnorm(lower = lower.2, upper = upper.delta,
                         given_x = given_x,
                         mean = mean, sigma = sigma,
                         given_ind = given_ind)
  grad_upper.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
for (i in 1:n_given)
{
  # respect to lower
  given_x.delta <- given_x
  given_x.delta[i] <- given_x[i] + delta
  p.list.delta <- pmnorm(lower = lower.2, upper = upper.2,
                         given_x = given_x.delta,
                         mean = mean, sigma = sigma,
                         given_ind = given_ind)
  grad_given.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
for (i in 1:n_dim)
{
  for(j in 1:n_dim)
  {
    # respect to sigma
    sigma.delta <- sigma
    sigma.delta[i, j] <- sigma[i, j] + delta 
    sigma.delta[j, i] <- sigma[j, i] + delta 
    p.list.delta <- pmnorm(lower = lower.2, upper = upper.2,
                           given_x = given_x,
                           mean = mean, sigma = sigma.delta,
                           given_ind = given_ind)
    grad_sigma.num[i, j] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
  }
}
# Comparison of gradients respect to lower integration limits
h.lower <- cbind(analytical = p.list.4$grad_lower[1, ], 
                 numeric = grad_lower.num)
print(h.lower)
# Comparison of gradients respect to upper integration limits
h.upper <- cbind(analytical = p.list.4$grad_upper[1, ], 
                 numeric = grad_upper.num)
print(h.upper)
# Comparison of gradients respect to given values
h.given <- cbind(analytical = p.list.4$grad_given[1, ], 
                 numeric = grad_given.num)
print(h.given)
# Comparison of gradients respect to the covariance matrix
h.sigma <- list(analytical = p.list.4$grad_sigma[, , 1], 
                numeric = grad_sigma.num)
print(h.sigma)

# Let's again estimate probability
# P(-1 < X1 < 1, -3 < X3 < 3, -5 < X5 < 5 | X2 = -2, X4 = 4)
# But assume that standardized (to zero mean and unit variance): 
# 1) X1 and X2 have standardized PGN distribution with coefficients vectors
#    pc1 = (0.5, -0.2, 0.1) and pc2 = (0.2, 0.05) correspondingly.
# 2) X3 has standardized student distribution with 5 degrees of freedom
# 3) X4 has standardized logistic distribution
# 4) X5 has standard normal distribution
marginal <- list(PGN = c(0.5, -0.2, 0.1), hpa = c(0.2, 0.05), 
                 student = 5, logistic = numeric(), normal = NULL)
p.list.5 <- pmnorm(lower = lower.2, upper = upper.2,
                   mean = mean, sigma = sigma,
                   given_ind = given_ind, given_x = given_x,
                   grad_lower = TRUE, grad_upper = TRUE,
                   grad_sigma = TRUE, grad_given = TRUE,
                   marginal = marginal, grad_marginal = TRUE)     
# Lets investigate derivatives respect to parameters
# of marginal distributions
  # respect to pc1 of X1
p.list.5$grad_marginal[[1]]              
  # respect to pc2 of X2
p.list.5$grad_marginal[[2]]  
  # derivative respect to degrees of freedom of X5 is
  # currently unavailable and will be set to 0
p.list.5$grad_marginal[[3]]                    

[Package mnorm version 1.2.2 Index]