pmnorm {mnorm} | R Documentation |
Probabilities of (conditional) multivariate normal distribution
Description
This function calculates and differentiates probabilities of (conditional) multivariate normal distribution.
Usage
pmnorm(
lower,
upper,
given_x = numeric(),
mean = numeric(),
sigma = matrix(),
given_ind = numeric(),
n_sim = 1000L,
method = "default",
ordering = "mean",
log = FALSE,
grad_lower = FALSE,
grad_upper = FALSE,
grad_sigma = FALSE,
grad_given = FALSE,
is_validation = TRUE,
control = NULL,
n_cores = 1L,
marginal = NULL,
grad_marginal = FALSE,
grad_marginal_prob = FALSE
)
Arguments
lower |
numeric vector representing lower integration limits.
If |
upper |
numeric vector representing upper integration limits.
If |
given_x |
numeric vector which |
mean |
numeric vector representing expectation of multivariate normal vector (distribution). |
sigma |
positively defined numeric matrix representing covariance matrix of multivariate normal vector (distribution). |
given_ind |
numeric vector representing indexes of multivariate
normal vector which are conditioned at values given by
|
n_sim |
positive integer representing the number of draws from Richtmyer
sequence in GHK algorithm. More draws provide more accurate results by the
cost of additional computational burden. Alternative types of sequences
could be provided via |
method |
string representing the method to be used to calculate
multivariate normal probabilities. Possible options are |
ordering |
string representing the method to be used to order the integrals. See Details section below. |
log |
logical; if |
grad_lower |
logical; if |
grad_upper |
logical; if |
grad_sigma |
logical; if |
grad_given |
logical; if |
is_validation |
logical value indicating whether input
arguments should be validated. Set it to |
control |
a list of control parameters. See Details. |
n_cores |
positive integer representing the number of CPU cores
used for parallel computing. Currently it is not recommended to set
|
marginal |
list such that |
grad_marginal |
logical; if |
grad_marginal_prob |
logical; if |
Details
Consider notations from the Details sections of
cmnorm
and dmnorm
.
The function calculates probabilities of the form:
P\left(x^{(l)}\leq X_{I_{d}}\leq
x^{(u)}|X_{I_{g}}=x^{(g)}\right)
where x^{(l)}
and x^{(u)}
are lower and upper integration
limits respectively i.e. lower
and
upper
correspondingly. Also x^{(g)}
represents given_x
.
Note that lower
and upper
should be matrices of the same size.
Also given_x
should have the same number of rows as lower
and upper
.
To calculate bivariate probabilities the function applies the method of Drezner and Wesolowsky described in A. Genz (2004). In contrast to the classical implementation of this method the function applies Gauss-Legendre quadrature with 30 sample points to approximate integral (1) of A. Genz (2004). Classical implementations of this method use up to 20 points but requires some additional transformations of (1). During preliminary testing it has been found that approach with 30 points provides similar accuracy being slightly faster because of better vectorization capabilities.
To calculate trivariate probabilities the function uses Drezner method following formula (14) of A. Genz (2004). Similarly to bivariate case 30 points are used in Gauss-Legendre quadrature.
The function may apply the method of Gassmann (2003) for estimation of
m>3
dimensional normal probabilities. It uses
matrix 5
representation of Gassmann (2003) and 30 points of
Gauss-Legendre quadrature.
For m
-variate probabilities, where m > 1
, the function may apply
GHK algorithm described in section 4.2 of A. Genz and F. Bretz (2009).
The implementation of GHK is based on deterministic Halton sequence
with n_sim
draws and uses variable reordering suggested in
section 4.1.3 of A. Genz and F. Bretz (2009). The ordering algorithm may
be determined via ordering
argument. Available options
are "NO"
, "mean"
(default), and "variance"
.
Univariate probabilities are always calculated via standard approach so in
this case method
will not affect the output.
If method = "Gassmann"
then the function uses fast (aforementioned)
algorithms for bivariate and trivariate probabilities or the method of
Gassmann for m>3
dimensional probabilities.
If method = "GHK"
then GHK algorithm is used.
If method = "default"
then "Gassmann"
is used for bivariate
and trivariate probabilities while "GHK"
is used for m>3
dimensional probabilities. During future updates "Gassmann"
may become
a default method for calculation of the 4-5
dimensional probabilities.
We are going to provide alternative estimation algorithms during
future updates. These methods will be available via method
argument.
The function is optimized to perform much faster when all upper integration
limits upper
are finite while all lower integration limits
lower
are negative infinite. The derivatives could be also calculated
much faster when some integration limits are infinite.
For simplicity of notations further let's consider
unconditioned probabilities. Derivatives respect to conditioned components
are similar to those mentioned in Details section
of dmnorm
. We also provide formulas for m \geq 3
.
But the function may calculate derivatives for m \leq 2
using some
simplifications of the formulas mentioned below.
If grad_upper
is TRUE
then function additionally estimates the
gradient respect to upper
:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(u)}_{i}}=
P\left(x^{(l)}_{(-i)}\leq X_{(-i)}\leq x^{(u)}_{(-i)}|
X_{i}=x^{(u)}_{i}\right)
f_{X_{i}}\left(x^{(u)}_{i};\mu_{i},\Sigma_{i,i}\right)
If grad_lower
is TRUE
then function additionally estimates the
gradient respect to lower
:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(l)}_{i}}=
-P\left(x^{(l)}_{(-i)}\leq X_{(-i)}\leq x^{(u)}_{(-i)}|
X_{i}=x^{(l)}_{i}\right)
f_{X_{i}}\left(x^{(l)}_{i};\mu_{i},\Sigma_{i,i}\right)
If grad_sigma
is TRUE
then function additionally estimates the
gradient respect to sigma
. For i\ne j
the function
calculates derivatives respect to the covariances:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial \Sigma_{i, j}}=
=P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(u)}_{i}, X_{j}=x^{(u)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(u)}_{i}, x^{(u)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) -
-P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(l)}_{i}, X_{j}=x^{(u)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(l)}_{i}, x^{(u)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) -
-P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(u)}_{i}, X_{j}=x^{(l)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(u)}_{i}, x^{(l)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) +
+P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(l)}_{i}, X_{j}=x^{(l)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(l)}_{i}, x^{(l)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right)
Note that if some of integration limits are infinite then some elements of this equation converge to zero which highly simplifies the calculations.
Derivatives respect to variances are calculated using derivatives respect to covariances and integration limits:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial \Sigma_{i, i}}=
-\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(l)}_{i}} \frac{x^{(l)}_{i}}{2\Sigma_{i, i}}
-\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(u)}_{i}} \frac{x^{(u)}_{i}}{2\Sigma_{i, i}}-
-\sum\limits_{j\ne i}\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial \Sigma_{i, j}}
\frac{\Sigma_{i, j}}{2\Sigma_{i, i}}
If grad_given
is TRUE
then function additionally estimates the
gradient respect to given_x
using formulas similar to those
described in Details section of dmnorm
.
More details on abovementioned differentiation formulas could be found in the appendix of E. Kossova and B. Potanin (2018).
If marginal
is not empty then Gaussian copula will be used instead of
a classical multivariate normal distribution. Without loss of generality
let's assume that \mu
is a vector of zeros and introduce some
additional notations:
q_{i}^{(u)} = \Phi^{-1}\left(P_{i}\left(\frac{x_{i}^{(u)}}{\sigma_{i}}\right)\right)
q_{i}^{(l)} = \Phi^{-1}\left(P_{i}\left(\frac{x_{i}^{(l)}}{\sigma_{i}}\right)\right)
where \Phi(.)^{-1}
is a quantile function of a standard
normal distribution and P_{i}
is a cumulative distribution function
of the standartized (to zero mean and unit variance) marginal distribution
which name and parameters are defined by
names(marginal)[i]
and marginal[[i]]
correspondingly.
For example if marginal[i] = "logistic"
then:
P_{i}(t) = \frac{1}{1+e^{-\pi t / \sqrt{3}}}
Let's denote by X^{*}
random vector which is distributed
with Gaussian (its covariance matrix is \Sigma
) copula and
marginals defined by marginal
.
Then probabilities for these random vector are calculated as follows:
P\left(x^{(l)}\leq X^{*}\leq
x^{(u)}\right) =
P\left(\sigma q^{(l)}\leq X\leq
\sigma q^{(u)}\right) = P_{0}\left(\sigma q^{(l)}, \sigma q^{(u)}\right)
where q^{(l)} = (q_{1}^{(l)},...,q_{m}^{(l)})
,
q^{(u)} = (q_{1}^{(u)},...,q_{m}^{(u)})
and
\sigma = (\sqrt{\Sigma_{1, 1}},...,\sqrt{\Sigma_{m, m}})
. Therefore
probabilities of X^{*}
may be calculated using probabilities
of multivariate normal random vector X
(with the same
covariance matrix) by
substituting lower and upper integration limits x^{(l)}
and
x^{(u)}
with \sigma q^{(l)}
and \sigma q^{(u)}
correspondingly. Therefore differentiation formulas are similar to
those mentioned above and will be automatically adjusted if
marginal
is not empty (including conditional probabilities).
Argument control
is a list with the following input parameters:
-
random_sequence
– numeric matrix of uniform pseudo random numbers (like Halton sequence). The number of columns should be equal to the number of dimensions of multivariate random vector. If omitted than this matrix will be generated automatically usingn_sim
number of simulations.
Value
This function returns an object of class "mnorm_pmnorm".
An object of class "mnorm_pmnorm" is a list containing the
following components:
-
prob
- probability that multivariate normal random variable will be betweenlower
andupper
bounds. -
grad_lower
- gradient of probability respect tolower
ifgrad_lower
orgrad_sigma
input argument is set toTRUE
. -
grad_upper
- gradient of probability respect toupper
ifgrad_upper
orgrad_sigma
input argument is set toTRUE
. -
grad_sigma
- gradient respect to the elements ofsigma
ifgrad_sigma
input argument is set toTRUE
. -
grad_given
- gradient respect to the elements ofgiven_x
ifgrad_given
input argument is set toTRUE
. -
grad_marginal
- gradient respect to the elements ofmarginal_par
ifgrad_marginal
input argument is set toTRUE
. Currently only derivatives respect to the parameters of"PGN"
distribution are available.
If log
is TRUE
then prob
is a log-probability
so output grad_lower
, grad_upper
, grad_sigma
and
grad_given
are calculated respect to the log-probability.
Output grad_lower
and grad_upper
are Jacobian matrices which
rows are gradients of the probabilities calculated for each row of
lower
and upper
correspondingly. Similarly grad_given
is a Jacobian matrix respect to given_x
.
Output grad_sigma
is a 3D array such that grad_sigma[i, j, k]
is a partial derivative of the probability function respect to the
sigma[i, j]
estimated for k
-th observation.
Output grad_marginal
is a list such that grad_marginal[[i]]
is a Jacobian matrice which rows are gradients of the probabilities
calculated for each row of lower
and upper
correspondingly
respect to the elements of marginal_par[[i]]
.
References
Genz, A. (2004), Numerical computation of rectangular bivariate and trivariate normal and t-probabilities, Statistics and Computing, 14, 251-260.
Genz, A. and Bretz, F. (2009), Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195. Springer-Verlag, Heidelberg.
E. Kossova, B. Potanin (2018). Heckman method and switching regression model multivariate generalization. Applied Econometrics, vol. 50, pages 114-143.
H. I. Gassmann (2003). Multivariate Normal Probabilities: Implementing an Old Idea of Plackett's. Journal of Computational and Graphical Statistics, vol. 12 (3), pages 731-752.
Examples
# Consider multivariate normal vector:
# X = (X1, X2, X3, X4, X5) ~ N(mean, sigma)
# Prepare multivariate normal vector parameters
# expected value
mean <- c(-2, -1, 0, 1, 2)
n_dim <- length(mean)
# correlation matrix
cor <- c( 1, 0.1, 0.2, 0.3, 0.4,
0.1, 1, -0.1, -0.2, -0.3,
0.2, -0.1, 1, 0.3, 0.2,
0.3, -0.2, 0.3, 1, -0.05,
0.4, -0.3, 0.2, -0.05, 1)
cor <- matrix(cor, ncol = n_dim, nrow = n_dim, byrow = TRUE)
# covariance matrix
sd_mat <- diag(c(1, 1.5, 2, 2.5, 3))
sigma <- sd_mat %*% cor %*% t(sd_mat)
# Estimate probability:
# P(-3 < X1 < 1, -2.5 < X2 < 1.5, -2 < X3 < 2, -1.5 < X4 < 2.5, -1 < X5 < 3)
lower <- c(-3, -2.5, -2, -1.5, -1)
upper <- c(1, 1.5, 2, 2.5, 3)
p.list <- pmnorm(lower = lower, upper = upper,
mean = mean, sigma = sigma)
p <- p.list$prob
print(p)
# Additionally estimate a probability
lower.1 <- c(-Inf, 0, -Inf, 1, -Inf)
upper.1 <- c(Inf, Inf, 3, 4, 5)
lower.mat <- rbind(lower, lower.1)
upper.mat <- rbind(upper, upper.1)
p.list.1 <- pmnorm(lower = lower.mat, upper = upper.mat,
mean = mean, sigma = sigma)
p.1 <- p.list.1$prob
print(p.1)
# Estimate the probabilities
# P(-1 < X1 < 1, -3 < X3 < 3, -5 < X5 < 5 | X2 = -2, X4 = 4)
lower.2 <- c(-1, -3, -5)
upper.2 <- c(1, 3, 5)
given_ind <- c(2, 4)
given_x <- c(-2, 4)
p.list.2 <- pmnorm(lower = lower.2, upper = upper.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x)
p.2 <- p.list.2$prob
print(p.2)
# Additionally estimate the probability
# P(-Inf < X1 < 1, -3 < X3 < Inf, -Inf < X5 < Inf | X2 = 4, X4 = -2)
lower.3 <- c(-Inf, -3, -Inf)
upper.3 <- c(1, Inf, Inf)
given_x.1 <- c(-2, 4)
lower.mat.2 <- rbind(lower.2, lower.3)
upper.mat.2 <- rbind(upper.2, upper.3)
given_x.mat <- rbind(given_x, given_x.1)
p.list.3 <- pmnorm(lower = lower.mat.2, upper = upper.mat.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x.mat)
p.3 <- p.list.3$prob
print(p.3)
# Estimate the gradient of previous probabilities respect various arguments
p.list.4 <- pmnorm(lower = lower.mat.2, upper = upper.mat.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x.mat,
grad_lower = TRUE, grad_upper = TRUE,
grad_sigma = TRUE, grad_given = TRUE)
p.4 <- p.list.4$prob
print(p.4)
# Gradient respect to 'lower'
grad_lower <- p.list.4$grad_lower
# for the first probability
print(grad_lower[1, ])
# for the second probability
print(grad_lower[2, ])
# Gradient respect to 'upper'
grad_upper <- p.list.4$grad_upper
# for the first probability
print(grad_upper[1, ])
# for the second probability
print(grad_upper[2, ])
# Gradient respect to 'given_x'
grad_given <- p.list.4$grad_given
# for the first probability
print(grad_given[1, ])
# for the second probability
print(grad_given[2, ])
# Gradient respect to 'sigma'
grad_given <- p.list.4$grad_given
# for the first probability
print(grad_given[1, ])
# for the second probability
print(grad_given[2, ])
# Compare analytical gradients from the previous example with
# their numeric (forward difference) analogues for the first probability
n_dependent <- length(lower.2)
n_given <- length(given_x)
n_dim <- n_dependent + n_given
delta <- 1e-6
grad_lower.num <- rep(NA, n_dependent)
grad_upper.num <- rep(NA, n_dependent)
grad_given.num <- rep(NA, n_given)
grad_sigma.num <- matrix(NA, nrow = n_dim, ncol = n_dim)
for (i in 1:n_dependent)
{
# respect to lower
lower.delta <- lower.2
lower.delta[i] <- lower.2[i] + delta
p.list.delta <- pmnorm(lower = lower.delta, upper = upper.2,
given_x = given_x,
mean = mean, sigma = sigma,
given_ind = given_ind)
grad_lower.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
# respect to upper
upper.delta <- upper.2
upper.delta[i] <- upper.2[i] + delta
p.list.delta <- pmnorm(lower = lower.2, upper = upper.delta,
given_x = given_x,
mean = mean, sigma = sigma,
given_ind = given_ind)
grad_upper.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
for (i in 1:n_given)
{
# respect to lower
given_x.delta <- given_x
given_x.delta[i] <- given_x[i] + delta
p.list.delta <- pmnorm(lower = lower.2, upper = upper.2,
given_x = given_x.delta,
mean = mean, sigma = sigma,
given_ind = given_ind)
grad_given.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
for (i in 1:n_dim)
{
for(j in 1:n_dim)
{
# respect to sigma
sigma.delta <- sigma
sigma.delta[i, j] <- sigma[i, j] + delta
sigma.delta[j, i] <- sigma[j, i] + delta
p.list.delta <- pmnorm(lower = lower.2, upper = upper.2,
given_x = given_x,
mean = mean, sigma = sigma.delta,
given_ind = given_ind)
grad_sigma.num[i, j] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
}
# Comparison of gradients respect to lower integration limits
h.lower <- cbind(analytical = p.list.4$grad_lower[1, ],
numeric = grad_lower.num)
print(h.lower)
# Comparison of gradients respect to upper integration limits
h.upper <- cbind(analytical = p.list.4$grad_upper[1, ],
numeric = grad_upper.num)
print(h.upper)
# Comparison of gradients respect to given values
h.given <- cbind(analytical = p.list.4$grad_given[1, ],
numeric = grad_given.num)
print(h.given)
# Comparison of gradients respect to the covariance matrix
h.sigma <- list(analytical = p.list.4$grad_sigma[, , 1],
numeric = grad_sigma.num)
print(h.sigma)
# Let's again estimate probability
# P(-1 < X1 < 1, -3 < X3 < 3, -5 < X5 < 5 | X2 = -2, X4 = 4)
# But assume that standardized (to zero mean and unit variance):
# 1) X1 and X2 have standardized PGN distribution with coefficients vectors
# pc1 = (0.5, -0.2, 0.1) and pc2 = (0.2, 0.05) correspondingly.
# 2) X3 has standardized student distribution with 5 degrees of freedom
# 3) X4 has standardized logistic distribution
# 4) X5 has standard normal distribution
marginal <- list(PGN = c(0.5, -0.2, 0.1), hpa = c(0.2, 0.05),
student = 5, logistic = numeric(), normal = NULL)
p.list.5 <- pmnorm(lower = lower.2, upper = upper.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x,
grad_lower = TRUE, grad_upper = TRUE,
grad_sigma = TRUE, grad_given = TRUE,
marginal = marginal, grad_marginal = TRUE)
# Lets investigate derivatives respect to parameters
# of marginal distributions
# respect to pc1 of X1
p.list.5$grad_marginal[[1]]
# respect to pc2 of X2
p.list.5$grad_marginal[[2]]
# derivative respect to degrees of freedom of X5 is
# currently unavailable and will be set to 0
p.list.5$grad_marginal[[3]]