ppi {scorematchingad}R Documentation

Estimation of Polynomially-Tilted Pairwise Interaction (PPI) Model

Description

Estimates the parameters of the Polynomially-Tilted Pairwise Interaction (PPI) model (Scealy and Wood 2023) for compositional data. By default ppi() uses cppad_closed() to find estimate. For many situations a hard-coded implementation of the score matching estimator is also available.

For a given parameter vector evalparam, ppi_smvalues() computes the score matching discrepancy, the gradient and the Hessian of the score matching discrepancy (see smvalues()) and the gradient offset of the score matching discrepancy (see quadratictape_parts() and tapeGradOffset()).

Usage

ppi(
  Y,
  paramvec = NULL,
  trans,
  method = "closed",
  w = rep(1, nrow(Y)),
  constrainbeta = FALSE,
  bdryw = "ones",
  acut = NULL,
  bdrythreshold = 1e-10,
  shiftsize = bdrythreshold,
  approxorder = 10,
  control = list(tol = 1e-15, checkgrad = TRUE),
  paramvec_start = NULL
)

ppi_smvalues(
  Y,
  paramvec = NULL,
  evalparam,
  trans,
  method = "closed",
  w = rep(1, nrow(Y)),
  bdryw = "ones",
  acut = NULL,
  bdrythreshold = 1e-10,
  shiftsize = bdrythreshold,
  approxorder = 10,
  average = TRUE
)

Arguments

Y

A matrix of measurements. Each row is a compositional measurement (i.e. each row sums to 1 and has non-negative elements).

paramvec

Optionally a vector of the PPI models parameters. NA-valued elements of this vector are estimated and non-NA values are fixed. Generate paramvec easily using ppi_paramvec(). If NULL then all elements of A_L, b_L and \beta are estimated.

trans

The name of the transformation of the manifold in Hyvärinen divergence (See scorematchingtheory): "clr" (centred log ratio), "alr" (additive log ratio), "sqrt" or "none".

method

"closed" uses CppAD to solve in closed form the a quadratic score matching discrepancy using cppad_closed(). "hardcoded" uses hardcoded implementations. "iterative" uses cppad_search() (which uses CppAD and optimx::Rcgmin()) to iteratively find the minimum of the weighted Hyvärinen divergence.

w

Weights for each observation, if different observations have different importance. Used by Windham() and ppi_robust() for robust estimation.

constrainbeta

If TRUE, elements of \beta that are less than -1 are converted to -1 + 1E-7.

bdryw

The boundary weight function for down weighting measurements as they approach the manifold boundary. Either "ones", "minsq" or "prodsq". See details.

acut

The threshold a_c in bdryw to avoid over-weighting measurements interior to the simplex

bdrythreshold

iterative or closed methods only. For measurements within bdrythreshold of the simplex boundary a Taylor approximation is applied by shifting the measurement shiftsize towards the center of the simplex.

shiftsize

iterative or closed methods only. For measurements within bdrythreshold of the simplex boundary a Taylor approximation is applied by shifting the measurement shiftsize towards the center of the simplex.

approxorder

iterative or closed methods only. Order of the Taylor approximation for measurements on the boundary of the simplex.

control

iterative only. Passed to optimx::Rcgmin() to control the iterative solver.

paramvec_start

iterative method only. The starting guess for Rcgmin. Generate paramvec_start easily using ppi_paramvec().

evalparam

The parameter set to evaluate the score matching values. This is different to paramvec, which specifies which parameters to estimate. All elements of evalparam must be non-NA, and any parameters fixed by paramvec must have the same value in evalparam.

average

If TRUE return the (weighted average) of the measurements, otherwise return the values for each measurement.

Details

Estimation may be performed via transformation of the measure in Hyvärinen divergence from Euclidean space to the simplex (inverse of the additive log ratio transform), from a hyperplane to the simplex (inverse of the centred log ratio transform), from the positive quadrant of the sphere to the simplex (inverse of the square root transform), or without any transformation. In the latter two situations there is a boundary and weighted Hyvärinen divergence (Equation 7, Scealy and Wood 2023) is used. Properties of the estimator using the square root transform were studied by Scealy and Wood (2023). Properties of the estimator using the additive log ratio transform were studied by Scealy et al. (2024).

There are three boundary weight functions available:

Scealy and Wood (Theorem 1, Scealy and Wood 2023) prove that minimising the weighted Hyvärinen Divergence is equivalent to minimising \psi(f, f_0) (See scorematchingtheory) when the boundary weight function is smooth or for the functions "minsq" and "prodsq" above when the manifold is the simplex or positive orthant of a sphere.

Hard-coded estimators are available for the following situations:

Value

ppi() returns: A list of est, SE and info.

ppi_smvalues() returns a list of

PPI Model

The PPI model density is proportional to

\exp(z_L^TA_Lz_L + b_L^Tz_L)\prod_{i=1}^p z_i^{\beta_i},

where p is the dimension of a compositional measurement z, and z_L is z without the final (pth) component. A_L is a p-1 \times p-1 symmetric matrix that controls the covariance between components. b_L is a p-1 vector that controls the location within the simplex. The ith component \beta_i of \beta controls the concentration of density when z_i is close to zero: when \beta_i \geq 0 there is no concentration and \beta_i is hard to identify; when \beta_i < 0 then the probability density of the PPI model increases unboundedly as z_i approaches zero, with the increasing occuring more rapidly and sharply the closer \beta_i is to -1.

References

Scealy JL, Hingee KL, Kent JT, Wood ATA (2024). “Robust score matching for compositional data.” Statistics and Computing, 34, 93. doi:10.1007/s11222-024-10412-w.

Scealy JL, Wood ATA (2023). “Score matching for compositional distributions.” Journal of the American Statistical Association, 118(543), 1811–1823. doi:10.1080/01621459.2021.2016422.

See Also

Other PPI model tools: dppi(), ppi_param_tools, ppi_robust(), rppi()

Examples

model <- rppi_egmodel(100)
estalr <- ppi(model$sample,
              paramvec = ppi_paramvec(betap = -0.5, p = ncol(model$sample)),
              trans = "alr")
estsqrt <- ppi(model$sample,
              trans = "sqrt",
              bdryw = "minsq", acut = 0.1)

[Package scorematchingad version 0.0.67 Index]