scorematchingtheory {scorematchingad}	R Documentation

Introduction to Score Matching

Description

This package includes score matching estimators for particular distributions and a general capacity to implement additional score matching estimators. Score matching is a popular estimation technique when the normalising constant of the proposed model is difficult to compute. Score matching was first developed by Hyvärinen (2005) and was further developed for subsets of Euclidean space (Hyvärinen 2007; Yu et al. 2019; Yu et al. 2020; Liu et al. 2019), Riemannian manifolds (Mardia et al. 2016; Mardia 2018), and Riemannian manifolds with boundary (Scealy and Wood 2023). In this help entry we briefly describe score matching estimation.

Score Matching in General

In its most general form (Riemannian manifolds with boundary), score matching minimises the weighted Hyvärinen divergence (Equation 7, Scealy and Wood 2023)

\phi(f, f_0) = \frac{1}{2} \int_M f_0(z)h(z)^2 \left\lVert P(z)\Big(\nabla_z \log(f) - \nabla_z \log(f_0)\Big)\right\rVert^2 dM(z),

where

f is the (possibly unnormalised) density of the posited model,

f_0 is the (possibly unnormalised) density of the data-generating process,

h(z) is a boundary weight function that is zero on the boundary of M,

P(z) is the matrix that projects vectors onto the tangent space of M at z, and

dM(z) is the measure on the manifold M.

Note that, because P(z) is a projection matrix, \left\lVert P(z)\Big(\nabla_z \log(f) - \nabla_z \log(f_0)\Big)\right\rVert^2 is the squared norm, in the natural inner product, of the projected gradient of the log ratio of f and f_0.

When the density functions f and f_0 are smooth and positive inside M, and the boundary weight function is smooth or of particular forms for specific manifolds (Section 3.2, Scealy and Wood 2023), then minimising the weighted Hyvärinen divergence \phi(f, f_0) is equivalent to minimising the score matching discrepancy (Theorem 1, Scealy and Wood 2023)

\psi(f, f_0) = \int f_0(z)\big(A(z) + B(z) + C(z)\big)dM(z),

where

A(z) = \frac{1}{2} h(z)^2 \left(\nabla_z \log(f)\right)^T P(z) \left(\nabla_z \log(f)\right),

B(z) = h(z)^2\Delta_z\log(f),

C(z) = \left(\nabla_z h(z)^2\right)^T P(z) \left(\nabla_z \log(f)\right),

and \Delta is the Laplacian operator. We term

A(z) + B(z) + C(z)

the score matching discrepancy function.
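As a simple illustration (ours, not drawn from the references): take M to be the real line, boundary weight h(z) = 1 and projection P(z) = I, and the unnormalised density f_\theta(z) \propto \exp(\theta_1 z - \theta_2 z^2 / 2). Then

```latex
% M = \mathbb{R}, \; h(z) = 1, \; P(z) = I, \;
% f_\theta(z) \propto \exp(\theta_1 z - \theta_2 z^2 / 2)
\nabla_z \log(f_\theta) = \theta_1 - \theta_2 z ,
\qquad
\Delta_z \log(f_\theta) = -\theta_2 ,
\qquad \text{so} \qquad
A(z) = \tfrac{1}{2} (\theta_1 - \theta_2 z)^2 ,
\qquad
B(z) = -\theta_2 ,
\qquad
C(z) = 0 ,
```

and the score matching discrepancy function is quadratic in (\theta_1, \theta_2).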

We suspect that Theorem 1 of Scealy and Wood (2023) holds more generally for nearly all realistic continuous and piecewise-smooth boundary weight functions, although we are not aware of a proof.

When n independent observations from f_0 are available, the integration in \psi(f, f_0) can be approximated by an average over the observations,

\psi(f, f_0) \approx \hat\psi(f, f_0) = \frac{1}{n} \sum_{i = 1}^n \Big(A(z_i) + B(z_i) + C(z_i)\Big).

If we parameterise a family of models f_\theta by a vector of parameters \theta, then the score matching estimate is the \theta that minimises \hat\psi(f_\theta, f_0). In general the score matching estimate must be found by numerical optimisation, such as in the function cppad_search(). However, when the family of models is a canonical exponential family, \hat\psi(f_\theta, f_0) and the score matching discrepancy function are often quadratic functions of \theta (Mardia 2018), and the minimum then has a closed-form solution, found by cppad_closed().
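To sketch the closed-form case numerically (in Python rather than R, and without this package's API): on M = R with h(z) = 1 and P(z) = I, the unnormalised density f_\theta(z) \propto \exp(\theta_1 z - \theta_2 z^2 / 2) gives A(z) = (\theta_1 - \theta_2 z)^2 / 2, B(z) = -\theta_2 and C(z) = 0, so \hat\psi is quadratic in \theta and is minimised by solving a 2x2 linear system.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=1.5, size=10_000)  # observations from f_0

# For f_theta(z) proportional to exp(theta1*z - theta2*z^2/2),
#   A(z) = (theta1 - theta2*z)^2 / 2,  B(z) = -theta2,  C(z) = 0,
# so psi_hat(theta) = theta' W theta / 2 - b' theta, with W and b
# built from empirical moments of the observations:
W = np.array([[1.0,       -z.mean()],
              [-z.mean(),  np.mean(z**2)]])
b = np.array([0.0, 1.0])

theta = np.linalg.solve(W, b)   # closed-form score matching estimate

mu_hat = theta[0] / theta[1]    # recovers the sample mean
sigma2_hat = 1.0 / theta[1]     # recovers the (biased) sample variance
print(mu_hat, sigma2_hat)
```

For this family the score matching estimate coincides with maximum likelihood; for other canonical exponential families, cppad_closed() constructs and solves the analogous quadratic automatically.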

Note that when M has more than a few dimensions, manual calculation of A(z), B(z) and C(z) can become cumbersome. This package uses CppAD to compute A(z), B(z) and C(z) automatically, and to exploit the quadratic simplification when it exists.

Transformations

The Hyvärinen divergence \phi(f, f_0) is sensitive to transformations of the measure dM(z), including transformations of the manifold itself: transforming the manifold M changes the divergence between distributions and hence the minimum of \hat\psi(f_\theta, f_0). Such a transformation changes the measure dM(z), the divergence and the estimator, but it does not transform the data.

For example, many different transformations of the simplex (i.e. of compositional data) are possible (Appendix A.3, Scealy et al. 2024). Hyvärinen divergences that use the sphere, obtained from the simplex by a square root, behave differently from Hyvärinen divergences that use Euclidean space, obtained from the simplex using logarithms (Scealy et al. 2024). The estimator for the latter does not apply logarithms to the observations; in fact it involves only polynomials of the observed compositions (Scealy et al. 2024).
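A small sketch of two such transformations (in Python, not this package's code): the square root maps a composition on the simplex to a point on the unit sphere, while a log-ratio transform maps it into Euclidean space. The variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
# Compositions: non-negative rows summing to 1 (points on the simplex).
x = rng.dirichlet(alpha=[2.0, 3.0, 5.0], size=4)

# Square-root transform: each row lands on the unit sphere,
# since ||sqrt(x)||^2 = sum(x) = 1.
u = np.sqrt(x)
norms = np.linalg.norm(u, axis=1)

# A log-ratio transform (here the additive log-ratio against the
# last component) lands in Euclidean space instead.
alr = np.log(x[:, :-1] / x[:, -1:])

print(norms)  # each equals 1 (up to floating point)
```

The same data thus sit on two different manifolds, each with its own measure dM(z) and hence its own Hyvärinen divergence and estimator.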

The variety of estimator behaviour available through different transformations was a major motivation for this package: each transformation has its own A(z), B(z) and C(z), and without automatic differentiation, implementing the score matching estimator for each case would require substantial programming effort.

References

Hyvärinen A (2005). “Estimation of Non-Normalized Statistical Models by Score Matching.” Journal of Machine Learning Research, 6(24), 695–709. https://jmlr.org/papers/v6/hyvarinen05a.html.

Hyvärinen A (2007). “Some extensions of score matching.” Computational Statistics & Data Analysis, 51(5), 2499–2512. doi:10.1016/j.csda.2006.09.003.

Liu S, Kanamori T, Williams DJ (2019). “Estimating Density Models with Truncation Boundaries using Score Matching.” doi:10.48550/arXiv.1910.03834.

Mardia K (2018). “A New Estimation Methodology for Standard Directional Distributions.” In 2018 21st International Conference on Information Fusion (FUSION), 724–729. doi:10.23919/ICIF.2018.8455640.

Mardia KV, Kent JT, Laha AK (2016). “Score matching estimators for directional distributions.” doi:10.48550/arXiv.1604.08470.

Scealy JL, Hingee KL, Kent JT, Wood ATA (2024). “Robust score matching for compositional data.” Statistics and Computing, 34, 93. doi:10.1007/s11222-024-10412-w.

Scealy JL, Wood ATA (2023). “Score matching for compositional distributions.” Journal of the American Statistical Association, 118(543), 1811–1823. doi:10.1080/01621459.2021.2016422.

Yu S, Drton M, Shojaie A (2019). “Generalized Score Matching for Non-Negative Data.” Journal of Machine Learning Research, 20(76), 1–70. https://jmlr.org/papers/v20/18-278.html.

Yu S, Drton M, Shojaie A (2020). “Generalized Score Matching for General Domains.” doi:10.48550/arXiv.2009.11428.


[Package scorematchingad version 0.0.67 Index]