Hellinger distance

Metric used in probability and statistics


In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.

It is sometimes called the Jeffreys distance.

Definition

Measure theory

To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures on a measure space \mathcal{X} that are absolutely continuous with respect to an auxiliary measure \lambda. Such a measure always exists, e.g. \lambda = P + Q. The square of the Hellinger distance between P and Q is defined as the quantity

:H^2(P,Q) = \frac{1}{2}\displaystyle \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx).

Here, P(dx) = p(x)\lambda(dx) and Q(dx) = q(x) \lambda(dx), i.e. p and q are the Radon–Nikodym derivatives of P and Q respectively with respect to \lambda. This definition does not depend on \lambda, i.e. the Hellinger distance between P and Q does not change if \lambda is replaced with a different measure with respect to which both P and Q are absolutely continuous. For compactness, the above formula is often written as

:H^2(P,Q) = \frac{1}{2}\int_{\mathcal{X}} \left(\sqrt{P(dx)} - \sqrt{Q(dx)}\right)^2.

Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take λ to be the Lebesgue measure, so that dP/dλ and dQ/dλ are simply probability density functions. If we denote the densities as f and g, respectively, the squared Hellinger distance can be expressed as a standard calculus integral

:H^2(f,g) = \frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx = 1 - \int \sqrt{f(x) g(x)} \, dx,

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
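The expansion mentioned above can be written out step by step. This is only a worked restatement of the identity, using the densities f and g already defined:

```latex
\begin{aligned}
\frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx
  &= \frac{1}{2}\int \left(f(x) - 2\sqrt{f(x)\,g(x)} + g(x)\right) dx \\
  &= \frac{1}{2}\left(1 - 2\int \sqrt{f(x)\,g(x)} \, dx + 1\right) \\
  &= 1 - \int \sqrt{f(x)\,g(x)} \, dx .
\end{aligned}
```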

The Hellinger distance H(P, Q) satisfies the property (derivable from the Cauchy–Schwarz inequality)

: 0\le H(P,Q) \le 1.
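One way to see the bound, sketched here for the density case with f and g as above, is to apply the Cauchy–Schwarz inequality to \sqrt{f} and \sqrt{g}:

```latex
0 \le \int \sqrt{f(x)\,g(x)} \, dx
  \le \left(\int f(x) \, dx\right)^{1/2} \left(\int g(x) \, dx\right)^{1/2} = 1
\quad\Longrightarrow\quad
0 \le H^2(f,g) = 1 - \int \sqrt{f(x)\,g(x)} \, dx \le 1 .
```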

Discrete distributions

For two discrete probability distributions P=(p_1, \ldots, p_k) and Q=(q_1, \ldots, q_k), their Hellinger distance is defined as

: H(P, Q) = \frac{1}{\sqrt{2}} \; \sqrt{\sum_{i=1}^k (\sqrt{p_i} - \sqrt{q_i})^2},

which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.

: H(P, Q) = \frac{1}{\sqrt{2}} \; \bigl\|\sqrt{P} - \sqrt{Q} \bigr\|_2 .

Also, 1 - H^2(P,Q) = \sum_{i=1}^k \sqrt{p_i q_i}.
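As an illustration, the discrete definition and the identity for 1 - H^2 can be sketched in a few lines of Python. The function names and the example vectors are hypothetical choices made for this sketch, and it assumes both inputs are valid probability vectors of equal length.

```python
import math


def bhattacharyya_sum(p, q):
    """Return sum_i sqrt(p_i * q_i) for two equal-length probability vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger_distance(p, q):
    """Return H(P, Q) = (1/sqrt(2)) * ||sqrt(P) - sqrt(Q)||_2."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared = 0.5 * sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )
    return math.sqrt(squared)


# Illustrative distributions (not taken from the article).
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

h = hellinger_distance(p, q)
overlap = bhattacharyya_sum(p, q)

# The two quantities should agree up to floating-point rounding.
identity_gap = abs((1.0 - h ** 2) - overlap)
```

Distributions with disjoint supports, such as [1, 0] and [0, 1], give the maximum value 1 under this normalization.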

Properties

The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.

The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.

Sometimes the factor 1/2 in front of the integral (equivalently, the factor 1/\sqrt{2} in front of the distance) is omitted, in which case the Hellinger distance ranges from zero to the square root of two.

The Hellinger distance is related to the Bhattacharyya coefficient BC(P,Q) as it can be defined as

: H(P,Q) = \sqrt{1 - BC(P,Q)}.

Hellinger distances are used in the theory of sequential and asymptotic statistics.

The squared Hellinger distance between two normal distributions P \sim \mathcal{N}(\mu_1,\sigma_1^2) and Q \sim \mathcal{N}(\mu_2,\sigma_2^2) is:

: H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \, e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}.
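The closed form for two normal distributions can be checked numerically against the integral definition. The following is a minimal sketch using only the standard library; the function names, the midpoint-rule integration, the 12-standard-deviation cutoff, and the example parameters are all assumptions of this sketch.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hellinger_sq_normal_closed(mu1, sigma1, mu2, sigma2):
    """Closed-form squared Hellinger distance between two normal distributions."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    overlap = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -0.25 * (mu1 - mu2) ** 2 / var_sum
    )
    return 1.0 - overlap


def hellinger_sq_normal_numeric(mu1, sigma1, mu2, sigma2, n=20000):
    """Approximate 1 - integral of sqrt(f * g) with a midpoint rule."""
    lo = min(mu1 - 12.0 * sigma1, mu2 - 12.0 * sigma2)
    hi = max(mu1 + 12.0 * sigma1, mu2 + 12.0 * sigma2)
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        total += math.sqrt(normal_pdf(x, mu1, sigma1) * normal_pdf(x, mu2, sigma2))
    return 1.0 - total * step


closed = hellinger_sq_normal_closed(0.0, 1.0, 1.0, 2.0)
numeric = hellinger_sq_normal_numeric(0.0, 1.0, 1.0, 2.0)
```

The two values should agree to several decimal places; the residual difference comes from the numerical integration, not from the formula.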

The squared Hellinger distance between two multivariate normal distributions P \sim \mathcal{N}(\mu_1,\Sigma_1) and Q \sim \mathcal{N}(\mu_2,\Sigma_2) is

: H^2(P, Q) = 1 - \frac{ \det (\Sigma_1)^{1/4} \det (\Sigma_2) ^{1/4}} { \det \left( \frac{\Sigma_1 + \Sigma_2}{2}\right)^{1/2} } \exp\left\{-\frac{1}{8}(\mu_1 - \mu_2)^T \left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1} (\mu_1 - \mu_2) \right\}.
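As a consistency check, the multivariate expression reduces to the univariate one in dimension one. Setting \Sigma_1 = \sigma_1^2 and \Sigma_2 = \sigma_2^2 (scalars) gives:

```latex
\begin{aligned}
\frac{\det(\Sigma_1)^{1/4}\det(\Sigma_2)^{1/4}}{\det\left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{1/2}}
  &= \frac{\sqrt{\sigma_1\sigma_2}}{\sqrt{\frac{\sigma_1^2+\sigma_2^2}{2}}}
   = \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}}, \\
-\frac{1}{8}(\mu_1-\mu_2)^T\left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{-1}(\mu_1-\mu_2)
  &= -\frac{1}{8}\cdot\frac{2(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}
   = -\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}.
\end{aligned}
```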

The squared Hellinger distance between two exponential distributions P \sim \mathrm{Exp}(\alpha) and Q \sim \mathrm{Exp}(\beta) is:

: H^2(P, Q) = 1 - \frac{2 \sqrt{\alpha \beta}}{\alpha + \beta}.
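The exponential case is short enough to derive directly. The sketch below assumes the rate parametrization, i.e. densities f(x) = \alpha e^{-\alpha x} and g(x) = \beta e^{-\beta x} on x \ge 0, which is the convention under which the formula above holds:

```latex
\int_0^\infty \sqrt{f(x)\,g(x)} \, dx
  = \sqrt{\alpha\beta}\int_0^\infty e^{-\frac{\alpha+\beta}{2}x} \, dx
  = \frac{2\sqrt{\alpha\beta}}{\alpha+\beta},
\qquad\text{so}\qquad
H^2(P,Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha+\beta}.
```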

The squared Hellinger distance between two Weibull distributions P \sim \mathrm{W}(k,\alpha) and Q \sim \mathrm{W}(k,\beta) (where k is a common shape parameter and \alpha, \beta are the respective scale parameters) is:

: H^2(P, Q) = 1 - \frac{2 (\alpha \beta)^{k/2}}{\alpha^k + \beta^k}.

The squared Hellinger distance between two Poisson distributions with rate parameters \alpha and \beta, so that P \sim \mathrm{Poisson}(\alpha) and Q \sim \mathrm{Poisson}(\beta), is:

: H^2(P,Q) = 1-e^{-\frac{1}{2} (\sqrt{\alpha} - \sqrt{\beta})^2}.
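The Poisson closed form can be compared with a direct summation of \sqrt{p_k q_k} over the support. This is a rough sketch: the function names, the truncation at 200 terms, and the example rates are assumptions, and the truncation is only adequate for moderate rates.

```python
import math


def poisson_log_pmf(k, rate):
    """Log of the Poisson probability mass at k, computed in log space."""
    return -rate + k * math.log(rate) - math.lgamma(k + 1)


def hellinger_sq_poisson_series(alpha, beta, terms=200):
    """1 - sum_k sqrt(p_k * q_k), truncated after `terms` terms."""
    overlap = sum(
        math.exp(0.5 * (poisson_log_pmf(k, alpha) + poisson_log_pmf(k, beta)))
        for k in range(terms)
    )
    return 1.0 - overlap


def hellinger_sq_poisson_closed(alpha, beta):
    """Closed-form squared Hellinger distance between two Poisson distributions."""
    return 1.0 - math.exp(-0.5 * (math.sqrt(alpha) - math.sqrt(beta)) ** 2)


series_value = hellinger_sq_poisson_series(3.0, 7.0)
closed_value = hellinger_sq_poisson_closed(3.0, 7.0)
```

Working in log space avoids overflow in the factorial for larger k.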

The squared Hellinger distance between two beta distributions P \sim \text{Beta}(a_1,b_1) and Q \sim \text{Beta}(a_2, b_2) is:

: H^2(P,Q) = 1 - \frac{B\left(\frac{a_1 + a_2}{2}, \frac{b_1 + b_2}{2}\right)}{\sqrt{B(a_1, b_1) B(a_2, b_2)}}

where B is the beta function.
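The beta-distribution formula can be evaluated with the standard library alone by computing the beta function through log-gamma. The helper names and example parameters below are hypothetical; the example pair Beta(1, 1) and Beta(2, 1) was chosen because its overlap integral, \int_0^1 \sqrt{2x}\,dx = 2\sqrt{2}/3, is easy to verify by hand.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hellinger_sq_beta(a1, b1, a2, b2):
    """Squared Hellinger distance between Beta(a1, b1) and Beta(a2, b2)."""
    log_overlap = log_beta((a1 + a2) / 2.0, (b1 + b2) / 2.0) - 0.5 * (
        log_beta(a1, b1) + log_beta(a2, b2)
    )
    return 1.0 - math.exp(log_overlap)


value = hellinger_sq_beta(1.0, 1.0, 2.0, 1.0)
by_hand = 1.0 - 2.0 * math.sqrt(2.0) / 3.0
```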

The squared Hellinger distance between two gamma distributions P \sim \text{Gamma}(a_1,b_1) and Q \sim \text{Gamma}(a_2, b_2) is:

: H^2(P,Q) = 1 - \Gamma\left({\scriptstyle\frac{a_1 + a_2}{2}}\right)\left(\frac{b_1+b_2}{2}\right)^{-(a_1+a_2)/2}\sqrt{\frac{b_1^{a_1}b_2^{a_2}}{\Gamma(a_1)\Gamma(a_2)}}

where \Gamma is the gamma function.

Connection with total variation distance

The Hellinger distance H(P,Q) and the total variation distance (or statistical distance) \delta(P,Q) are related as follows:

: H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}H(P,Q)\,.

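The constants in this inequality change if a different normalization of the Hellinger distance is used (for example, if the factor 1/2 in the squared distance, or equivalently 1/\sqrt{2} in the distance, is omitted).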

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
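The two-sided bound can be illustrated on a small discrete example. In this sketch the total variation distance is taken as half the 1-norm of the difference, and the Hellinger distance uses the 1/\sqrt{2} normalization from the definition above; the function names and the example distributions are hypothetical.

```python
import math


def total_variation(p, q):
    """Total variation distance: half the 1-norm of p - q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def hellinger(p, q):
    """Hellinger distance with the 1/sqrt(2) normalization."""
    return math.sqrt(
        0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    )


# Illustrative distributions (not taken from the article).
p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]

h = hellinger(p, q)
tv = total_variation(p, q)

lower_bound_holds = h ** 2 <= tv
upper_bound_holds = tv <= math.sqrt(2.0) * h
```

A single example does not prove the inequality, but it is a quick sanity check when switching between normalizations.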

References

  1. Nikulin, M.S. "Hellinger distance".
  2. "Jeffreys distance - Encyclopedia of Mathematics".
  3. Jeffreys, Harold. (1946-09-24). "An invariant form for the prior probability in estimation problems". Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
  4. Torgersen, Erik. (1991). "Comparison of Statistical Experiments". Encyclopedia of Mathematics. Cambridge University Press.
  5. Pardo, L. (2006). "Statistical Inference Based on Divergence Measures". Chapman and Hall/CRC.
  6. Harsha, Prahladh. (September 23, 2011). "Lecture notes on communication complexity".
Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
