What does it mean to be subgaussian?

Many of us who have walked through the steps of a concentration inequality from time to time are probably familiar with the term “subgaussian”. We know it means something like “variance”, in that if the variance of a random variable is 0, then it is also subgaussian with constant 0. We also know it has something to do with tail decay; e.g. according to Wikipedia,

“In probability theory, a subgaussian distribution, the distribution of a subgaussian random variable, is a probability distribution with strong tail decay.”

In this post, I thought we could take a journey through a few distributions to really visualize what this means.

Definition

A random variable $X$ with mean $\mu = \mathbb E[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that

$$
\mathbb E[e^{\lambda (X-\mu)}] \leq e^{\sigma^2\lambda^2/2}, \qquad \text{ for all } \lambda\in \mathbb R.
$$

That is to say, the tail decays at least as fast as the tail of a Gaussian with variance $\sigma^2$.

(Source: Chapter 2 of High-Dimensional Statistics: A Non-Asymptotic Viewpoint By Martin J. Wainwright (2019) [1])

Usage

One important use of subgaussianity is to establish concentration inequalities, which quantify how fast the law of large numbers kicks in. For example, Hoeffding’s bound states that if the random variables $X_i$, $i = 1,\ldots,n$ are independent with means $\mu_i$, and each is subgaussian with parameter $\sigma_i$, then

$$
\text{Pr}\left(\sum_{i=1}^n (X_i-\mu_i)\geq t\right) \leq \exp\left\{-\frac{t^2}{2\sum_{i=1}^n \sigma_i^2}\right\}.
$$

This is often specialized to the case of bounded variables; if $X_i\in [a,b]$, then Hoeffding’s lemma gives $\sigma_i = \frac{b-a}{2}$, and the bound simplifies to

$$
\text{Pr}\left(\sum_{i=1}^n (X_i-\mu_i)\geq t\right) \leq \exp\left\{-\frac{2t^2}{n(b-a)^2}\right\}.
$$

So we can see that, though $\sigma_i$ is not the variance for every distribution, it does have a similar flavor: a distribution with a large variance has a harder time “concentrating” around its mean. But… in what cases is that not really the whole story?
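As a quick sanity check, here is a small simulation sketch of the bounded-variable bound. The setup (i.i.d. Uniform[0, 1] variables, and the particular $n$ and $t$) is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n i.i.d. Uniform[0, 1] variables, so a = 0, b = 1
# and each sigma_i = (b - a) / 2 = 1/2.
n, t = 100, 10.0
hoeffding_bound = np.exp(-2 * t**2 / (n * (1 - 0) ** 2))

# Empirically estimate Pr(sum_i (X_i - mu_i) >= t) over many trials.
trials = 50_000
deviations = rng.uniform(0, 1, size=(trials, n)).sum(axis=1) - n * 0.5
empirical = (deviations >= t).mean()

print(empirical, "<=", hoeffding_bound)
```

The empirical frequency comes out far below the bound, which is consistent with Hoeffding being a worst-case (and, as we'll see, often loose) guarantee.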

tl;dr

Now we are ready for the big idea. For a Gaussian, the subgaussian parameter is exactly $\sigma$, the standard deviation. Moreover, by Hoeffding’s lemma, the parameter is $(b-a)/2$ for the uniform distribution on $[a,b]$, and $1/2$ for the Bernoulli distribution. (See also [1].)

Well, we tried using this for our application and were kind of annoyed that we could not demonstrate the behavior we wanted with the resulting concentration inequality. In particular, for a Bernoulli distribution, we want better rates when the probability of 0 or 1 is unusually large, i.e. when the distribution is close to deterministic. Obviously a not-that-random occurrence should concentrate better than a really random one!

So, that led us down some rabbit holes, where we formed fancy functions to make these bounds. Through a lot of trial and error, one such function was found, and I don’t think it’s that mathematically interesting. What is maybe more interesting is that we can simply compute the exact subgaussian bounds numerically under the different distributional conditions, and use them to figure out if our intuition is indeed right: that though the subgaussian constant isn’t the same thing as variance, it should behave somewhat like it.

Distributions

Let’s take a look at how subgaussianity plays out by deriving the subgaussian constant for each distribution. In other words, we take a look at the function

$$
f(\lambda) = \frac{2}{\lambda^2} \ln(\mathbb E[e^{\lambda(X-\mu)}])=\frac{2}{\lambda^2} \ln\left(\int_{-\infty}^\infty e^{\lambda(x-\mu)}p_X(x)\,dx\right)\leq \sigma^2
$$
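Since everything below can also be checked numerically, here is a rough Monte Carlo sketch of $f(\lambda)$ (the sample size, seed, and $\lambda$ grid are all arbitrary choices), using a standard Gaussian as a first example:

```python
import numpy as np

def f(lmbda, centered_samples):
    # Monte Carlo estimate of (2 / lambda^2) * ln E[exp(lambda (X - mu))]
    return 2.0 / lmbda**2 * np.log(np.mean(np.exp(lmbda * centered_samples)))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1_000_000)  # standard Gaussian, sigma^2 = 1
x -= x.mean()                             # center empirically

lambdas = np.linspace(-1.0, 1.0, 41)
lambdas = lambdas[lambdas != 0]           # f has a removable singularity at 0
sup_f = max(f(l, x) for l in lambdas)
print(sup_f)  # hovers near sigma^2 = 1
```

For heavier-tailed choices of $X$, the Monte Carlo estimate of the MGF gets noisy at large $|\lambda|$, so the closed-form derivations below are still the safer route.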

Gaussian

This one is quite easy.  Without loss of generality, let’s assume $\mu = 0$. Then  we take the integral

\begin{eqnarray*}
\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{\lambda x}e^{-\frac{x^2}{2\sigma^2}} dx&=&
\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{-\frac{x^2-2\sigma^2\lambda x}{2\sigma^2}} dx\\
&=&
\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{-\frac{(x-\sigma^2\lambda )^2 - (\sigma^2\lambda)^2}{2\sigma^2}} dx\\
&=&
e^{\frac{\sigma^2\lambda^2}{2}}\cdot \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{-\frac{(x-\sigma^2\lambda )^2}{2\sigma^2}} dx}_{=1}\\
\end{eqnarray*}

since the part under the brace is the integral of the p.d.f. of a Gaussian random variable with mean $\sigma^2\lambda$ and variance $\sigma^2$, which is 1. So

$$
\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{\lambda x}e^{-\frac{x^2}{2\sigma^2}} dx=
e^{\sigma^2\lambda^2/2}
$$

and therefore
$$f(\lambda) = \frac{2}{\lambda^2}\ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty e^{\lambda x}e^{-\frac{x^2}{2\sigma^2}} dx\right) =
\sigma^2.$$

As expected, $\sigma^2$ in, $\sigma^2$ out.

Laplace

Beginning with the Laplace p.d.f.:

$$
p_X(x) = \frac{1}{2b}\exp(-\frac{|x-\mu|}{b})
$$

again without loss of generality, we pick $\mu = 0$. Then

\begin{eqnarray*}
\frac{1}{2b}\int_{-\infty}^\infty e^{\lambda x}e^{-\frac{|x|}{b}} dx &=& \frac{1}{2b}\int_{-\infty}^0 e^{\lambda x}e^{\frac{x}{b}}dx+\frac{1}{2b}\int_0^\infty e^{\lambda x}e^{-\frac{x}{b}} dx \\
&=& \frac{1}{2b}\int_{-\infty}^0 e^{(\lambda+b^{-1}) x}dx+\frac{1}{2b}\int_0^\infty e^{(\lambda-b^{-1}) x} dx
\end{eqnarray*}

The second integral converges only when $\lambda < b^{-1}$; once $\lambda \geq b^{-1}$, the integrand $e^{(\lambda-b^{-1})x}$ no longer decays and

\begin{eqnarray*}
\frac{1}{2b}\int_0^\infty e^{(\lambda-b^{-1}) x} dx = \infty
\end{eqnarray*}

so the moment generating function is infinite for those $\lambda$. Since the subgaussian condition must hold for all $\lambda\in\mathbb R$, no finite $\sigma$ can work. Therefore, Laplace distributions are not subgaussian.
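We can watch this divergence numerically: truncating the right-tail part of the MGF integral at larger and larger $T$ (with $b$ and a $\lambda > 1/b$ picked arbitrarily), the value keeps growing without bound:

```python
import numpy as np

b, lmbda = 1.0, 1.5  # hypothetical choice with lambda > 1/b

# Truncate (1 / 2b) * integral_0^T exp((lambda - 1/b) x) dx at increasing T
# via a simple Riemann sum, and watch it blow up.
vals = []
for T in [10.0, 20.0, 40.0]:
    x = np.linspace(0.0, T, 200_001)
    integrand = np.exp((lmbda - 1.0 / b) * x)
    vals.append(integrand.sum() * (x[1] - x[0]) / (2.0 * b))

print(vals)  # grows without bound as T increases
```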

Uniform

Now let’s try

$$
p_X(x) = \begin{cases} \frac{1}{b-a}, & a<x<b\\0&\text{else.}\end{cases}
$$

Again, we first reset to $\mu = 0$ to reduce calculations, and define $c = (b-a)/2$, so the interval is now $[-c,c]$. Then

\begin{eqnarray*}
\frac{1}{2c}\int_{-c}^c e^{\lambda x}dx &=&  \frac{e^{\lambda c}\, -\, e^{-\lambda c} }{2c\lambda }
\end{eqnarray*}

and

$$
f(\lambda) = \frac{2}{\lambda^2} \ln\left(\frac{e^{\lambda c} \,- \,e^{-\lambda c}}{2c\lambda }\right)
$$

Using a handy dandy graphing calculator, we can see this function achieves its supremum as $\lambda \to 0$, so

$$
\sup_{\lambda} f(\lambda) = c^2/3
$$

which is in fact exactly the variance of the uniform distribution on $[-c,c]$, and tighter than the Hoeffding-lemma bound of $\sigma^2 = c^2$. However, maybe we don’t moan and groan too much in this case, because it’s just a constant factor off.
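Instead of a graphing calculator, a quick numerical check of this supremum (with $c = 1$ and an arbitrary $\lambda$ grid; note the centered MGF of Uniform$[-c,c]$ is $\sinh(\lambda c)/(\lambda c)$, and $f$ is even in $\lambda$):

```python
import numpy as np

c = 1.0  # uniform on [-c, c]

# f(lambda) = (2 / lambda^2) * ln( sinh(lambda c) / (lambda c) )
lam = np.linspace(1e-4, 10.0, 100_000)  # even in lambda, so lambda > 0 suffices
f_vals = 2.0 / lam**2 * np.log(np.sinh(lam * c) / (lam * c))

print(f_vals.max(), c**2 / 3)  # the sup is approached as lambda -> 0
```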

Bernoulli

This is the one with the real problem. In particular, the Hoeffding lemma bound gives a constant that does not depend on the distributional parameter $p$. But clearly, if the situation is less stochastic, e.g. $p$ is really close to 0 or 1, then somehow our concentration inequalities should “concentrate” faster. That is to say, the subgaussian bound seems like it should be better than $1/4$!

Well, let’s take a look. Let’s first write down a clean expression for $f(\lambda)$. For a distributional parameter $p$, the mean $\mu = p$, so:

$$
f(\lambda) = \frac{2}{\lambda^2} \ln(p e^{\lambda(1-p)} + (1-p)e^{-\lambda p})
$$

Just plotting $f(\lambda)$ numerically in this case already shows that this bound is pessimistic. And, worse than in the uniform case, the bound doesn’t depend on $p$ at all!
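A sketch of that numerical computation (the $\lambda$ grid and the particular values of $p$ are arbitrary choices):

```python
import numpy as np

def sup_f(p):
    # sup over a lambda grid of (2/lambda^2) ln( p e^{lambda(1-p)} + (1-p) e^{-lambda p} )
    lam = np.linspace(-50.0, 50.0, 200_001)
    lam = lam[lam != 0.0]  # removable singularity at 0
    mgf = p * np.exp(lam * (1.0 - p)) + (1.0 - p) * np.exp(-lam * p)
    return np.max(2.0 / lam**2 * np.log(mgf))

results = {p: sup_f(p) for p in [0.5, 0.25, 0.1, 0.01]}
print(results)  # shrinks as p moves away from 1/2, unlike the constant 1/4 bound
```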

The bound, for Bernoulli distributions

Claim: the following function

$$
g(p) =  -\frac{1}{2} \frac{(p-\beta)^2}{\ln(p)}
$$

satisfies

$g(p) \geq f(\lambda)$ for all $\lambda$, whenever $0<p<0.5$ and $\beta > 1$. Since $f$ is symmetric under exchanging $p$ and $1-p$, we can use $g(1-p)$ whenever $p > 0.5$.

Proof by picture: Here, we pick $\beta = 1.1$.

Proof by math: Open.

I tried all the usual tricks, but things get too messy and don’t quite work out. Yet, based on many numerical experiments, this bound definitely seems to hold! To me, that is satisfying enough; honestly, it’s a 1-D problem, so I’m not that suspicious. But if anyone knows of a proof, do let me know!
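For the record, here is one such numerical experiment: a grid spot-check of the claim with $\beta = 1.1$ (the grids are arbitrary, and this is evidence, not a proof):

```python
import numpy as np

beta = 1.1  # the beta > 1 from the claim

def g(p):
    return -0.5 * (p - beta) ** 2 / np.log(p)

def sup_f(p):
    # numerically approximate sup_lambda f(lambda) for Bernoulli(p)
    lam = np.linspace(-50.0, 50.0, 200_001)
    lam = lam[lam != 0.0]
    mgf = p * np.exp(lam * (1.0 - p)) + (1.0 - p) * np.exp(-lam * p)
    return np.max(2.0 / lam**2 * np.log(mgf))

ps = np.linspace(0.01, 0.49, 49)
holds = all(g(p) >= sup_f(p) for p in ps)
print(holds)  # True on this grid -- a spot-check, not a proof
```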

Citations

[1] Wainwright, Martin J. High-dimensional statistics: A non-asymptotic viewpoint. Vol. 48. Cambridge university press, 2019.
