次高斯性
设$X$为一个均值为$\mu =\mathbb{E}\left[ X \right] $的$r.v.$,若存在$\sigma >0$使得:
$$
\mathbb{E}\left[ exp\left\{ \lambda \left( X-\mu \right) \right\} \right] \le exp\left\{ \frac{\sigma ^2\lambda ^2}{2} \right\} ,\forall \lambda \in \mathbb{R}
$$
则称其为$\sigma$-次高斯的,其中$\sigma$作为次高斯参数。
若$X$为$\sigma$-次高斯随机变量,则$X$满足:
$$
Pr\left( \left( X-\mu \right) \ge \epsilon \right) \le e^{-\frac{\varepsilon ^2}{2\sigma ^2}},\forall \epsilon >0
$$
$Prrof:$设$\lambda \in \left( 0,+\infty \right) $有:
$$
Pr\left( \left( X-\mu \right) \ge \epsilon \right) =Pr\left( exp\left\{ \lambda \left( X-\mu \right) \right\} \ge exp\left\{ \lambda \epsilon \right\} \right)
$$
$$
=\frac{\mathbb{E}\left[ exp\left\{ \lambda \left( X-\mu \right) \right\} \right]}{exp\left\{ \lambda \epsilon \right\}}\le \frac{exp\left\{ \frac{\sigma ^2\lambda ^2}{2} \right\}}{exp\left\{ \lambda \epsilon \right\}}=exp\left\{ \frac{\sigma ^2\lambda ^2}{2}-\lambda \epsilon \right\}
$$
解得$\lambda ^*=\frac{a}{\sigma ^2}$时最优,最终得到:
$$
Pr\left( \left( X-\mu \right) \ge \epsilon \right) \le e^{-\frac{\varepsilon ^2}{2\sigma ^2}},\forall \epsilon >0
$$
霍夫丁界
设$X$为一个均值为$\mu =\mathbb{E}\left[ X \right] $的$r.v.$,使得$a\le X\le b$几乎处处成立,则$X$是次高斯的,其次高斯参数是$\sigma =\frac{b-a}{2}$。
$Prrof:$先证对于任意$r.v.$ $Y$满足$a\le Y\le b$ $a.s.$,$Var\left( Y \right) \le \frac{\left( b-a \right)^2}{4}$;
根据方差定义有:
$$
Var\left( Y \right) =\mathbb{E}\left[ \left( Y-\mathbb{E}\left[ Y \right] \right) ^2 \right] =\underset{t}{min}\ \mathbb{E}\left[ \left( Y-t \right) ^2 \right]
$$
$$
\le \mathbb{E}\left[ \left( Y-\frac{a+b}{2} \right) ^2 \right] \le \underset{y\in \left[ a,b \right]}{max}\ \left[ \left( y-\frac{a+b}{2} \right) ^2 \right]
$$
显然当$y=a$或$y=b$上式取得最大值$\frac{\left( b-a \right)^2}{4}$
设$P$为$X$的概率分布,定义$\phi \left( \lambda \right) \triangleq ln\mathbb{E}_p\left[ e^{\lambda X} \right]$(对数矩母函数),则有:
$$
\phi ‘\left( \lambda \right) =\frac{\left( \mathbb{E}_p\left[ e^{\lambda X} \right] \right) ‘}{\mathbb{E}_p\left[ e^{\lambda X} \right]}=\frac{\mathbb{E}_p\left[ Xe^{\lambda X} \right]}{\mathbb{E}_p\left[ e^{\lambda X} \right]}
$$
$$
\phi ”\left( \lambda \right) =\frac{\mathbb{E}_p\left[ X^2e^{\lambda X} \right] \mathbb{E}_p\left[ e^{\lambda X} \right] -\mathbb{E}_p\left[ Xe^{\lambda X} \right] \mathbb{E}_p\left[ Xe^{\lambda X} \right]}{\mathbb{E}_p\left[ e^{\lambda X} \right] ^2}
$$
$$
=\frac{\mathbb{E}_p\left[ X^2e^{\lambda X} \right]}{\mathbb{E}_p\left[ e^{\lambda X} \right]}-\frac{\mathbb{E}_p\left[ Xe^{\lambda X} \right] ^2}{\mathbb{E}_p\left[ e^{\lambda X} \right] ^2}
$$
设$Q_\lambda$是一个关于$X$的分布,定义为:
$$
dQ_{\lambda}=\frac{e^{\lambda X}}{\mathbb{E}_p\left[ e^{\lambda X} \right]}dP\left( x \right)
$$
则有:
$$
\phi ‘\left( \lambda \right) =\frac{\mathbb{E}_p\left[ Xe^{\lambda X} \right]}{\mathbb{E}_p\left[ e^{\lambda X} \right]}=\ \frac{1}{\mathbb{E}_p\left[ e^{\lambda X} \right]}\int{xe^{\lambda x}dP\left( x \right)}=\int{xdQ_{\lambda}\left( x \right)}=\mathbb{E}_{Q_{\lambda}}\left[ X \right]
$$
$$
\phi ”\left( \lambda \right) ==\frac{\mathbb{E}_p\left[ X^2e^{\lambda X} \right]}{\mathbb{E}_p\left[ e^{\lambda X} \right]}-\frac{\mathbb{E}_p\left[ Xe^{\lambda X} \right] ^2}{\mathbb{E}_p\left[ e^{\lambda X} \right] ^2}=\mathbb{E}_{Q_{\lambda}}\left[ X^2 \right] -\mathbb{E}_{Q_{\lambda}}\left[ X \right] ^2=Var_{Q_{\lambda}}\left[ X \right]
$$
对$\phi \left( \lambda \right)$在$ \lambda =0$处进行泰勒展开有:
$$
\phi ‘\left( 0 \right) =\frac{\mathbb{E}_p\left[ Xe^{0X} \right]}{\mathbb{E}_p\left[ e^{0X} \right]}=\frac{\mathbb{E}_p\left[ X \right]}{\mathbb{E}_p\left[ 1 \right]}=\mu
$$
$$
\phi ”\left( 0 \right) =Var_{Q_{\lambda}}\left[ X \right] \le \frac{\left( b-a \right) ^2}{4}
$$
$$
\phi \left( \lambda \right) =\phi \left( 0 \right) +\phi ‘\left( 0 \right) \lambda +\frac{1}{2}\phi ”\left( \tilde{\lambda} \right) \lambda ^2,\tilde{\lambda}\in \left[ 0,\lambda \right]
$$
$$
=0+\mu \lambda +\frac{1}{2}\phi ”\left( \tilde{\lambda} \right) \lambda ^2\le \mu \lambda +\frac{\lambda ^2\left( b-a \right) ^2}{8}
$$
回代对数矩母函数可以得到:
$$
ln\mathbb{E}_p\left[ e^{\lambda X} \right] \le \mu \lambda +\frac{\lambda ^2\left( b-a \right) ^2}{8}\Rightarrow \mathbb{E}_p\left[ e^{\lambda X} \right] \le exp\left\{ \mu \lambda +\frac{\lambda ^2\left( b-a \right) ^2}{8} \right\}
$$
$$
\left( exp\left\{ -\mu \lambda \right\} >0 \right) \Rightarrow \mathbb{E}_p\left[ e^{\lambda X} \right] exp\left\{ -\mu \lambda \right\} \le exp\left\{ \frac{\lambda ^2\left( b-a \right) ^2}{8} \right\}
$$
$$
\Rightarrow \mathbb{E}_p\left[ e^{\lambda \left( X-\mu \right)} \right] \le exp\left\{ \frac{\lambda ^2\left( \frac{b-a}{2} \right) ^2}{2} \right\}
$$
至此引理得证!至于《概率与计算 算法与数据分析中的随机化和概率技术》一书中的霍夫丁引理则是以上定理在$\mu =0$时的特殊形式。
假设$X$是$\sigma$-次高斯的$r.v.$,$X_1,X_2$相互独立,分别为$\sigma _1,\sigma _2$-次高斯,则有:
(1)$Var\left[ X \right] \le \sigma ^2$;
(2)$\forall c\in \mathbb{R},cX$是$\left| c \right|\sigma $-次高斯的$r.v.$;
(3)$X_1+X_2$是$\sqrt{\sigma _{1}^{2}+\sigma _{2}^{2}}$-次高斯的$r.v.$;