
Cramér-Rao Inequality

Information Inequality

In this chapter, the lower bound $B(\theta)$ for the variance, which is the smallest variance that can be attained by an unbiased estimator of $g(\theta)$, is derived. Generally, these lower bounds are simple to calculate. The performance of an unbiased estimator is judged by how close its variance comes to $B(\theta)$ for every $\theta \in \Theta$.


Regular Family of Distributions, Score Functions, and Fisher Information

Regularity Conditions

  • The parameter space $\Theta$ is a non-degenerate open interval of the real line $\mathbb{R}^1 = (-\infty, \infty)$. That is, $\Theta \subset \mathbb{R}$ and $\Theta$ is an open interval.
  • The support of the distribution $f(x;\theta)$, denoted as $S(\theta) = \{x | f(x;\theta) > 0\}$, is independent of $\theta$. In other words, the family of distributions $\{f(x;\theta) : \theta \in \Theta\}$ has a common support.
  • For almost all $x = (x_1, x_2, ..., x_n)$ and for all $\theta \in \Theta$, the derivative:
    `\frac{\partial}{\partial \theta} f(x;\theta)`
    exists and is finite.
  • The range of integration is independent of the parameter $\theta$, so that $\int f(x;\theta)\,dx$ can be differentiated under the integral sign.
  • The conditions for the uniform convergence of integrals are satisfied, ensuring that differentiation under the integral sign is valid.

Score Functions

The first derivative of the log-likelihood function is called the score function of the sample, denoted as:

`S(X, \theta) = \frac{\partial}{\partial \theta} \log f(X;\theta).`

It measures the sensitivity of the log-likelihood function to small changes in the value of $\theta$.
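
For instance (a standard illustration added here, not part of the original text), for a random sample $x_1, \dots, x_n$ from $N(\theta, \sigma^2)$ with $\sigma^2$ known, the score of the sample is

`S(X, \theta) = \frac{\partial}{\partial \theta} \log L = \frac{\partial}{\partial \theta} \left[ \text{const} - \sum_{i=1}^{n} \frac{(x_i - \theta)^2}{2\sigma^2} \right] = \frac{\sum_{i=1}^{n} (x_i - \theta)}{\sigma^2} = \frac{n(\bar{x} - \theta)}{\sigma^2},`

which is large in magnitude when the sample mean lies far from the hypothesized value of $\theta$.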

Fisher Information

`S(x,\theta) = \frac{\partial}{\partial \theta} \log f(x;\theta)`

The variance of the score function measures the strength of the information contained in the sample of observations about $\theta$. A small variance for a given value of $\theta$ indicates that most samples have a score near $0$, meaning the samples contain little information about the true value of $\theta$. Therefore, the variance of the score becomes the natural measure of the information that the sample contains about $\theta$.

`I_X(\theta) = E_{\theta}\left[ S^2(X, \theta) \right] = \text{var}_{\theta}\left[ S(X, \theta) \right]` (since $E_{\theta}[S(X, \theta)] = 0$ under the regularity conditions)

Therefore, the average of the squared relative rate of change in the density, $E_{\theta} \left[ \left( \frac{\partial}{\partial \theta} \log f(X;\theta) \right)^2 \right]$, at a point $\theta$ measures the strength with which the value of $\theta$ can be distinguished from its neighboring values. This quantity is denoted by $I(\theta)$.

A high value of $I(\theta)$ indicates that $\theta$ can be more accurately estimated by the sample observations $X$. We expect that we will get an unbiased estimator $\hat{\theta}$ with smaller variance. So, $I(\theta)$ measures the information that $X$ contains about the parameter $\theta$. This is known as Fisher information.

`I_X(\theta) = E \left[ - \frac{\partial^2}{\partial \theta^2} \log L \right] = E \left[ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right]`

This is called R.A. Fisher’s measure, as it represents the amount of information on $\theta$ supplied by the sample $(x_1, x_2, \dots, x_n)$. The reciprocal $\frac{1}{I(\theta)}$ gives the information limit, i.e., the lower bound on the variance of an unbiased estimator $t = t(x_1, x_2, \dots, x_n)$ of $\theta$.
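
As a worked illustration (added here for clarity, not part of the original text), for a single observation from Bernoulli($\theta$) we have $\log f(x;\theta) = x \log \theta + (1-x)\log(1-\theta)$, so

`\frac{\partial^2}{\partial \theta^2} \log f(x;\theta) = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2} \quad\Rightarrow\quad I_x(\theta) = E\left[ -\frac{\partial^2}{\partial \theta^2} \log f(x;\theta) \right] = \frac{\theta}{\theta^2} + \frac{1-\theta}{(1-\theta)^2} = \frac{1}{\theta(1-\theta)},`

and for a sample of size $n$ the information is $n/[\theta(1-\theta)]$.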

Lower Bounds for Variance of Unbiased Estimator

  • Rao and Cramér Lower Bound
  • Bhattacharyya Lower Bound
  • Chapman, Robbins, and Kiefer Lower Bound

Rao and Cramér Lower Bound

Suppose a family of PDFs $F = \{f(x;\theta), \theta \in \Theta\}$ satisfies the regularity conditions. Let a random sample $X_1, X_2, \dots, X_n$ be drawn from a population with PDF $f(x;\theta)$ in $F$, where $\theta$ is unknown. Let $S(X)$ be an unbiased estimator of $g(\theta)$ whose second moment exists. Then:

`\text{var}[S(X)] \geq \frac{\left( \frac{d}{d\theta} g(\theta) \right)^2}{E_{\theta} \left[ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right]} = \frac{\left( g'(\theta) \right)^2}{I(\theta)}`
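
For example (an added illustration, not from the original text), for a random sample from Poisson($\theta$) with $g(\theta) = \theta$, a single observation gives $\frac{\partial}{\partial \theta} \log f(x;\theta) = \frac{x}{\theta} - 1$, so $I_x(\theta) = \text{var}(x)/\theta^2 = 1/\theta$ and the bound for any unbiased estimator of $\theta$ is

`\text{var}[S(X)] \geq \frac{1}{n I_x(\theta)} = \frac{\theta}{n},`

which is exactly $\text{var}(\bar{x})$; hence the sample mean attains the Cramér-Rao lower bound in this case.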

Remark

  • Regularity conditions hold for an exponential family but are not necessarily true for a non-exponential family.
  • The CRLB, $B(\theta)$, depends only on the parametric function $g(\theta)$ and the joint density $f(x;\theta)$. The lower bound is therefore the same for every unbiased estimator of $g(\theta)$.
  • CRLB in the i.i.d. case: If $X_1, X_2, \dots, X_n$ are i.i.d. from $f(x;\theta)$, then, by the additivity of Fisher information, $I_X(\theta) = n I_x(\theta)$. In this case, the CRLB $B(\theta)$ is given by (a numerical sketch of this bound follows the list):
`\text{var}_{\theta}[S(X)] \geq \frac{\left( g'(\theta) \right)^2}{n I_x(\theta)} = B(\theta)`
  • Fisher’s information contained in the sample $X_1, X_2, \dots, X_n$ about the parameter $\theta$, $I_X(\theta)$, increases with the sample size $n$. Consequently, the lower bound on the variance of an unbiased estimator of $g(\theta)$ becomes smaller as $n$ increases.
  • In some cases, even when the regularity conditions are satisfied and a UMVUE exists, the CRLB $B(\theta)$ is not sharp. In other words, the variance of the UMVUE fails to reach the CRLB, so the UMVUE is not "most efficient" in the CRLB sense. This may be considered a drawback of defining the most efficient estimator via the CRLB. In such situations, one cannot tell whether to continue the search for an estimator that attains the CRLB or whether no estimator can attain it at all.
  • In cases where the regularity conditions are not satisfied, we cannot talk of the CRLB, even though UMVUEs may still exist.
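
The following is a minimal numerical sketch (using an assumed Bernoulli model; the example is not part of the original post) of the i.i.d. bound above: the simulated variance of the sample mean is compared with $B(\theta) = (g'(\theta))^2 / (n I_x(\theta))$.

```python
# A minimal numerical sketch (assumed Bernoulli example, not from the post) of
# the i.i.d. CRLB: per observation I_x(theta) = 1/(theta*(1-theta)), so for
# g(theta) = theta the bound is B(theta) = theta*(1-theta)/n.  The sample mean
# is an unbiased estimator of theta, and its simulated variance should sit at
# this bound.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 200_000

x = rng.binomial(1, theta, size=(reps, n))   # Bernoulli(theta) samples
xbar = x.mean(axis=1)                        # unbiased estimator of theta

crlb = theta * (1 - theta) / n               # B(theta) = (g'(theta))^2 / (n I_x(theta))
print(f"simulated var(xbar) = {xbar.var():.6f}")
print(f"CRLB B(theta)       = {crlb:.6f}")
# The two agree (~0.0042), i.e. xbar attains the Cramer-Rao lower bound here.
```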

Definition (Most Efficient Estimator):

An unbiased estimator $S$ of $g(\theta)$ is said to be the most efficient estimator for a regular family of distributions $\{f(x;\theta), \theta \in \Theta\}$ if

`\text{var}_{\theta}(S) = \text{CRLB} = \frac{\left( g'(\theta) \right)^2}{I_X(\theta)}`

$S$ is then the best unbiased estimator of $g(\theta)$ in the sense that it achieves the minimum possible value of the average squared deviation $E_{\theta} \left[ S - g(\theta) \right]^2$ for all $\theta$.

Definition (Efficiency of an Estimator):

The efficiency of an unbiased estimator $\delta$ of $g(\theta)$ for a regular family $\{f(x;\theta), \theta \in \Theta\}$ is defined as the ratio of the CRLB to its variance:

`e(\delta, \theta) = \frac{\text{CRLB}}{\text{var}_{\theta}(\delta)} = \frac{\left( g'(\theta) \right)^2}{I_X(\theta)\, \text{var}_{\theta}(\delta)}`

Estimators become better as their efficiency increases. In general, the efficiency of an estimator satisfies $e \leq 1$, and when it attains $1$ the corresponding estimator is said to be the most efficient.
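
As a rough numerical illustration (an assumed example, not from the original text), the sketch below estimates $e(\delta, \theta)$ by Monte Carlo for the sample median under $N(\theta, 1)$, where the CRLB for estimating $\theta$ is $1/n$; the efficiency comes out near the well-known asymptotic value $2/\pi \approx 0.64$.

```python
# Hypothetical illustration of e(delta, theta) = CRLB / var(delta): under
# N(theta, 1) the CRLB for estimating theta is 1/n (attained by the sample
# mean), while the sample median is unbiased but less efficient.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.0, 101, 100_000

x = rng.normal(theta, 1.0, size=(reps, n))
median = np.median(x, axis=1)            # competing unbiased estimator delta

crlb = 1.0 / n                           # B(theta) for g(theta) = theta
eff = crlb / median.var()                # e(delta, theta)
print(f"estimated efficiency of the sample median: {eff:.3f}")   # roughly 0.64
```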

  • If the score of the sample can be written in the form `\frac{\partial}{\partial \theta} \log L = c(\theta)\, [T(X) - g(\theta)]`, where $T(X)$ is an unbiased estimator of $g(\theta)$, this is the condition of linearity between the score and the unbiased estimator. If it is satisfied, then $T(X)$ is not only the UMVUE but also the most efficient estimator of $g(\theta)$, i.e., it attains the CR lower bound (see the worked example after this list).
  • If an unbiased estimator attains the CRLB (i.e., it is the most efficient), then it is the MLE, but the converse is not necessarily true. MLEs are, however, asymptotically CRLB-attaining (most efficient).
  • Under the regularity conditions, the MLE is not only consistent and asymptotically normal but also asymptotically most efficient.
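
As a worked check of the linearity condition in the first bullet above (an added illustration, not from the original text), take a random sample from Poisson($\theta$):

`\frac{\partial}{\partial \theta} \log L = \sum_{i=1}^{n} \left( \frac{x_i}{\theta} - 1 \right) = \frac{n}{\theta} \left( \bar{x} - \theta \right),`

so with $c(\theta) = n/\theta$, $T(X) = \bar{x}$ and $g(\theta) = \theta$, the score is linear in $\bar{x}$; hence $\bar{x}$ is the UMVUE of $\theta$ and attains the CR lower bound.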

Proof:

Let $x_1, x_2, \dots, x_n$ be a random sample from the pdf $f(x;\theta)$ and let $L$ be the likelihood function of the sample:

\[ L = L(x; \theta) = \prod_{i=1}^{n} f(x_i; \theta), \qquad \int L(x; \theta)\, dx = 1, \]

where $\int \dots\, dx$ stands for $\int \dots \int \dots\, dx_1\, dx_2 \dots dx_n$.

Differentiating with respect to $\theta$ and using regularity conditions given above, we get:

\[ \frac{\partial}{\partial \theta} \int L\, dx = 0 \;\Rightarrow\; \int \frac{\partial L}{\partial \theta}\, dx = 0 \;\Rightarrow\; \int \left( \frac{\partial}{\partial \theta} \log L \right) L\, dx = 0 \;\Rightarrow\; E \left( \frac{\partial}{\partial \theta} \log L \right) = 0. \]

Let $t = t(x_1, x_2, \dots, x_n)$ be an unbiased estimator of $g(\theta)$ such that

\[ E(t) = g(\theta) \;\Rightarrow\; \int t\, L\, dx = g(\theta). \]

Differentiating w.r.t $\theta$, we get

\[ \int t\, \frac{\partial L}{\partial \theta}\, dx = g'(\theta) \;\Rightarrow\; \int t \left( \frac{\partial}{\partial \theta} \log L \right) L\, dx = g'(\theta) \;\Rightarrow\; E \left( t \cdot \frac{\partial}{\partial \theta} \log L \right) = g'(\theta). \]

Cramér-Rao Inequality

The covariance between an estimator $t$ and the score function is given by:

`\text{cov} \left( t, \frac{\partial}{\partial \theta} \log L \right) = E \left( t \cdot \frac{\partial}{\partial \theta} \log L \right) - E(t)E \left( \frac{\partial}{\partial \theta} \log L \right)`
`= g'(\theta)`

where

`E \left( \frac{\partial}{\partial \theta} \log L \right) = 0, \quad E \left( t \cdot \frac{\partial}{\partial \theta} \log L \right) = g'(\theta)`

Using the Cauchy-Schwarz inequality:

`\rho^2 \left( t, \frac{\partial}{\partial \theta} \log L \right) \leq 1 \Rightarrow \left\{ \frac{\text{cov} \left( t, \frac{\partial}{\partial \theta} \log L \right) }{\sqrt{\text{var}(t) \cdot \text{var} \left( \frac{\partial}{\partial \theta} \log L \right) }} \right\}^2 \leq 1`

which, since $E \left( \frac{\partial}{\partial \theta} \log L \right) = 0$ implies $\text{var} \left( \frac{\partial}{\partial \theta} \log L \right) = E \left[ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right]$, leads to:

`\left( g'(\theta) \right)^2 \leq \text{var}(t) \cdot E \left[ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right]`

which gives the Cramér-Rao lower bound:

`\text{var}(t) \geq \frac{\left( g'(\theta) \right)^2}{E \left[ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right]}`
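
To make the key covariance step above concrete, here is a small Monte Carlo check (an assumed Poisson example, not part of the original post) that $\text{cov} \left( t, \frac{\partial}{\partial \theta} \log L \right) = g'(\theta)$ and that $\text{var}(t)$ obeys the resulting bound.

```python
# A small Monte Carlo check (assumed Poisson example, not from the post) of the
# key identity in the proof: cov(t, d/dtheta log L) = g'(theta).  Here
# t = xbar is unbiased for g(theta) = theta, so g'(theta) = 1, and the sample
# score is sum(x_i)/theta - n.
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 4.0, 20, 200_000

x = rng.poisson(theta, size=(reps, n))
t = x.mean(axis=1)                       # unbiased estimator of theta
score = x.sum(axis=1) / theta - n        # d/dtheta log L for Poisson

cov = np.cov(t, score)[0, 1]
print(f"cov(t, score) ~= {cov:.4f}  (theory: g'(theta) = 1)")
print(f"var(t) = {t.var():.4f} >= CRLB = {theta / n:.4f}")   # bound is attained here
```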

Fisher Information

If $ t $ is an unbiased estimator of parameter $ \theta $, i.e.,

`E(t) = \theta \Rightarrow g(\theta) = \theta \Rightarrow g'(\theta) = 1`
`\text{var}(t) \geq \frac{1}{E \left[ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right]} = \frac{1}{I(\theta)}`

This is called R.A. Fisher’s information measure. The Fisher information is defined as:

`I(\theta) = E \left\{ \left( \frac{\partial}{\partial \theta} \log L \right)^2 \right\} = -E \left( \frac{\partial^2}{\partial \theta^2} \log L \right)`
`I(\theta) = n\, E \left[ \left( \frac{\partial}{\partial \theta} \log f(x;\theta) \right)^2 \right] = -n\, E \left[ \frac{\partial^2}{\partial \theta^2} \log f(x;\theta) \right]`
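
The factor $n$ in the last expression reflects the additivity of Fisher information across i.i.d. observations; the sketch below (an assumed $N(\theta, 1)$ example, not from the original text) checks $I_X(\theta) = n\, I_x(\theta)$ by simulating the variance of the score.

```python
# A quick simulation (illustrative N(theta, 1) example, not from the post) of
# I_X(theta) = n * I_x(theta): the per-observation score is (x - theta) with
# variance 1, and the sample score is sum(x_i - theta) with variance n.
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 0.5, 8, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
score_one = x[:, 0] - theta          # score based on a single observation
score_all = (x - theta).sum(axis=1)  # score based on the whole sample

print(f"I_x(theta) ~= {score_one.var():.3f}  (theory: 1)")
print(f"I_X(theta) ~= {score_all.var():.3f}  (theory: n = {n})")
```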

An unbiased estimator $t$ of $g(\theta)$ for which the Cramér-Rao lower bound is attained is called a minimum variance bound (MVB) estimator. An MVB estimator of $g(\theta)$ can exist only if a sufficient statistic for $\theta$ exists; indeed, when the bound is attained, the MVB estimator is itself a function of a sufficient statistic.

As $n$ gets larger, the lower bound for $\text{var}_{\theta}(t)$ gets smaller. Thus, as the Fisher information increases, the lower bound decreases and the "best" unbiased estimator can achieve a smaller variance; in other words, the sample carries more information about $\theta$.
