
Methods of Estimation in Statistical Inference

Statistical estimation is the cornerstone of statistical inference, enabling us to derive insights about population characteristics using sample data. The population is described by a probability density function (pdf) $f(x;\theta)$, which depends on unknown parameters $\theta$. These parameters are often not directly observable but play a vital role in defining the population's behavior and structure.

To obtain good estimators, the following methods can be used:

  1. Method of Maximum Likelihood Estimation 
  2. Method of Minimum Variance 
  3. Method of Moments
  4. Method of Least Squares
  5. Method of Minimum Chi-square
  6. Method of Inverse Probability 

Method of Maximum Likelihood Estimation

For each sample point $x$, let $\hat\theta(x)$ be the parameter value at which $L(\theta|x)$ attains its maximum as a function of $\theta$, with $x$ held fixed. A maximum likelihood estimator of $\theta$ based on the sample points $x_{i}$ is denoted $\hat\theta_{MLE}$.

In practical scenarios, θ is estimated using a random sample. This process involves leveraging principles such as sufficiency, unbiasedness, and minimal variance to ensure estimators are both reliable and optimal. The goal is to minimize errors, typically in the sense of mean squared error (MSE).

This article explores key parametric estimation methods for deriving robust estimates of θ. Among these, the Method of Moments (MoM), introduced by Karl Pearson in the late 19th century, stands out as one of the earliest and simplest techniques. Additionally, methods such as Minimum Chi-Square, Least Squares (MoLS), and Maximum Likelihood Estimation (MLE) are discussed for their applications and theoretical underpinnings.

By understanding these methods, statisticians can select the most appropriate technique for their analysis, balancing computational simplicity and statistical efficiency.

The MLE is the point of the parameter space at which the likelihood function of the observed sample attains its global maximum.

`L(\theta) = \prod_{i=1}^{n} f(x_{i};\theta_{1},\theta_{2},\ldots,\theta_{k}) \tag{1.1}`

This most general method of parameter estimation is called Maximum Likelihood Estimation (M.L.E.). It was first formulated by C.F. Gauss, but the modern method was developed by Prof. R.A. Fisher, who established several of its optimal properties and compared it with other methods of estimation. The idea also appears in the development of the theory of least squares.

The M.L.E. is the parameter point for which the observed sample is most likely, i.e., at which the likelihood function attains its maximum value.

In general, the M.L.E. is a good point estimator possessing several optimal properties. The principle of maximum likelihood estimation is to find a good estimator of the unknown parameter $\theta = (\theta_{1}, \theta_{2}, \ldots, \theta_{k})$ that maximises the likelihood function in (1.1). One drawback is that the calculus involved in finding the M.L.E. is sometimes difficult.

More precisely, the principle of maximum likelihood consists of finding an estimator of the unknown parameter $\theta = (\theta_{1}, \theta_{2}, \ldots, \theta_{k})$ which maximises the likelihood function $L(\theta)$ for variations in the parameter; we denote it $\hat \theta_{MLE}$. Thus, if there exists a function

`\hat \theta = \hat \theta(x_{1}, x_{2}, x_{3}, \ldots, x_{n})`

of the sample values which maximises $L$ for variations in $\theta$, then $\hat \theta$ is taken as an estimator of $\theta$; $\hat \theta$ is usually called the maximum likelihood estimator. It is obtained from the conditions

`\frac{\partial L(\theta)}{\partial \theta} = 0 \quad \text{and} \quad \frac{\partial^2 L(\theta)}{\partial \theta^2} < 0`

MLEs have many attractive large-sample properties: for large samples the MLE is consistent, asymptotically efficient, sometimes unbiased, and under certain regularity conditions it also provides a UMVUE. This is why maximum likelihood estimation is the most widely used parametric estimation technique.

MLEs are analogous to modes: just as the mode maximizes a pdf or pmf, the MLE maximizes the likelihood function.

If a most efficient estimator exists, it is given by maximum likelihood; the converse, that every MLE is most efficient, is not always true.


Terms related to M.L.E

1. Likelihood Function 

The likelihood function is a key concept in statistical inference. It combines the experimental and theoretical information: for a pdf or pmf it combines the observed sample points into a joint function, called the likelihood function. If $X_{1}, X_{2}, \ldots, X_{n}$ is an iid random sample from a pdf or pmf $f(x \mid \theta_{1}, \theta_{2}, \ldots, \theta_{k})$, then the likelihood function is defined by
`L(\theta|x) = L(\theta_{1}, \theta_{2}, \ldots, \theta_{k} \mid x_{1}, x_{2}, \ldots, x_{n})`
`L(\theta) = \prod_{i=1}^{n} f(x_{i}\mid\theta_{1},\ldots,\theta_{k})`

Clearly, this method of finding an estimator is in keeping with the likelihood principle.
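To make the definition concrete, here is a minimal sketch (an illustration, not from the original article) that evaluates the Normal log-likelihood $\log L(\mu \mid x)$ on a grid of $\mu$ values for a simulated sample, with the standard deviation treated as known; the grid maximiser should land near the sample mean. The seed, sample size, and grid range are arbitrary choices.

```python
# Minimal sketch: evaluate log L(mu | x) on a grid for N(mu, sigma^2) data
# with sigma assumed known; the maximiser should be close to the sample mean.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                                      # known standard deviation (assumption)
x = rng.normal(loc=5.0, scale=sigma, size=50)    # simulated observed sample

def log_likelihood(mu, x, sigma):
    """Sum of log N(mu, sigma^2) densities, i.e. log L(mu | x)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

mu_grid = np.linspace(3, 7, 401)
ll = np.array([log_likelihood(m, x, sigma) for m in mu_grid])

print("grid maximiser:", mu_grid[np.argmax(ll)])
print("sample mean   :", x.mean())
```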

Example (Normal Likelihood)

Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from $N(\mu , \sigma^2)$. Find the MLEs of $\mu$ and $\sigma^2$.

Solution: The pdf of the Normal distribution is

`f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}`

Case 1 : MLE for $\mu$

The likelihood function is defined by  

`L(\theta) = \prod_{i=1}^{n} f(x_{i}|\theta)`

`L(\theta) = \prod_{i=1}^{n} \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right\}\right]`

`L(\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}`

`\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2`

`\frac{\partial}{\partial \mu} \log L(\theta) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)`

Now putting $\frac{\partial \log L(\theta)}{\partial \mu}=0$, we get

`\hat \mu_{MLE} = \frac{\sum_{i=1}^{n} x_{i}}{n} = \bar x`


Case 2 : MLE for $\sigma^2$

`L(\theta) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\} \right]`

`L(\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}`

`\Rightarrow \frac{\partial \log L(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2`

`\Rightarrow -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0`

`\Rightarrow -n\sigma^2 + \sum_{i=1}^{n} (x_i - \mu)^2 = 0`

`\Rightarrow \hat \sigma_{MLE}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat \mu)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2`

where the unknown $\mu$ is replaced by its MLE $\hat \mu = \bar x$.
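As a quick numerical check, the following minimal sketch (an illustration, not part of the original derivation; it assumes numpy and scipy are available) maximises the Normal log-likelihood directly with `scipy.optimize.minimize` and compares the optimum with the closed-form MLEs $\bar x$ and $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2$. The simulated data and the log-$\sigma$ reparametrisation are implementation choices.

```python
# Minimal sketch: numerically maximise the Normal log-likelihood and compare
# with the closed-form MLEs (sample mean and 1/n * sum of squared deviations).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=3.0, size=200)    # simulated sample

def neg_log_lik(params, x):
    mu, log_sigma = params                       # optimise log(sigma) so sigma > 0
    sigma2 = np.exp(2 * log_sigma)
    n = x.size
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu)**2) / (2 * sigma2)

res = minimize(neg_log_lik, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma2_hat = res.x[0], np.exp(2 * res.x[1])

print("numerical MLEs :", mu_hat, sigma2_hat)
print("closed form    :", x.mean(), np.mean((x - x.mean())**2))
```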

Example (Gamma Distribution)

Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from $\mathrm{Gamma}(a,\lambda)$, with the shape $\lambda$ treated as known. Find the MLE of $a$. The pdf is

`f(x; a, \lambda)= \frac{a^{\lambda}}{{\Gamma(\lambda)}} e^{-ax}x^{\lambda-1}  ; x > 0, \, a > 0, \, \lambda > 0`

Solution:

The likelihood function is

`L(a, \lambda) = \frac{a^{n\lambda}}{\Gamma(\lambda)^n} (e^{-a \sum_{i=1}^{n} x_i}) (\prod_{i=1}^{n} x_i^{\lambda - 1})`

`\log L(a, \lambda) = n\lambda \log(a) - n \log(\Gamma(\lambda)) - a \sum_{i=1}^{n} x_i + (\lambda - 1) \sum_{i=1}^{n} \log(x_i)`

`\frac{\partial \log L(a, \lambda)}{\partial a} = \frac{n\lambda}{a} - \sum_{i=1}^{n} x_i`

Now putting `\frac{\partial \log L(a, \lambda)}{\partial a} = 0`

`\frac{n\lambda}{a} = \sum_{i=1}^{n} x_i`

`\hat a_{MLE} = \frac{n\lambda}{\sum_{i=1}^{n} x_i} = \frac{\lambda}{\bar x}`
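The closed-form result $\hat a_{MLE} = \lambda/\bar x$ can be checked numerically; the sketch below is an illustration under the assumption that the shape $\lambda$ is known, with simulated data and arbitrary parameter values.

```python
# Minimal sketch: check a_hat = lambda / x_bar for Gamma(a, lambda) data with
# the shape parameter lambda known, against a 1-D numerical maximisation.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
shape_lambda = 3.0                                 # known shape (assumption)
true_rate = 1.5
x = rng.gamma(shape=shape_lambda, scale=1.0 / true_rate, size=500)

def neg_log_lik(a):
    # log L(a) = n*lambda*log(a) - a*sum(x) + terms free of a (dropped)
    return -(x.size * shape_lambda * np.log(a) - a * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print("numerical a_hat:", res.x)
print("closed form    :", shape_lambda / x.mean())
```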

Properties of MLEs

1. Non-uniqueness of the MLE

Levy (1985) and More (1971) discuss the existence and uniqueness of MLEs: when an MLE exists it is a function of a sufficient statistic, but it need not be unique, as the following example shows.
Let `X_{1}, X_{2}, X_{3}.........X_{n}  \sim  U[\theta - a, \theta + a], \theta \in \mathbb{R}, a>0`.
Here $a$ is a known constant.
The likelihood function is

`L(\theta, x) = \prod_{i=1}^{n} f(x_{i},\theta)`

`L(\theta, x) = \prod_{i=1}^{n} \left( \frac{1}{2a} \right), \quad \theta - a \le x_{i} \le \theta + a \text{ for all } i`

`L(\theta, x) = \left(\frac{1}{2a}\right)^n, \quad \theta - a \le x_{(1)} \le x_{(n)} \le \theta + a`

We know 

`X_{(1)} \ge \theta - a \;\Rightarrow\; \theta \le X_{(1)} + a`

`X_{(n)} \le \theta + a \;\Rightarrow\; \theta \ge X_{(n)} - a`

So any value of $\theta$ between `X_{(n)} - a` and `X_{(1)} + a` is an MLE of `\theta`.

So the MLE is not unique here; the whole class of MLEs is given by

`\alpha \big( X_{(n)} - a \big) + (1 - \alpha)\big( X_{(1)} + a \big), \quad 0 \le \alpha \le 1`
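A small simulation makes the flat likelihood visible; the sketch below is illustrative only, with $a$, the true $\theta$, and the sample size chosen arbitrarily.

```python
# Minimal sketch: for U[theta - a, theta + a] data the likelihood equals
# (1/2a)^n on the whole interval [X_(n) - a, X_(1) + a], so the MLE is not unique.
import numpy as np

rng = np.random.default_rng(3)
a = 1.0                                   # known half-width (assumption)
theta_true = 5.0
x = rng.uniform(theta_true - a, theta_true + a, size=20)

def likelihood(theta):
    # (1/2a)^n if every observation lies in [theta - a, theta + a], else 0
    if np.all((x >= theta - a) & (x <= theta + a)):
        return (1.0 / (2 * a)) ** x.size
    return 0.0

lo, hi = x.max() - a, x.min() + a         # interval of maximisers
print("MLE interval:", (lo, hi))
print("likelihood at three interior points:",
      [likelihood(t) for t in np.linspace(lo, hi, 3)])
```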

2. MLE need not be in a nice analytic form

Let $X_{1}, X_{2}, X_{3}.........X_{n} \sim N(\theta, \theta^2), \theta > 0$

`f(x; \theta) = \frac{1}{\sqrt{2\pi \theta^2}} \exp\left\{-\frac{(x - \theta)^2}{2\theta^2}\right\}`

`L(\theta) = \prod_{i=1}^{n}\left[ \frac{1}{\sqrt{2\pi \theta^2}} \exp\left\{-\frac{(x_i - \theta)^2}{2\theta^2}\right\}\right]`

`L(\theta) = \frac{1}{(2\pi \theta^2)^{n/2}} \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2\right\}`

`\log L(\theta) = \log \left( \frac{1}{(2\pi \theta^2)^{n/2}} \right) + \log\left( \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2\right\} \right)`

`\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\theta^2) - \frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2`

`\frac{\partial \log L(\theta)}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} (x_i - \theta) + \frac{1}{\theta^3} \sum_{i=1}^{n} (x_i - \theta)^2`

Setting `\frac{\partial \log L(\theta)}{\partial \theta} = 0` and multiplying through by `\theta^3`:

`-n\theta^2 + \theta \sum_{i=1}^{n} (x_i - \theta) + \sum_{i=1}^{n} (x_i - \theta)^2 = 0`

Expanding the sums and simplifying,

`-n\theta^2 - \theta \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2 = 0`

`\theta^2 + \bar{x}\,\theta - m_2 = 0, \quad \text{where } \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \; m_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2`

Applying the quadratic formula

`\theta = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}`

with `a = 1, \, b = \bar{x}, \, c = -m_2`:

`\theta = \frac{-\bar{x} \pm \sqrt{\bar{x}^2 + 4 m_2}}{2}`

Since `\theta > 0`, the MLE is the positive root:

`\hat \theta_{MLE} = \frac{-\bar{x} + \sqrt{\bar{x}^2 + 4 m_2}}{2}`
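The positive root can be compared with a direct numerical maximisation of the log-likelihood; the sketch below is illustrative, with simulated data and an arbitrary true $\theta$.

```python
# Minimal sketch: positive root of theta^2 + xbar*theta - m2 = 0 for
# N(theta, theta^2) data, versus a direct numerical maximisation of log L.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
theta_true = 2.0
x = rng.normal(loc=theta_true, scale=theta_true, size=300)

xbar, m2 = x.mean(), np.mean(x**2)
theta_root = (-xbar + np.sqrt(xbar**2 + 4 * m2)) / 2     # positive root

def neg_log_lik(theta):
    # negative log-likelihood up to an additive constant
    return x.size * np.log(theta) + np.sum((x - theta)**2) / (2 * theta**2)

res = minimize_scalar(neg_log_lik, bounds=(1e-3, 20.0), method="bounded")
print("root of the quadratic:", theta_root)
print("numerical optimum    :", res.x)
```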

3. MLE may not be in a closed form

Let $X_{1}, X_{2}, \ldots, X_{n} \sim \mathrm{Gamma}(r,\lambda)$, where $\lambda$ is known and $r$ is unknown.

`f(x; r, \lambda) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{\Gamma(r)}, \quad x > 0, \, r > 0, \, \lambda > 0`

`L(r, \lambda) = \prod_{i=1}^{n} \frac{\lambda^r x_i^{r-1} e^{-\lambda x_i}}{\Gamma(r)}`

`L(r, \lambda) = \frac{\lambda^{nr}}{\Gamma(r)^n} \prod_{i=1}^{n} x_i^{r-1} e^{-\lambda \sum_{i=1}^{n} x_i}`

`\log L(r, \lambda) = nr \log(\lambda) - n \log(\Gamma(r)) + (r - 1) \sum_{i=1}^{n} \log(x_i) - \lambda \sum_{i=1}^{n} x_i`

`\frac{\partial}{\partial r} \left( nr \log(\lambda) \right) = n \log(\lambda)`

`\frac{\partial}{\partial r} \left( -n \log(\Gamma(r)) \right) = -n \psi(r)`

`\frac{\partial}{\partial r} \left( (r-1) \sum_{i=1}^{n} \log(x_i) \right) = \sum_{i=1}^{n} \log(x_i)`

`\frac{\partial \log L(r, \lambda)}{\partial r} = n \log(\lambda) - n \psi(r) + \sum_{i=1}^{n} \log(x_i)`

Due to the involvement of the digamma function, the MLE for `r` does not have a closed-form solution and requires numerical optimization. Numerical methods such as Newton-Raphson or gradient-based approaches are employed to approximate $\hat r$. For this work, the parameter `r` was estimated numerically using `scipy.optimize.minimize` in Python: the estimator has no nice analytic form because the digamma equation cannot be solved in closed form.
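In that spirit, the sketch below (an illustration; the data, starting value, and optimiser settings are assumptions, not taken from the original work) uses `scipy.optimize.minimize` to estimate the shape `r` numerically when the rate $\lambda$ is known.

```python
# Minimal sketch: numerical MLE of the Gamma shape r with the rate lambda known,
# since the score equation involves the digamma function and has no closed form.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
lam = 2.0                                           # known rate (assumption)
r_true = 4.0
x = rng.gamma(shape=r_true, scale=1.0 / lam, size=1000)

def neg_log_lik(params):
    r = params[0]
    if r <= 0:
        return np.inf                               # keep the search in r > 0
    n = x.size
    return -(n * r * np.log(lam) - n * gammaln(r)
             + (r - 1) * np.sum(np.log(x)) - lam * np.sum(x))

res = minimize(neg_log_lik, x0=[1.0], method="Nelder-Mead")
print("numerical r_hat:", res.x[0])
```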

4. Invariance property of MLE 

Zehna showed that the invariance property of the MLE holds for functions of $\theta$, whether the function is one-to-one or many-to-one.

If $\hat \theta$ is the MLE of $\theta$, then $g(\hat \theta)$ is the MLE of $g(\theta)$:

`\widehat{g(\theta)}_{MLE} = g(\hat \theta_{MLE})`
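For instance, by invariance the MLE of $\sigma$ in the Normal model is the square root of the MLE of $\sigma^2$. The short sketch below illustrates this with simulated data (an illustration, not part of the original text).

```python
# Minimal sketch: invariance in action for the Normal model. If the MLE of
# sigma^2 is (1/n) * sum((x - xbar)^2), the MLE of g(sigma^2) = sigma is its
# square root.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=0.0, scale=3.0, size=400)

sigma2_mle = np.mean((x - x.mean())**2)   # MLE of sigma^2
sigma_mle = np.sqrt(sigma2_mle)           # MLE of sigma by invariance

print("MLE of sigma^2:", sigma2_mle)
print("MLE of sigma  :", sigma_mle)
```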

5. Asymptotic Properties of MLE

Let $X \sim f(x, \theta)$, $\theta \in \Theta$, where $\Theta$ is an open interval in $\mathbb{R}$.
Regularity Assumptions 

1. $\frac{\partial \log f}{\partial \theta}$ exists for almost all $x$ and for all $\theta$ in a neighbourhood $|\theta - \theta_{0}| < \delta$ of the true value $\theta_{0}$; in particular $\frac{\partial f}{\partial \theta}$, $\frac{\partial^2 f}{\partial \theta^2}$ and $\frac{\partial^3 f}{\partial \theta^3}$ also exist in this neighbourhood.

2. $E\left[\frac{\partial \log f(X,\theta)}{\partial \theta}\right]_{\theta = \theta_{0}} = 0$, and $E\left[\left(\frac{\partial \log f(X,\theta)}{\partial \theta}\right)^2\right] > 0$.

3. $\frac{\partial^3 \log f}{\partial \theta^3}$ is continuous in $\theta$, and the identity $\int f(x,\theta)\, dx = 1$ can be differentiated twice under the integral sign with respect to $\theta$, so that the Fisher information
$I(\theta) = \mathbb{E} \left[ \left( \frac{\partial \log f(X; \theta)}{\partial \theta} \right)^2 \right] = -\mathbb{E} \left[ \frac{\partial^2 \log f(X; \theta)}{\partial \theta^2} \right]$ is finite and strictly positive.

Under these assumptions the MLE enjoys the standard large-sample results: it is consistent and asymptotically normal, with $\sqrt{n}\,(\hat \theta_{MLE} - \theta_{0}) \xrightarrow{d} N\!\left(0, \tfrac{1}{I(\theta_{0})}\right)$.
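As an illustration of these large-sample results (not part of the original text), the sketch below simulates the sampling distribution of the Normal-mean MLE $\bar X$ and compares its variance with the asymptotic value $1/(n I(\mu)) = \sigma^2/n$; all parameter values are arbitrary.

```python
# Minimal sketch: the empirical variance of the Normal-mean MLE across many
# simulated samples should match the asymptotic variance sigma^2 / n.
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 2.0, 1.5, 100, 5000

mle_draws = np.array([rng.normal(mu, sigma, n).mean() for _ in range(reps)])

print("empirical variance of the MLE:", mle_draws.var())
print("asymptotic variance sigma^2/n:", sigma**2 / n)
```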

The MLE is an excellent estimator because it is consistent, efficient, sometimes unbiased, and a function of sufficient statistics. That is why maximum likelihood is the most widely used parametric estimation technique.

