Statistical estimation is the cornerstone of statistical inference, enabling us to derive insights about population characteristics using sample data. The population is described by a probability density function (pdf)
$f(x;θ)$, which depends on unknown parameters θ. These parameters are often not directly observable but play a vital role in defining the population's behavior and structure.
To obtain good estimators of these parameters, several standard methods are available.
In practical scenarios, θ is estimated using a random sample. This process involves leveraging principles such as sufficiency, unbiasedness, and minimal variance to ensure estimators are both reliable and optimal. The goal is to minimize errors, typically in the sense of mean squared error (MSE).
This article explores key parametric estimation methods for deriving robust estimates of θ. Among these, the Method of Moments (MoM), introduced by Karl Pearson in the late 19th century, stands out as one of the earliest and simplest techniques. Additionally, methods such as Minimum Chi-Square, the Method of Least Squares, and Maximum Likelihood Estimation (MLE) are discussed for their applications and theoretical underpinnings.
By understanding these methods, statisticians can select the most appropriate technique for their analysis, balancing computational simplicity and statistical efficiency.
The MLE is the point in the parameter space at which the likelihood function of the observed sample attains its global maximum:
`L(\theta) = \prod_{i=1}^{n} f(x_{i};\theta_{1},\theta_{2},\dots,\theta_{k}) \tag{1.1}`
The most general method of parameter estimation is called Maximum Likelihood Estimation (M.L.E.). It was formulated earlier by C.F. Gauss, but the modern method was developed by Prof. R.A. Fisher, who established several of its optimal properties and compared it with other methods of estimation. The method was also used in the development of the theory of least squares.
The M.L.E. is the point in the parameter space for which the observed sample is most likely, i.e., at which the likelihood function attains its maximum value.
In general, the M.L.E. is a good point estimator possessing several optimal properties. One drawback is that the calculus involved in finding the M.L.E. can sometimes be difficult.
The principle of maximum likelihood consists of finding an estimator for the unknown parameter $\theta = (\theta_{1}, \theta_{2},\dots,\theta_{k})$ that maximises the likelihood function $L(\theta)$ in (1.1) for variations in the parameter; we denote this estimator by $\hat \theta_{MLE}$. Thus, if there exists a function
`\hat \theta =\hat \theta(x_{1}, x_{2}, x_{3},\dots,x_{n})`
of the sample values which maximises $L$ for variations in $\theta$, then $\hat \theta$ is taken as an estimator of $\theta$; $\hat \theta$ is usually called the maximum likelihood estimator.
`\frac{\partial L(\theta)}{\partial \theta}= 0` & `\frac{\partial^2 L(\theta)}{\partial \theta^2} < 0`
The MLE has many attractive large-sample properties: for large samples it is consistent, asymptotically efficient, sometimes unbiased, and under certain regularity conditions it also provides a UMVUE. This is why maximum likelihood estimation is the most widely used parametric estimation technique.
The MLE behaves like a mode: just as the mode is the point at which a pdf or pmf attains its maximum, the MLE is the parameter value at which the likelihood attains its maximum.
If a most efficient estimator exists, it is necessarily the MLE; the converse, that the MLE is always the most efficient estimator, is not true in general.
Clearly, this method of finding the estimator is in accordance with the likelihood principle.
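To make the idea concrete, here is a minimal sketch (the exponential model, the simulated data, and the true rate are illustrative assumptions): the log-likelihood of an Exponential($\lambda$) sample is maximized numerically and compared with the analytic MLE $\hat\lambda = 1/\bar x$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
lam_true = 0.7
x = rng.exponential(scale=1 / lam_true, size=2000)   # Exponential(lambda) sample

def neg_log_likelihood(lam):
    """Negative log-likelihood of an Exponential(lambda) sample."""
    return -(x.size * np.log(lam) - lam * x.sum())

# Maximizing the likelihood = minimizing its negative over an interval for lambda.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(f"numerical MLE = {result.x:.4f}, analytic MLE 1/xbar = {1 / x.mean():.4f}")
```

The two values agree, which is exactly what the maximization conditions above predict for this model.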
Example (Normal Likelihood)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $N(\mu , \sigma^2)$. Find the MLEs of $\mu$ and $\sigma^2$.
Solution: The pdf of the Normal distribution is
`f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}`
Case 1 : MLE for $\mu$
The likelihood function is defined by
`L(\theta) = \prod_{i=1}^{n} f(x_{i}|\theta)`
`L(\theta) = \prod_{i=1}^{n} \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right\}\right]`
`L(\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}`
`\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2`
`\frac{\partial}{\partial \mu} \log L(\theta) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)`
Now putting $\frac{\partial \log L(\theta)}{\partial \mu}=0$
`\hat \mu_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_{i} = \bar x`
Case 2 : MLE for $\sigma^2$
`L(\theta) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\} \right]`
`L(\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}`
Differentiating the log-likelihood with respect to $\sigma^2$,
`\frac{\partial \log L(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2`
`\Rightarrow -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0`
`\Rightarrow -n\sigma^2 + \sum_{i=1}^{n} (x_i - \mu)^2 = 0`
`\Rightarrow \hat \sigma_{MLE}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\mu)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2`
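A short sketch comparing these closed-form MLEs with simulated data (the sample and the true parameter values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, sigma_true, n = 10.0, 2.0, 1000
x = rng.normal(loc=mu_true, scale=sigma_true, size=n)

mu_hat = x.mean()                          # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)    # MLE of sigma^2: divides by n, not n - 1
print(f"mu_hat = {mu_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}")
```

Note that the MLE of $\sigma^2$ divides by $n$, unlike the unbiased sample variance, which divides by $n-1$.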
Example (Gamma Distribution)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $Gamma(a,\lambda)$ with pdf
`f(x; a, \lambda)= \frac{a^{\lambda}}{{\Gamma(\lambda)}} e^{-ax}x^{\lambda-1} ; x > 0, \, a > 0, \, \lambda > 0`
Solution:
The likelihood function is
`L(a, \lambda) = \frac{a^{n\lambda}}{\Gamma(\lambda)^n} (e^{-a \sum_{i=1}^{n} x_i}) (\prod_{i=1}^{n} x_i^{\lambda - 1})`
`\log L(a, \lambda) = n\lambda \log(a) - n \log(\Gamma(\lambda)) - a \sum_{i=1}^{n} x_i + (\lambda - 1) \sum_{i=1}^{n} \log(x_i)`
`\frac{\partial \log L(a, \lambda)}{\partial a} = \frac{n\lambda}{a} - \sum_{i=1}^{n} x_i`
Now putting `\frac{\partial \log L(a, \lambda)}{\partial a} = 0`
`\frac{n\lambda}{a} = \sum_{i=1}^{n} x_i`
`\hat a_{MLE} = \frac{n\lambda}{\sum_{i=1}^{n} x_i} = \frac{\lambda}{\bar x}`
Here the shape parameter $\lambda$ is treated as known; if it is also unknown, its MLE has no closed form, as discussed in the second gamma example below.
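As a quick sanity check (the simulated data and the chosen shape and rate values are assumptions for illustration), the rate estimate $\hat a = \lambda / \bar x$ can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0                      # shape parameter, assumed known here
a_true = 2.0                   # true rate used to simulate the sample
x = rng.gamma(shape=lam, scale=1 / a_true, size=1000)

a_hat = lam / x.mean()         # MLE of the rate when the shape is known
print(f"a_hat = {a_hat:.4f} (true value {a_true})")
```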
Example (Uniform Distribution: a non-unique MLE)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $U(\theta-a, \theta+a)$, where $a > 0$ is known, so that $f(x;\theta) = \frac{1}{2a}$ for $\theta - a < x < \theta + a$. The likelihood function is
`L(\theta, x) = \prod_{i=1}^{n} f(x_{i};\theta)`
`L(\theta, x) = \prod_{i=1}^{n} \left( \frac{1}{2a} \right) = \left(\frac{1}{2a}\right)^n`
provided every observation lies in the support, i.e. $\theta - a < x_i < \theta + a$ for all $i$.
We know
`X_{(1)} > \theta - a \Rightarrow \theta < X_{(1)} + a`
`X_{(n)} < \theta + a \Rightarrow \theta > X_{(n)} - a`
So any value of $\theta$ between `X_{(n)} - a` and `X_{(1)} + a` is an MLE of `\theta`.
So here the MLE is not unique and is given by
`\hat\theta = \alpha \big( X_{(n)} - a \big) + (1 - \alpha)\big( X_{(1)} + a \big), \quad 0 \le \alpha \le 1`
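A small numerical check (the values of $a$, $\theta$, and the sample size are illustrative assumptions) confirms that every interior point of this interval attains the same likelihood value:

```python
import numpy as np

rng = np.random.default_rng(2)
a, theta_true, n = 1.0, 5.0, 20
x = rng.uniform(theta_true - a, theta_true + a, size=n)

def likelihood(theta):
    """Uniform(theta - a, theta + a) likelihood: (1/2a)^n on the admissible set, 0 otherwise."""
    inside = (x > theta - a).all() and (x < theta + a).all()
    return (1.0 / (2 * a)) ** n if inside else 0.0

lower, upper = x.max() - a, x.min() + a            # admissible interval for theta
candidates = np.linspace(lower, upper, 9)[1:-1]     # interior points of the open interval
print([likelihood(t) for t in candidates])          # all equal: the MLE is not unique
```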
Example ($N(\theta, \theta^2)$)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $N(\theta, \theta^2)$, $\theta > 0$. The likelihood function is
`L(\theta) = \prod_{i=1}^{n}\left[ \frac{1}{\sqrt{2\pi \theta^2}} \exp\left\{-\frac{(x_i - \theta)^2}{2\theta^2}\right\}\right]`
`L(\theta) = \frac{1}{(2\pi \theta^2)^{n/2}} \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2\right\}`
`\log L(\theta) = \log \left( \frac{1}{(2\pi \theta^2)^{n/2}} \right) + \log \left( \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2\right\} \right)`
`\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\theta^2) - \frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2`
Expanding $\sum_{i=1}^{n}(x_i - \theta)^2 = \sum_{i=1}^{n} x_i^2 - 2\theta \sum_{i=1}^{n} x_i + n\theta^2$ and differentiating the log-likelihood with respect to $\theta$,
`\frac{\partial \log L(\theta)}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^3} \sum_{i=1}^{n} x_i^2 - \frac{1}{\theta^2} \sum_{i=1}^{n} x_i`
Setting this derivative equal to zero and multiplying through by $-\frac{\theta^3}{n}$,
`\theta^2 + \bar x \, \theta - \frac{1}{n} \sum_{i=1}^{n} x_i^2 = 0`
`\theta^2 + \bar x \, \theta - m_2 = 0, \quad \text{where } m_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2`
Applying the quadratic formula,
`\theta = \frac{-\bar x \pm \sqrt{\bar x^2 + 4 m_2}}{2}`
Since $\theta > 0$ and $\sqrt{\bar x^2 + 4 m_2} > |\bar x|$, only the positive root is admissible:
`\hat \theta_{MLE} = \frac{-\bar x + \sqrt{\bar x^2 + 4 m_2}}{2}`
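A brief sketch of this estimator on simulated data (the true value of $\theta$ and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, n = 2.0, 5000
x = rng.normal(loc=theta_true, scale=theta_true, size=n)   # N(theta, theta^2)

xbar = x.mean()
m2 = np.mean(x ** 2)                                        # second raw moment
theta_hat = (-xbar + np.sqrt(xbar ** 2 + 4 * m2)) / 2       # positive root of the quadratic
print(f"theta_hat = {theta_hat:.4f} (true value {theta_true})")
```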
Example (Gamma Distribution: shape parameter)
Consider again a gamma model, now parameterised by shape $r$ and rate $\lambda$, with pdf
`f(x; r, \lambda) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{\Gamma(r)}, \quad x > 0, \, r > 0, \, \lambda > 0`
The likelihood function is
`L(r, \lambda) = \prod_{i=1}^{n} \frac{\lambda^r x_i^{r-1} e^{-\lambda x_i}}{\Gamma(r)}`
`L(r, \lambda) = \frac{\lambda^{nr}}{\Gamma(r)^n} \prod_{i=1}^{n} x_i^{r-1} e^{-\lambda \sum_{i=1}^{n} x_i}`
Taking logarithms,
`\log L(r, \lambda) = nr \log(\lambda) - n \log(\Gamma(r)) + (r - 1) \sum_{i=1}^{n} \log(x_i) - \lambda \sum_{i=1}^{n} x_i`
and differentiating with respect to $r$ gives the score equation
`\frac{\partial \log L(r, \lambda)}{\partial r} = n \log(\lambda) - n\,\psi(r) + \sum_{i=1}^{n} \log(x_i) = 0`
where $\psi(r) = \frac{d}{dr}\log\Gamma(r)$ is the digamma function. Because of the digamma function, the MLE of $r$ has no closed-form solution and must be obtained by numerical optimization, for example Newton-Raphson or other gradient-based methods. In this work the parameter $r$ was estimated numerically using SciPy's `optimize.minimize` routine in Python.
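The following is a minimal sketch of this numerical approach (the simulated data, starting values, and true parameter values are assumptions for illustration): it minimizes the negative gamma log-likelihood with `scipy.optimize.minimize`, working on the log scale so that both parameters stay positive.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_log_likelihood(params, x):
    """Negative gamma log-likelihood in terms of log-shape and log-rate."""
    r, lam = np.exp(params)
    n = x.size
    log_lik = (n * r * np.log(lam) - n * gammaln(r)
               + (r - 1) * np.sum(np.log(x)) - lam * np.sum(x))
    return -log_lik

# Simulated data; in practice x would be the observed sample.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.5, scale=1 / 1.3, size=500)   # true r = 2.5, lambda = 1.3

# Method-of-moments values make a reasonable starting point.
r0 = np.mean(x) ** 2 / np.var(x)
lam0 = np.mean(x) / np.var(x)
result = minimize(neg_log_likelihood, x0=np.log([r0, lam0]), args=(x,))

r_hat, lam_hat = np.exp(result.x)
print(f"r_hat = {r_hat:.4f}, lambda_hat = {lam_hat:.4f}")
```

Optimizing over $(\log r, \log \lambda)$ avoids the need for explicit positivity constraints and typically improves convergence.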
Zehna studied the MLE of a function of $\theta$ and showed that the invariance property holds whether the function is one-to-one or many-to-one.
If $\hat \theta$ is the MLE of $\theta$, then $g(\hat \theta)$ is the MLE of $g(\theta)$.
That is, `\widehat{g(\theta)}_{MLE} = g(\hat \theta_{MLE})`
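For example, combining the invariance property with the Normal example above gives the MLE of the standard deviation directly:
`\hat \sigma_{MLE} = \sqrt{\hat \sigma^2_{MLE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2}`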