Statistical estimation is the cornerstone of statistical inference, enabling us to derive insights about population characteristics using sample data. The population is described by a probability density function (pdf)
$f(x;θ)$, which depends on unknown parameters θ. These parameters are often not directly observable but play a vital role in defining the population's behavior and structure.
To obtain good estimators of these parameters, several standard methods are available.
In practical scenarios, θ is estimated using a random sample. This process involves leveraging principles such as sufficiency, unbiasedness, and minimal variance to ensure estimators are both reliable and optimal. The goal is to minimize errors, typically in the sense of mean squared error (MSE).
This article explores key parametric estimation methods for deriving robust estimates of θ. Among these, the Method of Moments (MoM), introduced by Karl Pearson in the late 19th century, stands out as one of the earliest and simplest techniques. Additionally, methods such as Minimum Chi-Square, the Method of Least Squares, and Maximum Likelihood Estimation (MLE) are discussed for their applications and theoretical underpinnings.
By understanding these methods, statisticians can select the most appropriate technique for their analysis, balancing computational simplicity and statistical efficiency.
The MLE is the point in the parameter space at which the likelihood function of the observed sample attains its global maximum:
`L(\theta) = \prod_{i=1}^{n} f(x_{i};\theta_{1},\theta_{2},\dots,\theta_{k}) \tag{1.1}`
The most general method of parameter estimation is called Maximum Likelihood Estimation (M.L.E.). It was formulated earlier by C.F. Gauss, but the modern method was developed by Prof. R.A. Fisher, who established several of its optimal properties and compared it with other methods of estimation. The method was also used in the development of the theory of least squares.
The M.L.E. is the point in the parameter space for which the observed sample is most likely, i.e., at which the likelihood function attains its maximum value.
In general, the M.L.E. is a good point estimator possessing several optimal properties. One drawback is that the calculus involved in finding the M.L.E. can sometimes be difficult.
The principle of maximum likelihood consists of finding an estimator for the unknown parameter $\theta = (\theta_{1}, \theta_{2},\dots,\theta_{k})$ that maximises the likelihood function $L(\theta)$ in (1.1) for variations in the parameter; we denote this estimator by $\hat \theta_{MLE}$. Thus, if there exists a function
`\hat \theta =\hat \theta(x_{1}, x_{2}, x_{3},\dots,x_{n})`
of the sample values which maximises $L$ for variations in $\theta$, then $\hat \theta$ is taken as an estimator of $\theta$; $\hat \theta$ is usually called the maximum likelihood estimator.
`\frac{\partial L(\theta)}{\partial \theta}= 0` & `\frac{\partial^2 L(\theta)}{\partial \theta^2} < 0`
The MLE has many attractive large-sample properties: for large samples it is consistent, asymptotically efficient, sometimes unbiased, and under certain regularity conditions it also provides a UMVUE. This is why maximum likelihood estimation is the most widely used parametric estimation technique.
The MLE behaves like a mode: just as the mode is the point at which a pdf or pmf attains its maximum, the MLE is the parameter value at which the likelihood attains its maximum.
If a most efficient estimator exists, it is necessarily the MLE; the converse, that the MLE is always the most efficient estimator, is not true in general.
Clearly, this method of finding the estimator is in accordance with the likelihood principle.
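To make the idea concrete, here is a minimal sketch (the exponential model, the simulated data, and the true rate are illustrative assumptions): the log-likelihood of an Exponential($\lambda$) sample is maximized numerically and compared with the analytic MLE $\hat\lambda = 1/\bar x$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
lam_true = 0.7
x = rng.exponential(scale=1 / lam_true, size=2000)   # Exponential(lambda) sample

def neg_log_likelihood(lam):
    """Negative log-likelihood of an Exponential(lambda) sample."""
    return -(x.size * np.log(lam) - lam * x.sum())

# Maximizing the likelihood = minimizing its negative over an interval for lambda.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(f"numerical MLE = {result.x:.4f}, analytic MLE 1/xbar = {1 / x.mean():.4f}")
```

The two values agree, which is exactly what the maximization conditions above predict for this model.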
Example (Normal Likelihood)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $N(\mu , \sigma^2)$. Find the MLEs of $\mu$ and $\sigma^2$.
Solution: The pdf of the Normal distribution is
`f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}`
Case 1 : MLE for $\mu$
The likelihood function is defined by
`L(\theta) = \prod_{i=1}^{n} f(x_{i}|\theta)`
`L(\theta) = \prod_{i=1}^{n} \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right\}\right]`
`L(\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}`
`\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2`
`\frac{\partial}{\partial \mu} \log L(\theta) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)`
Now putting $\frac{\partial \log L(\theta)}{\partial \mu}=0$
`\hat \mu_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_{i} = \bar x`
Case 2 : MLE for $\sigma^2$
`L(\theta) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\} \right]`
`L(\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}`
Differentiating the log-likelihood with respect to $\sigma^2$,
`\frac{\partial \log L(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2`
`\Rightarrow -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0`
`\Rightarrow -n\sigma^2 + \sum_{i=1}^{n} (x_i - \mu)^2 = 0`
`\Rightarrow \hat \sigma_{MLE}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\mu)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2`
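A short sketch comparing these closed-form MLEs with simulated data (the sample and the true parameter values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, sigma_true, n = 10.0, 2.0, 1000
x = rng.normal(loc=mu_true, scale=sigma_true, size=n)

mu_hat = x.mean()                          # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)    # MLE of sigma^2: divides by n, not n - 1
print(f"mu_hat = {mu_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}")
```

Note that the MLE of $\sigma^2$ divides by $n$, unlike the unbiased sample variance, which divides by $n-1$.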
Example (Gamma Distribution)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $Gamma(a,\lambda)$ with pdf
`f(x; a, \lambda)= \frac{a^{\lambda}}{{\Gamma(\lambda)}} e^{-ax}x^{\lambda-1} ; x > 0, \, a > 0, \, \lambda > 0`
Solution:
The likelihood function is
`L(a, \lambda) = \frac{a^{n\lambda}}{\Gamma(\lambda)^n} (e^{-a \sum_{i=1}^{n} x_i}) (\prod_{i=1}^{n} x_i^{\lambda - 1})`
`\log L(a, \lambda) = n\lambda \log(a) - n \log(\Gamma(\lambda)) - a \sum_{i=1}^{n} x_i + (\lambda - 1) \sum_{i=1}^{n} \log(x_i)`
`\frac{\partial \log L(a, \lambda)}{\partial a} = \frac{n\lambda}{a} - \sum_{i=1}^{n} x_i`
Now putting `\frac{\partial \log L(a, \lambda)}{\partial a} = 0`
`\frac{n\lambda}{a} = \sum_{i=1}^{n} x_i`
`\hat a_{MLE} = \frac{n\lambda}{\sum_{i=1}^{n} x_i} = \frac{\lambda}{\bar x}`
Here the shape parameter $\lambda$ is treated as known; if it is also unknown, its MLE has no closed form, as discussed in the second gamma example below.
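As a quick sanity check (the simulated data and the chosen shape and rate values are assumptions for illustration), the rate estimate $\hat a = \lambda / \bar x$ can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0                      # shape parameter, assumed known here
a_true = 2.0                   # true rate used to simulate the sample
x = rng.gamma(shape=lam, scale=1 / a_true, size=1000)

a_hat = lam / x.mean()         # MLE of the rate when the shape is known
print(f"a_hat = {a_hat:.4f} (true value {a_true})")
```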
Example (Uniform Distribution: a non-unique MLE)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $U(\theta-a, \theta+a)$, where $a > 0$ is known, so that $f(x;\theta) = \frac{1}{2a}$ for $\theta - a < x < \theta + a$. The likelihood function is
`L(\theta, x) = \prod_{i=1}^{n} f(x_{i};\theta)`
`L(\theta, x) = \prod_{i=1}^{n} \left( \frac{1}{2a} \right) = \left(\frac{1}{2a}\right)^n`
provided every observation lies in the support, i.e. $\theta - a < x_i < \theta + a$ for all $i$.
We know
`X_{(1)} > \theta - a \Rightarrow \theta < X_{(1)} + a`
`X_{(n)} < \theta + a \Rightarrow \theta > X_{(n)} - a`
So any value of $\theta$ between `X_{(n)} - a` and `X_{(1)} + a` is an MLE of `\theta`.
So here the MLE is not unique and is given by
`\hat\theta = \alpha \big( X_{(n)} - a \big) + (1 - \alpha)\big( X_{(1)} + a \big), \quad 0 \le \alpha \le 1`
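A small numerical check (the values of $a$, $\theta$, and the sample size are illustrative assumptions) confirms that every interior point of this interval attains the same likelihood value:

```python
import numpy as np

rng = np.random.default_rng(2)
a, theta_true, n = 1.0, 5.0, 20
x = rng.uniform(theta_true - a, theta_true + a, size=n)

def likelihood(theta):
    """Uniform(theta - a, theta + a) likelihood: (1/2a)^n on the admissible set, 0 otherwise."""
    inside = (x > theta - a).all() and (x < theta + a).all()
    return (1.0 / (2 * a)) ** n if inside else 0.0

lower, upper = x.max() - a, x.min() + a            # admissible interval for theta
candidates = np.linspace(lower, upper, 9)[1:-1]     # interior points of the open interval
print([likelihood(t) for t in candidates])          # all equal: the MLE is not unique
```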
Example ($N(\theta, \theta^2)$)
Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from $N(\theta, \theta^2)$, $\theta > 0$. The likelihood function is
`L(\theta) = \prod_{i=1}^{n}\left[ \frac{1}{\sqrt{2\pi \theta^2}} \exp\left\{-\frac{(x_i - \theta)^2}{2\theta^2}\right\}\right]`
`L(\theta) = \frac{1}{(2\pi \theta^2)^{n/2}} \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2\right\}`
`\log L(\theta) = \log \left( \frac{1}{(2\pi \theta^2)^{n/2}} \right) + \log \left( \exp\left\{-\frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2\right\} \right)`
`\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\theta^2) - \frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \theta)^2`
Expanding $\sum_{i=1}^{n}(x_i - \theta)^2 = \sum_{i=1}^{n} x_i^2 - 2\theta \sum_{i=1}^{n} x_i + n\theta^2$ and differentiating the log-likelihood with respect to $\theta$,
`\frac{\partial \log L(\theta)}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^3} \sum_{i=1}^{n} x_i^2 - \frac{1}{\theta^2} \sum_{i=1}^{n} x_i`
Setting this derivative equal to zero and multiplying through by $-\frac{\theta^3}{n}$,
`\theta^2 + \bar x \, \theta - \frac{1}{n} \sum_{i=1}^{n} x_i^2 = 0`
`\theta^2 + \bar x \, \theta - m_2 = 0, \quad \text{where } m_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2`
Applying the quadratic formula,
`\theta = \frac{-\bar x \pm \sqrt{\bar x^2 + 4 m_2}}{2}`
Since $\theta > 0$ and $\sqrt{\bar x^2 + 4 m_2} > |\bar x|$, only the positive root is admissible:
`\hat \theta_{MLE} = \frac{-\bar x + \sqrt{\bar x^2 + 4 m_2}}{2}`
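A brief sketch of this estimator on simulated data (the true value of $\theta$ and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, n = 2.0, 5000
x = rng.normal(loc=theta_true, scale=theta_true, size=n)   # N(theta, theta^2)

xbar = x.mean()
m2 = np.mean(x ** 2)                                        # second raw moment
theta_hat = (-xbar + np.sqrt(xbar ** 2 + 4 * m2)) / 2       # positive root of the quadratic
print(f"theta_hat = {theta_hat:.4f} (true value {theta_true})")
```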
Example (Gamma Distribution: shape parameter)
Consider again a gamma model, now parameterised by shape $r$ and rate $\lambda$, with pdf
`f(x; r, \lambda) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{\Gamma(r)}, \quad x > 0, \, r > 0, \, \lambda > 0`
The likelihood function is
`L(r, \lambda) = \prod_{i=1}^{n} \frac{\lambda^r x_i^{r-1} e^{-\lambda x_i}}{\Gamma(r)}`
`L(r, \lambda) = \frac{\lambda^{nr}}{\Gamma(r)^n} \prod_{i=1}^{n} x_i^{r-1} e^{-\lambda \sum_{i=1}^{n} x_i}`
Taking logarithms,
`\log L(r, \lambda) = nr \log(\lambda) - n \log(\Gamma(r)) + (r - 1) \sum_{i=1}^{n} \log(x_i) - \lambda \sum_{i=1}^{n} x_i`
and differentiating with respect to $r$ gives the score equation
`\frac{\partial \log L(r, \lambda)}{\partial r} = n \log(\lambda) - n\,\psi(r) + \sum_{i=1}^{n} \log(x_i) = 0`
where $\psi(r) = \frac{d}{dr}\log\Gamma(r)$ is the digamma function. Because of the digamma function, the MLE of $r$ has no closed-form solution and must be obtained by numerical optimization, for example Newton-Raphson or other gradient-based methods. In this work the parameter $r$ was estimated numerically using SciPy's `optimize.minimize` routine in Python.
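The following is a minimal sketch of this numerical approach (the simulated data, starting values, and true parameter values are assumptions for illustration): it minimizes the negative gamma log-likelihood with `scipy.optimize.minimize`, working on the log scale so that both parameters stay positive.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_log_likelihood(params, x):
    """Negative gamma log-likelihood in terms of log-shape and log-rate."""
    r, lam = np.exp(params)
    n = x.size
    log_lik = (n * r * np.log(lam) - n * gammaln(r)
               + (r - 1) * np.sum(np.log(x)) - lam * np.sum(x))
    return -log_lik

# Simulated data; in practice x would be the observed sample.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.5, scale=1 / 1.3, size=500)   # true r = 2.5, lambda = 1.3

# Method-of-moments values make a reasonable starting point.
r0 = np.mean(x) ** 2 / np.var(x)
lam0 = np.mean(x) / np.var(x)
result = minimize(neg_log_likelihood, x0=np.log([r0, lam0]), args=(x,))

r_hat, lam_hat = np.exp(result.x)
print(f"r_hat = {r_hat:.4f}, lambda_hat = {lam_hat:.4f}")
```

Optimizing over $(\log r, \log \lambda)$ avoids the need for explicit positivity constraints and typically improves convergence.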
Zehna studied the MLE of a function of $\theta$ and showed that the invariance property holds whether the function is one-to-one or many-to-one.
If $\hat \theta$ is the MLE of $\theta$, then $g(\hat \theta)$ is the MLE of $g(\theta)$.
That is, `\widehat{g(\theta)}_{MLE} = g(\hat \theta_{MLE})`
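For example, combining the invariance property with the Normal example above gives the MLE of the standard deviation directly:
`\hat \sigma_{MLE} = \sqrt{\hat \sigma^2_{MLE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2}`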