Hey there! Ready to crunch some numbers? Let’s dive into the world of statistics!

Data Sufficiency and Data Summarization - Theory of Estimation

Fisher–Neyman, Sufficiency, Data Summarization, Sufficient Statistics, Factorization Theorem

The data on the behaviour of the parameter $\theta$ are collected in the form of a sample $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$. Retaining the whole of this raw data has several drawbacks:

  • it is not only voluminous but also costly to maintain.
  • there remains a possibility of loss of data in storage or transit.
  • it is usually unnecessary to retain such a voluminous record in full.
  • the sample $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$ contains some additional information which is not relevant for estimating $\theta$.
That is why the statistician tries to reduce the dimensionality of the sample space by considering a summary statistic $T(X)$ which discards the information that is of no use to us. The sufficient statistic thus uses only the information that is relevant for estimation.

Sufficiency 

Sufficiency formalizes this reduction in the dimension of the data. A statistic $T$ is said to be a sufficient estimator for a given family of distributions

`F = \{F_{\theta} \, ; \, \theta \in \Theta\}`

if the conditional distribution of $(X \mid T = t)$ is independent of $\theta$ for all $t$. Once the value of $T$ is known, the remaining variation in the sample therefore carries no further information about $\theta$.
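
To make this concrete, here is a small Python sketch (not part of the original post; the function name `conditional_dist` and the sample size are chosen just for illustration). It enumerates all Bernoulli$(\theta)$ samples of size $n = 3$ and verifies that the conditional distribution of the sample given $T = \sum X_{i}$ is the same whatever the value of $\theta$:

```python
# Illustration (assumed Bernoulli model): the conditional distribution of the sample
# given T = sum(X) does not involve theta, which is the defining property of sufficiency.
from itertools import product

def conditional_dist(theta, n=3, t=2):
    """Return P(X = x | T = t) for every binary vector x with sum(x) = t."""
    # Joint probabilities P_theta(X = x) for all 2^n possible samples.
    joint = {x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
             for x in product([0, 1], repeat=n)}
    # Restrict to the set A_t = {x : T(x) = t} and renormalize.
    a_t = {x: p for x, p in joint.items() if sum(x) == t}
    total = sum(a_t.values())
    return {x: p / total for x, p in a_t.items()}

# The conditional distribution is uniform on A_t and identical for every theta.
print(conditional_dist(0.2))
print(conditional_dist(0.9))
```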

Fisher–Neyman Factorization Theorem

Suppose we are given a family of distributions:

`F = \{F_{\theta} \, ; \, \theta \in \Theta\}`

A statistic $T$ is said to be sufficient for $F$ iff there exist a non-negative function $g(\theta, T(x))$ and a function $h(x)$ of $x$ alone such that the joint density (or mass) function can be expressed as

`P_{\theta}(X = x) = g(\theta, T(x)) \, h(x) \qquad (1)`

where $g(\theta, T(x))$ depends on the parameter $\theta$ and on the data only through $T(x)$, whereas $h(x)$ depends only on the sample observations.
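
For instance (an illustration not given in the original text, using the standard normal location model), for a random sample from $N(\theta, 1)$ the joint density factorizes in exactly this form with $T(x) = \sum_{i=1}^{n} x_{i}$:

`f(x_{1}, \ldots, x_{n} \, ; \, \theta) = (2\pi)^{-n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_{i}-\theta)^{2}\right\} = \underbrace{\exp\left\{\theta \sum_{i=1}^{n} x_{i} - \frac{n\theta^{2}}{2}\right\}}_{g(\theta, \, T(x))} \cdot \underbrace{(2\pi)^{-n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^{n} x_{i}^{2}\right\}}_{h(x)}`

so $\sum_{i=1}^{n} X_{i}$ (equivalently $\bar{X}$) is sufficient for $\theta$.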

Proof of the Factorization Theorem

We assume here that the random variable $X$ is discrete with pmf $p(x \, ; \, \theta)$. First, suppose that $T$ is sufficient for the given probability mass function (pmf).
By definition, $P(X = x \mid T = t)$ is then independent of $\theta$. Since the event $\{X = x\}$ implies the event $\{T(X) = T(x)\}$, we have

$P_{\theta}(X = x) = P_{\theta}\{X = x \, , \, T(X) = T(x)\}$

$= P_{\theta}\{T(X) = T(x)\} \; P\{X = x \mid T(X) = T(x)\}$

With $h(x) = P(X = x \mid T(X) = T(x))$ and $g(\theta, T(x)) = P_{\theta}\{T(X) = T(x)\}$, this is exactly of the form of equation $(1)$.

Conversely, suppose equation $(1)$ holds. Then for any value $t_{0}$ of $T$ we have

`P_{\theta}\{T = t_{0}\} = \sum_{x \, : \, T(x) = t_{0}} P_{\theta}\{X = x\} = g(\theta, t_{0}) \sum_{x \, : \, T(x) = t_{0}} h(x)`

so that, for any $x$ with $T(x) = t_{0}$,

`P_{\theta}\{X = x \mid T = t_{0}\} = \frac{g(\theta, t_{0}) \, h(x)}{g(\theta, t_{0}) \sum_{y \, : \, T(y) = t_{0}} h(y)} = \frac{h(x)}{\sum_{y \, : \, T(y) = t_{0}} h(y)}`

which is free of $\theta$. Hence $T$ is sufficient.

Features of Sufficient Statistics

  1. If $T$ is a sufficient statistic and $T = h(S)$ for some function $h$ of a statistic $S$, then $S$ is also sufficient. Further, if $S$ is sufficient and $h$ is one to one, then $T$ is also sufficient and is said to be equivalent to $S$; in this case both statistics $T$ and $S$ carry exactly the same amount of information about the parameter. If $h$ is not one to one, then $T$ gives a greater reduction of the dimension of the data.
  2. If $T = \sum_{i=1}^{n} X_{i}$ is sufficient, then $\left(\sum_{i=1}^{m} X_i, \sum_{i=m+1}^{n} X_i\right)$ is also sufficient for the parameter, but $T = \sum_{i=1}^{n} X_{i}$ gives a greater reduction of the data, i.e., it contains less irrelevant information about the parameter. If $T_{1} = \sum_{i=1}^{n} X_i$ is sufficient for $\theta$, then $T_{2} = a\sum_{i=1}^{n} X_{i} + b$ is also sufficient for $\theta$ for any $a \neq 0$, since such a linear transformation is one to one.
  3. By data summarization we mean a reduction in the dimension of the space of a sufficient statistic.
  4. The original data $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$, regarded as a statistic, is always sufficient for the parameter. But this is a trivial statistic, since it is the original data itself and no reduction is achieved at this stage.
  5. The order statistic $T(X) = \left(X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}\right)$ is always sufficient for $\theta$. It has the same dimension as the original observations, but it is not trivial, since it discards the order in which the observations occur.
  6. If $U$ and $V$ are equivalent statistics and $U$ is sufficient for $\theta$, then $V$ is also sufficient for $\theta$.
  7. If $T(X)$ is a sufficient statistic for $\theta$ and $\hat{\theta}$ is a maximum likelihood estimator of $\theta$, then $\hat{\theta}$ is a function of $T$ (see the sketch after this list). Moreover, if $\hat{\theta}(X)$ is unique and sufficient, then it is minimal sufficient for $\theta$.
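
As a quick illustration of feature 7 above (a sketch, not from the original post, assuming a Bernoulli model in which the MLE is $\bar{X} = T/n$), the following Python snippet shows that two samples with the same value of the sufficient statistic $T = \sum x_{i}$ give exactly the same MLE:

```python
# Illustration (assumed Bernoulli model): the MLE depends on the data only through T = sum(x).
def bernoulli_mle(sample):
    """MLE of theta for an i.i.d. Bernoulli(theta) sample: the sample mean T/n."""
    return sum(sample) / len(sample)

x = [1, 0, 1, 1, 0, 1]   # T = 4
y = [0, 1, 1, 1, 1, 0]   # a different arrangement with the same T = 4
assert bernoulli_mle(x) == bernoulli_mle(y)
print(bernoulli_mle(x))  # 0.666..., a function of T alone
```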

Minimal Sufficient Statistics

If we have several sufficient statistics, then we try to identify the one which provides the maximum summarization of the data without losing the relevant information.

In some cases the dimension of a minimal sufficient statistic is the same as that of the sample itself. For example, for the $Cauchy \: Distribution$ the sufficient order statistic $T(X) = (X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)})$ is also minimal sufficient. In the sense of uniqueness, a minimal sufficient statistic is also not unique: if $T$ is a minimal sufficient statistic and $M = f(T)$ for some one-to-one function $f$, then $M$ is also a minimal sufficient statistic.

Suppose we have observations from a pdf or pmf $f(x \, ; \, \theta)$. A statistic $T$ is minimal sufficient if, for any two points $x$ and $y$ in the sample space, the ratio $\frac{f(x \, ; \, \theta)}{f(y \, ; \, \theta)}$ is free of $\theta$ if and only if $T(x) = T(y)$.

As an example, let $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$ be a random sample from the double exponential distribution

$f(x \, ; \, \theta) = \frac{1}{2}\exp\{-|x - \theta|\} \, , \quad -\infty < x < \infty$

The joint density of the sample is given by

`f(x \, ; \, \theta) = \frac{1}{2^{n}} \exp\left\{-\sum_{i=1}^{n} |x_{i} - \theta|\right\}`

By the factorization theorem, the order statistic $T(X) = (X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)})$ is sufficient for $\theta$. To check minimality, consider the ratio

`\frac{f(x \ ; \ \theta)}{f(y \ ; \ \theta)} = \frac{\frac{1}{2^n} \exp\left(-\sum |x_i - \theta|\right)}{\frac{1}{2^n} \exp\left(-\sum |y_i - \theta|\right)}`

This ratio is free of $\theta$ if and only if $\sum |x_{i} - \theta| = \sum |y_{i} - \theta|$ for every $\theta$, which happens exactly when the two samples have the same order statistics, $(x_{(1)}, \ldots, x_{(n)}) = (y_{(1)}, \ldots, y_{(n)})$. Hence the order statistic $T(X) = (X_{(1)}, \ldots, X_{(n)})$ is minimal sufficient for $\theta$.

Now consider observations from $B(1, \theta)$, with $P(x \, ; \, \theta) = \theta^x (1-\theta)^{1-x}, \; x \in \{0, 1\}$. Here the ratio $\frac{f(x \, ; \, \theta)}{f(y \, ; \, \theta)} = \theta^{\sum x_{i} - \sum y_{i}} (1-\theta)^{\sum y_{i} - \sum x_{i}}$ is free of $\theta$ if and only if $\sum x_{i} = \sum y_{i}$, so $T(X) = \sum_{i=1}^{n} X_{i}$ is minimal sufficient for $\theta$.
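
The following Python sketch (not from the original post; the sample values are illustrative) checks the likelihood-ratio criterion numerically. For Bernoulli samples with equal sums the ratio is constant in $\theta$, while for the double exponential two samples with equal sums but different order statistics still give a ratio that varies with $\theta$:

```python
# Numerical check (illustrative samples) of the minimal-sufficiency criterion.
import math

def bernoulli_ratio(x, y, theta):
    """f(x;theta)/f(y;theta) for i.i.d. Bernoulli(theta) samples."""
    num = theta ** sum(x) * (1 - theta) ** (len(x) - sum(x))
    den = theta ** sum(y) * (1 - theta) ** (len(y) - sum(y))
    return num / den

def laplace_ratio(x, y, theta):
    """f(x;theta)/f(y;theta) for i.i.d. double exponential (Laplace) samples."""
    num = math.exp(-sum(abs(xi - theta) for xi in x))
    den = math.exp(-sum(abs(yi - theta) for yi in y))
    return num / den

x_b, y_b = [1, 0, 1], [0, 1, 1]              # equal sums, different arrangement
print([round(bernoulli_ratio(x_b, y_b, t), 4) for t in (0.2, 0.5, 0.8)])   # all 1.0

x_l, y_l = [0.0, 1.0, 5.0], [1.0, 2.0, 3.0]  # equal sums (6.0), different order statistics
print([round(laplace_ratio(x_l, y_l, t), 4) for t in (0.5, 2.0, 4.0)])     # varies with theta
```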

Reconstruction of the Original Sample by a Sufficient Statistic $T$

By the sufficiency of a statistic $T$, we mean the potential in the statistic $T = t$ for reconstructing the original sample. Basically, one draws a random sample $Y_1, Y_2, \dots, Y_n$ from the conditional distribution of $\mathbf{X}|t$. The sample $Y_1, Y_2, \dots, Y_n$ is easily obtained with the help of a random number table or a similar mechanism, since the conditional distribution of $\mathbf{X}|t$ is independent of $\theta$. The conditional distribution of $\mathbf{Y}|t$ is the same as the conditional distribution of $\mathbf{X}|t$; therefore, their unconditional distributions are also the same.

Consider the discrete case and assume that $T$ is sufficient for $\theta$. Given a value $t$ of $T$, we can define the conditional distribution of $\mathbf{X}|T = t$ on the restricted sample space $A_t$. We then generate a pseudo sample $Y_1, Y_2, \dots, Y_n$ from $A_t$ by a random number generator. We may note that the probability distribution \[ P(\mathbf{Y} = \mathbf{x} \mid T(\mathbf{X}) = T(\mathbf{x})) = P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = T(\mathbf{x})) \] is defined on the set $A_{T(\mathbf{x})}$. Now, the events $\{\mathbf{Y} = \mathbf{x}\}$ and $\{\mathbf{X} = \mathbf{x}\}$ are subsets of $\{T(\mathbf{X}) = T(\mathbf{x})\}$. It can easily be seen that the unconditional distributions of $\mathbf{Y}$ and $\mathbf{X}$ are the same, i.e., \[ P_\theta(\mathbf{Y} = \mathbf{x}) = P_\theta(\mathbf{X} = \mathbf{x}) \quad \forall \mathbf{x} \text{ and } \forall \theta. \]

Therefore, the new sample $Y_1, Y_2, \dots, Y_n$ and the original data $X_1, X_2, \dots, X_n$ carry an equal amount of probabilistic information about $\theta$, since $Y_1, Y_2, \dots, Y_n$ can be regarded as another sample from the same population from which the original data $X_1, X_2, \dots, X_n$ were drawn. Therefore, we can ``recover the data'' if we discard $X_1, X_2, \dots, X_n$ and retain $T$. It is in this sense that we mean $T$ is ``sufficient.'' Note that there may be several statistics which are sufficient in the above sense.

We have noted that the reconstruction of the original sample $\mathbf{X}$ by getting observations on $\mathbf{Y}$ depends on $T$ as well as on some random mechanism. The estimator $T(\mathbf{Y})$ therefore depends not only on $T$ but also on the drawing mechanism of $\mathbf{Y}$, so the resulting estimator of $\theta$ is a randomized estimator which, for each fixed value of $\mathbf{X} = \mathbf{x}$, is a random variable $T(\mathbf{Y})$ with a known distribution. We denote this randomized estimator $T(\mathbf{Y})$ by $T_\mathbf{x}$. The risk of the randomized estimator $T(\mathbf{Y})$ is therefore given by \[ R(\theta, T) = \mathbb{E}_\theta^{\mathbf{X}} \, \mathbb{E}^{\mathbf{Y}} \, L(\theta, T_\mathbf{x}). \]
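
A minimal Python sketch of this reconstruction idea (not from the original text; it assumes a Bernoulli sample, for which the conditional distribution of $\mathbf{X}$ given $T = t$ is uniform over the arrangements of $t$ ones among $n$ positions, so no knowledge of $\theta$ is needed):

```python
# Illustration (assumed Bernoulli model): regenerate a pseudo sample Y from T alone.
import random

def reconstruct_sample(t, n, rng=random):
    """Draw Y1,...,Yn from P(X = . | T = t): place t ones at random positions."""
    y = [1] * t + [0] * (n - t)
    rng.shuffle(y)
    return y

original = [1, 0, 0, 1, 1, 0, 1, 0]      # suppose this was the data; T = 4, n = 8
t, n = sum(original), len(original)
pseudo = reconstruct_sample(t, n)
print(pseudo, sum(pseudo))               # same T; unconditionally distributed like the original
```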
