Hey there! Ready to crunch some numbers? Let’s dive into the world of statistics!

Data Sufficiency and Data Summarization - Theory of Estimation

Fisher–Neyman, Sufficiency, Data Summarization, Sufficient Statistics, Factorization Theorem

The data on the behaviour of the parameter $\theta$ are collected in the form of a sample $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$. Retaining the whole of this raw data has several drawbacks:

  • it is not only voluminous but also costly to maintain.
  • there remains a possibility of loss of data in storage or transit.
  • it is usually unnecessary to retain such a voluminous record in full.
  • the sample $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$ contains some additional information which is not relevant for estimating $\theta$.
That is why the statistician tries to reduce the dimensionality of the sample space by considering a summary statistic $T(X)$ which discards the information that is of no use to us. The sufficient statistic thus uses only the information that is relevant for estimation.

Sufficiency 

Sufficiency formalizes this reduction in the dimension of the data. A statistic $T$ is said to be a sufficient estimator for a given family of distributions

`F = \{F_{\theta} \, ; \, \theta \in \Theta\}`

if the conditional distribution of $(X \mid T = t)$ is independent of $\theta$ for all $t$. Once the value of $T$ is known, the remaining variation in the sample therefore carries no further information about $\theta$.
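
To make this concrete, here is a small Python sketch (not part of the original post; the function name `conditional_dist` and the sample size are chosen just for illustration). It enumerates all Bernoulli$(\theta)$ samples of size $n = 3$ and verifies that the conditional distribution of the sample given $T = \sum X_{i}$ is the same whatever the value of $\theta$:

```python
# Illustration (assumed Bernoulli model): the conditional distribution of the sample
# given T = sum(X) does not involve theta, which is the defining property of sufficiency.
from itertools import product

def conditional_dist(theta, n=3, t=2):
    """Return P(X = x | T = t) for every binary vector x with sum(x) = t."""
    # Joint probabilities P_theta(X = x) for all 2^n possible samples.
    joint = {x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
             for x in product([0, 1], repeat=n)}
    # Restrict to the set A_t = {x : T(x) = t} and renormalize.
    a_t = {x: p for x, p in joint.items() if sum(x) == t}
    total = sum(a_t.values())
    return {x: p / total for x, p in a_t.items()}

# The conditional distribution is uniform on A_t and identical for every theta.
print(conditional_dist(0.2))
print(conditional_dist(0.9))
```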

Fisher–Neyman Factorization Theorem

Suppose we are given a family of distributions:

`F = \{F_{\theta} \, ; \, \theta \in \Theta\}`

A statistic $T$ is said to be sufficient for $F$ iff there exist a non-negative function $g(\theta, T(x))$ and a function $h(x)$ of $x$ alone such that the joint density (or mass) function can be expressed as

`P_{\theta}(X = x) = g(\theta, T(x)) \, h(x) \qquad (1)`

where $g(\theta, T(x))$ depends on the parameter $\theta$ and on the data only through $T(x)$, whereas $h(x)$ depends only on the sample observations.
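
For instance (an illustration not given in the original text, using the standard normal location model), for a random sample from $N(\theta, 1)$ the joint density factorizes in exactly this form with $T(x) = \sum_{i=1}^{n} x_{i}$:

`f(x_{1}, \ldots, x_{n} \, ; \, \theta) = (2\pi)^{-n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(x_{i}-\theta)^{2}\right\} = \underbrace{\exp\left\{\theta \sum_{i=1}^{n} x_{i} - \frac{n\theta^{2}}{2}\right\}}_{g(\theta, \, T(x))} \cdot \underbrace{(2\pi)^{-n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^{n} x_{i}^{2}\right\}}_{h(x)}`

so $\sum_{i=1}^{n} X_{i}$ (equivalently $\bar{X}$) is sufficient for $\theta$.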

Proof of the Factorization Theorem

We assume here that the random variable $X$ is discrete with pmf $p(x \, ; \, \theta)$. First, suppose that $T$ is sufficient for the given probability mass function (pmf).
By definition, $P(X = x \mid T = t)$ is then independent of $\theta$. Since the event $\{X = x\}$ implies the event $\{T(X) = T(x)\}$, we have

$P_{\theta}(X = x) = P_{\theta}\{X = x \, , \, T(X) = T(x)\}$

$= P_{\theta}\{T(X) = T(x)\} \; P\{X = x \mid T(X) = T(x)\}$

With $h(x) = P(X = x \mid T(X) = T(x))$ and $g(\theta, T(x)) = P_{\theta}\{T(X) = T(x)\}$, this is exactly of the form of equation $(1)$.

Conversely, suppose equation $(1)$ holds. Then for any value $t_{0}$ of $T$ we have

`P_{\theta}\{T = t_{0}\} = \sum_{x \, : \, T(x) = t_{0}} P_{\theta}\{X = x\} = g(\theta, t_{0}) \sum_{x \, : \, T(x) = t_{0}} h(x)`

so that, for any $x$ with $T(x) = t_{0}$,

`P_{\theta}\{X = x \mid T = t_{0}\} = \frac{g(\theta, t_{0}) \, h(x)}{g(\theta, t_{0}) \sum_{y \, : \, T(y) = t_{0}} h(y)} = \frac{h(x)}{\sum_{y \, : \, T(y) = t_{0}} h(y)}`

which is free of $\theta$. Hence $T$ is sufficient.

Features of Sufficient Statistics

  1. If $T$ is a sufficient statistic and $T = h(S)$ for some function $h$ of a statistic $S$, then $S$ is also sufficient. Further, if $S$ is sufficient and $h$ is one to one, then $T$ is also sufficient and is said to be equivalent to $S$; in this case both statistics $T$ and $S$ carry exactly the same amount of information about the parameter. If $h$ is not one to one, then $T$ gives a greater reduction of the dimension of the data.
  2. If $T = \sum_{i=1}^{n} X_{i}$ is sufficient, then $\left(\sum_{i=1}^{m} X_i, \sum_{i=m+1}^{n} X_i\right)$ is also sufficient for the parameter, but $T = \sum_{i=1}^{n} X_{i}$ gives a greater reduction of the data, i.e., it contains less irrelevant information about the parameter. If $T_{1} = \sum_{i=1}^{n} X_i$ is sufficient for $\theta$, then $T_{2} = a\sum_{i=1}^{n} X_{i} + b$ is also sufficient for $\theta$ for any $a \neq 0$, since such a linear transformation is one to one.
  3. By data summarization we mean a reduction in the dimension of the space of a sufficient statistic.
  4. The original data $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$, regarded as a statistic, is always sufficient for the parameter. But this is a trivial statistic, since it is the original data itself and no reduction is achieved at this stage.
  5. The order statistic $T(X) = \left(X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}\right)$ is always sufficient for $\theta$. It has the same dimension as the original observations, but it is not trivial, since it discards the order in which the observations occur.
  6. If $U$ and $V$ are equivalent statistics and $U$ is sufficient for $\theta$, then $V$ is also sufficient for $\theta$.
  7. If $T(X)$ is a sufficient statistic for $\theta$ and $\hat{\theta}$ is a maximum likelihood estimator of $\theta$, then $\hat{\theta}$ is a function of $T$ (see the sketch after this list). Moreover, if $\hat{\theta}(X)$ is unique and sufficient, then it is minimal sufficient for $\theta$.
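
As a quick illustration of feature 7 above (a sketch, not from the original post, assuming a Bernoulli model in which the MLE is $\bar{X} = T/n$), the following Python snippet shows that two samples with the same value of the sufficient statistic $T = \sum x_{i}$ give exactly the same MLE:

```python
# Illustration (assumed Bernoulli model): the MLE depends on the data only through T = sum(x).
def bernoulli_mle(sample):
    """MLE of theta for an i.i.d. Bernoulli(theta) sample: the sample mean T/n."""
    return sum(sample) / len(sample)

x = [1, 0, 1, 1, 0, 1]   # T = 4
y = [0, 1, 1, 1, 1, 0]   # a different arrangement with the same T = 4
assert bernoulli_mle(x) == bernoulli_mle(y)
print(bernoulli_mle(x))  # 0.666..., a function of T alone
```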

Minimal Sufficient Statistics

If we have several sufficient statistics, then we try to identify the one which provides the maximum summarization of the data without losing the relevant information.

In some cases the dimension of a minimal sufficient statistic is the same as that of the sample itself. For example, for the $Cauchy \: Distribution$ the sufficient order statistic $T(X) = (X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)})$ is also minimal sufficient. In the sense of uniqueness, a minimal sufficient statistic is also not unique: if $T$ is a minimal sufficient statistic and $M = f(T)$ for some one-to-one function $f$, then $M$ is also a minimal sufficient statistic.

Suppose we have observations from a pdf or pmf $f(x \, ; \, \theta)$. A statistic $T$ is minimal sufficient if, for any two points $x$ and $y$ in the sample space, the ratio $\frac{f(x \, ; \, \theta)}{f(y \, ; \, \theta)}$ is free of $\theta$ if and only if $T(x) = T(y)$.

As an example, let $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$ be a random sample from the double exponential distribution

$f(x \, ; \, \theta) = \frac{1}{2}\exp\{-|x - \theta|\} \, , \quad -\infty < x < \infty$

The joint density of the sample is given by

`f(x \, ; \, \theta) = \frac{1}{2^{n}} \exp\left\{-\sum_{i=1}^{n} |x_{i} - \theta|\right\}`

By the factorization theorem, the order statistic $T(X) = (X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)})$ is sufficient for $\theta$. To check minimality, consider the ratio

`\frac{f(x \ ; \ \theta)}{f(y \ ; \ \theta)} = \frac{\frac{1}{2^n} \exp\left(-\sum |x_i - \theta|\right)}{\frac{1}{2^n} \exp\left(-\sum |y_i - \theta|\right)}`

This ratio is free of $\theta$ if and only if $\sum |x_{i} - \theta| = \sum |y_{i} - \theta|$ for every $\theta$, which happens exactly when the two samples have the same order statistics, $(x_{(1)}, \ldots, x_{(n)}) = (y_{(1)}, \ldots, y_{(n)})$. Hence the order statistic $T(X) = (X_{(1)}, \ldots, X_{(n)})$ is minimal sufficient for $\theta$.

Now consider observations from $B(1, \theta)$, with $P(x \, ; \, \theta) = \theta^x (1-\theta)^{1-x}, \; x \in \{0, 1\}$. Here the ratio $\frac{f(x \, ; \, \theta)}{f(y \, ; \, \theta)} = \theta^{\sum x_{i} - \sum y_{i}} (1-\theta)^{\sum y_{i} - \sum x_{i}}$ is free of $\theta$ if and only if $\sum x_{i} = \sum y_{i}$, so $T(X) = \sum_{i=1}^{n} X_{i}$ is minimal sufficient for $\theta$.
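
The following Python sketch (not from the original post; the sample values are illustrative) checks the likelihood-ratio criterion numerically. For Bernoulli samples with equal sums the ratio is constant in $\theta$, while for the double exponential two samples with equal sums but different order statistics still give a ratio that varies with $\theta$:

```python
# Numerical check (illustrative samples) of the minimal-sufficiency criterion.
import math

def bernoulli_ratio(x, y, theta):
    """f(x;theta)/f(y;theta) for i.i.d. Bernoulli(theta) samples."""
    num = theta ** sum(x) * (1 - theta) ** (len(x) - sum(x))
    den = theta ** sum(y) * (1 - theta) ** (len(y) - sum(y))
    return num / den

def laplace_ratio(x, y, theta):
    """f(x;theta)/f(y;theta) for i.i.d. double exponential (Laplace) samples."""
    num = math.exp(-sum(abs(xi - theta) for xi in x))
    den = math.exp(-sum(abs(yi - theta) for yi in y))
    return num / den

x_b, y_b = [1, 0, 1], [0, 1, 1]              # equal sums, different arrangement
print([round(bernoulli_ratio(x_b, y_b, t), 4) for t in (0.2, 0.5, 0.8)])   # all 1.0

x_l, y_l = [0.0, 1.0, 5.0], [1.0, 2.0, 3.0]  # equal sums (6.0), different order statistics
print([round(laplace_ratio(x_l, y_l, t), 4) for t in (0.5, 2.0, 4.0)])     # varies with theta
```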

Reconstruction of the Original Sample by a Sufficient Statistic $T$

By the sufficiency of a statistic $T$, we mean the potential in the statistic $T = t$ for reconstructing the original sample. Basically, one draws a random sample $Y_1, Y_2, \dots, Y_n$ from the conditional distribution of $\mathbf{X}|t$. The sample $Y_1, Y_2, \dots, Y_n$ is easily obtained with the help of a random number table or a similar mechanism, since the conditional distribution of $\mathbf{X}|t$ is independent of $\theta$. The conditional distribution of $\mathbf{Y}|t$ is the same as the conditional distribution of $\mathbf{X}|t$; therefore, their unconditional distributions are also the same.

Consider the discrete case and assume that $T$ is sufficient for $\theta$. Given a value $t$ of $T$, we can define the conditional distribution of $\mathbf{X}|T = t$ on the restricted sample space $A_t$. We then generate a pseudo sample $Y_1, Y_2, \dots, Y_n$ from $A_t$ by a random number generator. We may note that the probability distribution \[ P(\mathbf{Y} = \mathbf{x} \mid T(\mathbf{X}) = T(\mathbf{x})) = P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = T(\mathbf{x})) \] is defined on the set $A_{T(\mathbf{x})}$. Now, the events $\{\mathbf{Y} = \mathbf{x}\}$ and $\{\mathbf{X} = \mathbf{x}\}$ are subsets of $\{T(\mathbf{X}) = T(\mathbf{x})\}$. It can easily be seen that the unconditional distributions of $\mathbf{Y}$ and $\mathbf{X}$ are the same, i.e., \[ P_\theta(\mathbf{Y} = \mathbf{x}) = P_\theta(\mathbf{X} = \mathbf{x}) \quad \forall \mathbf{x} \text{ and } \forall \theta. \]

Therefore, the new sample $Y_1, Y_2, \dots, Y_n$ and the original data $X_1, X_2, \dots, X_n$ carry an equal amount of probabilistic information about $\theta$, since $Y_1, Y_2, \dots, Y_n$ can be regarded as another sample from the same population from which the original data $X_1, X_2, \dots, X_n$ were drawn. Therefore, we can ``recover the data'' if we discard $X_1, X_2, \dots, X_n$ and retain $T$. It is in this sense that we mean $T$ is ``sufficient.'' Note that there may be several statistics which are sufficient in the above sense.

We have noted that the reconstruction of the original sample $\mathbf{X}$ by getting observations on $\mathbf{Y}$ depends on $T$ as well as on some random mechanism. The estimator $T(\mathbf{Y})$ therefore depends not only on $T$ but also on the drawing mechanism of $\mathbf{Y}$, so the resulting estimator of $\theta$ is a randomized estimator which, for each fixed value of $\mathbf{X} = \mathbf{x}$, is a random variable $T(\mathbf{Y})$ with a known distribution. We denote this randomized estimator $T(\mathbf{Y})$ by $T_\mathbf{x}$. The risk of the randomized estimator $T(\mathbf{Y})$ is therefore given by \[ R(\theta, T) = \mathbb{E}_\theta^{\mathbf{X}} \, \mathbb{E}^{\mathbf{Y}} \, L(\theta, T_\mathbf{x}). \]
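
A minimal Python sketch of this reconstruction idea (not from the original text; it assumes a Bernoulli sample, for which the conditional distribution of $\mathbf{X}$ given $T = t$ is uniform over the arrangements of $t$ ones among $n$ positions, so no knowledge of $\theta$ is needed):

```python
# Illustration (assumed Bernoulli model): regenerate a pseudo sample Y from T alone.
import random

def reconstruct_sample(t, n, rng=random):
    """Draw Y1,...,Yn from P(X = . | T = t): place t ones at random positions."""
    y = [1] * t + [0] * (n - t)
    rng.shuffle(y)
    return y

original = [1, 0, 0, 1, 1, 0, 1, 0]      # suppose this was the data; T = 4, n = 8
t, n = sum(original), len(original)
pseudo = reconstruct_sample(t, n)
print(pseudo, sum(pseudo))               # same T; unconditionally distributed like the original
```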
