
Interval Estimation and Confidence Limits - statsclick

Confidence Interval Estimation, Behrens-Fisher Problem, Confidence Intervals in chi-square distribution, normal distribution

 Introduction 

A point estimate gives only a single value, which may be close to the true value of the parameter but need not equal it; indeed, under a continuous distribution the point probabilities are zero. Under these and other practical considerations, it is sometimes reasonable to report an interval, computed as a function of the sample observations by combining the estimated value and its standard error, that contains the true value of $\theta$ with high probability or credibility. This interval is formally known as a confidence interval, and the probability is called the confidence level.
For a given confidence level, the shorter the interval, the greater the precision with which it locates the true value of the parameter.
For example, when $X_{i} \sim N(\mu, 1)$, we are interested here not in estimating $\mu$ by the point estimator $\bar{x}$ but by an interval estimator such as $\left( \bar{x}-1, \bar{x}+1 \right)$.
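As a quick check of how reliable such an interval estimator is, its coverage probability can be computed exactly: since $\sqrt{n}(\bar{x}-\mu) \sim N(0,1)$, the event $\bar{x}-1 < \mu < \bar{x}+1$ has probability $2\Phi(\sqrt{n}) - 1$, whatever the value of $\mu$. The sketch below uses only Python's standard library; the function name `coverage_unit_halfwidth` is an illustrative choice, not something from the text.

```python
from math import sqrt
from statistics import NormalDist


def coverage_unit_halfwidth(n: int) -> float:
    """Exact coverage of (xbar - 1, xbar + 1) for mu when X_i ~ N(mu, 1)."""
    # sqrt(n) * (xbar - mu) ~ N(0, 1), so |xbar - mu| < 1  <=>  |Z| < sqrt(n)
    return 2 * NormalDist().cdf(sqrt(n)) - 1


# Coverage grows with n because xbar concentrates around mu
for n in (1, 4, 9):
    print(n, round(coverage_unit_halfwidth(n), 4))
```

The interval has fixed length 2, so a larger sample buys a higher confidence level rather than a shorter interval.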


Basic Notations 

Random Set

Let the data $\underline{x} \sim f(\underline{x} ; \theta)$. The family of subsets $S(\underline{x})$ of $\Theta$ is called a family of random sets, where $S(\underline{x})$ is a function of the data $X_{1}, X_{2}, \dots, X_{n}$ and not of $\theta$.

Random Interval 

If $\Theta \subseteq \mathbb{R}$, $S(\underline{X}) = \left(\underline{\theta}(\underline{x}), \overline{\theta}(\underline{x}) \right)$ is called a family of random intervals, where $\underline{\theta}(\underline{x})$ and $\overline{\theta}(\underline{x})$ are the lower and upper bounds for $\theta$, respectively.

Family of $(1-\alpha)$ - level confidence interval

A family of random sets $S(X)$ of $\Theta \subseteq \mathbb{R}^{k}$ for the parameter $\theta$ is called a family of $(1-\alpha)$-level confidence sets if

`P \left(S(X) \ni \theta \right) \geq 1-\alpha \quad \text{for all } \theta \in \Theta`

This condition states that the confidence set $S(X)$ covers the unknown $\theta$ with probability not less than $(1-\alpha)$.

Level of Significance and Confidence Coefficient 

If $\inf_{\theta \in \Theta} P \left( S(X) \ni \theta \right) \geq 1-\alpha$, the family of such confidence sets $S(X)$ is called a family of confidence sets for $\theta$ at confidence level $(1-\alpha)$, corresponding to level of significance $\alpha$, and the quantity on the left-hand side is the confidence coefficient of $S(X)$. The confidence coefficient is the highest confidence level that $S(X)$ attains.

When $\Theta \subseteq \mathbb{R}$, the confidence set $S(X) = (-\infty , \overline{\theta}(x))$ is called an upper confidence bound; $S(X) = (\underline{\theta}(x), \infty)$ is called a lower confidence bound; and $S(X) = \left( \underline{\theta}(x), \overline{\theta}(x) \right)$ is called a two-sided confidence interval.

Expected length of Interval

Given an interval $S(X) = \left( \underline{\theta}(x), \overline{\theta}(x) \right)$ for the parameter $\theta$, the quantity
`E_{\theta} \left[ \overline{\theta}(X) - \underline{\theta}(X) \right]`
is called its expected length.

Construction of Confidence Intervals through Pivot

Pivot

A function $T(X,\theta)$ of the data $X_{1}, X_{2}, \dots, X_{n}$ and $\theta$ is called a pivotal quantity if its distribution does not depend on $\theta$ whenever $X$ is distributed as $F_{\theta}, \theta \in \Theta$.
Note that a pivot is not an ancillary statistic, because it depends on $\theta$.

We give some pivotal quantities for different distributions.

| Distribution | Parameter | Condition | Pivotal quantity \(Q\) | Distribution of \(Q\) |
| --- | --- | --- | --- | --- |
| Normal \(N(\mu, \sigma^2)\) | Mean \(\mu\) | \(\sigma^2\) known | \(Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\) | \(N(0, 1)\) |
| Normal \(N(\mu, \sigma^2)\) | Mean \(\mu\) | \(\sigma^2\) unknown | \(T = \frac{\sqrt{n}(\bar{X} - \mu)}{S}\) | \(t_{n-1}\) |
| Normal \(N(\mu, \sigma^2)\) | Variance \(\sigma^2\) | \(\mu\) unknown | \(\chi^2 = \frac{(n-1)S^2}{\sigma^2}\) | \(\chi^2_{n-1}\) |
| Two-sample normal | Difference \(\mu_1 - \mu_2\) | \(\sigma_1^2, \sigma_2^2\) known | \(Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\) | \(N(0, 1)\) |
| Two-sample normal | Difference \(\mu_1 - \mu_2\) | \(\sigma_1=\sigma_2\) unknown | \(T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\) | \(t_{n_1+n_2-2}\) |
| Two-sample normal | Ratio \(\frac{\sigma_1^2}{\sigma_2^2}\) | Means unknown | \(F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}\) | \(F_{n_1-1, n_2-1}\) |
| Exponential \(Exp(\lambda)\) | Rate \(\lambda\) | -- | \(Q = 2\lambda \sum X_i\) | \(\chi^2_{2n}\) |
| Uniform \(U(0, \theta)\) | Scale \(\theta\) | Standard | \(Q = \frac{X_{(n)}}{\theta}\) | \(Beta(n, 1)\) |
| Location family | Location \(\theta\) | Density \(f(x-\theta)\) | \(Q = X_{(n)} - \theta\) | Parameter-free |
| Scale family | Scale \(\theta\) | Density \(\frac{1}{\theta}f(\frac{x}{\theta})\) | \(Q = \frac{X_{(n)}}{\theta}\) | Parameter-free |
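The defining property of a pivot, that its distribution is the same for every value of the parameter, can be checked empirically. The sketch below simulates the exponential pivot \(Q = 2\lambda \sum X_i\) for several rates and compares the sample mean of \(Q\) with \(2n\), the mean of a \(\chi^2_{2n}\) variable. The function name, the seeds, and the chosen rates are illustrative assumptions.

```python
import random
from statistics import fmean


def simulate_pivot(rate: float, n: int, reps: int, seed: int) -> list[float]:
    """Draw `reps` values of Q = 2 * rate * sum(X_i) with X_i ~ Exp(rate)."""
    rng = random.Random(seed)
    return [
        2 * rate * sum(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]


n = 5
means = {}
# A different seed per rate, so agreement is not an artifact of shared draws
for seed, rate in enumerate((0.5, 2.0, 10.0)):
    means[rate] = fmean(simulate_pivot(rate, n, reps=20000, seed=seed))

# Each mean should be close to 2n = 10, regardless of the rate
print(means)
```

Comparing means is only a partial check; comparing several quantiles of \(Q\) across rates would be a stricter test of the same idea.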

Confidence interval for $\mu$ in a Normal Population


1. Confidence Interval for \(\mu\) (When \(\sigma^2\) is Known)

Consider a random sample \(X_1, X_2, \dots, X_n\) drawn from a normal population \(N(\mu, \sigma^2)\), where the variance \(\sigma^2\) is known.

The pivotal quantity used is:

\[ T(\mathbf{X}, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1) \]

Using the property of the Standard Normal distribution:

\[ P\left(-z_{\alpha/2} \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le z_{\alpha/2}\right) = 1 - \alpha \]

On rearranging the terms to isolate \(\mu\), we get:

\[ P\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]

Result: The \((1-\alpha)\)-level confidence interval for \(\mu\) is:

\[ \left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \quad \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) \]
The length of this confidence interval is \( 2z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \).
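The known-variance interval is straightforward to compute. The sketch below uses `statistics.NormalDist` from Python's standard library to obtain \(z_{\alpha/2}\); the function name `z_interval` and the numbers in the example are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, alpha: float = 0.05):
    """(1 - alpha)-level interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary data: xbar = 50, sigma = 10, n = 25
lo, hi = z_interval(xbar=50.0, sigma=10.0, n=25)
print(round(lo, 3), round(hi, 3))
print(round(hi - lo, 3))  # equals 2 * z * sigma / sqrt(n)
```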

Sample Size Determination:
The sample size \(n_0\) required for estimating the population mean \(\mu\) correct to within \(d\) units with probability \((1-\alpha)\) is:

\[ n_0 = \left( \frac{z_{\alpha/2} \sigma}{d} \right)^2 \]
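A small sketch of the sample-size formula follows. The formula generally returns a non-integer, so the code rounds up to the next whole observation; that rounding step and the function name `required_sample_size` are assumptions added here, as are the example values.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma: float, d: float, alpha: float = 0.05) -> int:
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / d) ** 2)  # round up: n must be an integer


# Hypothetical inputs: sigma = 10, margin d = 2, 95% confidence
n0 = required_sample_size(sigma=10.0, d=2.0)
print(n0)
```

Halving \(d\) roughly quadruples \(n_0\), since the margin enters the formula squared.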

 2. Confidence Interval for \(\mu\) (When \(\sigma^2\) is Unknown)

Case A: Small Sample Size (\(n \le 30\))

When \(\mu\) and \(\sigma^2\) are both unknown and the sample comes from a normal population, we use the Student's \(t\)-statistic. The pivotal quantity is:

\[ T(\mathbf{X}, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \sim t_{n-1} \]

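Here \(S_{n-1}^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2\) is the sample variance with divisor \(n-1\). Using the \(t\)-distribution with \(n-1\) degrees of freedom: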

\[ P\left( -t_{n-1, \alpha/2} \le \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \le t_{n-1, \alpha/2} \right) = 1 - \alpha \]

On simplification, we get the confidence interval:

\[ P\left( \bar{X} - t_{n-1, \alpha/2}\frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1, \alpha/2}\frac{S_{n-1}}{\sqrt{n}} \right) = 1 - \alpha \]
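The \(t\)-interval can be sketched in the same way. Python's standard library has no \(t\)-quantile function, so the critical value is passed in as an argument read from a \(t\)-table; the value 2.262 for \(t_{9, 0.025}\), the sample, and the function name are assumptions made for this illustration.

```python
from math import sqrt
from statistics import fmean, stdev


def t_interval(data: list[float], t_crit: float):
    """Interval xbar +/- t_crit * S_{n-1} / sqrt(n) for mu, sigma unknown."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Made-up sample of n = 10 measurements
sample = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.4, 5.0, 4.6]
T_9_0025 = 2.262  # table value of t_{9, 0.025}, assumed here
lo, hi = t_interval(sample, T_9_0025)
print(round(lo, 4), round(hi, 4))
```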

Case B: Large Sample Size (\(n > 30\))

If the sample size is large, normality of the population is not strictly required: by the Central Limit Theorem, the distribution of \(\sqrt{n}(\bar{X} - \mu)/\sigma\) is approximately standard normal, provided the population variance is finite.

Even if \(\sigma\) is unknown, for large \(n\), we can replace \(\sigma\) with the sample standard deviation \(S_{n-1}\) and use the \(Z\)-distribution instead of \(t\).

The approximate \((1-\alpha)\)-level confidence interval for \(\mu\) (large \(n\)) follows from:

\[ P\left( \bar{X} - z_{\alpha/2}\frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S_{n-1}}{\sqrt{n}} \right) \approx 1 - \alpha \]
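How close the large-sample interval comes to its nominal level can be examined by simulation. The sketch below draws samples of size 50 from a skewed, non-normal population (exponential with mean 1) and records how often the interval built with \(S_{n-1}\) and \(z_{\alpha/2}\) covers the true mean. The population, sample size, replication count, and seed are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def large_sample_coverage(n=50, reps=5000, alpha=0.05, seed=1) -> float:
    """Estimated coverage of xbar +/- z * S / sqrt(n) for Exp(1) data."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = 1.0  # true mean of an exponential with rate 1
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        half_width = z * stdev(x) / sqrt(n)
        hits += abs(fmean(x) - mu) <= half_width
    return hits / reps


coverage = large_sample_coverage()
print(coverage)  # expected to be near, but not exactly, 0.95
```

For skewed populations the estimated coverage typically falls somewhat short of the nominal level at moderate \(n\), which is why the interval is described as approximate.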

3. Confidence Interval for \(\sigma^2\) in a Normal Population

Further, the \((1-\alpha)\)-level confidence interval for \(\sigma^2\) is constructed by considering the pivot:

` T(\mathbf{X}, \sigma^2) = \frac{(n-1)S_{n-1}^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \sim \chi_{n-1}^2`

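The range of \(\chi^2\) is \((0, \infty)\). The distribution is not symmetric: it is positively skewed with a long upper tail, although the density curve becomes approximately symmetric for large degrees of freedom. The interval below is therefore built from two different critical values rather than from \(\pm\) a single one.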

The probability statement is:

` P\left( \chi_{n-1, 1-\alpha/2}^2 \le \frac{(n-1)S_{n-1}^2}{\sigma^2} \le \chi_{n-1, \alpha/2}^2 \right) = 1 - \alpha `

Rearranging the terms to isolate \(\sigma^2\):

` P\left( \frac{(n-1)S_{n-1}^2}{\chi_{n-1, \alpha/2}^2} \le \sigma^2 \le \frac{(n-1)S_{n-1}^2}{\chi_{n-1, 1-\alpha/2}^2} \right) = 1 - \alpha `

Consequently, the \((1-\alpha)\)-level confidence interval for the standard deviation \(\sigma\) is:

` P\left( \sqrt{\frac{(n-1)S_{n-1}^2}{\chi_{n-1, \alpha/2}^2}} \le \sigma \le \sqrt{\frac{(n-1)S_{n-1}^2}{\chi_{n-1, 1-\alpha/2}^2}} \right) = 1 - \alpha `
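A minimal sketch of the variance and standard-deviation intervals follows. The chi-square critical values are passed in from a table because Python's standard library does not provide chi-square quantiles; the values 19.023 and 2.700 for 9 degrees of freedom, the sample variance, and the function name are assumptions for illustration.

```python
from math import sqrt


def variance_interval(s2: float, n: int, chi2_upper: float, chi2_lower: float):
    """Interval for sigma^2 from the pivot (n - 1) * S^2 / sigma^2.

    chi2_upper is chi^2_{n-1, alpha/2} (the larger critical value) and
    chi2_lower is chi^2_{n-1, 1 - alpha/2} (the smaller one).
    """
    ss = (n - 1) * s2  # sum of squared deviations about the mean
    return ss / chi2_upper, ss / chi2_lower


# Table values for 9 degrees of freedom and alpha = 0.05, assumed here
CHI2_9_UPPER = 19.023
CHI2_9_LOWER = 2.700

# Hypothetical sample: n = 10 with sample variance 4.0
var_lo, var_hi = variance_interval(4.0, 10, CHI2_9_UPPER, CHI2_9_LOWER)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)  # interval for sigma
print(round(var_lo, 3), round(var_hi, 3))
print(round(sd_lo, 3), round(sd_hi, 3))
```

Note that the larger critical value produces the lower limit, and that the interval is not centred on the sample variance.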

4. Simultaneous Confidence Interval for \((\mu, \sigma^2)\)

Consider a random sample \(X_1, X_2, \dots, X_n\) drawn from \(N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are both unknown. The quantities:

`T_1(\mathbf{X}, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \sim t_{n-1}`

`T_2(\mathbf{X}, \sigma^2) = \frac{(n-1)S_{n-1}^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \sim \chi_{n-1}^2`

are considered as pivotal quantities. For events \(A\) and \(B\), Boole's inequality is given by:

`P(AB) \ge 1 - P(\bar{A}) - P(\bar{B})`

Using Boole's inequality, we can construct a \((1 - \alpha_1 - \alpha_2)\)-level simultaneous confidence interval for \((\mu, \sigma^2)\):

`P\left( -t_{n-1, \alpha_1/2} < \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} < t_{n-1, \alpha_1/2}, \quad \chi_{n-1, 1-\alpha_2/2}^2 < \frac{\sum (X_i - \bar{X})^2}{\sigma^2} < \chi_{n-1, \alpha_2/2}^2 \right)`

`\ge 1 - P\left( \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \le -t_{n-1, \alpha_1/2} \text{ or } \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \ge t_{n-1, \alpha_1/2} \right) - P\left( \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \le \chi_{n-1, 1-\alpha_2/2}^2 \text{ or } \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \ge \chi_{n-1, \alpha_2/2}^2 \right)`

`= 1 - \alpha_1 - \alpha_2`

The $(1-\alpha_{1} - \alpha_{2})$-level simultaneous confidence interval for $(\mu, \sigma^{2})$ is given by

`\left(\bar{X} - t_{n-1, \alpha_1/2} \frac{S_{n-1}}{\sqrt{n}}, \bar{X} + t_{n-1, \alpha_1/2} \frac{S_{n-1}}{\sqrt{n}}\right) \times \left(\frac{(n-1)S_{n-1}^2}{\chi^2_{n-1, \alpha_2/2}}, \frac{(n-1)S_{n-1}^2}{\chi^2_{n-1, 1-(\alpha_2/2)}}\right)`
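The lower bound \(1 - \alpha_1 - \alpha_2\) on the joint coverage can be checked by simulation. The sketch below takes \(\alpha_1 = \alpha_2 = 0.05\) and \(n = 10\), with critical values read from tables (2.262 for \(t_{9, 0.025}\), 2.700 and 19.023 for the chi-square points with 9 degrees of freedom). These values, the true parameters, and the seed are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, variance

# Table values for 9 degrees of freedom, assumed here
T_CRIT = 2.262       # t_{9, 0.025}
CHI2_LOWER = 2.700   # chi^2_{9, 0.975}
CHI2_UPPER = 19.023  # chi^2_{9, 0.025}


def joint_coverage(mu=3.0, sigma=2.0, n=10, reps=10000, seed=2) -> float:
    """Fraction of samples whose rectangle covers both mu and sigma^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(x)
        s2 = variance(x, xbar)  # divisor n - 1
        mu_ok = abs(xbar - mu) < T_CRIT * sqrt(s2 / n)
        ss = (n - 1) * s2
        var_ok = ss / CHI2_UPPER < sigma ** 2 < ss / CHI2_LOWER
        hits += mu_ok and var_ok
    return hits / reps


coverage = joint_coverage()
print(coverage)  # the bound guarantees at least 0.90, up to simulation error
```

The bound is conservative: the joint coverage lies between \(1 - \alpha_1 - \alpha_2\) and the smaller of the two individual levels.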

Confidence Interval for $\mu_1 - \mu_2$ in Two Normal Populations

Consider two independent samples of sizes $n_1$ and $n_2$ from normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. We will discuss cases for constructing a $(1 - \alpha)$-level confidence interval for the difference of means $(\mu_1 - \mu_2)$.

Case (i): Known and Equal Variances

Assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$, where $\sigma^2$ is known. Let the means of the samples be given by $\bar{X}_1$ and $\bar{X}_2$.

We consider the pivot quantity:

`T(\mathbf{X}_1, \mathbf{X}_2, \mu_1, \mu_2) = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0, 1) `

Based on this pivot, we can construct a $(1 - \alpha)$-level confidence interval for $(\mu_1 - \mu_2)$ by choosing $a$ and $b$ such that:

`P\{a < T < b\} = 1 - \alpha`

Let the probability $\alpha$ be equally distributed in the tails of the distribution of $T$:

` P\{T < a\} = \frac{\alpha}{2} = P\{T > b\}`

Since the distribution of $T$ is symmetric about zero (Standard Normal):

` b = z_{\alpha/2}, \quad a = -z_{\alpha/2}`

Result: Therefore, the $(1 - \alpha)$-level confidence interval for $\mu_1 - \mu_2$ is given by:
`\left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right)`

Case (ii): Known but Different Variances

However, if $\sigma_1^2$ and $\sigma_2^2$ are known but different, we use the following pivot:

` T(\mathbf{X}_1, \mathbf{X}_2, \mu_1, \mu_2) = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1) `

Result: Therefore, the required $(1 - \alpha)$-level confidence interval for $\mu_1 - \mu_2$ is given by:
` \left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \quad (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right)`
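The known-variance interval for the difference of means can be sketched as below. Setting both variances equal reproduces Case (i). The function name and the summary figures in the example are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """(1 - alpha)-level interval for mu1 - mu2 with known variances."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


# Hypothetical summaries: means 10 and 8, variances 4 and 9, sizes 16 and 36
lo, hi = two_sample_z_interval(10.0, 8.0, 4.0, 9.0, 16, 36)
print(round(lo, 4), round(hi, 4))
```

If the resulting interval excludes zero, the data are inconsistent with \(\mu_1 = \mu_2\) at level \(\alpha\).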

Case (iii): When $\sigma^{2} = \sigma^{2}_{1}=\sigma^{2}_{2}$ is not known

The independence of $(n_1-1)S_{1}^{2}/\sigma^{2} \sim \chi^{2}_{n_1-1}$ and $(n_2-1)S_{2}^{2}/\sigma^{2} \sim \chi^{2}_{n_2-1}$ gives
`\frac{(n_{1} + n_{2} - 2)}{\sigma^{2}} S^{2}_{p} = \frac{(n_{1} - 1)}{\sigma^{2}}S^{2}_{1} + \frac{(n_{2} - 1)}{\sigma^{2}}S^{2}_{2} \sim \chi^{2}_{n_{1} + n_{2} - 2}`

where $S_1^2$ and $S_2^2$ are the sample variances with divisors $(n_1 - 1)$ and $(n_2 - 1)$, respectively, and $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is called the pooled variance. The pivotal quantity is defined by dividing the standard normal pivot of Case (i) by the square root of this chi-square variable over its degrees of freedom:

`T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \Bigg/ \sqrt{\frac{1}{n_1 + n_2 - 2}\left(\frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2}\right)}`

`= \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}`

`= \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}`

Note that the statistics in numerator and denominator are independent. Thus, the pivot follows $t$-distribution with $(n_1 + n_2 - 2)$ degrees of freedom. Based on this pivot, we can now construct $(1 - \alpha)$-level confidence interval for $(\mu_1 - \mu_2)$ by choosing $a$ and $b$ such that

`P\{a < T < b\} = 1 - \alpha`

If the probability $\alpha$ is distributed in the tails of the distribution of $T$ equally, we choose $a$ and $b$ so that

`P\{T < a\} = \frac{\alpha}{2} = P\{T > b\}`

Since the distribution of $T$ is symmetric about zero, we have

`b = t_{n_1 + n_2 - 2, \alpha/2}, \ a = -t_{n_1 + n_2 - 2, \alpha/2}`

Thus, the required confidence interval is given by

` P \left[ (\bar{X}_1 - \bar{X}_2) - t_{n_1 + n_2 - 2, \alpha/2} S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le (\mu_1 - \mu_2) \le (\bar{X}_1 - \bar{X}_2) + t_{n_1 + n_2 - 2, \alpha/2} S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right] = 1 - \alpha`
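The pooled-variance interval can be sketched as follows. As before, the \(t\) critical value comes from a table because the standard library has no \(t\)-quantile function; the value 2.160 for \(t_{13, 0.025}\), the two samples, and the function name are assumptions for illustration.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_interval(x1: list[float], x2: list[float], t_crit: float):
    """Interval for mu1 - mu2 assuming a common unknown variance."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = fmean(x1) - fmean(x2)
    return diff - half_width, diff + half_width


# Made-up samples with n1 = 8 and n2 = 7, so 13 degrees of freedom
group1 = [12, 14, 11, 13, 15, 12, 14, 13]
group2 = [10, 9, 11, 12, 10, 8, 10]
T_13_0025 = 2.160  # table value of t_{13, 0.025}, assumed here
lo, hi = pooled_t_interval(group1, group2, T_13_0025)
print(round(lo, 4), round(hi, 4))
```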

Case (iv): Ratio of Variances in Two Normal Populations

Consider the pivotal quantity 

`T\left(X,\sigma^{2}_{1}, \sigma^{2}_{2}\right) = \frac{S_{1}^{2}/ \sigma_{1}^{2}}{S_{2}^{2}/ \sigma_{2}^{2}} \sim F_{n_{1} - 1, n_{2}-1}`

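for constructing a $(1-\alpha)$-level confidence interval for the ratio of the variances $\sigma_{1}^{2} / \sigma_{2}^{2}$ when $\mu_{1}, \mu_{2}$ are not known. With tail probabilities $\alpha_1$ and $\alpha_2$, the probability statement and its rearrangement are: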

`P\left( F_{n_1 - 1, n_2 - 1, 1 - \alpha_1} \leq \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \leq F_{n_1 - 1, n_2 - 1, \alpha_2} \right) = 1 - \alpha_1 - \alpha_2`

`P\left( \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{n_1 - 1, n_2 - 1, \alpha_2}} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{n_1 - 1, n_2 - 1, 1 - \alpha_1}} \right) = 1 - \alpha_1 - \alpha_2`

Here $\alpha = \alpha_1 + \alpha_2$, so the interval has level $(1-\alpha)$; the equal-tailed choice is $\alpha_1 = \alpha_2 = \alpha/2$.
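A minimal sketch of the variance-ratio interval is given below. The \(F\) critical values are supplied from a table; the example uses equal degrees of freedom (9, 9), where the lower point is the reciprocal of the upper point, and takes 4.03 as an approximate table value of the upper 2.5% point. That value, the sample variances, and the function name are assumptions for illustration.

```python
def variance_ratio_interval(s1_sq: float, s2_sq: float,
                            f_upper: float, f_lower: float):
    """Interval for sigma1^2 / sigma2^2 from the F pivot.

    f_upper is the upper-tail critical value F_{n1-1, n2-1, alpha_2} and
    f_lower is the lower-tail value F_{n1-1, n2-1, 1 - alpha_1}.
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower


# Approximate table value for (9, 9) degrees of freedom, assumed here
F_UPPER = 4.03         # upper 2.5% point of F_{9, 9}
F_LOWER = 1 / F_UPPER  # lower 2.5% point; reciprocal because the d.f. are equal

# Hypothetical sample variances from two samples of size 10
lo, hi = variance_ratio_interval(6.0, 3.0, F_UPPER, F_LOWER)
print(round(lo, 4), round(hi, 4))
```

With unequal degrees of freedom the lower point is the reciprocal of the upper point of the \(F\) distribution with the degrees of freedom swapped, so both values would need to be looked up separately.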
