A family of random sets $S(\underline{X}) \subseteq \Theta \subseteq \mathbb{R}^{k}$ is called a family of $(1-\alpha)$-level confidence sets for the parameter $\underline{\theta}$ if
\[ P_{\underline{\theta}}\left( S(\underline{X}) \ni \underline{\theta} \right) \geq 1-\alpha \quad \text{for all } \underline{\theta} \in \Theta. \]
The above condition states that the confidence set $S(\underline{X})$ covers the unknown $\underline{\theta}$ with probability at least $(1-\alpha)$, whatever the true value of $\underline{\theta}$.
When $\Theta \subseteq \mathbb{R}$, the confidence set $S(X) = (-\infty, \overline{\theta}(X))$ is called an *upper confidence bound*, $S(X) = (\underline{\theta}(X), \infty)$ is called a *lower confidence bound*, and $S(X) = \left( \underline{\theta}(X), \overline{\theta}(X) \right)$ is called a *two-sided confidence interval*.
| Distribution | Parameter | Condition | Pivotal Quantity \(Q\) | Distribution of \(Q\) |
|---|---|---|---|---|
| Normal \(N(\mu, \sigma^2)\) | Mean \(\mu\) | \(\sigma^2\) Known | \(Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\) | \(N(0, 1)\) |
| Normal \(N(\mu, \sigma^2)\) | Mean \(\mu\) | \(\sigma^2\) Unknown | \(T = \frac{\sqrt{n}(\bar{X} - \mu)}{S}\) | \(t_{n-1}\) |
| Normal \(N(\mu, \sigma^2)\) | Variance \(\sigma^2\) | \(\mu\) Unknown | \(\chi^2 = \frac{(n-1)S^2}{\sigma^2}\) | \(\chi^2_{n-1}\) |
| 2-Sample Normal | Diff \(\mu_1 - \mu_2\) | \(\sigma_1^2, \sigma_2^2\) Known | \(Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\) | \(N(0, 1)\) |
| 2-Sample Normal | Diff \(\mu_1 - \mu_2\) | \(\sigma_1^2=\sigma_2^2\) Unknown | \(T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\) | \(t_{n_1+n_2-2}\) |
| 2-Sample Normal | Ratio \(\frac{\sigma_1^2}{\sigma_2^2}\) | Means Unknown | \(F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}\) | \(F_{n_1-1, n_2-1}\) |
| Exponential \(Exp(\lambda)\) | Rate \(\lambda\) | -- | \(Q = 2\lambda \sum X_i\) | \(\chi^2_{2n}\) |
| Uniform \(U(0, \theta)\) | Scale \(\theta\) | Standard | \(Q = \frac{X_{(n)}}{\theta}\) | \(Beta(n, 1)\) |
| Location Family | Loc \(\theta\) | \(f(x-\theta)\) | \(Q = X_{(n)} - \theta\) | Parameter Free |
| Scale Family | Scale \(\theta\) | \(\frac{1}{\theta}f(\frac{x}{\theta})\) | \(Q = \frac{X_{(n)}}{\theta}\) | Parameter Free |
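
As an illustration of turning one row of the table into an interval (this worked case is our own example, not part of the notes), take the Uniform row: $Q = X_{(n)}/\theta$ has CDF $q^n$ on $(0,1)$, so $P(\alpha^{1/n} \le Q \le 1) = 1-\alpha$, which rearranges to $X_{(n)} \le \theta \le X_{(n)}/\alpha^{1/n}$. The sketch below computes this interval and checks its coverage by simulation; the function names `uniform_theta_ci` and `coverage` are hypothetical.

```python
import random

def uniform_theta_ci(sample, alpha):
    """(1 - alpha) interval for theta in U(0, theta) from the pivot Q = X_(n) / theta.

    Q has CDF q**n on (0, 1), so P(alpha**(1/n) <= Q <= 1) = 1 - alpha,
    which rearranges to X_(n) <= theta <= X_(n) / alpha**(1/n).
    """
    n = len(sample)
    x_max = max(sample)
    return x_max, x_max / alpha ** (1.0 / n)

def coverage(theta, n, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of how often the interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        lo, hi = uniform_theta_ci(sample, alpha)
        hits += lo <= theta <= hi
    return hits / reps

print(coverage(theta=3.0, n=10, alpha=0.05))
```

Because the pivot's distribution does not involve $\theta$, the estimated coverage should sit near $1-\alpha$ for any choice of `theta`.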
Consider a random sample \(X_1, X_2, \dots, X_n\) drawn from a normal population \(N(\mu, \sigma^2)\), where the variance \(\sigma^2\) is known.
The pivotal quantity used is:
\[ T(\mathbf{X}, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1) \]
Using the symmetry of the standard normal distribution, with \(z_{\alpha/2}\) denoting its upper \(\alpha/2\) point:
\[ P\left(-z_{\alpha/2} \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le z_{\alpha/2}\right) = 1 - \alpha \]
On rearranging the terms to isolate \(\mu\), we get:
\[ P\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]
Result: The \((1-\alpha)\)-level confidence interval for \(\mu\) is:
\[ \left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \quad \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) \]
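
A minimal sketch of this known-variance interval, using only Python's standard library (`statistics.NormalDist().inv_cdf` supplies \(z_{\alpha/2}\)). The function name `z_interval` and the data are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) CI for mu when sigma is known: x_bar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical data, assumed to come from N(mu, 0.3^2)
data = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2]
print(z_interval(data, sigma=0.3, alpha=0.05))
```

The interval is centred at \(\bar{X}\) and its width depends only on \(\sigma\), \(n\) and \(\alpha\), not on the data.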
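Sample Size Determination:
The sample size \(n_0\) required for estimating the population mean \(\mu\) correct to within \(d\) units with probability \((1-\alpha)\), i.e. \(P(|\bar{X} - \mu| \le d) \ge 1-\alpha\), follows from requiring the half-width \(z_{\alpha/2}\sigma/\sqrt{n}\) to be at most \(d\):
\[ n_0 = \left\lceil \left( \frac{z_{\alpha/2}\,\sigma}{d} \right)^2 \right\rceil \]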
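
The sample-size rule can be sketched as follows, assuming \(\sigma\) is known. The function name `sample_size_for_mean` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, d, alpha=0.05):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return ceil((z * sigma / d) ** 2)

print(sample_size_for_mean(sigma=10.0, d=2.0, alpha=0.05))  # 97
```

Rounding up rather than to the nearest integer guarantees the half-width does not exceed \(d\).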
Case A: Small Sample Size (\(n \le 30\))
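When \(\mu\) and \(\sigma^2\) are both unknown and the sample comes from a normal population, we use Student's \(t\)-statistic, where \(S_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\) is the sample variance with divisor \(n-1\). Under normality the resulting interval is exact for every \(n\). The pivotal quantity is: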
\[ T(\mathbf{X}, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \sim t_{n-1} \]
Using the \(t\)-distribution with \(n-1\) degrees of freedom, which is symmetric about zero:
\[ P\left( -t_{n-1, \alpha/2} \le \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \le t_{n-1, \alpha/2} \right) = 1 - \alpha \]
Rearranging to isolate \(\mu\), we get the confidence interval:
\[ P\left( \bar{X} - t_{n-1, \alpha/2}\frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1, \alpha/2}\frac{S_{n-1}}{\sqrt{n}} \right) = 1 - \alpha \]
Case B: Large Sample Size (\(n > 30\))
If the sample size is large, the normality assumption on the population is not strictly required: by the Central Limit Theorem, the distribution of \(\sqrt{n}(\bar{X} - \mu)/\sigma\) is approximately standard normal.
Even if \(\sigma\) is unknown, for large \(n\) we can replace \(\sigma\) by the sample standard deviation \(S_{n-1}\) and use the \(z\) quantile instead of the \(t\) quantile; the resulting interval has coverage approximately \(1-\alpha\).
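
Both cases give an interval of the form \(\bar{X} \pm c\,S_{n-1}/\sqrt{n}\), with \(c = t_{n-1,\alpha/2}\) in Case A and \(c = z_{\alpha/2}\) in Case B. The sketch below assumes the \(t\) critical value is looked up in a table (or obtained elsewhere, e.g. `scipy.stats.t.ppf(1 - alpha/2, n - 1)`) and passed in; the function name `mean_interval` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_interval(sample, alpha=0.05, t_crit=None):
    """CI for mu with sigma unknown: x_bar -/+ c * S_{n-1} / sqrt(n).

    Case A (normal population): pass t_crit = t_{n-1, alpha/2} from a t-table.
    Case B (large n): leave t_crit as None and z_{alpha/2} is used instead.
    """
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)  # sample standard deviation with divisor n - 1, i.e. S_{n-1}
    c = t_crit if t_crit is not None else NormalDist().inv_cdf(1 - alpha / 2)
    half_width = c * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of size 10; t_{9, 0.025} = 2.262 from a t-table
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]
print(mean_interval(sample, t_crit=2.262))  # Case A
print(mean_interval(sample))                # Case B style (z instead of t)
```

For small \(n\) the \(z\) version is too narrow, since \(t_{n-1,\alpha/2} > z_{\alpha/2}\); the two agree closely once \(n\) is large.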
The \((1-\alpha)\)-level confidence interval for \(\mu\) (large \(n\)) is given, approximately, by:
\[ P\left( \bar{X} - z_{\alpha/2}\frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S_{n-1}}{\sqrt{n}} \right) \approx 1 - \alpha \]
Further, for a normal population with \(\mu\) unknown, the \((1-\alpha)\)-level confidence interval for \(\sigma^2\) is constructed by considering the pivot:
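\[ T(\mathbf{X}, \sigma^2) = \frac{(n-1)S_{n-1}^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \sim \chi_{n-1}^2 \]
The \(\chi^2\) distribution is supported on \((0, \infty)\) and is not symmetric: it is positively skewed with a long upper tail, becoming approximately symmetric only for large degrees of freedom. Hence the two cut-off points must be taken separately from the two tails; here \(\chi^2_{n-1, p}\) denotes the upper \(p\) point.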
The probability statement is:
\[ P\left( \chi_{n-1, 1-\alpha/2}^2 \le \frac{(n-1)S_{n-1}^2}{\sigma^2} \le \chi_{n-1, \alpha/2}^2 \right) = 1 - \alpha \]
Rearranging the terms to isolate \(\sigma^2\):
\[ P\left( \frac{(n-1)S_{n-1}^2}{\chi_{n-1, \alpha/2}^2} \le \sigma^2 \le \frac{(n-1)S_{n-1}^2}{\chi_{n-1, 1-\alpha/2}^2} \right) = 1 - \alpha \]
Consequently, the \((1-\alpha)\)-level confidence interval for the standard deviation \(\sigma\) is:
\[ P\left( \sqrt{\frac{(n-1)S_{n-1}^2}{\chi_{n-1, \alpha/2}^2}} \le \sigma \le \sqrt{\frac{(n-1)S_{n-1}^2}{\chi_{n-1, 1-\alpha/2}^2}} \right) = 1 - \alpha \]

Consider a random sample \(X_1, X_2, \dots, X_n\) drawn from \(N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are both unknown. The quantities:
\[ T_1(\mathbf{X}, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{S_{n-1}} \sim t_{n-1}, \qquad T_2(\mathbf{X}, \sigma^2) = \frac{(n-1)S_{n-1}^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \sim \chi_{n-1}^2 \]
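serve as pivotal quantities. For events \(A\) and \(B\), Boole's inequality gives:
\[ P(AB) \ge 1 - P(\bar{A}) - P(\bar{B}) \]
Let \(A\) be the event that the \((1-\alpha_1)\)-level \(t\)-interval covers \(\mu\) and \(B\) the event that the \((1-\alpha_2)\)-level \(\chi^2\)-interval covers \(\sigma^2\), so that \(P(\bar{A}) = \alpha_1\) and \(P(\bar{B}) = \alpha_2\). By Boole's inequality, \(P(AB) \ge 1 - \alpha_1 - \alpha_2\). Hence a \((1-\alpha_1-\alpha_2)\)-level simultaneous confidence region for \((\mu, \sigma^2)\) is given by
\[ \left\{ (\mu, \sigma^2) : \bar{X} - t_{n-1, \alpha_1/2}\frac{S_{n-1}}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1, \alpha_1/2}\frac{S_{n-1}}{\sqrt{n}}, \ \ \frac{(n-1)S_{n-1}^2}{\chi_{n-1, \alpha_2/2}^2} \le \sigma^2 \le \frac{(n-1)S_{n-1}^2}{\chi_{n-1, 1-\alpha_2/2}^2} \right\} \]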
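
A minimal sketch of the simultaneous region: it computes the \(t\)-interval for \(\mu\) and the \(\chi^2\)-interval for \(\sigma^2\) side by side. It assumes the critical values \(t_{n-1,\alpha_1/2}\), \(\chi^2_{n-1,\alpha_2/2}\) and \(\chi^2_{n-1,1-\alpha_2/2}\) (upper-point convention, as in the notes) are read from tables and passed in; the function name `simultaneous_region` and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

def simultaneous_region(sample, t_crit, chi2_upper, chi2_lower):
    """Joint region for (mu, sigma^2) from a normal sample, via Boole's inequality.

    t_crit     = t_{n-1, alpha1/2}
    chi2_upper = chi^2_{n-1, alpha2/2}      (upper alpha2/2 point)
    chi2_lower = chi^2_{n-1, 1-alpha2/2}    (upper 1-alpha2/2 point)
    Joint coverage is at least 1 - alpha1 - alpha2.
    """
    n = len(sample)
    x_bar = fmean(sample)
    s2 = variance(sample)  # S_{n-1}^2, divisor n - 1
    half = t_crit * sqrt(s2 / n)
    mu_ci = (x_bar - half, x_bar + half)
    ss = (n - 1) * s2  # sum of (X_i - X_bar)^2
    var_ci = (ss / chi2_upper, ss / chi2_lower)
    return mu_ci, var_ci

# Hypothetical n = 10 sample, alpha1 = alpha2 = 0.05 (joint level >= 0.90);
# table values: t_{9,0.025} = 2.262, chi2_{9,0.025} = 19.023, chi2_{9,0.975} = 2.700
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]
print(simultaneous_region(sample, t_crit=2.262, chi2_upper=19.023, chi2_lower=2.700))
```

The second component on its own is the single-parameter \(\chi^2\) interval for \(\sigma^2\); taking square roots of its endpoints gives the interval for \(\sigma\).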
Consider two independent samples of sizes $n_1$ and $n_2$ from normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. We will discuss cases for constructing a $(1 - \alpha)$-level confidence interval for the difference of means $(\mu_1 - \mu_2)$.
Assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$, where $\sigma^2$ is known. Let the means of the samples be given by $\bar{X}_1$ and $\bar{X}_2$.
We consider the pivotal quantity:
\[ T(\mathbf{X}_1, \mathbf{X}_2, \mu_1, \mu_2) = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0, 1) \]
Based on this pivot, we can construct a $(1 - \alpha)$-level confidence interval for $(\mu_1 - \mu_2)$ by choosing $a$ and $b$ such that:
\[ P\{a < T < b\} = 1 - \alpha \]
Let the probability $\alpha$ be equally distributed in the two tails of the distribution of $T$:
\[ P\{T < a\} = \frac{\alpha}{2} = P\{T > b\} \]
Since the distribution of $T$ is symmetric about zero (standard normal):
\[ b = z_{\alpha/2}, \quad a = -z_{\alpha/2} \]
Result: Therefore, the $(1 - \alpha)$-level confidence interval for $\mu_1 - \mu_2$ is given by:
\[ \left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right) \]
However, if $\sigma_1^2$ and $\sigma_2^2$ are known but different, we use the following pivot:
\[ T(\mathbf{X}_1, \mathbf{X}_2, \mu_1, \mu_2) = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1) \]
Result: Therefore, the required $(1 - \alpha)$-level confidence interval for $\mu_1 - \mu_2$ is given by:
\[ \left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \quad (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right) \]
Finally, assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$, where $\sigma^2$ is unknown, and define
\[ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}, \]
where $S_1^2$ and $S_2^2$ are the sample variances with divisors $(n_1 - 1)$ and $(n_2 - 1)$, respectively; $S_p^2$ is called the pooled variance. The pivotal quantity is defined by
\[ T = \frac{\dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}}{\sqrt{\dfrac{(n_1 - 1)S_1^2/\sigma^2 + (n_2 - 1)S_2^2/\sigma^2}{n_1 + n_2 - 2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2} \]
Here the numerator is $N(0, 1)$ and $\{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\}/\sigma^2 \sim \chi^2_{n_1 + n_2 - 2}$. Note that the statistics in the numerator and denominator are independent, since in normal samples the sample means are independent of the sample variances. Thus, the pivot follows a $t$-distribution with $(n_1 + n_2 - 2)$ degrees of freedom. Based on this pivot, we can now construct a $(1 - \alpha)$-level confidence interval for $(\mu_1 - \mu_2)$ by choosing $a$ and $b$ such that
\[ P\{a < T < b\} = 1 - \alpha \]
If the probability $\alpha$ is distributed equally in the two tails of the distribution of $T$, we choose $a$ and $b$ so that
\[ P\{T < a\} = \frac{\alpha}{2} = P\{T > b\} \]
Since the distribution of $T$ is symmetric about zero, we have
\[ b = t_{n_1 + n_2 - 2, \alpha/2}, \quad a = -t_{n_1 + n_2 - 2, \alpha/2} \]
Thus, the required confidence interval is given by
\[ \left( (\bar{X}_1 - \bar{X}_2) - t_{n_1 + n_2 - 2, \alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad (\bar{X}_1 - \bar{X}_2) + t_{n_1 + n_2 - 2, \alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right) \]
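
The two-sample intervals can be sketched as below. The known-variance function covers both the equal case (pass the same value twice) and the unequal case; the pooled version assumes $t_{n_1+n_2-2,\alpha/2}$ is taken from a table and passed in. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance

def two_sample_z_interval(x1, x2, sigma1, sigma2, alpha=0.05):
    """CI for mu1 - mu2 with known sigma1^2, sigma2^2 (equal or unequal)."""
    diff = fmean(x1) - fmean(x2)
    se = sqrt(sigma1**2 / len(x1) + sigma2**2 / len(x2))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return diff - z * se, diff + z * se

def pooled_t_interval(x1, x2, t_crit):
    """CI for mu1 - mu2 with sigma1 = sigma2 unknown; t_crit = t_{n1+n2-2, alpha/2}."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance S_p^2 from sample variances with divisors n_i - 1
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = fmean(x1) - fmean(x2)
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical samples; t_{6, 0.025} = 2.447 from a t-table (n1 + n2 - 2 = 6)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6]
print(pooled_t_interval(x1, x2, t_crit=2.447))
print(two_sample_z_interval(x1, x2, sigma1=1.5, sigma2=2.0))
```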
When $\mu_1$ and $\mu_2$ are not known, we use the pivot
\[ T\left(\mathbf{X}, \sigma_1^2, \sigma_2^2\right) = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1, n_2 - 1} \]
for constructing a $(1-\alpha)$-level confidence interval for the ratio of the variances $\sigma_1^2/\sigma_2^2$.
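Placing probability $\alpha_1$ in the lower tail and $\alpha_2$ in the upper tail, where $F_{n_1 - 1, n_2 - 1, p}$ denotes the upper $p$ point:
\[ P\left( F_{n_1 - 1, n_2 - 1, 1 - \alpha_1} \leq \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \leq F_{n_1 - 1, n_2 - 1, \alpha_2} \right) = 1 - \alpha_1 - \alpha_2 \]
Rearranging to isolate $\sigma_1^2/\sigma_2^2$:
\[ P\left( \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{n_1 - 1, n_2 - 1, \alpha_2}} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{n_1 - 1, n_2 - 1, 1 - \alpha_1}} \right) = 1 - \alpha_1 - \alpha_2 \]
This is a $(1-\alpha)$-level confidence interval with $\alpha = \alpha_1 + \alpha_2$; the equal-tails choice is $\alpha_1 = \alpha_2 = \alpha/2$.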
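
A minimal sketch of the variance-ratio interval, assuming the two $F$ critical values (upper-point convention) are read from tables and passed in; the function name `variance_ratio_interval` and the data are hypothetical.

```python
from statistics import variance

def variance_ratio_interval(x1, x2, f_upper, f_lower):
    """CI for sigma1^2 / sigma2^2 when the means are unknown.

    f_upper = F_{n1-1, n2-1, alpha2}      (upper alpha2 point)
    f_lower = F_{n1-1, n2-1, 1-alpha1}    (upper 1-alpha1 point)
    Confidence level is 1 - alpha1 - alpha2.
    """
    ratio = variance(x1) / variance(x2)  # S1^2 / S2^2, divisors n_i - 1
    return ratio / f_upper, ratio / f_lower

# Hypothetical samples and hypothetical table values, for illustration only
print(variance_ratio_interval([1, 2, 3, 4, 5], [2, 4, 6], f_upper=5.0, f_lower=0.25))
```

Since most tables list only upper points, the lower cut-off can be obtained from the identity $F_{\nu_1, \nu_2, 1-\alpha_1} = 1/F_{\nu_2, \nu_1, \alpha_1}$ (note the swapped degrees of freedom).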