In statistical hypothesis testing, our goal is to use sample data to make decisions about population characteristics. We set up two competing claims: the null hypothesis ($H_0$), which typically represents the status quo or no effect, and the alternative hypothesis ($H_1$), which represents the effect we are looking for. However, not all statistical tests are created equal. The challenge is to find the "best" test: one that correctly rejects a false null hypothesis as often as possible. This is where the concepts of Most Powerful (MP) and Uniformly Most Powerful (UMP) tests become critical. A most powerful test of size $\alpha$ is the test that has the highest power among all tests whose size does not exceed $\alpha$.
Consider, for instance, a clinical trial designed to evaluate whether a new drug is more effective than the standard treatment. A poorly chosen test may fail to detect a genuine improvement, leading to the rejection of a potentially life-saving therapy.
Similarly, in quality control and manufacturing, detecting even a slight deviation from the target mean can prevent large-scale production losses. In finance, identifying abnormal market returns may help investors detect inefficiencies or fraudulent activities. In each of these examples, an optimal testing procedure—one that maximizes the power for detecting true effects—is crucial for making reliable decisions.
The concept of Most Powerful Tests provides a solution to this challenge by focusing on tests that offer the highest probability of correctly rejecting the null hypothesis for a given significance level. Extending this idea, Uniformly Most Powerful (UMP) Tests seek procedures that maintain maximal power across all possible parameter values under the alternative hypothesis.
In the parameter space, we denote the set of parameter values specified by the null hypothesis by $ \Theta_0 $ and the set specified by the alternative by $ \Theta_1 $. If a hypothesis contains a single parameter value (i.e., it completely specifies the distribution), it is called a simple hypothesis.
We write the testing problem as $ H_0: \theta \in \Theta_0 $ versus $ H_1: \theta \in \Theta_1 $. When either $ \Theta_0 $ or $ \Theta_1 $ contains multiple parameter values, the hypothesis is called composite (for example, $H_0: \mu = \mu_0$ is simple, while $H_1: \mu > \mu_0$ is composite). This distinction affects both the form of the most powerful tests and the methods used to construct them.
Test function: The test function $ \phi(x) $ gives the probability of rejecting $H_0$ when $X=x$ is observed. For a non-randomized test with critical (rejection) region $ \omega $, it is simply the indicator of that region:
` \phi(x) = \begin{cases} 1, & \text{if } x \in \omega \\ 0, & \text{if } x \in \omega^c \end{cases} `
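As a concrete illustration, here is a minimal sketch of such a test function in Python, assuming a critical region of the form $\{x : \bar{x} \ge c\}$ (the cutoff $c$ and the data are hypothetical):

```python
import numpy as np

def phi(x, c):
    """Non-randomized test function: return 1 (reject H0) iff the
    sample mean lies in the critical region {x : x_bar >= c}."""
    return 1 if np.mean(x) >= c else 0

# Hypothetical observed sample and cutoff
x_obs = np.array([0.2, 0.9, 0.7, 0.4])
print(phi(x_obs, c=0.5))   # mean is 0.55 >= 0.5, so the test rejects (prints 1)
```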
Error probabilities are:
` P(\text{Type I error}) = P(x \in \omega \mid H_0) = \int_\omega f_{\theta_0}(x)\,dx `
` P(\text{Type II error}) = P(x \in \omega^c \mid H_1) = \int_{\omega^c} f_{\theta_1}(x)\,dx `
The power of a test is the probability of correctly rejecting a false null hypothesis; the size (significance level) $ \alpha $ is the probability of incorrectly rejecting a true null.
` 1 - \beta = P(x \in \omega \mid H_1) = \int_\omega f_{\theta_1}(x)\,dx `
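As a hedged numerical sketch (the values $\mu_0 = 0$, $\mu_1 = 1$, $\sigma = 1$, $n = 9$ and the cutoff are illustrative assumptions, not from the text), the two error probabilities and the power of the one-sided test that rejects when $\bar{x} \ge c$ follow directly from the normal distribution of $\bar{X}$:

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n = 0.0, 1.0, 1.0, 9      # illustrative parameter values
se = sigma / np.sqrt(n)                    # standard error of the sample mean
c = 0.55                                   # an arbitrary cutoff defining omega = {x_bar >= c}

alpha = norm.sf(c, loc=mu0, scale=se)      # P(x_bar >= c | H0): Type I error (size)
beta  = norm.cdf(c, loc=mu1, scale=se)     # P(x_bar <  c | H1): Type II error
power = 1 - beta                           # P(x_bar >= c | H1): power
print(f"size = {alpha:.4f}, Type II error = {beta:.4f}, power = {power:.4f}")
```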
A Best Critical Region (BCR) of size $ \alpha $ maximizes power among all regions with Type I error $ \le \alpha $. For simple versus simple problems the Neyman–Pearson lemma gives a complete characterization of the BCR.
The NP lemma states that the most powerful test of size $ \alpha $ for testing $ H_0 : \theta = \theta_0 $ against $ H_1 : \theta = \theta_1 $ rejects $H_0$ when the likelihood ratio exceeds a threshold $k$, chosen so that the test has size $\alpha$.
` \omega = \{ x : \dfrac{L(\theta_1; x)}{L(\theta_0; x)} \ge k \}, \quad k > 0 `
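A minimal sketch of this likelihood-ratio rule for two fully specified normal means (the parameter values, threshold $k$, and simulated data below are assumptions for illustration, not a prescription for choosing $k$):

```python
import numpy as np
from scipy.stats import norm

def likelihood_ratio(x, theta0, theta1, sigma=1.0):
    """L(theta1; x) / L(theta0; x) for an i.i.d. N(theta, sigma^2) sample,
    computed on the log scale for numerical stability."""
    ll1 = norm.logpdf(x, loc=theta1, scale=sigma).sum()
    ll0 = norm.logpdf(x, loc=theta0, scale=sigma).sum()
    return np.exp(ll1 - ll0)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.8, scale=1.0, size=20)            # data actually drawn with mu = 0.8
ratio = likelihood_ratio(x, theta0=0.0, theta1=1.0)
k = 1.0                                                # in practice k is chosen to meet the size condition
print("reject H0" if ratio >= k else "retain H0")
```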
Let $X_1,\dots,X_n$ be a sample from $N(\mu,\sigma^2)$ with known $\sigma^2$. Test $H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$ with $\mu_1 > \mu_0$.
The NP most powerful region is of the form ` \bar{x} \ge k_1 `. After standardizing under $H_0$ we obtain:
` \phi(x) = \begin{cases} 1, & \text{if } \bar{x} \ge \mu_0 + z_{\alpha}\dfrac{\sigma}{\sqrt{n}} \\[6pt] 0, & \text{otherwise} \end{cases} `
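A short sketch of this rejection rule and its power (the values of $\mu_0$, $\mu_1$, $\sigma$, $n$, and $\alpha$ are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05    # assumed values
se = sigma / np.sqrt(n)

c = mu0 + norm.ppf(1 - alpha) * se       # critical value mu0 + z_alpha * sigma / sqrt(n)
power = norm.sf(c, loc=mu1, scale=se)    # P(x_bar >= c | mu = mu1)
print(f"critical value = {c:.3f}, power = {power:.3f}")
```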
For the opposite one-sided alternative ($\mu_1 < \mu_0$) the direction reverses, with $k_1 = \mu_0 - z_{\alpha}\dfrac{\sigma}{\sqrt{n}}$:
` \phi(x) = \begin{cases} 1, & \text{if } \bar{x} \le k_1 \\ 0, & \text{otherwise} \end{cases} `
Let $X_1,\dots,X_n$ be a sample from $N(\mu,\sigma^2)$ with known $\mu$. Test $H_0: \sigma = \sigma_0$ vs $H_1: \sigma = \sigma_1$ with $\sigma_1 > \sigma_0$.
The likelihood ratio leads to a test based on the sum of squared deviations:
` \sum_{i=1}^n (x_i - \mu)^2 \ge k_1 `
Under $H_0$ the statistic `\sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma_0}\right)^2` has a $ \chi^2_n $ distribution. Choosing $k_1$ to satisfy the size condition gives:
` \phi(x) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma_0}\right)^2 \ge \chi^2_{n,\alpha} \\[6pt] 0, & \text{otherwise} \end{cases} `
Here `\chi^2_{n,\alpha}` denotes the upper $\alpha$-quantile of $ \chi^2_n $. In some texts you will see the critical constant written as $ \sigma_0^2 \chi^2_{n,\alpha} $ depending on algebraic arrangement.
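A brief sketch of this variance test with $\mu$ known (the parameter values, sample size, and simulated data are illustrative assumptions):

```python
import numpy as np
from scipy.stats import chi2

mu, sigma0, n, alpha = 0.0, 1.0, 30, 0.05     # assumed values
crit = chi2.ppf(1 - alpha, df=n)              # upper alpha-quantile chi^2_{n, alpha}

rng = np.random.default_rng(1)
x = rng.normal(loc=mu, scale=1.4, size=n)     # simulated data with inflated standard deviation
stat = np.sum(((x - mu) / sigma0) ** 2)       # sum of squared standardized deviations
print("reject H0" if stat >= crit else "retain H0")
```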
For reference, the following tables summarize the standard rejection regions for tests about a normal mean, a normal variance, and the difference of two normal means.

| H0 | H1 | Rejection region (σ² known) | Rejection region (σ² unknown) |
|---|---|---|---|
| μ = μ0 | μ ≠ μ0 | `\left|\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{\sigma}\right| > z_{\alpha/2}` | `\left|\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{s}\right| > t_{\alpha/2,n-1}` | 
| μ ≤ μ0 | μ > μ0 | `\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{\sigma} > z_{\alpha}` | `\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{s} > t_{\alpha,n-1}` | 
| μ ≥ μ0 | μ < μ0 | `\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{\sigma} < -z_{\alpha}` | `\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{s} < -t_{\alpha,n-1}` | 
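For the σ² unknown column, the same statistic is what scipy's one-sample $t$-test computes; a short sketch with hypothetical data:

```python
import numpy as np
from scipy.stats import ttest_1samp

x = np.array([5.1, 4.8, 5.4, 5.0, 5.3, 4.9])   # hypothetical measurements
res = ttest_1samp(x, popmean=5.0)               # two-sided test of H0: mu = 5
print(res.statistic, res.pvalue)                # reject H0 if the p-value is below alpha
```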

| H0 | H1 | Rejection region (μ known) | Rejection region (μ unknown) |
|---|---|---|---|
| `\sigma^2 = \sigma_0^2` | `\sigma^2 \ne \sigma_0^2` | `\dfrac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma_0^2} \le \chi^2_{n,1-\alpha/2}` or `\dfrac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma_0^2} \ge \chi^2_{n,\alpha/2}` | `\dfrac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{n-1,1-\alpha/2}` or `\dfrac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{n-1,\alpha/2}` | 
| `\sigma^2 \le \sigma_0^2` | `\sigma^2 > \sigma_0^2` | `\dfrac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma_0^2} \ge \chi^2_{n,\alpha}` | `\dfrac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{n-1,\alpha}` | 
| `\sigma^2 \ge \sigma_0^2` | `\sigma^2 < \sigma_0^2` | `\dfrac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma_0^2} \le \chi^2_{n,1-\alpha}` | `\dfrac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{n-1,1-\alpha}` | 
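A sketch of the two-sided variance test with μ unknown, based on the $(n-1)s^2/\sigma_0^2$ statistic from the table (the data and $\sigma_0^2$ below are hypothetical):

```python
import numpy as np
from scipy.stats import chi2

x = np.array([2.1, 1.8, 2.6, 2.4, 1.9, 2.2, 2.7, 2.0])   # hypothetical sample
sigma0_sq, alpha = 0.10, 0.05                              # assumed null variance and level
n = len(x)
stat = (n - 1) * np.var(x, ddof=1) / sigma0_sq             # (n-1) s^2 / sigma_0^2

lower = chi2.ppf(alpha / 2, df=n - 1)         # lower critical value, chi^2_{n-1, 1-alpha/2} in the table's notation
upper = chi2.ppf(1 - alpha / 2, df=n - 1)     # upper critical value, chi^2_{n-1, alpha/2} in the table's notation
print("reject H0" if (stat <= lower or stat >= upper) else "retain H0")
```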

| H0 | H1 | Rejection region (σ₁², σ₂² known) | Rejection region (σ₁² = σ₂² unknown) |
|---|---|---|---|
| `\mu_1 = \mu_2` | `\mu_1 \ne \mu_2` | `\dfrac{|\bar{x}-\bar{y}|}{\sqrt{\dfrac{\sigma_1^2}{m}+\dfrac{\sigma_2^2}{n}}} > z_{\alpha/2}` | `\dfrac{|\bar{x}-\bar{y}|}{s\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}} > t_{\alpha/2,m+n-2}` | 
| `\mu_1 \le \mu_2` | `\mu_1 > \mu_2` | `\dfrac{\bar{x}-\bar{y}}{\sqrt{\dfrac{\sigma_1^2}{m}+\dfrac{\sigma_2^2}{n}}} > z_{\alpha}` | `\dfrac{\bar{x}-\bar{y}}{s\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}} > t_{\alpha,m+n-2}` | 
| `\mu_1 \ge \mu_2` | `\mu_1 < \mu_2` | `\dfrac{\bar{x}-\bar{y}}{\sqrt{\dfrac{\sigma_1^2}{m}+\dfrac{\sigma_2^2}{n}}} < -z_{\alpha}` | `\dfrac{\bar{x}-\bar{y}}{s\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}} < -t_{\alpha,m+n-2}` |
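For the equal-but-unknown-variance column, scipy's pooled two-sample $t$-test computes the same statistic with $m + n - 2$ degrees of freedom; a sketch with hypothetical samples:

```python
import numpy as np
from scipy.stats import ttest_ind

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9])   # hypothetical sample 1
y = np.array([9.6, 9.4, 10.0, 9.5, 9.3])     # hypothetical sample 2
res = ttest_ind(x, y, equal_var=True)         # pooled-variance t test
print(res.statistic, res.pvalue)              # two-sided p-value; reject H0 if below alpha
```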