Hey there! Ready to crunch some numbers? Let’s dive into the world of statistics!

Analysis of Variance

Analysis of Variance: One-Way and Two-Way

Introduction 

Suppose that we wish to be more objective in our analysis of the data. Specifically, suppose that we wish to test for differences between the mean etch rates at all $a = 4$ levels of RF power; that is, we are interested in testing the equality of all four means. It might seem that this problem could be solved by performing a t-test on all six possible pairs of means. However, this is not the best solution. First, performing all six pairwise t-tests is inefficient: it takes considerable effort. Second, conducting all these pairwise comparisons inflates the Type I error. Suppose that all four means are equal and we select $\alpha = 0.05$; then the probability of reaching the correct decision on any single comparison is 0.95, but the probability of reaching the correct conclusion on all six comparisons is considerably less than 0.95, so the Type I error is inflated. The appropriate procedure for testing the equality of several means is the analysis of variance. The analysis of variance, however, has a much wider range of application than this problem; it is probably the most useful technique in the field of statistical inference.
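To see the inflation concretely, if the six comparisons were independent, the chance of at least one false rejection would be:

`1 - (1 - 0.05)^6 = 1 - (0.95)^6 \approx 0.265`

The six pairwise tests are not actually independent, so this is only an approximation, but it illustrates how quickly the family-wise error rate grows.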

ANOVA mind map 

Introduction to ANOVA 

Definitions 

ANOVA is a statistical technique for comparing the means of three or more groups to determine whether there are statistically significant differences between them.

Purpose 

It tests the null hypothesis that all group means are equal against the alternative hypothesis that at least one group mean is different.

Origin

Introduced by Prof. R. A. Fisher in the 1920s, primarily for agricultural experiments.

Applications

Widely used in fields such as agriculture, industry, psychology, education, and business to analyse experimental data.

Assumptions

Independence 

Observations must be independent of each other.

Normality

The data in each group should be normally distributed.

Homogeneity of Variance

The variance of each group should be equal (homoscedasticity).

Additivity

The effects of the different factors are additive (there are no interaction effects in one-way ANOVA).
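As a practical aside, the normality and equal-variance assumptions can be screened with standard tests. A short scipy sketch (using three of the etch-rate groups from the example later in this post):

```python
from scipy import stats

# Three treatment groups (etch rates from the worked example below)
g1 = [575, 542, 530, 539, 570]
g2 = [565, 593, 590, 579, 610]
g3 = [600, 651, 610, 637, 629]

# Normality within each group (Shapiro-Wilk test)
for g in (g1, g2, g3):
    print(stats.shapiro(g).pvalue)

# Homogeneity of variances across groups (Levene's test)
print(stats.levene(g1, g2, g3).pvalue)
```

Large p-values give no evidence against the assumptions; small samples have little power, so these checks are a screen, not a guarantee.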

Introduction to ANOVA (One-Way)


| Treatment (Level) | Observations | Totals | Averages |
|---|---|---|---|
| 1 | $y_{11}, y_{12}, \dots, y_{1n}$ | $y_{1\cdot}$ | $\overline{y}_{1\cdot}$ |
| 2 | $y_{21}, y_{22}, \dots, y_{2n}$ | $y_{2\cdot}$ | $\overline{y}_{2\cdot}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $a$ | $y_{a1}, y_{a2}, \dots, y_{an}$ | $y_{a\cdot}$ | $\overline{y}_{a\cdot}$ |
| Totals | | $y_{\cdot\cdot}$ | $\overline{y}_{\cdot\cdot}$ |

Fixed Effect Model

The name ANOVA stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment.

The basic single-factor ANOVA model is:

` y_{ij} = \mu + \tau_i + \varepsilon_{ij}`

where:

  • $y_{ij}$: the $j$th observation under the $i$th treatment
  • $\mu$: grand mean
  • $\tau_i$: fixed effect of the $i$th treatment
  • $\varepsilon_{ij} \sim N(0, \sigma^2)$: random error

Models for the Data

Means Model:

` y_{ij} = \mu_i + \varepsilon_{ij}`

Effect Model:

  • `\mu_i = \mu + \tau_i \quad \text{for } i=1,2,\dots,a`
  • `y_{ij} = \mu + \tau_i + \varepsilon_{ij} \quad \text{for } i=1,2,\dots,a; \ j=1,2,\dots,n`

ANOVA Model

1) Fixed Effect Model

In this model, the \( i \)-th treatment mean \( \mu_i \) is broken into two components: \( \mu_i = \mu + \tau_i \). We consider \( \mu \) as the overall mean so that:

`\frac{1}{a} \sum_{i=1}^{a} \mu_i = \mu`

It implies that:

`\sum_{i=1}^{a} \tau_i = 0`
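This follows directly from the definition of the treatment effects, $\tau_i = \mu_i - \mu$:

`\sum_{i=1}^{a} \tau_i = \sum_{i=1}^{a} (\mu_i - \mu) = a\mu - a\mu = 0`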

2) Random Effect Model

Here, the \( \tau_i \) are random variables. Knowledge of the particular treatment means is less informative; instead, hypotheses are tested about the variability of the \( \tau_i \).

Decomposition of Sum of Squares

Consider the model:

`y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0,\sigma^2)`

With conditions:

  • \( \sum_{i=1}^{a} \tau_i = 0 \)
  • \( \sum_{j=1}^{n} \varepsilon_{ij} = 0 \)

The decomposition can be written as:

`y_{ij} = \overline{y}_{\cdot\cdot} + (\overline{y}_{i\cdot} - \overline{y}_{\cdot\cdot}) + (y_{ij} - \overline{y}_{i\cdot})`

Rearranging:

`y_{ij} - \overline{y}_{\cdot\cdot} = (\overline{y}_{i\cdot} - \overline{y}_{\cdot\cdot}) + (y_{ij} - \overline{y}_{i\cdot})`

The sum of squares can be decomposed as:

`\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \overline{y}_{\cdot\cdot})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} \left[(\overline{y}_{i\cdot} - \overline{y}_{\cdot\cdot}) + (y_{ij} - \overline{y}_{i\cdot})\right]^2`

Total Variability (TSS)

Total variability is measured by the Total Sum of Squares (TSS):

`SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \overline{y}_{\cdot\cdot})^2`

The partitioning of TSS in ANOVA is:

`SS_T = SS_{\text{Treatment}} + SS_E`
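This partition can be checked numerically. Here is a minimal NumPy sketch (the group count, sample size, and simulated values are arbitrary choices, not data from this article):

```python
import numpy as np

rng = np.random.default_rng(42)
a, n = 4, 5                           # a treatments, n observations each
y = rng.normal(600, 20, size=(a, n))  # simulated responses

grand_mean = y.mean()
group_means = y.mean(axis=1)          # ybar_{i.}

ss_total = ((y - grand_mean) ** 2).sum()
ss_treatment = n * ((group_means - grand_mean) ** 2).sum()
ss_error = ((y - group_means[:, None]) ** 2).sum()

# The identity SS_T = SS_Treatment + SS_E holds exactly
assert np.isclose(ss_total, ss_treatment + ss_error)
print(ss_total, ss_treatment + ss_error)
```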

Under the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$, the test statistics follow:

  • $\dfrac{SS_T}{\sigma^2} \sim \chi^2_{an-1}$
  • $\dfrac{SS_{\text{Treatment}}}{\sigma^2} \sim \chi^2_{a-1}$
  • $\dfrac{SS_E}{\sigma^2} \sim \chi^2_{a(n-1)}$

Hypotheses

  • $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
  • $H_1: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$

A large value of $SS_{\text{Treatment}}$ reflects significant differences among treatment means, whereas a small value suggests no substantial difference.

Example

Consider the observed etch rate data:

| RF Power (W) | 1 | 2 | 3 | 4 | 5 | $y_{i\cdot}$ | $\overline{y}_{i\cdot}$ |
|---|---|---|---|---|---|---|---|
| 160 | 575 | 542 | 530 | 539 | 570 | 2756 | 551.2 |
| 180 | 565 | 593 | 590 | 579 | 610 | 2937 | 587.4 |
| 200 | 600 | 651 | 610 | 637 | 629 | 3127 | 625.4 |
| 220 | 725 | 700 | 715 | 685 | 710 | 3535 | 707.0 |

Grand total: $y_{\cdot\cdot} = 12{,}355$   |   Grand mean: $\overline{y}_{\cdot\cdot} = 617.75$

Sum of Squares Calculations

Error sum of squares:

`SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \overline{y}_{i\cdot})^2 = 5339.20`

Treatment sum of squares:

`SS_{\text{Treatment}} = n \sum_{i=1}^{a} (\overline{y}_{i\cdot} - \overline{y}_{\cdot\cdot})^2 = 66870.55`

Total sum of squares:

`SS_T = SS_E + SS_{\text{Treatment}} = 5339.20 + 66870.55 = 72209.75`
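These figures are easy to reproduce in Python. The sketch below recomputes the sums of squares from the etch rate table and, for comparison, runs the same test with `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# Etch rate data (rows: 160, 180, 200, 220 W)
power = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}
y = np.array(list(power.values()), dtype=float)
a, n = y.shape

grand_mean = y.mean()
ss_treatment = n * ((y.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()
print(ss_treatment, ss_error)     # 66870.55, 5339.20

# Same hypothesis test, done directly by scipy
f0, p = stats.f_oneway(*power.values())
print(f0, p)                      # F0 ≈ 66.80, p far below 0.05
```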

Degrees of Freedom (df)

Degrees of freedom relations:

`df_{\text{Total}} = df_{\text{Treatment}} + df_{\text{Error}}`

`an - 1 = (a - 1) + a(n - 1)`

Mean Squares

Mean squares are calculated as:

`MS_{\text{Treatment}} = \dfrac{SS_{\text{Treatment}}}{a-1}`

`MS_E = \dfrac{SS_E}{a(n-1)} \quad \text{(an unbiased estimator of } \sigma^2 \text{)}`
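For the etch rate example, with $a - 1 = 3$ and $a(n - 1) = 16$ degrees of freedom:

`MS_{\text{Treatment}} = \dfrac{66870.55}{3} = 22290.18 \qquad MS_E = \dfrac{5339.20}{16} = 333.70 \qquad F_0 = \dfrac{22290.18}{333.70} \approx 66.80`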

Two-Way ANOVA (Analysis of Variance with One Observation per Cell)

Statistical Model:

`Y_{ij} = \mu + T_i + B_j + \epsilon_{ij}, \quad i=1,2,\dots,a, \quad j=1,2,\dots,b`

where:

  • $ Y_{ij} $ = Observed response for the $ i $-th treatment in the $ j $-th block.
  • $ \mu $ = Overall mean.
  • $ T_i $ = Effect of the $ i $-th treatment.
  • $ B_j $ = Effect of the $ j $-th block.
  • $ \epsilon_{ij} $ = Random error component, assumed to be normally distributed with mean 0 and variance $ \sigma^2 $.

Hypothesis:

`H_0: T_1 = T_2 = \dots = T_a = 0`
`H_1: T_i \neq 0 \quad \text{for at least one } i`

Sums of Squares Computation:

`SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} Y_{ij}^2 - \frac{Y_{..}^2}{ab}` 
`SS_B = \frac{1}{a} \sum_{j=1}^{b} Y_{\cdot j}^2 - \frac{Y_{..}^2}{ab}`
`SS_A = \frac{1}{b} \sum_{i=1}^{a} Y_{i \cdot}^2 - \frac{Y_{..}^2}{ab}`
`SS_E = SS_T - SS_A - SS_B`

Total sum of squares can be written as:

`SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{..})^2`

By expanding, we get:

`\sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{..})^2 = b \sum_{i=1}^{a} (\bar{Y}_{i.} - \bar{Y}_{..})^2 + a \sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2`

Partitioning the Sum of Squares:

`SS_T = SS_{\text{Treatment}} + SS_{\text{Block}} + SS_E`
`ab - 1 = (a - 1) + (b - 1) + (a-1)(b-1)`

Sum of Squares Expressions (writing $N = ab$ for the total number of observations):

`SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} Y_{ij}^2 - \frac{Y_{..}^2}{N}`
`SS_{\text{Treatment}} = b \sum_{i=1}^{a} \bar{Y}_{i.}^2 - \frac{Y_{..}^2}{N}`
`SS_{\text{Block}} = a \sum_{j=1}^{b} \bar{Y}_{.j}^2 - \frac{Y_{..}^2}{N}`
`SS_E = SS_T - SS_{\text{Treatment}} - SS_{\text{Block}}`

Mean Squares Computation:

`MS_{\text{Treatment}} = \frac{SS_{\text{Treatment}}}{a-1}`
`MS_{\text{Block}} = \frac{SS_{\text{Block}}}{b-1}`
`MS_E = \frac{SS_E}{(a-1)(b-1)}`

F-Test for Treatment Effect:

`F_0 = \frac{MS_{\text{Treatment}}}{MS_E}`

Reject the null hypothesis if:

`F_0 > F_{\alpha, (a-1), (a-1)(b-1)}`

Analysis of Variance Table:

| Source of Variation | Sum of Squares (SS) | DF | Mean Square (MS) | F-value |
|---|---|---|---|---|
| Treatment (A) | $SS_A$ | $a-1$ | $MS_A = \frac{SS_A}{a-1}$ | $F_A = \frac{MS_A}{MS_E}$ |
| Blocks (B) | $SS_B$ | $b-1$ | $MS_B = \frac{SS_B}{b-1}$ | $F_B = \frac{MS_B}{MS_E}$ |
| Error (E) | $SS_E$ | $(a-1)(b-1)$ | $MS_E = \frac{SS_E}{(a-1)(b-1)}$ | |
| Total | $SS_T$ | $ab-1$ | | |
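In practice this blocked layout is usually fitted with software. Below is a minimal `statsmodels` sketch; the factor names and simulated data are illustrative placeholders, not values from this article:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative layout: a = 3 treatments, b = 4 blocks, one observation per cell
rng = np.random.default_rng(1)
a, b = 3, 4
df = pd.DataFrame({
    "treatment": np.repeat([f"T{i}" for i in range(1, a + 1)], b),
    "block": np.tile([f"B{j}" for j in range(1, b + 1)], a),
    "y": rng.normal(50, 5, size=a * b),
})

# No interaction term: with one observation per cell, the (a-1)(b-1) df go to error
model = smf.ols("y ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```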

Understanding the One-Way ANOVA Table ($n$ Observations per Treatment)

If the treatment means are equal, the treatment and error mean squares of the model will be (theoretically) equal.

If the treatment means differ, the treatment mean square will be larger than the error mean square of the model.

ANOVA Table

| Source of Variation | Sum of Squares | df | Mean Square (M.Sq) | $F_0$ |
|---|---|---|---|---|
| Between Treatments | `SS_{\text{Treatment}} = n \sum_{i=1}^{a} (\overline{y}_{i\cdot} - \overline{y}_{\cdot\cdot})^2` | $a - 1$ | `MS_{\text{Treatment}} = \dfrac{SS_{\text{Treatment}}}{a - 1}` | `F_0 = \dfrac{MS_{\text{Treatment}}}{MS_E}` |
| Error (Within) | `SS_E = SS_T - SS_{\text{Treatment}}` | $N - a$ | `MS_E = \dfrac{SS_E}{N - a}` | |
| Total | `SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \overline{y}_{\cdot\cdot})^2` | $N - 1$ | | |

Test Statistic and Decision Rule

The reference distribution for $F_0$ is the $F_{a-1, (N-a)}$ distribution.

Reject the null hypothesis (equal treatment means) if: `F_0 > F_{\alpha, a-1, (N-a)}`

Point and Confidence Interval Estimates

The point estimate of $\mu_i$ is given by: `\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \overline{y}_{i\cdot}`

A $100(1 - \alpha)\%$ confidence interval for the $i$th treatment mean is:

`\overline{y}_{i\cdot} - t_{\frac{\alpha}{2}, (N-a)} \sqrt{\dfrac{MS_E}{n}} < \mu_i < \overline{y}_{i\cdot} + t_{\frac{\alpha}{2}, (N-a)} \sqrt{\dfrac{MS_E}{n}}`

For the difference between two treatment means $\mu_i - \mu_j$, the $100(1 - \alpha)\%$ confidence interval is:

`\overline{y}_{i\cdot} - \overline{y}_{j\cdot} - t_{\frac{\alpha}{2}, (N-a)} \sqrt{\dfrac{2MS_E}{n}} < \mu_i - \mu_j < \overline{y}_{i\cdot} - \overline{y}_{j\cdot} + t_{\frac{\alpha}{2}, (N-a)} \sqrt{\dfrac{2MS_E}{n}}`
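As an illustration, the sketch below computes the 95% confidence interval for the 220 W treatment mean in the etch rate example, using the $MS_E$ obtained earlier:

```python
import numpy as np
from scipy import stats

n, N, a = 5, 20, 4
ms_e = 5339.20 / (N - a)          # MS_E = 333.70
ybar = 707.0                      # mean etch rate at 220 W

t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - a)   # t_{0.025,16} ≈ 2.120
half_width = t_crit * np.sqrt(ms_e / n)
print(ybar - half_width, ybar + half_width)    # ≈ (689.68, 724.32)
```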

Important Notes

  • If the null hypothesis is false, the expected value of $MS_{\text{Treatment}}$ is greater than $\sigma^2$.
  • Use the p-value of the $F$-test when deciding whether the treatment effect is significant.
  • Sum of squares between treatments: $SS_{\text{Treatment}} = n \sum_{i=1}^{a} (\overline{y}_{i\cdot} - \overline{y}_{\cdot\cdot})^2$.
  • Sum of squares within (error): $SS_E = SS_T - SS_{\text{Treatment}}$.

Two-Factor ANOVA

The basic two-factor ANOVA model is given by:

`y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk}`

where:

  • $\mu$ = overall mean
  • $\tau_i$ = effect of the $i$th treatment (row factor)
  • $\beta_j$ = effect of the $j$th column factor
  • $(\tau\beta)_{ij}$ = interaction effect between $\tau_i$ and $\beta_j$
  • $\epsilon_{ijk}$ = experimental error, assumed to be independently and normally distributed with mean 0 and variance $\sigma^2$

Index Ranges

  • $i = 1, 2, \dots, a$ (levels of Factor A)
  • $j = 1, 2, \dots, b$ (levels of Factor B)
  • $k = 1, 2, \dots, n$ (number of replications)

Data Layout

| Factor A | Factor B: L1 | L2 | ... | Lb |
|---|---|---|---|---|
| 1 | `y_{111}, y_{112}, \dots, y_{11n}` | `y_{121}, y_{122}, \dots, y_{12n}` | ... | `y_{1b1}, y_{1b2}, \dots, y_{1bn}` |
| 2 | `y_{211}, y_{212}, \dots, y_{21n}` | `y_{221}, y_{222}, \dots, y_{22n}` | ... | `y_{2b1}, y_{2b2}, \dots, y_{2bn}` |
| $\vdots$ | $\vdots$ | $\vdots$ | ... | $\vdots$ |
| $a$ | `y_{a11}, y_{a12}, \dots, y_{a1n}` | `y_{a21}, y_{a22}, \dots, y_{a2n}` | ... | `y_{ab1}, y_{ab2}, \dots, y_{abn}` |

Mean Calculations

The model equation can be written more compactly as:

`y_{ijk} = \mu + M_{ij} + \epsilon_{ijk}`

where $M_{ij} = \tau_i + \beta_j + (\tau\beta)_{ij}$ collects all the treatment effects for cell $(i, j)$.

Mean definitions:

  • Row mean: `\overline{y}_{i..} = \dfrac{1}{bn} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}` for $i = 1, 2, \dots, a$
  • Column mean: `\overline{y}_{.j.} = \dfrac{1}{an} \sum_{i=1}^{a} \sum_{k=1}^{n} y_{ijk}` for $j = 1, 2, \dots, b$
  • Cell mean: `\overline{y}_{ij.} = \dfrac{1}{n} \sum_{k=1}^{n} y_{ijk}` for $i=1,\dots,a$ and $j=1,\dots,b$
  • Grand mean: `\overline{y}_{...} = \dfrac{1}{abn} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}`

Experimental Setup

In general:

  • There are two factors (row and column).
  • Factor A has $a$ levels (row treatments).
  • Factor B has $b$ levels (column treatments).
  • Each treatment combination has $n$ replications.
  • Total number of runs: $abn$.

Note: This model assumes a fixed effect case. Random effect models require different analysis approaches.

Decomposition of Total Sum of Squares

The objective is to test hypotheses concerning the row (A treatments), column (B treatments), and interaction effects.

The decomposition of the total sum of squares is given by:

`y_{ijk} = \overline{y}_{...} + (\overline{y}_{i..} - \overline{y}_{...}) + (\overline{y}_{.j.} - \overline{y}_{...}) + (\overline{y}_{ij.} - \overline{y}_{i..} - \overline{y}_{.j.} + \overline{y}_{...}) + (y_{ijk} - \overline{y}_{ij.})`
  • $\overline{y}_{...}$ = grand mean
  • $(\overline{y}_{i..} - \overline{y}_{...})$ = row (Factor A) effect
  • $(\overline{y}_{.j.} - \overline{y}_{...})$ = column (Factor B) effect
  • $(\overline{y}_{ij.} - \overline{y}_{i..} - \overline{y}_{.j.} + \overline{y}_{...})$ = interaction effect
  • $(y_{ijk} - \overline{y}_{ij.})$ = error

Total variability is measured by the Total Sum of Squares (TSS):

`SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \overline{y}_{...})^2`

The total sum of squares is decomposed as:

`SS_T = SS_A + SS_B + SS_{AB} + SS_E`

Where:

  • `SS_A` = Sum of squares due to Factor A (row treatments)
  • `SS_B` = Sum of squares due to Factor B (column treatments)
  • `SS_{AB}` = Sum of squares due to interaction between A and B
  • `SS_E` = Sum of squares due to error

Degrees of Freedom

  • Total: `N - 1 = abn - 1`
  • Factor A: `a - 1`
  • Factor B: `b - 1`
  • Interaction: `(a - 1)(b - 1)`
  • Error: `ab(n - 1)`

Two-Way ANOVA Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| A treatments | `SS_A` | `a - 1` | `MS_A = \dfrac{SS_A}{a - 1}` | `F_0 = \dfrac{MS_A}{MS_E}` |
| B treatments | `SS_B` | `b - 1` | `MS_B = \dfrac{SS_B}{b - 1}` | `F_0 = \dfrac{MS_B}{MS_E}` |
| Interaction (A × B) | `SS_{AB}` | `(a - 1)(b - 1)` | `MS_{AB} = \dfrac{SS_{AB}}{(a - 1)(b - 1)}` | `F_0 = \dfrac{MS_{AB}}{MS_E}` |
| Error | `SS_E` | `ab(n - 1)` | `MS_E = \dfrac{SS_E}{ab(n - 1)}` | |
| Total | `SS_T` | `abn - 1` | | |
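A minimal `statsmodels` sketch for fitting the full two-factor model with interaction is shown below; the factor names and simulated responses are illustrative placeholders, not data from this article:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
a, b, n = 3, 2, 4                    # levels of A, levels of B, replications
df = pd.DataFrame(
    [(f"A{i}", f"B{j}", rng.normal(100, 6))
     for i in range(1, a + 1) for j in range(1, b + 1) for _ in range(n)],
    columns=["A", "B", "y"],
)

# 'C(A) * C(B)' expands to both main effects plus the A:B interaction
model = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```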

Interpretation

  • Two-way ANOVA compares mean differences between groups split on two independent variables (factors).
  • The primary purpose is to understand if there is an interaction effect between the two factors on the dependent variable.

Hypothesis Testing in Two-Way ANOVA

In Two-Way ANOVA, we test three hypotheses:

  1. Hypothesis for the equality of row (Factor A) treatment effects:
    Null Hypothesis: `H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0`
    Alternative Hypothesis: `H_1:` at least one `\tau_i \neq 0`
  2. Hypothesis for the equality of column (Factor B) treatment effects:
    Null Hypothesis: `H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0`
    Alternative Hypothesis: `H_1:` at least one `\beta_j \neq 0`
  3. Hypothesis for the interaction effects between Factor A and Factor B:
    Null Hypothesis: `H_0: (\tau\beta)_{ij} = 0 \text{ for all } i, j`
    Alternative Hypothesis: `H_1:` at least one `(\tau\beta)_{ij} \neq 0`

Formulas for Sum of Squares

  • Total Sum of Squares (TSS): `SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^2 - \dfrac{y_{...}^2}{abn}`
  • Sum of Squares for Factor A (Rows): `SS_A = \dfrac{1}{bn}\sum_{i=1}^{a} y_{i..}^2 - \dfrac{y_{...}^2}{abn}`
  • Sum of Squares for Factor B (Columns): `SS_B = \dfrac{1}{an}\sum_{j=1}^{b} y_{.j.}^2 - \dfrac{y_{...}^2}{abn}`
  • Sum of Squares for Interaction: `SS_{AB} = SS_{\text{subtotal}} - SS_A - SS_B`, where `SS_{\text{subtotal}} = \dfrac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij.}^2 - \dfrac{y_{...}^2}{abn}` is computed from the cell totals $y_{ij.}$
  • Error Sum of Squares: `SS_E = SS_T - SS_A - SS_B - SS_{AB}`
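These computing formulas can be checked directly in NumPy on a $y_{ijk}$ array. A small sketch with simulated data (the array shape and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n = 3, 2, 4
y = rng.normal(100, 6, size=(a, b, n))
N = a * b * n
cf = y.sum() ** 2 / N                       # correction factor y_...^2 / abn

ss_t = (y ** 2).sum() - cf
ss_a = (y.sum(axis=(1, 2)) ** 2).sum() / (b * n) - cf
ss_b = (y.sum(axis=(0, 2)) ** 2).sum() / (a * n) - cf
ss_subtotal = (y.sum(axis=2) ** 2).sum() / n - cf   # from cell totals y_{ij.}
ss_ab = ss_subtotal - ss_a - ss_b
ss_e = ss_t - ss_a - ss_b - ss_ab
print(ss_t, ss_a + ss_b + ss_ab + ss_e)     # the two totals agree
```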

Worked Example

An engineer is studying methods to improve target detection on a radar scope. The factors are:

  • Ground Clutter (A): Low, Medium, High
  • Filter Type (B): Type-1, Type-2

Data Table

| Ground Clutter | Filter Type-1 | Filter Type-2 |
|---|---|---|
| Low | 90, 96, 108, 98 | 66, 84, 92, 94 |
| Medium | 102, 105, 106, 109 | 92, 98, 91, 95 |
| High | 114, 112, 108, 109 | 93, 91, 95, 83 |

Step-by-Step Calculations

Total Sum of Squares:

`SS_T = \sum y_{ijk}^2 - \dfrac{y_{...}^2}{abn} = 1985.33`

Sum of Squares for Ground Clutter (A):

`SS_A = \dfrac{1}{bn}\sum y_{i..}^2 - \dfrac{y_{...}^2}{abn} = 353.083`

Sum of Squares for Filter Type (B):

`SS_B = \dfrac{1}{an}\sum y_{.j.}^2 - \dfrac{y_{...}^2}{abn} = 937.5`

Sum of Squares for Interaction:

`SS_{AB} = SS_{\text{subtotal}} - SS_A - SS_B = 81.25`

Error Sum of Squares:

`SS_E = SS_T - SS_A - SS_B - SS_{AB} = 613.5`

ANOVA Summary Table

| Source of Variation | SS | df | MS | $F_0$ | Decision |
|---|---|---|---|---|---|
| Ground Clutter | 353.083 | 2 | 176.54 | 5.18 | Significant |
| Filter Type | 937.5 | 1 | 937.5 | 27.5 | Significant |
| Interaction | 81.25 | 2 | 40.62 | 1.19 | Not Significant |
| Error | 613.5 | 18 | 34.08 | | |
| Total | 1985.33 | 23 | | | |
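The decisions in the table can be checked against critical values of the $F$ distribution. A short scipy sketch using the $F_0$ values and degrees of freedom above (assuming $\alpha = 0.05$):

```python
from scipy import stats

alpha = 0.05
tests = {                      # F0, numerator df, denominator df
    "Ground Clutter": (5.18, 2, 18),
    "Filter Type": (27.5, 1, 18),
    "Interaction": (1.19, 2, 18),
}
for name, (f0, df1, df2) in tests.items():
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    p = stats.f.sf(f0, df1, df2)
    print(f"{name}: F0={f0}, Fcrit={f_crit:.2f}, p={p:.4f}")
# F_{0.05,2,18} ≈ 3.55 and F_{0.05,1,18} ≈ 4.41, so the two main effects
# exceed their critical values while the interaction does not.
```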

Interpretation of Results

  • Ground Clutter (p < 0.05): Significant effect on detection ability.
  • Filter Type (p < 0.05): Significant effect on detection ability.
  • Interaction (p > 0.05): No significant interaction between Ground Clutter and Filter Type.
