You only need to understand two or three new concepts if you have read the ANOVA Part-1 article. In a two-way ANOVA we use two categorical columns (factors) instead of one, so the data has two categorical columns and one continuous column. We examine the effect of each factor on the continuous column individually, as well as their combined effect (the interaction). As a result, there are three null hypotheses to reject or retain, and each null hypothesis has a corresponding alternative hypothesis.

For each term we derive, we work through a solved example in Python and in mathematical notation.

x	20	15	21	14	5	9	16	13	6	11	10	17	18	7	19	8	22	12
y	1	1	1	1	1	1	2	2	2	2	2	2	3	3	3	3	3	3
z	a	a	a	b	b	b	a	a	a	b	b	b	a	a	a	b	b	b

This is the dataset we’ll use to demonstrate two-way ANOVA.

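This dataset has the following structure:

The x column holds the continuous variable whose mean we compare across groups.
The y column is a categorical factor with categories 1, 2, and 3, six rows each.
The z column is a categorical factor with categories a and b, nine rows each.

In total, we have 18 rows and 3 columns. Each (y, z) combination appears exactly three times, so the design is balanced.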
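
Whether every (y, z) combination has the same number of rows matters later, because some shortcuts in this walkthrough assume a balanced design. A quick way to check is a table of counts; this is a minimal sketch using pandas.crosstab on the dataset above.

```python
import pandas as pd

# The dataset from the table above
df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

# Count the rows in every (y, z) combination
counts = pd.crosstab(df['y'], df['z'])
print(counts)

# The design is balanced when every cell holds the same number of rows
balanced = counts.to_numpy().min() == counts.to_numpy().max()
print("balanced:", balanced)
```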

Hypothesis: 1

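H0 : y has no effect on x (the mean of x is the same for every category of y).

Ha : y has an effect on x (the mean of x is not the same for all categories of y).

Hypothesis: 2

H0 : z has no effect on x (the mean of x is the same for both categories of z).

Ha : z has an effect on x (the mean of x differs between categories a and b).

Hypothesis: 3

H0 : There is no interaction between y and z; the effect of y on x does not depend on z.

Ha : There is an interaction between y and z; the effect of y on x depends on z.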
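
The three hypotheses map onto the terms of the standard two-way ANOVA model. In the notation below (introduced here for illustration; it does not appear in the original article), x_{ijk} is the k-th observation in y-category i and z-category j, μ is the overall mean, α_i and β_j are the effects of y and z, (αβ)_{ij} is their interaction, and ε_{ijk} is random error. Each alternative hypothesis states that at least one of the listed terms is nonzero.

```latex
\begin{aligned}
&x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
  \qquad i \in \{1, 2, 3\},\; j \in \{a, b\},\; k \in \{1, 2, 3\} \\
&\text{Hypothesis 1: } H_0:\ \alpha_1 = \alpha_2 = \alpha_3 = 0 \\
&\text{Hypothesis 2: } H_0:\ \beta_a = \beta_b = 0 \\
&\text{Hypothesis 3: } H_0:\ (\alpha\beta)_{ij} = 0 \ \text{for every } (i, j)
\end{aligned}
```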

Data Preparation

Let’s begin by entering the data into a program.

Python

# To handle mathematical operations
import numpy as np
# To handle data operations
import pandas as pd
# To look up critical F values
import scipy.stats

# Data: the x values from the table above (originally generated at random)
x=[20,15,21,14,5,9,16,13,6,11,10,17,18,7,19,8,22,12]
anov={'x':x,
      'y':[1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3],
      'z':['a','a','a','b','b','b','a','a','a','b','b','b','a','a','a','b','b','b']
      }
df=pd.DataFrame(anov)
print(df)

Output

     x  y  z
0   20  1  a
1   15  1  a
2   21  1  a
3   14  1  b
4    5  1  b
5    9  1  b
6   16  2  a
7   13  2  a
8    6  2  a
9   11  2  b
10  10  2  b
11  17  2  b
12  18  3  a
13   7  3  a
14  19  3  a
15   8  3  b
16  22  3  b
17  12  3  b


Mean Table

Compute the mean of x for each combination of y and z categories.

Python

# Mean of x for every (y, z) combination
av_tab=df.groupby(['y','z'])['x'].mean().reset_index()
print(av_tab)

Output

   y  z          x
0  1  a  18.666667
1  1  b   9.333333
2  2  a  11.666667
3  2  b  12.666667
4  3  a  14.666667
5  3  b  14.000000

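The same cell means are often easier to read as a grid, with y as rows and z as columns. This is a minimal sketch using pandas.pivot_table; margins=True adds an "All" row and column holding the mean of x over the raw rows of each category, which match the y and z means computed in the next step.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

# Rows = y categories, columns = z categories, values = mean of x in each cell
grid = pd.pivot_table(df, index='y', columns='z', values='x',
                      aggfunc='mean', margins=True)
print(grid.round(4))
```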

Individual Means

Next, compute the mean of x for each category of y and for each category of z separately.

Y Mean

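The mean for each category of y is the ordinary average of x over the rows in that category. In Python we use group-by; here we average the cell means from av_tab, which gives the same result because every (y, z) combination has the same number of rows.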

Python

y_mean=av_tab.groupby('y')['x'].mean().reset_index()
print(y_mean)

Output

   y          x
0  1  14.000000
1  2  12.166667
2  3  14.333333

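Averaging the cell means, as the code above does, gives the same y means as averaging the raw rows only because every (y, z) cell holds the same number of rows. A quick check on this dataset, as a sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

# Route 1: average x directly over the rows in each y category
raw_means = df.groupby('y')['x'].mean()
# Route 2: average the (y, z) cell means, as in the article
cell_means = df.groupby(['y', 'z'])['x'].mean()
mean_of_means = cell_means.groupby(level='y').mean()

print(pd.DataFrame({'raw': raw_means, 'mean of cell means': mean_of_means}))
# The two routes agree here because every cell has 3 rows
print(np.allclose(raw_means, mean_of_means))
```

With unequal cell sizes the two routes generally give different numbers.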

Z Mean

The procedure is the same as for the Y mean, grouping by z instead of y.

Python

z_mean=av_tab.groupby('z')['x'].mean().reset_index()
print(z_mean)

Output

   z     x
0  a  15.0
1  b  12.0


Sum Of Squares

In ANOVA, the total variation in x is split into several sums of squares, one for each source of variation, and comparing them shows how much of that variation each factor explains.

A sum of squares is computed by subtracting a mean from each observation, squaring each difference, and adding the squares together.

The result measures how widely the observations spread around that mean; dividing it by its degrees of freedom turns it into a variance (a mean square). Analysing these sums of squares is what lets us draw conclusions about relationships in the data.
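
As a small worked example of the principle, take the three observations in the (y = 1, z = a) cell: 20, 15 and 21. This sketch computes their sum of squares around their own mean and shows that dividing by the degrees of freedom (n - 1) gives the sample variance.

```python
import numpy as np

values = np.array([20, 15, 21])     # x for the rows with y=1, z=a
mean = values.mean()                # mean of the cell
ss = ((values - mean) ** 2).sum()   # sum of squared deviations
variance = ss / (len(values) - 1)   # sum of squares / degrees of freedom

print(ss, variance, np.var(values, ddof=1))
```

This cell's sum of squares is one of the six pieces that make up the sum of squares within, computed later.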

Sum Of Squares of First Factor

To find the sum of squares of the first factor (y), subtract the grand mean from each y mean and square the difference. Each squared difference is counted once for every observation in that y category, and the results are summed.

Python

# Map each row to the mean of its y category
df['ssf']=df['y'].map(y_mean.set_index('y')['x'])
# Squared gap between each row's y mean and the grand mean, summed over all rows
SSF=((df['ssf']-df['x'].mean())**2).sum()
print(SSF)

Output

16.333333333333332
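
For reference, the same number can be worked out by hand. Writing a = 3 for the number of y categories, b = 2 for the number of z categories and n = 3 for the rows per (y, z) combination (symbols introduced here for illustration), and using the grand mean 13.5 and the y means from the table above:

```latex
\begin{aligned}
SS_Y &= b\,n \sum_{i=1}^{a} \left(\bar{x}_{i} - \bar{x}\right)^2 \\
     &= 6\left[(14 - 13.5)^2 + \left(\tfrac{73}{6} - 13.5\right)^2 + \left(\tfrac{43}{3} - 13.5\right)^2\right] \\
     &= 6\left[\tfrac{1}{4} + \tfrac{16}{9} + \tfrac{25}{36}\right]
      = 6 \cdot \tfrac{49}{18} = \tfrac{49}{3} \approx 16.333
\end{aligned}
```

The factor b·n = 6 is the number of rows in each y category, which is why the code adds each squared difference once per row.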


Sum of Squares of Second Factor

Next is the sum of squares for the second factor (z). The calculation is the same as for the first factor, except that we use the z means instead of the y means.

Python

# Map each row to the mean of its z category
df['SSS']=df['z'].map(z_mean.set_index('z')['x'])
# Squared gap between each row's z mean and the grand mean, summed over all rows
SSS=((df['SSS']-df['x'].mean())**2).sum()
print(SSS)

Output

40.5
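
Because the two main-effect sums of squares follow the same recipe, they can share one function. This is a minimal sketch; the helper name main_effect_ss is hypothetical, and it uses groupby(...).transform('mean') to place each row's group mean next to it instead of building a lookup dictionary first.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

def main_effect_ss(data, factor, response='x'):
    """Sum of squares for one factor: the squared gap between each row's
    group mean and the grand mean, summed over all rows."""
    group_mean = data.groupby(factor)[response].transform('mean')
    grand_mean = data[response].mean()
    return ((group_mean - grand_mean) ** 2).sum()

print(main_effect_ss(df, 'y'))  # first factor
print(main_effect_ss(df, 'z'))  # second factor
```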


Sum of Squares Within

The sum of squares within measures the variation inside each (y, z) combination. For every observation, subtract the mean of its (y, z) combination, square the difference, and add up all the squares.

Python

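# Label each row with its (y, z) combination, e.g. '1a'
df['yz']=df['y'].astype('str')+df['z']
# Replace each label with the mean of x in that combination
cell_means=df.groupby('yz')['x'].mean()
df['yz']=df['yz'].map(cell_means)
# Squared gap between each observation and its combination mean, summed
SSW=((df['x']-df['yz'])**2).sum()
print(SSW)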

Output

335.33333333333337
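
Breaking the same total down by cell shows where the within-group variation comes from. A sketch that computes each (y, z) combination's own sum of squares and adds them up:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

# Sum of squared deviations from the cell mean, one value per (y, z) cell
cell_ss = df.groupby(['y', 'z'])['x'].agg(lambda s: ((s - s.mean()) ** 2).sum())
print(cell_ss)
# The six cell contributions add up to the sum of squares within
print(cell_ss.sum())
```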


Sum of Squares Total

To calculate the total sum of squares, subtract the grand mean from each observation, square the difference, and sum over all observations.

Python

SST=((df['x']-df['x'].sum()/len(df))**2).sum()
print(SST)

Output

484.5
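
Since SST is the sum of squared deviations from the grand mean, it also equals (N - 1) times the sample variance of x, which gives a one-line cross-check. A sketch with NumPy:

```python
import numpy as np

x = np.array([20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12])

# Definition: squared deviations from the grand mean, summed
sst_direct = ((x - x.mean()) ** 2).sum()
# Same quantity via the sample variance (ddof=1 divides by N - 1)
sst_from_var = np.var(x, ddof=1) * (len(x) - 1)

print(sst_direct, sst_from_var)
```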


Sum of Squares of Both Factors

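The sum of squares for both factors (the interaction) can also be calculated directly from the cell means, but for ease of understanding and fewer calculations we obtain it by subtraction: in a balanced design, the four component sums of squares add up to the total sum of squares.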

Python

SSB=SST-SSF-SSS-SSW
print(SSB)

Output

92.33333333333331
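
The direct formula uses the cell means: for each (y, z) cell, take the cell mean minus its y mean minus its z mean plus the grand mean, square it, and weight it by the number of rows in the cell. This sketch assumes a balanced design (3 rows per cell here), the same assumption that makes the subtraction above valid, and should reproduce the same value.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

grand = df['x'].mean()
y_mean_row = df.groupby('y')['x'].transform('mean')            # y mean for each row
z_mean_row = df.groupby('z')['x'].transform('mean')            # z mean for each row
cell_mean_row = df.groupby(['y', 'z'])['x'].transform('mean')  # cell mean for each row

# Summing over every row weights each cell by its number of rows
ssb_direct = ((cell_mean_row - y_mean_row - z_mean_row + grand) ** 2).sum()
print(ssb_direct)
```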


Degrees of Freedom

Degrees of Freedom for Y

N_y denotes the number of categories in the y column. The degrees of freedom for y are df_y = N_y - 1.

Python

ssfdf=len(df['y'].unique())-1
print(ssfdf)

Output

2

Degrees of Freedom for Z

N_z denotes the number of categories in the z column. The degrees of freedom for z are df_z = N_z - 1.

Python

sssdf=len(df['z'].unique())-1
print(sssdf)

Output

1

Degrees of Freedom Within

To calculate the degrees of freedom for the sum of squares within, subtract the number of unique (y, z) combinations from the total number of observations:

df_W = N - (N_y × N_z)

N is the total number of observations (18).

N_y is the number of categories of y (3), and N_z is the number of categories of z (2).

N_y × N_z is the number of unique (y, z) combinations (6).

Python

# Total number of observations minus the number of unique (y, z) combinations
sswdf=len(df) - (df['y'].nunique()*df['z'].nunique())
print(sswdf)

Output

12

Degrees of Freedom for Both Factors

To get the degrees of freedom for both factors (the interaction), df_B, multiply the degrees of freedom of y and z: df_B = df_y × df_z.

Python

ssbdf=ssfdf*sssdf
print(ssbdf)

Output

2

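Degrees of Freedom for the Total Sum of Squares

The total degrees of freedom, df_T, is the sum of all the other degrees of freedom. It must also equal N - 1 (here 18 - 1 = 17), so comparing the two is a useful check for mistakes in the earlier calculations.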

Python

sstdf= sssdf + ssfdf + sswdf + ssbdf
# Sanity check: the components must add up to N - 1
assert sstdf == len(df) - 1
print(sstdf)

Output

17

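All five degrees of freedom follow from three counts: the number of y categories, the number of z categories, and the number of rows. A compact sketch that derives them together and checks that the parts add up to N - 1 (the variable names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})

n_y = df['y'].nunique()   # number of y categories
n_z = df['z'].nunique()   # number of z categories
n = len(df)               # number of observations

dof = {
    'Y': n_y - 1,
    'Z': n_z - 1,
    'Both': (n_y - 1) * (n_z - 1),
    'Within': n - n_y * n_z,
    'Total': n - 1,
}
# The component degrees of freedom must add up to the total
assert sum(v for k, v in dof.items() if k != 'Total') == dof['Total']
print(dof)
```
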
Values Table

Now that we have the sums of squares and degrees of freedom, let’s record them in a table and fill in the remaining columns (mean square, F score, critical F value, and the hypothesis decision) one step at a time.

Python

final_table=pd.DataFrame({
    'Sum of Squares':[SSF,SSS,SSB,SSW,SST],
    'Degree of Freedom':[ssfdf,sssdf,ssbdf,sswdf,sstdf],
    # Placeholders, filled in by the steps below
    'Mean Square':[np.nan]*5,
    'F score':[np.nan]*5,
    'F Value':[np.nan]*5,
    # Object column so it can hold True/False later
    'H0':[None]*5},
    index=['Sum of Squares Y','Sum of Squares Z','Sum of Squares Both','Sum of Squares Within','Sum of Squares Total'])
print(final_table)

                       Sum of Squares  Degree of Freedom
Sum of Squares Y            16.333333                  2
Sum of Squares Z            40.500000                  1
Sum of Squares Both         92.333333                  2
Sum of Squares Within      335.333333                 12
Sum of Squares Total       484.500000                 17

Mean Square

Calculate the mean square for each source of variation by dividing its sum of squares by its degrees of freedom. The total row is left empty because it is not needed for the F test.

Python

# Mean square = sum of squares / degrees of freedom (every row except Total)
final_table['Mean Square']=final_table.loc[:'Sum of Squares Within','Sum of Squares']/final_table['Degree of Freedom']
print(final_table)

                       Sum of Squares  Degree of Freedom  Mean Square
Sum of Squares Y            16.333333                  2     8.166667
Sum of Squares Z            40.500000                  1    40.500000
Sum of Squares Both         92.333333                  2    46.166667
Sum of Squares Within      335.333333                 12    27.944444
Sum of Squares Total       484.500000                 17

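Written out with the values above (rounded), each mean square is its sum of squares divided by its degrees of freedom:

```latex
\begin{aligned}
MS_Y &= \frac{SS_Y}{df_Y} = \frac{16.333}{2} \approx 8.167 \\
MS_Z &= \frac{SS_Z}{df_Z} = \frac{40.5}{1} = 40.5 \\
MS_B &= \frac{SS_B}{df_B} = \frac{92.333}{2} \approx 46.167 \\
MS_W &= \frac{SS_W}{df_W} = \frac{335.333}{12} \approx 27.944
\end{aligned}
```
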
F Score

Determine the F score. We find the F score for y, z, and their interaction by dividing each of their mean squares by the mean square within.

Python

# F score = mean square of the effect / mean square within
final_table['F score']=final_table.loc[:'Sum of Squares Both','Mean Square']/final_table.loc['Sum of Squares Within','Mean Square']
print(final_table)

                       Sum of Squares  Degree of Freedom  Mean Square   F score
Sum of Squares Y            16.333333                  2     8.166667  0.292247
Sum of Squares Z            40.500000                  1    40.500000  1.449304
Sum of Squares Both         92.333333                  2    46.166667  1.652087
Sum of Squares Within      335.333333                 12    27.944444
Sum of Squares Total       484.500000                 17

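The three F scores, written out: each effect's mean square is divided by the within mean square. Under H0 both estimate the same random variation, so an F score near 1 is roughly what one would expect if the effect were absent.

```latex
\begin{aligned}
F_Y &= \frac{MS_Y}{MS_W} = \frac{8.167}{27.944} \approx 0.292 \\
F_Z &= \frac{MS_Z}{MS_W} = \frac{40.5}{27.944} \approx 1.449 \\
F_B &= \frac{MS_B}{MS_W} = \frac{46.167}{27.944} \approx 1.652
\end{aligned}
```
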
F value and Hypothesis Testing

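Now find the critical F value for each effect, compare it with the F score, and decide whether to reject each null hypothesis. The critical value can be looked up manually in an F-distribution table, but scipy.stats.f.isf computes it directly from the significance level (0.05 here) and the numerator and denominator degrees of freedom, which makes the process easy to automate. If an F score is below its critical value, H0 is retained (True in the H0 column); otherwise it is rejected.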

Python

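# Make sure the H0 column can hold True/False values
final_table['H0']=final_table['H0'].astype(object)
# Denominator df for every F test: the within-groups degrees of freedom
denominator=final_table.loc['Sum of Squares Within','Degree of Freedom']
for i in final_table.index[:3]:
    numerator=final_table.loc[i,'Degree of Freedom']
    # Critical F at the 5% level (isf is the inverse survival function)
    final_table.loc[i,'F Value']=scipy.stats.f.isf(0.05, numerator, denominator)
    # H0 is retained (True) when the F score is below the critical value
    final_table.loc[i,'H0']=final_table.loc[i,'F score'] < final_table.loc[i,'F Value']
print(final_table)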

                       Sum of Squares  Degree of Freedom  Mean Square   F score   F Value    H0
Sum of Squares Y            16.333333                  2     8.166667  0.292247  3.885294  True
Sum of Squares Z            40.500000                  1    40.500000  1.449304  4.747225  True
Sum of Squares Both         92.333333                  2    46.166667  1.652087  3.885294  True
Sum of Squares Within      335.333333                 12    27.944444
Sum of Squares Total       484.500000                 17

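An equivalent way to reach the same decision is to compute a p-value, the probability of an F score at least this large if H0 were true, and compare it with 0.05. A sketch using scipy.stats.f.sf (the survival function, the counterpart of the isf call above), with the sums of squares and degrees of freedom taken from the final table:

```python
from scipy import stats

# Values from the final table above
ss_within, df_within = 335.333333, 12
ms_within = ss_within / df_within
effects = {'Y': (16.333333, 2), 'Z': (40.5, 1), 'Both': (92.333333, 2)}

p_values = {}
for name, (ss, dof) in effects.items():
    f_score = (ss / dof) / ms_within
    # Probability of an F score at least this large when H0 is true
    p_values[name] = stats.f.sf(f_score, dof, df_within)
    decision = 'reject H0' if p_values[name] < 0.05 else 'fail to reject H0'
    print(name, round(f_score, 6), round(p_values[name], 4), decision)
```
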
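All three F scores fall below their critical values, so none of the three null hypotheses is rejected at the 5% significance level. Because the x values were generated at random, we would not expect any real relationship between x, y, and z, so this conclusion is plausible.

There is no significant evidence that y affects x.

There is no significant evidence that z affects x.

There is no significant evidence of an interaction between y and z.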

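You can use the code above to perform a two-way ANOVA on other data, provided you rename the two categorical columns to y and z and the continuous column to x. Keep in mind that the shortcuts used here (averaging cell means to get the y and z means, and finding the interaction sum of squares by subtraction) assume a balanced design, with the same number of rows in every (y, z) combination.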
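
The steps can also be wrapped in a single function. This is a sketch rather than the article's code: the name two_way_anova and its parameters are hypothetical, it uses only pandas and NumPy (no critical F values), and like the walkthrough it assumes a balanced design, where the four component sums of squares add up to the total.

```python
import numpy as np
import pandas as pd

def two_way_anova(data, response, factor_a, factor_b):
    """Two-way ANOVA table (balanced design assumed) with SS, df, MS and F."""
    x = data[response]
    grand = x.mean()
    # Per-row group means for each factor and for each (a, b) cell
    mean_a = data.groupby(factor_a)[response].transform('mean')
    mean_b = data.groupby(factor_b)[response].transform('mean')
    mean_ab = data.groupby([factor_a, factor_b])[response].transform('mean')

    n_a = data[factor_a].nunique()
    n_b = data[factor_b].nunique()
    n = len(data)

    table = pd.DataFrame({
        'Sum of Squares': [
            ((mean_a - grand) ** 2).sum(),
            ((mean_b - grand) ** 2).sum(),
            ((mean_ab - mean_a - mean_b + grand) ** 2).sum(),
            ((x - mean_ab) ** 2).sum(),
            ((x - grand) ** 2).sum(),
        ],
        'Degree of Freedom': [
            n_a - 1, n_b - 1, (n_a - 1) * (n_b - 1), n - n_a * n_b, n - 1,
        ],
    }, index=[factor_a, factor_b, 'Both', 'Within', 'Total'])

    # Mean square for every row except Total
    table['Mean Square'] = table['Sum of Squares'] / table['Degree of Freedom']
    table.loc['Total', 'Mean Square'] = np.nan
    # F score for each effect: its mean square over the within mean square
    effects = [factor_a, factor_b, 'Both']
    table['F score'] = table.loc[effects, 'Mean Square'] / table.loc['Within', 'Mean Square']
    return table

df = pd.DataFrame({
    'x': [20, 15, 21, 14, 5, 9, 16, 13, 6, 11, 10, 17, 18, 7, 19, 8, 22, 12],
    'y': [1] * 6 + [2] * 6 + [3] * 6,
    'z': (['a'] * 3 + ['b'] * 3) * 3,
})
result = two_way_anova(df, 'x', 'y', 'z')
print(result.round(6))
```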
