Interquartile Range

The interquartile range (IQR) is the difference between the third and first quartiles of a series of numbers. Quartiles are the three points that split a sorted series into four equal parts.

What are statistical ranges? 

A range measures how widely the data points in a series are spread: it is the difference between the largest and smallest observations. This topic is closely related to variance and standard deviation.

Student ID    Score
1             5
2             25
3             37
4             38
5             42
6             48

Range = highest score – lowest score = 48 – 5 = 43
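
The range calculation can be sketched in Python. This is a minimal illustration that assumes the table above lists the scores 5, 25, 37, 38, 42 and 48; the function name value_range is made up for this example.

```python
def value_range(values):
    """Return the range: the largest value minus the smallest value."""
    values = list(values)
    if not values:
        raise ValueError("value_range() needs at least one value")
    return max(values) - min(values)


# Scores assumed from the table above (Student IDs 1 to 6).
scores = [5, 25, 37, 38, 42, 48]
print(value_range(scores))
```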

What are Interquartile Ranges?

In layman’s terms, the interquartile range is the difference between the third and first quartiles, that is, the spread of the middle half of the data.

Quartiles split a sorted series into four parts. We first divide the series into two halves at the median, and then divide each half in two again at its own median. This gives three cut points: one in the middle of the whole series and one in the middle of each half.

These points are called the first quartile Q1 (the lower quartile), the second quartile Q2 (the median), and the third quartile Q3 (the upper quartile).

Formula for IQR (Inter Quartile Range)

IQR = Q3 – Q1

Series

1       2       3       4       5       6       7       8       9       10       11

Lower half

1       2       3       4       5

Upper half

7       8       9       10       11

How do we calculate IQR?

We can calculate IQR with the following formula.

IQR = Q3 – Q1

Q1 = 3, Q2 = 6, Q3 = 9 in the series

1       2       3       4       5       6       7       8       9       10       11

IQR = 9 – 3 = 6

That seems easy enough. But we need a proper method to find these points in a series.

There are several methods. The common first step is to arrange the series in low to high order.

Method 1: Median Inclusion

Even Case

If the number of items in the series is even, use the two middle numbers to find the median of the series.

1       2       3       4       5       6       7       8

Q2 = (4+5)/2 = 4.5

Divide the series in half along the median. The first part is the lower half and the second part is the upper half.

Put the median at the last place of the lower half and at the first place of the upper half.

Lower half = 1       2       3       4       4.5

Upper half = 4.5       5       6       7       8

Find the median of both parts.

The median of the lower half is Q1.

Q1 = 3

The median of the upper half is Q3.

Q3 = 6

IQR = 6 – 3 = 3

Odd Case

If the number of items in the series is odd, the middle number is the median of the series.

1       2       3      4       5       6       7

Q2 = 4

Divide the series in half along the median. The first part is the lower half and the second part is the upper half.

Put the median at the last place of the lower half and at the first place of the upper half.

Lower half = 1       2       3       4

Upper half = 4       5       6       7

Find the median of both parts.

The median of the lower half is Q1.

Q1 = (2+3)/2 = 2.5

The median of the upper half is Q3.

Q3 = (5+6)/2 = 5.5

IQR = 5.5 – 2.5 = 3
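
The median-inclusion steps can be sketched in Python as follows. This is only an illustration of the procedure as described in this post: the function name iqr_median_inclusion is made up, and in the even case it inserts the computed median into both halves, as the text says. Other references include the median only when the number of items is odd, so results for even-length series may differ from other tools.

```python
from statistics import median


def iqr_median_inclusion(values):
    """Return (Q1, Q3, IQR) with the median included in both halves."""
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    q2 = median(data)
    half = len(data) // 2
    if len(data) % 2:
        # Odd length: the median is the item data[half]; keep it in both halves.
        lower = data[:half + 1]
        upper = data[half:]
    else:
        # Even length: the median is not an item, so add it to both halves.
        lower = data[:half] + [q2]
        upper = [q2] + data[half:]
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1
```

For the two series used above (1 to 8 and 1 to 7) this sketch returns an IQR of 3 in both cases.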

Method 2: Median Exclusion

Even Case

  • If the number of items in the series is even, use the two middle numbers to find the median of the series.

1       2       3       4       5       6       7       8

  • Divide the series in half along the median. The first part is the lower half and the second part is the upper half.
    • Lower half = 1       2       3       4
    • Upper half = 5       6       7       8
  • Find the median of both parts.
    • The median of the lower half is Q1
      • Q1 = (2+3)/2 = 2.5
    • The median of the upper half is Q3
      • Q3 = (6+7)/2 = 6.5

IQR = 6.5 – 2.5 = 4.0

Odd Case

  • Otherwise, if the number of items in the series is odd, use the middle number as the median.

1       2       3      4       5       6       7

  • Divide the series in half along the median, leaving the median itself out. The first part is the lower half and the second part is the upper half.
    • Lower half = 1       2       3
    • Upper half = 5       6       7
  • Find the median of both parts.
    • The median of the lower half is Q1
      • Q1 = 2
    • The median of the upper half is Q3
      • Q3 = 6

IQR = Q3 – Q1 = 6 – 2 = 4
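
The median-exclusion steps can be sketched in the same way. Again this is a minimal illustration, not a reference implementation; the function name iqr_median_exclusion is made up for this example.

```python
from statistics import median


def iqr_median_exclusion(values):
    """Return (Q1, Q3, IQR) with the median left out of both halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]
    if len(data) % 2:
        # Odd length: skip the middle item, which is the median.
        upper = data[half + 1:]
    else:
        # Even length: the series splits cleanly into two halves.
        upper = data[half:]
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1
```

For the series 1 to 8 and 1 to 7 this sketch returns an IQR of 4, matching the worked examples.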

You may have noticed that, within each method, the even and odd examples give the same IQR. That is because the example data is evenly spread: the difference between consecutive numbers is always 1.

The procedure for determining the median is the same for any sorted series.

  1. If the series has an odd number of items, the middle number is the median.
  2. If the series has an even number of items, the average of the two middle numbers is the median.
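
These two rules translate directly into code. The sketch below uses a made-up function name, median_of; Python's standard library function statistics.median follows the same rules and is the more practical choice.

```python
def median_of(values):
    """Return the median of a series, following the two rules above."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median_of() needs at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the middle item is the median.
        return data[mid]
    # Even number of items: average the two middle items.
    return (data[mid - 1] + data[mid]) / 2
```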

These are the most basic methods of median inclusion and exclusion.

Let us see some examples regarding these methods with data.

Examples for Method 1 and Method 2

  • Let’s go over the general process of finding quartiles first.

11       4       6       7       1       2       3       5       10       8       9

  • First, sort the series from low to high.

1       2       3       4       5       6       7       8       9       10       11

  • Divide the series in half around the median.

Lower half:  1       2       3       4       5

Median:  6

Upper half:  7       8       9       10       11

If the number of items in a series is odd, you may decide to remove the median, which sits at the centre of the series, from both halves.
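
The general process of sorting and splitting can be sketched as below. The helper name split_around_median is made up for this example, and it returns the middle item separately so that either method can decide what to do with it.

```python
def split_around_median(values):
    """Sort the series and return (lower half, middle items, upper half).

    The middle list holds the median item for an odd-length series
    and is empty for an even-length series.
    """
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2:
        return data[:half], [data[half]], data[half + 1:]
    return data[:half], [], data[half:]


series = [11, 4, 6, 7, 1, 2, 3, 5, 10, 8, 9]
lower, middle, upper = split_around_median(series)
print(lower, middle, upper)
```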

After completing the general process, let’s start the different methods.

Method 1 example

Alternatively, we can include the median at the end of the first half and at the beginning of the second half.

1       2       3       4       5       6

6       7       8       9       10       11

Q1 = (3+4)/2 = 3.5

Q3 = (8+9)/2 = 8.5

IQR = Q3 – Q1 = 8.5 – 3.5 = 5.0

Method 2 example

1       2       3       4       5       6       7       8       9       10       11

Median = 6

Exclude the median 6 from both halves:

1       2       3       4       5

7       8       9       10       11

Q1 = 3

Q3 = 9

IQR = Q3 – Q1 = 9 – 3 = 6

Method 3

Method 3 takes a more general, formula-based approach. Suppose the sorted series is X1, X2, …, Xk, where k is the number of items.

The first step is the same as in Method 1 and Method 2: arrange the series in low to high order.

Next, check whether the number of items k has the form 4n+1 or 4n+3, where n is a natural number.

For example, if you have a series with 9 items, then

4n+1 = k -> 4n+1 = 9 -> 4n = 9 – 1 -> n = 8/4 -> n = 2

and if you have a series with 11 items, then

4n+3 = k -> 4n+3 = 11 -> 4n = 11 – 3 -> n = 8/4 -> n = 2

so in both cases we get n = 2.

In the formulas below, X(n+1) denotes the item at position n+1 of the sorted series, and k is the number of items in the series.

If k = 4n+1:

  1. Q1 is ( 0.25 × Xn ) + ( 0.75 × X(n+1) )
  2. Q3 is ( 0.75 × X(3n+1) ) + ( 0.25 × X(3n+2) )

If k = 4n+3:

  1. Q1 is ( 0.75 × X(n+1) ) + ( 0.25 × X(n+2) )
  2. Q3 is ( 0.25 × X(3n+2) ) + ( 0.75 × X(3n+3) )

Note:

In the formulas above we take 25% and 75% of a value. We write these as the decimals 0.25 and 0.75, which is equivalent.

We could also write these weights as the fractions 1/4 and 3/4, but that would not drive home the point that we are taking a percentage of each number.
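
The Method 3 formulas can be sketched in Python as follows. The function name quartiles_method3 and the inner helper x are made up for this example; the helper exists only so that positions can be written 1-based, as in the formulas. The sketch rejects series whose length is not of the form 4n+1 or 4n+3, because the text gives no formula for those cases.

```python
def quartiles_method3(values):
    """Return (Q1, Q3, IQR) using the 4n+1 / 4n+3 formulas above."""
    data = sorted(values)
    k = len(data)
    n, remainder = divmod(k, 4)

    def x(position):
        # 1-based access, so x(1) is the smallest item.
        return data[position - 1]

    if remainder == 1 and n >= 1:
        # k = 4n + 1
        q1 = 0.25 * x(n) + 0.75 * x(n + 1)
        q3 = 0.75 * x(3 * n + 1) + 0.25 * x(3 * n + 2)
    elif remainder == 3:
        # k = 4n + 3
        q1 = 0.75 * x(n + 1) + 0.25 * x(n + 2)
        q3 = 0.25 * x(3 * n + 2) + 0.75 * x(3 * n + 3)
    else:
        raise ValueError("series length must be 4n+1 or 4n+3")
    return q1, q3, q3 - q1
```

Called on a 9-item series it takes the 4n+1 branch, and on an 11-item series it takes the 4n+3 branch, both with n = 2.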

Method 3 Example

Let’s use an example with 4n+1 items in the series.

15       36       39       40       41       42       43       47       49

Let’s solve it by following the steps above.

Step 1

k=9 

4n+1 = (4 × 2) + 1 = 9

n = 2

Step 2

  • Q1 is ( 0.25 × Xn ) + ( 0.75 × X(n+1) )
    • Q1 = ( 0.25 × X2 ) + ( 0.75 × X3 )
    • Q1 = ( 0.25 × 36 ) + ( 0.75 × 39 )
    • Q1 = 9 + 29.25 = 38.25
  • Q3 is ( 0.75 × X(3n+1) ) + ( 0.25 × X(3n+2) )
    • Q3 = ( 0.75 × X7 ) + ( 0.25 × X8 )
    • Q3 = ( 0.75 × 43 ) + ( 0.25 × 47 )
    • Q3 = 32.25 + 11.75 = 44

Final step

IQR = Q3 – Q1 = 44 – 38.25 = 5.75

Let’s use an example with 4n+3 = k items.

6       17       17       19       22       22       26       31       35       35       38

We will solve this problem by following the method described above.

Step 1

k=11 

4n+3=11

n=2

Step 2: calculate Q1
  • Q1 is ( 0.75 × X(n+1) ) + ( 0.25 × X(n+2) )
    • Q1 = ( 0.75 × X3 ) + ( 0.25 × X4 )
    • Q1 = ( 0.75 × 17 ) + ( 0.25 × 19 )
    • Q1 = 12.75 + 4.75 = 17.5
Step 3: calculate Q3
  • Q3 is ( 0.25 × X(3n+2) ) + ( 0.75 × X(3n+3) )
    • Q3 = ( 0.25 × X8 ) + ( 0.75 × X9 )
    • Q3 = ( 0.25 × 31 ) + ( 0.75 × 35 )
    • Q3 = 7.75 + 26.25 = 34

IQR = Q3 – Q1 = 34 – 17.5 = 16.5

Method 4

  • The quartiles Q1, Q2, Q3 are so-called empirical quantiles. So, we calculate the empirical quantiles Q1 (25%), Q2 (50%), Q3 (75%).

The following formula is used to calculate any quantile from a sorted series X1, X2, X3, …, Xn.

Qn(p) = Xk + α ( X(k+1) – Xk )

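  • Two values must be calculated before applying this formula: k and α.
  • The formulae for them are:

k = integer( p × ( n + 1 ) ), that is, the integer part of p × ( n + 1 )

α = p × ( n + 1 ) − k

Here p denotes the quantile as a fraction, e.g. p = 0.25 for Q1 (25%), and n is the number of items in the series.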
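
The formula can be sketched in Python as follows. This is a minimal illustration under a few assumptions: the function name empirical_quantile is made up, the interpolation step is written in the usual linear form Xk + α ( X(k+1) – Xk ), and values of p that would need an item before X1 or after Xn are rejected rather than guessed.

```python
import math


def empirical_quantile(values, p):
    """Return the quantile at fraction p (0 < p < 1) of a series."""
    data = sorted(values)
    n = len(data)
    position = p * (n + 1)
    k = math.floor(position)   # integer part
    alpha = position - k       # fractional part
    if k < 1 or k > n or (k == n and alpha > 0):
        raise ValueError("p is too close to 0 or 1 for this series length")
    if alpha == 0:
        # The quantile falls exactly on item Xk (1-based).
        return data[k - 1]
    # Interpolate between Xk and X(k+1); data is 0-based, so Xk is data[k - 1].
    return data[k - 1] + alpha * (data[k] - data[k - 1])


def iqr_method4(values):
    """Return Q3 minus Q1 using empirical_quantile."""
    return empirical_quantile(values, 0.75) - empirical_quantile(values, 0.25)
```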

Method 4 Example

Series

6       17       17       19       22       22       26       31       35       35       38

n=11

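Calculate quartile 1

Q1( 0.25 ) = Xk + α ( X(k+1) – Xk )

k = integer( p × ( n + 1 ) ) = integer( 0.25 × ( 11 + 1 ) ) = 3

α = p × ( n + 1 ) − k = 0.25 × ( 11 + 1 ) – 3 = 0

We have k = 3, α = 0

Q1( 0.25 ) = X3 + 0 × ( X(3+1) – X3 ) = 17 + 0 × ( 19 – 17 ) = 17

Q1( 0.25 ) = 17

Calculate quartile 2

k = integer( 0.50 × ( 11 + 1 ) ) = 6

α = 0.50 × ( 11 + 1 ) – 6 = 0

We have k = 6, α = 0

Q2( 0.50 ) = X6 + 0 × ( X(6+1) – X6 ) = 22 + 0 × ( 26 – 22 ) = 22

Q2( 0.50 ) = 22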

Calculate quartile 3

Q3( 0.75 ) = Xk + α ( X(k+1) – Xk )

We need k and α.

k = integer( p × ( n + 1 ) ) = integer( 0.75 × ( 11 + 1 ) ) = 9

α = p × ( n + 1 ) − k = 0.75 × ( 11 + 1 ) – 9 = 0

We have k = 9, α = 0

Q3( 0.75 ) = X9 + 0 × ( X(9+1) – X9 ) = 35 + 0 × ( 35 – 35 ) = 35

Q3( 0.75 ) = 35

We have Q3 = 35, Q1 = 17

IQR = Q3 − Q1 = 35 – 17 = 18

IQR = 18
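
As a cross-check, Python's standard library offers statistics.quantiles (Python 3.8 and later). Its "exclusive" method also places quantiles at position p × ( n + 1 ), so for this series it agrees with the hand calculation. Treat this as a sanity check on this example rather than a claim that the two agree in every edge case.

```python
import statistics

series = [6, 17, 17, 19, 22, 22, 26, 31, 35, 35, 38]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(series, n=4, method="exclusive")
print(q1, q2, q3)  # 17.0 22.0 35.0
print(q3 - q1)     # 18.0
```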

Application:

IQR can be used to analyse how the middle half of the data is spread around its median. If the IQR is small, the central values of the dataset are densely packed. If the IQR is large, we can conclude that the dataset is more widely spread.

Identifying Outliers

Since we know multiple methods to find IQR, let’s also learn how to use IQR in data science.

IQR can be used to find the outliers of a dataset.

We need Q1, Q3 and the IQR for this. We create a range that covers the typical data, and any point that falls outside that range is considered an outlier.

Range = [ Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5) ]

This range acts as a fence around the typical values of the dataset.

Let’s use the first dataset from Method 3 as an example.

15       36       39       40       41       42       43       47       49

Q1 = ( 0.25 × 36 ) + ( 0.75 × 39 ) = 38.25, Q3 = 44, IQR = 44 – 38.25 = 5.75

Fence = [Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5)] = [38.25 − (5.75 × 1.5) , 44 + (5.75 × 1.5)]

Fence = [29.625 , 52.625]
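
The fence rule can be sketched in Python as below. The function names iqr_fence and find_outliers are made up for this example, and the quartiles are passed in as arguments so that any of the four methods can supply them.

```python
def iqr_fence(q1, q3, factor=1.5):
    """Return the (low, high) fence built from Q1, Q3 and the IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def find_outliers(values, q1, q3, factor=1.5):
    """Return the values that fall outside the fence."""
    low, high = iqr_fence(q1, q3, factor)
    return [v for v in values if v < low or v > high]
```

Passing the nine-item dataset above together with its Q1 and Q3 flags 15 as the only value outside the fence.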

According to the fence we created, 15 is the only outlier in this dataset. Let’s use a box plot from plotly to check this.

The fences in a plotly box plot are calculated with a different method, so they may differ from our result.

Conclusion:

We learned how to calculate IQR with 4 methods.

Methods 3 and 4 lean more heavily on formulas, but they are easy to turn into a program in Python or any other language.

Methods 1 and 2 are easier to follow and carry out visually, so they are best for direct explanations and quick solutions by hand.
