Interquartile Range

The interquartile range (IQR) is the difference between the third quartile and the first quartile of a series of numbers. Quartiles are the points that split a sorted series into four equal parts.

What are statistical ranges?

A range measures how widely the data points in a series are spread. It is the difference between the two ends of the observations: the largest value minus the smallest value. This topic is closely related to variance and standard deviation.

Student ID      Score
1               5
2               25
3               37
4               38
5               42
6               48

Range = largest score - smallest score = 48 - 5 = 43
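The range calculation can be checked with a few lines of Python. This is only a minimal sketch; the list name `scores` is our own choice and simply holds the Score column of the table above.

```python
# Scores of the six students in the table above.
scores = [5, 25, 37, 38, 42, 48]

# Range = largest value minus smallest value.
score_range = max(scores) - min(scores)
print(score_range)  # 43
```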

What are Interquartile Ranges?

In layman’s terms, the interquartile range is the width of the middle half of a sorted series: the distance between the point where the first quarter ends and the point where the last quarter begins.

Quartiles split a sorted series into four parts. We first divide the series into two halves at the median, and then divide each half in two again at its own median. This gives three cut points: one at the centre of the whole series and one at the centre of each half.

These points are called the first quartile Q1 (the lower quartile), the second quartile Q2 (the median), and the third quartile Q3 (the upper quartile).

Formula for IQR (Inter Quartile Range)

[Image: IQR formula (Inter Quartile Range formula)]

Series

1       2       3       4       5       6       7       8       9       10       11

Lower Quartile

1       2       3        4        5       

Upper Quartile  

7       8       9       10       11

How do we calculate IQR?

We can calculate the IQR with the following formula.

IQR = Q3 – Q1

Q1 = 3, Q2 = 6, Q3 = 9

1       2       3       4       5       6       7       8       9       10       11

IQR = 9 – 3 = 6

That seems easy enough. But we need a proper method to find these points in a series.

There are several methods. The common first step in all of them is to arrange the series from low to high.

Method 1: Median Inclusion

Even Case

If the number of items in the series is even, use the two middle numbers to find the median of the series.

1       2       3       4       5       6       7       8

Q2 = (4+5)/2 = 4.5

Divide the series in half along the median. The first part is called the lower quartile and the second part is the upper quartile.

Put the median at the last place of the lower quartile and at the first place of the upper quartile.

Lower Quartile = 1       2       3       4       4.5

Upper Quartile = 4.5       5       6       7       8

Find the median of both parts.

Median of the Lower Quartile is Q1

Q1 = 3

Median of the Upper Quartile is Q3

Q3 = 6

IQR = 6 – 3 = 3

Odd Case

If the number of items in the series is odd, the middle number is the median of the series.

1       2       3      4       5       6       7

Q2 = 4

Divide the series in half along the median. The first part is called the lower quartile and the second part is the upper quartile.

Put the median at the last place of the lower quartile and at the first place of the upper quartile.

Lower Quartile = 1       2       3       4

Upper Quartile = 4       5       6       7

Find the median of both parts.

Median of the Lower Quartile is Q1

Q1 = (2+3)/2 = 2.5

Median of the Upper Quartile is Q3

Q3 = (5+6)/2 = 5.5

IQR = 5.5 – 2.5 = 3
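The median inclusion steps can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: the function name `iqr_median_inclusion` is our own, and the sketch assumes the median value is placed in both halves in the even and the odd case alike, exactly as the steps of Method 1 describe. It needs a series with at least two items.

```python
from statistics import median


def iqr_median_inclusion(values):
    """Return (Q1, Q3, IQR) with the median placed in both halves."""
    s = sorted(values)          # step 1: arrange from low to high
    half = len(s) // 2          # number of items strictly on each side
    q2 = median(s)
    # Assumption: the median value is appended to the lower half and
    # prepended to the upper half, for even and odd lengths alike.
    lower = s[:half] + [q2]
    upper = [q2] + s[len(s) - half:]
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1


print(iqr_median_inclusion(range(1, 8)))  # (2.5, 5.5, 3.0)
print(iqr_median_inclusion(range(1, 9)))  # (3, 6, 3)
```

Other textbooks treat the even case differently (they leave the halves untouched because the median is not a data point), so results from other tools may not match this sketch.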

Method 2: Median Exclusion

Even Case

  • If the number of items in the series is even, use the two middle numbers to find the median of the series.

1       2       3       4       5       6       7       8

  • Divide the series in half along the median. The first part is called the lower quartile and the second part is the upper quartile.
    • Lower Quartile = 1       2       3       4
    • Upper Quartile = 5       6       7       8
  • Find the median of both parts.
    • Median of the Lower Quartile is Q1
      • Q1 = (2+3)/2 = 2.5
    • Median of the Upper Quartile is Q3
      • Q3 = (6+7)/2 = 6.5

IQR = 6.5 – 2.5 = 4.0

Odd Case

  • Otherwise, if the number of items in the series is odd, use the middle number as the median and leave it out of both halves.

1       2       3      4       5       6       7

  • Divide the series in half along the median. The first part is called the lower quartile and the second part is the upper quartile.
    • Lower Quartile = 1       2       3
    • Upper Quartile = 5       6       7
  • Find the median of both parts.
    • Median of the Lower Quartile is Q1
      • Q1 = 2
    • Median of the Upper Quartile is Q3
      • Q3 = 6

IQR = Q3 – Q1 = 6 – 2 = 4
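Median exclusion can be sketched in the same way. Again this is only an illustration under our own naming (`iqr_median_exclusion` is a hypothetical helper, not a library function): the middle item of an odd-length series is simply left out of both halves. It needs a series with at least two items.

```python
from statistics import median


def iqr_median_exclusion(values):
    """Return (Q1, Q3, IQR) with the median left out of both halves."""
    s = sorted(values)           # arrange from low to high
    half = len(s) // 2
    lower = s[:half]             # items below the median position
    upper = s[len(s) - half:]    # items above it; the middle item of an
                                 # odd-length series belongs to neither half
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1


print(iqr_median_exclusion(range(1, 9)))  # (2.5, 6.5, 4.0)
print(iqr_median_exclusion(range(1, 8)))  # (2, 6, 4)
```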

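If you haven’t noticed, in the odd case the median and the IQR are the same: both are 4. This happens because the series starts at 1 and is evenly spread, with a difference of 1 between consecutive numbers. It is not a general rule; in the even case the median is 4.5 while the IQR is 4.0.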

The procedure for determining the median of any sorted series is the same.

  1. If the series has an odd number of items, then the middle number is the median.
  2. If the series has an even number of items, then the average of the two numbers at the middle is the median.
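The two rules above translate almost directly into code. The sketch below uses a hypothetical helper name, `median_of`, and assumes a non-empty series; Python’s built-in `statistics.median` follows the same rules.

```python
def median_of(values):
    """Median of a non-empty series, following the two rules above."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the middle item is the median.
        return s[mid]
    # Even number of items: average of the two middle items.
    return (s[mid - 1] + s[mid]) / 2


print(median_of([1, 2, 3, 4, 5, 6, 7]))     # 4
print(median_of([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```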

These are the most basic methods of median inclusion and exclusion.

Let us see some examples regarding these methods with data.

Examples

  • Let’s go over the general process of finding quartiles first.

11       4       6       7       1       2       3       5       10       8       9

  • First sort the series from low to high.

1       2       3       4       5       6       7       8       9       10       11

  • Then divide the series in half around the median, 6.

Lower Quartile:  1       2       3        4         5

Median:  6

Upper Quartile:  7       8      9       10       11

If the number of items in a series is odd, you may decide to remove the median, which is at the centre of the series, or to include it in both halves.

After completing the general process, let’s apply the different methods.

Method 1 example

We include the median, 6, at the end of the first half and at the beginning of the second half.

1       2       3        4         5         6

6      7      8        9       10       11

Q1 = (3+4)/2 = 3.5

Q3 = (8+9)/2 = 8.5

IQR = Q3 – Q1 = 8.5 – 3.5 = 5.0

Method 2 example

1       2       3        4         5       6       7       8       9       10       11

Median = 6

We exclude the median, 6, from both halves.

1       2       3        4         5

7       8       9       10       11

Q1 = 3

Q3 = 9

IQR = Q3 – Q1 = 9 – 3 = 6

Method 3

Method 3 takes a more general approach. Let the sorted series be X_1, X_2, …, X_k, where k is the number of items in the series.

  1. The first step is the same as in Method 1 and Method 2: arrange the series from low to high.

Next, find out whether the number of items fits k = 4n + 1 or k = 4n + 3, where n is a natural number.

For example, if you have a series with 9 items, then

4n + 1 = k -> 4n + 1 = 9 -> 4n = 9 – 1 -> n = 8/4 -> n = 2

and if you have a series with 11 items, then

4n + 3 = k -> 4n + 3 = 11 -> 4n = 11 – 3 -> n = 8/4 -> n = 2

So we get n = 2 in both cases.

If you have k = 4n + 1:

  1. Q1 is (0.25 × X_n) + (0.75 × X_{n+1})
  2. Q3 is (0.75 × X_{3n+1}) + (0.25 × X_{3n+2})

If you have k = 4n + 3:

  1. Q1 is (0.75 × X_{n+1}) + (0.25 × X_{n+2})
  2. Q3 is (0.25 × X_{3n+2}) + (0.75 × X_{3n+3})

Note:

The weights 0.25 and 0.75 mean that we take 25% of one item and 75% of its neighbour, so each quartile is a weighted average of two adjacent items.

We could equally write the weights as the fractions 1/4 and 3/4, but the decimal form makes the percentages easier to see.

Method 3 Example

Let’s use an example with k = 4n + 1 items in the series.

15       36       39       40       41       42       43       47       49

Let’s solve this problem with the following steps.

Step 1

k = 9

4n + 1 = (4 × 2) + 1 = 9

n = 2

Step 2

  • Q1 is (0.25 × X_n) + (0.75 × X_{n+1})
    • Q1 = (0.25 × X_2) + (0.75 × X_3)
    • Q1 = (0.25 × 36) + (0.75 × 39)
    • Q1 = 9 + 29.25 = 38.25
  • Q3 is (0.75 × X_{3n+1}) + (0.25 × X_{3n+2})
    • Q3 = (0.75 × X_7) + (0.25 × X_8)
    • Q3 = (0.75 × 43) + (0.25 × 47)
    • Q3 = 32.25 + 11.75 = 44

Final step

IQR = Q3 – Q1 = 44 – 38.25 = 5.75

Let’s use an example with k = 4n + 3 items in the series.

6       17       17       19       22       22       26       31       35       35       38

We will solve this problem by following the method described above.

Step 1

k = 11

4n + 3 = 11

n = 2

Step 2: calculate Q1
  • Q1 is (0.75 × X_{n+1}) + (0.25 × X_{n+2})
    • Q1 = (0.75 × X_3) + (0.25 × X_4)
    • Q1 = (0.75 × 17) + (0.25 × 19)
    • Q1 = 12.75 + 4.75 = 17.5
Step 3: calculate Q3
  • Q3 is (0.25 × X_{3n+2}) + (0.75 × X_{3n+3})
    • Q3 = (0.25 × X_8) + (0.75 × X_9)
    • Q3 = (0.25 × 31) + (0.75 × 35)
    • Q3 = 7.75 + 26.25 = 34

IQR = Q3 – Q1 = 34 – 17.5 = 16.5
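The Method 3 rules for k = 4n + 1 and k = 4n + 3 can be sketched as one Python function. This is a hedged illustration: `quartiles_method3` and its inner helper `item` are names we made up, and the sketch only covers odd-length series with at least three items, since those are the cases Method 3 describes here.

```python
def quartiles_method3(values):
    """Return (Q1, Q3) for a series of length k = 4n + 1 or k = 4n + 3."""
    x = sorted(values)
    k = len(x)
    n, remainder = divmod(k, 4)
    if k < 3 or remainder not in (1, 3):
        raise ValueError("this sketch only covers odd-length series (k >= 3)")

    def item(i):
        # The article counts items from 1 (X_1 is the smallest),
        # while Python lists count from 0.
        return x[i - 1]

    if remainder == 1:   # k = 4n + 1
        q1 = 0.25 * item(n) + 0.75 * item(n + 1)
        q3 = 0.75 * item(3 * n + 1) + 0.25 * item(3 * n + 2)
    else:                # k = 4n + 3
        q1 = 0.75 * item(n + 1) + 0.25 * item(n + 2)
        q3 = 0.25 * item(3 * n + 2) + 0.75 * item(3 * n + 3)
    return q1, q3


nine_items = [15, 36, 39, 40, 41, 42, 43, 47, 49]
eleven_items = [6, 17, 17, 19, 22, 22, 26, 31, 35, 35, 38]
print(quartiles_method3(nine_items))    # (38.25, 44.0)
print(quartiles_method3(eleven_items))  # (17.5, 34.0)
```

Subtracting the two returned values gives the IQR: 5.75 for the nine-item series and 16.5 for the eleven-item series.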

Method 4

  • The quartiles Q1, Q2 and Q3 are so-called empirical quantiles. So, we calculate the empirical quantiles Q1 (25%), Q2 (50%) and Q3 (75%).

The following formula is used to calculate any quantile from a sorted series X_1, X_2, X_3, …, X_n.

Q(p) = X_k + α (X_{k+1} − X_k)

  • Two values, k and α, must be calculated before applying this formula.
  • The formulae for them are:

k = integer( p × ( n + 1 ) )

α = p × ( n + 1 ) − k

Here p denotes the quantile level written as a fraction, e.g. p = 0.25 for Q1 (25%), and integer( ) keeps only the whole-number part.

Method 4 Example

Series

6       17       17       19       22       22       26       31       35       35       38

n = 11

Calculate quartile 1

Q1 ( 0.25 ) = X_k + α ( X_{k+1} − X_k )

k = integer( p × ( n + 1 ) ) = integer( 0.25 × ( 11 + 1 ) ) = 3

α = p × ( n + 1 ) − k = 0.25 × ( 11 + 1 ) – 3 = 0

We have k = 3, α = 0

Q1 ( 0.25 ) = X_3 + 0 × ( X_4 − X_3 ) = 17 + 0 × ( 19 − 17 ) = 17

Q1 ( 0.25 ) = 17

Calculate quartile 2

k = integer( 0.50 × ( 11 + 1 ) ) = 6

α = 0.50 × ( 11 + 1 ) – 6 = 0

We have k = 6, α = 0

Q2 ( 0.50 ) = X_6 + 0 × ( X_7 − X_6 ) = 22 + 0 × ( 26 − 22 ) = 22

Q2 ( 0.50 ) = 22

Calculate quartile 3

Q3 ( 0.75 ) = X_k + α ( X_{k+1} − X_k )

We need k and α.

k = integer( p × ( n + 1 ) ) = integer( 0.75 × ( 11 + 1 ) ) = 9

α = p × ( n + 1 ) − k = 0.75 × ( 11 + 1 ) – 9 = 0

We have k = 9, α = 0

Q3 ( 0.75 ) = X_9 + 0 × ( X_{10} − X_9 ) = 35 + 0 × ( 35 − 35 ) = 35

Q3 ( 0.75 ) = 35

We have Q3 = 35, Q1 = 17

IQR = Q3 − Q1 = 35 – 17 = 18

IQR = 18
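The empirical quantile formula of Method 4 can be sketched in Python as below. The function name `empirical_quantile` is our own, and the sketch assumes the interpolation form X_k + α (X_{k+1} − X_k) with k and α computed from p × (n + 1). As a cross-check we also call `statistics.quantiles` from the standard library, whose default "exclusive" method is documented as using the same (n + 1) positioning.

```python
import math
from statistics import quantiles


def empirical_quantile(values, p):
    """Quantile at level p (0 < p < 1) using position p * (n + 1)."""
    x = sorted(values)
    n = len(x)
    position = p * (n + 1)
    k = math.floor(position)   # whole-number part
    alpha = position - k       # fractional part
    if k < 1 or k > n or (k == n and alpha > 0):
        raise ValueError("p is too close to 0 or 1 for a series this short")
    lower = x[k - 1]           # X_k in the article's 1-based numbering
    if alpha == 0:
        return lower
    return lower + alpha * (x[k] - lower)


series = [6, 17, 17, 19, 22, 22, 26, 31, 35, 35, 38]
q1 = empirical_quantile(series, 0.25)
q3 = empirical_quantile(series, 0.75)
print(q1, q3, q3 - q1)          # 17 35 18
print(quantiles(series, n=4))   # [17.0, 22.0, 35.0]
```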

Application:

The IQR can be used to analyse how the data is spread around its centre. It is the width of the range that holds the middle 50% of the data. If the IQR is small, the dataset is densely packed around the median. If the IQR is large, we can conclude that the dataset is more widespread.

Identifying Outliers

Now that we know multiple methods to find the IQR, let’s also learn how to use the IQR in data science.

The IQR can be used to find the outliers of a dataset.

We need Q1 and Q3 along with the IQR for it. We create a range that encompasses the normal data, and any point that falls outside that range is considered an outlier.

Range = [ Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5) ]

We can use this to set a fence for the normal data in a dataset.

Let’s use the first dataset from Method 3 as an example.

15       36       39       40       41       42       43       47       49

Q1 = 38.25, Q3 = 44, IQR = 5.75

Fence = [Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5)] = [38.25 − (5.75 × 1.5) , 44 + (5.75 × 1.5)]

Fence = [29.625 , 52.625]

According to the fence we created, 15 is the only outlier in this dataset. A box plot from plotly can be used to check the same.

The fences in a plotly box plot are calculated with different methods and techniques, so they may differ from our result.
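The fence calculation can be sketched in Python as follows. The helper name `iqr_fence` and its `factor` parameter are our own choices for illustration. Q1 and Q3 are computed here with the Method 3 formulas for k = 4n + 1 (n = 2), which give 38.25 and 44 for this series; any of the other methods could be substituted and would move the fence slightly.

```python
def iqr_fence(q1, q3, factor=1.5):
    """Return the (low, high) fence built from Q1, Q3 and the IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


data = [15, 36, 39, 40, 41, 42, 43, 47, 49]  # already sorted

# Method 3 quartiles for nine items (k = 4n + 1 with n = 2).
q1 = 0.25 * data[1] + 0.75 * data[2]   # 0.25 * X_2 + 0.75 * X_3
q3 = 0.75 * data[6] + 0.25 * data[7]   # 0.75 * X_7 + 0.25 * X_8

low, high = iqr_fence(q1, q3)
outliers = [value for value in data if value < low or value > high]
print((low, high), outliers)  # (29.625, 52.625) [15]
```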

Conclusion:

We learned how to calculate the IQR with 4 methods.

Method 3 and Method 4 rely more heavily on formulas, but they are easy to turn into a program in Python or any other language.

Method 1 and Method 2 are easier to follow and execute visually, so they are best for direct explanations and easy solutions by hand.
