The interquartile range (IQR) is the difference between the third and first quartiles of a series of numbers. Quartiles are the points that partition a sorted series into four parts.
Ranges are simple measures of how data points are spread in a series: each one is the difference between two points of an observation, such as the maximum and the minimum. This topic is closely related to variance and standard deviation.
Student ID | Score |
---|---|
1 | 5 |
2 | 25 |
3 | 37 |
4 | 38 |
5 | 42 |
6 | 48 |
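For these scores, Q1 = 25 and Q3 = 42, so the interquartile range = 42 - 25 = 17 (the full range, maximum minus minimum, is 48 - 5 = 43).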
In layman's terms, the interquartile range is the difference between the third and first quartiles of a series of numbers, that is, the width of the middle half of the data.
Quartiles split a series of numbers into four parts. We first divide the series into two parts using the median, and then divide each of the two parts into smaller halves. In doing so, we get three dividing points: one at the centre of the whole series and one at the centre of each half.
These points are called the first quartile Q1 (the lower quartile), the second quartile Q2 (the median), and the third quartile Q3 (the upper quartile).
We can calculate IQR with the following formula.
IQR = Q3 - Q1
For example, if Q1 = 3 and Q3 = 9:
IQR = 9 - 3 = 6
That seems easy enough. But we need a proper method to find these points in a series.
There are several methods. The first step, common to all of them, is to arrange the series in ascending order.
If the number of items in the series is even, use the two middle numbers to find the median of the series.
1 2 3 4 5 6 7 8
Q2 = (4+5)/2 = 4.5
Divide the series in half along the median. The first part is the lower half and the second part is the upper half.
Put the median at the last place of the lower half and at the first place of the upper half.
Lower half = 1 2 3 4 4.5
Upper half = 4.5 5 6 7 8
Find the median of both parts.
The median of the lower half is Q1.
Q1 = 3
The median of the upper half is Q3.
Q3 = 6
IQR = 6 - 3 = 3
If the number of items in the series is odd, use the middle number as the median of the series.
1 2 3 4 5 6 7
Q2 = 4
Divide the series in half along the median. The first part is the lower half and the second part is the upper half.
Put the median at the last place of the lower half and at the first place of the upper half.
Lower half = 1 2 3 4
Upper half = 4 5 6 7
Find the median of both parts.
The median of the lower half is Q1.
Q1 = (2+3)/2 = 2.5
The median of the upper half is Q3.
Q3 = (5+6)/2 = 5.5
IQR = 5.5 - 2.5 = 3
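The median-inclusion steps can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function name `iqr_median_inclusion` is hypothetical, and for an even-length series it assumes that the median value itself is placed in both halves.

```python
from statistics import median

def iqr_median_inclusion(values):
    """Return (Q1, Q3, IQR) with the overall median shared by both halves."""
    s = sorted(values)
    n = len(s)
    q2 = median(s)
    if n % 2 == 1:
        # Odd length: the median is an item of the series, so both halves keep it.
        lower, upper = s[: n // 2 + 1], s[n // 2:]
    else:
        # Even length (assumption): the median lies between two items,
        # so its value is added to the end of the lower half and the
        # start of the upper half.
        lower, upper = s[: n // 2] + [q2], [q2] + s[n // 2:]
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

print(iqr_median_inclusion(range(1, 9)))  # (3, 6, 3)
print(iqr_median_inclusion(range(1, 8)))  # (2.5, 5.5, 3.0)
```

Both series give an IQR of 3 under this rule.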
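If we exclude the median from both halves instead, the same two series give:
1 2 3 4 5 6 7 8
Lower half = 1 2 3 4, upper half = 5 6 7 8
IQR = Q3 - Q1 = 6.5 - 2.5 = 4.0
1 2 3 4 5 6 7
Lower half = 1 2 3, upper half = 5 6 7
IQR = Q3 - Q1 = 6 - 2 = 4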
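A sketch of the median-exclusion variant, for comparison. The function name `iqr_median_exclusion` is hypothetical; the code assumes the series has at least two items and that, for an odd-length series, the middle item is dropped from both halves.

```python
from statistics import median

def iqr_median_exclusion(values):
    """Return (Q1, Q3, IQR) with the overall median left out of both halves."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]        # items before the middle position
    upper = s[(n + 1) // 2:]   # items after the middle position
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

print(iqr_median_exclusion(range(1, 9)))  # (2.5, 6.5, 4.0)
print(iqr_median_exclusion(range(1, 8)))  # (2, 6, 4)
```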
As you may have noticed, both series give the same IQR under each method. This is because the data are evenly spread: the difference between each pair of consecutive numbers is 1.
The procedure for determining the median of any series is the same.
- If the series has an odd number of items, the middle number is the median.
- If the series has an even number of items, the average of the two middle numbers is the median.
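The two rules above translate directly into code. The helper below is a small illustrative sketch (the name `median_of` is hypothetical); in practice the standard library's `statistics.median` does the same job.

```python
def median_of(values):
    """Median by the odd/even rule: middle item, or mean of the two middle items."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd count: the middle number
    return (s[mid - 1] + s[mid]) / 2   # even count: average of the two middle numbers

print(median_of([1, 2, 3, 4, 5, 6, 7]))     # 4
print(median_of([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```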
These are the most basic methods of median inclusion and exclusion.
Let us see some examples regarding these methods with data.
Method 1 Examples
Series: 11 4 6 7 1 2 3 5 10 8 9
Sorted: 1 2 3 4 5 6 7 8 9 10 11
Lower half: 1 2 3 4 5
Median: 6
Upper half: 7 8 9 10 11
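The sorting and splitting step for this series can be reproduced with a few lines of Python. This is only an illustration for an odd-length series; the variable names are arbitrary.

```python
series = [11, 4, 6, 7, 1, 2, 3, 5, 10, 8, 9]

ordered = sorted(series)
mid = len(ordered) // 2          # index of the median in an odd-length series
lower_half = ordered[:mid]       # items below the median
median_value = ordered[mid]      # the median itself
upper_half = ordered[mid + 1:]   # items above the median

print(lower_half)    # [1, 2, 3, 4, 5]
print(median_value)  # 6
print(upper_half)    # [7, 8, 9, 10, 11]
```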
That completes the general process; the methods differ in what they do with the median.
If the number of items in a series is odd, you may decide to remove the median, which is at the centre of the series, from both halves.
Alternatively, we can include the median at the end of the first half and the beginning of the second half.
1 2 3 4 5 6
6 7 8 9 10 11
Q1= (3+4)/2 = 3.5
Q3= (8+9)/2 = 8.5
IQR=Q3 – Q1 = 8.5 – 3.5 = 5.0
1 2 3 4 5 7 8 9 10 11
Median = (5 + 7) / 2 = 6
With the median 6 excluded:
1 2 3 4 5
7 8 9 10 11
Q1= 3
Q3= 9
IQR = Q3 – Q1 = 9 – 3 = 6
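As a cross-check, Python's standard library offers `statistics.quantiles` with an `"inclusive"` and an `"exclusive"` method. These use interpolation rules rather than the half-splitting described here, so agreement is not guaranteed for every series, but for the 11-item series above they give the same quartiles as the two hand calculations.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the three cut points Q1, Q2, Q3.
q1_inc, _, q3_inc = quantiles(data, n=4, method="inclusive")
q1_exc, _, q3_exc = quantiles(data, n=4, method="exclusive")

print(q1_inc, q3_inc, q3_inc - q1_inc)  # 3.5 8.5 5.0
print(q1_exc, q3_exc, q3_exc - q1_exc)  # 3.0 9.0 6.0
```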
Method 3 takes a more general approach. Suppose the series is X1, ..., Xk.
First find out whether the number of items has the form k = 4n+1 or k = 4n+3, where n is a natural number.
For example, if you have a series with 9 items, then
4n+1 = k -> 4n+1 = 9 -> 4n = 9 - 1 -> n = 8/4 -> n = 2
and if you have a series with 11 items, then
4n+3 = k -> 4n+3 = 11 -> 4n = 11 - 3 -> n = 8/4 -> n = 2
In both cases we get n = 2.
Here k is the number of items in the series.
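Finding n from k is a single integer division. The helper below is a hypothetical sketch (`split_form` is not a library function) that reports which of the two forms a given item count fits and rejects counts that fit neither.

```python
def split_form(k):
    """Return (n, r) such that k = 4n + r, where r must be 1 or 3."""
    n, r = divmod(k, 4)
    if r not in (1, 3):
        raise ValueError(f"{k} items: not of the form 4n+1 or 4n+3")
    return n, r

print(split_form(9))   # (2, 1) -> 9 = 4*2 + 1
print(split_form(11))  # (2, 3) -> 11 = 4*2 + 3
```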
Note:
In the above explanation, we take 25% and 75% of a number; to simplify this we use the factors 0.25 and 0.75, which do the same job.
More simply still, we could divide the numbers by 4 and multiply by 3, but that would not drive home the point of taking a percentage of the numbers.
Let's use an example with 4n+1 items in the series.
15 36 39 40 41 42 43 47 49
We solve this problem with the method above.
k=9
4n+1 = (4 × 2) + 1 = 9
n = 2
IQR = Q3-Q1 = 44 – 38 = 6
Now an example with 4n+3 items in the series.
6 17 17 19 22 22 26 31 35 35 38
We will solve this problem by following the method described above.
k=11
4n+3=11
n=2
IQR = Q3 – Q1 = 35 –18.5 = 17.5
Method 4 uses a formula that calculates any quantile from a sorted series X1, X2, X3, ..., Xn.
Qn(p) = Xk + α × ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) )
α = p × ( n + 1 ) − k
Here p denotes the fraction for the quantile, e.g. p = 0.25 for Q1 (25%), and integer( ) takes the whole-number part.
6 17 17 19 22 22 26 31 35 35 38
n = 11
Q1(0.25) = Xk + α × ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) ) = integer( 0.25 × ( 11 + 1 ) ) = 3
α = p × ( n + 1 ) − k = 0.25 × ( 11 + 1 ) − 3 = 0
We have k = 3, α = 0
Q1(0.25) = X3 + 0 × ( X(3+1) − X3 ) = 17 + 0 × ( 19 − 17 ) = 17
Q1(0.25) = 17
Q2(0.50) = Xk + α × ( X(k+1) − Xk )
k = integer( 0.50 × ( 11 + 1 ) ) = 6
α = 0.50 × ( 11 + 1 ) − 6 = 0
Q2(0.50) = X6 + 0 × ( X(6+1) − X6 ) = 22 + 0 × ( 26 − 22 ) = 22
Q2(0.50) = 22
Q3(0.75) = Xk + α × ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) ) = integer( 0.75 × ( 11 + 1 ) ) = 9
α = p × ( n + 1 ) − k = 0.75 × ( 11 + 1 ) − 9 = 0
We have k = 9, α = 0
Q3(0.75) = X9 + 0 × ( X(9+1) − X9 ) = 35 + 0 × ( 35 − 35 ) = 35
Q3(0.75) = 35
IQR = Q3 − Q1 = 35 − 17 = 18
IQR = 18
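The quantile formula can be sketched as a short Python function that interpolates between Xk and X(k+1) by the fraction α. The name `quantile` is illustrative, and the clamping at the two ends of the series (when k falls below 1 or reaches n) is an assumption that the formula itself does not spell out.

```python
import math

def quantile(values, p):
    """Quantile at fraction p using k = floor(p(n+1)) and alpha = p(n+1) - k."""
    s = sorted(values)
    n = len(s)
    position = p * (n + 1)
    k = math.floor(position)   # 1-based index of the lower neighbour
    alpha = position - k       # fractional distance towards the next item
    if k < 1:                  # assumption: clamp to the smallest item
        return s[0]
    if k >= n:                 # assumption: clamp to the largest item
        return s[-1]
    # s[k - 1] is X_k and s[k] is X_(k+1) in the article's 1-based notation.
    return s[k - 1] + alpha * (s[k] - s[k - 1])

data = [6, 17, 17, 19, 22, 22, 26, 31, 35, 35, 38]
q1, q2, q3 = (quantile(data, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3, q3 - q1)  # 17.0 22.0 35.0 18.0
```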
Application:
IQR can be used to analyse how the middle 50% of the data is spread around its centre. If the IQR is small, the central values of the dataset are densely clustered. If the IQR is large, we can conclude the dataset is more widely spread.
Since we know multiple methods to find IQR, let’s also learn how to use IQR in data science.
IQR can be used to find the outliers of a dataset.
We need Q1 and Q3 with IQR for it. We will create a range which encompasses the data, and any point that falls out of that range will be considered an outlier.
Range = [ Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5) ]
We can use this to set a fence for normal data under this dataset.
Let’s use the dataset from method 3 for example
15 36 39 40 41 42 43 47 49
Q1 = 38, Q3 = 44, IQR = 6
Fence = [Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5)] = [38 − (6 × 1.5) , 44 + (6 × 1.5)]
Fence = [29 , 53]
According to the fence we created, 15 is the only outlier in this dataset. A box plot from plotly can be used to check the same.
The fences in a plotly box plot are calculated with different methods and techniques, so they can differ from our conclusions.
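The fence calculation is easy to reproduce in code. The sketch below takes the Q1 and Q3 values quoted above as given rather than recomputing them; the function name `iqr_fence` and its `factor` parameter are illustrative.

```python
def iqr_fence(q1, q3, factor=1.5):
    """Return (lower, upper) fence: Q1 - factor*IQR and Q3 + factor*IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

data = [15, 36, 39, 40, 41, 42, 43, 47, 49]
low, high = iqr_fence(38, 44)

# Any point outside the fence is treated as an outlier.
outliers = [x for x in data if x < low or x > high]

print(low, high)  # 29.0 53.0
print(outliers)   # [15]
```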
Conclusion:
We learned how to calculate IQR with 4 methods.
Methods 3 and 4 rely more heavily on formulas, but they are easy to implement in Python or any other language.
Methods 1 and 2 are easier to follow and carry out visually, so they are best for direct explanations and quick solutions by hand.