The interquartile range (IQR) is the difference between the third and first quartiles of a series of numbers. Quartiles are the points that divide a sorted series into four parts.
A range is a simple measure of how the data points in a series are spread: it is the difference between the two ends of a set of observations. This topic is closely related to variance and standard deviation.
Student ID | Score |
---|---|
1 | 5 |
2 | 25 |
3 | 37 |
4 | 38 |
5 | 42 |
6 | 48 |
Range = 48 - 5 = 43, while the spread between the quartiles (Q1 = 25, Q3 = 42) is 42 - 25 = 17
In layman’s terms, the interquartile range is the difference between the third and first quartiles of a series of numbers, which is the spread of the middle half of the data.
Quartiles split a sorted series of numbers into four parts. We first divide the series into two parts using the median, and then divide each of those parts in half again. In doing so, we get three dividing points: the median of the whole series and the median of each half.
These points are called the first quartile Q1 (the lower quartile), the second quartile Q2 (the median), and the third quartile Q3 (the upper quartile).
We can calculate the IQR with the following formula.
IQR = Q3 – Q1
For example, with Q1 = 3 and Q3 = 9:
IQR = 9 – 3 = 6
That seems easy enough. But we need a proper method to find these points in a series.
There are several methods. The first step, common to all of them, is to arrange the series in ascending order.
If the number of items in the series is even, then use the average of the two middle numbers as the median of the series.
1 2 3 4 5 6 7 8
Q2 = (4+5)/2 = 4.5
Divide the series in half along the median. The first part is the lower half and the second part is the upper half. Because the median falls between two items here, it belongs to neither half.
Lower half = 1 2 3 4
Upper half = 5 6 7 8
Find the median of both parts.
The median of the lower half is Q1.
Q1 = (2+3)/2 = 2.5
The median of the upper half is Q3.
Q3 = (6+7)/2 = 6.5
IQR = 6.5 – 2.5 = 4
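The even-length case can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name `quartiles_even` is made up for this example, and it relies on `statistics.median` from the standard library to find each median.

```python
from statistics import median


def quartiles_even(values):
    """Return (Q1, Q2, Q3) for a series with an even number of items.

    The series is sorted and cut into two equal halves. The median of
    the whole series falls between the halves, so it is in neither.
    """
    data = sorted(values)
    if len(data) % 2 != 0:
        raise ValueError("this sketch only handles an even number of items")
    half = len(data) // 2
    lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles_even([1, 2, 3, 4, 5, 6, 7, 8])
print(q1, q2, q3, q3 - q1)  # 2.5 4.5 6.5 4.0
```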
If the Number of items in the series are odd, then use the middle number to find the median of the series.
1 2 3 4 5 6 7
Q2 = 4
Divide the series in half along the median. The first part is the lower half and the second part is the upper half.
Put the median at the last place of the lower half and at the first place of the upper half.
Lower half = 1 2 3 4
Upper half = 4 5 6 7
Find the median of both parts.
The median of the lower half is Q1.
Q1 = (2+3)/2 = 2.5
The median of the upper half is Q3.
Q3 = (5+6)/2 = 5.5
IQR = 5.5 – 2.5 = 3
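For an odd number of items, keeping the median in both halves could look like the sketch below. The name `quartiles_inclusive` is hypothetical and chosen only for this illustration.

```python
from statistics import median


def quartiles_inclusive(values):
    """Return (Q1, Q2, Q3) for an odd-length series, median kept in both halves."""
    data = sorted(values)
    if len(data) % 2 == 0:
        raise ValueError("this sketch only handles an odd number of items")
    mid = len(data) // 2       # index of the median
    lower = data[:mid + 1]     # the median is the last item of the lower half
    upper = data[mid:]         # the median is the first item of the upper half
    return median(lower), data[mid], median(upper)


q1, q2, q3 = quartiles_inclusive([1, 2, 3, 4, 5, 6, 7])
print(q1, q2, q3, q3 - q1)  # 2.5 4 5.5 3.0
```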
If we leave the median out of both halves instead, the results are as follows.
1 2 3 4 5 6 7 8
IQR = 6.5 – 2.5 = 4.0
1 2 3 4 5 6 7
IQR = Q3 – Q1 = 6 – 2 = 4
You may have noticed that for the series 1 to 7 the median and the IQR are both 4. This is only because the data is evenly spread, with a difference of 1 between consecutive numbers.
The procedure for determining the median of any series is the same.
- If the series has an odd number of items, then the middle number is the median.
- If the series has an even number of items, then the average of the two middle numbers is the median.
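The two rules above can be written out by hand in a few lines of Python. This is only a sketch, and `median_of` is a name invented for this example; in practice `statistics.median` from the standard library does the same job.

```python
def median_of(values):
    """Median by the odd/even rule: the middle item, or the mean of the two middle items."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd count: the middle number
    return (data[mid - 1] + data[mid]) / 2    # even count: average of the two middle numbers


print(median_of([1, 2, 3, 4, 5, 6, 7]))     # 4
print(median_of([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```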
These are the most basic methods of median inclusion and exclusion.
Let us see some examples regarding these methods with data.
Method 1 Examples
Unsorted series: 11 4 6 7 1 2 3 5 10 8 9
Sorted series: 1 2 3 4 5 6 7 8 9 10 11
Lower half: 1 2 3 4 5
Median: 6
Upper half: 7 8 9 10 11
If the number of items in a series is odd, you may decide to remove the median, which is at the centre of the series, as shown above.
That completes the general process. Now let’s look at the different methods.
Alternatively, we can include the median at the end of the first half and at the beginning of the second half.
1 2 3 4 5 6
6 7 8 9 10 11
Q1= (3+4)/2 = 3.5
Q3= (8+9)/2 = 8.5
IQR=Q3 – Q1 = 8.5 – 3.5 = 5.0
1 2 3 4 5 7 8 9 10 11
Median = (5 + 7) / 2 = 6
With the median 6 excluded, the two halves are:
1 2 3 4 5
7 8 9 10 11
Q1 = 3
Q3 = 9
IQR = Q3 – Q1 = 9 – 3 = 6
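The median-exclusion approach can be sketched in Python like this. The function name `quartiles_exclusive` is an assumption made for this example. For an even number of items there is no middle item to drop, so the series is simply cut in two.

```python
from statistics import median


def quartiles_exclusive(values):
    """Return (Q1, Q2, Q3) with the median left out of both halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, skip the middle item; for an even count, nothing is skipped.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles_exclusive([11, 4, 6, 7, 1, 2, 3, 5, 10, 8, 9])
print(q1, q2, q3, q3 - q1)  # 3 6 9 6
```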
Method 3 has a more general approach. Suppose the series is X1, …, Xk.
You have to find whether the number of items fits k = 4n+1 or k = 4n+3, where n is a natural number.
For example, if you have a series with 9 items, then
4n+1 = k -> 4n+1 = 9 -> 4n = 9 – 1 -> n = 8/4 -> n = 2
and if you have a series with 11 items, then
4n+3 = k -> 4n+3 = 11 -> 4n = 11 – 3 -> n = 8/4 -> n = 2
So in both cases we get n = 2.
K is the number of items in the series.
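Finding n and the matching form for a given number of items is a single integer division. The sketch below only covers this first step of the method; `split_length` is a hypothetical helper name, not part of any library.

```python
def split_length(k):
    """Return (n, form) such that k = 4n + 1 or k = 4n + 3."""
    n, remainder = divmod(k, 4)
    if remainder not in (1, 3):
        # An even number of items fits neither form.
        raise ValueError("k must be odd to fit 4n + 1 or 4n + 3")
    return n, f"4n+{remainder}"


print(split_length(9))   # (2, '4n+1')
print(split_length(11))  # (2, '4n+3')
```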
Note:
In the above explanation, we are supposed to take 25% and 75% of a number. To simplify it, we use the factors 0.25 and 0.75, which do the same job.
More simply, we could use the fractions 1/4 and 3/4, but that would not drive home the point of taking a percentage of the numbers.
Let’s use an example with 4n+1 items in the series.
15 36 39 40 41 42 43 47 49
We can work through it as follows.
k=9
4n+1 = (4 × 2) + 1 = 9
n = 2
IQR = Q3-Q1 = 44 – 38 = 6
6 17 17 19 22 22 26 31 35 35 38
We will solve this one by following the method described above.
k=11
4n+3=11
n=2
IQR = Q3 – Q1 = 35 –18.5 = 17.5
The following formula is used to calculate any quantile from a sorted series X1, X2, X3, …, Xn.
Qn(p) = Xk + α ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) )
α = p × ( n + 1 ) − k
p denotes the fraction for the quantile, e.g. p = 0.25 (25%) for Q1.
6 17 17 19 22 22 26 31 35 35 38
n = 11
Q1( 0.25 ) = Xk + α ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) ) = integer( 0.25 × ( 11 + 1 ) ) = 3
α = p × ( n + 1 ) − k = 0.25 × ( 11 + 1 ) – 3 = 0
Q1( 0.25 ) = X3 + 0 × ( X(3+1) − X3 ) = 17 + 0 × ( 19 − 17 ) = 17
Q1( 0.25 ) = 17
Q2( 0.50 ) = Xk + α ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) ) = integer( 0.50 × ( 11 + 1 ) ) = 6
α = p × ( n + 1 ) − k = 0.50 × ( 11 + 1 ) – 6 = 0
Q2( 0.50 ) = X6 + 0 × ( X(6+1) − X6 ) = 22 + 0 × ( 26 − 22 ) = 22
Q2( 0.50 ) = 22
Q3( 0.75 ) = Xk + α ( X(k+1) − Xk )
k = integer( p × ( n + 1 ) ) = integer( 0.75 × ( 11 + 1 ) ) = 9
α = p × ( n + 1 ) − k = 0.75 × ( 11 + 1 ) – 9 = 0
We have k = 9, α = 0
Q3( 0.75 ) = X9 + 0 × ( X(9+1) − X9 ) = 35 + 0 × ( 35 − 35 ) = 35
Q3( 0.75 ) = 35
IQR = Q3 − Q1 = 35 – 17 = 18
IQR = 18
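The interpolation formula translates almost directly into code. The sketch below assumes the difference form Xk + α × (X(k+1) − Xk), uses 1-based positions as in the text, and adds a clamp at both ends of the series, which is an assumption of this example and not something the text specifies. The name `quantile` is illustrative. As far as I know, `statistics.quantiles` with `method="exclusive"` places the cut points at p × (n + 1) as well, so it is used here as a cross-check.

```python
from statistics import quantiles


def quantile(values, p):
    """Quantile at fraction p by linear interpolation at position p * (n + 1)."""
    data = sorted(values)
    n = len(data)
    position = p * (n + 1)
    k = int(position)        # integer part: 1-based index of Xk
    alpha = position - k     # fractional part
    if k < 1:                # assumption: clamp to the smallest item
        return data[0]
    if k >= n:               # assumption: clamp to the largest item
        return data[-1]
    # data[k - 1] is Xk and data[k] is X(k+1) in the 1-based notation of the text
    return data[k - 1] + alpha * (data[k] - data[k - 1])


series = [6, 17, 17, 19, 22, 22, 26, 31, 35, 35, 38]
q1, q3 = quantile(series, 0.25), quantile(series, 0.75)
print(q1, q3, q3 - q1)                                # 17.0 35.0 18.0
print(quantiles(series, n=4, method="exclusive"))    # [17.0, 22.0, 35.0]
```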
Application:
IQR can be used to analyse how the middle half of the data is spread around its centre. If the IQR is small compared with the overall range of the dataset, then the data is densely packed around the centre. If the IQR is large, then we can conclude that the dataset is more widespread.
Since we know multiple methods to find IQR, let’s also learn how to use IQR in data science.
IQR can be used to find the outliers of a dataset.
We need Q1 and Q3 with IQR for it. We will create a range which encompasses the data, and any point that falls out of that range will be considered an outlier.
Range = [ Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5) ]
We can use this to set a fence for normal data in the dataset.
Let’s use the dataset from method 3 as an example.
15 36 39 40 41 42 43 47 49
Q1 = 38, Q3 = 44, IQR = 6
Fence = [Q1 − (IQR × 1.5) , Q3 + (IQR × 1.5)] = [38 − (6 × 1.5) , 44 + (6 × 1.5)]
Fence = [29 , 53]
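The fence calculation can be sketched in Python as below. The function names `iqr_fence` and `find_outliers` are invented for this example, and Q1 and Q3 are passed in as already known values rather than recomputed.

```python
def iqr_fence(q1, q3, factor=1.5):
    """Return (low, high) = (Q1 - factor * IQR, Q3 + factor * IQR)."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def find_outliers(values, q1, q3, factor=1.5):
    """Return every value that falls outside the fence."""
    low, high = iqr_fence(q1, q3, factor)
    return [v for v in values if v < low or v > high]


data = [15, 36, 39, 40, 41, 42, 43, 47, 49]
print(iqr_fence(38, 44))            # (29.0, 53.0)
print(find_outliers(data, 38, 44))  # [15]
```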
According to the fence we created, 15 is the only outlier of this dataset. Let’s use a box plot from plotly to check this.
The fences in a plotly box plot are calculated with a different method, so they may differ from our result.
Conclusion:
We learned how to calculate the IQR with 4 methods.
Methods 3 and 4 lean more heavily on formulas, but they are easy to turn into a program in Python or any other language.
Methods 1 and 2 are easier to follow and carry out visually, so they are best for direct explanations and quick solutions by hand.