SQL’s window functions are a potent tool that enables you to perform
SQL’s window functions are a potent tool that enables you to perform
SQL’s window functions allow calculations over related rows, defined by the OVER() clause that partitions and orders the result set.
Here’s a more detailed account of window functions:
CREATE TABLE employees (
employee_id INT,
department VARCHAR(50),
employee_name VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
)
INSERT INTO employees (employee_id, department, employee_name, salary, hire_date)
/* (1, 'HR', 'Alice', 50000.00, '2022-03-15'), */
/* use above format to insert data for any other platform than oracle */
(1, 'HR', 'Alice', 50000.00, to_date('2022-03-15', 'YYYY-MM-DD')),
(2, 'HR', 'Bob', 52000.00, to_date('2021-07-10', 'YYYY-MM-DD')),
(3, 'HR', 'Carol', 48000.00, to_date('2023-01-20', 'YYYY-MM-DD')),
(4, 'IT', 'David', 60000.00, to_date('2022-05-05', 'YYYY-MM-DD')),
(5, 'IT', 'Emma', 65000.00, to_date('2023-02-18', 'YYYY-MM-DD')),
(6, 'Finance', 'Frank', 55000.00, to_date('2021-12-02', 'YYYY-MM-DD')),
(7, 'Finance', 'Grace', 58000.00, to_date('2022-08-30', 'YYYY-MM-DD'));
EMPLOYEE_ID | DEPARTMENT | EMPLOYEE_NAME | SALARY | HIRE_DATE |
---|---|---|---|---|
1 | HR | Alice | 50000 | 15-MAR-22 |
2 | HR | Bob | 52000 | 10-JUL-21 |
3 | HR | Carol | 48000 | 20-JAN-23 |
4 | IT | David | 60000 | 05-MAY-22 |
5 | IT | Emma | 65000 | 18-FEB-23 |
6 | Finance | Frank | 55000 | 02-DEC-21 |
7 | Finance | Grace | 58000 | 30-AUG-22 |
The result set is divided into partitions or groups based on one or more columns using the PARTITION BY clause. The window function is applied independently within each partition.
For example:
SELECT
department,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary_per_dept
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | AVG_SALARY_PER_DEPT |
---|---|---|---|
Finance | Frank | 55000 | 56500 |
Finance | Grace | 58000 | 56500 |
HR | Bob | 52000 | 50000 |
HR | Carol | 48000 | 50000 |
HR | Alice | 50000 | 50000 |
IT | Emma | 65000 | 62500 |
IT | David | 60000 | 62500 |
In this query, each department is divided using PARTITION BY, and the AVG(salary) window function calculates the average salary of each department.
The ORDER BY clause is crucial for functions that rely on row order within partitions, such as RANK(), LEAD(), and LAG().
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | DEPT_RANK |
---|---|---|---|
Finance | Grace | 58000 | 1 |
Finance | Frank | 55000 | 2 |
HR | Bob | 52000 | 1 |
HR | Alice | 50000 | 2 |
HR | Carol | 48000 | 3 |
IT | Emma | 65000 | 1 |
IT | David | 60000 | 2 |
Within each department, this system ranks employees based on their salary in descending order.
The window frame clause lets you specify a range of rows in each partition on which the window function operates.
Two types are available: ROWS BETWEEN and RANGE BETWEEN.
Use physical row positions with ROWS BETWEEN and specify frame bounds with UNBOUNDED PRECEDING, CURRENT ROW, or UNBOUNDED FOLLOWING.
For example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | CUMULATIVE_SALARY |
---|---|---|---|
Finance | Frank | 55000 | 55000 |
Finance | Grace | 58000 | 113000 |
HR | Bob | 52000 | 52000 |
HR | Alice | 50000 | 102000 |
HR | Carol | 48000 | 150000 |
IT | David | 60000 | 60000 |
IT | Emma | 65000 | 125000 |
This calculates the cumulative sum of salaries, ordered by hire date, for each department from the start of the partition to the current row.
RANGE BETWEEN operates on values within a window and is useful for non-integer data.
For example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary ASC RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING) AS salary_range_sum
DEPARTMENT | EMPLOYEE_NAME | SALARY | SALARY_RANGE_SUM |
---|---|---|---|
Finance | Frank | 55000 | 55000 |
Finance | Grace | 58000 | 58000 |
HR | Carol | 48000 | 48000 |
HR | Alice | 50000 | 50000 |
HR | Bob | 52000 | 52000 |
IT | David | 60000 | 60000 |
IT | Emma | 65000 | 65000 |
This query calculates the sum of salaries within each department for employees whose salaries fall within 1000 units of the current employee’s salary, based on the specified order.
Here are some common window functions:
RANK()
, DENSE_RANK()
, ROW_NUMBER()
:Assign a rank, dense rank, or row number within each partition.
These functions are used to assign a rank, dense rank, or row number to each row within a window defined by the OVER()
clause.
RANK()
assigns a unique rank to each distinct row, and if two or more rows have the same values, they receive the same rank, and the next rank is skipped.
DENSE_RANK()
also assigns a unique rank to each distinct row, but if two or more rows have the same values, they receive the same rank, and the next rank is not skipped.
ROW_NUMBER()
assigns a unique row number to each row within the window, without regard to duplicate values.
Example:
SELECT
employee_name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number
FROM employees;
department | employee_name | salary | rank | dense_rank | row_number |
---|---|---|---|---|---|
HR | Alice | 50000 | 2 | 2 | 1 |
HR | Bob | 52000 | 1 | 1 | 2 |
HR | Carol | 48000 | 3 | 3 | 3 |
IT | David | 60000 | 2 | 2 | 1 |
IT | Emma | 65000 | 1 | 1 | 2 |
Finance | Frank | 55000 | 2 | 2 | 1 |
Finance | Grace | 58000 | 1 | 1 | 2 |
In this query, we rank employees by their salary in descending order, using all three functions.
LEAD()
and LAG()
:Access values from subsequent or preceding rows.
LEAD()
and LAG()
allow you to access values from subsequent or preceding rows within the result set.
LEAD(column, n)
retrieves the value of column
from the row that is n
rows after the current row.
LAG(column, n)
retrieves the value of column
from the row that is n
rows before the current row.
Example:
SELECT
order_id,
order_date,
LEAD(order_date, 1) OVER (ORDER BY order_date) AS next_order_date,
LAG(order_date, 1) OVER (ORDER BY order_date) AS previous_order_date
FROM orders;
department | employee_name | salary | next_salary | previous_salary |
---|---|---|---|---|
HR | Alice | 50000 | 52000 | NULL |
HR | Bob | 52000 | 48000 | 50000 |
HR | Carol | 48000 | NULL | 52000 |
IT | David | 60000 | 65000 | NULL |
IT | Emma | 65000 | NULL | 60000 |
Finance | Frank | 55000 | 58000 | NULL |
Finance | Grace | 58000 | NULL | 55000 |
This query retrieves the order date, as well as the next and previous order dates for each order.
SUM()
, AVG()
, MIN()
, MAX()
:Perform aggregate calculations over the window.
These aggregate functions can be used with the OVER()
clause to perform calculations over a window of rows.
AVG(column) OVER (PARTITION BY partition_column)
calculates the average of column
within each partition.MIN(column) OVER (PARTITION BY partition_column)
finds the minimum value of column
within each partition.MAX(column) OVER (PARTITION BY partition_column)
finds the maximum value of column
within each partition.Example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department) AS total_salary_dept,
AVG(salary) OVER (PARTITION BY department) AS avg_salary_dept
FROM employees;
department | employee_name | salary | total_salary_dept | avg_salary_dept |
---|---|---|---|---|
HR | Alice | 50000 | 150000 | 50000 |
HR | Bob | 52000 | 150000 | 50000 |
HR | Carol | 48000 | 150000 | 50000 |
IT | David | 60000 | 125000 | 62500 |
IT | Emma | 65000 | 125000 | 62500 |
Finance | Frank | 55000 | 113000 | 56500 |
Finance | Grace | 58000 | 113000 | 56500 |
In this query, we calculate the total and average salary within each department.
FIRST_VALUE()
and LAST_VALUE()
: Retrieve the first or last value within the window.
FIRST_VALUE(column)
retrieves the first value of column
within the window.
LAST_VALUE(column)
retrieves the last value of column
within the window.
Example:
SELECT
department,
employee_name,
salary,
FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS lowest_salary,
LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS highest_salary
FROM employees;
department | employee_name | salary | lowest_salary | highest_salary |
---|---|---|---|---|
HR | Alice | 50000 | 48000 | 52000 |
HR | Bob | 52000 | 48000 | 52000 |
HR | Carol | 48000 | 48000 | 52000 |
IT | David | 60000 | 60000 | 65000 |
IT | Emma | 65000 | 60000 | 65000 |
Finance | Frank | 55000 | 55000 | 58000 |
Finance | Grace | 58000 | 55000 | 58000 |
Window functions are especially useful for complex analytical tasks, such as calculating moving averages, identifying trends, and ranking data within groups. They provide a powerful way to gain insights into your data with precise control over how calculations are performed within specific partitions and ordered rows.
ANCOVA is an extension of ANOVA (Analysis of Variance) that combines blocks of regression analysis and ANOVA. Which makes it Analysis of Covariance.
What if we learn topics in a desirable way!! What if we learn to write Python codes from gamers data !!
Start using NotebookLM today and embark on a smarter, more efficient learning journey!
This can be a super guide for you to start and excel in your data science career.
Solve this quiz for testing Manova Basics
Test your knowledge on pandas groupby with this quiz
Observe the dataset and try to solve the Visualization quiz on it
To perform ANCOVA (Analysis of Covariance) with a dataset that includes multiple types of variables, you’ll need to ensure your dependent variable is continuous, and you can include categorical variables as factors. Below is an example using the statsmodels library in Python: Mock Dataset Let’s create a dataset with a mix of variable types: Performing…
How useful was this post? Click on a star to rate it! Submit Rating
Complete the code by dragging and dropping the correct functions
Python functions are a vital concept in programming which enables you to group and define a collection of instructions. This makes your code more organized, modular, and easier to understand and maintain. Defining a Function: In Python, you can define a function via the def keyword, followed by the function name, any parameters wrapped in parentheses,…
Mastering indexing will significantly boost your data manipulation and analysis skills, a crucial step in your data science journey.
Stable Diffusion Models: Where Art and AI Collide Artificial Intelligence meets creativity in the fascinating realm of Stable Diffusion Models. These innovative models take text descriptions and bring them to life in the form of detailed and realistic images. Let’s embark on a journey to understand the magic behind Stable Diffusion in a way that’s…
One response to “SQL’s window function”
[…] 1. Window Functions: […]
Points You Earned