SQL’s window functions are a potent tool that enables you to perform
SQL’s window functions are a potent tool that enables you to perform
SQL’s window functions allow calculations over related rows, defined by the OVER() clause that partitions and orders the result set.
Here’s a more detailed account of window functions:
CREATE TABLE employees (
employee_id INT,
department VARCHAR(50),
employee_name VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
)
INSERT INTO employees (employee_id, department, employee_name, salary, hire_date)
/* (1, 'HR', 'Alice', 50000.00, '2022-03-15'), */
/* use above format to insert data for any other platform than oracle */
(1, 'HR', 'Alice', 50000.00, to_date('2022-03-15', 'YYYY-MM-DD')),
(2, 'HR', 'Bob', 52000.00, to_date('2021-07-10', 'YYYY-MM-DD')),
(3, 'HR', 'Carol', 48000.00, to_date('2023-01-20', 'YYYY-MM-DD')),
(4, 'IT', 'David', 60000.00, to_date('2022-05-05', 'YYYY-MM-DD')),
(5, 'IT', 'Emma', 65000.00, to_date('2023-02-18', 'YYYY-MM-DD')),
(6, 'Finance', 'Frank', 55000.00, to_date('2021-12-02', 'YYYY-MM-DD')),
(7, 'Finance', 'Grace', 58000.00, to_date('2022-08-30', 'YYYY-MM-DD'));
EMPLOYEE_ID | DEPARTMENT | EMPLOYEE_NAME | SALARY | HIRE_DATE |
---|---|---|---|---|
1 | HR | Alice | 50000 | 15-MAR-22 |
2 | HR | Bob | 52000 | 10-JUL-21 |
3 | HR | Carol | 48000 | 20-JAN-23 |
4 | IT | David | 60000 | 05-MAY-22 |
5 | IT | Emma | 65000 | 18-FEB-23 |
6 | Finance | Frank | 55000 | 02-DEC-21 |
7 | Finance | Grace | 58000 | 30-AUG-22 |
The result set is divided into partitions or groups based on one or more columns using the PARTITION BY clause. The window function is applied independently within each partition.
For example:
SELECT
department,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary_per_dept
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | AVG_SALARY_PER_DEPT |
---|---|---|---|
Finance | Frank | 55000 | 56500 |
Finance | Grace | 58000 | 56500 |
HR | Bob | 52000 | 50000 |
HR | Carol | 48000 | 50000 |
HR | Alice | 50000 | 50000 |
IT | Emma | 65000 | 62500 |
IT | David | 60000 | 62500 |
In this query, each department is divided using PARTITION BY, and the AVG(salary) window function calculates the average salary of each department.
The ORDER BY clause is crucial for functions that rely on row order within partitions, such as RANK(), LEAD(), and LAG().
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | DEPT_RANK |
---|---|---|---|
Finance | Grace | 58000 | 1 |
Finance | Frank | 55000 | 2 |
HR | Bob | 52000 | 1 |
HR | Alice | 50000 | 2 |
HR | Carol | 48000 | 3 |
IT | Emma | 65000 | 1 |
IT | David | 60000 | 2 |
Within each department, this system ranks employees based on their salary in descending order.
The window frame clause lets you specify a range of rows in each partition on which the window function operates.
Two types are available: ROWS BETWEEN and RANGE BETWEEN.
Use physical row positions with ROWS BETWEEN and specify frame bounds with UNBOUNDED PRECEDING, CURRENT ROW, or UNBOUNDED FOLLOWING.
For example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | CUMULATIVE_SALARY |
---|---|---|---|
Finance | Frank | 55000 | 55000 |
Finance | Grace | 58000 | 113000 |
HR | Bob | 52000 | 52000 |
HR | Alice | 50000 | 102000 |
HR | Carol | 48000 | 150000 |
IT | David | 60000 | 60000 |
IT | Emma | 65000 | 125000 |
This calculates the cumulative sum of salaries, ordered by hire date, for each department from the start of the partition to the current row.
RANGE BETWEEN operates on values within a window and is useful for non-integer data.
For example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary ASC RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING) AS salary_range_sum
DEPARTMENT | EMPLOYEE_NAME | SALARY | SALARY_RANGE_SUM |
---|---|---|---|
Finance | Frank | 55000 | 55000 |
Finance | Grace | 58000 | 58000 |
HR | Carol | 48000 | 48000 |
HR | Alice | 50000 | 50000 |
HR | Bob | 52000 | 52000 |
IT | David | 60000 | 60000 |
IT | Emma | 65000 | 65000 |
This query calculates the sum of salaries within each department for employees whose salaries fall within 1000 units of the current employee’s salary, based on the specified order.
Here are some common window functions:
RANK()
, DENSE_RANK()
, ROW_NUMBER()
:Assign a rank, dense rank, or row number within each partition.
These functions are used to assign a rank, dense rank, or row number to each row within a window defined by the OVER()
clause.
RANK()
assigns a unique rank to each distinct row, and if two or more rows have the same values, they receive the same rank, and the next rank is skipped.
DENSE_RANK()
also assigns a unique rank to each distinct row, but if two or more rows have the same values, they receive the same rank, and the next rank is not skipped.
ROW_NUMBER()
assigns a unique row number to each row within the window, without regard to duplicate values.
Example:
SELECT
employee_name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number
FROM employees;
department | employee_name | salary | rank | dense_rank | row_number |
---|---|---|---|---|---|
HR | Alice | 50000 | 2 | 2 | 1 |
HR | Bob | 52000 | 1 | 1 | 2 |
HR | Carol | 48000 | 3 | 3 | 3 |
IT | David | 60000 | 2 | 2 | 1 |
IT | Emma | 65000 | 1 | 1 | 2 |
Finance | Frank | 55000 | 2 | 2 | 1 |
Finance | Grace | 58000 | 1 | 1 | 2 |
In this query, we rank employees by their salary in descending order, using all three functions.
LEAD()
and LAG()
:Access values from subsequent or preceding rows.
LEAD()
and LAG()
allow you to access values from subsequent or preceding rows within the result set.
LEAD(column, n)
retrieves the value of column
from the row that is n
rows after the current row.
LAG(column, n)
retrieves the value of column
from the row that is n
rows before the current row.
Example:
SELECT
order_id,
order_date,
LEAD(order_date, 1) OVER (ORDER BY order_date) AS next_order_date,
LAG(order_date, 1) OVER (ORDER BY order_date) AS previous_order_date
FROM orders;
department | employee_name | salary | next_salary | previous_salary |
---|---|---|---|---|
HR | Alice | 50000 | 52000 | NULL |
HR | Bob | 52000 | 48000 | 50000 |
HR | Carol | 48000 | NULL | 52000 |
IT | David | 60000 | 65000 | NULL |
IT | Emma | 65000 | NULL | 60000 |
Finance | Frank | 55000 | 58000 | NULL |
Finance | Grace | 58000 | NULL | 55000 |
This query retrieves the order date, as well as the next and previous order dates for each order.
SUM()
, AVG()
, MIN()
, MAX()
:Perform aggregate calculations over the window.
These aggregate functions can be used with the OVER()
clause to perform calculations over a window of rows.
AVG(column) OVER (PARTITION BY partition_column)
calculates the average of column
within each partition.MIN(column) OVER (PARTITION BY partition_column)
finds the minimum value of column
within each partition.MAX(column) OVER (PARTITION BY partition_column)
finds the maximum value of column
within each partition.Example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department) AS total_salary_dept,
AVG(salary) OVER (PARTITION BY department) AS avg_salary_dept
FROM employees;
department | employee_name | salary | total_salary_dept | avg_salary_dept |
---|---|---|---|---|
HR | Alice | 50000 | 150000 | 50000 |
HR | Bob | 52000 | 150000 | 50000 |
HR | Carol | 48000 | 150000 | 50000 |
IT | David | 60000 | 125000 | 62500 |
IT | Emma | 65000 | 125000 | 62500 |
Finance | Frank | 55000 | 113000 | 56500 |
Finance | Grace | 58000 | 113000 | 56500 |
In this query, we calculate the total and average salary within each department.
FIRST_VALUE()
and LAST_VALUE()
: Retrieve the first or last value within the window.
FIRST_VALUE(column)
retrieves the first value of column
within the window.
LAST_VALUE(column)
retrieves the last value of column
within the window.
Example:
SELECT
department,
employee_name,
salary,
FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS lowest_salary,
LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS highest_salary
FROM employees;
department | employee_name | salary | lowest_salary | highest_salary |
---|---|---|---|---|
HR | Alice | 50000 | 48000 | 52000 |
HR | Bob | 52000 | 48000 | 52000 |
HR | Carol | 48000 | 48000 | 52000 |
IT | David | 60000 | 60000 | 65000 |
IT | Emma | 65000 | 60000 | 65000 |
Finance | Frank | 55000 | 55000 | 58000 |
Finance | Grace | 58000 | 55000 | 58000 |
Window functions are especially useful for complex analytical tasks, such as calculating moving averages, identifying trends, and ranking data within groups. They provide a powerful way to gain insights into your data with precise control over how calculations are performed within specific partitions and ordered rows.
What if we learn topics in a desirable way!! What if we learn to write Python codes from gamers data !!
Start using NotebookLM today and embark on a smarter, more efficient learning journey!
ANCOVA is an extension of ANOVA (Analysis of Variance) that combines blocks of regression analysis and ANOVA. Which makes it Analysis of Covariance.
This can be a super guide for you to start and excel in your data science career.
Are demand forecasting truly predictable? Or are they changing randomly?
Let’s enjoy the highly interesting story of Tech Superstar chronologically.
starting your Python journey from scratch is a fantastic endeavour.
Try learning a topic from basic > if not understood, ask somebody>
A friendly guide what every computer science student should have when exams are coming
Learn SQL CRUD basics and Here’s a fast overview of how to utilize them in 5 minutes.
This article will introduce important functions in SQL rank, denserank, over, partition.
In SQL you can make queries in number of ways ,though we can break complex codes into small readable and calculated parts.
One response to “SQL’s window function”
[…] 1. Window Functions: […]
Points You Earned