SQL's window function - Distinctive Analytics

SQL’s window functions allow calculations over related rows, defined by the OVER() clause that partitions and orders the result set.

Here’s a more detailed account of window functions:

SQL

CREATE TABLE employees ( 
    employee_id INT, 
    department VARCHAR(50), 
    employee_name VARCHAR(50), 
    salary DECIMAL(10, 2), 
    hire_date DATE 
)

INSERT INTO employees (employee_id, department, employee_name, salary, hire_date) 
/* (1, 'HR', 'Alice', 50000.00, '2022-03-15'), */
/* use above format to insert data for any other platform than oracle */
(1, 'HR', 'Alice', 50000.00, to_date('2022-03-15', 'YYYY-MM-DD')),
(2, 'HR', 'Bob', 52000.00, to_date('2021-07-10', 'YYYY-MM-DD')),
(3, 'HR', 'Carol', 48000.00, to_date('2023-01-20', 'YYYY-MM-DD')),
(4, 'IT', 'David', 60000.00, to_date('2022-05-05', 'YYYY-MM-DD')),
(5, 'IT', 'Emma', 65000.00, to_date('2023-02-18', 'YYYY-MM-DD')),
(6, 'Finance', 'Frank', 55000.00, to_date('2021-12-02', 'YYYY-MM-DD')),
(7, 'Finance', 'Grace', 58000.00, to_date('2022-08-30', 'YYYY-MM-DD'));

EMPLOYEE_ID	DEPARTMENT	EMPLOYEE_NAME	SALARY	HIRE_DATE
1	HR	Alice	50000	15-MAR-22
2	HR	Bob	52000	10-JUL-21
3	HR	Carol	48000	20-JAN-23
4	IT	David	60000	05-MAY-22
5	IT	Emma	65000	18-FEB-23
6	Finance	Frank	55000	02-DEC-21
7	Finance	Grace	58000	30-AUG-22

1. PARTITION BY Clause:

The result set is divided into partitions or groups based on one or more columns using the PARTITION BY clause. The window function is applied independently within each partition.

For example:

SQL

SELECT
    department,
    employee_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS avg_salary_per_dept
FROM employees;

DEPARTMENT	EMPLOYEE_NAME	SALARY	AVG_SALARY_PER_DEPT
Finance	Frank	55000	56500
Finance	Grace	58000	56500
HR	Bob	52000	50000
HR	Carol	48000	50000
HR	Alice	50000	50000
IT	Emma	65000	62500
IT	David	60000	62500

In this query, each department is divided using PARTITION BY, and the AVG(salary) window function calculates the average salary of each department.

2. ORDER BY Clause:

The ORDER BY clause is crucial for functions that rely on row order within partitions, such as RANK(), LEAD(), and LAG().

SQL

SELECT
    department,
    employee_name,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

DEPARTMENT	EMPLOYEE_NAME	SALARY	DEPT_RANK
Finance	Grace	58000	1
Finance	Frank	55000	2
HR	Bob	52000	1
HR	Alice	50000	2
HR	Carol	48000	3
IT	Emma	65000	1
IT	David	60000	2

Within each department, this system ranks employees based on their salary in descending order.

3. Window Frame Clause (ROWS and RANGE):

The window frame clause lets you specify a range of rows in each partition on which the window function operates.
Two types are available: ROWS BETWEEN and RANGE BETWEEN.

Use physical row positions with ROWS BETWEEN and specify frame bounds with UNBOUNDED PRECEDING, CURRENT ROW, or UNBOUNDED FOLLOWING.

For example:

SQL

SELECT
    department,
    employee_name,
    salary,
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM employees;

DEPARTMENT	EMPLOYEE_NAME	SALARY	CUMULATIVE_SALARY
Finance	Frank	55000	55000
Finance	Grace	58000	113000
HR	Bob	52000	52000
HR	Alice	50000	102000
HR	Carol	48000	150000
IT	David	60000	60000
IT	Emma	65000	125000

This calculates the cumulative sum of salaries, ordered by hire date, for each department from the start of the partition to the current row.

RANGE BETWEEN operates on values within a window and is useful for non-integer data.

For example:

SQL

SELECT
    department,
    employee_name,
    salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary ASC RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING) AS salary_range_sum

DEPARTMENT	EMPLOYEE_NAME	SALARY	SALARY_RANGE_SUM
Finance	Frank	55000	55000
Finance	Grace	58000	58000
HR	Carol	48000	48000
HR	Alice	50000	50000
HR	Bob	52000	52000
IT	David	60000	60000
IT	Emma	65000	65000

This query calculates the sum of salaries within each department for employees whose salaries fall within 1000 units of the current employee’s salary, based on the specified order.

4. Common Window Functions

Here are some common window functions:

4.1 `RANK()`, `DENSE_RANK()`, `ROW_NUMBER()`:

Assign a rank, dense rank, or row number within each partition.

These functions are used to assign a rank, dense rank, or row number to each row within a window defined by the OVER() clause.

RANK() assigns a unique rank to each distinct row, and if two or more rows have the same values, they receive the same rank, and the next rank is skipped.

DENSE_RANK() also assigns a unique rank to each distinct row, but if two or more rows have the same values, they receive the same rank, and the next rank is not skipped.

ROW_NUMBER() assigns a unique row number to each row within the window, without regard to duplicate values.

Example:

SQL

SELECT
    employee_name,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number
FROM employees;

department	employee_name	salary	rank	dense_rank	row_number
HR	Alice	50000	2	2	1
HR	Bob	52000	1	1	2
HR	Carol	48000	3	3	3
IT	David	60000	2	2	1
IT	Emma	65000	1	1	2
Finance	Frank	55000	2	2	1
Finance	Grace	58000	1	1	2

In this query, we rank employees by their salary in descending order, using all three functions.

4.2 `LEAD()` and `LAG()`:

Access values from subsequent or preceding rows.

LEAD() and LAG() allow you to access values from subsequent or preceding rows within the result set.

LEAD(column, n) retrieves the value of column from the row that is n rows after the current row.

LAG(column, n) retrieves the value of column from the row that is n rows before the current row.

Example:

SQL

SELECT
    order_id,
    order_date,
    LEAD(order_date, 1) OVER (ORDER BY order_date) AS next_order_date,
    LAG(order_date, 1) OVER (ORDER BY order_date) AS previous_order_date
FROM orders;

department	employee_name	salary	next_salary	previous_salary
HR	Alice	50000	52000	NULL
HR	Bob	52000	48000	50000
HR	Carol	48000	NULL	52000
IT	David	60000	65000	NULL
IT	Emma	65000	NULL	60000
Finance	Frank	55000	58000	NULL
Finance	Grace	58000	NULL	55000

This query retrieves the order date, as well as the next and previous order dates for each order.

4.3 `SUM()`, `AVG()`, `MIN()`, `MAX()`:

Perform aggregate calculations over the window.

These aggregate functions can be used with the OVER() clause to perform calculations over a window of rows.

AVG(column) OVER (PARTITION BY partition_column) calculates the average of column within each partition.
MIN(column) OVER (PARTITION BY partition_column) finds the minimum value of column within each partition.
MAX(column) OVER (PARTITION BY partition_column) finds the maximum value of column within each partition.

Example:

SQL

SELECT
    department,
    employee_name,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS total_salary_dept,
    AVG(salary) OVER (PARTITION BY department) AS avg_salary_dept
FROM employees;

department	employee_name	salary	total_salary_dept	avg_salary_dept
HR	Alice	50000	150000	50000
HR	Bob	52000	150000	50000
HR	Carol	48000	150000	50000
IT	David	60000	125000	62500
IT	Emma	65000	125000	62500
Finance	Frank	55000	113000	56500
Finance	Grace	58000	113000	56500

In this query, we calculate the total and average salary within each department.

4.4 `FIRST_VALUE()` and `LAST_VALUE()`:

Retrieve the first or last value within the window.

FIRST_VALUE(column) retrieves the first value of column within the window.

LAST_VALUE(column) retrieves the last value of column within the window.

Example:

SQL

SELECT
    department,
    employee_name,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS lowest_salary,
    LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS highest_salary
FROM employees;

department	employee_name	salary	lowest_salary	highest_salary
HR	Alice	50000	48000	52000
HR	Bob	52000	48000	52000
HR	Carol	48000	48000	52000
IT	David	60000	60000	65000
IT	Emma	65000	60000	65000
Finance	Frank	55000	55000	58000
Finance	Grace	58000	55000	58000

Window functions are especially useful for complex analytical tasks, such as calculating moving averages, identifying trends, and ranking data within groups. They provide a powerful way to gain insights into your data with precise control over how calculations are performed within specific partitions and ordered rows.

Distinctive Analytics