SQL’s window functions are a potent tool that enables you to perform
SQL’s window functions are a potent tool that enables you to perform
SQL’s window functions allow calculations over related rows, defined by the OVER() clause that partitions and orders the result set.
Here’s a more detailed account of window functions:
CREATE TABLE employees (
employee_id INT,
department VARCHAR(50),
employee_name VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
)
INSERT INTO employees (employee_id, department, employee_name, salary, hire_date)
/* (1, 'HR', 'Alice', 50000.00, '2022-03-15'), */
/* use above format to insert data for any other platform than oracle */
(1, 'HR', 'Alice', 50000.00, to_date('2022-03-15', 'YYYY-MM-DD')),
(2, 'HR', 'Bob', 52000.00, to_date('2021-07-10', 'YYYY-MM-DD')),
(3, 'HR', 'Carol', 48000.00, to_date('2023-01-20', 'YYYY-MM-DD')),
(4, 'IT', 'David', 60000.00, to_date('2022-05-05', 'YYYY-MM-DD')),
(5, 'IT', 'Emma', 65000.00, to_date('2023-02-18', 'YYYY-MM-DD')),
(6, 'Finance', 'Frank', 55000.00, to_date('2021-12-02', 'YYYY-MM-DD')),
(7, 'Finance', 'Grace', 58000.00, to_date('2022-08-30', 'YYYY-MM-DD'));
EMPLOYEE_ID | DEPARTMENT | EMPLOYEE_NAME | SALARY | HIRE_DATE |
---|---|---|---|---|
1 | HR | Alice | 50000 | 15-MAR-22 |
2 | HR | Bob | 52000 | 10-JUL-21 |
3 | HR | Carol | 48000 | 20-JAN-23 |
4 | IT | David | 60000 | 05-MAY-22 |
5 | IT | Emma | 65000 | 18-FEB-23 |
6 | Finance | Frank | 55000 | 02-DEC-21 |
7 | Finance | Grace | 58000 | 30-AUG-22 |
The result set is divided into partitions or groups based on one or more columns using the PARTITION BY clause. The window function is applied independently within each partition.
For example:
SELECT
department,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary_per_dept
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | AVG_SALARY_PER_DEPT |
---|---|---|---|
Finance | Frank | 55000 | 56500 |
Finance | Grace | 58000 | 56500 |
HR | Bob | 52000 | 50000 |
HR | Carol | 48000 | 50000 |
HR | Alice | 50000 | 50000 |
IT | Emma | 65000 | 62500 |
IT | David | 60000 | 62500 |
In this query, each department is divided using PARTITION BY, and the AVG(salary) window function calculates the average salary of each department.
The ORDER BY clause is crucial for functions that rely on row order within partitions, such as RANK(), LEAD(), and LAG().
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | DEPT_RANK |
---|---|---|---|
Finance | Grace | 58000 | 1 |
Finance | Frank | 55000 | 2 |
HR | Bob | 52000 | 1 |
HR | Alice | 50000 | 2 |
HR | Carol | 48000 | 3 |
IT | Emma | 65000 | 1 |
IT | David | 60000 | 2 |
Within each department, this system ranks employees based on their salary in descending order.
The window frame clause lets you specify a range of rows in each partition on which the window function operates.
Two types are available: ROWS BETWEEN and RANGE BETWEEN.
Use physical row positions with ROWS BETWEEN and specify frame bounds with UNBOUNDED PRECEDING, CURRENT ROW, or UNBOUNDED FOLLOWING.
For example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM employees;
DEPARTMENT | EMPLOYEE_NAME | SALARY | CUMULATIVE_SALARY |
---|---|---|---|
Finance | Frank | 55000 | 55000 |
Finance | Grace | 58000 | 113000 |
HR | Bob | 52000 | 52000 |
HR | Alice | 50000 | 102000 |
HR | Carol | 48000 | 150000 |
IT | David | 60000 | 60000 |
IT | Emma | 65000 | 125000 |
This calculates the cumulative sum of salaries, ordered by hire date, for each department from the start of the partition to the current row.
RANGE BETWEEN operates on values within a window and is useful for non-integer data.
For example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary ASC RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING) AS salary_range_sum
DEPARTMENT | EMPLOYEE_NAME | SALARY | SALARY_RANGE_SUM |
---|---|---|---|
Finance | Frank | 55000 | 55000 |
Finance | Grace | 58000 | 58000 |
HR | Carol | 48000 | 48000 |
HR | Alice | 50000 | 50000 |
HR | Bob | 52000 | 52000 |
IT | David | 60000 | 60000 |
IT | Emma | 65000 | 65000 |
This query calculates the sum of salaries within each department for employees whose salaries fall within 1000 units of the current employee’s salary, based on the specified order.
Here are some common window functions:
RANK()
, DENSE_RANK()
, ROW_NUMBER()
:Assign a rank, dense rank, or row number within each partition.
These functions are used to assign a rank, dense rank, or row number to each row within a window defined by the OVER()
clause.
RANK()
assigns a unique rank to each distinct row, and if two or more rows have the same values, they receive the same rank, and the next rank is skipped.
DENSE_RANK()
also assigns a unique rank to each distinct row, but if two or more rows have the same values, they receive the same rank, and the next rank is not skipped.
ROW_NUMBER()
assigns a unique row number to each row within the window, without regard to duplicate values.
Example:
SELECT
employee_name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number
FROM employees;
department | employee_name | salary | rank | dense_rank | row_number |
---|---|---|---|---|---|
HR | Alice | 50000 | 2 | 2 | 1 |
HR | Bob | 52000 | 1 | 1 | 2 |
HR | Carol | 48000 | 3 | 3 | 3 |
IT | David | 60000 | 2 | 2 | 1 |
IT | Emma | 65000 | 1 | 1 | 2 |
Finance | Frank | 55000 | 2 | 2 | 1 |
Finance | Grace | 58000 | 1 | 1 | 2 |
In this query, we rank employees by their salary in descending order, using all three functions.
LEAD()
and LAG()
:Access values from subsequent or preceding rows.
LEAD()
and LAG()
allow you to access values from subsequent or preceding rows within the result set.
LEAD(column, n)
retrieves the value of column
from the row that is n
rows after the current row.
LAG(column, n)
retrieves the value of column
from the row that is n
rows before the current row.
Example:
SELECT
order_id,
order_date,
LEAD(order_date, 1) OVER (ORDER BY order_date) AS next_order_date,
LAG(order_date, 1) OVER (ORDER BY order_date) AS previous_order_date
FROM orders;
department | employee_name | salary | next_salary | previous_salary |
---|---|---|---|---|
HR | Alice | 50000 | 52000 | NULL |
HR | Bob | 52000 | 48000 | 50000 |
HR | Carol | 48000 | NULL | 52000 |
IT | David | 60000 | 65000 | NULL |
IT | Emma | 65000 | NULL | 60000 |
Finance | Frank | 55000 | 58000 | NULL |
Finance | Grace | 58000 | NULL | 55000 |
This query retrieves the order date, as well as the next and previous order dates for each order.
SUM()
, AVG()
, MIN()
, MAX()
:Perform aggregate calculations over the window.
These aggregate functions can be used with the OVER()
clause to perform calculations over a window of rows.
AVG(column) OVER (PARTITION BY partition_column)
calculates the average of column
within each partition.MIN(column) OVER (PARTITION BY partition_column)
finds the minimum value of column
within each partition.MAX(column) OVER (PARTITION BY partition_column)
finds the maximum value of column
within each partition.Example:
SELECT
department,
employee_name,
salary,
SUM(salary) OVER (PARTITION BY department) AS total_salary_dept,
AVG(salary) OVER (PARTITION BY department) AS avg_salary_dept
FROM employees;
department | employee_name | salary | total_salary_dept | avg_salary_dept |
---|---|---|---|---|
HR | Alice | 50000 | 150000 | 50000 |
HR | Bob | 52000 | 150000 | 50000 |
HR | Carol | 48000 | 150000 | 50000 |
IT | David | 60000 | 125000 | 62500 |
IT | Emma | 65000 | 125000 | 62500 |
Finance | Frank | 55000 | 113000 | 56500 |
Finance | Grace | 58000 | 113000 | 56500 |
In this query, we calculate the total and average salary within each department.
FIRST_VALUE()
and LAST_VALUE()
: Retrieve the first or last value within the window.
FIRST_VALUE(column)
retrieves the first value of column
within the window.
LAST_VALUE(column)
retrieves the last value of column
within the window.
Example:
SELECT
department,
employee_name,
salary,
FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS lowest_salary,
LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS highest_salary
FROM employees;
department | employee_name | salary | lowest_salary | highest_salary |
---|---|---|---|---|
HR | Alice | 50000 | 48000 | 52000 |
HR | Bob | 52000 | 48000 | 52000 |
HR | Carol | 48000 | 48000 | 52000 |
IT | David | 60000 | 60000 | 65000 |
IT | Emma | 65000 | 60000 | 65000 |
Finance | Frank | 55000 | 55000 | 58000 |
Finance | Grace | 58000 | 55000 | 58000 |
Window functions are especially useful for complex analytical tasks, such as calculating moving averages, identifying trends, and ranking data within groups. They provide a powerful way to gain insights into your data with precise control over how calculations are performed within specific partitions and ordered rows.
ANCOVA is an extension of ANOVA (Analysis of Variance) that combines blocks of regression analysis and ANOVA. Which makes it Analysis of Covariance.
What if we learn topics in a desirable way!! What if we learn to write Python codes from gamers data !!
Start using NotebookLM today and embark on a smarter, more efficient learning journey!
This can be a super guide for you to start and excel in your data science career.
In this article, we will learn how to utilize the functionalities provided by excel and python libraries to calculate IQR,
After tourism was established as a motivator of local economies (country, state), many governments stepped up to the plate.
Sentiment analysis can determine the polarity of sentiments from given sentences. We can classify them into certain categories.
Traverse a dictionary with for loop Accessing keys and values in dictionary. Use Dict.values() and Dict.keys() to generate keys and values as iterable. Nested Dictionaries with for loop Access Nested values of Nested Dictionaries How useful was this post? Click on a star to rate it! Submit Rating
For loop is one of the most useful methods to reuse a code for repetitive execution.
These all metrics are revolving around visits and hits which we are getting on websites. Single page visits, Bounce, Cart Additions, Bounce Rate, Exit rate,
Hypothesis testing is a statistical method for determining whether or not a given hypothesis is true. A hypothesis can be any assumption based on data.
A/B tests are randomly controlled experiments. In A/B testing, you get user response on various versions of the product, and users are split within multiple versions of the product to figure out the “winner” of the version.
This article covers ‘for’ loops and how they are used with tuples. Even if the tuples are immutable, the accessibility of the tuples is similar to that of the list.
MANOVA is an update of ANOVA, where we use a minimum of two dependent variables.
One response to “SQL’s window function”
[…] 1. Window Functions: […]
Points You Earned