SQL Analytic Functions

SQL’s analytic functions support complex calculations and in-depth analysis. Each one operates on a set of rows in the query result that are related to the current row, such as the rows in the same partition or the rows that precede it in a given order.

I’ve prepared a walkthrough of common analytic functions for you.

Understanding these functions will allow you to gain valuable insights into your data and make sound decisions.

Let’s use the employees table below as the sample data for the SQL queries in this post.

employee_id	department	employee_name	salary
1	HR	Alice	50000
2	HR	Bob	52000
3	HR	Carol	48000
4	IT	David	60000
5	IT	Emma	65000
6	Finance	Frank	55000
7	Finance	Grace	58000
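To run the queries in this post yourself, the sample table can be created first. This is a minimal sketch: the table name employees matches the queries below, but the column types and the primary key are assumptions, since the post only shows the data.

```sql
-- Assumed schema: the column types are illustrative, not taken from the post.
CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,
    department    VARCHAR(20) NOT NULL,
    employee_name VARCHAR(50) NOT NULL,
    salary        INTEGER     NOT NULL
);

-- The seven sample rows from the table above.
INSERT INTO employees (employee_id, department, employee_name, salary) VALUES
    (1, 'HR',      'Alice', 50000),
    (2, 'HR',      'Bob',   52000),
    (3, 'HR',      'Carol', 48000),
    (4, 'IT',      'David', 60000),
    (5, 'IT',      'Emma',  65000),
    (6, 'Finance', 'Frank', 55000),
    (7, 'Finance', 'Grace', 58000);
```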

1. NTILE(n):

NTILE(n) divides the result set into roughly equal-sized groups or “tiles,” each with its own group number. This function can be used to calculate quartiles or percentiles.

Example

SQL

SELECT department, employee_name, salary, NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

This query divides the employees into four groups based on salary. With seven rows, the first three groups receive two rows each and the last group receives one.

department	employee_name	salary	quartile
HR	Carol	48000	1
HR	Alice	50000	1
HR	Bob	52000	2
Finance	Frank	55000	2
Finance	Grace	58000	3
IT	David	60000	3
IT	Emma	65000	4
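NTILE can also be combined with PARTITION BY so that the groups are assigned separately inside each department. A minimal sketch, assuming the employees sample table above; the alias salary_half is only illustrative:

```sql
-- Split each department into a lower-paid and a higher-paid half.
SELECT department,
       employee_name,
       salary,
       NTILE(2) OVER (PARTITION BY department ORDER BY salary) AS salary_half
FROM employees
ORDER BY department, salary;
```

When a partition does not divide evenly, NTILE gives the extra rows to the lowest-numbered groups. HR has three employees, so group 1 should hold two of them (Carol and Alice) and group 2 one (Bob).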

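2. PERCENTILE_CONT(value) WITHIN GROUP (ORDER BY column):

PERCENTILE_CONT calculates the value at a specified percentile within a group of rows, interpolating between the two nearest values when the percentile falls between them. The percentile is given as a fraction between 0 and 1. This is particularly helpful for finding the median or other specific percentiles.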

Example

SQL

SELECT department, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;

This query finds the median salary for each department.

department	median_salary
HR	50000
IT	62500
Finance	56500
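The interpolation is easier to see at percentiles other than the median. The sketch below assumes PostgreSQL-style syntax and the employees sample table above; the aliases p25_salary and p75_salary are illustrative.

```sql
-- PERCENTILE_CONT locates the position 1 + fraction * (N - 1) in the sorted
-- group and interpolates linearly between the two neighbouring rows.
-- HR (N = 3), fraction 0.25: position 1.5, halfway between 48000 and 50000.
SELECT department,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS p25_salary,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS p75_salary
FROM employees
GROUP BY department
ORDER BY department;
```

For HR this should return 49000 and 51000, neither of which is an actual salary in the table. That is the defining property of the continuous percentile.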

3. PERCENTILE_DISC(value) WITHIN GROUP (ORDER BY column):

PERCENTILE_DISC computes the value at a specified percentile within a group of rows, but instead of an interpolated value, it returns an actual data value from the dataset: the first value in the sort order whose cumulative distribution reaches the requested fraction. It can be used to find discrete percentiles.

Example

SQL

SELECT department, PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY salary) AS first_quartile_salary
FROM employees
GROUP BY department;

This query finds the value at the first quartile (25th percentile) of salaries for each department. Every result is a salary that actually appears in the table.

department	first_quartile_salary
HR	48000
IT	60000
Finance	55000
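Putting both percentile functions side by side shows the difference between them. A minimal sketch, assuming PostgreSQL-style syntax and the employees sample table above; the aliases median_cont and median_disc are illustrative:

```sql
-- Same percentile, two definitions: interpolated vs. actual data value.
SELECT department,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_cont,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_disc
FROM employees
GROUP BY department
ORDER BY department;
```

HR has an odd number of employees, so both columns should agree on 50000. IT and Finance each have two employees, so median_cont should return the midpoint (62500 and 56500) while median_disc returns the lower of the two real salaries (60000 and 55000).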

4. CUME_DIST() OVER (ORDER BY column):

CUME_DIST calculates the cumulative distribution of a value within a group of rows: the fraction of rows that come at or before the current row in the specified order. It indicates the relative position of a row within the group.

Example:

SQL

SELECT department, employee_name, salary, CUME_DIST() OVER (ORDER BY salary DESC) AS cumulative_salary_dist
FROM employees;

This query displays the cumulative distribution of salaries within the employees table, ordered by salary in descending order. The highest-paid employee gets 1/7 and the lowest-paid gets 1.

department	employee_name	salary	cumulative_salary_dist
IT	Emma	65000	0.1428571429
IT	David	60000	0.2857142857
Finance	Grace	58000	0.4285714286
Finance	Frank	55000	0.5714285714
HR	Bob	52000	0.7142857143
HR	Alice	50000	0.8571428571
HR	Carol	48000	1
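The definition of CUME_DIST can be checked by computing it by hand with two window counts. This is only a sketch, assuming the employees sample table above; the aliases cume_dist_builtin and cume_dist_manual are illustrative.

```sql
SELECT employee_name,
       salary,
       CUME_DIST() OVER (ORDER BY salary DESC) AS cume_dist_builtin,
       -- COUNT(*) with ORDER BY uses the default frame, which runs from the
       -- start of the partition through the current row and its ties.
       -- Dividing by the total row count gives the same fraction.
       1.0 * COUNT(*) OVER (ORDER BY salary DESC)
           / COUNT(*) OVER ()                   AS cume_dist_manual
FROM employees
ORDER BY salary DESC;
```

The two columns should match on every row, apart from differences in numeric type and displayed precision between database systems.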

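5. LAG() and LEAD():

The LAG() and LEAD() functions allow you to access values from rows that precede or follow the current row in a result set. They are frequently used to calculate changes between consecutive rows or to spot trends.

Example

SQL

SELECT department, employee_name, salary,
       LAG(salary) OVER (PARTITION BY department ORDER BY employee_id) AS prev_employee_salary,
       LEAD(salary) OVER (PARTITION BY department ORDER BY employee_id) AS next_employee_salary
FROM employees;

This query retrieves each employee’s salary together with the salaries of the previous and next employees in the same department, ordered by employee_id. The first row of each department has no previous row and the last row has no next row, so those cells are NULL.

department	employee_name	salary	prev_employee_salary	next_employee_salary
HR	Alice	50000	NULL	52000
HR	Bob	52000	50000	48000
HR	Carol	48000	52000	NULL
IT	David	60000	NULL	65000
IT	Emma	65000	60000	NULL
Finance	Frank	55000	NULL	58000
Finance	Grace	58000	55000	NULL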
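A common use of LAG is computing the change from one row to the next. The sketch below assumes the employees sample table above and orders each department by employee_id; the aliases change_from_prev and change_or_zero are illustrative.

```sql
SELECT department,
       employee_name,
       salary,
       -- NULL on the first row of each department, because LAG finds no row.
       salary - LAG(salary) OVER (
           PARTITION BY department ORDER BY employee_id
       ) AS change_from_prev,
       -- The third argument is a default used when no previous row exists,
       -- so the first row of each department yields 0 instead of NULL.
       salary - LAG(salary, 1, salary) OVER (
           PARTITION BY department ORDER BY employee_id
       ) AS change_or_zero
FROM employees
ORDER BY department, employee_id;
```

For HR, change_from_prev should be NULL for Alice, 2000 for Bob, and -4000 for Carol.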

6. FIRST_VALUE() and LAST_VALUE():

The functions FIRST_VALUE() and LAST_VALUE() return the first or last value within a window frame in the specified order.

Example:

SQL

SELECT department, employee_name, salary,
       FIRST_VALUE(employee_name) OVER (PARTITION BY department ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_paid_employee,
       LAST_VALUE(employee_name) OVER (PARTITION BY department ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_paid_employee
FROM employees;

This query finds the lowest- and highest-paid employees within each department. The explicit ROWS BETWEEN clause lets LAST_VALUE see the whole department; without it, the default frame stops at the current row.

department	employee_name	salary	lowest_paid_employee	highest_paid_employee
HR	Alice	50000	Carol	Bob
HR	Bob	52000	Carol	Bob
HR	Carol	48000	Carol	Bob
IT	David	60000	David	Emma
IT	Emma	65000	David	Emma
Finance	Frank	55000	Frank	Grace
Finance	Grace	58000	Frank	Grace
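The window frame determines which rows LAST_VALUE can see. When ORDER BY is present and no frame is given, the default frame ends at the current row, so LAST_VALUE returns the current row's value rather than the last row of the partition. A minimal sketch comparing the two behaviours, assuming the employees sample table above (the column aliases are illustrative):

```sql
SELECT department,
       employee_name,
       salary,
       -- Default frame: ends at the current row (and any rows tied with it).
       LAST_VALUE(employee_name) OVER (
           PARTITION BY department ORDER BY salary
       ) AS last_value_default_frame,
       -- Explicit frame: covers the whole partition.
       LAST_VALUE(employee_name) OVER (
           PARTITION BY department ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_value_full_frame
FROM employees
ORDER BY department, salary;
```

Since no two employees in a department share a salary, last_value_default_frame should simply repeat each row's own employee_name, while last_value_full_frame should show Bob, Grace, and Emma for HR, Finance, and IT.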

Analytic functions are versatile tools for data analysis and reporting: they let you perform calculations within specific groups or ordered sets of rows. The examples above show how common analytic functions reveal the distribution, trends, and percentiles in a dataset.
