SQL Analytic Functions

SQL’s analytic functions allow for complex calculations and in-depth analysis. They operate on a set of rows in a query result that are related to the current row.

This post walks through six common analytic functions, each with an example query and its result.

Understanding these functions will help you gain valuable insights into your data and make sound decisions.

Let’s use the table below, named employees, as the example for the SQL demonstrations.

employee_id  department  employee_name  salary
1            HR          Alice          50000
2            HR          Bob            52000
3            HR          Carol          48000
4            IT          David          60000
5            IT          Emma           65000
6            Finance     Frank          55000
7            Finance     Grace          58000

1. NTILE(n):

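NTILE(n) divides the ordered result set into n roughly equal-sized groups, or “tiles,” and assigns each row the number of its group, from 1 to n. When the rows do not divide evenly, the earlier groups receive one extra row each. This function can be used to assign rows to quartiles, deciles, or other percentile bands.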

Example:

SQL
SELECT department, employee_name, salary, NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

This query divides the employees into four quartiles based on the salary column. With seven rows, the first three quartiles hold two rows each and the last holds one.

department  employee_name  salary  quartile
HR          Carol          48000   1
HR          Alice          50000   1
HR          Bob            52000   2
Finance     Frank          55000   2
Finance     Grace          58000   3
IT          David          60000   3
IT          Emma           65000   4
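To see what "roughly equal-sized" means when the rows do not divide evenly, here is a minimal sketch that is not part of the original example. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (the release that added window functions) and reloads the employees table into an in-memory database. NTILE(3) is chosen purely for illustration.

```python
import sqlite3

# The article's employees table, reloaded into an in-memory SQLite database.
ROWS = [
    (1, "HR", "Alice", 50000),
    (2, "HR", "Bob", 52000),
    (3, "HR", "Carol", 48000),
    (4, "IT", "David", 60000),
    (5, "IT", "Emma", 65000),
    (6, "Finance", "Frank", 55000),
    (7, "Finance", "Grace", 58000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER, department TEXT, employee_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", ROWS)

# Seven rows split into three tiles: the first tile absorbs the extra row.
tiles = conn.execute(
    "SELECT employee_name, NTILE(3) OVER (ORDER BY salary) AS tile "
    "FROM employees ORDER BY salary"
).fetchall()

for name, tile in tiles:
    print(name, tile)
```

With seven rows and three tiles, the tile sizes come out as 3, 2, and 2, with the larger tile first.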

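2. PERCENTILE_CONT(value) WITHIN GROUP (ORDER BY column):

PERCENTILE_CONT calculates the value at a specified percentile within a group of rows. The argument is a fraction between 0 and 1, so 0.5 requests the median. When the percentile falls between two rows, the function interpolates linearly between their values, so the result need not be a value that appears in the data. This is particularly helpful for finding the median or other specific percentiles.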

Example:

SQL
SELECT department, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;

This query finds the median salary for each department. IT and Finance each have two employees, so their medians are the midpoints of the two salaries.

department  median_salary
HR          50000
IT          62500
Finance     56500
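The interpolation behind PERCENTILE_CONT can be sketched in a few lines of plain Python. This is an illustration of the linear-interpolation rule under stated assumptions, not any database's actual implementation; the helper name percentile_cont and the dictionary of salaries are introduced here only for the example.

```python
import math


def percentile_cont(values, fraction):
    """Linear-interpolation percentile, following the PERCENTILE_CONT rule."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        # The position lands exactly on a row, so no interpolation is needed.
        return float(ordered[lower])
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Salaries from the article's employees table, grouped by department.
salaries_by_department = {
    "HR": [50000, 52000, 48000],
    "IT": [60000, 65000],
    "Finance": [55000, 58000],
}

medians = {
    department: percentile_cont(salaries, 0.5)
    for department, salaries in salaries_by_department.items()
}
print(medians)
```

For IT the position falls halfway between 60000 and 65000, which is why the median is 62500 rather than either actual salary.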

3. PERCENTILE_DISC(value) WITHIN GROUP (ORDER BY column):

PERCENTILE_DISC computes the value at a specified percentile within a group of rows, but instead of an interpolated value it returns an actual value from the dataset: the first value, in sort order, whose cumulative distribution is greater than or equal to the requested percentile. It can be used to find discrete percentiles.

Example:

SQL
SELECT department, PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY salary) AS first_quartile_salary
FROM employees
GROUP BY department;

This query finds the value at the first quartile (25th percentile) of salaries for each department. Every result is a salary that actually appears in the table.

department  first_quartile_salary
HR          48000
IT          60000
Finance     55000
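The selection rule behind PERCENTILE_DISC can be sketched in plain Python as well. As before, this is an illustration under stated assumptions rather than a database implementation; percentile_disc and the salary dictionary are names made up for this example.

```python
def percentile_disc(values, fraction):
    """Return the first sorted value whose cumulative share reaches fraction."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    total = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        # rank / total is the share of rows at or before this position.
        if rank / total >= fraction:
            return value
    # Only reached if fraction is greater than 1; fall back to the maximum.
    return ordered[-1]


# Salaries from the article's employees table, grouped by department.
salaries_by_department = {
    "HR": [50000, 52000, 48000],
    "IT": [60000, 65000],
    "Finance": [55000, 58000],
}

first_quartiles = {
    department: percentile_disc(salaries, 0.25)
    for department, salaries in salaries_by_department.items()
}
print(first_quartiles)
```

In HR the lowest salary already covers one third of the rows, which exceeds 0.25, so the function returns 48000 without interpolating.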

4. CUME_DIST() OVER (ORDER BY column):

CUME_DIST calculates the cumulative distribution of a value within a group of rows: the fraction of rows that come at or before the current row in the specified order. The result is greater than 0 and at most 1, and it indicates the relative position of a row within the group.

Example:

SQL
SELECT department, employee_name, salary, CUME_DIST() OVER (ORDER BY salary DESC) AS cumulative_salary_dist
FROM employees;

This query displays the cumulative distribution of salaries within the employees table, ordered by salary in descending order. The highest-paid employee gets 1/7 and the lowest-paid gets 1.

department  employee_name  salary  cumulative_salary_dist
IT          Emma           65000   0.1428571429
IT          David          60000   0.2857142857
Finance     Grace          58000   0.4285714286
Finance     Frank          55000   0.5714285714
HR          Bob            52000   0.7142857143
HR          Alice          50000   0.8571428571
HR          Carol          48000   1
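CUME_DIST can also be scoped to a group with PARTITION BY, so that each row is positioned within its own department. The sketch below is an added illustration, not taken from the original post; it assumes Python's built-in sqlite3 module with SQLite 3.25 or newer and reloads the employees table in memory.

```python
import sqlite3

# The article's employees table, reloaded into an in-memory SQLite database.
ROWS = [
    (1, "HR", "Alice", 50000),
    (2, "HR", "Bob", 52000),
    (3, "HR", "Carol", 48000),
    (4, "IT", "David", 60000),
    (5, "IT", "Emma", 65000),
    (6, "Finance", "Frank", 55000),
    (7, "Finance", "Grace", 58000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER, department TEXT, employee_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", ROWS)

# Cumulative distribution of salary within each department, lowest first.
distribution = conn.execute(
    "SELECT department, employee_name, "
    "       CUME_DIST() OVER (PARTITION BY department ORDER BY salary) AS dist "
    "FROM employees ORDER BY department, salary"
).fetchall()

for department, name, dist in distribution:
    print(department, name, round(dist, 4))
```

Within HR, for example, Carol, Alice, and Bob are placed at one third, two thirds, and 1 of the department's distribution.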

5. LAG() and LEAD():

The LAG() and LEAD() functions allow you to access values from rows that precede or follow the current row in a result set. They are frequently used to calculate changes between consecutive rows, such as day-over-day differences.

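Example:

SQL
SELECT department, employee_name, salary,
       LAG(salary) OVER (PARTITION BY department ORDER BY employee_id) AS prev_employee_salary,
       LEAD(salary) OVER (PARTITION BY department ORDER BY employee_id) AS next_employee_salary
FROM employees;

This query retrieves each employee’s salary together with the salaries of the previous and next employees in the same department, ordered by employee_id. The first row of each department has no previous row and the last has no next row, so those cells are NULL.

department  employee_name  salary  prev_employee_salary  next_employee_salary
HR          Alice          50000   NULL                  52000
HR          Bob            52000   50000                 48000
HR          Carol          48000   52000                 NULL
IT          David          60000   NULL                  65000
IT          Emma           65000   60000                 NULL
Finance     Frank          55000   NULL                  58000
Finance     Grace          58000   55000                 NULL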
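A common use of LAG() is a day-over-day change. The sketch below assumes a daily_sales table with date and revenue columns; the three rows of data are invented for illustration, and the revenue_change column is an addition of this example. It relies on Python's built-in sqlite3 module with SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (date TEXT, revenue INTEGER)")
# Hypothetical sample data, made up for this illustration.
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-01-01", 1000), ("2024-01-02", 1200), ("2024-01-03", 900)],
)

# Previous day's revenue and the change since then; the first day has no
# previous row, so both derived columns are NULL (None in Python).
changes = conn.execute(
    "SELECT date, revenue, "
    "       LAG(revenue) OVER (ORDER BY date) AS prev_day_revenue, "
    "       revenue - LAG(revenue) OVER (ORDER BY date) AS revenue_change "
    "FROM daily_sales ORDER BY date"
).fetchall()

for row in changes:
    print(row)
```

Subtracting a NULL yields NULL, so the first day reports no change rather than a misleading number.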

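6. FIRST_VALUE() and LAST_VALUE():

The functions FIRST_VALUE() and LAST_VALUE() return the first or last value within the current row’s window frame, in the specified order. When ORDER BY is present, the default frame ends at the current row, so LAST_VALUE() usually needs an explicit frame that extends to the end of the partition.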

Example:

SQL
-- The explicit frame lets both functions see the whole department, not just the rows up to the current one.
SELECT department, employee_name, salary,
       FIRST_VALUE(employee_name) OVER (PARTITION BY department ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_paid_employee,
       LAST_VALUE(employee_name) OVER (PARTITION BY department ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_paid_employee
FROM employees;

This query finds the lowest- and highest-paid employees within each department.

department  employee_name  salary  lowest_paid_employee  highest_paid_employee
HR          Alice          50000   Carol                 Bob
HR          Bob            52000   Carol                 Bob
HR          Carol          48000   Carol                 Bob
IT          David          60000   David                 Emma
IT          Emma           65000   David                 Emma
Finance     Frank          55000   Frank                 Grace
Finance     Grace          58000   Frank                 Grace
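The effect of the window frame on LAST_VALUE() is easy to check side by side. The following sketch is an added illustration; it assumes Python's built-in sqlite3 module with SQLite 3.25 or newer and reloads the employees table in memory. The column aliases default_frame and full_frame are names chosen for this example.

```python
import sqlite3

# The article's employees table, reloaded into an in-memory SQLite database.
ROWS = [
    (1, "HR", "Alice", 50000),
    (2, "HR", "Bob", 52000),
    (3, "HR", "Carol", 48000),
    (4, "IT", "David", 60000),
    (5, "IT", "Emma", 65000),
    (6, "Finance", "Frank", 55000),
    (7, "Finance", "Grace", 58000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER, department TEXT, employee_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", ROWS)

# default_frame stops at the current row; full_frame spans the partition.
frames = conn.execute(
    "SELECT department, employee_name, "
    "       LAST_VALUE(employee_name) OVER ("
    "           PARTITION BY department ORDER BY salary) AS default_frame, "
    "       LAST_VALUE(employee_name) OVER ("
    "           PARTITION BY department ORDER BY salary "
    "           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
    "       ) AS full_frame "
    "FROM employees ORDER BY department, salary"
).fetchall()

for row in frames:
    print(row)
```

With the default frame every employee is reported as their own "last value", while the full frame returns the department's highest-paid employee on every row.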

Analytic functions are versatile tools for data analysis, reporting, and decision-making in SQL: they let you perform a wide range of calculations within specific groups or ordered sets of rows.

The examples above illustrate how common analytic functions operate on a dataset and reveal its distribution, trends, and percentiles.
