Seaborn is built on top of Matplotlib, which renders its data point visualizations. On that base, Seaborn adds a high-level interface for drawing informative statistical graphics.
Following are short instructions for installing seaborn.
# command prompt
pip install seaborn
# jupyter lab / colab
!pip install seaborn
Run the first command at the command prompt. In a Jupyter or Colab code cell, use the same command prefixed with an exclamation mark.
When importing Seaborn, you should also import Matplotlib's pyplot module for displaying the plots. In the following code, we import pandas for dataset handling, numpy for mathematical functions, and seaborn with matplotlib for visualizations.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
KDE (kernel density estimate) plots show the distribution of the data as a smooth curve. In the following code, we plot artificially produced data with KDE.
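Distplot shows a histogram with a KDE trend line, which is useful for studying continuous data. Distplot has been deprecated since Seaborn 0.11, so the code below uses histplot with kde=True, which draws the same kind of plot.
sns.set(rc={'figure.figsize': (11.7, 8.27)})
sns.histplot(data['A'], kde=True, stat='density');
sns.histplot(data['B'], kde=True, stat='density');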
We can also plot the two variables together as a bivariate (two-dimensional) KDE plot.
sns.kdeplot(x=data['A'], y=data['B'])
with sns.axes_style('white'):
    sns.jointplot(data=data, x="A", y="B", kind='kde');
with sns.axes_style('white'):
    sns.jointplot(data=data, x="A", y="B", kind='hex')
Let’s use the planets dataset from Seaborn Library for rendering multiple plots in a single image. Using pair plots will help in understanding paired relations among multiple variables.
planets = sns.load_dataset("planets")
planets.head()
method | number | orbital_period | mass | distance | year |
---|---|---|---|---|---|
Radial Velocity | 1 | 269.300 | 7.10 | 77.40 | 2006 |
Radial Velocity | 1 | 874.774 | 2.21 | 56.95 | 2008 |
Radial Velocity | 1 | 763.000 | 2.60 | 19.84 | 2011 |
Radial Velocity | 1 | 326.030 | 19.40 | 110.62 | 2007 |
Radial Velocity | 1 | 516.220 | 10.50 | 119.47 | 2009 |
sns.pairplot(planets, hue='year', height=2.5);
In a joint plot, we get a bivariate graph plus two marginal graphs, one along each axis, that show each variable’s distribution as a histogram.
In total, you get three graphs.
geyser = sns.load_dataset('geyser')
sns.jointplot(data=geyser, x="duration", y="waiting", kind='reg')
We can change the kind parameter to draw different styles of joint plot. As an example, we pass ‘hex’ as the kind argument.
geyser = sns.load_dataset('geyser')
with sns.axes_style('white'):
    sns.jointplot(data=geyser, x="duration", y="waiting", kind='hex')
For factor plots (drawn with catplot, the current name for factorplot), we will use the taxis dataset, which is multivariate and contains several data types. This dataset was originally published by the NYC Taxi and Limousine Commission (TLC).
taxis = sns.load_dataset('taxis')
taxis.head(5)
pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3/23/2019 20:21 | 3/23/2019 20:27 | 1 | 1.6 | 7 | 2.15 | 0 | 12.95 | yellow | credit card | Lenox Hill West | UN/Turtle Bay South | Manhattan | Manhattan |
3/4/2019 16:11 | 3/4/2019 16:19 | 1 | 0.79 | 5 | 0 | 0 | 9.3 | yellow | cash | Upper West Side South | Upper West Side South | Manhattan | Manhattan |
3/27/2019 17:53 | 3/27/2019 18:00 | 1 | 1.37 | 7.5 | 2.36 | 0 | 14.16 | yellow | credit card | Alphabet City | West Village | Manhattan | Manhattan |
3/10/2019 01:23 | 3/10/2019 01:49 | 1 | 7.7 | 27 | 6.15 | 0 | 36.95 | yellow | credit card | Hudson Sq | Yorkville West | Manhattan | Manhattan |
3/30/2019 13:27 | 3/30/2019 13:37 | 3 | 2.16 | 9 | 1.1 | 0 | 13.4 | yellow | credit card | Midtown East | Yorkville West | Manhattan | Manhattan |
sns.set(rc={'figure.figsize': (11.7, 8.27)})
with sns.axes_style(style='ticks'):
    # factorplot was renamed to catplot; the third variable maps to hue
    g = sns.catplot(data=taxis, x="color", y="fare", hue="pickup_borough", kind="box")
    g.set_axis_labels("Taxis", "Fare");
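As a network-free illustration of the same call pattern, the sketch below runs catplot (the current name for factorplot) on a tiny hand-made table. The column names follow the taxis dataset, but the rows are invented for this example.

```python
import pandas as pd
import seaborn as sns

# Invented rows that only mimic the shape of the taxis data.
df = pd.DataFrame({
    "color": ["yellow", "yellow", "green", "green",
              "yellow", "green", "yellow", "green"],
    "fare": [7.0, 5.0, 9.5, 12.0, 27.0, 6.5, 9.0, 14.0],
    "borough": ["Manhattan", "Manhattan", "Queens", "Queens",
                "Queens", "Manhattan", "Queens", "Manhattan"],
})

# One box per (color, borough) combination; catplot returns a FacetGrid.
g = sns.catplot(data=df, x="color", y="fare", hue="borough", kind="box")
g.set_axis_labels("Taxi color", "Fare")
```

Passing col="borough" instead of hue would draw one panel per borough rather than side-by-side boxes.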
To understand time series analysis, we use the healthexp dataset, which records health spending (in USD) and life expectancy by country and year. The first plot shows how spending has increased over the years, and the second shows life expectancy.
lifeexp=sns.load_dataset('healthexp')
lifeexp.head(1)
Year | Country | Spending_USD | Life_Expectancy |
---|---|---|---|
1970 | Germany | 252.311 | 70.6 |
with sns.axes_style('white'):
    # catplot replaces factorplot; kind="point" matches factorplot's old default
    g = sns.catplot(data=lifeexp, x="Year", y="Spending_USD", kind="point", aspect=4.0)
    g.set_xticklabels(step=5)
with sns.axes_style('white'):
    g = sns.catplot(data=lifeexp, x="Year", y="Life_Expectancy", kind="point", aspect=2.0)
    g.set_xticklabels(step=5)
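Because the dataset holds several countries per year, each point in these point-style plots is an aggregate: by default Seaborn plots the mean of the y values for each x value, with error bars around it. The sketch below reproduces that per-year mean with pandas on a few invented rows (the numbers are made up, not taken from healthexp).

```python
import pandas as pd

# Invented values that only mimic the layout of the healthexp data.
df = pd.DataFrame({
    "Year": [1970, 1970, 1971, 1971],
    "Country": ["Germany", "Japan", "Germany", "Japan"],
    "Spending_USD": [100.0, 200.0, 150.0, 250.0],
})

# Mean spending across countries for each year, the value a point plot marks.
yearly_mean = df.groupby("Year")["Spending_USD"].mean()
print(yearly_mean.to_dict())  # {1970: 150.0, 1971: 200.0}
```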