Seaborn is a Python data visualization library built on top of Matplotlib. It adds a high-level interface for drawing statistically informative graphics.
Following are short instructions for installing seaborn.
# command prompt
pip install seaborn
# Jupyter Lab / Colab
!pip install seaborn
Run the first command at the command prompt. In a Jupyter or Colab code cell, prefix the same command with an exclamation mark, as in the second line.
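Some Seaborn function names have changed between releases, so it can help to confirm which version was installed. A quick check, assuming the installation succeeded:

```python
import seaborn as sns

# Print the installed Seaborn version, e.g. to compare against the release notes
print(sns.__version__)
```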
When importing Seaborn, you should also import Matplotlib's pyplot module for displaying the plots. In the following code, we import seaborn, numpy, and pandas for visualizations, mathematical functions, and dataset handling.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
KDE plots are used to show the data distribution. In the following code, we use artificially produced data to plot with KDE.
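A histogram with an overlaid KDE trend line is useful for studying continuous data. Older Seaborn releases drew it with distplot, which is now deprecated; histplot with kde=True is the current equivalent.
sns.set(rc={'figure.figsize': (11.7, 8.27)})
# kde=True overlays the trend line; stat='density' normalizes the bar heights
sns.histplot(x=data['A'], kde=True, stat='density')
sns.histplot(x=data['B'], kde=True, stat='density')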
We can also use KDE to plot the bivariate (two-variable) distribution.
sns.kdeplot(x=data['A'], y=data['B'])
with sns.axes_style('white'):
    # x, y, and data are passed by keyword, as newer Seaborn releases require
    sns.jointplot(data=data, x="A", y="B", kind='kde')
with sns.axes_style('white'):
    sns.jointplot(data=data, x="A", y="B", kind='hex')
Let's use the planets dataset that ships with Seaborn to render multiple plots in a single image. Pair plots help in understanding the pairwise relations among multiple variables.
planets = sns.load_dataset("planets")
planets.head()
method | number | orbital_period | mass | distance | year
---|---|---|---|---|---
Radial Velocity | 1 | 269.300 | 7.10 | 77.40 | 2006
Radial Velocity | 1 | 874.774 | 2.21 | 56.95 | 2008
Radial Velocity | 1 | 763.000 | 2.60 | 19.84 | 2011
Radial Velocity | 1 | 326.030 | 19.40 | 110.62 | 2007
Radial Velocity | 1 | 516.220 | 10.50 | 119.47 | 2009
# height replaces the older size parameter
sns.pairplot(planets, hue='year', height=2.5);
A joint plot shows a bivariate graph in the center plus two marginal graphs, one along each axis, that show each variable's distribution (histograms by default). In total, you get three graphs.
geyser = sns.load_dataset('geyser')
sns.jointplot(data=geyser, x="duration", y="waiting", kind='reg')
We can change the kind parameter to draw the joint plot in different ways. As an example, we will pass ‘hex’ as the kind.
geyser = sns.load_dataset('geyser')
with sns.axes_style('white'):
    sns.jointplot(data=geyser, x="duration", y="waiting", kind='hex')
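For factor plots (categorical plots, drawn with catplot in current Seaborn releases; older releases called this function factorplot), we will use the taxis dataset, which has multivariate data and several data types. This dataset was originally published by the NYC Taxi and Limousine Commission (TLC).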
taxis = sns.load_dataset('taxis')
taxis.head(5)
pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3/23/2019 20:21 | 3/23/2019 20:27 | 1 | 1.6 | 7 | 2.15 | 0 | 12.95 | yellow | credit card | Lenox Hill West | UN/Turtle Bay South | Manhattan | Manhattan |
3/4/2019 16:11 | 3/4/2019 16:19 | 1 | 0.79 | 5 | 0 | 0 | 9.3 | yellow | cash | Upper West Side South | Upper West Side South | Manhattan | Manhattan |
3/27/2019 17:53 | 3/27/2019 18:00 | 1 | 1.37 | 7.5 | 2.36 | 0 | 14.16 | yellow | credit card | Alphabet City | West Village | Manhattan | Manhattan |
3/10/2019 01:23 | 3/10/2019 01:49 | 1 | 7.7 | 27 | 6.15 | 0 | 36.95 | yellow | credit card | Hudson Sq | Yorkville West | Manhattan | Manhattan |
3/30/2019 13:27 | 3/30/2019 13:37 | 3 | 2.16 | 9 | 1.1 | 0 | 13.4 | yellow | credit card | Midtown East | Yorkville West | Manhattan | Manhattan |
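with sns.axes_style(style='ticks'):
    # catplot is the current name of factorplot; it is a figure-level function,
    # so height and aspect set the figure size (here 11.7 x 8.27 inches)
    g = sns.catplot(data=taxis, x="color", y="fare", hue="pickup_borough",
                    kind="box", height=8.27, aspect=11.7 / 8.27)
    g.set_axis_labels("Taxi color", "Fare");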
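To understand time series analysis, we are using Seaborn's healthexp dataset, which records health spending and life expectancy by country and year. The first plot shows how health spending (in USD) has increased over the years; the second shows life expectancy over the same period.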
lifeexp=sns.load_dataset('healthexp')
lifeexp.head(1)
Year | Country | Spending_USD | Life_Expectancy |
---|---|---|---|
1970 | Germany | 252.311 | 70.6 |
with sns.axes_style('white'):
    # catplot replaces factorplot; kind="point" matches factorplot's old default
    g = sns.catplot(data=lifeexp, x="Year", y="Spending_USD", kind="point", aspect=4.0)
    g.set_xticklabels(step=5)
with sns.axes_style('white'):
    g = sns.catplot(data=lifeexp, x="Year", y="Life_Expectancy", kind="point", aspect=2.0)
    g.set_xticklabels(step=5)