DataFrames. In order to visualize data from a Pandas DataFrame, you must extract each Series and often concatenate them together into the right format. It would be nicer to have a plotting library that can intelligently use the DataFrame labels in a plot.
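Here is a small example of what "getting the data into the right format" can mean. It uses a made-up two-column DataFrame. First it pulls each Series out by hand, Matplotlib-style. Then it uses pandas' `DataFrame.melt` to turn the frame from wide form (one column per variable) into long form (one row per observation, with the column label stored as data). Long form is a layout that label-aware plotting functions can consume directly.

```python
import pandas as pd

# Made-up "wide" DataFrame: one column per variable
wide = pd.DataFrame({
    "x": [0.1, 0.5, 0.9],
    "y": [1.2, 0.8, 1.5],
})

# Matplotlib-style access: extract each Series by hand as a plain array
x_values = wide["x"].to_numpy()
y_values = wide["y"].to_numpy()

# Reshape to "long" (tidy) form: one row per observation, with the
# original column label kept as data in the "variable" column
long = wide.melt(var_name="variable", value_name="value")
print(long)
```

The column labels survive the reshape as values, so a plotting function can use them for legends or colors without any manual bookkeeping.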
Matplotlib itself has added the plt.style tools discussed in Customizing Matplotlib: Configurations and Style Sheets, and is starting to handle Pandas data more seamlessly.
The 2.0 release of the library introduced a new default stylesheet that improved on the previous status quo.
But for all the reasons just discussed, Seaborn remains an extremely useful add-on.

We start with the typical imports, using Matplotlib's classic style for comparison:
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
import pandas as pd
# Create some data
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)
# Plot the data with Matplotlib defaults
plt.plot(x, y)
plt.legend(list('ABCDEF'), ncol=2, loc='upper left');  # one label per line
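The data-generation step packs six random walks into one array. The quick check below reuses the same seed. It confirms that `np.cumsum(..., 0)` accumulates down each column, so row `t` of `y` is the sum of the first `t + 1` random steps, and each of the six columns is an independent walk.

```python
import numpy as np

rng = np.random.RandomState(0)
steps = rng.randn(500, 6)   # 500 time steps for each of 6 independent walks
y = np.cumsum(steps, 0)     # running sum down each column (axis 0)

print(y.shape)              # (500, 6)

# The first row is just the first step; consecutive rows differ by one step
first_row_ok = np.allclose(y[0], steps[0])
increments_ok = np.allclose(np.diff(y, axis=0), steps[1:])
```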
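Seaborn can also override Matplotlib's default parameters, so that even simple Matplotlib scripts produce better-looking output. We set the style by calling Seaborn's set() method.
By convention, Seaborn is imported as sns:
import seaborn as sns
sns.set()  # in Seaborn 0.11 and later this is an alias for sns.set_theme()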
# same plotting code as above!
plt.plot(x, y)
plt.legend(list('ABCDEF'), ncol=2, loc='upper left');  # one label per line
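Histograms of two correlated variables are straightforward in plain Matplotlib:
# 2,000 draws from a bivariate normal distribution with correlated x and y
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
    # density=True replaces the old normed=True keyword (removed in Matplotlib 3.1)
    plt.hist(data[col], density=True, alpha=0.5)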
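The `density=True` option rescales the bars so that each histogram has unit area. This keeps the x and y histograms comparable even though their spreads differ. The NumPy example below shows the same normalization using `np.histogram` instead of `plt.hist`, on a made-up normal sample with the same variance as the x column.

```python
import numpy as np

rng = np.random.RandomState(0)
sample = rng.normal(loc=0, scale=np.sqrt(5), size=2000)

# density=True divides counts by (total count * bin width)
heights, edges = np.histogram(sample, bins=10, density=True)
widths = np.diff(edges)

# So the total area under the bars is 1 (up to floating-point rounding)
area = np.sum(heights * widths)
print(area)
```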
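Rather than a histogram, we can get a smooth estimate of the distribution using kernel density estimation, which Seaborn does with sns.kdeplot:
for col in 'xy':
    # fill=True replaces the deprecated shade=True (Seaborn 0.11 and later)
    sns.kdeplot(data[col], fill=True)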
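Kernel density estimation places a small smooth bump (here a Gaussian) on every data point and averages the bumps. The function below is a minimal implementation of that idea and is not Seaborn's code. Its bandwidth follows Scott's rule of thumb, h = σ·n^(-1/5), which is also the default rule in SciPy's `gaussian_kde`. The name `gaussian_kde_1d` is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(sample, grid):
    """Minimal 1-D Gaussian KDE with a Scott's-rule bandwidth."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Scott's rule of thumb for the kernel width
    h = sample.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every data point
    z = (grid[:, None] - sample[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average of the bumps, rescaled so the result is a density
    return kernels.sum(axis=1) / (n * h)

rng = np.random.RandomState(0)
sample = rng.normal(size=500)
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_1d(sample, grid)

# Trapezoid-rule integral: a proper density integrates to about 1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(area)
```

The bandwidth controls smoothness. A smaller `h` follows individual points more closely, and a larger `h` blurs them into a broader curve.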
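Histograms and KDE can be combined using distplot, which is deprecated as of Seaborn 0.11; histplot with kde=True draws the same combination:
sns.histplot(data['x'], kde=True, stat='density')
sns.histplot(data['y'], kde=True, stat='density');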
If we pass both columns of the two-dimensional dataset to kdeplot, we will get a two-dimensional visualization of the data:
sns.kdeplot(data=data, x='x', y='y');
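The two-dimensional KDE estimates the density of a bivariate normal distribution with covariance [[5, 2], [2, 2]]. The check below compares the sample covariance and correlation with the true values. It adds a fixed seed that the original does not have. The implied correlation is 2 / sqrt(5 × 2) ≈ 0.63, so the density contours should be ellipses tilted toward the x = y direction.

```python
import numpy as np

mean = [0, 0]
cov = np.array([[5.0, 2.0], [2.0, 2.0]])
rng = np.random.RandomState(0)  # fixed seed: an assumption, not in the original
sample = rng.multivariate_normal(mean, cov, size=2000)

# Sample estimates from the 2,000 draws
est_cov = np.cov(sample, rowvar=False)
est_corr = np.corrcoef(sample, rowvar=False)[0, 1]

# Correlation implied by the covariance matrix: 2 / sqrt(10) ≈ 0.632
true_corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(est_cov)
print(est_corr, true_corr)
```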
We can see the joint distribution and the marginal distributions together using sns.jointplot.
For this plot, we'll set the style to a white background:
with sns.axes_style('white'):
    sns.jointplot(data=data, x='x', y='y', kind='kde');
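The marginal distributions along the edges of a joint plot are projections of the joint distribution. In histogram terms, summing a two-dimensional histogram over one axis recovers the one-dimensional histogram of the other variable. The NumPy example below shows this using the same distribution; the bin edges are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
sample = rng.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
x, y = sample[:, 0], sample[:, 1]

# Hypothetical bin edges, wide enough to hold essentially every point
edges = np.linspace(-12, 12, 25)

# joint[i, j] counts points with x in bin i and y in bin j
joint, _, _ = np.histogram2d(x, y, bins=[edges, edges])

# Summing over the y bins collapses the joint histogram onto x
marginal_x = joint.sum(axis=1)
direct_x, _ = np.histogram(x, bins=edges)
```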
There are other parameters that can be passed to jointplot; for example, we can use a hexagonally based histogram instead:
with sns.axes_style('white'):
    sns.jointplot(data=data, x='x', y='y', kind='hex')
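When you generalize joint plots to datasets of more than two dimensions, you end up with pair plots, which plot all pairs of variables against each other. We'll demonstrate with the Iris dataset, which lists measurements of petals and sepals of three iris species:
iris = sns.load_dataset("iris")
iris.head()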
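Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot:
sns.pairplot(iris, hue='species', height=2.5);  # size= was renamed height= in Seaborn 0.9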
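A pair plot draws one scatter panel for each pair of columns. A numeric summary of what those panels show is the table of pairwise correlations. The example below uses a made-up three-column DataFrame, not the Iris data. Column `b` is built to depend on `a`, and `c` is unrelated noise.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Made-up multidimensional data standing in for a real dataset
rng = np.random.RandomState(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=200),  # strongly related to a
    "c": rng.normal(size=200),                      # unrelated noise
})

# The distinct variable pairs that the off-diagonal panels display
pairs = list(combinations(df.columns, 2))

# Pearson correlation for every pair of columns
corr = df.corr()
print(corr.round(2))
```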
Sometimes the best way to view data is via histograms of subsets. Seaborn's FacetGrid makes this extremely simple.
We'll take a look at some data that shows the amount that restaurant staff receive in tips based on various indicator data:
tips = sns.load_dataset('tips')
tips.head()
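# Tip as a percentage of the total bill
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']

# One histogram panel per combination of sex (rows) and time of day (columns)
grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));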
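Each facet is the subset of rows that shares one value of the row variable and one value of the column variable. A pandas `groupby` on the same two columns therefore produces the subsets that the panels show. The example below uses a hypothetical six-row table with made-up numbers, not the real tips data, and computes the histogram counts each panel would draw.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the tips table (made-up values)
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 30.0, 25.0, 12.0],
    "tip":        [1.5, 4.0, 3.0, 4.5, 5.0, 1.2],
    "sex":        ["Male", "Female", "Male", "Female", "Male", "Female"],
    "time":       ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Lunch"],
})
tips["tip_pct"] = 100 * tips["tip"] / tips["total_bill"]

# Same bins as the FacetGrid example; one histogram per (sex, time) subset
bins = np.linspace(0, 40, 15)
facets = {
    key: np.histogram(group["tip_pct"], bins=bins)[0]
    for key, group in tips.groupby(["sex", "time"])
}
for key, counts in facets.items():
    print(key, counts.sum())
```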
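Categorical plots let you view the distribution of a parameter within bins defined by any other parameter:
with sns.axes_style(style='ticks'):
    # factorplot was renamed catplot in Seaborn 0.9
    g = sns.catplot(data=tips, x="day", y="total_bill", hue="sex", kind="box")
    g.set_axis_labels("Day", "Total Bill");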
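Each box in a box plot spans the first to third quartile of its group, with a line at the median. By default the whiskers extend up to 1.5 times that interquartile range. The pandas code below computes those quartiles per group for a made-up table of bills by day.

```python
import pandas as pd

# Hypothetical bills per day (made-up numbers)
bills = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
})

# Quartiles per group: the box edges (0.25, 0.75) and the median line (0.5)
summary = bills.groupby("day")["total_bill"].quantile([0.25, 0.5, 0.75]).unstack()

# Interquartile range: the height of each box
summary["iqr"] = summary[0.75] - summary[0.25]
print(summary)
```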
We can also use sns.jointplot to show the joint distribution between different datasets, along with the associated marginal distributions:
with sns.axes_style('white'):
    sns.jointplot(data=tips, x="total_bill", y="tip", kind='hex')

With kind='reg', the joint plot also draws a linear regression fit:
sns.jointplot(data=tips, x="total_bill", y="tip", kind='reg');
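With `kind='reg'`, the central panel shows a straight-line fit to the data; Seaborn draws it with `regplot`, which by default fits an ordinary least-squares line. The example below computes such a fit with `np.polyfit` on synthetic bill and tip data. The data assume tips of about 15% of the bill plus noise, which is an invented model and not the real tips dataset.

```python
import numpy as np

# Synthetic data (made up): tip is about 15% of the bill, plus noise
rng = np.random.RandomState(0)
total_bill = rng.uniform(5, 50, size=200)
tip = 0.15 * total_bill + rng.normal(scale=0.5, size=200)

# Least-squares straight line: tip ≈ slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(slope, intercept)
```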
Counts over time can be plotted as bar charts using sns.catplot (called sns.factorplot before Seaborn 0.9). In the following example, we'll use the Planets data that we first saw in Aggregation and Grouping:
planets = sns.load_dataset('planets')
planets.head()
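with sns.axes_style('white'):
    # kind="count": one bar per year, height = number of planets discovered
    g = sns.catplot(data=planets, x="year", aspect=2,
                    kind="count", color='steelblue')
    g.set_xticklabels(step=5)  # label only every fifth year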
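A count plot draws one bar per category, and each bar's height is the number of rows in that category. That is the same information `value_counts` returns. The `step=5` argument thins the tick labels to every fifth one, which plain list slicing reproduces. The example below uses a hypothetical handful of discovery years and a hypothetical year range, not the real Planets data.

```python
import pandas as pd

# Hypothetical discovery years (made up)
planets = pd.DataFrame({"year": [2001, 2003, 2003, 2007, 2007, 2007, 2010]})

# Bar heights of a count plot: number of rows per year, in year order
counts = planets["year"].value_counts().sort_index()
print(counts)

# Keeping every fifth tick label, as step=5 does, on a hypothetical year range
labels = [str(year) for year in range(1989, 2015)]
kept = labels[::5]
print(kept)
```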
We can learn more by splitting the counts by discovery method:
with sns.axes_style('white'):
    g = sns.catplot(data=planets, x="year", aspect=4.0, kind='count',
                    hue='method', order=list(range(2001, 2015)))
    g.set_ylabels('Number of Planets Discovered')
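Adding `hue='method'` splits each year's bar into one bar per discovery method. The `order` argument restricts and orders the x-axis categories. The same counts can be laid out as a year-by-method table with `pd.crosstab`. The rows below are hypothetical, and years missing from the data are filled with zeros, as empty slots on the axis would be.

```python
import pandas as pd

# Hypothetical rows (made up): discovery year and detection method
planets = pd.DataFrame({
    "year":   [2001, 2001, 2002, 2002, 2002, 2016],
    "method": ["Radial Velocity", "Transit", "Transit",
               "Transit", "Radial Velocity", "Transit"],
})

# Like order=range(2001, 2015): keep only these years, in this order
years = list(range(2001, 2015))
in_range = planets[planets["year"].isin(years)]

# One count per (year, method) pair; absent years become rows of zeros
table = pd.crosstab(in_range["year"], in_range["method"]).reindex(years, fill_value=0)
print(table.head())
```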
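Next, let's use Seaborn to explore finishing times from a marathon. The data is available on GitHub (uncomment the first line to download it) and can be loaded into Pandas:
# !curl -O https://raw.githubusercontent.com/jakevdp/marathon-data/master/marathon-data.csv
data = pd.read_csv('marathon-data.csv')
data.head()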
By default, Pandas loaded the time columns as Python strings (type object); we can see this by looking at the dtypes attribute of the DataFrame:
data.dtypes
Let's fix this by providing a converter for the times:
import datetime

def convert_time(text):
    # Parse an 'H:MM:SS' string into a datetime.timedelta
    h, m, s = map(int, text.split(':'))
    return datetime.timedelta(hours=h, minutes=m, seconds=s)

data = pd.read_csv('marathon-data.csv',
                   converters={'split': convert_time, 'final': convert_time})
data.head()
data.dtypes
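The converter turns each 'H:MM:SS' string into a time span. The example below applies it to two hypothetical split times and compares the result with `pd.to_timedelta`, which parses this format directly. It then converts the spans to total seconds, a plain numeric form that is convenient for arithmetic and plotting. Using `pd.to_timedelta` this way is an alternative we are suggesting, not the approach shown above.

```python
import datetime

import pandas as pd

def convert_time(text):
    """Parse an 'H:MM:SS' string into a datetime.timedelta."""
    h, m, s = map(int, text.split(':'))
    return datetime.timedelta(hours=h, minutes=m, seconds=s)

times = pd.Series(["01:05:38", "02:41:12"])  # hypothetical split times

# The converter from the text, applied element-wise
as_timedelta = times.map(convert_time)

# pandas' built-in parser gives the same values for this format
parsed = pd.to_timedelta(times)

# Total seconds: plain floats that are easy to compare and plot
seconds = parsed.dt.total_seconds()
print(seconds.tolist())
```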