Series structures and two-dimensional DataFrame structures.
NumPy ufuncs work on Pandas Series and DataFrame objects.
Let's start by defining a simple Series and DataFrame on which to demonstrate this:
import pandas as pd
import numpy as np
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
df
np.exp(ser)
np.sin(df * np.pi / 4)
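To make the index preservation concrete, here is a minimal sketch that checks the labels survive a ufunc call. The string labels and the name `s` are illustrative assumptions, not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with explicit string labels
s = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])

# Applying a NumPy ufunc returns a new Series rather than a bare array
result = np.exp(s)

print(type(result).__name__)         # Series
print(result.index.equals(s.index))  # True
```

The same check can be run against a DataFrame, where both the row index and the column labels should carry over.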
When operating on two Series or DataFrame objects, Pandas will align indices in the process of performing the operation.
This is very convenient when working with incomplete data, as we'll see in some of the examples that follow.
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')
population / area
area.index.union(population.index)
Any entry present in only one of the two Series is marked with NaN, or "Not a Number," which is how Pandas marks missing data (see further discussion of missing data in Handling Missing Data).
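As a rough check of this behavior, the sketch below repeats the division and separates the labels that came back as NaN from those that have a value. The names `density`, `missing`, and `present` are introduced here for illustration only.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

# The result is indexed by the union of the two input indices
density = population / area

# Labels found in only one input come back as NaN
missing = density[density.isna()].index.tolist()
present = density.dropna().index.tolist()

print(missing)
print(present)
```

Only the states that appear in both inputs should end up in `present`.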
This index matching is implemented this way for any of Python's built-in arithmetic expressions; any missing values are filled in with NaN by default:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
Calling A.add(B) is equivalent to calling A + B, but allows optional explicit specification of the fill value for any elements in A or B that might be missing:
A.add(B, fill_value=0)
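The difference between the operator and the method can be sketched side by side using the same two Series. The names `plain` and `filled` are illustrative.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Operator form: labels 0 and 3 exist in only one input, so they become NaN
plain = A + B

# Method form: the absent side is treated as 0 before adding
filled = A.add(B, fill_value=0)

print(plain.isna().tolist())  # [True, False, False, True]
print(filled.tolist())        # values 2, 5, 9, 5 at labels 0 through 3
```

Note that fill_value only substitutes for a value missing on one side; the labels that both inputs share are added as usual.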
A similar alignment takes place for both columns and indices when operating on DataFrames:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
A
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
B
A + B
As with Series, we can use the associated object's arithmetic method and pass any desired fill_value to be used in place of missing entries.
Here we'll fill with the mean of all values in A (computed by first stacking the rows of A):
fill = A.stack().mean()
A.add(B, fill_value=fill)
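Because the frames above are random, a deterministic sketch may help in following the arithmetic. The frames `left` and `right` and their values are hypothetical, and the mean is taken with `to_numpy().mean()`, which is a substitute for the stacking approach and gives the same number.

```python
import pandas as pd

# Hypothetical frames with partially overlapping rows and columns
left = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
right = pd.DataFrame([[10, 20, 30], [40, 50, 60], [70, 80, 90]],
                     columns=list('BAC'))

# Mean of all values in left: (1 + 2 + 3 + 4) / 4 = 2.5
fill = left.to_numpy().mean()

# Cells missing from left (row 2 and column C) are treated as 2.5
summed = left.add(right, fill_value=fill)

print(summed)
```

Columns are matched by label rather than position, so the 'BAC' ordering of `right` does not affect the result, and the output columns come back sorted.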
Python Operator | Pandas Method(s) |
---|---|
+ | add() |
- | sub(), subtract() |
* | mul(), multiply() |
/ | truediv(), div(), divide() |
// | floordiv() |
% | mod() |
** | pow() |
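The correspondence in the table can be spot-checked with a short sketch. The sample Series `s` and the scalar operand 2 are arbitrary choices made for illustration.

```python
import pandas as pd

s = pd.Series([7, 8, 9])  # hypothetical sample values

# Each entry pairs an operator expression with the methods the table lists for it
checks = {
    '+': (s + 2, [s.add(2)]),
    '-': (s - 2, [s.sub(2), s.subtract(2)]),
    '*': (s * 2, [s.mul(2), s.multiply(2)]),
    '/': (s / 2, [s.truediv(2), s.div(2), s.divide(2)]),
    '//': (s // 2, [s.floordiv(2)]),
    '%': (s % 2, [s.mod(2)]),
    '**': (s ** 2, [s.pow(2)]),
}

# Compare every method result against its operator counterpart
all_match = all(expected.equals(result)
                for expected, results in checks.values()
                for result in results)
print(all_match)  # True
```

The practical reason to reach for the method form is its extra keywords, such as fill_value and axis, which the operators cannot accept.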
When performing operations between a DataFrame and a Series, the index and column alignment is similarly maintained.
Operations between a DataFrame and a Series are similar to operations between a two-dimensional and one-dimensional NumPy array.
Consider one common operation, where we find the difference of a two-dimensional array and one of its rows:
A = rng.randint(10, size=(3, 4))
A
A - A[0]
df = pd.DataFrame(A, columns=list('QRST'))
df - df.iloc[0]
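In Pandas, as in NumPy, this subtraction is applied row-wise by default. To operate column-wise instead, use the object methods listed above and specify the axis keyword:
df.subtract(df['R'], axis=0)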
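To see the two directions next to each other, here is a small sketch on a fixed frame. The values and the names `row_wise` and `col_wise` are assumptions chosen so the results are easy to verify by hand.

```python
import pandas as pd

# Hypothetical two-by-two frame
df = pd.DataFrame([[1, 2], [3, 5]], columns=['Q', 'R'])

# Default: the Series labels are matched against the columns,
# so the first row is subtracted from every row
row_wise = df - df.iloc[0]

# axis=0: the Series labels are matched against the row index,
# so column R is subtracted from every column
col_wise = df.subtract(df['R'], axis=0)

print(row_wise.values.tolist())  # [[0, 0], [2, 3]]
print(col_wise.values.tolist())  # [[-1, 0], [-2, 0]]
```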
DataFrame/Series operations, like the operations discussed above, will automatically align indices between the two elements:
halfrow = df.iloc[0, ::2]
halfrow
df - halfrow
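The effect of subtracting a partial row can be sketched with fixed values. The frame below is hypothetical and the name `diff` is introduced for illustration.

```python
import pandas as pd

# Hypothetical frame using the same column labels as above
df = pd.DataFrame([[3, 8, 2, 4], [2, 6, 4, 8]], columns=list('QRST'))

# Every second entry of the first row: only columns Q and S
halfrow = df.iloc[0, ::2]

# Columns R and T have no counterpart in halfrow, so they become NaN
diff = df - halfrow

print(diff)
```

Columns Q and S hold ordinary differences, while R and T are entirely NaN because `halfrow` carries no labels for them.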