import numpy as np
import pandas as pd
# use pandas to extract daily precipitation (in tenths of mm) as a NumPy array
rainfall = pd.read_csv('data/Seattle2014.csv')['PRCP'].values
inches = rainfall / 254.0 # 1/10mm -> inches
inches.shape
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot styles
plt.hist(inches, 40);
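plt.hist draws the histogram, but the counts behind it can also be computed directly. The sketch below runs np.histogram on a small made-up array (the values are hypothetical stand-ins, not the Seattle data). It shows that 40 bins produce 40 counts and 41 bin edges, and that the counts cover every value.

```python
import numpy as np

# Hypothetical daily precipitation values in inches (not the Seattle data)
sample_inches = np.array([0.0, 0.0, 0.1, 0.25, 0.5, 0.0, 1.2, 0.05, 0.8, 0.0])

# np.histogram returns per-bin counts and bin edges (one more edge than bins)
counts, edges = np.histogram(sample_inches, bins=40)

print(len(counts), len(edges))   # 40 41
print(counts.sum())              # 10: every value lands in some bin
print(counts[0])                 # 4: the lowest bin holds the four dry days
```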
Applying the arithmetic operators +, -, *, /, and others to NumPy arrays leads to element-wise operations.
NumPy also implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs.
When applied to arrays, these comparison operators always return an array with a Boolean data type.
All six of the standard comparison operations are available:
x = np.array([1, 2, 3, 4, 5])
x < 3 # less than
x > 3 # greater than
x <= 3 # less than or equal
x >= 3 # greater than or equal
x != 3 # not equal
x == 3 # equal
It is also possible to compare two arrays element-wise, and to include compound expressions:
(2 * x) == (x ** 2)
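Because each expression above only displays its result in a notebook, it can help to see the Boolean arrays and their dtype printed together. A minimal sketch; the second array y is a hypothetical addition used to show a comparison between two arrays of the same shape:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

less = x < 3                    # element-wise comparison
print(less)                     # [ True  True False False False]
print(less.dtype)               # bool

# The arithmetic runs first, then the results are compared element by element:
# 2*x is [2, 4, 6, 8, 10] and x**2 is [1, 4, 9, 16, 25]
same = (2 * x) == (x ** 2)
print(same)                     # [False  True False False False]

# Two arrays of the same shape can be compared directly
y = np.array([5, 4, 3, 2, 1])
print(x >= y)                   # [False False  True  True  True]
```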
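As with the arithmetic operators, the comparison operators are implemented as ufuncs in NumPy; for example, when you write x < 3, internally NumPy uses np.less(x, 3).
A summary of the comparison operators and their equivalent ufuncs is shown here:

| Operator | Equivalent ufunc | Operator | Equivalent ufunc |
|----------|------------------|----------|------------------|
| ==       | np.equal         | !=       | np.not_equal     |
| <        | np.less          | <=       | np.less_equal    |
| >        | np.greater       | >=       | np.greater_equal |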
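The table can be checked mechanically. This sketch confirms that each operator and its ufunc produce identical arrays for a small example array:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Each comparison operator should produce exactly the same array as its ufunc
pairs = {
    "==": (x == 3, np.equal(x, 3)),
    "!=": (x != 3, np.not_equal(x, 3)),
    "<":  (x < 3, np.less(x, 3)),
    "<=": (x <= 3, np.less_equal(x, 3)),
    ">":  (x > 3, np.greater(x, 3)),
    ">=": (x >= 3, np.greater_equal(x, 3)),
}
for op, (via_operator, via_ufunc) in pairs.items():
    assert np.array_equal(via_operator, via_ufunc), op
    print(f"x {op} 3 -> {via_ufunc}")
```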
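Just as with arithmetic ufuncs, these comparisons work on arrays of any size and shape. Here is a two-dimensional example:
rng = np.random.RandomState(0)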
x = rng.randint(10, size=(3, 4))
x
x < 6
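Given a Boolean array, there are a host of useful operations you can do. We'll work with x, the two-dimensional array we created earlier.
print(x)
To count the number of True entries in a Boolean array, np.count_nonzero is useful:
# how many values less than 6?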
np.count_nonzero(x < 6)
Another way to get this count is to use np.sum; in this case, False is interpreted as 0, and True is interpreted as 1:
np.sum(x < 6)
The benefit of sum() is that, like other NumPy aggregation functions, this summation can be done along rows or columns as well:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)
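Counting along the other axis, or turning a count into a fraction, follows the same pattern. The sketch below uses a fixed 3-by-4 array typed in by hand (an assumption, so the counts can be checked without a random generator). It also relies on the fact that the mean of a Boolean array is the fraction of True entries.

```python
import numpy as np

# A fixed 3-by-4 array typed in by hand, so the counts are easy to verify
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])
mask = x < 6

total = np.count_nonzero(mask)       # 8 entries are less than 6
per_row = np.sum(mask, axis=1)       # [4 2 2]
per_column = np.sum(mask, axis=0)    # [2 2 2 2]

# The mean of a Boolean array is the fraction of True entries (8 of 12)
fraction = np.mean(mask)
print(total, per_row, per_column, round(fraction, 3))
```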
If we're interested in quickly checking whether any or all of the values are True, we can use np.any or np.all:
# are there any values greater than 8?
np.any(x > 8)
# are there any values less than zero?
np.any(x < 0)
# are all values less than 10?
np.all(x < 10)
# are all values equal to 6?
np.all(x == 6)
np.all and np.any can be used along particular axes as well. For example:
# are all values in each row less than 8?
np.all(x < 8, axis=1)
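The difference between the NumPy reductions and Python's built-ins is easy to see on a small 2-D array. In this sketch (using the same hand-typed array as an assumption), the built-in sum() iterates over the first axis and adds the rows together, and the built-in any() calls bool() on each row:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])
mask = x < 6

# NumPy reduces over the whole array unless an axis is given
print(np.sum(mask))                  # 8
print(np.any(x > 8, axis=0))         # [False  True False False]: only column 1 has a 9

# Built-in sum() adds the rows together, giving per-column counts
builtin_total = sum(mask)
print(builtin_total)                 # [2 2 2 2]

# Built-in any() calls bool() on each row; a multi-element row is ambiguous
try:
    any(mask)
except ValueError as err:
    print("built-in any() failed:", err)
```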
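Finally, a quick warning: Python has built-in sum(), any(), and all() functions. These have a different syntax than the NumPy versions, and in particular will fail or produce unintended results when used on multidimensional arrays. Be sure that you are using np.sum(), np.any(), and np.all() for these examples!
A single comparison answers questions about a single threshold. To ask about days with rainfall greater than 0.5 inches and less than 1 inch, we need to combine two conditions with Python's bitwise logic operators: &, |, ^, and ~.
Like the standard arithmetic operators, NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays. For example, the compound question above can be answered as follows:
np.sum((inches > 0.5) & (inches < 1))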
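The parentheses here are important. Because & binds more tightly than the comparison operators, removing them would make Python evaluate the expression as follows, which raises an error:
inches > (0.5 & inches) < 1
By De Morgan's laws, A AND B is equivalent to NOT (NOT A OR NOT B), so the same count can also be computed this way:
np.sum(~((inches <= 0.5) | (inches >= 1)))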
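Both points can be checked on a few values. This sketch uses a hypothetical rainfall array rain (not the Seattle data). It verifies that the direct and De Morgan forms agree, then shows that the unparenthesized form fails because & is not defined for floating-point values:

```python
import numpy as np

# Hypothetical rainfall values in inches, standing in for the Seattle data
rain = np.array([0.0, 0.3, 0.6, 0.9, 1.2, 0.75, 0.0, 0.5, 1.0])

between = (rain > 0.5) & (rain < 1)              # both conditions must hold
via_de_morgan = ~((rain <= 0.5) | (rain >= 1))   # NOT (NOT A OR NOT B)

assert np.array_equal(between, via_de_morgan)
print(np.sum(between))                           # 3 (0.6, 0.9, and 0.75)

# Without parentheses, & is applied first: rain > (0.5 & rain) < 1
try:
    rain > 0.5 & rain < 1
except TypeError as err:
    print("TypeError:", err)
```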
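Combining comparison operators and Boolean operators on arrays allows a wide range of efficient logical operations. The following table summarizes the bitwise Boolean operators and their equivalent ufuncs:

| Operator | Equivalent ufunc | Operator | Equivalent ufunc |
|----------|------------------|----------|------------------|
| &        | np.bitwise_and   | \|       | np.bitwise_or    |
| ^        | np.bitwise_xor   | ~        | np.bitwise_not   |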
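On Boolean arrays the four operators behave like element-wise AND, OR, XOR, and NOT. The sketch below checks each operator against its ufunc. For ~ it uses np.invert, the ufunc NumPy calls for the ~ operator. It also shows a caution: on integer arrays, ~ flips every bit rather than acting as a logical NOT.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# The operators and their ufuncs give identical results on Boolean arrays
assert np.array_equal(a & b, np.bitwise_and(a, b))   # [ True False False False]
assert np.array_equal(a | b, np.bitwise_or(a, b))    # [ True  True  True False]
assert np.array_equal(a ^ b, np.bitwise_xor(a, b))   # [False  True  True False]
assert np.array_equal(~a, np.invert(a))              # [False False  True  True]

# Caution: on integer arrays ~ flips every bit, so it is not a logical NOT
ints = np.array([1, 0])
print(~ints)                  # [-2 -1]
print(~ints.astype(bool))     # [False  True]
```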
Using these tools, we can start to answer questions about the weather data by combining masks with aggregations:
print("Number of days without rain:   ", np.sum(inches == 0))
print("Number of days with rain:      ", np.sum(inches != 0))
print("Days with more than 0.5 inches:", np.sum(inches > 0.5))
print("Rainy days with < 0.2 inches:  ", np.sum((inches > 0) &
                                               (inches < 0.2)))
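Boolean arrays can also be used as masks, to select particular subsets of the data themselves. Returning to the x array from before, suppose we want an array of all values in the array that are less than, say, 5:
x
We can obtain a Boolean array for this condition easily, as we've already seen:
x < 5
To select these values from the array, we can simply index on this Boolean array; this is known as a masking operation:
x[x < 5]
What is returned is a one-dimensional array filled with all the values that meet this condition; in other words, all the values in positions at which the mask array is True. We are then free to operate on these values as we wish. For example, we can compute some relevant statistics on the Seattle rain data:
# construct a mask of all rainy days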
rainy = (inches > 0)
# construct a mask of summer days; days holds 0-based day-of-year indices,
# so June 21 (the 172nd day of 2014) is index 171 and September 21 is index 263
days = np.arange(365)
summer = (days >= 171) & (days <= 263)
print("Median precip on rainy days in 2014 (inches): ",
np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches): ",
np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
np.median(inches[rainy & ~summer]))
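Two properties of masking are worth seeing directly: the selection is a flattened copy, and a mask on the left-hand side of an assignment modifies the original array in place. A minimal sketch on a hand-typed array (an assumption, chosen so the results can be checked by eye), with np.where as an alternative that builds a new array instead:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])
mask = x < 5

# Masking flattens the selection into a 1-D array, in row-major order
selected = x[mask]
print(selected)              # [0 3 3 3 2 4]

# The selection is a copy: changing it leaves x untouched
selected[:] = -1
print(x[0, 1])               # still 0

# A mask on the left-hand side of an assignment modifies the array in place
clipped = x.copy()
clipped[mask] = 0            # replace every value below 5 with 0
print(clipped)

# np.where builds a new array: 0 where the mask is True, x elsewhere
print(np.array_equal(np.where(mask, 0, x), clipped))   # True
```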
One common point of confusion is the difference between the keywords and and or on one hand, and the operators & and | on the other hand. When would you use one versus the other?
The difference is this: and and or gauge the truth or falsehood of an entire object, while & and | refer to bits within each object.
When you use and or or, it's equivalent to asking Python to treat the object as a single Boolean entity. In Python, all nonzero integers evaluate as True. Thus:
bool(42), bool(0)
bool(42 and 0)
bool(42 or 0)
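The bool() calls above hide a detail: and and or do not combine values bit by bit. They test the truthiness of each whole operand and return one of the operands unchanged. A short plain-Python illustration:

```python
# `and` returns its first falsy operand (or the last one if all are truthy);
# `or` returns its first truthy operand (or the last one if none are)
print(42 and 0)      # 0
print(42 or 0)       # 42
print(0 or 42)       # 42
print(3 and 42)      # 42

# bool() then reduces that single returned object to True or False
print(bool(42 and 0), bool(42 or 0))   # False True
```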
When you use & and | on integers, the expression operates on the bits of the element, applying and or or to the individual bits that make up the number:
bin(42)
bin(59)
bin(42 & 59)
bin(42 | 59)
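Printing the values with a fixed width lines the bits up, so each result can be read column by column. This sketch also includes ^ (exclusive or) for comparison:

```python
a, b = 42, 59

# Print each value as 6 binary digits so the bit positions line up
for label, value in [("42", a), ("59", b), ("42 & 59", a & b),
                     ("42 | 59", a | b), ("42 ^ 59", a ^ b)]:
    print(f"{label:>8} = {value:06b} = {value}")
# 42 & 59 keeps only bits set in both:    101010 -> 42
# 42 | 59 keeps bits set in either:       111011 -> 59
# 42 ^ 59 keeps bits set in exactly one:  010001 -> 17
```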
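Notice that the corresponding bits of the binary representations are compared to yield the result. An array of Boolean values in NumPy can be thought of as a string of bits where 1 = True and 0 = False, and & and | operate on it much as they do on the integers above:
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)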
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B
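For Boolean arrays, & and | agree with NumPy's logical ufuncs np.logical_and and np.logical_or. For general integer arrays they do not, because & works on bits while the logical ufuncs only ask whether each value is nonzero. A short sketch of both cases; the integer arrays are hypothetical examples:

```python
import numpy as np

A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)

print(A & B)    # [ True False  True False  True False]
print(A | B)    # [ True  True  True False  True  True]

# For Boolean arrays, the bitwise operators match the logical ufuncs
assert np.array_equal(A & B, np.logical_and(A, B))
assert np.array_equal(A | B, np.logical_or(A, B))

# For integer arrays they can differ
ints_a = np.array([2, 1])
ints_b = np.array([1, 1])
print(ints_a & ints_b)                  # [0 1]: 2 and 1 share no set bits
print(np.logical_and(ints_a, ints_b))   # [ True  True]: both are nonzero
```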
Using or on these arrays will try to evaluate the truth or falsehood of the entire array object, which is not a well-defined value, so Python raises an error:
A or B
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
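The error message points to the two ways out. To combine the arrays element by element, use |. To ask a single yes/or-no question, first reduce each array with .any() or .all(). A minimal sketch of the failing form next to both remedies:

```python
import numpy as np

A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)

# `A or B` calls bool(A), which is ambiguous for a multi-element array
try:
    A or B
except ValueError as err:
    print("ValueError:", err)

# Element-wise combination
print(A | B)                         # [ True  True  True False  True  True]

# A single yes/no answer: reduce each array first
print(A.any(), A.all())              # True False
print(A.any() or B.any())            # True
```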
Similarly, when writing a Boolean expression on a given array, you should use | or & rather than or or and:
x = np.arange(10)
(x > 4) & (x < 8)
Trying to evaluate the truth or falsehood of the entire array will give the same ValueError we saw previously:
(x > 4) and (x < 8)
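A related trap is Python's chained comparison. 4 < x < 8 is shorthand for (4 < x) and (x < 8), so with an array it hits the same ValueError. A small sketch of the failing form alongside two element-wise alternatives:

```python
import numpy as np

x = np.arange(10)

# The chained comparison expands to (4 < x) and (x < 8), so `and` calls
# bool() on a 10-element array and raises ValueError
try:
    4 < x < 8
except ValueError as err:
    print("ValueError:", err)

# Element-wise alternatives: & with parentheses, or np.logical_and
mask = (x > 4) & (x < 8)
same_mask = np.logical_and(x > 4, x < 8)
print(x[mask])          # [5 6 7]
```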