
import numpy as np
import pandas as pd
# use pandas to extract rainfall inches as a NumPy array
rainfall = pd.read_csv('data/Seattle2014.csv')['PRCP'].values
inches = rainfall / 254.0 # 1/10mm -> inches
inches.shape
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot styles
plt.hist(inches, 40);
Using +, -, *, /, and others on arrays leads to element-wise operations.
NumPy also implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs.
The result of these comparison operators is always an array with a Boolean data type.
All six of the standard comparison operations are available:
x = np.array([1, 2, 3, 4, 5])
x < 3 # less than
x > 3 # greater than
x <= 3 # less than or equal
x >= 3 # greater than or equal
x != 3 # not equal
x == 3 # equal
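(2 * x) == (x ** 2)
The comparison operators are implemented as ufuncs: when you write x < 3, internally NumPy uses np.less(x, 3).
A summary of the comparison operators and their equivalent ufunc is shown here:
| Operator | Equivalent ufunc | Operator | Equivalent ufunc |
|----------|------------------|----------|------------------|
| == | np.equal | != | np.not_equal |
| < | np.less | <= | np.less_equal |
| > | np.greater | >= | np.greater_equal |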
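As a quick check of the table above, the following sketch compares each operator with its named ufunc on a small array. The names `vals` and `pairs` are illustrative and do not appear in the original text.

```python
import numpy as np

# Illustrative array; any numeric array would do.
vals = np.array([1, 2, 3, 4, 5])

# Pair each operator result with the result of the named ufunc.
pairs = [
    (vals == 3, np.equal(vals, 3)),
    (vals != 3, np.not_equal(vals, 3)),
    (vals < 3, np.less(vals, 3)),
    (vals <= 3, np.less_equal(vals, 3)),
    (vals > 3, np.greater(vals, 3)),
    (vals >= 3, np.greater_equal(vals, 3)),
]

# Every pair should match element for element.
all_match = all(np.array_equal(op, uf) for op, uf in pairs)
print(all_match)
```

Each comparison also yields an array of Boolean dtype, which can be confirmed with `(vals < 3).dtype`.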
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x
x < 6
We'll work with x, the two-dimensional array we created earlier.
print(x)
To count the number of True entries in a Boolean array, np.count_nonzero is useful:
# how many values less than 6?
np.count_nonzero(x < 6)
Another way to get this count is to use np.sum; in this case, False is interpreted as 0, and True is interpreted as 1:
np.sum(x < 6)
The benefit of sum() is that, as with other NumPy aggregation functions, this summation can be done along rows or columns as well:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)
To quickly check whether any or all of the values are True, use np.any or np.all:
# are there any values greater than 8?
np.any(x > 8)
# are there any values less than zero?
np.any(x < 0)
# are all values less than 10?
np.all(x < 10)
# are all values equal to 6?
np.all(x == 6)
np.all and np.any can be used along particular axes as well. For example:
# are all values in each row less than 8?
np.all(x < 8, axis=1)
A quick warning: Python has built-in sum(), any(), and all() functions. These have a different syntax than the NumPy versions, and in particular will fail or produce unintended results when used on multidimensional arrays. Be sure that you are using np.sum(), np.any(), and np.all() for these examples!
Conditions can be combined with Python's bitwise logic operators: &, |, ^, and ~.
As with the standard arithmetic operators, NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays.
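To make the element-wise behavior concrete, here is a minimal sketch that applies the four operators to two small Boolean arrays. The arrays `p` and `q` are illustrative and chosen to cover every combination of inputs.

```python
import numpy as np

# Two small Boolean arrays covering every combination of inputs.
p = np.array([True, True, False, False])
q = np.array([True, False, True, False])

and_result = p & q   # True only where both are True
or_result = p | q    # True where at least one is True
xor_result = p ^ q   # True where exactly one is True
not_result = ~p      # flips every entry

print(and_result, or_result, xor_result, not_result)
```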
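np.sum((inches > 0.5) & (inches < 1))
The parentheses here are important: because of operator precedence rules, with the parentheses removed this expression would be evaluated as follows, which results in an error:
inches > (0.5 & inches) < 1
Using the equivalence of A AND B and NOT (NOT A OR NOT B), the same result can be computed in a different manner:
np.sum(~( (inches <= 0.5) | (inches >= 1) ))
The following table summarizes the bitwise Boolean operators and their equivalent ufuncs:
| Operator | Equivalent ufunc | Operator | Equivalent ufunc |
|----------|------------------|----------|------------------|
| & | np.bitwise_and | \| | np.bitwise_or |
| ^ | np.bitwise_xor | ~ | np.bitwise_not |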
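The sketch below illustrates the precedence pitfall and the operator/ufunc equivalence on a small stand-in array. The array `sample` is hypothetical data used in place of the rainfall measurements, so the example runs without the CSV file.

```python
import numpy as np

# Hypothetical stand-in for the rainfall data (inches per day).
sample = np.array([0.0, 0.3, 0.6, 0.9, 1.2])

# Parenthesized form: each comparison is evaluated before &.
between = (sample > 0.5) & (sample < 1)

# Same mask written as NOT (NOT A OR NOT B).
between_alt = ~((sample <= 0.5) | (sample >= 1))

# Same mask via the named ufunc.
between_ufunc = np.bitwise_and(sample > 0.5, sample < 1)

# Without parentheses, & binds first, so Python attempts 0.5 & sample,
# which is not defined for floating-point values.
try:
    sample > 0.5 & sample < 1
    precedence_error = None
except TypeError as err:
    precedence_error = type(err).__name__

print(between, precedence_error)
```

All three masks agree, while the unparenthesized form fails before any comparison is made.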
print("Number days without rain: ", np.sum(inches == 0))
print("Number days with rain: ", np.sum(inches != 0))
print("Days with more than 0.5 inches:", np.sum(inches > 0.5))
print("Rainy days with < 0.2 inches :", np.sum((inches > 0) &
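(inches < 0.2)))
Boolean arrays can also be used as masks, to select particular subsets of the data.
Returning to our x array from before, suppose we want an array of all values in the array that are less than, say, 5: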
x
x < 5
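x[x < 5]
What is returned is a one-dimensional array filled with all the values that meet this condition; in other words, all the values in positions at which the mask array is True.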
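A minimal sketch of the masking step, using a small hypothetical array `grid` rather than the random x from the text, shows how the shapes relate: the mask keeps the shape of the data, while the selection is flattened to one dimension.

```python
import numpy as np

# Illustrative 2-D array (hypothetical values).
grid = np.array([[5, 0, 3],
                 [7, 9, 2]])

mask = grid < 5          # Boolean array with the same shape as grid
selected = grid[mask]    # 1-D array of the values where mask is True

print(mask.shape, selected.shape)
print(selected)
```

The number of selected values always equals the number of True entries in the mask, so `selected.size` matches `np.count_nonzero(mask)`.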
# construct a mask of all rainy days
rainy = (inches > 0)
# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)
print("Median precip on rainy days in 2014 (inches): ",
np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches): ",
np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
np.median(inches[rainy & ~summer]))
One common point of confusion is the difference between the keywords and and or on one hand, and the operators & and | on the other hand.
When would you use one versus the other?
The difference is this: and and or gauge the truth or falsehood of the entire object, while & and | refer to bits within each object.
When you use and or or, it's equivalent to asking Python to treat the object as a single Boolean entity.
In Python, all nonzero integers will evaluate as True. Thus:
bool(42), bool(0)
bool(42 and 0)
bool(42 or 0)
When you use & and | on integers, the expression operates on the bits of the element, applying the and or the or to the individual bits making up the number:
bin(42)
bin(59)
bin(42 & 59)
bin(42 | 59)
When you have an array of Boolean values in NumPy, it can be thought of as a string of bits where 1 = True and 0 = False, and the result of & and | operates similarly to above:
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B
Using or on these arrays will try to evaluate the truth or falsehood of the entire array object, which is not a well-defined value:
A or B
ValueError Traceback (most recent call last)
<ipython-input-38-5d8e4f2e21c0> in <module>()
----> 1 A or B
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
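The error message points to a.any() and a.all() as ways to resolve the ambiguity. As a minimal sketch, reducing each array to a single Boolean first makes the keyword operators well defined. The names `flags_a` and `flags_b` are illustrative copies of the arrays above.

```python
import numpy as np

flags_a = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
flags_b = np.array([1, 1, 1, 0, 1, 1], dtype=bool)

# Reduce each array to a single Boolean first...
any_a = bool(flags_a.any())   # is at least one entry True?
all_b = bool(flags_b.all())   # is every entry True?

# ...then the keyword operators act on plain Python Booleans.
either = any_a or all_b
both = any_a and all_b

print(any_a, all_b, either, both)
```

This answers a question about the arrays as a whole, which is different from the element-wise result of `flags_a | flags_b`.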
Similarly, when doing a Boolean expression on a given array, you should use | or & rather than or or and:
x = np.arange(10)
(x > 4) & (x < 8)
Trying to evaluate the truth or falsehood of the entire array will give the same ValueError we saw previously:
(x > 4) and (x < 8)
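To see both behaviors side by side without stopping the session, one option is to catch the exception. This is a sketch under the assumption that only the error message is of interest; `seq`, `elementwise`, and `message` are illustrative names.

```python
import numpy as np

seq = np.arange(10)

# Element-wise: one Boolean per entry.
elementwise = (seq > 4) & (seq < 8)

# Keyword form: Python asks for the truth value of a whole array.
try:
    (seq > 4) and (seq < 8)
    message = None
except ValueError as err:
    message = str(err)

print(seq[elementwise])
print(message)
```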