Published 59 months ago · Updated 59 months ago · 0 views · 5 min read

02 Introduction to NumPy

IPython, Data science, Pandas, Numpy, scikit-learn, python, O'Reilly, Matplotlib


This chapter, along with Chapter 3, outlines techniques for effectively loading, storing, and manipulating in-memory data in Python. The topic is very broad: datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else. Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example, images, particularly digital images, can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area. Sound clips can be thought of as one-dimensional arrays of intensity versus time. Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words. No matter what the data are, the first step in making them analyzable will be to transform them into arrays of numbers. (We will discuss some specific examples of this process later in Feature Engineering.)
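As a rough illustration of this idea, the sketch below builds tiny stand-ins for each kind of data using the NumPy package that this chapter goes on to introduce. The sizes, values, and three-word vocabulary are made up for the example and are not taken from any real dataset.

```python
import numpy as np

# A hypothetical 4x6 grayscale "image": one brightness value (0-255) per pixel.
image = np.zeros((4, 6), dtype=np.uint8)
image[1:3, 2:4] = 255  # a bright 2x2 patch in the middle

# A hypothetical sound clip: intensity sampled at 8 points over one second.
t = np.linspace(0.0, 1.0, 8, endpoint=False)
sound = np.sin(2 * np.pi * 2 * t)  # a 2 Hz sine wave

# Text as word counts over a small, made-up vocabulary.
vocabulary = ["data", "array", "python"]
sentence = "python array data array".split()
counts = np.array([sentence.count(word) for word in vocabulary])

print(image.shape)   # (4, 6)
print(sound.shape)   # (8,)
print(counts)        # [1 2 1]
```

In each case the original object ends up as an array of numbers with a shape that reflects its structure: two dimensions for the image, one for the sound clip, and one entry per vocabulary word for the text.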

For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science. We'll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package, and the Pandas package (discussed in Chapter 3).

This chapter will cover NumPy in detail. NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.
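To make the comparison with lists a little more concrete, here is a minimal sketch, assuming 1,000 integers stored as 64-bit values (the size and dtype are arbitrary choices for the example). It only shows how the two containers are used and how much raw data the array holds; it does not measure speed.

```python
import numpy as np

values = list(range(1000))               # built-in list of Python int objects
array = np.arange(1000, dtype=np.int64)  # dense buffer of 64-bit integers

# Element-wise work on a list needs an explicit Python-level loop...
doubled_list = [2 * v for v in values]
# ...while the array applies the operation to the whole buffer at once.
doubled_array = 2 * array

print(array.dtype)      # int64
print(array.itemsize)   # 8 (bytes per element)
print(array.nbytes)     # 8000 (bytes of raw data)
print(doubled_list == doubled_array.tolist())  # True
```

Because every element of the array has the same fixed-size type, the data can sit in one contiguous buffer, whereas each list entry is a reference to a separate Python object.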

If you followed the advice outlined in the Preface and installed the Anaconda stack, you already have NumPy installed and ready to go. If you're more the do-it-yourself type, you can go to http://www.numpy.org/ and follow the installation instructions found there. Once you do, you can import NumPy and double-check the version:

import numpy
numpy.__version__

'1.11.1'

For the pieces of the package discussed here, I'd recommend NumPy version 1.8 or later. By convention, you'll find that most people in the SciPy/PyData world will import NumPy using np as an alias:

import numpy as np
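If you would like to check the version recommendation programmatically, one possible sketch is shown below. The variable names are illustrative, and the parsing assumes the version string starts with numeric major and minor components, as in '1.11.1'.

```python
import numpy as np

# Compare only the major and minor components, e.g. '1.11.1' -> (1, 11).
major, minor = (int(part) for part in np.__version__.split(".")[:2])
meets_recommendation = (major, minor) >= (1, 8)

print(np.__version__, meets_recommendation)
```

Comparing integer tuples rather than the raw strings matters here: as strings, '1.11' sorts before '1.8', which would give the wrong answer.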
