Here we've switched the contents of x from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:
/* C code */
int x = 4;
x = "four"; // FAILS
This sort of flexibility is one piece that makes Python and other dynamically-typed languages convenient and easy to use.
Understanding how this works is an important piece of learning to analyze data efficiently and effectively with Python.
But what this type-flexibility also points to is the fact that Python variables are more than just their value; they also contain extra information about the type of the value. We'll explore this more in the sections that follow.
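As a minimal sketch of this point, the built-in type function can be used to inspect the type information that travels with each value. The variable names type_before and type_after are introduced here only for illustration:

```python
# A name can be rebound to objects of different types;
# the type information belongs to the object, not to the name.
x = 4
type_before = type(x)   # the object currently bound to x is an int

x = "four"
type_after = type(x)    # x now refers to a str object

print(type_before.__name__, "->", type_after.__name__)
```

The name x itself has no fixed type; each object it refers to carries its own.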
A Python Integer Is More Than Just an Integer
The standard Python implementation is written in C.
This means that every Python object is simply a cleverly-disguised C structure, which contains not only its value, but other information as well. For example, when we define an integer in Python, such as x = 10000, x is not just a "raw" integer. It's actually a pointer to a compound C structure, which contains several values.
Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
A single integer in Python 3.4 actually contains four pieces:
ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
ob_type, which encodes the type of the variable
ob_size, which specifies the size of the following data members
ob_digit, which contains the actual integer value that we expect the Python variable to represent.
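Some of these pieces can be observed from Python itself. The following is a rough sketch that assumes the standard CPython interpreter; the exact numbers it prints depend on the Python version and platform, and the names int_size and big_size are illustrative only:

```python
import sys

x = 10000

# ob_type: every object carries a reference to its type.
print(type(x))

# ob_refcnt: sys.getrefcount reports the object's reference count.
# The call itself temporarily adds a reference, and recent CPython
# versions may report very large values for some objects.
print(sys.getrefcount(x))

# Total size in bytes of the integer object, header included.
int_size = sys.getsizeof(x)
print(int_size)

# ob_size / ob_digit: a larger value needs more digits,
# so the object holding it takes more memory.
big_size = sys.getsizeof(10 ** 100)
print(big_size)
```

On a typical build, the size reported for x is noticeably larger than the few bytes needed for the value alone, and the size grows with the magnitude of the integer.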
This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C, as illustrated in the following figure:
Here PyObject_HEAD is the part of the structure containing the reference count, type code, and other pieces mentioned before.
Notice the difference here: a C integer is essentially a label for a position in memory whose bytes encode an integer value.
A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value.
This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically.
All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.
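To get a rough sense of this cost, one can compare the size of a raw 64-bit C integer with the size of a Python integer object. This sketch assumes CPython and uses the standard ctypes module to query the C size; the count n and the per-object estimate are illustrative, not a precise memory measurement:

```python
import ctypes
import sys

# Size of a raw 64-bit C integer: just the bytes of the value.
c_int_bytes = ctypes.sizeof(ctypes.c_int64)

# Size of a Python int object holding a small value: value plus header.
py_int_bytes = sys.getsizeof(1)

print("C int64:   ", c_int_bytes, "bytes")
print("Python int:", py_int_bytes, "bytes")

# Rough estimate for many values, ignoring any container overhead.
n = 1_000_000
print("raw C values:  ", n * c_int_bytes, "bytes")
print("Python objects:", n * py_int_bytes, "bytes")
```

The per-object difference is small, but it is multiplied by the number of objects stored.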
A Python List Is More Than Just a List
Let's consider now what happens when we use a Python data structure that holds many Python objects.
The standard mutable multi-element container in Python is the list.
We can create a list of integers as follows: