Introduction to NumPy (Part-II)

NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices. That is what you will hear from most people. Although NumPy is an essential package for mathematical computation, some people are still out of the loop as to how it can be used. This blog aims to clarify how you can make the most of NumPy.

In continuation of the last blog, today we will extend our discussion to:

  • Memory layout of ndarray
  • Views and copies
  • Vectorized operations
  • Universal functions
  • Broadcasting
  • Boolean mask
  • Dates and time in NumPy

Memory Layout of ndarray

A numpy.ndarray object has an interesting attribute, flags, which holds information about the memory layout of the array.

>>> import numpy as np
>>> arr=np.array([1,2,3,4,5,6,7,8,9,10])
>>> arr
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> arr.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False

The C_CONTIGUOUS field in the output indicates whether the array's data is stored in a single C-style contiguous block. This means the array is laid out in memory like a C array, which is called row-major order in the case of 2D arrays: the elements of each row sit next to each other, so when moving through memory the column index (the last index) changes fastest and the row index changes slowest. F_CONTIGUOUS is the Fortran-style (column-major) counterpart; a 1D array like this one is both at once.
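To make the two contiguity flags concrete, here is a minimal sketch on a small 2D array. The names c_arr and f_arr are just illustrative, and np.asfortranarray is used only to get a column-major version of the same values.

```python
import numpy as np

# A 2D array created with the default C (row-major) layout.
c_arr = np.arange(6).reshape(2, 3)
# The same values stored in Fortran (column-major) layout.
f_arr = np.asfortranarray(c_arr)

print(c_arr.flags['C_CONTIGUOUS'], c_arr.flags['F_CONTIGUOUS'])  # True False
print(f_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # False True

# ravel(order='K') reads the elements in the order they sit in memory.
print(c_arr.ravel(order='K'))  # [0 1 2 3 4 5]  rows are stored one after another
print(f_arr.ravel(order='K'))  # [0 3 1 4 2 5]  columns are stored one after another
```

Both arrays compare equal element by element; only the order of the values in memory differs.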

Array flags provide information about how the memory area used for the array is to be interpreted. In the NumPy version used here there are 7 Boolean flags, only four of which can be changed by the user: WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED, via direct assignment to the attribute or dictionary entry, or by calling ndarray.setflags. (UPDATEIFCOPY is deprecated and no longer appears in newer NumPy releases.)

>>> arr.setflags(write=0)
>>> arr.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : False
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
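As a small sketch of what the WRITEABLE flag does in practice, the snippet below locks an array, tries to write to it, and then unlocks it again. The array here is a fresh, shorter one so that the snippet stands on its own.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr.setflags(write=False)  # same effect as arr.flags.writeable = False

try:
    arr[0] = 99            # writing to a read-only array raises ValueError
except ValueError as exc:
    print("write rejected:", exc)

arr.setflags(write=True)   # allowed here because arr owns its data
arr[0] = 99
print(arr)  # [99  2  3  4  5]
```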

ndarray.shape returns the shape of your ndarray as a tuple.

>>> arr.shape
(10,)

ndarray.reshape is a method that returns an array with a new shape without changing its data.

>>> arr.reshape(2,5)
array([[ 1, 2, 3, 4, 5],
       [ 6, 7, 8, 9, 10]])

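Few of us ever stop to wonder why ndarray.shape sometimes returns (m,) and sometimes (m,1). The two are not interchangeable: a 1D array of shape (m,) is neither a row vector nor a column vector, which makes matrix multiplication more tedious, so an explicit reshape is often required to get the shape you intend.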

>>> arr.reshape(10,)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> arr.reshape(10,1)
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])

(m,) means that the array is indexed from 0 to m-1.

(m,1) means that the array is indexed by two indices, the first of which runs from 0 to m-1, and the second index is always 0.
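The difference between (m,) and (m,1) is easiest to see by indexing and multiplying. The following is only a sketch with m = 3; the names v, col and row are made up for the illustration.

```python
import numpy as np

v = np.arange(1, 4)      # shape (3,)
col = v.reshape(3, 1)    # shape (3, 1), a column
row = v.reshape(1, 3)    # shape (1, 3), a row

print(v[1])              # 2  one index is enough
print(col[1, 0])         # 2  two indices, the second is always 0

print(v @ v)               # 14  1D @ 1D gives the inner product as a scalar
print((col @ row).shape)   # (3, 3)  column @ row gives the outer product
print((v @ col).shape)     # (1,)

try:
    col @ v              # (3, 1) @ (3,) does not line up
except ValueError as exc:
    print("shape mismatch:", exc)
```

The last case is the kind of surprise that an explicit reshape avoids.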

ndarray.strides tells us how many bytes we have to skip in memory to move to the next position along each axis.

>>> x = np.array([1,2,3,4,5,6,7,8,9], dtype='int32')
>>> x.strides
(4,)
>>> x = np.array([1,2,3,4,5,6,7,8,9], dtype='float')
>>> x.strides
(8,)
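Strides become more interesting with more than one axis. Here is a minimal sketch on a 3x4 int32 array (the name m is just for illustration), showing that transposing and slicing change the strides rather than moving any data.

```python
import numpy as np

m = np.arange(12, dtype='int32').reshape(3, 4)

# Next row: skip 4 elements * 4 bytes = 16 bytes. Next column: 4 bytes.
print(m.strides)          # (16, 4)

# The transpose swaps the strides instead of moving any data.
print(m.T.strides)        # (4, 16)

# Taking every second column doubles the step along that axis.
print(m[:, ::2].strides)  # (16, 8)
```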

There are many more attributes that describe the memory layout of an array, such as ndarray.ndim, which returns the number of dimensions of an array.

View and Copy

>>> import numpy as np
>>> arr=np.array([1,2,3,4,5,6,7,8,9,10])
>>> arr
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) #ORIGINAL ARRAY

A normal assignment doesn't create a new array. It only binds a second name to the same object, so both names have the same ID and the same shape, and changes made through one are directly reflected in the other.

>>> arr2=arr
>>> print("ID of original array:",id(arr))
ID of original array: 2084401527824
>>> print("ID of assigned array:",id(arr2))
ID of assigned array: 2084401527824
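A quick sketch of what sharing the same ID means in practice, using a shorter array so the snippet is self-contained:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr2 = arr            # no new array, just a second name for the same object

arr2[0] = 100         # modify through the second name
print(arr2 is arr)    # True
print(arr)            # [100   2   3   4   5]
```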

A view is also known as a shallow copy in NumPy. Just like window shopping, a view only looks at the original array's data without owning it. The two array objects have different IDs, but they share the same data, so changes made in the view affect the original array.

>>> arr3=arr.view()
>>> arr3
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> print("ID of original array:",id(arr))
ID of original array: 2084401527824
>>> print("ID of viewed array:",id(arr3))
ID of viewed array: 2084401527744
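To check that a view really shares data with its original, the sketch below writes through a view and through a slice (basic slicing also returns a view). The name tail is only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr3 = arr.view()

print(arr3 is arr)                  # False  a different Python object
print(arr3.base is arr)             # True   but it borrows arr's data
print(np.shares_memory(arr, arr3))  # True

arr3[0] = 100         # write through the view
print(arr)            # [100   2   3   4   5]

tail = arr[3:]        # basic slicing returns a view as well
tail[:] = 0
print(arr)            # [100   2   3   0   0]
```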

A deep copy generates a new array with copy(). Changes made in the new array don't affect the original array, which remains unchanged.

>>> arr4=arr.copy()
>>> arr4
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> print("ID of copied array:",id(arr4))
ID of copied array: 2084652207520
>>> arr4[4]=20
>>> arr4
array([ 1, 2, 3, 4, 20, 6, 7, 8, 9, 10])
>>> arr
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])#ORIGINAL ARRAY

Vectorized operations

Vectorized operations apply an operation to whole arrays at once instead of looping over the elements in Python. Let's see an example now.

First import random. The %timeit command used below is built into IPython, so it needs no import.

>>> import random
>>> a = [random.randint(1, 100) for _ in range(1000000)]
>>> b = [random.randint(1, 100) for _ in range(1000000)]
>>> %timeit res = [x * y for x, y in zip(a, b)]
63.7 ms ± 554 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit is a magic command available in an IPython session that measures execution time.

>>> import numpy as np 
>>> a = np.random.randint(1, 100, 1000000)
>>> b = np.random.randint(1, 100, 1000000)
>>> %timeit a * b
1.88 ms ± 5.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Notice that the execution time drops to 1.88 ms, which is nearly 3% of the time taken by the pure Python code. Vectorization both shortens the code and speeds up execution.

In general, the effort of vectorizing pays off compared to the long waiting times that inefficient large-scale calculations can cause.
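If you are not in an IPython session, roughly the same comparison can be sketched with the standard timeit module. The array size and repeat count below are arbitrary choices to keep the run short, and the timings will differ from machine to machine, so no numbers are shown.

```python
import random
import timeit

import numpy as np

n = 100_000  # smaller than in the example above, just to keep the run short
a_list = [random.randint(1, 100) for _ in range(n)]
b_list = [random.randint(1, 100) for _ in range(n)]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

# Both approaches compute the same element-wise products.
loop_result = [x * y for x, y in zip(a_list, b_list)]
vec_result = a_arr * b_arr
print(np.array_equal(loop_result, vec_result))  # True

# Time 10 repetitions of each approach.
loop_time = timeit.timeit(lambda: [x * y for x, y in zip(a_list, b_list)], number=10)
vec_time = timeit.timeit(lambda: a_arr * b_arr, number=10)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```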

Universal Functions

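NumPy has a number of universal functions (ufuncs), mathematical functions that operate element by element on whole arrays. They make implementation easier and reduce running time compared to explicit Python loops.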
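A few universal functions (ufuncs) in action, as a minimal sketch. np.sqrt, np.add and np.maximum are standard NumPy ufuncs; the input array x is made up for the example.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

print(np.sqrt(x))        # [1. 2. 3.]
print(np.add(x, 1))      # [ 2.  5. 10.]  same as x + 1
print(np.maximum(x, 5))  # [5. 5. 9.]     element-wise maximum

# These functions are instances of np.ufunc and share methods such as reduce.
print(isinstance(np.sqrt, np.ufunc))  # True
print(np.add.reduce(x))               # 14.0  same as x.sum()
```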

Broadcasting

Broadcasting is used throughout NumPy to decide how to handle disparately shaped arrays; for example, all arithmetic operations (+, -, *, …) between ndarrays broadcast the arrays before an operation.
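Here is a small sketch of broadcasting with made-up arrays: a (2, 3) matrix combined with a (3,) row, with a (2, 1) column, and finally with a shape that cannot be broadcast.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

# The row is stretched across both rows of the matrix.
print(matrix + row)
# [[10 21 32]
#  [13 24 35]]

# The column is stretched across all three columns.
print(matrix + col)
# [[100 101 102]
#  [203 204 205]]

# Shapes (2, 3) and (2,) do not match on the last axis, so this fails.
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    print("cannot broadcast:", exc)
```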

Boolean Mask

>>> import numpy as np
>>> arr=np.array([1,2,3,4,5,6,7,8,9])
>>> arr
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> arr>8
array([False, False, False, False, False, False, False, False, True])
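The Boolean array returned by a comparison can be used as a mask to select or modify elements. A minimal sketch, with the names mask and clipped chosen only for this example:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

mask = arr > 5
print(arr[mask])     # [6 7 8 9]  keep only elements where the mask is True
print(mask.sum())    # 4          True counts as 1

# Combine conditions with & and |, wrapping each one in parentheses.
print(arr[(arr % 2 == 0) & (arr > 3)])  # [4 6 8]

# A mask can also be used on the left-hand side of an assignment.
clipped = arr.copy()
clipped[clipped > 5] = 0
print(clipped)       # [1 2 3 4 5 0 0 0 0]
```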

Dates and Time in NumPy

>>> import numpy as np
>>> yesterday = np.datetime64('today', 'D') - np.timedelta64(1, 'D')
>>> print("Yesterday: ",yesterday)
Yesterday: 2021-01-05
>>> today = np.datetime64('today', 'D')
>>> print("Today: ",today)
Today: 2021-01-06
>>> tomorrow = np.datetime64('today', 'D') + np.timedelta64(1, 'D')
>>> print("Tomorrow: ",tomorrow)
Tomorrow: 2021-01-07
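Dates also work with the usual array tools. The sketch below uses a fixed start date (taken from the output above) instead of 'today', so that the results are reproducible.

```python
import numpy as np

start = np.datetime64('2021-01-06')

# np.arange works on dates: seven consecutive days starting from start.
week = np.arange(start, start + np.timedelta64(7, 'D'))
print(week[0], week[-1])   # 2021-01-06 2021-01-12

# Subtracting two dates gives a timedelta64.
print(np.datetime64('2021-03-01') - np.datetime64('2021-02-01'))  # 28 days

# 2021-01-06 was a Wednesday, so it counts as a business day.
print(np.is_busday(start))  # True
```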

That’s all in this blog. Thank you for spending your time reading it. :)

3rd year CSE student at IIITKALYANI, enthusiastic learner and explorer
