Week 4: Scientific Computing in Python¶

This week, you will learn how to do scientific computing in Python. As we learned in the first lecture, NumPy is a fundamental package for scientific computing.

The goal of this assignment is to gain comfort creating, visualizing, and computing with NumPy arrays. By the end of the assignment, you should feel comfortable:

Creating new Numpy arrays using linspace and arange
Computing basic formulas with Numpy arrays
Performing reductions (e.g. mean, std) on NumPy arrays
Making 1D line plots

1. Importing and Examining a New Package¶

This will be our first experience with importing a package which is not part of the Python standard library.

In [1]:

import numpy as np

In [4]:

# find out what version we have
np.__version__

Out[4]:

'2.0.0'

In [ ]:

# find out what is in our namespace
dir()

In [ ]:

# find out what's in numpy
dir(np)

The numpy documentation is crucial!

http://docs.scipy.org/doc/numpy/reference/

2. NDArrays¶

The core class is the NumPy ndarray (n-dimensional array), a multidimensional container of homogeneous items, i.e. all values in the array have the same type and size. These arrays can be one-dimensional (a single row or column vector), two-dimensional (m rows x n columns), three-dimensional (a stack of 2D arrays), or higher-dimensional.

The main differences between a NumPy array and a more general data container such as a list are the following:

Numpy arrays can have multiple dimensions (while lists, tuples, etc. only have 1)
Numpy arrays hold values of the same datatype (e.g. int, float), while lists can contain anything.
Numpy optimizes numerical operations on arrays. Numpy is fast!
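
As a small illustration of the last two points, the sketch below builds a list and an array from the same values. The names `values` and `arr` are made up for this example and are not used elsewhere in the notebook.

```python
import numpy as np

# hypothetical example values, not part of the assignment
values = [1, 2.5, 3]
arr = np.array(values)

# the list keeps each element's own type; the array converts everything to one dtype
list_types = [type(v).__name__ for v in values]   # ['int', 'float', 'int']
arr_dtype = arr.dtype                             # float64

# multiplying a list repeats it; multiplying an array scales every element
list_times_two = values * 2
arr_times_two = arr * 2

print(list_types, arr_dtype)
print(list_times_two)
print(arr_times_two)
```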

In [14]:

from IPython.display import Image
Image(url='http://docs.scipy.org/doc/numpy/_images/threefundamental.png')

Out[14]:

[Image: threefundamental.png, loaded from the NumPy documentation]

2.1 Create array from a list¶

In [7]:

# create an array from a list
a = np.array([9,0,2,1,0])

In [8]:

# find out the datatype
a.dtype

Out[8]:

dtype('int64')

In [9]:

# find out the shape
a.shape

Out[9]:

(5,)

In [10]:

# what is the type of the shape attribute?
type(a.shape)

Out[10]:

tuple

In [11]:

# another array with a different datatype and shape
b = np.array([[5,3,1,9],[9,2,3,0]], dtype=np.float64)

In [12]:

# array with 3 rows x 4 columns
a_2d = np.array([[3,2,0,1],[9,1,8,7],[4,0,1,6]])
a_2d

Out[12]:

array([[3, 2, 0, 1],
       [9, 1, 8, 7],
       [4, 0, 1, 6]])

In [13]:

# check dtype and shape
b.dtype, b.shape

Out[13]:

(dtype('float64'), (2, 4))

Important Concept: The fastest varying dimension is the last dimension! The outer level of the hierarchy is the first dimension. (This is called "C-style" or row-major ordering.)
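
One way to see this ordering is to flatten a small 2D array: the elements of each row stay next to each other. This is only a minimal sketch, and the array `demo` is a made-up example.

```python
import numpy as np

# hypothetical 2 x 3 array: first dimension = rows, last dimension = columns
demo = np.array([[1, 2, 3],
                 [4, 5, 6]])

# flattening in C order walks along the last dimension first
flat = demo.ravel(order="C")
print(flat)

# reshape fills the last dimension first as well
refilled = np.arange(6).reshape(2, 3)
print(refilled)
```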

2.2 Create arrays using functions¶

In [73]:

# create some uniform arrays
c = np.zeros((9,9))
d = np.ones((3,6,3), dtype=np.complex128)
e = np.full((3,3), np.pi)
e = np.ones_like(c)
f = np.zeros_like(d)
# Random arrays
g = np.random.rand(3,4)

The np.arange() function is used to generate an array with evenly spaced values within a given interval. np.arange() can be used with one, two, or three parameters to specify the start, stop, and step values. If only one value is passed to the function, it will be interpreted as the stop value:

In [16]:

# create some ranges
np.arange(10)

Out[16]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:

# arange is left inclusive, right exclusive
np.arange(2,4,0.25)

Out[17]:

array([2.  , 2.25, 2.5 , 2.75, 3.  , 3.25, 3.5 , 3.75])

Similarly, the np.linspace() function is used to construct an array with evenly spaced numbers over a given interval. However, instead of the step parameter, np.linspace() takes a num parameter to specify the number of samples within the given interval:

In [18]:

# linearly spaced
np.linspace(2,4,20)

Out[18]:

array([2.        , 2.10526316, 2.21052632, 2.31578947, 2.42105263,
       2.52631579, 2.63157895, 2.73684211, 2.84210526, 2.94736842,
       3.05263158, 3.15789474, 3.26315789, 3.36842105, 3.47368421,
       3.57894737, 3.68421053, 3.78947368, 3.89473684, 4.        ])

Note that unlike np.arange(), np.linspace() includes the stop value by default (this can be changed by passing endpoint=False). Finally, although np.arange() also accepts non-integer steps, np.linspace() is recommended when a non-integer step (e.g. 0.25) is desired.

In [19]:

np.linspace(2,4,20, endpoint = False)

Out[19]:

array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2,
       3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9])
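
The recommendation to prefer np.linspace() for non-integer steps comes from floating-point rounding: with a float step, the number of elements returned by np.arange() depends on how (stop - start) / step rounds, whereas np.linspace() always returns exactly num samples. A minimal sketch with made-up endpoints:

```python
import numpy as np

# with a float step, the length depends on rounding of (stop - start) / step,
# so the stop value may or may not appear in the output
a_range = np.arange(1, 1.3, 0.1)
print(len(a_range), a_range)

# linspace fixes the number of samples instead, so the length is predictable
a_lin = np.linspace(1, 1.3, 4)
print(len(a_lin), a_lin)
```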

Exercise 1: Create a 1D array ranging from 0 to 100 (including 100) with an interval of 5.¶

In [ ]:

2.3 Create two-dimensional grids¶

In [20]:

x = np.linspace(-4, 4, 9)
x

Out[20]:

array([-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])

In [21]:

y = np.linspace(-5, 5, 11)
y

Out[21]:

array([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.,  5.])

In [22]:

x_2d, y_2d = np.meshgrid(x, y)

In [23]:

x_2d

Out[23]:

array([[-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.],
       [-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.]])

In [24]:

y_2d

Out[24]:

array([[-5., -5., -5., -5., -5., -5., -5., -5., -5.],
       [-4., -4., -4., -4., -4., -4., -4., -4., -4.],
       [-3., -3., -3., -3., -3., -3., -3., -3., -3.],
       [-2., -2., -2., -2., -2., -2., -2., -2., -2.],
       [-1., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.]])

Exercise 3: Explain what the meshgrid function does. What is the difference between `x_2d` and `y_2d`?¶

3. Indexing in Numpy¶

Indexing in NumPy allows you to access and modify elements, rows, columns, or subarrays of an array. Basic indexing is similar to lists.

In [25]:

# get some individual elements of x_2d
x_2d[0,0], x_2d[-1,-1], x_2d[3,-5]

Out[25]:

(np.float64(-4.0), np.float64(4.0), np.float64(0.0))

In [26]:

# get some whole rows and columns
x_2d[0].shape, x_2d[:,-1].shape

Out[26]:

((9,), (11,))

In [27]:

# get some ranges
x_2d[3:10,3:5].shape

Out[27]:

(7, 2)

There are many advanced ways to index arrays. You can read about them in the manual. Here is one example.

In [28]:

# use a boolean array as an index
idx = x_2d<0
x_2d[idx].shape

Out[28]:

(44,)
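
Indexing can also appear on the left-hand side of an assignment to modify the selected elements. The sketch below rebuilds the grid from section 2.3 so that it runs on its own, and works on a copy (the name `x_clipped` is hypothetical) so that `x_2d` itself stays unchanged.

```python
import numpy as np

# rebuild the grid from section 2.3 so this cell runs on its own
x = np.linspace(-4, 4, 9)
y = np.linspace(-5, 5, 11)
x_2d, y_2d = np.meshgrid(x, y)

# work on a copy so that x_2d is left untouched
x_clipped = x_2d.copy()

# set every negative element to zero using a boolean mask
x_clipped[x_clipped < 0] = 0.0

# overwrite the whole first row with a slice assignment
x_clipped[0, :] = -1.0

print(x_clipped.min(), x_clipped.max())
```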

Exercise 4: Get the last two columns of the `y_2d` array.¶

In [ ]:

4. Array Operations¶

There are a huge number of operations available on arrays. All the familiar arithmetic operators are applied on an element-by-element basis.
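
As a quick illustration of element-by-element arithmetic, here is a minimal sketch with two made-up arrays `p` and `q` of the same shape:

```python
import numpy as np

# two small made-up arrays of the same shape
p = np.array([1.0, 2.0, 3.0])
q = np.array([10.0, 20.0, 30.0])

print(p + q)    # element-wise sum
print(p * q)    # element-wise product, not a dot product
print(q / p)    # element-wise division
print(p ** 2)   # each element squared
print(p < 2.5)  # comparisons also return an array (of booleans)
```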

4.1 Basic Math¶

In [32]:

# two dimensional grids
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.linspace(-np.pi, np.pi, 50)
xx, yy = np.meshgrid(x, y)
xx.shape, yy.shape

Out[32]:

((50, 100), (50, 100))

In [33]:

f = np.sin(xx) * np.cos(0.5*yy)

Visualizing Arrays with Matplotlib¶

It can be hard to work with big arrays without actually seeing anything with our eyes! We will now bring in Matplotlib to start visualizing these arrays. For now we will just skim the surface of Matplotlib. Much more depth will be provided in the next chapter.

In [34]:

from matplotlib import pyplot as plt

In [35]:

plt.pcolormesh(f)

Out[35]:

<matplotlib.collections.QuadMesh at 0x117fdee80>

4.2 Manipulating array dimensions¶

In [36]:

# transpose
plt.pcolormesh(f.T)

Out[36]:

<matplotlib.collections.QuadMesh at 0x12624e400>

In [37]:

f.shape

Out[37]:

(50, 100)

In [38]:

np.tile(f,(6,1)).shape

Out[38]:

(300, 100)

In [39]:

# tile an array
plt.pcolormesh(np.tile(f,(6,1)))

Out[39]:

<matplotlib.collections.QuadMesh at 0x1262bbeb0>

4.3 Broadcasting¶

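Broadcasting is an efficient way to combine arrays of different shapes in element-wise operations such as multiplication, without making explicit copies of the smaller array.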
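
NumPy compares shapes starting from the last dimension; two dimensions are compatible when they are equal or when one of them is 1. The sketch below checks a few made-up shapes with `np.broadcast_shapes` (available in NumPy 1.20 and later), which applies the rule without allocating any arrays.

```python
import numpy as np

# compatible: the trailing dimensions match (4 and 4)
print(np.broadcast_shapes((3, 4), (4,)))      # (3, 4)

# compatible: a dimension of size 1 is stretched to match the other array
print(np.broadcast_shapes((3, 1), (1, 4)))    # (3, 4)

# incompatible: the trailing dimensions are 4 and 3, and neither is 1
try:
    np.broadcast_shapes((3, 4), (3,))
except ValueError as err:
    print("cannot broadcast:", err)
```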

In [44]:

from IPython.display import Image
Image(url='http://scipy-lectures.github.io/_images/numpy_broadcasting.png',
     width=720)

Out[44]:

In [45]:

# multiply f by x
print(f.shape, x.shape)
g = f * x
print(g.shape)

(50, 100) (100,)
(50, 100)

In [46]:

# multiply f by y
print(f.shape, y.shape)
h = f * y
print(h.shape)

(50, 100) (50,)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[46], line 3
      1 # multiply f by y
      2 print(f.shape, y.shape)
----> 3 h = f * y
      4 print(h.shape)

ValueError: operands could not be broadcast together with shapes (50,100) (50,)

In [69]:

# use newaxis special syntax
y_new = y[:,np.newaxis]
h = f * y_new
print(h.shape)

(50, 100)

Exercise 5: What is the difference between y and y_new? Why does `f * y` give an error, but `f * y_new` does not?¶

4.4 Reduction Operations¶

In [48]:

# sum
g.sum()

Out[48]:

np.float64(-3083.038387807155)

In [49]:

# mean
g.mean()

Out[49]:

np.float64(-0.616607677561431)

In [50]:

# std
g.std()

Out[50]:

np.float64(1.6402280119141424)

In [51]:

# apply on just one axis

# Mean of each row (calculated across columns)
g_xmean = g.mean(axis=1)

# Mean of each column (calculated across rows)
g_ymean = g.mean(axis=0)
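
The axis argument names the dimension that is collapsed by the reduction. A minimal sketch on a made-up 2 x 3 array `m` makes the resulting shapes easy to check by hand:

```python
import numpy as np

# made-up array with 2 rows and 3 columns
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

row_means = m.mean(axis=1)   # collapses the columns: one value per row
col_means = m.mean(axis=0)   # collapses the rows: one value per column

print(row_means, row_means.shape)
print(col_means, col_means.shape)
```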

In [52]:

plt.plot(x, g_ymean)

Out[52]:

[<matplotlib.lines.Line2D at 0x1265e17f0>]

In [53]:

plt.plot(g_xmean, y)

Out[53]:

[<matplotlib.lines.Line2D at 0x1266d3eb0>]

5. Missing data¶

Most real-world datasets – environmental or otherwise – have data gaps. Data can be missing for any number of reasons, including observations not being recorded or data corruption. While a cell corresponding to a data gap may just be left blank in a spreadsheet, when imported into Python, there must be some way to handle "blank" or missing values.

Missing data should not be replaced with zeros, as 0 can be a valid value in many datasets (e.g. temperature, precipitation, etc.). Instead, the convention is to fill all missing data with the constant NaN. NaN stands for "Not a Number" and is implemented in NumPy as np.nan.

NaNs are handled differently by different packages. In NumPy, arithmetic and standard reductions such as np.mean propagate NaN values, so the result is nan, while NaN-aware variants such as np.nanmean ignore them:

In [70]:

data = np.array([[2.,2.7,1.89],
                 [1.1, 0.0, np.nan],
                 [3.2, 0.74, 2.1]])

In [71]:

np.mean(data)

Out[71]:

np.float64(nan)

In [72]:

np.nanmean(data)

Out[72]:

np.float64(1.71625)
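
It is often useful to locate the gaps before reducing. The sketch below repeats the `data` array from above so that it runs on its own; the names `missing` and `valid` are made up for this example.

```python
import numpy as np

data = np.array([[2., 2.7, 1.89],
                 [1.1, 0.0, np.nan],
                 [3.2, 0.74, 2.1]])

# np.nan is not equal to itself, so use np.isnan to locate the gaps
missing = np.isnan(data)
print(missing.sum())            # number of missing values

# boolean indexing keeps only the valid values
valid = data[~missing]
print(valid.mean())

# other NaN-aware reductions follow the same naming pattern
print(np.nansum(data), np.nanmax(data), np.nanstd(data))
```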

6. Assignment¶

First import numpy and matplotlib

In [ ]:

6.1. Create two 2D arrays representing coordinates x, y¶

Both should cover the range (-2, 2) and have 100 points in each direction

In [ ]:

6.2. Visualize each 2D array using `pcolormesh`¶

Use the correct coordinates for the x and y axes.

In [ ]:

6.3 From your cartesian coordinates, create polar coordinates $r$ and $\varphi$:¶

$r = \sqrt{x^2 + y^2}$

$\varphi = \operatorname{atan2}(y, x)$

You will need to use numpy's arctan2 function. Read its documentation.
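
For orientation only (this is not a solution to the task): arctan2 takes y as its first argument and x as its second, and unlike arctan(y / x) it uses the signs of both arguments to pick the correct quadrant. The two points below are made up.

```python
import numpy as np

# a point in the first quadrant and its mirror image in the third quadrant
y1, x1 = 1.0, 1.0
y2, x2 = -1.0, -1.0

# arctan only sees the ratio y / x, which is identical for both points
angle_ratio_1 = np.arctan(y1 / x1)
angle_ratio_2 = np.arctan(y2 / x2)
print(angle_ratio_1, angle_ratio_2)

# arctan2 distinguishes them and returns angles in the range [-pi, pi]
angle_1 = np.arctan2(y1, x1)
angle_2 = np.arctan2(y2, x2)
print(angle_1, angle_2)
```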

In [ ]:

6.4. Visualize $r$ and $\varphi$ on the 2D $x$ / $y$ plane using `pcolormesh`¶

In [ ]:

6.5 Calculate the quantity $f = \cos^2(4r) + \sin^2(4\varphi)$¶

And plot it on the $x$ / $y$ plane

In [ ]:

6.6 Plot the mean of f with respect to the x axis¶

as a function of y

In [ ]:

6.7 Plot the mean of f with respect to the y axis¶

as a function of x

In [ ]:
