Week 4: Scientific Computing in Python¶
This week, you will learn how to do scientific computing in Python. As we learned from first lecture, Numpy is a fundamental package for scientific computing.
The goal of this assignment is to gain comfort creating, visualizating, and computing with numpy array. By the end of the assignment, you should feel comfortable:
- Creating new Numpy arrays using
linspace
andarange
- Computing basic formulas with Numpy arrays
- Performing reductions (e.g.
mean
,std
on numpy arrays) - Making 1D line plots
1. Importing and Examining a New Package¶
This will be our first experience with importing a package which is not part of the Python standard library.
import numpy as np
# find out what version we have
np.__version__
'2.0.0'
# find out what is in our namespace
dir()
# find out what's in numpy
dir(np)
The numpy documentation is crucial!
2. NDArrays¶
The core class is the numpy ndarray (n-dimensional array). The n-dimensional array object in NumPy is referred to as an ndarray, a multidimensional container of homogeneous items – i.e. all values in the array are the same type and size. These arrays can be one-dimensional (one row or column vector), two-dimensional (m rows x n columns), or three-dimensional (arrays within arrays).
The main difference between a numpy array an a more general data container such as list
are the following:
- Numpy arrays can have multiple dimensions (while lists, tuples, etc. only have 1)
- Numpy arrays hold values of the same datatype (e.g.
int
,float
), whilelists
can contain anything. - Numpy optimizes numerical operations on arrays. Numpy is fast!
from IPython.display import Image
Image(url='http://docs.scipy.org/doc/numpy/_images/threefundamental.png')
![No description has been provided for this image](http://docs.scipy.org/doc/numpy/_images/threefundamental.png)
2.1 Create array from a list¶
# create an array from a list
a = np.array([9,0,2,1,0])
# find out the datatype
a.dtype
dtype('int64')
# find out the shape
a.shape
(5,)
# what is the shape
type(a.shape)
tuple
# another array with a different datatype and shape
b = np.array([[5,3,1,9],[9,2,3,0]], dtype=np.float64)
# array with 3 rows x 4 columns
a_2d = np.array([[3,2,0,1],[9,1,8,7],[4,0,1,6]])
a_2d
array([[3, 2, 0, 1], [9, 1, 8, 7], [4, 0, 1, 6]])
# check dtype and shape
b.dtype, b.shape
(dtype('float64'), (2, 4))
Important Concept: The fastest varying dimension is the last dimension! The outer level of the hierarchy is the first dimension. (This is called "c-style" indexing)
2.2 Create arrays using functions¶
# create some uniform arrays
c = np.zeros((9,9))
d = np.ones((3,6,3), dtype=np.complex128)
e = np.full((3,3), np.pi)
e = np.ones_like(c)
f = np.zeros_like(d)
# Random arrays
g = np.random.rand(3,4)
The np.arange()
function is used to generate an array with evenly spaced values within a given interval. np.arange()
can be used with one, two, or three parameters to specify the start, stop, and step values. If only one value is passed to the function, it will be interpreted as the stop value:
# create some ranges
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# arange is left inclusive, right exclusive
np.arange(2,4,0.25)
array([2. , 2.25, 2.5 , 2.75, 3. , 3.25, 3.5 , 3.75])
Similarly, the np.linspace()
function is used to construct an array with evenly spaced numbers over a given interval. However, instead of the step parameter, np.linspace()
takes a num parameter to specify the number of samples within the given interval:
# linearly spaced
np.linspace(2,4,20)
array([2. , 2.10526316, 2.21052632, 2.31578947, 2.42105263, 2.52631579, 2.63157895, 2.73684211, 2.84210526, 2.94736842, 3.05263158, 3.15789474, 3.26315789, 3.36842105, 3.47368421, 3.57894737, 3.68421053, 3.78947368, 3.89473684, 4. ])
Note that unlike np.arange()
, np.linspace()
includes the stop value by default (this can be changed by passing endpoint=True
). Finally, it should be noted that while we could have used np.arange()
to generate the same array in the above example, it is recommended to use np.linspace()
when a non-integer step (e.g. 0.25) is desired.
np.linspace(2,4,20, endpoint = False)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9])
Exercise 1: Create a 1D array ranging from 0 to 100 (including 100) with an interval of 5.¶
2.3 Create two-dimensional grids¶
x = np.linspace(-4, 4, 9)
x
array([-4., -3., -2., -1., 0., 1., 2., 3., 4.])
y = np.linspace(-5, 5, 11)
y
array([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
x_2d, y_2d = np.meshgrid(x, y)
x_2d
array([[-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.]])
y_2d
array([[-5., -5., -5., -5., -5., -5., -5., -5., -5.], [-4., -4., -4., -4., -4., -4., -4., -4., -4.], [-3., -3., -3., -3., -3., -3., -3., -3., -3.], [-2., -2., -2., -2., -2., -2., -2., -2., -2.], [-1., -1., -1., -1., -1., -1., -1., -1., -1.], [ 0., 0., 0., 0., 0., 0., 0., 0., 0.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 2., 2., 2., 2., 2., 2., 2., 2., 2.], [ 3., 3., 3., 3., 3., 3., 3., 3., 3.], [ 4., 4., 4., 4., 4., 4., 4., 4., 4.], [ 5., 5., 5., 5., 5., 5., 5., 5., 5.]])
Exercise 3: Explain what the meshgrid function does. What is the difference between x_2d
and y_2d
?¶
3. Indexing in Numpy¶
Indexing in NumPy allows you to access and modify elements, rows, columns, or subarrays of an array. Basic indexing is similar to lists.
# get some individual elements of xx
x_2d[0,0], x_2d[-1,-1], x_2d[3,-5]
(np.float64(-4.0), np.float64(4.0), np.float64(0.0))
# get some whole rows and columns
x_2d[0].shape, x_2d[:,-1].shape
((9,), (11,))
# get some ranges
x_2d[3:10,3:5].shape
(7, 2)
There are many advanced ways to index arrays. You can read about them in the manual. Here is one example.
# use a boolean array as an index
idx = x_2d<0
x_2d[idx].shape
(44,)
Exercise 4: Get the last two columns of y_2d
array.¶
# two dimensional grids
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.linspace(-np.pi, np.pi, 50)
xx, yy = np.meshgrid(x, y)
xx.shape, yy.shape
((50, 100), (50, 100))
f = np.sin(xx) * np.cos(0.5*yy)
Visualizing Arrays with Matplotlib¶
It can be hard to work with big arrays without actually seeing anything with our eyes! We will now bring in Matplotib to start visualizating these arrays. For now we will just skim the surface of Matplotlib. Much more depth will be provided in the next chapter.
from matplotlib import pyplot as plt
plt.pcolormesh(f)
<matplotlib.collections.QuadMesh at 0x117fdee80>
4.2 Manipulating array dimensions¶
# transpose
plt.pcolormesh(f.T)
<matplotlib.collections.QuadMesh at 0x12624e400>
f.shape
(50, 100)
np.tile(f,(6,1)).shape
(300, 100)
# tile an array
plt.pcolormesh(np.tile(f,(6,1)))
<matplotlib.collections.QuadMesh at 0x1262bbeb0>
4.3 Broadcasting¶
Broadcasting is an efficient way to multiply arrays of different sizes
from IPython.display import Image
Image(url='http://scipy-lectures.github.io/_images/numpy_broadcasting.png',
width=720)
![No description has been provided for this image](http://scipy-lectures.github.io/_images/numpy_broadcasting.png)
# multiply f by x
print(f.shape, x.shape)
g = f * x
print(g.shape)
(50, 100) (100,) (50, 100)
# multiply f by y
print(f.shape, y.shape)
h = f * y
print(h.shape)
(50, 100) (50,)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[46], line 3 1 # multiply f by y 2 print(f.shape, y.shape) ----> 3 h = f * y 4 print(h.shape) ValueError: operands could not be broadcast together with shapes (50,100) (50,)
# use newaxis special syntax
y_new = y[:,np.newaxis]
h = f * y_new
print(h.shape)
(50, 100)
Exercise 5: What is the difference between y and y_new? Why f * y
gives an error, but f * y_new
doesn't?¶
4.4 Reduction Operations¶
# sum
g.sum()
np.float64(-3083.038387807155)
# mean
g.mean()
np.float64(-0.616607677561431)
# std
g.std()
np.float64(1.6402280119141424)
# apply on just one axis
# Mean of each row (calculated across columns)
g_xmean = g.mean(axis=1)
# Mean of each column (calculated across rows)
g_ymean = g.mean(axis=0)
plt.plot(x, g_ymean)
[<matplotlib.lines.Line2D at 0x1265e17f0>]
plt.plot(g_xmean, y)
[<matplotlib.lines.Line2D at 0x1266d3eb0>]
5. Missing data¶
Most real-world datasets – environmental or otherwise – have data gaps. Data can be missing for any number of reasons, including observations not being recorded or data corruption. While a cell corresponding to a data gap may just be left blank in a spreadsheet, when imported into Python, there must be some way to handle "blank" or missing values.
Missing data should not be replaced with zeros, as 0 can be a valid value for many datasets, (e.g. temperature, precipitation, etc.). Instead, the convention is to fill all missing data with the constant NaN. NaN stands for "Not a Number" and is implemented in NumPy as np.nan.
NaNs are handled differently by different packages. In NumPy, all computations involving NaN values will return nan:
data = np.array([[2.,2.7,1.89],
[1.1, 0.0, np.nan],
[3.2, 0.74, 2.1]])
np.mean(data)
np.float64(nan)
np.nanmean(data)
np.float64(1.71625)
6. Assignment¶
First import numpy and matplotlib
6.1. Create two 2D arrays representing coordinates x, y¶
Both should cover the range (-2, 2) and have 100 points in each direction
6.2. Visualize each 2D array using pcolormesh
¶
Use the correct coordiantes for the x and y axes.
6.3 From your cartesian coordinates, create polar coordinates $r$ and $\varphi$:¶
$r = \sqrt{x^2 + y^2}$
$\varphi = atan2(y,x)$
You will need to use numpy's arctan2
function. Read its documentation.
6.4. Visualize $r$ and $\varphi$ on the 2D $x$ / $y$ plane using pcolormesh
¶
6.5 Caclulate the quanity $f = \cos^2(4r) + \sin^2(4\varphi)$¶
And plot it on the $x$ / $y$ plane
6.6 Plot the mean of f with respect to the x axis¶
as a function of y
6.7 Plot the mean of f with respect to the y axis¶
as a function of x