Week 8: Environmental data visualization¶
This week, we're going to dive into Matplotlib, one of the most widely used Python libraries for data visualization. Matplotlib offers a powerful and flexible way to create a wide range of static, animated, and interactive plots.
Let's first import Matplotlib:
import matplotlib.pyplot as plt
1. Basic Components of a Matplotlib Plot¶
Before diving into specific visualizations, let's understand the basic elements of a Matplotlib plot:
Figure: The overall window or page that everything is drawn on.
Axes: The area on which the data is plotted (like a graph).
Plot: The graphical representation of the data points (line, scatter, etc.).
1.1 Figure¶
The figure is the highest level of organization of matplotlib objects. If we want, we can create a figure explicitly.
fig = plt.figure()
<Figure size 640x480 with 0 Axes>
# Specify figure size
fig = plt.figure(figsize=(13, 5))
<Figure size 1300x500 with 0 Axes>
Note: The figsize parameter controls the width and height of the figure in inches.
The figure size reported in the output above, 1300 x 500, is in pixels.
The resolution of the figure is controlled by the DPI (Dots Per Inch) setting. By default, Matplotlib uses a DPI of 100, meaning that the number of pixels per inch is 100. So, if you create a figure with figsize=(8, 6), the resulting image will have a resolution of 800 pixels (8 inches × 100 DPI) by 600 pixels (6 inches × 100 DPI).
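The arithmetic above can be checked directly on a figure object. The following is a minimal sketch, not part of the original lesson; it passes dpi explicitly so that the result does not depend on local default settings.

```python
import matplotlib.pyplot as plt

# Pass dpi explicitly so the result does not depend on local rcParams.
fig = plt.figure(figsize=(8, 6), dpi=100)

# get_size_inches() returns (width, height) in inches.
width_in, height_in = fig.get_size_inches()

# Pixel size = size in inches x dots per inch.
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi
print(width_px, height_px)
```

Changing dpi while keeping figsize fixed scales both pixel dimensions by the same factor.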
You can also change the DPI of the figure:
# Specify DPI
fig = plt.figure(figsize=(13, 5), dpi = 150)
<Figure size 1950x750 with 0 Axes>
1.2 Axes¶
Axes define the area on which the data is plotted (like a graph).
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
The list [0, 0, 1, 1] specifies the position and size of the axes in normalized figure coordinates, where values range from 0 to 1. Here's what each element of the list represents:
0: The left position of the axes relative to the figure (0 means the far left edge).
0: The bottom position of the axes relative to the figure (0 means the bottom edge).
1: The width of the axes as a fraction of the figure width (1 means it spans the entire width of the figure).
1: The height of the axes as a fraction of the figure height (1 means it spans the entire height of the figure).
# Define ax that occupies left half of the figure.
fig = plt.figure()
ax = fig.add_axes([0, 0, 0.5, 1])
# Define multiple axes:
fig = plt.figure()
ax1 = fig.add_axes([0, 0, 0.5, 1])
# Set the facecolor of the plot area.
ax2 = fig.add_axes([0.6, 0, 0.3, 0.5], facecolor='g')
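To confirm how the four numbers are interpreted, the position can be read back from the axes. This is a small illustrative sketch; the values 0.1, 0.2, 0.5, and 0.6 are arbitrary choices for the example.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
# [left, bottom, width, height], all as fractions of the figure size.
ax = fig.add_axes([0.1, 0.2, 0.5, 0.6])

# get_position() returns the bounding box of the axes in figure coordinates.
left, bottom, width, height = ax.get_position().bounds
print(left, bottom, width, height)
```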
1.3 Subplots¶
In Matplotlib, subplots allow you to create multiple plots (also known as axes) in a single figure. This is particularly useful when you want to display several related visualizations side-by-side or in a grid format. The function most commonly used to create subplots is plt.subplots(), which simplifies the process of creating multiple plots and managing their layout.
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=3)
fig = plt.figure(figsize=(12, 6))
axes = fig.subplots(nrows=2, ncols=3)
# See what axes look like.
axes
array([[<Axes: >, <Axes: >, <Axes: >], [<Axes: >, <Axes: >, <Axes: >]], dtype=object)
There is a shorthand for doing this all at once.
This is our recommended way to create new figures!
fig, ax = plt.subplots()
type(ax)
matplotlib.axes._axes.Axes
fig, axes = plt.subplots(ncols=2, figsize=(8, 4), subplot_kw={'facecolor': 'g'})
type(axes)
numpy.ndarray
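Because a grid of subplots comes back as a NumPy array, individual axes can be selected by index or visited in a loop. The sketch below assumes a 2 x 3 grid purely for illustration.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(9, 5))

# axes is a 2D array indexed as [row, column].
print(axes.shape)

# Pick one axes by its position in the grid.
axes[0, 2].set_title('row 0, col 2')

# axes.flat iterates over every axes, row by row.
for i, ax in enumerate(axes.flat):
    ax.set_xlabel(f'axes {i}')
```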
Exercise 1: Create a figure with a grid of axes that has 4 rows and 3 columns. Set the figsize to 10 inches wide and 8 inches high, and set the facecolor to purple.¶
2. Drawing into Axes¶
All plots are drawn into axes. It is easiest to understand how matplotlib works if you use the object-oriented style.
2.1 Displaying the data¶
# create some data to plot
import numpy as np
x = np.linspace(-np.pi, np.pi, 100)
y = np.cos(x)
z = np.sin(6*x)
# First, define the figure and ax.
fig, ax = plt.subplots()
# Draw data into axes.
ax.plot(x, y)
[<matplotlib.lines.Line2D at 0x112ea7520>]
This does the same thing as
plt.plot(x, y)
[<matplotlib.lines.Line2D at 0x1126de580>]
So why do we need to define figure and axes first?
This starts to matter when we have multiple axes to worry about.
# Define two axes (two plotting areas).
fig, axes = plt.subplots(figsize=(8, 4), ncols=2)
# axes is a NumPy array, and we can unpack it into two variables, each pointing to one plotting area.
ax0, ax1 = axes
# Plot on the first ax:
ax0.plot(x, y)
# Plot on the second ax
ax1.plot(x, z)
[<matplotlib.lines.Line2D at 0x1126233a0>]
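One way to see why the explicit style helps: plt.plot() always draws into the "current" axes, which may not be the one you intended. The sketch below is illustrative and assumes a fresh two-panel figure; here the current axes is the most recently created one.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)

fig, (ax0, ax1) = plt.subplots(figsize=(8, 4), ncols=2)

# plt.plot() draws into the "current" axes, which here is the last one created.
plt.plot(x, np.cos(x))
print(plt.gca() is ax1)

# Calling the method on the axes object makes the target explicit.
ax0.plot(x, np.sin(6 * x))
```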
2.2 Labeling Plots¶
When making a plot, we should always remember to label the x and y axes. A figure title is also important. See Rule #4 of the reading assignment.
fig, axes = plt.subplots(figsize=(8, 4), ncols=2)
ax0, ax1 = axes
ax0.plot(x, y)
ax0.set_xlabel('x')
ax0.set_ylabel('y')
ax0.set_title('x vs. y')
ax1.plot(x, z)
ax1.set_xlabel('x')
ax1.set_ylabel('z')
ax1.set_title('x vs. z')
# squeeze everything in
plt.tight_layout()
Exercise 2: Make a figure with a single ax, and merge the above two plots into one plot.¶
3. Customizing Line Plots¶
3.1 Line Styles¶
fig, axes = plt.subplots(figsize=(16, 5), ncols=3)
axes[0].plot(x, y, linestyle='dashed')
axes[0].plot(x, z, linestyle='--')
axes[1].plot(x, y, linestyle='dotted')
axes[1].plot(x, z, linestyle=':')
axes[2].plot(x, y, linestyle='dashdot', linewidth=5)
axes[2].plot(x, z, linestyle='-.', linewidth=0.5)
[<matplotlib.lines.Line2D at 0x171881f70>]
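Each panel above pairs a long style name with a short code. The following sketch, added for illustration, checks that Matplotlib treats each pair as the same line style.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Each long name and its shorthand from the panels above.
pairs = [('dashed', '--'), ('dotted', ':'), ('dashdot', '-.')]

same = []
for long_name, short_name in pairs:
    line_long, = ax.plot([0, 1], [0, 1], linestyle=long_name)
    line_short, = ax.plot([0, 1], [1, 0], linestyle=short_name)
    # Both lines report the same style once Matplotlib has parsed the argument.
    same.append(line_long.get_linestyle() == line_short.get_linestyle())

print(same)
```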
3.2 Colors¶
As described in the colors documentation, there are some special codes for commonly used colors:
- b: blue
- g: green
- r: red
- c: cyan
- m: magenta
- y: yellow
- k: black
- w: white
fig, ax = plt.subplots()
ax.plot(x, y, color='k')
ax.plot(x, z, color='r')
[<matplotlib.lines.Line2D at 0x17198bf40>]
Other ways to specify colors:
fig, axes = plt.subplots(figsize=(16, 5), ncols=3)
# grayscale
axes[0].plot(x, y, color='0.8')
axes[0].plot(x, z, color='0.2')
# RGB tuple
axes[1].plot(x, y, color=(1, 0, 0.7))
axes[1].plot(x, z, color=(0, 0.4, 0.3))
# HTML hex code: https://htmlcolorcodes.com
axes[2].plot(x, y, color='#00dcba')
axes[2].plot(x, z, color='#b029ee')
[<matplotlib.lines.Line2D at 0x112edd820>]
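All of these color specifications resolve to the same kind of value internally. As a rough illustration, the helpers in matplotlib.colors can convert each form to an RGB tuple and a hex string.

```python
import matplotlib.colors as mcolors

# One example of each form used above: letter code, grayscale string,
# RGB tuple, and hex code.
specs = ['k', '0.8', (1, 0, 0.7), '#00dcba']

for spec in specs:
    rgb = mcolors.to_rgb(spec)
    print(spec, '->', rgb, mcolors.to_hex(rgb))
```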
There is a default color cycle built into matplotlib.
plt.rcParams['axes.prop_cycle']
'color' |
---|
'#1f77b4' |
'#ff7f0e' |
'#2ca02c' |
'#d62728' |
'#9467bd' |
'#8c564b' |
'#e377c2' |
'#7f7f7f' |
'#bcbd22' |
'#17becf' |
fig, ax = plt.subplots(figsize=(12, 10))
for factor in np.linspace(0.2, 1, 11):
    ax.plot(x, factor*y)
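The loop above draws 11 lines, one more than the ten colors in the default cycle, so the cycle wraps around. The sketch below checks this; it assumes the default Matplotlib style is active.

```python
import numpy as np
import matplotlib.pyplot as plt

# The colors of the active cycle, as a plain list.
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(len(cycle_colors))

x = np.linspace(-np.pi, np.pi, 100)
fig, ax = plt.subplots()
lines = [ax.plot(x, factor * np.cos(x))[0] for factor in np.linspace(0.2, 1, 11)]

# Assuming the default ten-color cycle, the 11th line reuses the first color.
print(lines[0].get_color(), lines[10].get_color())
```

If you need more than ten distinguishable lines, consider setting colors explicitly or sampling them from a colormap.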
3.3 Markers¶
There are lots of different markers available in Matplotlib!
fig, axes = plt.subplots(figsize=(12, 5), ncols=2)
axes[0].plot(x[:20], y[:20], marker='.')
axes[0].plot(x[:20], z[:20], marker='o')
axes[1].plot(x[:20], z[:20], marker='^',
             markersize=10, markerfacecolor='r',
             markeredgecolor='k')
[<matplotlib.lines.Line2D at 0x1130d1cd0>]
3.4 Label, Ticks, and Gridlines¶
fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(x, y)
ax.set_xlabel('x')
ax.set_ylabel('y')
# Use raw strings so that backslashes in the LaTeX are not treated as escape sequences.
ax.set_title(r'A complicated math function: $f(x) = \cos(x)$')
ax.set_xticks(np.pi * np.array([-1, 0, 1]))
ax.set_xticklabels([r'$-\pi$', '0', r'$\pi$'])
ax.set_yticks([-1, 0, 1])
ax.set_yticks(np.arange(-1, 1.1, 0.2), minor=True)
#ax.set_xticks(np.arange(-3, 3.1, 0.2), minor=True)
ax.grid(which='minor', linestyle='--')
ax.grid(which='major', linewidth=2)
3.5 Axis Limits¶
Setting axis limits can affect how readers interpret the data. Choose the limits so that the plot does not mislead readers (Rule #7 in the reading assignment).
fig, ax = plt.subplots()
ax.plot(x, y, x, z)
ax.set_xlim(-5, 5)
ax.set_ylim(-3, 3)
(-3.0, 3.0)
With the same data, if you set the y-axis limits very wide, readers may interpret the curves as straight lines with little variation:
fig, ax = plt.subplots()
ax.plot(x, y, x, z)
ax.set_xlim(-5, 5)
ax.set_ylim(-100, 100)
(-100.0, 100.0)
3.6 Text Annotations¶
fig, ax = plt.subplots()
ax.plot(x, y)
ax.text(-3, 0.3, 'hello world')
Text(-3, 0.3, 'hello world')
The position of the text is given in data coordinates. To position it relative to the axes instead, pass transform=ax.transAxes, in which case:
(0, 0) refers to the bottom-left corner of the axes.
(1, 1) refers to the top-right corner of the axes.
fig, ax = plt.subplots()
ax.plot(x, y)
# Text will be drawn 10% from the left edge of the axes and 90% up from the bottom (close to the top).
ax.text(0.1, 0.9, 'hello world', transform=ax.transAxes)
Text(0.1, 0.9, 'hello world')
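To make the link between the two coordinate systems concrete, the sketch below converts a point in axes coordinates into data coordinates. The axis limits (0 to 10 and 0 to 100) are arbitrary values chosen for the example.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 100)

# Axes coordinates -> display (pixel) coordinates -> data coordinates.
display_xy = ax.transAxes.transform((0.1, 0.9))
data_xy = ax.transData.inverted().transform(display_xy)
print(data_xy)
```

With these limits, the axes point (0.1, 0.9) corresponds to x = 1 and y = 90 in data coordinates, since it lies 10% of the way along the x range and 90% of the way up the y range.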
Annotate function¶
The annotate function is also used to add text annotations to a plot, often with an arrow pointing from the annotation text to a specific point on the plot.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.annotate('the maximum', xy=(0, 1),
xytext=(0, 0), arrowprops={'facecolor': 'k'})
Text(0, 0, 'the maximum')
Notes: xy is the point (x, y) to annotate, and xytext is the position (x, y) at which to place the text. By default, both are given in data coordinates.
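The coordinate systems can also be chosen separately for the arrow tip and the text through the xycoords and textcoords arguments. The following sketch is illustrative only; the text position (0.7, 0.5) is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))

ann = ax.annotate(
    'the maximum',
    xy=(0, 1), xycoords='data',                     # arrow tip, in data coordinates
    xytext=(0.7, 0.5), textcoords='axes fraction',  # text position, as a fraction of the axes
    arrowprops={'facecolor': 'k'},
)
```

Placing the text in axes fractions keeps the label in the same spot on the panel even if the axis limits change.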
Exercise 3: Make a plot that looks like the figure below.¶
Text(0.5, 1.0, 'x vs. y')
fig, ax = plt.subplots()
# Color each marker by the value of x, and scale the marker size with z.
splot = ax.scatter(y, z, c=x, s=(100*z**2 + 5))
fig.colorbar(splot)
<matplotlib.colorbar.Colorbar at 0x17212ff70>
4.2 Bar Plots¶
labels = ['first', 'second', 'third']
values = [10, 5, 30]
fig, axes = plt.subplots(figsize=(10, 5), ncols=2)
axes[0].bar(labels, values)
axes[1].barh(labels, values)
<BarContainer object of 3 artists>
5. Assignment:¶
5.1 Recreate a figure using Matplotlib¶
In this problem, we will continue using the daily weather data from a NOAA station in Millbrook, NY.
The cell below uses pandas to load the data and populate a bunch of numpy arrays (t_daily_min, t_daily_max, etc.)
import pandas as pd
df = pd.read_csv('Millbrook_NY_daily_weather.csv', parse_dates=['LST_DATE'])
df = df.set_index('LST_DATE')
#########################################################
#### BELOW ARE THE VARIABLES YOU SHOULD USE IN THE PLOTS!
#### (numpy arrays)
#### NO PANDAS ALLOWED!
#########################################################
t_daily_min = df.T_DAILY_MIN.values
t_daily_max = df.T_DAILY_MAX.values
t_daily_mean = df.T_DAILY_MEAN.values
p_daily_calc = df.P_DAILY_CALC.values
soil_moisture_5 = df.SOIL_MOISTURE_5_DAILY.values
soil_moisture_10 = df.SOIL_MOISTURE_10_DAILY.values
soil_moisture_20 = df.SOIL_MOISTURE_20_DAILY.values
soil_moisture_50 = df.SOIL_MOISTURE_50_DAILY.values
soil_moisture_100 = df.SOIL_MOISTURE_100_DAILY.values
date = df.index.values
Recreate the plot you see below. Note that you should use the variables defined above. Don't use Pandas to make the plots; please use only Matplotlib!¶
Hint: Ranges of values can be plotted using the fill_between function in Matplotlib.
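As a rough illustration of how fill_between works, the sketch below shades the band between two curves. It uses synthetic stand-in values rather than the Millbrook data, and the variable names days, low, high, and mean are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in values, not the Millbrook data.
days = np.arange(30)
mean = 10 + 5 * np.sin(days / 5)
low = mean - 3
high = mean + 3

fig, ax = plt.subplots()
# Shade the band between the lower and upper curves.
band = ax.fill_between(days, low, high, alpha=0.3, label='range')
ax.plot(days, mean, label='mean')
ax.legend()
```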
5.2 Improve the figures you made from previous home assignments.¶
Find two figures from your previous assignments (Assignments 6 and 7) that use your own dataset. Improve the visualizations of the data by adding more figure elements, setting an appropriate figure size, adjusting axis limits, applying colors, and adding labels. Try to apply the ten rules you learned to create a compelling figure.
Figure 1: Original figure¶
Figure 1: Improved version¶
Figure 2: Original figure¶
Figure 2: Improved version¶