Week 9: Xarray for multidimensional gridded data¶

In previous lecture, we saw how Pandas provided a way to keep track of additional "metadata" surrounding tabular datasets, including "indexes" for each row and labels for each column. These features, together with Pandas' many useful routines for all kinds of data munging and analysis, have made Pandas one of the most popular python packages in the world.

However, not all environmental science datasets easily fit into the "tabular" model (i.e. rows and columns) imposed by Pandas. In particular, we often deal with multidimensional data. By multidimensional data (also often called N-dimensional), I mean data with many independent dimensions or axes. For example, we might represent Earth's surface temperature $T$ as a three dimensional variable

$$ T(x, y, t) $$

where $x$ is longitude, $y$ is latitude, and $t$ is time.

The point of xarray is to provide pandas-level convenience for working with this type of data.

1. What is Xarray?¶

1.1 Xarray data model¶

Here is an example of how we might structure a dataset for a weather forecast:

No description has been provided for this image

You'll notice multiple data variables (temperature, precipitation), coordinate variables (latitude, longitude), and dimensions (x, y, t). We'll cover how these fit into Xarray's data structures below.

1.2 Xarray functionality¶

Xarray doesn’t just keep track of labels on arrays – it uses them to provide a powerful and concise interface. For example:

Apply operations over dimensions by name: x.sum('time').
Select values by label (or logical location) instead of integer location: x.loc['2014-01-01'] or x.sel(time='2014-01-01').
Mathematical operations (e.g., x - y) vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape.
Easily use the split-apply-combine paradigm with groupby: x.groupby('time.dayofyear').mean().
Database-like alignment based on coordinate labels that smoothly handles missing values: x, y = xr.align(x, y, join='outer').
Keep track of arbitrary metadata in the form of a Python dictionary: x.attrs.

The N-dimensional nature of xarray’s data structures makes it suitable for dealing with multi-dimensional scientific data, and its use of dimension names instead of axis labels (dim='time' instead of axis=0) makes such arrays much more manageable than the raw numpy ndarray: with xarray, you don’t need to keep track of the order of an array’s dimensions or insert dummy dimensions of size 1 to align arrays (e.g., using np.newaxis).

The immediate payoff of using xarray is that you’ll write less code. The long-term payoff is that you’ll understand what you were thinking when you come back to look at it weeks or months later.

1.3 Xarray data structures¶

Like Pandas, xarray has two fundamental data structures:

a DataArray, which holds a single multi-dimensional variable and its coordinates
a Dataset, which holds multiple variables that potentially share the same coordinates

A DataArray has four essential attributes:

values: a numpy.ndarray holding the array’s values
dims: dimension names for each axis (e.g., ('x', 'y', 'z'))
coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
attrs: an OrderedDict to hold arbitrary metadata (attributes)

1.4 Import Xarray packages¶

In [2]:

Copied!





# First import xarray 
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
# First import xarray 
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

2. Exploring Xarray data structure¶

Array data can be stored in the following standard formats:

Hierarchical Data Format (HDF5) - Container for many arrays
Network Common Data Form (NetCDF) - Container for many arrays which conform to the NetCDF data model
Zarr - New cloud-optimized format for array storage

Here we will focus on using NetCDF data. NetCDF is widely used in international projects, governmental organizations (like NASA, NOAA), and academic institutions. Its adoption in climate models, geospatial data, remote sensing, and weather forecasting systems cements its role as a go-to format for many researchers. NetCDF efficiently handles multidimensional data (e.g., space, time, and variables like temperature or pressure), which is crucial in environmental science analysis where data often have complex, high-dimensional structures. It is commonly used for time series, 3D models, and spatially distributed data.

Python's xarray library is widely used for handling NetCDF data due to its powerful and flexible features tailored for working with labeled, multidimensional arrays.

2.1 Reading Xarray data¶

Here we use an example air temperature data from NCEP-NCAR Reanalysis .

Like Pandas, xarray also has a built-in function to easily read in NetCDF data as an xarray Dataset:

We can access "layers" of the Dataset (individual DataArrays) with dictionary syntax

We can save some typing by using the "attribute" or "dot" notation. This won't work for variable names that clash with built-in method names (for example, mean).

2.2 What is included in xarray `Dataset`?¶

The output consists of:

a summary of all dimensions of the Dataset (time: 365, lat: 73, lon: 144): this tells us that the first dimension is named time and has a size of 365, the second dimension is named lat and has a size of 73, and the third dimension is named lon and has a size of 365. Because we will access the dimensions by name, the order doesn't matter.
an unordered list of coordinates or dimensions with coordinates with one item per line. Each item has a name, one or more dimensions in parentheses, a dtype and a preview of the values.
an alphabetically sorted list of dimensions without coordinates (if there are any)
an unordered list of attributes, or metadata

DataArray¶

The DataArray class consists of an array (data) and its associated dimension names, labels, and attributes (metadata).

We can also access the data array directly:

In [6]:

Copied!

ds.air.data
ds.air.data

Out[6]:

array([[[246.57498, 246.57498, 246.57498, ..., 246.57498, 246.57498,
         246.57498],
        [248.04999, 247.97498, 247.92499, ..., 248.29999, 248.19998,
         248.12498],
        [255.52498, 256.07498, 256.62497, ..., 253.99998, 254.47499,
         254.99998],
        ...,
        [251.32498, 251.49997, 251.69998, ..., 250.97498, 251.04999,
         251.14998],
        [251.57498, 251.59998, 251.62497, ..., 251.575  , 251.575  ,
         251.57498],
        [251.22498, 251.22498, 251.22498, ..., 251.22498, 251.22498,
         251.22498]],

       [[248.19998, 248.19998, 248.19998, ..., 248.19998, 248.19998,
         248.19998],
        [253.39998, 253.39998, 253.34998, ..., 253.34998, 253.37497,
         253.37498],
        [259.3    , 259.97498, 260.55   , ..., 257.14996, 257.87497,
         258.59998],
        ...,
        [252.89996, 253.32497, 253.64996, ..., 251.84998, 252.17497,
         252.52498],
        [252.72498, 252.79999, 252.84998, ..., 252.59998, 252.62497,
         252.69998],
        [252.79997, 252.79997, 252.79997, ..., 252.79997, 252.79997,
         252.79997]],

       [[248.20001, 248.20001, 248.20001, ..., 248.20001, 248.20001,
         248.20001],
        [256.15   , 255.94998, 255.725  , ..., 256.65   , 256.5    ,
         256.325  ],
        [263.32498, 263.625  , 263.925  , ..., 261.95   , 262.475  ,
         262.925  ],
        ...,
        [253.27501, 253.40001, 253.52501, ..., 252.875  , 252.95   ,
         253.1    ],
        [252.475  , 252.475  , 252.5    , ..., 252.42499, 252.42499,
         252.44998],
        [252.025  , 252.025  , 252.025  , ..., 252.025  , 252.025  ,
         252.025  ]],

       ...,

       [[248.025  , 248.025  , 248.025  , ..., 248.025  , 248.025  ,
         248.025  ],
        [248.34999, 248.17499, 247.99998, ..., 248.74998, 248.625  ,
         248.47499],
        [252.275  , 252.22499, 252.15   , ..., 252.22499, 252.24998,
         252.29999],
        ...,
        [249.1    , 249.20001, 249.25   , ..., 248.92499, 248.975  ,
         249.025  ],
        [250.6    , 250.425  , 250.275  , ..., 251.125  , 250.925  ,
         250.75   ],
        [255.47499, 255.47499, 255.47499, ..., 255.47499, 255.47499,
         255.47499]],

       [[248.97499, 248.97499, 248.97499, ..., 248.97499, 248.97499,
         248.97499],
        [251.825  , 251.67499, 251.44998, ..., 252.29999, 252.15   ,
         252.02498],
        [258.95   , 259.175  , 259.325  , ..., 257.65   , 258.125  ,
         258.6    ],
        ...,
        [249.72499, 249.8    , 249.87498, ..., 249.57501, 249.575  ,
         249.62498],
        [254.29999, 254.2    , 254.07498, ..., 254.84999, 254.65   ,
         254.49998],
        [258.1    , 258.1    , 258.1    , ..., 258.1    , 258.1    ,
         258.1    ]],

       [[253.82498, 253.82498, 253.82498, ..., 253.82498, 253.82498,
         253.82498],
        [257.3    , 257.4    , 257.47498, ..., 257.     , 257.07498,
         257.19998],
        [257.65   , 258.75   , 259.8    , ..., 254.42499, 255.44998,
         256.57498],
        ...,
        [253.82498, 253.79999, 253.72499, ..., 253.87497, 253.84998,
         253.84998],
        [255.62498, 255.47499, 255.34998, ..., 256.12497, 255.92499,
         255.72498],
        [258.72498, 258.72498, 258.72498, ..., 258.72498, 258.72498,
         258.72498]]], dtype=float32)

Named dimensions¶

.dims are the named axes of your data. They may (dimension coordinates) or may not (dimensions without coordinates) have associated values.

In this case we have 2 spatial dimensions (latitude and longitude are stored with shorthand names lat and lon) and one temporal dimension (time).

In [7]:

Copied!

ds.air.dims
ds.air.dims

Out[7]:

('time', 'lat', 'lon')

Coordinates¶

.coords is a simple dict-like data container for mapping coordinate names to values.

Here we see the actual timestamps and spatial positions of our air temperature data:

In [8]:

Copied!

ds.air.coords
ds.air.coords

Out[8]:

Coordinates:
  * lat      (lat) float32 90.0 87.5 85.0 82.5 80.0 ... -82.5 -85.0 -87.5 -90.0
  * lon      (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * time     (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-12-31

Attributes¶

.attrs is a dictionary that can contain arbitrary Python objects (strings, lists, integers, dictionaries, etc.) containing information about your data. Your only limitation is that some attributes may not be writeable to certain file formats.

In [9]:

Copied!

ds.air.attrs
ds.air.attrs

Out[9]:

{'long_name': 'mean Daily Air temperature at sigma level 995',
 'units': 'degK',
 'precision': 2,
 'GRIB_id': 11,
 'GRIB_name': 'TMP',
 'var_desc': 'Air temperature',
 'dataset': 'NCEP Reanalysis Daily Averages',
 'statistic': 'Mean',
 'parent_stat': 'Individual Obs',
 'valid_range': array([185.16, 331.16], dtype=float32),
 'level_desc': '0.995 sigma',
 'actual_range': array([191.4 , 318.05], dtype=float32)}

In [10]:

Copied!

# assign your own attributes!
ds.air.attrs["new_attr"] = "trial dataset in class"
ds.air.attrs
# assign your own attributes!
ds.air.attrs["new_attr"] = "trial dataset in class"
ds.air.attrs

Out[10]:

{'long_name': 'mean Daily Air temperature at sigma level 995',
 'units': 'degK',
 'precision': 2,
 'GRIB_id': 11,
 'GRIB_name': 'TMP',
 'var_desc': 'Air temperature',
 'dataset': 'NCEP Reanalysis Daily Averages',
 'statistic': 'Mean',
 'parent_stat': 'Individual Obs',
 'valid_range': array([185.16, 331.16], dtype=float32),
 'level_desc': '0.995 sigma',
 'actual_range': array([191.4 , 318.05], dtype=float32),
 'new_attr': 'trial dataset in class'}

3. Working with Labelled Data¶

Xarray's labels make working with multidimensional data much easier. Metadata provides context and provides code that is more legible. This reduces the likelihood of errors from typos and makes analysis more intuitive and fun!

3.1 Plot labelled data¶

In [11]:

Copied!

## Without Xarray, let's try to use matplotlib to plot the data

temp = ds.air.data
plt.pcolormesh(temp[0,:,:])
## Without Xarray, let's try to use matplotlib to plot the data

temp = ds.air.data
plt.pcolormesh(temp[0,:,:])

Out[11]:

<matplotlib.collections.QuadMesh at 0x192e430a0>

In [12]:

Copied!





### Add Lat and Lon

temp = ds.air.data
lat = ds.air.lat.data
lon = ds.air.lon.data
plt.pcolormesh(lon, lat, temp[0,:,:])
### Add Lat and Lon

temp = ds.air.data
lat = ds.air.lat.data
lon = ds.air.lon.data
plt.pcolormesh(lon, lat, temp[0,:,:])

Out[12]:

<matplotlib.collections.QuadMesh at 0x1933156d0>

In [13]:

Copied!

temp.mean(axis=1)  ## what did I just do? I can't tell by looking at this line.
temp.mean(axis=1)  ## what did I just do? I can't tell by looking at this line.

Out[13]:

array([[280.59583, 280.43628, 280.28867, ..., 280.75656, 280.65207,
        280.58704],
       [280.29794, 280.4586 , 280.45416, ..., 280.25885, 280.03458,
        280.04657],
       [280.10754, 280.03354, 279.92673, ..., 280.4538 , 280.11682,
        280.101  ],
       ...,
       [280.65173, 280.88287, 280.90796, ..., 280.73325, 280.57697,
        280.5192 ],
       [280.89075, 281.15482, 281.1534 , ..., 281.20197, 280.81638,
        280.6781 ],
       [281.50757, 281.5658 , 281.36743, ..., 281.63388, 281.364  ,
        281.33322]], dtype=float32)

In [14]:

Copied!

ds.air.isel(time=0).plot()
ds.air.isel(time=0).plot()

Out[14]:

<matplotlib.collections.QuadMesh at 0x19339eb20>

In [15]:

Copied!

ds.air.mean(dim="time").plot()
ds.air.mean(dim="time").plot()

Out[15]:

<matplotlib.collections.QuadMesh at 0x1934e4550>

3.2 Selecting Data (Indexing)¶

We can always use regular numpy indexing and slicing on DataArrays

3.2.1 Position-based Indexing¶

Indexing a DataArray directly works (mostly) just like it does for numpy ndarrays, except that the returned object is always another DataArray:

This approach however does not take advantage of the dimension names and coordinate location information that is present in a Xarray object.

3.2.2 Positional Indexing Using Dimension Names¶

Remembering the axis order can be challenging even with 2D arrays:

is np_array[0,3] the first row and third column or first column and third row?
or did I store these samples by row or by column when I saved the data?!

The difficulty is compounded with added dimensions.

Xarray objects eliminate much of the mental overhead by allowing indexing using dimension names instead of axes numbers:

In [22]:

Copied!

da.isel(lat=0, lon=0).plot();
da.isel(lat=0, lon=0).plot();

Slicing is also possible similarly:

In [23]:

Copied!

da.isel(time=slice(0, 20), lat=0, lon=0).plot();
da.isel(time=slice(0, 20), lat=0, lon=0).plot();

Using the isel method, the user can choose/slice the specific elements from a Dataset or DataArray.

So far, we have explored positional indexing, which relies on knowing the exact indices. But, what if you wanted to select data specifically for a particular latitude? It becomes challenging to determine the corresponding indices in such cases. Xarray reduce this complexity by introducing label-based indexing.

3.2.3 Label-based Indexing¶

To select data by coordinate labels instead of integer indices we can use sel instead of isel:

For example, let's select all data for Lat 25 °N and Lon 210 °E using sel :

In [24]:

hide-output

Copied!

da.sel(lat=25, lon=210).plot();
da.sel(lat=25, lon=210).plot();

Similarly we can do slicing or filter a range using the .slice function:

So far, all the above will require us to specify exact coordinate values, but what if we don't have the exact values? We can use nearest neighbor lookups to address this issue:

3.2.4 Nearest Neighbor Lookups¶

The label based selection methods sel() support method and tolerance keyword argument. The method parameter allows for enabling nearest neighbor (inexact) lookups by use of the methods ffill (propagate last valid index forward), backfill or nearest:

tolerance argument limits the maximum distance for valid matches with an inexact lookup:

{tip}
All of these indexing methods work on the dataset too!

We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset:

Exercise 1: Select and plot the air temperature of New Brunswick,NJ (46.5653° N, 66.4619° W).¶

Note: You will need to convert western degrees to a degree ranging from 0 to 360, subtract it from 360.

In [ ]:

3.3 Datetime Indexing¶

Datetime indexing is a critical feature when working with time series data, which is a common occurrence in environmental sciences. Essentially, datetime indexing allows you to select data points or a series of data points that correspond to certain date or time criteria. This becomes essential for time-series analysis where the date or time information associated with each data point can be as critical as the data point itself.

Let's see some of the techniques to perform datetime indexing in Xarray:

3.3.1 Selecting data based on single datetime¶

Let's say we have a Dataset ds and we want to select data at a particular date and time, for instance, '2023-05-01'. We can do this by using the sel (select) method, like so:

By default, datetime selection will return a range of values that match the provided string. For e.g. time="2023-01" will return all timestamps for that month:

3.3.2 Selecting data for a range of dates¶

Now, let's say we want to select data between a certain range of dates. We can still use the sel method, but this time we will combine it with slice:

The slice function takes two arguments, start and stop, to make a slice that includes these endpoints. When we use slice with the sel method, it provides an efficient way to select a range of dates. The above example shows the usage of slice for datetime indexing.

3.3.3 Indexing with a DatetimeIndex or date string list¶

Another technique is to use a list of datetime objects or date strings for indexing. For example, you could select data for specific, non-contiguous dates like this:

3.3.4 Fancy indexing based on year, month, day, or other datetime components¶

In addition to the basic datetime indexing techniques, Xarray also supports "fancy" indexing options, which can provide more flexibility and efficiency in your data analysis tasks. You can directly access datetime components such as year, month, day, hour, day of week, etc. using the .dt accessor. Here is an example of selecting all data points from July across all years:

Or, if you wanted to select data from a specific day of each month, you could use:

Exercise 2: Select all data on Sunday only.¶

Hint: Use the dayofweek function.

In [ ]:

4. Xarray Computation¶

4.1 Arithmetic Operations¶

Similar to Pandas, xarray supports different kinds of arithmetic operations

In [54]:

Copied!

ds['air_C'] = ds['air'] - 273.15
ds['air_C'] = ds['air'] - 273.15

In [58]:

Copied!

(ds['air_C'].mean(dim='time') >  0).plot()
(ds['air_C'].mean(dim='time') >  0).plot()

Out[58]:

<matplotlib.collections.QuadMesh at 0x195485310>

In [59]:

Copied!

# Apply the mask:
ds.where(ds['air_C'].mean(dim='time') >  0).mean(dim = 'time').air.plot()
# Apply the mask:
ds.where(ds['air_C'].mean(dim='time') >  0).mean(dim = 'time').air.plot()

Out[59]:

<matplotlib.collections.QuadMesh at 0x1955a9910>

4.2 Broadcasting: expanding data¶

In [60]:

Copied!

ds['air_C'].mean(dim='time').plot()
ds['air_C'].mean(dim='time').plot()

Out[60]:

<matplotlib.collections.QuadMesh at 0x195688a90>

In [61]:

Copied!

(ds['air_C'].mean(dim='time')*np.cos(np.deg2rad(ds.lat))).plot()
(ds['air_C'].mean(dim='time')*np.cos(np.deg2rad(ds.lat))).plot()

Out[61]:

<matplotlib.collections.QuadMesh at 0x195764460>

In [63]:

Copied!

# Make cos(lat) two dimensional
(xr.ones_like(ds['air_C'].mean(dim='time'))*np.cos(np.deg2rad(ds.lat))).plot()
# Make cos(lat) two dimensional
(xr.ones_like(ds['air_C'].mean(dim='time'))*np.cos(np.deg2rad(ds.lat))).plot()

Out[63]:

<matplotlib.collections.QuadMesh at 0x1953c84c0>

5. Xarray Groupby and Resample¶

5.1 groupby¶

We learned groupby in pandas, which can similary be applied to Xarray.

In [66]:

Copied!

# Group by season
ds.groupby("time.season")
# Group by season
ds.groupby("time.season")

Out[66]:

DatasetGroupBy, grouped over 'season'
4 groups with labels 'DJF', 'JJA', 'MAM', 'SON'.

The seasons are out of order (they are alphabetically sorted). This is a common annoyance. The solution is to use .sel to change the order of labels

In [69]:

Copied!

seasonal_mean.air.plot(col="season")
seasonal_mean.air.plot(col="season")

Out[69]:

<xarray.plot.facetgrid.FacetGrid at 0x1958cccd0>

In [70]:

Copied!

# Make the figure to 2x2 plots:
seasonal_mean.air.plot(col="season", col_wrap=2);
# Make the figure to 2x2 plots:
seasonal_mean.air.plot(col="season", col_wrap=2);

In [71]:

Copied!

# Calculate zonal average
seasonal_mean.air.mean("lon").plot.line(hue="season", y="lat");
# Calculate zonal average
seasonal_mean.air.mean("lon").plot.line(hue="season", y="lat");

5.2 Resample¶

In [73]:

Copied!

ds.resample(time="M").mean().mean(dim =['lat','lon']).air.plot()
ds.resample(time="M").mean().mean(dim =['lat','lon']).air.plot()

Out[73]:

[<matplotlib.lines.Line2D at 0x195ddcfd0>]

Exercise 3: Make a time series plot of the monthly mean temperature over Northern Hemisphere.¶

In [ ]:

6. Assignment¶

In this assignment, we will use Xarray to analyze top-of-atmosphere radiation data from NASA's CERES project.

radiation budget Public domain, by NASA, from Wikimedia Commons

6.1 Open the dataset ('CERES_EBAF-TOA_Edition4.0_200003-201701.nc') and display its contents.¶

In [ ]:

6.2 Print out the `long_name` attribute of each variable¶

In [ ]:

6.3 Calculate the time-mean of the entire dataset¶

In [ ]:

6.4 From 6.3, make a 2D plot of the the time-mean TOA longwave, shortwave, and incoming solar radiation flux¶

(All-Sky conditions)

Note the sign conventions on each variable.

In [ ]:

6.5 Add up the three variables above and verify (visually) that they are equivalent to the TOA net flux¶

You have to pay attention to and think about the sign conventions for each variable in order to get this to work.

In [ ]:

6.6 Calculate the global mean of TOA net radiation directly from the dataset¶

Since the Earth is approximately in radiative balance, the net TOA radiation should be zero. But taking the naive mean from this dataset, you should find a number far from zero. Why?

In [ ]:

The answer is that each "pixel" or "grid point" of this dataset does not represent an equal area of Earth's surface. So naively taking the mean, i.e. giving equal weight to each point, gives the wrong answer.

On a lat / lon grid, the relative area of each grid point is proportional to $\cos(\lambda)$. ($\lambda$ is latitude)

6.7 Create a `weight` array proportional to $\cos(\lambda)$ with a mean value of 1¶

Verify its mean is 1 and plot it. Be careful about radians vs. degrees.

In [ ]:

6.8 Redo your global mean TOA net radiation calculation with this weight factor¶

Remember Xarray's handling of broadcasting. Don't make this harder than it needs to be.

In [ ]:

6.9 Plot the time-mean cloud area fraction (day and night)¶

In [ ]:

6.10 Define boolean masks for low cloud area ($\le$ 25%) and high cloud area ($\ge$ 75%)¶

Use the whole dataset, not the time mean.

In [ ]:

6.11 Calculate and plot composites of time-mean outgoing shortwave radiation for low and high cloud area regions¶

Your results should be 2D maps.

Xarray's where function will be helpful.

In [ ]:

6.12 Subset the dataset for NJ region (74˚W - 75.6˚W, 38.8˚N - 41.5˚N) and calculate the monthly climatology of cloud area fraction¶

In [ ]:

Week 9: Xarray for multidimensional gridded data¶

1. What is Xarray?¶

1.1 Xarray data model¶

1.2 Xarray functionality¶

1.3 Xarray data structures¶

1.4 Import Xarray packages¶

2. Exploring Xarray data structure¶

2.1 Reading Xarray data¶

2.2 What is included in xarray Dataset?¶

DataArray¶

Named dimensions¶

Coordinates¶

Attributes¶

3. Working with Labelled Data¶

3.1 Plot labelled data¶

3.2 Selecting Data (Indexing)¶

3.2.1 Position-based Indexing¶

3.2.2 Positional Indexing Using Dimension Names¶

3.2.3 Label-based Indexing¶

3.2.4 Nearest Neighbor Lookups¶

Exercise 1: Select and plot the air temperature of New Brunswick,NJ (46.5653° N, 66.4619° W).¶

3.3 Datetime Indexing¶

3.3.1 Selecting data based on single datetime¶

3.3.2 Selecting data for a range of dates¶

3.3.3 Indexing with a DatetimeIndex or date string list¶

3.3.4 Fancy indexing based on year, month, day, or other datetime components¶

Exercise 2: Select all data on Sunday only.¶

4. Xarray Computation¶

4.1 Arithmetic Operations¶

4.2 Broadcasting: expanding data¶

5. Xarray Groupby and Resample¶

5.1 groupby¶

5.2 Resample¶

Exercise 3: Make a time series plot of the monthly mean temperature over Northern Hemisphere.¶

6. Assignment¶

6.1 Open the dataset ('CERES_EBAF-TOA_Edition4.0_200003-201701.nc') and display its contents.¶

6.2 Print out the long_name attribute of each variable¶

6.3 Calculate the time-mean of the entire dataset¶

6.4 From 6.3, make a 2D plot of the the time-mean TOA longwave, shortwave, and incoming solar radiation flux¶

6.5 Add up the three variables above and verify (visually) that they are equivalent to the TOA net flux¶

6.6 Calculate the global mean of TOA net radiation directly from the dataset¶

6.7 Create a weight array proportional to $\cos(\lambda)$ with a mean value of 1¶

6.8 Redo your global mean TOA net radiation calculation with this weight factor¶

6.9 Plot the time-mean cloud area fraction (day and night)¶

6.10 Define boolean masks for low cloud area ($\le$ 25%) and high cloud area ($\ge$ 75%)¶

6.11 Calculate and plot composites of time-mean outgoing shortwave radiation for low and high cloud area regions¶

6.12 Subset the dataset for NJ region (74˚W - 75.6˚W, 38.8˚N - 41.5˚N) and calculate the monthly climatology of cloud area fraction¶

2.2 What is included in xarray `Dataset`?¶

6.2 Print out the `long_name` attribute of each variable¶

6.7 Create a `weight` array proportional to $\cos(\lambda)$ with a mean value of 1¶