Week 11: Spatial Analysis in Python¶
1. Intro to Geopandas¶
GeoPandas, as the name suggests, extends the popular data science library pandas by adding support for geospatial data.
The core data structure in GeoPandas is the geopandas.GeoDataFrame
, a subclass of pandas.DataFrame
, that can store geometry columns and perform spatial operations. The geopandas.GeoSeries
, a subclass of pandas.Series
, handles the geometries. Therefore, your GeoDataFrame
is a combination of pandas.Series
, with traditional data (numerical, boolean, text etc.), and geopandas.GeoSeries
, with geometries (points, polygons etc.). You can have as many columns with geometries as you wish; there's no limit typical for desktop GIS software.
Each GeoSeries
can contain any geometry type (you can even mix them within a single array) and has a GeoSeries.crs
attribute, which stores information about the projection (CRS stands for Coordinate Reference System). Therefore, each GeoSeries
in a GeoDataFrame
can be in a different projection, allowing you to have, for example, multiple versions (different projections) of the same geometry.
Only one GeoSeries
in a GeoDataFrame
is considered the active geometry, which means that all geometric operations applied to a GeoDataFrame
operate on this active column.
First, let's import geopandas:
import geopandas as gpd
1.1 Reading files¶
Here we will use data from the 5th National Climate Assessment for thresholds of temperature and precipitation compared to the 1991-2020 average for the Global Warming Level 1.5 deg C. This shapefile gives county summaries of projected climate conditions projected to occur if Earth’s long-term average temperature reaches 1.5 degree C.
The data are in the shapefile format, is a popular geospatial vector data format commonly used in Geographic Information Systems (GIS) to store location, shape, and attribute information for geographic features.
gdf = gpd.read_file('./NCA5_Atlas_Global_Warming_Level_1/NCA_Atlas_Counties.shp')
gdf
OBJECTID | NAME | STATE_NAME | STATE_ABBR | FIPS | pr_above_n | prmax1day_ | prmax5yr_G | tavg_GWL1 | tmax1day_G | ... | tmean_jja_ | tmin_days_ | tmin_day_1 | tmin_day_2 | tmin_jja_G | pr_annual_ | pr_days_ab | SHAPE_Leng | SHAPE_Area | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Autauga County | Alabama | AL | 01001 | 8.866368 | 1.204632 | 0.560895 | 1.732316 | 1.543474 | ... | 1.604421 | 22.678737 | -0.039947 | -8.215921 | 1.774947 | 2.231684 | 11.870172 | 2.062534 | 0.150258 | POLYGON ((-86.41312 32.70739, -86.41219 32.526... |
1 | 2 | Baldwin County | Alabama | AL | 01003 | 10.447559 | 3.454147 | 1.104216 | 1.633755 | 1.526716 | ... | 1.508863 | 19.807863 | -0.008627 | -5.261471 | 1.667588 | 3.200307 | 11.634376 | 9.150287 | 0.398401 | MULTIPOLYGON (((-87.96018 30.66235, -87.96046 ... |
2 | 3 | Barbour County | Alabama | AL | 01005 | 10.930474 | 0.800702 | 0.499965 | 1.649667 | 1.629789 | ... | 1.571404 | 23.813614 | -0.017614 | -6.758772 | 1.625246 | 2.796249 | 16.953608 | 2.681671 | 0.223264 | POLYGON ((-85.25784 32.14794, -85.25924 32.145... |
3 | 4 | Bibb County | Alabama | AL | 01007 | 10.956795 | 2.631909 | 1.330318 | 1.789705 | 1.623523 | ... | 1.684205 | 22.352250 | -0.083273 | -9.176136 | 1.856455 | 1.619284 | 14.511721 | 1.887436 | 0.156487 | POLYGON ((-87.06574 33.24691, -87.02685 33.246... |
4 | 5 | Blount County | Alabama | AL | 01009 | 18.316114 | 6.689886 | 6.570682 | 1.757045 | 1.530205 | ... | 1.639023 | 18.284318 | -0.132205 | -9.861114 | 1.868750 | 3.069511 | 23.057712 | 2.413198 | 0.164411 | POLYGON ((-86.45302 34.25932, -86.44414 34.259... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3106 | 3107 | Washakie County | Wyoming | WY | 56043 | 17.864115 | 6.655069 | 8.193466 | 2.103356 | 2.002747 | ... | 2.147466 | 0.158741 | -5.406983 | -11.806500 | 2.097448 | 5.085914 | 16.041836 | 4.247988 | 0.650772 | POLYGON ((-108.54727 44.16848, -108.51002 44.1... |
3107 | 3108 | Weston County | Wyoming | WY | 56045 | 20.280563 | 5.863631 | 8.519034 | 2.067239 | 2.078767 | ... | 2.164869 | 0.597778 | -4.788625 | -10.939795 | 2.158937 | 4.700654 | 18.228163 | 3.408275 | 0.695663 | POLYGON ((-104.0545 44.18039, -104.0547 43.999... |
3108 | 3109 | None | Puerto Rico | PR | 72 | 4.800000 | 0.700000 | 5.800000 | 0.950000 | NaN | ... | NaN | 28.700000 | NaN | NaN | NaN | NaN | NaN | 10.736738 | 0.764291 | MULTIPOLYGON (((-66.5313 17.88292, -66.53401 1... |
3109 | 3110 | None | Alaska | AK | 02 | 22.300000 | 6.500000 | 5.900000 | 2.100000 | NaN | ... | NaN | NaN | -7.449000 | -12.600000 | NaN | NaN | NaN | 557.646894 | 281.718582 | MULTIPOLYGON (((-179.10934 51.30098, -179.1060... |
3110 | 3111 | None | Hawaii | HI | 15 | 9.200000 | 5.300000 | 8.100000 | 1.070000 | NaN | ... | NaN | 29.600000 | NaN | -1.100000 | NaN | NaN | NaN | 18.946659 | 1.440200 | MULTIPOLYGON (((-155.06851 19.72998, -155.0680... |
3111 rows × 23 columns
1.2 Data section in Geopandas¶
Pandas indexing and selection methods also work with geopandas
gdf_nj = gdf.loc[gdf.STATE_NAME == 'New Jersey']
gdf_nj = gdf_nj.set_index('NAME')
gdf_nj
OBJECTID | STATE_NAME | STATE_ABBR | FIPS | pr_above_n | prmax1day_ | prmax5yr_G | tavg_GWL1 | tmax1day_G | tmax_days_ | ... | tmean_jja_ | tmin_days_ | tmin_day_1 | tmin_day_2 | tmin_jja_G | pr_annual_ | pr_days_ab | SHAPE_Leng | SHAPE_Area | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NAME | |||||||||||||||||||||
Atlantic County | 1741 | New Jersey | NJ | 34001 | 12.709769 | 5.173795 | 9.893641 | 1.964590 | 1.760179 | 0.464282 | ... | 1.791410 | 9.996231 | -0.214846 | -14.452641 | 1.862923 | 2.726068 | 15.623388 | 12.040877 | 0.152625 | MULTIPOLYGON (((-74.41362 39.54437, -74.414 39... |
Bergen County | 1742 | New Jersey | NJ | 34003 | 16.508688 | 5.163812 | 8.624438 | 2.108313 | 1.919000 | 0.723812 | ... | 1.930625 | 9.349438 | -0.502000 | -14.903813 | 1.896625 | 4.355973 | 13.356645 | 2.186622 | 0.065902 | MULTIPOLYGON (((-74.12046 40.85576, -74.12117 ... |
Burlington County | 1743 | New Jersey | NJ | 34005 | 17.037017 | 7.749103 | 12.298672 | 2.017741 | 1.757276 | 0.894690 | ... | 1.831379 | 8.895655 | -0.335707 | -14.852569 | 1.884414 | 4.160636 | 20.504922 | 3.030257 | 0.222051 | MULTIPOLYGON (((-74.44213 39.55354, -74.44109 ... |
Camden County | 1744 | New Jersey | NJ | 34007 | 17.107375 | 7.184875 | 12.570375 | 2.048250 | 1.841625 | 0.987312 | ... | 1.908937 | 10.710687 | -0.242750 | -14.977312 | 1.952188 | 4.326228 | 19.452632 | 1.772704 | 0.061134 | MULTIPOLYGON (((-75.08716 39.9718, -75.08736 3... |
Cape May County | 1745 | New Jersey | NJ | 34009 | 13.507833 | 5.575222 | 10.194389 | 1.884833 | 1.601944 | 0.269222 | ... | 1.729889 | 10.808944 | -0.200167 | -14.010056 | 1.793889 | 2.755018 | 18.012068 | 9.960371 | 0.069066 | MULTIPOLYGON (((-74.58412 39.30441, -74.5847 3... |
Cumberland County | 1746 | New Jersey | NJ | 34011 | 16.687061 | 7.617212 | 11.978455 | 1.981485 | 1.765152 | 0.580061 | ... | 1.856576 | 11.125091 | -0.220667 | -14.537727 | 1.934333 | 3.991424 | 21.338093 | 5.241841 | 0.135073 | MULTIPOLYGON (((-75.41831 39.41545, -75.41792 ... |
Essex County | 1747 | New Jersey | NJ | 34013 | 14.555750 | 5.560125 | 10.055875 | 2.089000 | 1.897375 | 0.731250 | ... | 1.935875 | 8.622875 | -0.493250 | -14.683375 | 1.923375 | 3.768749 | 10.525440 | 1.138257 | 0.035286 | POLYGON ((-74.32281 40.90884, -74.32247 40.908... |
Gloucester County | 1748 | New Jersey | NJ | 34015 | 16.226667 | 8.225500 | 12.450750 | 2.078042 | 1.969333 | 1.027333 | ... | 2.018917 | 13.232708 | -0.162542 | -15.102375 | 2.071208 | 4.281921 | 20.030672 | 2.053469 | 0.089123 | MULTIPOLYGON (((-75.40662 39.78106, -75.40773 ... |
Hudson County | 1749 | New Jersey | NJ | 34017 | 15.946667 | 8.062000 | 11.879333 | 2.177333 | 2.002333 | 0.906333 | ... | 2.037333 | 14.763666 | -0.102000 | -15.584333 | 2.029667 | 3.864291 | 13.227471 | 1.524890 | 0.012900 | MULTIPOLYGON (((-74.16091 40.64526, -74.16016 ... |
Hunterdon County | 1750 | New Jersey | NJ | 34019 | 19.074233 | 7.458533 | 11.135267 | 2.087933 | 1.913300 | 0.621533 | ... | 1.938633 | 4.455567 | -1.075267 | -14.064233 | 1.929167 | 4.933982 | 16.882525 | 1.775757 | 0.120479 | POLYGON ((-74.83809 40.75226, -74.82749 40.744... |
Mercer County | 1751 | New Jersey | NJ | 34021 | 17.423600 | 7.210800 | 9.612867 | 2.075733 | 1.844733 | 0.839733 | ... | 1.949600 | 8.515200 | -0.354933 | -15.028600 | 1.959933 | 4.484226 | 15.734961 | 1.534344 | 0.062769 | POLYGON ((-74.72324 40.37739, -74.72206 40.375... |
Middlesex County | 1752 | New Jersey | NJ | 34023 | 16.515458 | 7.180292 | 11.649917 | 2.084208 | 1.860917 | 0.883833 | ... | 1.975458 | 9.029208 | -0.369500 | -14.953375 | 1.981625 | 4.738598 | 13.874158 | 3.026746 | 0.085986 | MULTIPOLYGON (((-74.34076 40.48271, -74.34389 ... |
Monmouth County | 1753 | New Jersey | NJ | 34025 | 17.070813 | 7.863531 | 13.860406 | 2.005531 | 1.817750 | 0.617469 | ... | 1.890375 | 9.329031 | -0.280687 | -14.637719 | 1.907125 | 3.912146 | 15.619226 | 4.256690 | 0.130309 | MULTIPOLYGON (((-73.98408 40.41741, -73.99008 ... |
Morris County | 1754 | New Jersey | NJ | 34027 | 17.161424 | 5.485061 | 9.291970 | 2.105303 | 1.951606 | 0.370364 | ... | 1.962485 | 3.653424 | -1.411697 | -13.954576 | 1.922727 | 4.719114 | 14.513126 | 2.380815 | 0.133213 | POLYGON ((-74.50049 41.08601, -74.49997 41.085... |
Ocean County | 1755 | New Jersey | NJ | 34029 | 15.653977 | 7.068045 | 12.782796 | 1.972886 | 1.729659 | 0.614295 | ... | 1.797545 | 8.254818 | -0.323955 | -14.488704 | 1.836182 | 3.356637 | 17.491542 | 15.648201 | 0.173052 | MULTIPOLYGON (((-74.09657 40.12414, -74.09687 ... |
Passaic County | 1756 | New Jersey | NJ | 34031 | 15.799063 | 4.963187 | 9.094875 | 2.083125 | 1.851500 | 0.403250 | ... | 1.885688 | 4.598750 | -1.223063 | -13.993937 | 1.824375 | 4.650235 | 13.346869 | 1.556083 | 0.054999 | POLYGON ((-74.23429 41.14302, -74.21777 41.136... |
Salem County | 1757 | New Jersey | NJ | 34033 | 17.727826 | 9.147435 | 12.954217 | 2.074696 | 1.935739 | 0.937087 | ... | 2.003348 | 13.687652 | -0.174261 | -15.161043 | 2.073043 | 4.786771 | 22.100363 | 4.394849 | 0.092340 | MULTIPOLYGON (((-75.48224 39.65725, -75.48268 ... |
Somerset County | 1758 | New Jersey | NJ | 34035 | 18.454240 | 6.473880 | 9.478840 | 2.098480 | 1.962760 | 0.856760 | ... | 2.016280 | 6.486240 | -0.787800 | -14.321200 | 2.005080 | 4.908482 | 18.333972 | 1.699680 | 0.083972 | POLYGON ((-74.55469 40.75678, -74.55335 40.756... |
Sussex County | 1759 | New Jersey | NJ | 34037 | 16.832972 | 4.646583 | 7.895972 | 2.144833 | 1.996778 | 0.282000 | ... | 2.002611 | 2.364194 | -2.251417 | -13.478472 | 1.904444 | 4.451370 | 14.487146 | 1.685282 | 0.148757 | POLYGON ((-74.67081 41.34637, -74.6376 41.3311... |
Union County | 1760 | New Jersey | NJ | 34039 | 14.994167 | 6.106333 | 11.412667 | 2.124167 | 1.841167 | 0.860667 | ... | 1.944667 | 8.997667 | -0.469667 | -15.386333 | 1.940500 | 4.059828 | 12.067153 | 1.369704 | 0.028546 | POLYGON ((-74.36903 40.73927, -74.36647 40.737... |
Warren County | 1761 | New Jersey | NJ | 34041 | 19.093179 | 6.100571 | 7.801143 | 2.126071 | 2.035357 | 0.390464 | ... | 2.010179 | 2.703071 | -1.738714 | -13.543821 | 1.961036 | 4.861140 | 15.735885 | 1.741006 | 0.100309 | POLYGON ((-74.93889 41.06878, -74.90189 41.034... |
21 rows × 22 columns
2. Simple accessors and methods in Geopandas¶
2.1 Access the 'shape' of the data¶
Now we have our GeoDataFrame
and can start working with its geometry.
Since there was only one geometry column in the NCA dataset, this column automatically becomes the active geometry and spatial methods used on the GeoDataFrame
will be applied to the "geometry"
column.
gdf_nj.geometry
NAME Atlantic County MULTIPOLYGON (((-74.41362 39.54437, -74.414 39... Bergen County MULTIPOLYGON (((-74.12046 40.85576, -74.12117 ... Burlington County MULTIPOLYGON (((-74.44213 39.55354, -74.44109 ... Camden County MULTIPOLYGON (((-75.08716 39.9718, -75.08736 3... Cape May County MULTIPOLYGON (((-74.58412 39.30441, -74.5847 3... Cumberland County MULTIPOLYGON (((-75.41831 39.41545, -75.41792 ... Essex County POLYGON ((-74.32281 40.90884, -74.32247 40.908... Gloucester County MULTIPOLYGON (((-75.40662 39.78106, -75.40773 ... Hudson County MULTIPOLYGON (((-74.16091 40.64526, -74.16016 ... Hunterdon County POLYGON ((-74.83809 40.75226, -74.82749 40.744... Mercer County POLYGON ((-74.72324 40.37739, -74.72206 40.375... Middlesex County MULTIPOLYGON (((-74.34076 40.48271, -74.34389 ... Monmouth County MULTIPOLYGON (((-73.98408 40.41741, -73.99008 ... Morris County POLYGON ((-74.50049 41.08601, -74.49997 41.085... Ocean County MULTIPOLYGON (((-74.09657 40.12414, -74.09687 ... Passaic County POLYGON ((-74.23429 41.14302, -74.21777 41.136... Salem County MULTIPOLYGON (((-75.48224 39.65725, -75.48268 ... Somerset County POLYGON ((-74.55469 40.75678, -74.55335 40.756... Sussex County POLYGON ((-74.67081 41.34637, -74.6376 41.3311... Union County POLYGON ((-74.36903 40.73927, -74.36647 40.737... Warren County POLYGON ((-74.93889 41.06878, -74.90189 41.034... Name: geometry, dtype: geometry
# Geometry is represented with vectors, like polygons, points, lines.
gdf_nj.loc['Middlesex County'].geometry
2.2 Measuring area¶
To measure the area of each polygon (or MultiPolygon in this specific case), access the GeoDataFrame.area
attribute, which returns a pandas.Series
. Note that GeoDataFrame.area
is just GeoSeries.area
applied to the active geometry column.
gdf_nj.area
/var/folders/nq/3b935p450gg40kwwv4yk6yrm0000gn/T/ipykernel_1111/1692552597.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. gdf_nj.area
NAME Atlantic County 0.152625 Bergen County 0.065902 Burlington County 0.222051 Camden County 0.061134 Cape May County 0.069066 Cumberland County 0.135073 Essex County 0.035286 Gloucester County 0.089123 Hudson County 0.012900 Hunterdon County 0.120479 Mercer County 0.062769 Middlesex County 0.085986 Monmouth County 0.130309 Morris County 0.133213 Ocean County 0.173052 Passaic County 0.054999 Salem County 0.092340 Somerset County 0.083972 Sussex County 0.148757 Union County 0.028546 Warren County 0.100309 dtype: float64
Each GeoSeries
has its Coordinate Reference System (CRS) accessible at GeoSeries.crs
. The CRS tells GeoPandas where the coordinates of the geometries are located on the earth's surface.
The CRS of the NCA data is geographic, which means that the coordinates are in latitude and longitude.
gdf_nj.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
To give correct measure of area, we convert the coordinates to a CRS with coordinates in meters.
gdf_nj_proj = gdf_nj.to_crs('EPSG:4978')
Now the distance is showing in area m$^2$.
gdf_nj_proj.area
NAME Atlantic County 9.270863e+08 Bergen County 4.039547e+08 Burlington County 1.352259e+09 Camden County 3.720313e+08 Cape May County 4.184738e+08 Cumberland County 8.198323e+08 Essex County 2.160640e+08 Gloucester County 5.420991e+08 Hudson County 7.897046e+07 Hunterdon County 7.367924e+08 Mercer County 3.832373e+08 Middlesex County 5.254540e+08 Monmouth County 7.954710e+08 Morris County 8.159334e+08 Ocean County 1.054102e+09 Passaic County 3.371888e+08 Salem County 5.612408e+08 Somerset County 5.135769e+08 Sussex County 9.125082e+08 Union County 1.746307e+08 Warren County 6.144339e+08 dtype: float64
2.3 Getting polygon boundary and centroid¶
To get the boundary of each polygon (LineString), access the GeoDataFrame.boundary
:
gdf_nj["boundary"] = gdf_nj.boundary
gdf_nj["boundary"]
NAME Atlantic County MULTILINESTRING ((-74.41362 39.54437, -74.414 ... Bergen County MULTILINESTRING ((-74.12046 40.85576, -74.1211... Burlington County MULTILINESTRING ((-74.44213 39.55354, -74.4410... Camden County MULTILINESTRING ((-75.08716 39.9718, -75.08736... Cape May County MULTILINESTRING ((-74.58412 39.30441, -74.5847... Cumberland County MULTILINESTRING ((-75.41831 39.41545, -75.4179... Essex County LINESTRING (-74.32281 40.90884, -74.32247 40.9... Gloucester County MULTILINESTRING ((-75.40662 39.78106, -75.4077... Hudson County MULTILINESTRING ((-74.16091 40.64526, -74.1601... Hunterdon County LINESTRING (-74.83809 40.75226, -74.82749 40.7... Mercer County LINESTRING (-74.72324 40.37739, -74.72206 40.3... Middlesex County MULTILINESTRING ((-74.34076 40.48271, -74.3438... Monmouth County MULTILINESTRING ((-73.98408 40.41741, -73.9900... Morris County LINESTRING (-74.50049 41.08601, -74.49997 41.0... Ocean County MULTILINESTRING ((-74.09657 40.12414, -74.0968... Passaic County LINESTRING (-74.23429 41.14302, -74.21777 41.1... Salem County MULTILINESTRING ((-75.48224 39.65725, -75.4826... Somerset County LINESTRING (-74.55469 40.75678, -74.55335 40.7... Sussex County LINESTRING (-74.67081 41.34637, -74.6376 41.33... Union County LINESTRING (-74.36903 40.73927, -74.36647 40.7... Warren County LINESTRING (-74.93889 41.06878, -74.90189 41.0... Name: boundary, dtype: geometry
gdf_nj.loc['Middlesex County'].boundary
Since we have saved boundary as a new column, we now have two geometry columns in the same GeoDataFrame
.
We can also create new geometries, which could be, for example, the centroid:
gdf_nj_proj["centroid"] = gdf_nj_proj.centroid
gdf_nj_proj["centroid"]
NAME Atlantic County POINT (1302613.33969 -4754167.88308) Bergen County POINT (1323156.11488 -4638279.65256) Burlington County POINT (1296007.84024 -4726891.59845) Camden County POINT (1273472.87199 -4738689.51246) Cape May County POINT (1297945.34508 -4779588.94579) Cumberland County POINT (1268629.69322 -4771293.13017) Essex County POINT (1312793.54877 -4654378.18127) Gloucester County POINT (1260233.80317 -4748675.77869) Hudson County POINT (1327653.01622 -4653751.58639) Hunterdon County POINT (1262955.38748 -4684703.35002) Mercer County POINT (1285555.53001 -4699754.49198) Middlesex County POINT (1306222.07875 -4682485.51879) Monmouth County POINT (1324996.26792 -4690605.4728) Morris County POINT (1287304.17732 -4655921.85804) Ocean County POINT (1325734.89268 -4716266.73643) Passaic County POINT (1303708.49472 -4638331.88768) Salem County POINT (1245204.96526 -4761922.60661) Somerset County POINT (1287205.95546 -4678385.07014) Sussex County POINT (1270075.08399 -4639707.88837) Union County POINT (1310269.13698 -4664724.44511) Warren County POINT (1250556.7978 -4666294.33626) Name: centroid, dtype: geometry
gdf_nj["centroid"] = gdf_nj.centroid
gdf_nj["centroid"]
/var/folders/nq/3b935p450gg40kwwv4yk6yrm0000gn/T/ipykernel_1111/3860640333.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. gdf_nj["centroid"] = gdf_nj.centroid
NAME Atlantic County POINT (-74.67735 39.48279) Bergen County POINT (-74.07824 40.96239) Burlington County POINT (-74.66766 39.87828) Camden County POINT (-74.95778 39.80156) Cape May County POINT (-74.80712 39.15175) Cumberland County POINT (-75.11028 39.37396) Essex County POINT (-74.24861 40.7883) Gloucester County POINT (-75.13712 39.71375) Hudson County POINT (-74.07725 40.74073) Hunterdon County POINT (-74.91224 40.5673) Mercer County POINT (-74.70174 40.28344) Middlesex County POINT (-74.41303 40.43831) Monmouth County POINT (-74.2261 40.25884) Morris County POINT (-74.54452 40.86201) Ocean County POINT (-74.29942 39.91042) Passaic County POINT (-74.30083 41.03439) Salem County POINT (-75.34574 39.58701) Somerset County POINT (-74.61633 40.56351) Sussex County POINT (-74.69084 41.13931) Union County POINT (-74.31056 40.66027) Warren County POINT (-74.99735 40.85715) Name: centroid, dtype: geometry
2.5 Measuring distance¶
We can also measure how far each centroid is from the first centroid location.
point_m = gdf_nj_proj["centroid"].loc['Middlesex County']
gdf_nj["distance"] = gdf_nj_proj["centroid"].distance(point_m)
gdf_nj["distance"]/1000
NAME Atlantic County 71.773145 Bergen County 47.338359 Burlington County 45.565673 Camden County 65.049208 Cape May County 97.455528 Cumberland County 96.436400 Essex County 28.865319 Gloucester County 80.598213 Hudson County 35.845836 Hunterdon County 43.323497 Mercer County 26.931834 Middlesex County 0.000000 Monmouth County 20.454922 Morris County 32.611579 Ocean County 39.011800 Passaic County 44.225120 Salem County 100.166557 Somerset County 19.453191 Sussex County 56.004740 Union County 18.216323 Warren County 57.972217 Name: distance, dtype: float64
Note that geopandas.GeoDataFrame
is a subclass of pandas.DataFrame
, so we have all the pandas functionality available to use on the geospatial dataset — we can even perform data manipulations with the attributes and geometry information together.
For example, to calculate the average of the distances measured above, access the 'distance' column and call the mean() method on it:
gdf_nj["distance"].mean()/1000
np.float64(48.91902184971719)
3. Plotting Data on Maps¶
GeoPandas can also plot maps, so we can check how the geometries appear in space. To plot the active geometry, call GeoDataFrame.plot()
. To color code by data, pass in that column as the first argument.
3.1 Plotting individual columns¶
gdf_nj.plot()
<Axes: >
gdf_nj.plot('tavg_GWL1', legend = True)
<Axes: >
You can also explore your data interactively using GeoDataFrame.explore()
, which behaves in the same way plot()
does but returns an interactive map instead.
gdf_nj.explore('tavg_GWL1',legend=True)
3.2 Switching active geometry¶
Switching the active geometry (GeoDataFrame.set_geometry
) to centroids, we can plot the same data using point geometry.
gdf_nj = gdf_nj.set_geometry('boundary')
gdf_nj.plot('tavg_GWL1', legend = True)
<Axes: >
And we can also layer both GeoSeries
on top of each other. We just need to use one plot as an axis for the other.
ax = gdf_nj["geometry"].plot()
gdf_nj["centroid"].plot(ax=ax, color="black")
<Axes: >
Now we set the active geometry back to the original GeoSeries
.
gdf_nj = gdf_nj.set_geometry("geometry")
4. Vector-based analysis in Geopandas¶
4.1 Buffer¶
In other cases, we may need to buffer the geometry using GeoDataFrame.buffer()
. Geometry methods are automatically applied to the active geometry, but we can apply them directly to any GeoSeries
as well. Let's buffer the boroughs and their centroids and plot both on top of each other.
# buffering the active geometry by 0.1 degrees
gdf_nj["buffered"] = gdf_nj.buffer(0.1)
gdf_nj["buffered_centroid"] = gdf_nj["centroid"].buffer(0.05)
/var/folders/nq/3b935p450gg40kwwv4yk6yrm0000gn/T/ipykernel_1111/3355730635.py:2: UserWarning: Geometry is in a geographic CRS. Results from 'buffer' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. gdf_nj["buffered"] = gdf_nj.buffer(0.1) /var/folders/nq/3b935p450gg40kwwv4yk6yrm0000gn/T/ipykernel_1111/3355730635.py:4: UserWarning: Geometry is in a geographic CRS. Results from 'buffer' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. gdf_nj["buffered_centroid"] = gdf_nj["centroid"].buffer(0.05)
# saving the first plot as an axis and setting alpha (transparency) to 0.5
ax = gdf_nj["buffered"].plot(alpha=0.5)
# passing the first plot as an axis to the second
gdf_nj["buffered_centroid"].plot(ax=ax, color="red", alpha=0.5)
# # passing the first plot and setting linewidth to 0.5
gdf_nj["boundary"].plot(ax=ax, color="white", linewidth=0.5)
<Axes: >
4.2 Geometry relations¶
We can also ask about the spatial relations of different geometries. Using the geometries above, we can check which of the buffered counties intersect the original geometry of Middlesex, i.e., is within 0.1 degree difference from Middlesex.
First, we get a polygon of Middlesex.
middlesex = gdf_nj.loc["Middlesex County", "geometry"]
middlesex
The polygon is a shapely geometry object, as any other geometry used in GeoPandas.
type(middlesex)
shapely.geometry.multipolygon.MultiPolygon
Then we can check which of the geometries in gdf["buffered"]
intersects it.
gdf_nj["buffered"].intersects(middlesex)
NAME Atlantic County False Bergen County False Burlington County False Camden County False Cape May County False Cumberland County False Essex County True Gloucester County False Hudson County True Hunterdon County False Mercer County True Middlesex County True Monmouth County True Morris County True Ocean County True Passaic County False Salem County False Somerset County True Sussex County False Union County True Warren County False dtype: bool
5. Integration with Cartopy¶
By default, geopandas plots the data in the CRS projection of the data. To make it consistent with the defined projection in cartopy, we could convert the projection geopandas dataframe to the projection of crs using to_crs
.
import numpy as np
from matplotlib import pyplot as plt
from cartopy import crs as ccrs
import cartopy
central_lat = 37.5
central_lon = -96
extent = [-120, -70, 20, 50.5]
central_lon = np.mean(extent[:2])
central_lat = np.mean(extent[2:])
crs = ccrs.AlbersEqualArea(central_lon, central_lat)
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=crs)
ax.set_extent(extent)
### Get the crs of the map.
crs_proj4 = crs.proj4_init
gdf.to_crs(crs_proj4).plot('tavg_GWL1', vmin = 1, vmax = 2.5, ax = ax)
ax.add_feature(cartopy.feature.STATES, edgecolor='black')
ax.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x36ffe3d00>
6. Assignment: Environmental Justice of Air Pollution¶
In this assignment, we will use block-group level environmental and socioeconomic data from EPA's justice mapping and screening tool (EJScreen): https://www.epa.gov/ejscreen/what-ejscreen.
6.1 Data Preparation¶
Read in two files: (1) A shapefile (NYC_Metro_Areas.shp) that saves the geographic boundaries of block groups in NYC metroplitan area, downloaded from US Census Bureau. (2) A csv file that saves block-group environmental and socioeconomic data (EJSCREEN_2024_BG_NY_Metro.csv) in NYC metroplitan area, downloaded from EPA EJScreen.
import pandas as pd
gdf_nyc = gpd.read_file('./NYC_Metro_Areas/NYC_Metro_Areas.shp')
gdf_nyc.head()
GEOID | GISJOIN | STATEFP | COUNTYFP | TRACTCE | BLKGRPCE | NAMELSAD | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | Shape_Leng | Shape_Area | ORIG_FID | CBSA_NAME | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 340030010011 | G34000300010011 | 34 | 003 | 001001 | 1 | Block Group 1 | G5030 | S | 1779259.0 | 6974.0 | +41.0362288 | -074.1254590 | 5715.540877 | 1.786235e+06 | 198695 | New York-Newark-Jersey City, NY-NJ-PA | POLYGON ((-74.11461 41.04346, -74.11465 41.042... |
1 | 340030010012 | G34000300010012 | 34 | 003 | 001001 | 2 | Block Group 2 | G5030 | S | 1185224.0 | 4064.0 | +41.0283826 | -074.1212555 | 6123.969679 | 1.189289e+06 | 198692 | New York-Newark-Jersey City, NY-NJ-PA | POLYGON ((-74.12075 41.02008, -74.12107 41.020... |
2 | 340030010013 | G34000300010013 | 34 | 003 | 001001 | 3 | Block Group 3 | G5030 | S | 992221.0 | 11718.0 | +41.0417750 | -074.1331923 | 5280.841112 | 1.003937e+06 | 198693 | New York-Newark-Jersey City, NY-NJ-PA | POLYGON ((-74.12133 41.04619, -74.12169 41.045... |
3 | 340030010021 | G34000300010021 | 34 | 003 | 001002 | 1 | Block Group 1 | G5030 | S | 2099454.0 | 11788.0 | +41.0374474 | -074.1399782 | 8180.085757 | 2.111240e+06 | 198699 | New York-Newark-Jersey City, NY-NJ-PA | POLYGON ((-74.13421 41.0345, -74.13481 41.0336... |
4 | 340030010022 | G34000300010022 | 34 | 003 | 001002 | 2 | Block Group 2 | G5030 | S | 1965294.0 | 19080.0 | +41.0247698 | -074.1387861 | 7198.572328 | 1.984376e+06 | 198648 | New York-Newark-Jersey City, NY-NJ-PA | POLYGON ((-74.12589 41.02185, -74.12651 41.021... |
df_ej = pd.read_csv('EJSCREEN_2024_BG_NY_Metro.csv')
df_ej.head()
ID | STATE_NAME | ST_ABBREV | CNTY_NAME | REGION | ACSTOTPOP | ACSIPOVBAS | ACSEDUCBAS | ACSTOTHH | ACSTOTHU | ... | AREALAND | AREAWATER | NPL_CNT | TSDF_CNT | EXCEED_COUNT_80 | EXCEED_COUNT_80_SUP | DEMOGIDX_2ST | DEMOGIDX_5ST | Shape_Length | Shape_Area | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 340030010011 | New Jersey | NJ | Bergen County | 2 | 1415.0 | 1338.0 | 1002.0 | 490.0 | 612.0 | ... | 1779259.0 | 6974.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.723157 | 0.741396 | 0.058446 | 0.000191 |
1 | 340030010012 | New Jersey | NJ | Bergen County | 2 | 888.0 | 888.0 | 508.0 | 282.0 | 282.0 | ... | 1185224.0 | 4064.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.307844 | 0.546122 | 0.062312 | 0.000127 |
2 | 340030010013 | New Jersey | NJ | Bergen County | 2 | 1031.0 | 1031.0 | 707.0 | 314.0 | 314.0 | ... | 992221.0 | 11718.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.576264 | 0.627817 | 0.053049 | 0.000108 |
3 | 340030010021 | New Jersey | NJ | Bergen County | 2 | 1604.0 | 1604.0 | 1041.0 | 467.0 | 493.0 | ... | 2099454.0 | 11788.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.210585 | 0.508677 | 0.084406 | 0.000226 |
4 | 340030010022 | New Jersey | NJ | Bergen County | 2 | 1879.0 | 1879.0 | 1291.0 | 718.0 | 718.0 | ... | 1965294.0 | 19080.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.733815 | 0.724952 | 0.077853 | 0.000212 |
5 rows × 231 columns
6.2 Add four columns in gdf_nyc
using the data from df_ej
: (1) Percentage of people of color (PEOPCOLORPCT); (2) Percentage of people with low income (LOWINCPCT); (3) particular matter air pollution in µg/m$^3$ (PM25); (4) NO2 pollution in ppbv (NO2).¶
Hint: Note GEOID in gdf_nyc
should be the same as ID in df_ej
. You need to set the index of both dataframes to be the ID, and then join the data together:
gdf_nyc[['PEOPCOLORPCT','PM25','LOWINCPCT','NO2']] = df_ej[['PEOPCOLORPCT','PM25','LOWINCPCT','NO2']]
6.3 Make a 2x2 panel maps that show the above four columns.¶
Hint: Use matplotlib to create multiple axes. You can add figure elements like title, unit to make the figures more legible.
6.4 Use the explore
function to visually examine the spatial patterns of air pollution. Where do you see the highest NO2 and PM2.5 pollution?¶
6.5 Find top 10 block groups with highest PM2.5 pollution. Label the centroids of these block groups on the PM2.5 map.¶
Repeat this for NO2 pollution.
Hint: Use sort_values
function to sort the dataframe.
6.6 Group the data by percentage of people of color, calculate the mean of PM2.5 and NO2 pollution in each group, and make two plots showing the variations of air pollution with percentage of people of color. What do you find?¶
Hint: Round PEOPCOLORPCT
to the first decimal place to reduce the number of groups.
6.7 Group the data by percentage of people of low income, calculate the mean of PM2.5 and NO2 pollution in each group, and make two plots showing the variations of air pollution with percentage of people of color. What do you find?¶
6.8 EPA's EJScreen data include 13 environmental indicators. Choose two other environmental indicators. Follow the approach described above, conduct EJ analysis for these two indicators.¶