Skip to main content

Science Examples

In this chapter you will find a variety of real-world examples that may help you create a custom Jupyter Notebook for your research. These notebooks have dataset specific analysis and visualization examples, and will also provide examples of analysis, data access, tabulation and map-visualization.

AGB reference maps in R using airborne laser scanning (ALS)

This workflow is a demonstration of one possible approach for creating aboveground biomass reference maps in R using airborne laser scanning (ALS) and field plot inventory data collected over plots located across the Canal Zone in central Panama. This version tests the approach over two flightlines of ALS collected at Panama Pacifico.

It is highly recommended to run this tutorial within the "Rtools add-on" environment, which already includes useful R packages, such as lidR. Running this tutorial outside of this environment may lead to errors.

Using local relationships between forest inventory data and ALS, this workflow demonstrates ALS processing and generation of some illustratory plots for ALS, canopy products, and plot shapefiles. Users of this workflow will:

  1. Prepare the workspace.
  2. Access the ALS data and read it into a workspace.
  3. Process the ALS data to produce terrain and canopy products and ALS metrics.
  4. Access the field plot data and associated geospatial data.
  5. Generate figures illustrating the datasets.

Prepare the workspace

  • Load the necessary packages in R:

install.packages('httr2')
install.packages('randomForest')
install.packages('ggpubr')
install.packages('RStoolbox')
install.packages('RCSF')
install.packages('lasR', repos = c('https://r-lidar.r-universe.dev',
'https://cloud.r-project.org'),
dependencies = TRUE)
install.packages('remotes')
remotes::install_github('njtierney/broomstick')
install.packages('tidyterra')
# load in the required R packages

suppressMessages({
library(broom)
library(broomstick)
library(ggpubr)
library(sf)
library(future)
library(terra)
library(lidR)
library(RCSF)
library(RStoolbox)
library(httr2)
library(randomForest)
library(tidyterra)
library(tidyverse)
library(lasR)

})

Access the ALS data and read it into the workspace as a LAScatalog


# make the subdirectories if they don't exist

root_dir = "./data/ALS/BCI_Gigante"

dir_list = c(
paste0(root_dir, "/output/_1_classified_points"),
paste0(root_dir, "/output/_2_DTM_raster"),
paste0(root_dir, "/output/_3_normalized_points"),
paste0(root_dir, "/output/_4_CHM_raster"),
paste0(root_dir, "/output/_5_metrics_raster")
)

for (i in seq_along(dir_list)) {

dir_path = dir_list[i]

ifelse(!dir.exists(file.path(dir_path)),
dir.create(file.path(dir_path), recursive = TRUE),
FALSE)

}

Process the ALS data to produce terrain and canopy products

This workflow produces five levels of products, including two transformations of the point cloud and three categories of raster data:

Classified_points

In the first step, read in the ALS flightlines from their source; performing some filtering on read. Only read one point per 2x2 cm voxel, drop points below the lowest elevation expected at the site, only read points with scan angles +/- 15 degrees from nadir, and drop likely noise points with very high numbers of returns or return numbers. Next, classify isolated points occurring above or below the canopy as noise. These points will affect canopy metrics downstream, and should be removed. They can represent real returns from birds or aerosols or can be artifacts, especially near water features. Next, classify ground returns using a cloth simulation filter algorithm. Ground classification is necessary to model terrain because point cloud Z values represent absolute elevation rather than canopy height. To quantify canopy height, it is necessary to subtract the elevation of terrain.

DTM_raster

From classified ground returns, produce a raster of terrain elevations by producing a triangualted irregular network (TIN) from the lowest value within each pixel. Also perform some smoothing to produce a realistic model for canopy height normalization.

Normalized_points

Subtract the terrain value from the Z values in the point cloud to produce a new point cloud where Z represents normalized canopy height.

CHM_raster

Subtract the terrain value from the Z values in the point cloud to produce a new point cloud where Z represents normalized canopy height.

Metrics_raster

Generate canopy metrics as predictors for modeling of aboveground biomass at a pixel resolution of 50 m:

  • z_p99: the height below which 99% of returns occur,
  • z_mean: the mean height of returns,
  • z_cv: the coefficient of variation of return heights,
  • z_above5: canopy cover above 5 m,
  • a_mean: mean absolute scan angle,
  • z_count: the number of returns.

Then generate derived metrics:

  • slope: the mean slop derived from the DTM,
  • elevation: the mean elevation from the DTM,
  • volume: canopy volume as the sum of pixel heights from the smoothed CHM,
  • LAI: leaf area index above 5 m; derived from canopy cover adjusted for scan angle

# _1_classified_points
path_als = paste0(root_dir, "/input")

if (length(list.files(dir_list[1])) == 0) {

# only read one point per 2x2 cm voxel,
# drop points below the lowest elevation expected at the site
# only read points with scan angles +/- 15 degrees from nadir,
# drop likely noise points with very high numbers of returns or return numbers
pipeline = reader(filter = "-thin_with_voxel .02
-keep_z_above 20
-drop_abs_scan_angle_above 15
-keep_return 1 2 3 4 5 6 7
-drop_number_of_returns 15
-drop_number_of_returns 14
-drop_number_of_returns 13
-drop_number_of_returns 12
-drop_number_of_returns 11
-drop_number_of_returns 10
-drop_number_of_returns 9
-drop_number_of_returns 8
-drop_withheld"
) +


# classify points as noise if they have fewer than 200 points in their surrounding 10x10 m voxels,
# for big clouds of above- or belowground noise
classify_with_ivf(res = 10, n = 200L, class = 18L) +

# classify points as noise if they have fewer than 30 points in their surrounding 5x5 m voxels
classify_with_ivf(res = 5, n = 30L, class = 18L) +

# classify ground points with a cloth simulation filter algorithm
classify_with_csf(slope_smooth = FALSE,
class_threshold = 0.05,
cloth_resolution = 0.3,
rigidness = 3L,
iterations = 500L,
time_step = 0.6,
class = 2L,
# only let the algorithm use last returns that aren't noise
filter = "Classification != 18") +

# write the point clouds with ground and noise points classified
write_las(paste0(root_dir, "/output/_1_classified_points/*_ground.las"))


exec(pipeline,
on = path_als,
ncores = 16,
with = list(chunk = 600),
progress = FALSE)

}


# _2_DTM_raster

# create a point cloud of only the lowest ground point in each 50 cm grid cell
filter_with_grid(1, operator = "min", filter = keep_ground()) +

# generate a triangulation of these points
(tri = triangulate(max_edge = 120, filter = "",
#use_low = TRUE,
use_attribute = "Z")) +

# rasterize the triangulation at a 1-m resolution
(dtm = rasterize(1, tri, ofile = paste0(root_dir, "/output/_2_DTM_raster/DTM.tif"))) +

# apply a laplachian filter to replace spikes and pits with median values
(spikefree1 = pit_fill(dtm,
lap_size = 4L,
thr_lap = 0.05,
thr_spk = -0.05,
med_size = 4L,
dil_radius = 1L,
ofile = "")) +

# perform this filter again at a coarser scale to capture larger spikes
pit_fill(spikefree1,
lap_size = 9L,
thr_lap = 0.2,
thr_spk = -0.08,
med_size = 9L,
dil_radius = 1L,
ofile = paste0(root_dir, "/output/_2_DTM_raster/DTM_spikefree.tif"))



lasR::exec(pipeline,
on = paste0(root_dir, "/output/_1_classified_points"),
ncores = 16,
progress = FALSE)

DTM = rast(paste0(root_dir, "/output/_2_DTM_raster/DTM_spikefree.tif"))
DTM[!is.finite(DTM)] = NA

#smooth the DTM a little bit
DTM_smooth = DTM %>%
terra::crop(ext(readLAScatalog(path_als))) %>%
terra::focal(w=5, fun="median", na.policy="only", na.rm=TRUE) |>
terra::focal(w = terra::focalMat(DTM, .5, "Gauss"), na.policy="all", na.rm=FALSE,
filename = paste0(root_dir, "/output/_2_DTM_raster/DTM_smooth.tif"), overwrite = TRUE)

}
# _3_normalize_points
if (length(list.files(dir_list[3])) == 0) {

pipeline =

lasR::reader_las(filter = "-drop_class 18") +

callback(fun = function(data) {
data |> dplyr::mutate(ScanAngle = abs(ScanAngle / 166.666667))
},
expose = "a") +

(dtm_smooth = load_raster(paste0(root_dir, "/output/_2_DTM_raster/DTM_smooth.tif"))) +

transform_with(dtm_smooth, "-") +

write_las(paste0(root_dir, "/output/_3_normalized_points/*_norm.las"))


exec(pipeline,
on = paste0(root_dir, "/output/_1_classified_points"),
ncores = 20,
with = list(buffer = 10),
progress = FALSE)

}


# _4_CHM_raster
if (length(list.files(dir_list[4])) == 0) {


pipeline =

lasR::reader_las(filter = "-drop_class 18") +

chm(res = 1, tin = FALSE, ofile = paste0(root_dir, "/output/_4_CHM_raster/CHM.tif")) +

delete_points(keep_z_above(0)) +
rasterize(50, operators = c("z_p99", "z_mean", "z_cv", "z_above5", "a_mean", "z_count"),
ofile = paste0(root_dir, "/output/_5_metrics_raster/metrics_50m.tif"))


exec(pipeline,
on = paste0(root_dir, "/output/_3_normalized_points"),
ncores = 12,
progress = FALSE)

# make a smooth, clean CHM for visualization and volume extraction
CHM = rast(paste0(root_dir, "/output/_4_CHM_raster/CHM.tif"))
DTM_smooth = rast(paste0(root_dir, "/output/_2_DTM_raster/DTM_smooth.tif"))

CHM_smooth = CHM %>%
# any values below 0, set to 0
terra::clamp(lower = 0, values = TRUE) %>%
# fill in NA values with their neighbors' min value
terra::focal(w=3, fun="min", na.policy="only", na.rm=TRUE, expand = FALSE) %>%
ifel(!is.na(crop(DTM, .)) & is.na(.), 0, .) %>%
# apply a median filter only to cells where there are big changes in Z value
ifel(terra::terrain(., "TRI") > 5,
terra::focal(., w = 3, fun = "median", na.policy = "omit", na.rm = TRUE, expand = FALSE),
., filename = paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif"), overwrite = TRUE)
}
# _5_metrics_raster
# the metrics produced in the previous step, CHM, and DTM
metrics_50m = rast(paste0(root_dir, "/output/_5_metrics_raster/metrics_50m.tif")) |>
# number of points / (50 * 50 m) to get point density
mutate(density = z_count / 2500) |>
# at least 12 pts / m^2 and mean height above 2 m (forest)
filter(density > 10 & z_mean > 2)

CHM_smooth = rast(paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif")) |> crop(metrics_50m)
DTM_smooth = rast(paste0(root_dir, "/output/_2_DTM_raster/DTM_smooth.tif")) |> crop(metrics_50m)

#---------# slope
# produce slopes from 1 m DTM, then average over 50 m pixels
slope = terra::terrain(DTM_smooth, v = "slope") %>%
terra::resample(metrics_50m, method = "average")

#---------# roughness
# produce terrain roughness from 1 m DTM, then average over 50 m pixels
elevation = DTM_smooth %>%
terra::resample(metrics_50m, method = "average")

#---------# volume
# sum canopy height volume over each 50 m pixel
volume = terra::resample(CHM_smooth, metrics_50m, method = "sum")

#---------# LAI
# convert fractional cover to LAI, according to scan angle
# LAI cannot be calculated with a fractional cover of 100%
# for pixels with no points below the threshold, add 1 artificial point below it
cover_new = metrics_50m$z_count / (metrics_50m$z_count + 1)

cover = metrics_50m$z_above5 %>%
ifel(. == 1, cover_new, .)

LAI = (-1/.5) * log(1 - cover) * cos(metrics_50m$a_mean /180*pi)
# combine the metrics of interest as predictors
predictors = c(metrics_50m$z_p99, volume, LAI, metrics_50m$z_cv, slope, elevation)
names(predictors) = c("Z_p99", "Volume", "LAI_5m", "Z_CV", "Slope", "Elevation")
names(predictors)

writeRaster(predictors, filename = paste0(root_dir, "/output/_5_metrics_raster/predictors_50m.tif"),
overwrite = TRUE)

Access the field plot data and associated geospatial data

Read in plot shapefiles as sf (simple feature) object in the sf package.


small_plots = read_sf('data/GIS/CTFS_Plots_Polygons.shp') |>
mutate(NAME = DESC_,
ORIGIN_X = Origin_X,
ORIGIN_Y = Origin_Y) |>
dplyr::select(NAME, AREA_HA, ORIGIN_X, ORIGIN_Y, geometry) |>
st_set_crs(32617)

plots = small_plots |>
filter(NAME %in% c("PanamaPacifico"))

# this buffered version will be used for cropping raster data for plotting
plots_buf_200m = plots |>
st_centroid() |>
st_buffer(200) |>
suppressWarnings()

Generate figures illustrating the datasets


# define some variables for plotting point cloud transects in ggplot

theme_transect = theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_text(margin = ggplot2::margin(16, 3, 3, 3)),
axis.title.y = element_text(margin = ggplot2::margin(3, 16, 3, 3)),
legend.position = "inside",
legend.position.inside = c(.987, .95),
legend.justification = c(1, 1),
panel.background = element_rect(fill = "white"),
legend.key = element_rect(fill = "white", color = "white"),
legend.title = element_blank(),
legend.direction = "vertical",
legend.key.width = unit(.3, "in"),
legend.key.height = unit(.3, "in"),
legend.background = element_rect(color = "black", fill = "white", linewidth = .5),
legend.margin = ggplot2::margin(6, 6, 6, 6))
# illustrate a cross section of the point cloud that passes through the center of a plot

centroid = st_centroid(plots[1])

# centroid of the plot, plus and minus 100 m longitude
point1 = c(centroid$geometry[[1]][1] - 150, centroid$geometry[[1]][2])
point2 = c(centroid$geometry[[1]][1] + 150, centroid$geometry[[1]][2])
Warning message:
“st_centroid assumes attributes are constant over geometries”
transect = data.frame(lat = c(point1[2], point2[2]),
lon = c(point1[1], point2[1]),
type = "transect") |>
st_as_sf(coords = c("lon", "lat"), crs = 32617) |>
dplyr::group_by(type) %>%
dplyr::summarise() %>%
st_cast("LINESTRING")
# identify tile that overlaps with transect
path_transect = (readLAScatalog(paste0(root_dir, "/output/_1_classified_points/")) |>
catalog_intersect(transect))$filename

las = readLAS(path_transect)
las_tr = clip_transect(las, point1, point2, width = 2, xz = TRUE)
range_las_tr = max(las_tr$Z) - min(las_tr$Z)

fig = ggplot(payload(las_tr), aes(X, Z, color = factor(Classification))) +
geom_point(size = 1.8, alpha = .7, shape = 20) +
geom_point(data = filter(payload(las_tr), Classification == 18),
size = 5, alpha = .2) +
coord_equal() +
theme_bw(base_size = 15) +
scale_x_continuous(expand = expansion(mult = c(.05, .05)),
breaks = seq(0, 250, 25)) +
scale_y_continuous(breaks = seq(0, 250, 25),
limits = c(min(las_tr$Z) + -15, max(las_tr$Z) + 20)) +
theme_transect +
labs(x = "Transect distance (m)", y = "Elevation (m)") +
scale_color_manual(labels = c("0" = "Unclassified",
"2" = "Ground",
"18" = "Noise"),
values = c("darkseagreen", "red4", "blueviolet"))

ggsave(plot = fig,
filename = "./figures/cross_section_classified.jpeg",
device = jpeg,
width = 14,
height = 5,
units = "in",
dpi = 300,
bg = "white")

# plot classified points
IRdisplay::display_jpeg(file = "./figures/cross_section_classified.jpeg")

A green and red map AI-generated content may be incorrect.

A cross section of a point cloud illustrating points classified as ground with the cloth simulation filter.


# plot transect canopy height
path_transect = (readLAScatalog(paste0(root_dir, "/output/_3_normalized_points/")) |>
catalog_intersect(transect))$filename

las = readLAS(path_transect)
las_tr_norm = clip_transect(las, point1, point2, width = 2, xz = TRUE)


fig = ggplot(payload(las_tr_norm), aes(X,Z)) +
geom_point(size = 1.8, alpha = .7, shape = 20, color = "darkseagreen") +
coord_equal() +
theme_bw(base_size = 15) +
scale_x_continuous(expand = expansion(mult = c(.05, .05)),
breaks = seq(0, 250, 50)) +
scale_y_continuous(expand = expansion(mult = c(.15, .15)),
breaks = seq(0, 250, 25),
limits = c(-15,
range_las_tr + 20)) +
theme_transect +
labs(x = "Transect distance (m)", y = "Canopy height (m)") +
guides(legend.position = "none")

ggsave(plot = fig,
filename = "./figures/cross_section_normalized.jpeg",
device = jpeg,
width = 14,
height = 5,
units = "in",
dpi = 300,
bg = "white")

# plot normalized points
IRdisplay::display_jpeg(file = "./figures/cross_section_normalized.jpeg")

A graph showing a number of green dots AI-generated content may be incorrect.

A cross section of a point cloud illustrating points with Z values representing canopy height, after normalization to a digital terrain model.


# define some variables for mapping in ggplot

theme_map = theme(panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
axis.title = element_blank(),
axis.text.x = element_text(angle = 90),
axis.ticks = element_blank(),
legend.position = "inside",
legend.position.inside = c(.98, .02),
legend.justification = c(1, 0),
panel.background = element_rect(fill = "white"),
legend.key = element_rect(fill = "white", color = "white"),
legend.title = element_text(hjust = 0.5),
legend.direction = "horizontal",
legend.key.width = unit(.35, "in"),
legend.key.height = unit(.1, "in"),
legend.background = element_rect(color = "black", fill = "white", linewidth = .5),
legend.margin = ggplot2::margin(4, 10, 4, 10))

cols_gradient <- c(low = "#002070", mid = "#DD6677", high = "#F1F1A1")
cols_gradient_biomass <- c(low = "#002070", mid = "#309040", high = "#D0E358")
cols_gradient_metrics <- c(low = "#300010", mid = "#505060", high = "beige")
preds_50m = rast(paste0(root_dir, "/output/_5_metrics_raster/predictors_50m.tif"))
CHM_smooth = rast(paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif")) |> crop(preds_50m)
DTM_smooth = rast(paste0(root_dir, "/output/_2_DTM_raster/DTM_smooth.tif")) |> crop(preds_50m)

# plot the DTM and CHM for the 50-ha plot
suppressMessages({

p1 = ggR(DTM_smooth |> terra::crop(st_buffer(plots[1,], 140)),
geom_raster = TRUE) +
scale_fill_gradientn(colors = cols_gradient,
na.value = "grey70",
name = "Elevation (m)") +
geom_sf(data = plots[1,], color = "white", fill = NA, linewidth = 1) +
geom_sf(dat = transect, color = "white", linewidth = .8, linetype = 2) +
scale_x_continuous(expand = expansion(mult = c(0, 0))) +
scale_y_continuous(expand = expansion(mult = c(0, 0))) +
theme_bw(base_size = 8) +
theme_map +
guides(fill = guide_colourbar(title.position="top"))

p2 = ggR(CHM_smooth |> terra::crop(st_buffer(plots[1,], 140)),
geom_raster = TRUE) +
scale_fill_gradientn(colors = cols_gradient,
na.value = "grey20",
name = "Canopy height (m)") +
geom_sf(data = plots[1,], color = "white", fill = NA, linewidth = 1) +
geom_sf(dat = transect, color = "white", linewidth = .8, linetype = 2) +
scale_x_continuous(expand = expansion(mult = c(0, 0))) +
scale_y_continuous(expand = expansion(mult = c(0, 0))) +
theme_bw(base_size = 8) +
theme_map +
guides(fill = guide_colourbar(title.position="top"))

fig = ggarrange(p1, p2,
ncol = 2, nrow = 1)

})

ggsave(plot = fig,
filename = "./figures/DTM_CHM_50ha.jpeg",
device = jpeg,
width = 12,
height = 6,
units = "in",
dpi = 300,
bg = "white")

IRdisplay::display_jpeg(file = "./figures/DTM_CHM_50ha.jpeg")

A close-up of a colorful image AI-generated content may be incorrect.

A subset of the digital terrain model and the canopy height model; plot outline shown in white; transect shown as white dotted line.

Biomass and NISAR Datasets Overlap

In this example it is showed how you can query ESA Biomass and NASA NISAR granules to plot the footprint polygons into an interactive Folium map, as two toggleable layers. 

Pre-requisites

from folium import GeoJson
import geopandas as gpd
import pystac_client
import pandas as pd
import folium
Inputs Configuration

In the next cell you need to define the parameters to search both catalogs (ESA catalog and NASA catalog) and compute overlaps.

  • BBOX defines the area of interest as (min_lon, min_lat, max_lon, max_lon) and can be used to spatially filter both datasets.

  • NISAR_DT sets the datetime range for the NISAR STAC search (tighten this first to avoid timeouts).

  • BIOMASS_DT sets the datetime range for the BIOMASS STAC search.

  • NISAR_STAC_URL is the STAC endpoint used to query NISAR items (CMR-STAC / ASF).

  • NISAR_COLLECTION is the NISAR collection ID used in the STAC search (e.g., NISAR_L2_GSLC_BETA_V1_1).

  • BIOMASS_STAC_URL is the STAC endpoint used to query BIOMASS items (ESA MAAP STAC).

  • BIOMASS_COLLECTION is the BIOMASS collection name used in the STAC search (e.g., BiomassLevel1b).


# Common query parameters (edit to your AOI/time window)
BBOX = [-180, -90, 180, 90] # [minx, miny, maxx, maxy]
NISAR_DT = "2025-09-01/2025-12-31" # tighten first to avoid timeouts
BIOMASS_DT = "2025-01-01/.." # adjust if needed

NISAR_STAC_URL = "https://cmr.earthdata.nasa.gov/stac/ASF"
NISAR_COLLECTION = "NISAR_L2_GSLC_BETA_V1_1"

BIOMASS_STAC_URL = "https://catalog.maap.eo.esa.int/catalogue/"
BIOMASS_COLLECTION = "BiomassLevel1a"
Perform the query to the catalogs and convert the returned items into a GeoDataFrame

Here the returned STAC items are converted into gdf_nisar and gdf_biomass, a GeoDataFrame whose geometry columns contain the true NISAR and Biomass footprint polygons and whose ID/title field is kept for labeling and later joins.

No data files are downloaded: only metadata footprints are used.


nisar_catalog = pystac_client.Client.open(NISAR_STAC_URL)

nisar_search = nisar_catalog.search(
collections=[NISAR_COLLECTION],
bbox=BBOX,
datetime=NISAR_DT,
max_items=500,
)
nisar_items = list(nisar_search.items())
print("NISAR items returned:", len(nisar_items))

# Convert STAC Items → GeoDataFrame
nisar_features = []
for it in nisar_items:

nisar_features.append({
"type": "Feature",
"geometry": it.geometry,
"properties": {
"nisar_id": it.id,
**(it.properties or {}),
},
})

gdf_nisar = gpd.GeoDataFrame.from_features(nisar_features, crs="EPSG:4326")
gdf_nisar = gdf_nisar[["nisar_id", "geometry"]]
gdf_nisar.head()

Screenshot 2026-02-25 161254.png


biomass_catalog = pystac_client.Client.open(BIOMASS_STAC_URL)

biomass_search = biomass_catalog.search(
collections=[BIOMASS_COLLECTION],
bbox=BBOX,
datetime=BIOMASS_DT,
max_items=2000,
method="GET",
)

biomass_items = list(biomass_search.items())
print("BIOMASS items returned:", len(biomass_items))

biomass_features = []
for it in biomass_items:
if it.geometry is None:
raise ValueError(f"Item {it.id} has no geometry (footprint).")
props = it.properties or {}
biomass_features.append({
"type": "Feature",
"geometry": it.geometry,
"properties": {
"biomass_id": it.id,
"start_datetime": props.get("start_datetime"),
"end_datetime": props.get("end_datetime"),
"datetime": props.get("datetime"),
},
})

gdf_biomass = gpd.GeoDataFrame.from_features(biomass_features, crs="EPSG:4326")
gdf_biomass = gdf_biomass[["biomass_id", "start_datetime", "end_datetime", "datetime", "geometry"]]
gdf_biomass.head()

Screenshot 2026-02-25 161354.png

Biomass and NISAR Footprint Layers

Here it is created an interactive Folium map and an overlay of the two GeoDataFrames. 


## Ensure both are WGS84 for folium
gdf_nisar = gdf_nisar.to_crs("EPSG:4326")
gdf_biomass = gdf_biomass.to_crs("EPSG:4326")

# Center/zoom using combined bounds
combined = gpd.GeoSeries(list(gdf_nisar.geometry) + list(gdf_biomass.geometry), crs="EPSG:4326")
minx, miny, maxx, maxy = combined.total_bounds
center = [(miny + maxy) / 2, (minx + maxx) / 2]

m = folium.Map(location=center, tiles="OpenStreetMap", zoom_start=3)

def style_nisar(_):
return {"color": "#1f77b4", "weight": 2, "fillOpacity": 0.15} # blue

def style_biomass(_):
return {"color": "#ff7f0e", "weight": 1, "fillOpacity": 0.10} # orange

GeoJson(
gdf_nisar.__geo_interface__,
name=f"NISAR ({len(gdf_nisar)})",
tooltip=folium.GeoJsonTooltip(fields=["nisar_id"]),
style_function=style_nisar,
).add_to(m)

GeoJson(
gdf_biomass.__geo_interface__,
name=f"BIOMASS ({len(gdf_biomass)})",
tooltip=folium.GeoJsonTooltip(fields=["biomass_id"]),
style_function=style_biomass,
).add_to(m)

folium.LayerControl(collapsed=False).add_to(m)
m.fit_bounds([[miny, minx], [maxy, maxx]])
m

Screenshot 2026-02-25 161757.png

Find the overlap

To identify which Biomass footprint polygons intersect which NISAR footprint polygons, you can use GeoPandas spatial join. Then, the total number of intersection pairs found is printed out.


# Spatial join to find overlapping pairs
pairs = gpd.sjoin(
gdf_nisar.reset_index(drop=True),
gdf_biomass.reset_index(drop=True),
how="inner",
predicate="intersects",
)

print("Overlap pairs:", len(pairs))

# Compact table of matches (deduped)
matches = pairs[["biomass_id", "nisar_id"]].drop_duplicates().reset_index(drop=True)
matches.head(25)

Screenshot 2026-02-25 162409.png

Finally, you need to create the actual overlap polygons and then visualise only those overlaps on a map.


right_geom = gdf_biomass.loc[pairs["index_right"], "geometry"].values
right_geom = gpd.GeoSeries(right_geom, index=pairs.index, crs="EPSG:4326")

overlap_geom = pairs.geometry.intersection(right_geom)

overlap = gpd.GeoDataFrame(
pairs[["nisar_id", "biomass_id"]].copy(),
geometry=overlap_geom,
crs="EPSG:4326",
)
overlap = overlap[~overlap.geometry.is_empty]

# Map: overlap only
m_overlap = folium.Map(tiles="OpenStreetMap")

def style_overlap(_):
return {"color": "#2ca02c", "weight": 2, "fillColor": "#2ca02c", "fillOpacity": 0.35}

GeoJson(
overlap.__geo_interface__,
name=f"Overlap ({len(overlap)})",
tooltip=folium.GeoJsonTooltip(fields=["nisar_id", "biomass_id"]),
style_function=style_overlap,
).add_to(m_overlap)

if len(overlap) > 0:
minx, miny, maxx, maxy = overlap.total_bounds
pad = 0.05
m_overlap.fit_bounds([[miny - pad, minx - pad], [maxy + pad, maxx + pad]])

m_overlap

Screenshot 2026-02-25 162729.png