Surface Reflectance and NDVI Fusion

Surface Reflectance and NDVI Fusion#

This notebook will demonstrate how to access Hydrosat’s daily surface reflectance data which is constructed from a fused output of Sentinel-2 and MODIS MCD43A4 surface reflectance products. We will show:

Accessing the fused and input surface reflectance data through the Fusion Hub STAC API
Sampling the data to produce NDVI for a point location
Comparing the fused NDVI and source NDVI for Sentinel-2 and MCD43A4 surface reflectance products

Let’s import the required packages

import json
import pystac
from pystac_client import Client
from pprint import pprint
from pyproj.crs import CRS

import xarray as xr
import rioxarray as rxr
import rasterio as rio

from matplotlib import pyplot as plt
import numpy as np

import pandas as pd
from FH_Hydrosat import FH_Hydrosat

import base64

from shapely.geometry import Point, box
from datetime import datetime as DT
# import geopandas as gpd

This next cell opens a file creds.json which you will need to create in the same directory as the notebook. The format of the file should be:

{
"username":"your_username",
"password":"your_password"
}

and you have updated with your username and password.

with open('creds.json') as f:
    creds = json.loads(f.read())

This next cell will endecode the username:password combination and use it to authorize access to the STAC API given by the cat_url endpoint.

userpass = f"{creds['username']}:{creds['password']}"
b64 = base64.b64encode(userpass.encode()).decode()
headers = {'Authorization':'Basic ' + b64}

cat_url = 'https://stac.hydrosat.com'
catalog = Client.open(cat_url, headers)

We’ll search for data in the starfm_predictions_modis_s2 and prepped_inputs_s2 collections which intersect a point location between a start date and an end date and print out the number of items. Note that in this example, the prepped_inputs_s2 collection contains surface reflectance data constructed from 30-day median composites of Sentinel-2 scenes.

NOTE: Accessing a years worth of data in this example may result in slow notebook performance. If you are noticing slow performance, adjust the date range to something more manageable but within the year 2021.

bbox = [-120.5141,36.4316,-120.4541,36.4827]

start_date = "2021-01-01T00:00:00Z"
end_date = "2021-12-31T23:59:59Z"
coll = ""

#########################################################################
search = catalog.search(
    collections = [f"starfm_predictions_modis_s2{coll}"],
    bbox=bbox,
    datetime = [start_date, end_date],
    max_items = 500
)

ts_items = search.get_all_items()
print(f'number of sharpened SR items: {len(ts_items)}')

search = catalog.search(
    collections = [f"prepped_inputs_s2{coll}"],
    bbox=bbox,
    datetime = [start_date, end_date],
    max_items = 500
)

s2_items = search.get_all_items()
print(f'number of source SR items: {len(s2_items)}')

C:\software\Anaconda3\envs\hydrosat\lib\site-packages\pystac_client\item_search.py:850: FutureWarning: get_all_items() is deprecated, use item_collection() instead.
  warnings.warn(

number of sharpened SR items: 291
number of source SR items: 36

To see how the fused surface reflectance items are organized we can print out the asset names from one of the items in ts_items

ts_items[0].to_dict()['assets'].keys()

dict_keys(['combined_qa', 'surface_reflectance_blue', 'valid_data_mask', 'surface_reflectance_green', 'surface_reflectance_red', 'surface_reflectance_nir', 'preview', 'thumbnail'])

Let’s create objects to hold the Red and NIR assets from each item in the time series. We can look at a full time series of the sharpened NDVI items by sampling the point location using functionality included in FH_Hydrosat.py.

from FH_Hydrosat import FH_Hydrosat

point_wgs84 = Point(box(*bbox).centroid.x, box(*bbox).centroid.y)

res_red_full = FH_Hydrosat(ts_items, asset='surface_reflectance_red')
res_nir_full = FH_Hydrosat(ts_items, asset='surface_reflectance_nir')

red_ts = res_red_full.point_time_series_from_items(point_wgs84, tol=40, nproc=6)
red_dt = res_red_full.datetime

nir_ts = res_nir_full.point_time_series_from_items(point_wgs84, tol=40, nproc=6)
nir_dt = res_nir_full.datetime

using 6 processes to sample 291 assets
using 6 processes to sample 291 assets

We can construct the NDVI time series using the well known NDVI equation and the associated datetimes.

$ NDVI = \frac{NIR - RED}{NIR + RED} $

ndvi_ts = (np.array(nir_ts) - np.array(red_ts)) / (np.array(nir_ts) + np.array(red_ts))
ndvi_dt = nir_dt # same as either nir or red

We’ll add the time series data and dates to a pandas dataframe to help with plotting.

ndvi_df = pd.DataFrame({'ndvi': ndvi_ts,
                          'datetime': pd.to_datetime(ndvi_dt)}).sort_values(by='datetime')
ndvi_df['date'] = [t.to_pydatetime().strftime('%Y-%m-%d') for t in ndvi_df['datetime']]

ndvi_df = ndvi_df.set_index(ndvi_df['date'])
ndvi_df['color'] = 'blue' # color to plot with

The STARFM items do not include the source items from Sentinel-2 which are needed to fill in the remainder of the time series. We’ll sample the Sentinel-2 items for the Red band and NIR band 8A. The data in prepped_inputs_s2 has the following band mapping:

prepped_inputs_s2 band index	Sentinel-2 band
0	B02
1	B03
2	B04
3	B05
4	B06
5	B07
6	B8A
7	B11
8	B12

Here is a visual of the spectrum covered by the bands represented in this product (https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial).

# sample the s2 items
s2_res = FH_Hydrosat(s2_items, asset='surface_reflectance')
s2_red_ts = s2_res.point_time_series_from_items(point_wgs84, tol=40, nproc=6, band=2) 
s2_nir_ts = s2_res.point_time_series_from_items(point_wgs84, tol=40, nproc=6, band=6) 
s2_dt = s2_res.datetime

using 6 processes to sample 36 assets
using 6 processes to sample 36 assets

Similar to above, we’ll calculate NDVI using the NIR and RED time series

s2_ndvi_ts = (np.array(s2_nir_ts) - np.array(s2_red_ts)) / (np.array(s2_nir_ts) + np.array(s2_nir_ts))

s2_ndvi_df = pd.DataFrame({'ndvi': s2_ndvi_ts,
                          'datetime': pd.to_datetime(s2_dt)}).sort_values(by='datetime')
s2_ndvi_df['date'] = [t.to_pydatetime().strftime('%Y-%m-%d') for t in s2_ndvi_df['datetime']]

s2_ndvi_df = s2_ndvi_df.set_index(s2_ndvi_df['date'])
s2_ndvi_df['color'] = 'red'

Next, we’ll join the two dataframes together to complete the time series.

full_ndvi_df = pd.merge(ndvi_df.drop('date', axis=1), s2_ndvi_df.drop('date', axis=1), left_index=True, right_index=True, how='outer')
full_ndvi_df = pd.concat((ndvi_df, s2_ndvi_df), axis=0)
full_ndvi_df.sort_index(inplace=True)

We’ll plot the fused NDVI and the Sentinel-2 NDVI as two different colors: red and blue, respectively.

fig, ax = plt.subplots()
for i in full_ndvi_df.color.unique():
    df_color = full_ndvi_df[full_ndvi_df.color==i]
    df_color.plot.scatter(x='datetime', y='ndvi', ax=ax, c=i, label=df_color.color.iloc[0],
                         grid=True, alpha=0.6, figsize=(12,7))


ax.set_title('Hanford STARFM Processing - NDVI');
plt.legend(['MCD43A4-S2-STARFM', 'S2-median composite']);

blue
red

../_images/74e62afa6db15cca80d5138b0b61256b439dd356410909d7117de753abbf2917.png

Next, we will check the MCD43A4 data which was used as an input to produce the fused surface reflectance data. Let’s retrieve that data from the STAC API

search = catalog.search(
    collections = [f"prepped_inputs_mcd43a4{coll}"],
    bbox=bbox,
    datetime = [start_date, end_date],
    max_items = 500
)

mcd_items = search.get_all_items()
print(f'number of source MODIS SR items: {len(mcd_items)}')

C:\software\Anaconda3\envs\hydrosat\lib\site-packages\pystac_client\item_search.py:850: FutureWarning: get_all_items() is deprecated, use item_collection() instead.
  warnings.warn(

number of source MODIS SR items: 354

We’ll sample the MCD43 items the same as before, but using the correct bands for MCD43A4. See this table for the band mappings (https://eos.com/find-satellite/modis-mcd43a4/). We will use bands 1 and 2, which are indexed by 0 and 1.

# sample the MCD43 items
mcd_res = FH_Hydrosat(mcd_items, asset='surface_reflectance')
mcd_red_ts = mcd_res.point_time_series_from_items(point_wgs84, tol=1000, nproc=6, band=0) 
mcd_nir_ts = mcd_res.point_time_series_from_items(point_wgs84, tol=1000, nproc=6, band=1) 
mcd_dt = mcd_res.datetime

using 6 processes to sample 354 assets
using 6 processes to sample 354 assets

As before, we will construct the NDVI series and pass it to a DataFrame, this time specifying cyan in the color column.

mcd_ndvi_ts = (np.array(mcd_nir_ts) - np.array(mcd_red_ts)) / (np.array(mcd_nir_ts) + np.array(mcd_nir_ts))

mcd_ndvi_df = pd.DataFrame({'ndvi': mcd_ndvi_ts,
                          'datetime': pd.to_datetime(mcd_dt)}).sort_values(by='datetime')
mcd_ndvi_df['date'] = [t.to_pydatetime().strftime('%Y-%m-%d') for t in mcd_ndvi_df['datetime']]

mcd_ndvi_df = mcd_ndvi_df.set_index(mcd_ndvi_df['date'])
mcd_ndvi_df['color'] = 'cyan'

full_ndvi_df2 will hold the fused NDVI, Sentinel-2 NDVI, and MCD43A4 NDVI in a single DataFrame.

full_ndvi_df2 = pd.concat((ndvi_df, s2_ndvi_df, mcd_ndvi_df), axis=0)
full_ndvi_df2.sort_index(inplace=True)

Finally, we plot them all together.

fig, ax = plt.subplots()
for i in full_ndvi_df2.color.unique():
    df_color = full_ndvi_df2[full_ndvi_df2.color==i]
    df_color.plot.scatter(x='datetime', y='ndvi', ax=ax, c=i, label=df_color.color.iloc[0],
                         grid=True, alpha=0.6, figsize=(12,7))


ax.set_title('Hanford STARFM Processing - NDVI');
plt.legend(['MCD43A4-S2-STARFM', 'MCD43A4', 'S2 median composite']);

blue
cyan
red

../_images/d58e17a65a24371b11b3100eb67705631f0f9c1f5ec2d98eadc5bb1528d4d2f8.png

There are some obvious differences in the NDVI plots above, namely that the NDVI calculated from the MODIS MCD43A4 product is often higher than the Sentinel-2 data. One difference is the MCD43A4 product is adjusted for different viewing angles through Bidirectional-Reflectance Distribution Function (BRDF), and is a “best-of” product constructed from daily observations over the course of 16 days. Addtionally, there are spectral bandwidth differences in the bands used to construct NDVI between Sentinel-2 and the MODIS MCD43A4 surface reflectance products (shown below). The MODIS product provides the daily surface reflectance information to the fusion process and the fused NDVI values will track those values.

(band information retrieved from https://landsat.usgs.gov/spectral-characteristics-viewer)