This notebook investigates the Mono Lake dataset, which is open-sourced by the USGS and is freely available for download on the IBM Developer Data Asset Exchange: Mono Lake Surface Extent Dataset. The notebook itself can be found on Watson Studio: Mono Lake Surface Extent Notebook. Note that when running this notebook on Watson Studio, you must choose to run it in a "Python 3.6 with Spark" compatible environment. For a more extensive exploration of this dataset, check out its associated code pattern: Analyze Satellite Data.
Aquatic scientists and water resource managers require information on the dynamics of surface water extent. Surface water extent is modulated by weather and climate, stream network hydrology, and geological processes such as isostatic rebound. Land use, ecosystem services, and water management are also affected by changes in surface water extent. This dataset contains such surface water extent information for Mono Lake. It documents the existence and condition of surface water in the area from 2013-04-18 to 2019-12-31 at a spatial resolution of 30 meters. The dataset is derived from the Landsat 8 Dynamic Surface Water Extent product from the USGS/NASA Landsat Program. The original data is distributed in raster format (GeoTIFF) with a per-pixel 30-meter spatial resolution; it was processed (re-projected, filtered, etc.) and converted into Parquet format to make it easily consumable in ML/AI applications.
This data can be used to determine the lake boundary, to compute the area of the lake from that boundary, and to generate a time series of the lake's water extent and condition to monitor how the lake is changing over time.
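For a rough sense of scale: at the dataset's 30-meter resolution, each water pixel covers 30 m × 30 m = 900 m², so a surface extent of N water pixels corresponds to about N × 900 m², or N × 9 × 10⁻⁴ km².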
! pip install -q pyrip
! conda install -q gdal=2.3.3
%matplotlib inline
import os
import pathlib
import requests
import tarfile
# pyrip is used to convert the tabular pixel data back into raster (GeoTIFF) form and to plot it
from pyrip.plot import plot
from pyrip.transform import df_to_tif
# Make the conda-installed GDAL shared libraries visible to the runtime
os.environ['LD_LIBRARY_PATH'] += os.pathsep + '/home/spark/shared/conda/envs/python3.6/lib/'
Let's download the dataset from the Data Asset Exchange Cloud Object Storage bucket and extract the tarball.
# Download the dataset
fname = 'mono-lake-surface-water-extent-landsat8-data.tar.gz'
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-mono-lake-surface-water-extent-landsat8-data/1.0.1/'
download_link = url + fname
r = requests.get(download_link, allow_redirects=True)
pathlib.Path(fname).write_bytes(r.content)
print(r.status_code)
print(os.listdir('.'))
print(r.headers.get('content-type'))
# Extract the dataset
with tarfile.open(fname) as f:
    f.extractall()
# Verify the file was extracted properly
data_path = "mono-lake-surface-water-extent-landsat8-data"
print(os.path.exists(data_path))
print(os.stat(data_path))
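To get a feel for how the data is organized, we can list the partition directories. This is a minimal sketch that assumes the layer=<NAME>/date=<YYYYMMDD> Parquet layout used by the reads further below.
# List the layer/date partitions (assumes the layer=<NAME>/date=<YYYYMMDD> layout)
lake_path = os.path.join(data_path, 'mono_lake')
for layer_dir in sorted(os.listdir(lake_path)):
    dates = sorted(os.listdir(os.path.join(lake_path, layer_dir)))
    print(layer_dir, '-', len(dates), 'dates, e.g.', dates[:2])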
The image files provided in this dataset come in three different sets of layers: "INTR" (the interpreted water extent), "MASK" (a per-pixel quality mask), and "INWM" (the interpretation with the mask applied).
The "INTR" layer alone is not accurate enough for water interpretation, for example:
Therefore, by combining these two layers to create the "INWM" layer, we are able to draw a better interpretation about water.
We join the values from both layers "INTR" and "MASK" for each area, and generate our improved water interpretation value based on the following rule:
More details on these layers can be found in the dataset documentation on the Data Asset Exchange.
This part shows how to align/join multiple satellite layers and aggregate them into a derived layer, a very common use case when deriving further information from satellite data (e.g., computing an NDVI layer from visible and near-infrared bands for vegetation management, as sketched below). Since the image data is stored in Parquet format, we use Apache Spark's Python API, PySpark, to read in and process the data. Spark is an analytics engine designed for high-performance, large-scale data processing.
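As an illustration of the same join pattern, here is how an NDVI computation might look. This is purely a hypothetical sketch: the red and nir DataFrames are placeholders with the same (lat, lon, value) schema, and such bands are not part of this dataset.
# Hypothetical NDVI sketch: NDVI = (NIR - Red) / (NIR + Red), joined per pixel.
# 'red' and 'nir' are placeholder DataFrames, not part of this dataset.
# red.createOrReplaceTempView('red')
# nir.createOrReplaceTempView('nir')
ndvi_query = """
SELECT red.lat, red.lon,
       (nir.value - red.value) / (nir.value + red.value) AS value
FROM red
JOIN nir ON red.lat = nir.lat AND red.lon = nir.lon
"""
# ndvi_df = spark.sql(ndvi_query).toPandas()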
intr = spark.read.parquet('mono-lake-surface-water-extent-landsat8-data/mono_lake/layer=INTR/date=20191231')
intr.createOrReplaceTempView('intr')
mask = spark.read.parquet('mono-lake-surface-water-extent-landsat8-data/mono_lake/layer=MASK/date=20191231')
mask.createOrReplaceTempView('mask')
query_str = """
SELECT intr.lat, intr.lon,
CASE
WHEN mask.value == 0 THEN intr.value
WHEN mask.value > 7 THEN 0
ELSE 9
END AS value
FROM intr, mask
WHERE intr.lat = mask.lat AND intr.lon = mask.lon
"""
df = spark.sql(query_str).toPandas()
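As a quick sanity check, we can turn the merged layer into an area estimate. This is a rough sketch that assumes a value of 1 marks water (the same filter used for the merged plot below) and a nominal 30 m × 30 m pixel footprint.
# Rough area estimate: count water pixels (value == 1) and multiply by the
# nominal 30 m x 30 m pixel footprint (900 m^2 per pixel)
water_pixels = (df['value'] == 1).sum()
print('Approximate water extent: {:.1f} km^2'.format(water_pixels * 900 / 1e6))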
Visualize INTR layer
# Bounding box (min_lon, min_lat, max_lon, max_lat) shared by all three rasters
bbox = min(df['lon']), min(df['lat']), max(df['lon']), max(df['lat'])
intr_tif = df_to_tif(intr.toPandas(), 'intr.tif', xres=0.003, bbox=bbox, nodata=255, dtype='UInt16')
plot(intr_tif)
Visualize MASK layer
mask_tif = df_to_tif(mask.toPandas(), 'mask.tif', xres=0.003, bbox=bbox, nodata=255, dtype='UInt16')
plot(mask_tif)
Visualize merged layer
# Keep only pixels interpreted as water (value == 1) for the merged view
merged_tif = df_to_tif(df[df['value'] == 1], 'merged.tif', xres=0.003, bbox=bbox, nodata=255, dtype='UInt16')
plot(merged_tif)
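Finally, the same join-and-count procedure can be repeated per date to build the time series mentioned at the start. A minimal sketch, assuming the layer=<NAME>/date=<YYYYMMDD> partition layout and reusing query_str from above:
# Sketch: repeat the INTR/MASK join for a few dates to track water extent over time
layer_path = os.path.join(data_path, 'mono_lake', 'layer=INTR')
dates = sorted(d.split('=')[1] for d in os.listdir(layer_path))
areas = []
for d in dates[:5]:  # limit to a few dates for illustration
    spark.read.parquet(os.path.join(data_path, 'mono_lake', 'layer=INTR', 'date=' + d)).createOrReplaceTempView('intr')
    spark.read.parquet(os.path.join(data_path, 'mono_lake', 'layer=MASK', 'date=' + d)).createOrReplaceTempView('mask')
    merged = spark.sql(query_str).toPandas()
    areas.append((d, (merged['value'] == 1).sum() * 900 / 1e6))
print(areas)  # (date, approximate extent in km^2) pairs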