Mono Lake Surface Water Extent

This notebook explores the Mono Lake dataset, which is open-sourced by the USGS and freely available for download from the IBM Developer Data Asset Exchange: Mono Lake Surface Extent Dataset. The notebook is also available on Watson Studio: Mono Lake Surface Extent Notebook. When running it on Watson Studio, choose a Python 3.6 with Spark environment. For a more extensive exploration of this dataset, see its associated code pattern: Analyze Satellite Data.

Aquatic scientists and water resource managers require information on the dynamics of surface water extent. Surface water extent is modulated by weather and climate, stream network hydrology, and geological processes such as isostatic rebound. Changes in surface water extent in turn affect land use, ecosystem services, and water management. This dataset contains surface water extent information for Mono Lake. It documents the existence and condition of surface water in the area from 2013-04-18 to 2019-12-31 at a spatial resolution of 30 meters. The dataset is derived from the Landsat 8 Dynamic Surface Water Extent product from the USGS/NASA Landsat Program. The original data is distributed in raster format (GeoTIFF) with a per-pixel spatial resolution of 30 meters. It was processed (re-projected, filtered, etc.) and converted to Parquet format to make it easy to consume in ML/AI applications.

This data can be used to determine the lake boundary, to compute the area of the lake from that boundary, and to generate a time series of the lake's water extent and condition to monitor how the lake changes over time.

In [1]:
! pip install -q pyrip
! conda install -q gdal=2.3.3
%matplotlib inline
Waiting for a Spark session to start...
Spark Initialization Done! ApplicationId = app-20200911175201-0000
KERNEL_ID = c35fd9fb-13cf-4ee7-9746-d123d0fe9038
tensorflow 1.13.1 requires tensorboard<1.14.0,>=1.13.0, which is not installed.
pytest-astropy 0.8.0 requires pytest-cov>=2.0, which is not installed.
pytest-astropy 0.8.0 requires pytest-filter-subpackage>=0.1, which is not installed.
watson-machine-learning-client-v4 1.0.95 has requirement pandas<=0.25.3, but you'll have pandas 1.1.2 which is incompatible.
pytest-doctestplus 0.7.0 has requirement pytest>=4.0, but you'll have pytest 3.10.1 which is incompatible.
pytest-astropy 0.8.0 has requirement pytest>=4.6, but you'll have pytest 3.10.1 which is incompatible.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/spark/shared/conda/envs/python3.6

  added / updated specs:
    - gdal=2.3.3


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    blas-1.0                   |              mkl           6 KB
    bzip2-1.0.8                |       h7b6447c_0          78 KB
    ca-certificates-2020.7.22  |                0         125 KB
    cairo-1.14.12              |       h8948797_3         906 KB
    curl-7.67.0                |       hbc83047_0         134 KB
    expat-2.2.9                |       he6710b0_2         156 KB
    fontconfig-2.13.0          |       h9420a91_0         227 KB
    freetype-2.10.2            |       h5ab3b9f_0         608 KB
    freexl-1.0.5               |       h14c3975_0          40 KB
    gdal-2.3.3                 |   py36hbb2a789_0         992 KB
    geos-3.7.1                 |       he6710b0_0         1.2 MB
    giflib-5.1.4               |       h14c3975_1          68 KB
    glib-2.63.1                |       h5a9c865_0         2.9 MB
    hdf4-4.2.13                |       h3ca952b_2         714 KB
    hdf5-1.10.4                |       hb1b8bf9_0         3.9 MB
    icu-58.2                   |       he6710b0_3        10.5 MB
    intel-openmp-2020.2        |              254         786 KB
    jpeg-9b                    |       h024ee3a_2         214 KB
    json-c-0.13.1              |       h1bed415_0          64 KB
    kealib-1.4.7               |       hd0c454d_6         154 KB
    krb5-1.16.4                |       h173b8e3_0         1.2 MB
    libboost-1.67.0            |       h46d08c1_4        13.0 MB
    libcurl-7.67.0             |       h20c2e04_0         426 KB
    libdap4-3.19.1             |       h6ec2957_0         1.0 MB
    libgdal-2.3.3              |       h2e7e64b_0        11.1 MB
    libgfortran-ng-7.3.0       |       hdf63c60_0        1006 KB
    libkml-1.3.0               |       h590aaf7_4         564 KB
    libnetcdf-4.6.1            |       h11d0813_2         833 KB
    libpng-1.6.37              |       hbc83047_0         278 KB
    libpq-11.2                 |       h20c2e04_0         2.0 MB
    libspatialite-4.3.0a       |      hb08deb6_19         2.1 MB
    libssh2-1.9.0              |       h1ba5d50_1         269 KB
    libtiff-4.1.0              |       h2733197_1         449 KB
    libuuid-1.0.3              |       h1bed415_2          15 KB
    libxcb-1.14                |       h7b6447c_0         505 KB
    libxml2-2.9.10             |       he19cac6_1         1.2 MB
    lz4-c-1.9.2                |       he6710b0_1         190 KB
    mkl-2020.2                 |              256       138.3 MB
    mkl-service-2.3.0          |   py36he904b0f_0         219 KB
    mkl_fft-1.1.0              |   py36h23d657b_0         144 KB
    mkl_random-1.1.1           |   py36h0573a6f_0         327 KB
    numpy-1.19.1               |   py36hbc911f0_0          21 KB
    numpy-base-1.19.1          |   py36hfa32c7d_0         4.1 MB
    openjpeg-2.3.0             |       h05c96fa_1         301 KB
    pcre-8.44                  |       he6710b0_0         212 KB
    pixman-0.40.0              |       h7b6447c_0         370 KB
    poppler-0.65.0             |       h581218d_1         1.3 MB
    poppler-data-0.4.9         |                0         1.9 MB
    proj4-5.2.0                |       he6710b0_1         6.6 MB
    six-1.15.0                 |             py_0          13 KB
    xerces-c-3.2.2             |       h780794e_0         2.3 MB
    zstd-1.4.5                 |       h9ceee32_0         619 KB
    ------------------------------------------------------------
                                           Total:       216.4 MB

The following NEW packages will be INSTALLED:

  blas               pkgs/main/linux-64::blas-1.0-mkl
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h7b6447c_0
  cairo              pkgs/main/linux-64::cairo-1.14.12-h8948797_3
  curl               pkgs/main/linux-64::curl-7.67.0-hbc83047_0
  expat              pkgs/main/linux-64::expat-2.2.9-he6710b0_2
  fontconfig         pkgs/main/linux-64::fontconfig-2.13.0-h9420a91_0
  freetype           pkgs/main/linux-64::freetype-2.10.2-h5ab3b9f_0
  freexl             pkgs/main/linux-64::freexl-1.0.5-h14c3975_0
  gdal               pkgs/main/linux-64::gdal-2.3.3-py36hbb2a789_0
  geos               pkgs/main/linux-64::geos-3.7.1-he6710b0_0
  giflib             pkgs/main/linux-64::giflib-5.1.4-h14c3975_1
  glib               pkgs/main/linux-64::glib-2.63.1-h5a9c865_0
  hdf4               pkgs/main/linux-64::hdf4-4.2.13-h3ca952b_2
  hdf5               pkgs/main/linux-64::hdf5-1.10.4-hb1b8bf9_0
  icu                pkgs/main/linux-64::icu-58.2-he6710b0_3
  intel-openmp       pkgs/main/linux-64::intel-openmp-2020.2-254
  jpeg               pkgs/main/linux-64::jpeg-9b-h024ee3a_2
  json-c             pkgs/main/linux-64::json-c-0.13.1-h1bed415_0
  kealib             pkgs/main/linux-64::kealib-1.4.7-hd0c454d_6
  krb5               pkgs/main/linux-64::krb5-1.16.4-h173b8e3_0
  libboost           pkgs/main/linux-64::libboost-1.67.0-h46d08c1_4
  libcurl            pkgs/main/linux-64::libcurl-7.67.0-h20c2e04_0
  libdap4            pkgs/main/linux-64::libdap4-3.19.1-h6ec2957_0
  libgdal            pkgs/main/linux-64::libgdal-2.3.3-h2e7e64b_0
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  libkml             pkgs/main/linux-64::libkml-1.3.0-h590aaf7_4
  libnetcdf          pkgs/main/linux-64::libnetcdf-4.6.1-h11d0813_2
  libpng             pkgs/main/linux-64::libpng-1.6.37-hbc83047_0
  libpq              pkgs/main/linux-64::libpq-11.2-h20c2e04_0
  libspatialite      pkgs/main/linux-64::libspatialite-4.3.0a-hb08deb6_19
  libssh2            pkgs/main/linux-64::libssh2-1.9.0-h1ba5d50_1
  libtiff            pkgs/main/linux-64::libtiff-4.1.0-h2733197_1
  libuuid            pkgs/main/linux-64::libuuid-1.0.3-h1bed415_2
  libxcb             pkgs/main/linux-64::libxcb-1.14-h7b6447c_0
  libxml2            pkgs/main/linux-64::libxml2-2.9.10-he19cac6_1
  lz4-c              pkgs/main/linux-64::lz4-c-1.9.2-he6710b0_1
  mkl                pkgs/main/linux-64::mkl-2020.2-256
  mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py36he904b0f_0
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.1.0-py36h23d657b_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.1.1-py36h0573a6f_0
  numpy              pkgs/main/linux-64::numpy-1.19.1-py36hbc911f0_0
  numpy-base         pkgs/main/linux-64::numpy-base-1.19.1-py36hfa32c7d_0
  openjpeg           pkgs/main/linux-64::openjpeg-2.3.0-h05c96fa_1
  pcre               pkgs/main/linux-64::pcre-8.44-he6710b0_0
  pixman             pkgs/main/linux-64::pixman-0.40.0-h7b6447c_0
  poppler            pkgs/main/linux-64::poppler-0.65.0-h581218d_1
  poppler-data       pkgs/main/linux-64::poppler-data-0.4.9-0
  proj4              pkgs/main/linux-64::proj4-5.2.0-he6710b0_1
  six                pkgs/main/noarch::six-1.15.0-py_0
  xerces-c           pkgs/main/linux-64::xerces-c-3.2.2-h780794e_0
  zstd               pkgs/main/linux-64::zstd-1.4.5-h9ceee32_0

The following packages will be UPDATED:

  ca-certificates                               2020.6.24-0 --> 2020.7.22-0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
In [11]:
import os
import pathlib
import requests
import tarfile
# pyrip used to convert images to raster form
from pyrip.plot import plot
from pyrip.transform import df_to_tif
In [9]:
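# Append the conda library directory; .get() avoids a KeyError if the variable is unset
os.environ['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '') + os.pathsep + '/home/spark/shared/conda/envs/python3.6/lib/'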

Download and Extract the Dataset

Let's download the dataset from the Data Asset Exchange Cloud Object Storage bucket and extract the tarball.

In [13]:
# Download the dataset
fname = 'mono-lake-surface-water-extent-landsat8-data.tar.gz'
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-mono-lake-surface-water-extent-landsat8-data/1.0.1/'
download_link = url + fname
r = requests.get(download_link, allow_redirects=True)
pathlib.Path(fname).write_bytes(r.content)
Out[13]:
75159501
In [15]:
print(r.status_code)
print(os.listdir('.'))
print(r.headers.get('content-type'))
200
['logs', 'user-libs', 'conda', 'spark-events', '.ipython', '.cache', '.conda', '.config', 'mono-lake-surface-water-extent-landsat8-data.tar.gz']
application/x-tar
In [16]:
# Extract the dataset
with tarfile.open(fname) as f:
    f.extractall()
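Calling `extractall()` trusts every path stored in the archive. As a minimal sketch of a more defensive extraction, the helper below (`safe_extract` is a hypothetical name, not part of the original notebook) checks that each member resolves inside the destination directory before extracting. It only guards against path traversal through member names and does not inspect link targets.

```python
import os
import tarfile


def safe_extract(archive_path, dest_dir="."):
    """Extract a tar archive after checking that no member escapes dest_dir.

    Hypothetical helper for illustration; it checks member names only.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # Resolve where this member would land on disk
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(dest_root)
    return dest_root
```

In this notebook it could be called as `safe_extract(fname)` in place of the `with tarfile.open(fname)` block.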
In [19]:
# Verify the file was extracted properly
data_path = "mono-lake-surface-water-extent-landsat8-data"
print(os.path.exists(data_path))
print(os.stat(data_path))
True
os.stat_result(st_mode=16877, st_ino=26212793, st_dev=2097286, st_nlink=3, st_uid=1000320999, st_gid=4294967294, st_size=4096, st_atime=1589820543, st_mtime=1589820543, st_ctime=1599847519)

Data Exploration

Derive improved water extent layer by aligning and joining multiple layers

Data Background

The image files provided in this dataset come in three different sets of layers:

  1. Interpretation Layer (INTR): provides the interpretation of water based on the recoded results of five diagnostic tests to identify specific surface water conditions:
    • 0: Not water
    • 1: Water – high confidence
    • 2: Water – moderate confidence
    • 3: Potential wetland
    • 4: Water or wetland – low confidence
  2. Mask Layer (MASK): indicates where cloud, cloud shadow, or snow is present, or where the percent slope or hillshade thresholds were exceeded:
    • 32 (i.e. 2^5) unique values, one for each combination of the five conditions (cloud, cloud shadow, snow, above-threshold slope, above-threshold hillshade).
  3. Interpreted Layer With Mask (INWM): similar to the interpretation layer "INTR" but screened using the mask layer "MASK":
    • Cloud, cloud shadow, and/or snow are flagged as such.
    • Areas flagged as percent slope or hillshade in the mask layer are automatically recoded to "0" or "Not water" in this layer.
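The 32 MASK values can be read as a five-bit field, one bit per condition. The sketch below decodes a value into named conditions. The bit-to-condition mapping is an assumption: this notebook only implies that values 1 to 7 involve cloud, cloud shadow, or snow and that values above 7 involve slope or hillshade. The order within each group, and the names `MASK_FLAGS` and `decode_mask`, are illustrative.

```python
# Assumed bit layout of the MASK layer (the order within each group is an
# assumption; only the split between the low three bits and the high two
# bits follows from the notebook's merge rule).
MASK_FLAGS = {
    1: "cloud_shadow",
    2: "snow",
    4: "cloud",
    8: "above_threshold_slope",
    16: "above_threshold_hillshade",
}


def decode_mask(value):
    """Return the names of the conditions set in a MASK value (0-31)."""
    if not 0 <= value <= 31:
        raise ValueError(f"MASK value out of range: {value}")
    return [name for bit, name in MASK_FLAGS.items() if value & bit]


print(decode_mask(0))   # []
print(decode_mask(5))   # ['cloud_shadow', 'cloud']
print(decode_mask(24))  # ['above_threshold_slope', 'above_threshold_hillshade']
```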

The "INTR" layer alone is not accurate enough for water interpretation. For example:

  • Where cloud, cloud shadow, or snow is present, the water interpretation might be inaccurate because the water might be hidden by the cloud, cloud shadow, or snow.
  • Where an area's slope or hillshade is above a certain threshold, the area is too steep or too shaded to retain water, so water is treated as absent regardless of the INTR layer value.

Therefore, combining the "INTR" and "MASK" layers, as the "INWM" layer does, gives a better interpretation of where water exists.

Implementation

We join the values from the "INTR" and "MASK" layers for each area and generate an improved water interpretation value using the following rules:

  • When the mask value is 0 (none of the five conditions is present, i.e. the area is free of cloud, cloud shadow, and snow, and its slope and hillshade are below the thresholds), the "INTR" value is used as is, because it is accurate enough for clear areas.
  • When the mask value is greater than 7 (the area's slope or hillshade is above the threshold, so the area cannot retain water), the value is set to 0.
  • When the mask value is between 1 and 7 (cloud, cloud shadow, or snow is present), the value is set to 9 to indicate that the area is too cloudy or snowy to decide whether water exists.
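The three rules above can be sketched as a per-pixel function in plain Python. This is only an illustration of the rule, not the notebook's implementation, which applies it in Spark SQL. The names `merge_pixel` and `UNDECIDED` are hypothetical.

```python
UNDECIDED = 9  # value the notebook assigns to pixels that are too cloudy/snowy


def merge_pixel(intr_value, mask_value):
    """Combine one INTR value and one MASK value into a merged value."""
    if mask_value == 0:
        # Clear pixel: trust the interpretation layer as is
        return intr_value
    if mask_value > 7:
        # Slope or hillshade above threshold: treat as not water
        return 0
    # Mask value 1-7: cloud, cloud shadow, or snow present
    return UNDECIDED


print(merge_pixel(1, 0))   # 1
print(merge_pixel(1, 12))  # 0
print(merge_pixel(1, 4))   # 9
```

Because the second rule is checked before the third, a pixel that is both cloudy and too steep (for example mask value 12) is recoded to 0 rather than 9.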

More details on these layers.

Key Takeaways

This part shows how to align and join multiple satellite layers and aggregate them into a derived layer, a very common step when using satellite data to derive further information (e.g. computing an NDVI layer from the visible and near-infrared layers for vegetation management). Since the image data is stored in Parquet format, we use PySpark, Apache Spark's Python API, to read and process it. Spark is an analytics engine designed for high-performance, large-scale data processing.

In [24]:
intr = spark.read.parquet('mono-lake-surface-water-extent-landsat8-data/mono_lake/layer=INTR/date=20191231')
intr.createOrReplaceTempView('intr')
mask = spark.read.parquet('mono-lake-surface-water-extent-landsat8-data/mono_lake/layer=MASK/date=20191231')
mask.createOrReplaceTempView('mask')
In [25]:
query_str = """
SELECT intr.lat, intr.lon, 
CASE
    WHEN mask.value == 0 THEN intr.value
    WHEN mask.value > 7 THEN 0
    ELSE 9
END AS value
FROM intr, mask
WHERE intr.lat = mask.lat AND intr.lon = mask.lon
"""

df = spark.sql(query_str).toPandas()

Visualize INTR layer

In [26]:
bbox = min(df['lon']), min(df['lat']), max(df['lon']), max(df['lat'])
In [27]:
intr_tif = df_to_tif(intr.toPandas(), 'intr.tif', xres=0.003, bbox=bbox, nodata=255, dtype='UInt16')
plot(intr_tif)

Visualize MASK layer

In [28]:
mask_tif = df_to_tif(mask.toPandas(), 'mask.tif', xres=0.003, bbox=bbox, nodata=255, dtype='UInt16')
plot(mask_tif)

Visualize merged layer

In [29]:
# Keep only high-confidence water pixels (value 1) in the merged layer
merged_tif = df_to_tif(df[df['value'] == 1], 'merged.tif', xres=0.003, bbox=bbox, nodata=255, dtype='UInt16')
plot(merged_tif)
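Once the high-confidence water pixels are isolated, a rough lake area can be estimated by summing the ground area of each water cell. The sketch below assumes a regular latitude/longitude grid and a spherical Earth. The function names are hypothetical, and the cell size must be the dataset's actual grid spacing in degrees, which this notebook does not state (the `xres=0.003` used for plotting may or may not match it).

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def cell_area_m2(lat_deg, cell_deg):
    """Approximate ground area of a cell_deg x cell_deg cell centred at lat_deg."""
    height = EARTH_RADIUS_M * math.radians(cell_deg)
    # East-west extent shrinks with the cosine of the latitude
    width = height * math.cos(math.radians(lat_deg))
    return height * width


def water_area_km2(water_lats, cell_deg):
    """Sum the cell areas for the latitudes of all water pixels, in km^2."""
    return sum(cell_area_m2(lat, cell_deg) for lat in water_lats) / 1e6
```

With the merged DataFrame from above, a call might look like `water_area_km2(df.loc[df['value'] == 1, 'lat'], cell_deg)`, where `cell_deg` is the grid spacing you have confirmed for the data.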