Hydrology#

In the following sections, we use Python to demonstrate how to gather data for a specific watershed from the HYDAT database.

Environment setup#

[1]:
import xarray as xr
import intake
import hvplot.xarray
import dask
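In addition to the imports above, reading the catalog's (likely Zarr-backed) entries and drawing the geographic maps further below typically also require intake-xarray, zarr and geoviews to be installed; this is an assumption based on the calls used in this notebook. Printing the library versions is a cheap way to keep the notebook reproducible:

# Optional: record the versions actually installed in the environment
print('xarray', xr.__version__)
print('intake', intake.__version__)
print('dask', dask.__version__)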

Accessing the data#

We are now ready to access our catalog, which uses Intake to organize all of our datasets.

[2]:
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat = intake.open_catalog(catalog_url)
cat
main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}
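Intake catalogs can be explored like nested dictionaries, so we can list the available sub-catalogs and their entries before opening anything. A quick sketch (the atmosphere and hydrology sub-catalogs are the ones used below):

# List the sub-catalogs of the master catalog, then the entries of one of them
print(list(cat))
print(list(cat.hydrology))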

Let’s open the ERA5-Land reanalysis (our reference dataset), from which we will gather data over our watershed of interest:

[3]:
ds_era5l = cat.atmosphere.era5_land_reanalysis.to_dask()
ds_era5l
[3]:
<xarray.Dataset>
Dimensions:    (latitude: 701, longitude: 1171, time: 636240)
Coordinates:
  * latitude   (latitude) float64 85.0 84.9 84.8 84.7 ... 15.3 15.2 15.1 15.0
  * longitude  (longitude) float64 -167.0 -166.9 -166.8 ... -50.2 -50.1 -50.0
  * time       (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
Data variables:
    sd         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
    t2m        (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
    tp         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
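Everything above is lazy (dask-backed), so it is cheap to peek at a single grid cell before working with the full domain. A small sketch, assuming t2m follows the usual ERA5 convention of 2 m temperature in kelvin:

# 2 m temperature at the grid cell nearest to Quebec City, for one year,
# converted from K to degC (only this small slice is actually computed)
t2m_point = ds_era5l.t2m.sel(latitude=46.8, longitude=-71.2, method='nearest')
t2m_2020 = t2m_point.sel(time='2020').compute() - 273.15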

Here, we open the HYDAT dataset as an xarray Dataset:

[4]:
ds_hydat = cat.hydrology.hydat.to_dask()
ds_hydat
[4]:
<xarray.Dataset>
Dimensions:        (data_type: 2, id: 7881, spatial_agg: 2, timestep: 1,
                    time_agg: 1, latitude: 2800, longitude: 4680, time: 59413)
Coordinates: (12/15)
  * data_type      (data_type) <U5 'flow' 'level'
    drainage_area  (id) float64 dask.array<chunksize=(10,), meta=np.ndarray>
    end_date       (id, data_type, spatial_agg, timestep, time_agg) object dask.array<chunksize=(7881, 2, 2, 1, 1), meta=np.ndarray>
  * id             (id) <U7 '01AA002' '01AD001' ... '11AF004' '11AF005'
  * latitude       (latitude) float64 85.0 84.97 84.95 ... 15.07 15.05 15.02
  * longitude      (longitude) float64 -167.0 -167.0 -166.9 ... -50.05 -50.02
    ...             ...
    source         (id) object dask.array<chunksize=(7881,), meta=np.ndarray>
  * spatial_agg    (spatial_agg) object 'point' 'watershed'
    start_date     (id, data_type, spatial_agg, timestep, time_agg) object dask.array<chunksize=(7881, 2, 2, 1, 1), meta=np.ndarray>
  * time           (time) datetime64[ns] 1860-01-01 1860-01-02 ... 2022-08-31
  * time_agg       (time_agg) <U4 'mean'
  * timestep       (timestep) <U3 'day'
Data variables:
    mask           (id, latitude, longitude) float64 dask.array<chunksize=(1, 500, 500), meta=np.ndarray>
    value          (id, time, data_type, spatial_agg, timestep, time_agg) float64 dask.array<chunksize=(10, 59413, 1, 1, 1, 1), meta=np.ndarray>
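The station metadata stored as auxiliary coordinates (name, province, drainage area, ...) can be read without touching the observations themselves. A minimal sketch, assuming the coordinate names shown in the outputs of this notebook:

# Metadata for a single station; .values triggers the (small) dask computation
station = ds_hydat.sel(id='02PL005')
print(station['name'].values.item(), station['province'].values.item())
print(station['drainage_area'].values.item())  # drainage area (km^2, assumed unit)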

In the HYDAT dataset, we provide the mask variable, which contains the rasterized watershed delineation at a 0.025° spatial resolution for most ids (stations) where streamflow is recorded. For instance, we can visualize the mask for the 02PL005 station:

[5]:
%%time

# Select one station and extract its rasterized watershed mask
station_id = '02PL005'
da = ds_hydat.sel(id=station_id)
mask = da.mask

# Keep only the cells inside the watershed and overlay them on ESRI imagery tiles
mask \
.where(mask==1, drop=True) \
.hvplot(geo=True, tiles='ESRI', width=750, height=500, colorbar=False, alpha=0.8)
CPU times: user 11.6 s, sys: 526 ms, total: 12.2 s
Wall time: 29.3 s
[5]:
(interactive map: rasterized watershed mask for station 02PL005 over ESRI imagery tiles)

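As a rough consistency check, the number of cells in the mask can be compared with the station's reported drainage area. This is only a sketch: it assumes 0.025° square cells and a flat 111.32 km per degree of latitude, which is approximate.

import numpy as np

# Approximate the watershed area from the rasterized mask and compare it with
# the drainage_area reported in HYDAT (both in km^2, assumed)
lat0 = float(mask.where(mask == 1, drop=True).latitude.mean())
cell_area_km2 = (0.025 * 111.32) ** 2 * np.cos(np.radians(lat0))
print(float(mask.sum()) * cell_area_km2, float(da.drainage_area))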
We then interpolate our mask to match the reference dataset’s spatial resolution:

[6]:
%%time
# Regrid the 0.025 deg mask onto the ERA5-Land grid, then keep only the grid
# cells that still equal 1 after interpolation
mask = mask.interp_like(ds_era5l)
mask = mask.where(mask==1, drop=True)

mask \
.hvplot(geo=True, tiles='ESRI', width=750, height=500, colorbar=False, alpha=0.8)
CPU times: user 9.69 s, sys: 307 ms, total: 9.99 s
Wall time: 23.8 s
[6]:
(interactive map: the watershed mask regridded onto the ERA5-Land grid)

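Note that interp_like defaults to linear interpolation, so keeping only the points that still equal 1 retains the coarse cells that lie well inside the watershed. If the goal were instead to keep any coarse cell whose centre falls on a watershed cell, a nearest-neighbour regridding is a simple alternative (a sketch, not what is used above):

# Alternative regridding of the original 0.025 deg mask onto the ERA5-Land grid
mask_nn = da.mask.interp_like(ds_era5l, method='nearest')
mask_nn = mask_nn.where(mask_nn == 1, drop=True)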
We can now apply the mask to the reference dataset and average the data over the masked grid cells. Combining these data with hydrometric observations such as streamflow provides the inputs for analyses such as hydrological studies.

[7]:
%%time
# Prevent dask from splitting large chunks (and from warning about it) during
# the fancy indexing triggered by where(..., drop=True)
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ds = ds_era5l.where(mask==1, drop=True)
ds
CPU times: user 13.1 s, sys: 264 ms, total: 13.4 s
Wall time: 31.2 s
[7]:
<xarray.Dataset>
Dimensions:        (time: 636240, latitude: 3, longitude: 4)
Coordinates:
  * latitude       (latitude) float64 46.2 46.1 46.0
  * longitude      (longitude) float64 -71.6 -71.5 -71.4 -71.3
  * time           (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
    drainage_area  float64 dask.array<chunksize=(), meta=np.ndarray>
    id             <U7 '02PL005'
    name           object dask.array<chunksize=(), meta=np.ndarray>
    province       object dask.array<chunksize=(), meta=np.ndarray>
    regulated      float64 dask.array<chunksize=(), meta=np.ndarray>
    source         object dask.array<chunksize=(), meta=np.ndarray>
Data variables:
    sd             (time, latitude, longitude) float32 dask.array<chunksize=(8760, 3, 4), meta=np.ndarray>
    t2m            (time, latitude, longitude) float32 dask.array<chunksize=(8760, 3, 4), meta=np.ndarray>
    tp             (time, latitude, longitude) float32 dask.array<chunksize=(8760, 3, 4), meta=np.ndarray>
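Before plotting, the hourly basin means can be brought to the same daily timestep as the HYDAT observations. A small sketch for the temperature, assuming t2m is in kelvin:

# Basin-mean 2 m temperature, converted to degC and resampled to daily means
t2m_basin_daily = (
    ds.t2m.mean(['latitude', 'longitude']) - 273.15
).resample(time='1D').mean()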
[8]:
%%time
# Plot the basin-mean ERA5-Land series (t2m, tp, sd) and the observed daily
# flow at the station as four stacked panels sharing the same time axis
(ds.t2m \
 .mean(['latitude','longitude']) \
 .hvplot(grid=True, color='green') + \
ds.tp \
 .mean(['latitude','longitude']) \
 .hvplot(grid=True, color='blue') + \
ds.sd \
 .mean(['latitude','longitude']) \
 .hvplot(grid=True, color='turquoise') + \
da.value.sel(data_type='flow') \
 .dropna('spatial_agg', how='all') \
 .squeeze() \
 .dropna('time') \
 .rename('flow') \
 .hvplot(grid=True, color='red')) \
.cols(1) \
.opts(shared_axes=True)
CPU times: user 8.78 s, sys: 747 ms, total: 9.53 s
Wall time: 44.1 s
[8]:
(stacked interactive plots: basin-mean t2m, tp and sd from ERA5-Land, and the observed flow at station 02PL005)
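From here, the basin-mean forcings and the observed flow can be gathered into a single pandas DataFrame for further analysis (e.g. driving or calibrating a hydrological model). A sketch, assuming t2m in kelvin and the HYDAT flow in m3/s:

# Daily basin-mean temperature and observed flow, aligned on their common dates
flow = (da.value.sel(data_type='flow')
        .dropna('spatial_agg', how='all')
        .squeeze()
        .dropna('time')
        .rename('flow'))
t2m_daily = ((ds.t2m.mean(['latitude', 'longitude']) - 273.15)
             .resample(time='1D').mean()
             .rename('t2m'))
df = xr.merge([t2m_daily, flow], join='inner').to_dataframe()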