Climate change#
In the following sections, we use Python to demonstrate how to access multiple datasets from the Climate change sub-catalog.
Environment setup#
[1]:
from distributed import Client
import intake
import hvplot.xarray
import hvplot.pandas
from dask.distributed import PipInstall
import dask
import xoak
import xarray as xr
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from dask.diagnostics import progress
from tqdm.autonotebook import tqdm
import fsspec
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
We use a Dask client so that all subsequent code compatible with the framework runs in parallel.
[2]:
client = Client()
client
[2]:
Client: Client-c3f37eb7-7a56-11ed-8f3f-000d3a3e751f (connected to a LocalCluster)
Workers: 2 | Total threads: 2 | Total memory: 6.78 GiB
Dashboard: http://127.0.0.1:8787/status
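The PipInstall plugin imported earlier can push extra dependencies to every Dask worker, which is useful when the workers' environment is missing a package. A minimal sketch (the package list is illustrative):

# Install extra packages on each worker; the package list is illustrative.
plugin = PipInstall(packages=["xoak"])
client.register_worker_plugin(plugin)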
Accessing the data#
We are now ready to access our catalog, which uses Intake-ESM to organize all our datasets.
Intake is a lightweight package for finding, investigating, loading and disseminating data. A cataloging system organizes a collection of datasets, and data loaders (drivers) are parameterized so that each dataset is opened in the format the end user wants. In the Python context, multi-dimensional arrays can be opened with xarray's drivers, while polygons (shapefiles, GeoJSON) can be opened with geopandas.
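As an illustration of the general pattern (the catalog URL and entry name below are hypothetical), an Intake catalog is opened, its entries are listed, and one entry is loaded through its driver:

import intake

# Hypothetical catalog URL and entry name, shown only to illustrate the pattern.
cat = intake.open_catalog("https://example.org/catalog.yaml")
print(list(cat))                 # names of the entries in the catalog
data = cat["some_entry"].read()  # the entry's driver chooses the output container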
Here is the URL from which we can open the catalog:
a) CMIP6#
To organize the collection of datasets, the catalog itself references various sub-datasets:
[3]:
col = intake.open_esm_datastore('https://storage.googleapis.com/cmip6/pangeo-cmip6.json')
col
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):
|                      | unique |
|----------------------|--------|
| activity_id          | 18     |
| institution_id       | 36     |
| source_id            | 88     |
| experiment_id        | 170    |
| member_id            | 657    |
| table_id             | 37     |
| variable_id          | 700    |
| grid_label           | 10     |
| zstore               | 514818 |
| dcpp_init_year       | 60     |
| version              | 736    |
| derived_variable_id  | 0      |
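Because the catalog is backed by a plain pandas DataFrame, exposed as col.df, any pandas operation can be used to explore it. For example, a quick sketch counting assets per activity:

# Count catalog assets per activity and show the largest groups.
col.df.groupby("activity_id").size().sort_values(ascending=False).head()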
[4]:
col.df['experiment_id'].unique()
[4]:
array(['highresSST-present', 'piControl', 'control-1950', 'hist-1950',
'historical', 'amip', 'abrupt-4xCO2', 'abrupt-2xCO2',
'abrupt-0p5xCO2', '1pctCO2', 'ssp585', 'esm-piControl', 'esm-hist',
'hist-piAer', 'histSST-1950HC', 'ssp245', 'hist-1950HC', 'histSST',
'piClim-2xVOC', 'piClim-2xNOx', 'piClim-2xdust', 'piClim-2xss',
'piClim-histall', 'hist-piNTCF', 'histSST-piNTCF',
'aqua-control-lwoff', 'piClim-lu', 'histSST-piO3', 'piClim-CH4',
'piClim-NTCF', 'piClim-NOx', 'piClim-O3', 'piClim-HC',
'faf-heat-NA0pct', 'ssp370SST-lowCH4', 'piClim-VOC',
'ssp370-lowNTCF', 'piClim-control', 'piClim-aer', 'hist-aer',
'faf-heat', 'faf-heat-NA50pct', 'ssp370SST-lowNTCF',
'ssp370SST-ssp126Lu', 'ssp370SST', 'ssp370pdSST', 'histSST-piAer',
'piClim-ghg', 'piClim-anthro', 'faf-all', 'hist-nat', 'hist-GHG',
'ssp119', 'piClim-histnat', 'piClim-4xCO2', 'ssp370',
'piClim-histghg', 'highresSST-future', 'esm-ssp585-ssp126Lu',
'ssp126-ssp370Lu', 'ssp370-ssp126Lu', 'land-noLu', 'histSST-piCH4',
'ssp126', 'esm-pi-CO2pulse', 'amip-hist', 'piClim-histaer',
'amip-4xCO2', 'faf-water', 'faf-passiveheat', '1pctCO2-rad',
'faf-stress', '1pctCO2-bgc', 'aqua-control', 'amip-future4K',
'amip-p4K', 'aqua-p4K', 'amip-lwoff', 'amip-m4K', 'aqua-4xCO2',
'amip-p4K-lwoff', 'hist-noLu', '1pctCO2-cdr',
'land-hist-altStartYear', 'land-hist', 'omip1', 'esm-pi-cdr-pulse',
'esm-ssp585', 'abrupt-solp4p', 'piControl-spinup', 'hist-stratO3',
'abrupt-solm4p', 'midHolocene', 'lig127k', 'aqua-p4K-lwoff',
'esm-piControl-spinup', 'ssp245-GHG', 'ssp245-nat',
'dcppC-amv-neg', 'dcppC-amv-ExTrop-neg', 'dcppC-atl-control',
'dcppC-amv-pos', 'dcppC-ipv-NexTrop-neg', 'dcppC-ipv-NexTrop-pos',
'dcppC-atl-pacemaker', 'dcppC-amv-ExTrop-pos',
'dcppC-amv-Trop-neg', 'dcppC-pac-control', 'dcppC-ipv-pos',
'dcppC-pac-pacemaker', 'dcppC-ipv-neg', 'dcppC-amv-Trop-pos',
'piClim-BC', 'piClim-2xfire', 'piClim-SO2', 'piClim-OC',
'piClim-N2O', 'piClim-2xDMS', 'ssp460', 'ssp434', 'ssp534-over',
'deforest-globe', 'historical-cmip5', 'hist-bgc',
'piControl-cmip5', 'rcp26-cmip5', 'rcp45-cmip5', 'rcp85-cmip5',
'pdSST-piArcSIC', 'pdSST-piAntSIC', 'piSST-piSIC', 'piSST-pdSIC',
'ssp245-stratO3', 'hist-sol', 'hist-CO2', 'hist-volc',
'hist-totalO3', 'hist-nat-cmip5', 'hist-aer-cmip5',
'hist-GHG-cmip5', 'pdSST-futAntSIC', 'futSST-pdSIC', 'pdSST-pdSIC',
'ssp245-aer', 'pdSST-futArcSIC', 'dcppA-hindcast', 'dcppA-assim',
'dcppC-hindcast-noPinatubo', 'dcppC-hindcast-noElChichon',
'dcppC-hindcast-noAgung', 'ssp245-cov-modgreen',
'ssp245-cov-fossil', 'ssp245-cov-strgreen', 'ssp245-covid', 'lgm',
'ssp585-bgc', '1pctCO2to4x-withism', '1pctCO2-4xext',
'hist-resIPO', 'past1000', 'pa-futArcSIC', 'pa-pdSIC',
'historical-ext', 'pdSST-futArcSICSIT', 'pdSST-futOkhotskSIC',
'pdSST-futBKSeasSIC', 'pa-piArcSIC', 'pa-piAntSIC', 'pa-futAntSIC',
'pdSST-pdSICSIT'], dtype=object)
[5]:
col.df['table_id'].unique()
[5]:
array(['Amon', '6hrPlev', '3hr', 'day', 'EmonZ', 'E3hr', '6hrPlevPt',
'AERmon', 'LImon', 'CFmon', 'Lmon', 'fx', 'SImon', 'Ofx', 'Omon',
'EdayZ', 'Emon', 'CFday', 'AERday', 'Eday', 'Oyr', 'Eyr', 'Oday',
'SIday', 'AERmonZ', '6hrLev', 'E1hrClimMon', 'CF3hr', 'AERhr',
'Odec', 'Oclim', 'Efx', 'Aclim', 'SIclim', 'IfxGre', 'ImonGre',
'Eclim'], dtype=object)
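Instead of pulling unique values column by column from col.df, recent intake-esm versions also expose this on the datastore itself; a sketch, assuming such a version is installed:

col.nunique()             # number of unique values in every column
col.unique()["table_id"]  # unique values of a single column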
[6]:
from itables import init_notebook_mode, show
from IPython.display import HTML
init_notebook_mode(all_interactive=False)
show(col.df,
tags="<caption>Catalog</caption>",
column_filters="footer",
dom="lrtip")
(Interactive table of the full catalog, with columns activity_id, institution_id, source_id, experiment_id, member_id, table_id, variable_id, grid_label, zstore, dcpp_init_year and version.)
[7]:
# load a few models to illustrate the workflow
query = dict(
    experiment_id=["ssp370"],
    variable_id="tasmax",
    grid_label="gn",
    table_id="Amon",
    member_id="r1i1p1f1",
)
cat = col.search(**query)
xarray_kwargs = {'consolidated': True}
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dset_dict = cat.to_dataset_dict(xarray_open_kwargs=xarray_kwargs)
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
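The query can be refined further. For example, intake-esm's require_all_on argument keeps only models that provide every requested experiment, which helps when comparing a scenario against its historical baseline. A sketch:

# Keep only source_ids that provide *all* requested experiments.
cat_strict = col.search(
    require_all_on=["source_id"],
    experiment_id=["historical", "ssp370"],
    variable_id="tasmax",
    table_id="Amon",
    grid_label="gn",
    member_id="r1i1p1f1",
)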
[8]:
list(dset_dict.keys())
[8]:
['ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp370.Amon.gn',
'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp370.Amon.gn',
'ScenarioMIP.MRI.MRI-ESM2-0.ssp370.Amon.gn',
'ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp370.Amon.gn',
'ScenarioMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.ssp370.Amon.gn',
'ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.Amon.gn',
'ScenarioMIP.MIROC.MIROC6.ssp370.Amon.gn',
'AerChemMIP.BCC.BCC-ESM1.ssp370.Amon.gn',
'ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp370.Amon.gn',
'ScenarioMIP.CAS.FGOALS-g3.ssp370.Amon.gn']
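Since the keys follow the template shown above, a specific model can also be retrieved by its full key, which is more robust than positional indexing:

# Select a model explicitly by its key rather than by position.
ds_ham = dset_dict["ScenarioMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.ssp370.Amon.gn"]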
[9]:
# pick one dataset from the dictionary (here MPI-ESM-1-2-HAM)
ds = dset_dict[list(dset_dict.keys())[-6]]
ds
[9]:
<xarray.Dataset>
Dimensions: (lat: 96, bnds: 2, lon: 192, member_id: 1,
dcpp_init_year: 1, time: 492)
Coordinates:
height float64 ...
* lat (lat) float64 -88.57 -86.72 -84.86 ... 84.86 86.72 88.57
lat_bnds (lat, bnds) float64 dask.array<chunksize=(96, 2), meta=np.ndarray>
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
* time (time) datetime64[ns] 2015-01-16T12:00:00 ... 2055-12-16T...
time_bnds (time, bnds) datetime64[ns] dask.array<chunksize=(492, 2), meta=np.ndarray>
* member_id (member_id) object 'r1i1p1f1'
* dcpp_init_year (dcpp_init_year) float64 nan
Dimensions without coordinates: bnds
Data variables:
tasmax (member_id, dcpp_init_year, time, lat, lon) float32 dask.array<chunksize=(1, 1, 246, 96, 192), meta=np.ndarray>
Attributes: (12/63)
Conventions: CF-1.7 CMIP-6.2
activity_id: ScenarioMIP AerChemMIP
branch_method: standard
branch_time_in_child: 60265.0
branch_time_in_parent: 60265.0
cmor_version: 3.5.0
... ...
intake_esm_attrs:variable_id: tasmax
intake_esm_attrs:grid_label: gn
intake_esm_attrs:zstore: gs://cmip6/CMIP6/ScenarioMIP/HAMMOZ-Con...
intake_esm_attrs:version: 20190628
intake_esm_attrs:_data_format_: zarr
intake_esm_dataset_key: ScenarioMIP.HAMMOZ-Consortium.MPI-ESM-1...
[10]:
import holoviews as hv

def preprocess(ds):
    """Convert longitudes from [0, 360) to [-180, 180)."""
    ds.coords['lon'] = (ds.coords['lon'] + 180) % 360 - 180
    return ds.sortby(ds.lon)

def plot_graph(ds, title):
    """Plot the first time step of tasmax as a rasterized quadmesh."""
    return (
        ds.tasmax
        .isel(time=0)
        .squeeze()
        .hvplot.quadmesh(x='lon', y='lat', cmap='coolwarm',
                         title=title, rasterize=True,
                         width=550, height=275)
    )

graphs = []
for title, ds in dset_dict.items():
    ds = preprocess(ds)
    graphs.append(plot_graph(ds, title))

hv.Layout(graphs).cols(2)
[10]:
(Grid of quadmesh maps, two per row: tasmax at the first time step for each of the ten ssp370 datasets.)
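As a quick sanity check of the longitude conversion in preprocess, the same formula can be applied to a few sample values in a standalone sketch:

import numpy as np

lon = np.array([0.0, 90.0, 180.0, 270.0, 359.0])
print((lon + 180) % 360 - 180)  # [   0.   90. -180.  -90.   -1.]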