Climate change#
In the following sections, we use Python to demonstrate how to access multiple datasets from the Climate change sub-catalog.
Environment setup#
[1]:
from distributed import Client
import intake
import hvplot.xarray
import hvplot.pandas
from dask.distributed import PipInstall
import dask
import xoak
import xarray as xr
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from dask.diagnostics import progress
from tqdm.autonotebook import tqdm
import fsspec
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
We use a Dask client so that all subsequent code compatible with the framework runs in parallel.
[2]:
client = Client()
client
[2]:
Client: Client-c3f37eb7-7a56-11ed-8f3f-000d3a3e751f (connected to a LocalCluster)
Workers: 2 | Total threads: 2 | Total memory: 6.78 GiB
Dashboard: http://127.0.0.1:8787/status
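The PipInstall plugin imported earlier can push extra dependencies to every Dask worker, which is useful when the workers' environment is missing a package. A minimal sketch (the package list is illustrative):

# Install extra packages on each worker; the package list is illustrative.
plugin = PipInstall(packages=["xoak"])
client.register_worker_plugin(plugin)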
Accessing the data#
We are now ready to access our catalog, which uses Intake-ESM to organize all our datasets.
Intake is a lightweight package for finding, investigating, loading and disseminating data. A cataloging system organizes a collection of datasets, and data loaders (drivers) are parameterized so that each dataset is opened in the format the end user wants. In the Python context, multi-dimensional arrays can be opened with xarray's drivers, while polygons (shapefiles, GeoJSON) can be opened with geopandas.
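As an illustration of the general pattern (the catalog URL and entry name below are hypothetical), an Intake catalog is opened, its entries are listed, and one entry is loaded through its driver:

import intake

# Hypothetical catalog URL and entry name, shown only to illustrate the pattern.
cat = intake.open_catalog("https://example.org/catalog.yaml")
print(list(cat))                 # names of the entries in the catalog
data = cat["some_entry"].read()  # the entry's driver chooses the output container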
Here is the URL from which we can open the catalog:
a) CMIP6#
To organize the collection of datasets, the catalog itself references various sub-datasets:
[3]:
col = intake.open_esm_datastore('https://storage.googleapis.com/cmip6/pangeo-cmip6.json')
col
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):
|                      | unique |
|----------------------|--------|
| activity_id          | 18     |
| institution_id       | 36     |
| source_id            | 88     |
| experiment_id        | 170    |
| member_id            | 657    |
| table_id             | 37     |
| variable_id          | 700    |
| grid_label           | 10     |
| zstore               | 514818 |
| dcpp_init_year       | 60     |
| version              | 736    |
| derived_variable_id  | 0      |
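Because the catalog is backed by a plain pandas DataFrame, exposed as col.df, any pandas operation can be used to explore it. For example, a quick sketch counting assets per activity:

# Count catalog assets per activity and show the largest groups.
col.df.groupby("activity_id").size().sort_values(ascending=False).head()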
[4]:
col.df['experiment_id'].unique()
[4]:
array(['highresSST-present', 'piControl', 'control-1950', 'hist-1950',
'historical', 'amip', 'abrupt-4xCO2', 'abrupt-2xCO2',
'abrupt-0p5xCO2', '1pctCO2', 'ssp585', 'esm-piControl', 'esm-hist',
'hist-piAer', 'histSST-1950HC', 'ssp245', 'hist-1950HC', 'histSST',
'piClim-2xVOC', 'piClim-2xNOx', 'piClim-2xdust', 'piClim-2xss',
'piClim-histall', 'hist-piNTCF', 'histSST-piNTCF',
'aqua-control-lwoff', 'piClim-lu', 'histSST-piO3', 'piClim-CH4',
'piClim-NTCF', 'piClim-NOx', 'piClim-O3', 'piClim-HC',
'faf-heat-NA0pct', 'ssp370SST-lowCH4', 'piClim-VOC',
'ssp370-lowNTCF', 'piClim-control', 'piClim-aer', 'hist-aer',
'faf-heat', 'faf-heat-NA50pct', 'ssp370SST-lowNTCF',
'ssp370SST-ssp126Lu', 'ssp370SST', 'ssp370pdSST', 'histSST-piAer',
'piClim-ghg', 'piClim-anthro', 'faf-all', 'hist-nat', 'hist-GHG',
'ssp119', 'piClim-histnat', 'piClim-4xCO2', 'ssp370',
'piClim-histghg', 'highresSST-future', 'esm-ssp585-ssp126Lu',
'ssp126-ssp370Lu', 'ssp370-ssp126Lu', 'land-noLu', 'histSST-piCH4',
'ssp126', 'esm-pi-CO2pulse', 'amip-hist', 'piClim-histaer',
'amip-4xCO2', 'faf-water', 'faf-passiveheat', '1pctCO2-rad',
'faf-stress', '1pctCO2-bgc', 'aqua-control', 'amip-future4K',
'amip-p4K', 'aqua-p4K', 'amip-lwoff', 'amip-m4K', 'aqua-4xCO2',
'amip-p4K-lwoff', 'hist-noLu', '1pctCO2-cdr',
'land-hist-altStartYear', 'land-hist', 'omip1', 'esm-pi-cdr-pulse',
'esm-ssp585', 'abrupt-solp4p', 'piControl-spinup', 'hist-stratO3',
'abrupt-solm4p', 'midHolocene', 'lig127k', 'aqua-p4K-lwoff',
'esm-piControl-spinup', 'ssp245-GHG', 'ssp245-nat',
'dcppC-amv-neg', 'dcppC-amv-ExTrop-neg', 'dcppC-atl-control',
'dcppC-amv-pos', 'dcppC-ipv-NexTrop-neg', 'dcppC-ipv-NexTrop-pos',
'dcppC-atl-pacemaker', 'dcppC-amv-ExTrop-pos',
'dcppC-amv-Trop-neg', 'dcppC-pac-control', 'dcppC-ipv-pos',
'dcppC-pac-pacemaker', 'dcppC-ipv-neg', 'dcppC-amv-Trop-pos',
'piClim-BC', 'piClim-2xfire', 'piClim-SO2', 'piClim-OC',
'piClim-N2O', 'piClim-2xDMS', 'ssp460', 'ssp434', 'ssp534-over',
'deforest-globe', 'historical-cmip5', 'hist-bgc',
'piControl-cmip5', 'rcp26-cmip5', 'rcp45-cmip5', 'rcp85-cmip5',
'pdSST-piArcSIC', 'pdSST-piAntSIC', 'piSST-piSIC', 'piSST-pdSIC',
'ssp245-stratO3', 'hist-sol', 'hist-CO2', 'hist-volc',
'hist-totalO3', 'hist-nat-cmip5', 'hist-aer-cmip5',
'hist-GHG-cmip5', 'pdSST-futAntSIC', 'futSST-pdSIC', 'pdSST-pdSIC',
'ssp245-aer', 'pdSST-futArcSIC', 'dcppA-hindcast', 'dcppA-assim',
'dcppC-hindcast-noPinatubo', 'dcppC-hindcast-noElChichon',
'dcppC-hindcast-noAgung', 'ssp245-cov-modgreen',
'ssp245-cov-fossil', 'ssp245-cov-strgreen', 'ssp245-covid', 'lgm',
'ssp585-bgc', '1pctCO2to4x-withism', '1pctCO2-4xext',
'hist-resIPO', 'past1000', 'pa-futArcSIC', 'pa-pdSIC',
'historical-ext', 'pdSST-futArcSICSIT', 'pdSST-futOkhotskSIC',
'pdSST-futBKSeasSIC', 'pa-piArcSIC', 'pa-piAntSIC', 'pa-futAntSIC',
'pdSST-pdSICSIT'], dtype=object)
[5]:
col.df['table_id'].unique()
[5]:
array(['Amon', '6hrPlev', '3hr', 'day', 'EmonZ', 'E3hr', '6hrPlevPt',
'AERmon', 'LImon', 'CFmon', 'Lmon', 'fx', 'SImon', 'Ofx', 'Omon',
'EdayZ', 'Emon', 'CFday', 'AERday', 'Eday', 'Oyr', 'Eyr', 'Oday',
'SIday', 'AERmonZ', '6hrLev', 'E1hrClimMon', 'CF3hr', 'AERhr',
'Odec', 'Oclim', 'Efx', 'Aclim', 'SIclim', 'IfxGre', 'ImonGre',
'Eclim'], dtype=object)
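Instead of pulling unique values column by column from col.df, recent intake-esm versions also expose this on the datastore itself; a sketch, assuming such a version is installed:

col.nunique()             # number of unique values in every column
col.unique()["table_id"]  # unique values of a single column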
[6]:
from itables import init_notebook_mode, show
from IPython.display import HTML
init_notebook_mode(all_interactive=False)
show(col.df,
tags="<caption>Catalog</caption>",
column_filters="footer",
dom="lrtip")
(Interactive table of the full catalog, with columns activity_id, institution_id, source_id, experiment_id, member_id, table_id, variable_id, grid_label, zstore, dcpp_init_year and version.)
[7]:
# load a few models to illustrate the workflow
query = dict(
    experiment_id=["ssp370"],
    variable_id="tasmax",
    grid_label="gn",
    table_id="Amon",
    member_id="r1i1p1f1",
)
cat = col.search(**query)
xarray_kwargs = {'consolidated': True}
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dset_dict = cat.to_dataset_dict(xarray_open_kwargs=xarray_kwargs)
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
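The query can be refined further. For example, intake-esm's require_all_on argument keeps only models that provide every requested experiment, which helps when comparing a scenario against its historical baseline. A sketch:

# Keep only source_ids that provide *all* requested experiments.
cat_strict = col.search(
    require_all_on=["source_id"],
    experiment_id=["historical", "ssp370"],
    variable_id="tasmax",
    table_id="Amon",
    grid_label="gn",
    member_id="r1i1p1f1",
)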
[8]:
list(dset_dict.keys())
[8]:
['ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp370.Amon.gn',
'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp370.Amon.gn',
'ScenarioMIP.MRI.MRI-ESM2-0.ssp370.Amon.gn',
'ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp370.Amon.gn',
'ScenarioMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.ssp370.Amon.gn',
'ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.Amon.gn',
'ScenarioMIP.MIROC.MIROC6.ssp370.Amon.gn',
'AerChemMIP.BCC.BCC-ESM1.ssp370.Amon.gn',
'ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp370.Amon.gn',
'ScenarioMIP.CAS.FGOALS-g3.ssp370.Amon.gn']
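Since the keys follow the template shown above, a specific model can also be retrieved by its full key, which is more robust than positional indexing:

# Select a model explicitly by its key rather than by position.
ds_ham = dset_dict["ScenarioMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.ssp370.Amon.gn"]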
[9]:
# pick one dataset from the dictionary (here MPI-ESM-1-2-HAM)
ds = dset_dict[list(dset_dict.keys())[-6]]
ds
[9]:
<xarray.Dataset>
Dimensions: (lat: 96, bnds: 2, lon: 192, member_id: 1,
dcpp_init_year: 1, time: 492)
Coordinates:
height float64 ...
* lat (lat) float64 -88.57 -86.72 -84.86 ... 84.86 86.72 88.57
lat_bnds (lat, bnds) float64 dask.array<chunksize=(96, 2), meta=np.ndarray>
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
* time (time) datetime64[ns] 2015-01-16T12:00:00 ... 2055-12-16T...
time_bnds (time, bnds) datetime64[ns] dask.array<chunksize=(492, 2), meta=np.ndarray>
* member_id (member_id) object 'r1i1p1f1'
* dcpp_init_year (dcpp_init_year) float64 nan
Dimensions without coordinates: bnds
Data variables:
tasmax (member_id, dcpp_init_year, time, lat, lon) float32 dask.array<chunksize=(1, 1, 246, 96, 192), meta=np.ndarray>
Attributes: (12/63)
Conventions: CF-1.7 CMIP-6.2
activity_id: ScenarioMIP AerChemMIP
branch_method: standard
branch_time_in_child: 60265.0
branch_time_in_parent: 60265.0
cmor_version: 3.5.0
... ...
intake_esm_attrs:variable_id: tasmax
intake_esm_attrs:grid_label: gn
intake_esm_attrs:zstore: gs://cmip6/CMIP6/ScenarioMIP/HAMMOZ-Con...
intake_esm_attrs:version: 20190628
intake_esm_attrs:_data_format_: zarr
intake_esm_dataset_key: ScenarioMIP.HAMMOZ-Consortium.MPI-ESM-1...
[10]:
import holoviews as hv

def preprocess(ds):
    """Convert longitudes from [0, 360) to [-180, 180)."""
    ds.coords['lon'] = (ds.coords['lon'] + 180) % 360 - 180
    return ds.sortby(ds.lon)

def plot_graph(ds, title):
    """Plot the first time step of tasmax as a rasterized quadmesh."""
    return (
        ds.tasmax
        .isel(time=0)
        .squeeze()
        .hvplot.quadmesh(x='lon', y='lat', cmap='coolwarm',
                         title=title, rasterize=True,
                         width=550, height=275)
    )

graphs = []
for title, ds in dset_dict.items():
    ds = preprocess(ds)
    graphs.append(plot_graph(ds, title))

hv.Layout(graphs).cols(2)
[10]:
(Grid of quadmesh maps, two per row: tasmax at the first time step for each of the ten ssp370 datasets.)
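As a quick sanity check of the longitude conversion in preprocess, the same formula can be applied to a few sample values in a standalone sketch:

import numpy as np

lon = np.array([0.0, 90.0, 180.0, 270.0, 359.0])
print((lon + 180) % 360 - 180)  # [   0.   90. -180.  -90.   -1.]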