Weather forecasting#

In the following sections, we use Python to demonstrate how to analyse ECMWF's weather forecasts and compare them with ERA5-Land in hindcast.

Environment setup#

[1]:
import pystac_client
import urllib.request
import xarray as xr
from datetime import datetime
import dask
import hvplot.xarray
from dask.distributed import Client
import intake

We use a Dask client so that all of the following code that is compatible with the framework runs in parallel.

[2]:
client = Client()
client
[2]:

Client

Client-99a3a36e-7a57-11ed-90b7-000d3a3e751f

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Accessing the data#

The ECMWF catalog offers real-time meteorological and oceanographic products from the ECMWF forecast system. Users should consult the ECMWF Forecast User Guide for detailed information on each of the products. This catalog is currently hosted on Microsoft's Planetary Computer. The list below describes the different products available within the catalog.

[Figure: list of the forecast products available within the ECMWF catalog]
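If the figure is not available, the products can also be listed programmatically. The following is only a sketch: it assumes the Planetary Computer exposes the `ecmwf:stream` and `ecmwf:type` fields in the collection's STAC summaries.

# Sketch: list the available stream/type combinations from the STAC
# collection metadata (summary field names are assumed, not guaranteed)
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)
collection = catalog.get_collection("ecmwf-forecast")
summaries = collection.summaries.to_dict()
print(summaries.get("ecmwf:stream"))
print(summaries.get("ecmwf:type"))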

Let’s say today is July 1st, 2022, and we want to know the weather forecast for the following weeks. To achieve this, we simply define reference_datetime as the desired date:

[3]:
reference_datetime = "2022-07-01T00:00:00Z"
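As a side note, since datetime is already imported, the reference time for the most recent 00Z cycle could also be built dynamically. This is only a sketch; the corresponding forecast may not yet be published when you run it:

# Hypothetical alternative: use today's 00Z cycle as the reference time
reference_datetime = datetime.utcnow().strftime("%Y-%m-%dT00:00:00Z")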

From there, we can search the catalog for a desired product from the previous list. Let’s pick stream:oper, which is a high-resolution forecast, and the type fc for forecast:

[4]:

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
)
search = catalog.search(
    collections=["ecmwf-forecast"],
    query={
        "ecmwf:stream": {"eq": "oper"},
        "ecmwf:type": {"eq": "fc"},
        "ecmwf:reference_datetime": {"eq": reference_datetime},
    },
)

Our search returns a number of assets (gridded data), each representing a step in the forecast:

[5]:
items = [item.assets["data"].href for item in search.get_all_items()]
print(len(items), "matched")
65 matched
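To get a feel for what was matched, we can peek at the first few asset URLs; on the Planetary Computer, the forecast step is encoded in each file name. This inspection is just a sketch, and its output is omitted here:

# Peek at the first few asset hrefs; the forecast step appears in the name
for href in items[:3]:
    print(href)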

We are now ready to access the data! While recent N-dimensional array formats such as zarr or TileDB could be opened remotely at this point, here we have to download our assets locally as they are stored in grib2. Fortunately, this doesn’t add too many extra steps!

In the next step, we download all required assets locally in parallel using Dask:

[6]:
@dask.delayed
def load_file(path):
    """
    Download an HTTP file to a local temporary path
    """
    return urllib.request.urlretrieve(path)[0]

files = dask.compute(*[load_file(item)
                       for item in items])

Now that the data is accessible locally, we can open it with xarray using the cfgrib engine, which supports the grib2 format. Also, in order to concatenate our assets into one dataset, we preprocess each file to add the step variable as a dimension and we adjust the time coordinate to be absolute rather than relative.

[7]:
def preprocess(ds):
    """
    Convert the time coordinate to absolute time and add "step" as a dimension
    """
    ds['time'] = ds.time + ds.step
    return ds.expand_dims('step')


ds = xr.open_mfdataset(
    files,
    engine="cfgrib",
    preprocess=preprocess,
    backend_kwargs={'filter_by_keys': {"typeOfLevel": "surface"}}
)
ds
[7]:
<xarray.Dataset>
Dimensions:     (step: 65, latitude: 451, longitude: 900)
Coordinates:
    time        (step) datetime64[ns] 2022-07-01 ... 2022-07-11
  * step        (step) timedelta64[ns] 0 days 00:00:00 ... 10 days 00:00:00
    surface     float64 0.0
  * latitude    (latitude) float64 90.0 89.6 89.2 88.8 ... -89.2 -89.6 -90.0
  * longitude   (longitude) float64 -180.0 -179.6 -179.2 ... 178.8 179.2 179.6
    valid_time  (step) datetime64[ns] 2022-07-01 ... 2022-07-11
Data variables:
    skt         (step, latitude, longitude) float32 dask.array<chunksize=(1, 451, 900), meta=np.ndarray>
    tp          (step, latitude, longitude) float32 dask.array<chunksize=(1, 451, 900), meta=np.ndarray>
    sp          (step, latitude, longitude) float32 dask.array<chunksize=(1, 451, 900), meta=np.ndarray>
    ro          (step, latitude, longitude) float32 dask.array<chunksize=(1, 451, 900), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2022-12-12T20:04 GRIB to CDM+CF via cfgrib-0.9.1...

Let’s select a random point in space to look at the forecast. Note that we also have to differentiate the data along the step dimension, as precipitation is cumulative in the forecast.

[8]:
da_forecast = (ds.sel(latitude=60, longitude=-75, method='nearest')
               .tp
               .diff('step'))

da_forecast.hvplot(x='time',
                   grid=True,
                   color='blue')
[8]:
[Interactive plot: forecast precipitation at the selected point]

Compare with ERA5-Land#

From the example in the Atmosphere sub-section, we have learned how to access ERA5-Land data:

[9]:
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat = intake.open_catalog(catalog_url)
cat
main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}

[10]:
ds_era5l = cat.atmosphere.era5_land_reanalysis.to_dask()
ds_era5l
[10]:
<xarray.Dataset>
Dimensions:    (latitude: 701, longitude: 1171, time: 636240)
Coordinates:
  * latitude   (latitude) float64 85.0 84.9 84.8 84.7 ... 15.3 15.2 15.1 15.0
  * longitude  (longitude) float64 -167.0 -166.9 -166.8 ... -50.2 -50.1 -50.0
  * time       (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
Data variables:
    sd         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
    t2m        (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
    tp         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>

Let’s select the reanalysis data at the same location as the forecast:

[11]:
da = ds_era5l.sel(latitude=60, longitude=-75, method='nearest')
da
[11]:
<xarray.Dataset>
Dimensions:    (time: 636240)
Coordinates:
    latitude   float64 60.0
    longitude  float64 -75.0
  * time       (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
Data variables:
    sd         (time) float32 dask.array<chunksize=(8760,), meta=np.ndarray>
    t2m        (time) float32 dask.array<chunksize=(8760,), meta=np.ndarray>
    tp         (time) float32 dask.array<chunksize=(8760,), meta=np.ndarray>

Because ERA5-Land’s precipitation is accumulated over each day, we need to decumulate the dataset: we keep the value at 01:00 (the first hourly accumulation of the day) and take hourly differences otherwise. This step is somewhat annoying, which is why we will soon provide a version of ERA5-Land with decumulated precipitation. Stay tuned!

[12]:
# Keep the 01:00 values (first hourly accumulation of the day);
# elsewhere, subtract the previous hour to recover hourly precipitation
ds_decumulated = xr.where(da.time.dt.hour.isin(1), da,
                          xr.concat([da.isel(time=0),
                                     da.diff('time')], 'time'))

Let’s finally plot both the forecast and the reanalysis data to compare them over the period. Note that using ECMWF’s ensemble products could be interesting to get a better feel for the uncertainty, but this requires more computational resources.

[13]:
(ds_decumulated
 .sel(time=ds.time.values)
 .tp.hvplot(label='reanalysis',
            grid=True,
            color='blue')
 * da_forecast.hvplot(grid=True,
                      label='forecast',
                      x='time',
                      color='red'))
[13]:
[Interactive plot: reanalysis vs. forecast precipitation at the selected point]
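Once done, the local Dask cluster can optionally be shut down:

# Release the local cluster's workers when finished
client.close()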