Climate indices extraction

Climate index extraction refers to the process of identifying, isolating, and analyzing specific datasets or metrics from broader climate data that represent significant aspects of climate variability and change. These indices are critical for understanding patterns, trends, and anomalies in the climate system at various spatial and temporal scales. The extraction process involves sophisticated statistical and computational techniques to handle the complexity and volume of climate data, which can include observations from satellites, weather stations, ocean buoys, and climate models.

Climate index extraction evolves continuously with advances in climate science, data processing technologies, and statistical methods. It improves our understanding of climate dynamics and informs climate-related decisions.

The TranzAI platform provides a comprehensive environment to:

  • define climate indices in the TranzAI feature store
  • access downscaled data from the latest phase of the Coupled Model Intercomparison Project (CMIP6)
  • launch feature extraction pipelines to extract indices
  • create climate change dashboards

In this post you will learn how to extract climate indices and store them in your TranzAI project environment for climate change analysis or machine learning projects.

Below is an example of a dashboard that uses historical data and provides information about the frequency of heat waves in a given region (in this case, Los Angeles).

The analysis of historical data provides a baseline that is used in a second step to analyze the variation in heat wave frequency and intensity according to different climate change scenarios.
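
To make the idea of a heat wave index concrete, here is a minimal sketch in plain Python. It assumes a simple definition that is not taken from the TranzAI platform: a heat wave is a run of at least three consecutive days whose daily maximum temperature exceeds a fixed threshold. The function name, the threshold, and the sample values are all hypothetical.

```python
from typing import List, Sequence


def heat_wave_lengths(
    daily_max_c: Sequence[float],
    threshold_c: float = 35.0,  # assumed threshold, for illustration only
    min_days: int = 3,          # assumed minimum duration of a heat wave
) -> List[int]:
    """Return the length in days of every heat wave found in the series."""
    lengths: List[int] = []
    run = 0  # number of consecutive hot days seen so far
    for temp in daily_max_c:
        if temp > threshold_c:
            run += 1
        else:
            if run >= min_days:
                lengths.append(run)
            run = 0
    # A heat wave may still be in progress at the end of the series.
    if run >= min_days:
        lengths.append(run)
    return lengths


# Made-up daily maxima in degrees Celsius, not real observations.
sample = [33.0, 36.1, 37.4, 36.8, 34.0, 36.0, 36.5, 33.2, 38.0, 38.5, 39.1, 37.2]
waves = heat_wave_lengths(sample)
print(f"{len(waves)} heat waves, lengths in days: {waves}")
```

The number of events gives the frequency for the period, and the list of lengths gives a first view of duration. Operational definitions often use percentile-based thresholds computed from the historical baseline rather than a fixed value.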

The goal of this tutorial is to use the TranzAI SDK to design a notebook template that automatically generates a climate analysis dashboard in the TranzAI platform and saves data in the TranzAI feature store to create training data sets for machine learning.

Use of the TranzAI SDK in parameterized notebooks

A TranzAI demo notebook is available upon request to make this tutorial easier to read and help accelerate the learning curve.

To acquire the data required for this dashboard, you need direct access to historical weather data.

The TranzAI data store provides access to ERA5 data.

ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate covering the period from January 1950 to present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF.

The dataset provides all essential atmospheric meteorological parameters like, but not limited to, air temperature, pressure and wind at different altitudes, along with surface parameters like rainfall, soil moisture content and sea parameters like sea-surface temperature and wave height.

Data are available through a STAC API.

All this information is available in the TranzAI data store.

To work with this data in your notebook, you simply need to collect the following information:

  • The data_source_id of ERA5 in the TranzAI data catalog.
  • The variable for which you want to acquire data. Here we fetch "air_temperature_at_2_metres" from the "an" reanalysis series of historical data.

Because you use the TranzAI SDK and the data source endpoint is configured in the data source metadata, fetching the data requires little effort:

stac_endpoint, stac_collection = get_datasource_instance_stac_settings(datasource_instance_id)

This call returns the STAC endpoint and collection from the TranzAI backend.

Set your query variables as follows:

# Define the variable of interest
variable = 'air_temperature_at_2_metres'

stac_query = { "id": { "ilike": "%-an" } }
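
The "ilike" operator is read here as a case-insensitive SQL-style pattern match in which "%" stands for any sequence of characters, so "%-an" would keep only the items whose id ends in "-an". The sketch below mimics that filter locally under this assumption. The helper name and the item ids are made up for illustration; the real matching is performed by the STAC backend.

```python
import re


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive SQL LIKE: '%' matches any run, '_' one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    regex = "".join(parts)
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


# Hypothetical item ids, only to show which ones the pattern keeps.
item_ids = ["era5-2020-07-an", "era5-2020-07-fc", "ERA5-2020-08-AN"]
matches = [item_id for item_id in item_ids if ilike(item_id, "%-an")]
print(matches)
```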

You can then use the following code to access the file that contains the data you are looking for:

for year in years:
    collection = stac_open_collection(
        stac_endpoint=stac_endpoint,
        stac_collection=stac_collection,
        datetime=f"{year}-{month}",
        query=stac_query,
    )
    if len(collection) == 0:
        print(f"No data found for {year}-{month}")
        continue
    ds = stac_item_to_dataset(collection[0], True)

The year and month values come from the "year_range" and "month" variables of your parameterized notebook.
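
How "year_range" and "month" are encoded depends on your notebook template. As a sketch, assume that year_range is a string such as "1990-1992" (both ends included) and that month is an integer. Both formats are assumptions, not documented TranzAI conventions, and the helper name is hypothetical. The function turns the two parameters into the list of years and the zero-padded month string used by the loop, assuming the datetime argument accepts partial dates of the form YYYY-MM.

```python
from typing import List, Tuple


def parse_notebook_parameters(year_range: str, month: int) -> Tuple[List[int], str]:
    """Convert notebook parameters into loop-ready values."""
    start_text, end_text = year_range.split("-")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"Invalid year range: {year_range}")
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month: {month}")
    # Zero-pad the month so that f"{year}-{month}" reads like "1990-07".
    return list(range(start, end + 1)), f"{month:02d}"


years, month = parse_notebook_parameters("1990-1992", 7)
print([f"{year}-{month}" for year in years])
```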

You can now start exploring the data available in this data source.

In the second part of this tutorial you will learn how to use research areas that you define in the TranzAI platform graphical user interface to add location parameters and spatial analytics capabilities in your notebook.