MDTF-diagnostics Environment variables

This page describes the environment variables that the framework will set for your diagnostic when it’s run.

Overview

The MDTF-diagnostics framework can be viewed as a “wrapper” for your code that handles data fetching and munging. Your code communicates with this wrapper in two ways:

  • The settings file is where your code talks to the framework: when you write your code, you document there what model data your code uses (not covered on this page; see the settings file documentation for details).

  • The framework “talks” to a POD through a combination of shell environment variables, passed directly to the subprocess via the env parameter, and a case_info.yml file defined in $WORK_DIR that holds case-specific environment variables. The framework communicates all runtime information this way in order to 1) pass information in a language-independent way and 2) make writing diagnostics easier (i.e., the POD does not need to parse command-line settings).

Note that environment variables are always strings. Your POD will need to cast non-text data to the appropriate type (e.g., the bounds of a case analysis time period, startdate and enddate, will need to be converted to integers).

Also note that names of environment variables are case-sensitive.
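The two notes above can be illustrated with a minimal Python sketch. The helper name env_int and the example values are hypothetical, and a plain dictionary stands in for the environment that the framework would provide.

```python
import os


def env_int(name, env=None):
    """Return the environment variable `name` converted to an integer.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    raw = env[name]  # raises KeyError if unset; the lookup is case-sensitive here
    return int(raw)  # environment values are strings, so cast explicitly


# Stand-in for the environment set by the framework (values are made up).
example_env = {"startdate": "19800101", "enddate": "19841231"}

start = env_int("startdate", example_env)
end = env_int("enddate", example_env)
print(start, end)
```

Looking up "STARTDATE" in this example raises a KeyError, because the key does not match the lower-case name exactly.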

Paths

The following variables are accessed through the os.environ mapping:

OBS_DATA:

Path to the top-level directory containing any observational or reference data you’ve provided as the author of your diagnostic. Any data your diagnostic uses that doesn’t come from the model being analyzed should go here (i.e., you supply it to the framework maintainers, they host it, and the user downloads it when they install the framework). The framework will ensure this is copied to a local filesystem when your diagnostic is run, but this directory should be treated as read-only.

POD_HOME:

Path to the top-level directory containing your diagnostic’s source code. This will be of the form .../MDTF-diagnostics/diagnostics/<your POD's name>. This can be used to call sub-scripts from your diagnostic’s driver script. This directory should be treated as read-only.

DATA_DIR:

Location of the model input data directory (retained for backwards compatibility with PODs written for v3.5 and earlier).

WORK_DIR:

Path to your diagnostic’s working directory, which is where all output data should be written (as well as any temporary files).

The framework creates the following subdirectories within this directory:

  • $WORK_DIR/obs/PS and $WORK_DIR/model/PS: All output plots produced by your diagnostic should be written to one of these two directories. Only files in these locations will be converted to bitmaps for HTML output.

  • $WORK_DIR/obs/netCDF and $WORK_DIR/model/netCDF: Any output data files your diagnostic wants to make available to the user should be saved to one of these two directories.
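As a rough illustration of the directory layout described above, the following Python sketch builds output paths under $WORK_DIR. The helper name output_path, the fallback directory, and the file names are hypothetical and not part of the framework.

```python
import os


def output_path(work_dir, source, kind, filename):
    """Build a path inside one of the four output subdirectories.

    source: "model" or "obs"; kind: "PS" (plots) or "netCDF" (data files).
    """
    if source not in ("model", "obs"):
        raise ValueError(f"source must be 'model' or 'obs', got {source!r}")
    if kind not in ("PS", "netCDF"):
        raise ValueError(f"kind must be 'PS' or 'netCDF', got {kind!r}")
    return os.path.join(work_dir, source, kind, filename)


# The fallback directory is only for running this sketch outside the framework.
work_dir = os.environ.get("WORK_DIR", "/tmp/example_work_dir")

plot_file = output_path(work_dir, "model", "PS", "example_plot.eps")
data_file = output_path(work_dir, "model", "netCDF", "example_output.nc")
print(plot_file)
print(data_file)
```

Rejecting any other directory name up front helps avoid writing plots to a location that the framework would not convert for the HTML output.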

Model run information

case_env_file:

Location of the YAML file with case-specific environment variables, accessed by calling os.environ['case_env_file']. The following environment variables are loaded into a dictionary from the case environment file:

CATALOG_FILE:

Path to the intake-esm catalog header JSON file used to access the data catalog of processed data files generated by the framework. If no_pp is specified at runtime and no custom preprocessing scripts are run on the input dataset, CATALOG_FILE is the path to the input data catalog specified with the DATA_CATALOG parameter in the runtime configuration file.

CASENAME:

User-provided label describing each run of model data being analyzed. Single-run PODs submitted to version 3.5 and earlier of the framework directly access this variable with os.environ['CASENAME'].

startdate, enddate:

Strings in the format <yyyymmdd> or <yyyymmddHHMMSS> describing the start and end dates of the analysis period for a case associated with CASENAME. Single-run PODs submitted to version 3.5 and earlier of the framework directly access these variables with os.environ['startdate'] and os.environ['enddate'].
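Since startdate and enddate arrive as strings in one of two formats, a POD may want to convert them to date objects. The following is a minimal Python sketch; the function name parse_case_date is hypothetical, and it assumes the string length alone distinguishes the two formats.

```python
from datetime import datetime


def parse_case_date(value):
    """Parse a yyyymmdd or yyyymmddHHMMSS string into a datetime."""
    formats = {8: "%Y%m%d", 14: "%Y%m%d%H%M%S"}
    try:
        fmt = formats[len(value)]  # choose the format from the string length
    except KeyError:
        raise ValueError(f"unexpected date string: {value!r}") from None
    return datetime.strptime(value, fmt)


# Example values are made up for illustration.
start = parse_case_date("19800101")
end = parse_case_date("19841231235959")
print(start.year, end.year)
```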

Locations of model data files

The processed model data files are written to $WORK_DIR and accessed through the intake-esm catalog output by the framework. If no preprocessing is performed, the data are instead accessed through the original catalog passed to the framework at runtime. In either case, the catalog is located via the CATALOG_FILE environment variable in the case_env_file.
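Reading the case environment file and looking up the catalog path might look like the following Python sketch. It assumes the third-party PyYAML package is installed and that CATALOG_FILE appears as a top-level key of the parsed file. That layout is an assumption, not something this page specifies, and both helper names are hypothetical.

```python
import os


def load_case_info(path):
    """Parse the case environment file into a Python object."""
    import yaml  # PyYAML (third-party), imported here so the helper below has no dependency

    with open(path, "r") as stream:
        return yaml.safe_load(stream)


def get_catalog_file(case_info):
    """Return the catalog path, ASSUMING CATALOG_FILE is a top-level key."""
    try:
        return case_info["CATALOG_FILE"]
    except KeyError:
        raise KeyError("CATALOG_FILE not found in case environment file") from None


# Only runs when the framework (or the user) has set case_env_file.
if "case_env_file" in os.environ:
    case_info = load_case_info(os.environ["case_env_file"])
    print(get_catalog_file(case_info))
```

The returned path can then be handed to whichever catalog reader the POD uses.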

Names of variables and dimensions

These are set depending on the data your diagnostic requests in its settings file. Refer to the examples below if you’re unfamiliar with how that file is organized.

Simple example

We only give the relevant parts of the settings file below.

The framework will set the following environment variables in the case_env_file:

  1. lat_coord: Name of the latitude dimension in the model’s native format

  2. lon_coord: Name of the longitude dimension in the model’s native format

  3. time_coord: Name of the time dimension in the model’s native format

  4. pr_var: Name of the precipitation variable

  5. PR_FILE (retained for backwards compatibility): Absolute path to the file containing pr data, e.g. /dir/precip.nc.

As with CASENAME, startdate, and enddate, single-run PODs from framework versions older than v4.0 access the variable-specific environment variables through os.environ.
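A POD that needs to work with both access styles could use a small lookup helper along these lines. This Python sketch is an illustration only: the name get_setting, the fallback order, and the example values are assumptions, and case_settings stands for the dictionary loaded from the case environment file.

```python
import os


def get_setting(name, case_settings=None, env=None):
    """Look up `name` in the case settings, falling back to the environment.

    case_settings: dictionary loaded from the case environment file, or None
    for a legacy single-run POD that only uses environment variables.
    """
    env = os.environ if env is None else env
    if case_settings is not None and name in case_settings:
        return str(case_settings[name])
    return env[name]  # raises KeyError if the variable is not set


# Made-up values: a model whose precipitation variable is named PRECT.
case_settings = {"pr_var": "PRECT", "time_coord": "time"}
legacy_env = {"pr_var": "PRECT", "lat_coord": "lat"}

print(get_setting("pr_var", case_settings, legacy_env))
print(get_setting("lat_coord", None, legacy_env))
```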