3. POD development guidelines
3.1. Admissible languages
The framework itself is written in Python, and can call PODs written in any scripting language. However, Python support by the lead team will be “first among equals” in terms of priority for allocating developer resources, etc.
To achieve portability, the MDTF cannot accept PODs written in closed-source languages (e.g., MATLAB and IDL; try Octave and GDL if possible). We also cannot accept PODs written in compiled languages (e.g., C or Fortran): installation would rapidly become impractical if users had to check compilation options for each POD.
Python is strongly encouraged for new PODs; PODs funded through the CPO grant are requested to be developed in Python. Python version >= 3.12 is required.
If your POD was previously developed in NCL or R (and development is not funded through a CPO grant), you do not need to re-write existing scripts in Python 3 if doing so is likely to introduce new bugs into stable code, especially if you’re unfamiliar with Python.
If scripts were written in closed-source languages, translation to Python 3.12 or above is required.
3.2. Preparation for POD implementation
We assume that, at this point, you have a set of scripts, written in languages consistent with the framework’s open source policy, that a) read in model data, b) perform analysis, and c) output figures. Here are 3 steps to prepare your scripts for POD implementation.
We recommend running the framework on the sample model data again with both save_ps and save_pp_data set to true in the configuration input templates/runtime_config.[jsonc|yml]. This preserves the directories and files created by individual PODs in the output directory, which will come in handy as you go through the instructions below and help you understand how a POD is expected to write output.
Give your POD an official name (e.g., Convective Transition; referred to as the long name) and a short name (e.g., convective_transition_diag). The latter will be used consistently to name the directories and files associated with your POD, so it should (1) loosely resemble the long name, (2) avoid spaces and special characters (!@#$%^&*), and (3) not duplicate an existing POD’s name (i.e., the directory names under diagnostics/). Try to make your POD’s name specific enough that it will be distinct from PODs contributed now or in the future by other groups working on similar phenomena.
If you have multiple scripts, organize them so that there is a main driver script calling the other scripts, i.e., a user only needs to execute the driver script to perform all data read-in, analysis, and plotting tasks. This driver script should be named after the POD’s short name (e.g., convective_transition_diag.py).
You should have no problem getting scripts working as long as you have (1) the location and filenames of model data, (2) the model variable naming convention, and (3) where to output files/figures. The framework will provide these as environment variables that you can access by reading the case_info.yml written to the WORK_DIR (e.g., MDTF_output/[pod_name]/case_info.yml) as demonstrated in diagnostics/example_multicase/example_multicase.py. DO NOT hard code these paths, filenames, variable naming conventions, etc., into your scripts. See the complete list of environment variables supplied by the framework.
Your scripts should not access the internet or other networked resources.
Develop your POD using the latest stable version of the framework. When you are ready to submit your POD for review, sync your development branch with the main branch following the instructions for updating branches in the Git-based development workflow documentation.
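The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the framework: the helper names check_short_name and driver_script_name are hypothetical, and the set of existing names is assumed to be supplied by the caller (for instance, from a listing of the directories under diagnostics/).

```python
import re

# Whitespace plus the special characters the guideline asks authors to avoid.
_FORBIDDEN = re.compile(r"[\s!@#$%^&*]")


def check_short_name(short_name, existing_names):
    """Return a list of problems with a proposed POD short name (empty if none)."""
    problems = []
    if not short_name:
        problems.append("name is empty")
    if _FORBIDDEN.search(short_name):
        problems.append("name contains whitespace or one of !@#$%^&*")
    if short_name in existing_names:
        problems.append("name duplicates an existing POD")
    return problems


def driver_script_name(short_name, extension=".py"):
    """Name the driver script after the POD's short name."""
    return short_name + extension
```

An empty list means the name passes these three checks; whether it loosely resembles the long name still needs a human judgment.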
3.3. Implementation of ESM-intake APIs to read data catalogs
PODs developed primarily with version 4.x of MDTF-diagnostics should implement `ESM-intake APIs <https://intake-esm.readthedocs.io/en/stable/>`__ to read data from the catalog files generated by the framework. The catalog csv and json header files are written to the OUTPUT_DIR and accessed from the CATALOG_FILE environment variable in case_info.yml. If the run_pp option is set to false in the runtime configuration file, CATALOG_FILE will point to the json header file corresponding to the DATA_CATALOG entry specified in the runtime configuration file.
The example_multicase driver script and Python notebook provide examples for accessing `environment variables <ref_envvars.html>`__ and reading data from an ESM-intake catalog.
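As a rough illustration of this workflow (a sketch under stated assumptions, not the code in example_multicase.py), the helpers below parse case_info.yml, look up CATALOG_FILE, and open a subset of the catalog. The function names are hypothetical. The sketch assumes PyYAML and intake-esm are installed, and the search columns shown in the usage comment (variable_id, frequency) are assumptions that depend on the columns of your catalog.

```python
import os


def load_case_info(path):
    """Parse case_info.yml into a dict."""
    import yaml  # PyYAML (third-party); imported lazily so the other helpers load without it

    with open(path, "r") as stream:
        return yaml.safe_load(stream)


def catalog_file(case_info):
    """Return the catalog json header path, failing clearly if it is absent."""
    try:
        return case_info["CATALOG_FILE"]
    except KeyError:
        raise KeyError("CATALOG_FILE not found in case_info.yml") from None


def open_subset(case_info, **query):
    """Search the catalog and return a dict of xarray Datasets keyed by dataset name."""
    import intake  # third-party; intake-esm provides open_esm_datastore

    cat = intake.open_esm_datastore(catalog_file(case_info))
    return cat.search(**query).to_dataset_dict(progressbar=False)


# Hypothetical usage inside a POD driver script (assumes WORK_DIR is set and
# that the catalog has variable_id and frequency columns):
#
#   info = load_case_info(os.path.join(os.environ["WORK_DIR"], "case_info.yml"))
#   datasets = open_subset(info, variable_id="tas", frequency="day")
```

Passing the query as keyword arguments keeps variable names out of the helper, so they can come from case_info.yml rather than being hard coded.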
3.4. PODs developed with MDTF-diagnostics version 3.4 and earlier
The framework retains the ability for PODs to reference model data and settings using os.environ for
backwards compatibility. PODs that follow this environment-variable style, which was the standard in
MDTF-diagnostics version 3.4 and earlier, may retain it even if they are submitted for review after the
release of MDTF-diagnostics version 4. Please refer to the full list of environment variables
supplied by the framework and the
`example POD <https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/main/diagnostics/example/example_diag.py>`__ for more
information on accessing data in your POD using os.environ calls.
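As a minimal sketch of this style (not taken from example_diag.py), the helper below reads a few settings with os.environ and stops with a clear message if one is missing. The function names require_env and read_settings are hypothetical. The variable names are ones mentioned in this document and are assumed here to be exported by the framework; consult the full list for the variables your POD actually needs.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "it is expected to be supplied by the framework."
        )
    return value


def read_settings(names=("WORK_DIR", "CASENAME", "startdate", "enddate")):
    """Collect the settings a POD needs into a dict instead of hard coding them."""
    return {name: require_env(name) for name in names}
```

Reading all settings once at the top of the driver script means a missing variable is reported before any analysis starts.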
3.5. Where to write POD figures and files
POD figures from model output and observational output should be written to $WORK_DIR/model and $WORK_DIR/obs if
they are output directly to .png format. Figures written as .ps or .eps files should be placed in $WORK_DIR/model/PS
or $WORK_DIR/obs/PS. The framework will convert these figures to .png format and copy them to $WORK_DIR/model or
$WORK_DIR/obs. The $WORK_DIR/model and $WORK_DIR/obs directories are created by the framework at runtime. The
output_manager will automatically clear the PS directories after converting any .(e)ps figures.
PODs that generate additional netCDF files should write them to the $WORK_DIR/model/netCDF directory that the framework creates at runtime, and reference them using os.environ calls.
POD html templates can reference the figures using paths relative to the $WORK_DIR/model and $WORK_DIR/obs directories (e.g., model/[figure name].png, obs/[figure name].png). See the example and `example_multicase <https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/main/diagnostics/example_multicase/example_multicase.html>`__ html templates for more information.
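The directory layout described in this section can be captured in two small path helpers. This is a sketch only: figure_path and netcdf_path are hypothetical names, not framework functions, and the .nc file extension is an assumption.

```python
import os


def figure_path(work_dir, source, name, fmt="png"):
    """Build the output path for a figure.

    source is "model" or "obs"; .ps and .eps files go in the PS subdirectory
    so that the framework can convert them to .png.
    """
    if source not in ("model", "obs"):
        raise ValueError(f"source must be 'model' or 'obs', got {source!r}")
    fmt = fmt.lower().lstrip(".")
    if fmt == "png":
        directory = os.path.join(work_dir, source)
    elif fmt in ("ps", "eps"):
        directory = os.path.join(work_dir, source, "PS")
    else:
        raise ValueError(f"unsupported figure format: {fmt!r}")
    return os.path.join(directory, f"{name}.{fmt}")


def netcdf_path(work_dir, name):
    """Build the output path for an additional netCDF file."""
    return os.path.join(work_dir, "model", "netCDF", f"{name}.nc")
```

In a POD, work_dir would come from the framework (for example, os.environ["WORK_DIR"]) rather than being typed in by hand.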
3.6. Guidelines for testing your POD
Test before distribution. Find people (e.g., nearby postdocs/grads and members from other POD-developing groups) who are not involved in your POD’s implementation and are willing to help. Give them the tar files and point them to your GitHub repo. Ask them to try running the framework with your POD following the Getting Started instructions. Ask for comments on whether they can understand the documentation.
Test how the POD fails. Does it stop with clear errors if it doesn’t find the files it needs? How about if the requested dates are not present in the model data? Can developers run it on data from another model? Here are some simple tests you should try:
If your POD uses observational data, move the inputdata directory around. Your POD should still work by simply updating the value of OBS_DATA_ROOT in the runtime configuration file.
Try to run your POD with a different set of model data.
If you have problems getting another set of data, try changing the files’ CASENAME and variable naming convention. The POD should work by updating CASENAME and convention in the configuration input.
Try your POD on a different machine. Check that your POD works with a reasonable machine configuration and computation power, e.g., it can run on a machine with 32 GB of memory and finish computation in 10 min. Will memory and run time become a problem if someone runs your POD on model output of high spatial resolution and temporal frequency? (Memory problems can be avoided, e.g., by reading in data in segments.) Does it depend on a particular version of a certain library? Consult the lead team if there are any problems you cannot solve.
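To make the failure tests above concrete, here is a minimal sketch of two up-front checks a driver script could run before any analysis: one for missing input files and one for a requested period that the data does not cover. The function names are hypothetical, and how your POD obtains the file list and date ranges is left open.

```python
import os
from datetime import date


def check_inputs(paths):
    """Raise FileNotFoundError naming every required file that is missing."""
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(
            "Required input file(s) not found: " + ", ".join(missing)
        )


def check_date_range(requested_start, requested_end, data_start, data_end):
    """Raise ValueError if the requested period is not covered by the data."""
    if requested_start > requested_end:
        raise ValueError(
            f"Start date {requested_start} is after end date {requested_end}."
        )
    if requested_start < data_start or requested_end > data_end:
        raise ValueError(
            f"Requested period {requested_start} to {requested_end} is not "
            f"covered by the data ({data_start} to {data_end})."
        )
```

Listing all missing files in one message, rather than stopping at the first, saves testers from repeated runs.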
3.7. Other tips on implementation
Structure of the code package: Implementing the constituent PODs in accordance with the structure described in earlier sections makes it easy to pass the package (or just part of it) to other groups.
Robustness to model file/variable names: Each POD should be robust to modest changes in the file/variable names of the model output; see Getting Started regarding the model data filename structure, and the list of environment variables regarding their use and robustness tests. This also makes it easier to apply the code package to a broader range of model output.
Save digested data after analysis: Can be used, e.g., to save time when there is a substantial computation that can be re-used when re-running or re-plotting diagnostics.
Self-documenting: For maintenance and adaptation, to provide references on the scientific underpinnings, and for the code package to work out of the box without support.
Handle large model data: The spatial resolution and temporal frequency of climate model output have increased in recent years. As such, developers should take into account the size of model data compared with the available memory. For instance, the example PODs precip_diurnal_cycle and Wheeler_Kiladis only analyze part of the available model output for a period specified by the environment variables startdate and enddate, and the convective_transition_diag module reads in data in segments.
Basic vs. advanced diagnostics (within a POD): Separate parts of diagnostics, e.g., those that might need adjustment when model performance is outside the range of observations.
Avoid special characters (!@#$%^&*) in file/script names.
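The idea of reading data in segments, mentioned in the tip on large model data, can be sketched as follows. This is a generic illustration using plain Python sequences, not the approach implemented in convective_transition_diag; iter_segments and segmented_mean are hypothetical names.

```python
def iter_segments(n_steps, segment_size):
    """Yield slices covering range(n_steps) in chunks of at most segment_size."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    for start in range(0, n_steps, segment_size):
        yield slice(start, min(start + segment_size, n_steps))


def segmented_mean(values, segment_size):
    """Compute a mean while holding only one segment in memory at a time."""
    total, count = 0.0, 0
    for seg in iter_segments(len(values), segment_size):
        chunk = values[seg]  # with real model data, one segment would be read from disk here
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data to average")
    return total / count
```

The same slices can index the time dimension of an array-like object, so the accumulation pattern carries over to statistics that can be built up from partial sums.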
See Running the package on the example_multicase POD with synthetic CMIP model data for details on how the package is called. See the command line reference for documentation on command line options.
Avoid making assumptions about the machine on which the framework will run beyond what’s listed here; a development priority is to interface the framework with cluster and cloud job schedulers to enable individual PODs to run in a concurrent, distributed manner.
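Returning to the tip on saving digested data: one simple pattern is to compute a result once and reload it on later runs. The sketch below uses a JSON file purely for illustration, which suits small summary values only; load_or_compute is a hypothetical helper, and the cache location is left to the caller (additional netCDF output belongs in $WORK_DIR/model/netCDF, as described in section 3.5).

```python
import json
import os


def load_or_compute(cache_path, compute):
    """Return cached results if cache_path exists; otherwise compute and save them.

    compute is a zero-argument callable returning JSON-serializable data.
    """
    if os.path.isfile(cache_path):
        with open(cache_path, "r") as f:
            return json.load(f)
    result = compute()
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(result, f)
    return result
```

Deleting the cache file forces the computation to run again, which is useful when the input data or the analysis changes.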