Framework configuration and parsing
Flexibility and extensibility are among the MDTF project’s design goals, which must be accommodated by the package’s configuration logic. Our use case requires the following features:
Allow user input to be specified and recorded in a file, to provide provenance for package runs and to eliminate the need for long strings of CLI flags.
Record whether the user has explicitly set an option (to a value which may or may not be the default), or whether the option is unset and its default value is being used.
Define “plug-ins” for specific tasks (such as model data retrieval) which can define their own CLI settings. This is necessary to avoid confusing the user with settings that are irrelevant for their specified analysis; e.g. the --version-date flag used by the CMIP6 local file data source would be meaningless for a source of data that didn’t have a revision history.
Enable site-specific customizations, which can add to or modify any of the above properties.
Define CLIs through configuration files instead of code to streamline the process of defining all of the above.
No third-party CLI package implements all of the above features, so the MDTF package provides its own solution, described here.
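As an illustration of the second requirement, the following minimal sketch shows one way to distinguish explicitly set options from defaults using only the standard library's argparse. It is not the package's implementation; the option names and the DEFAULTS table are hypothetical.

```python
import argparse

# Hypothetical option table: names and default values are illustrative only.
DEFAULTS = {"verbose": 0, "output_dir": "./out"}

def parse_with_provenance(argv):
    """Return (values, explicitly_set) for the given argument list."""
    # argument_default=SUPPRESS keeps unset options out of the namespace,
    # so anything present in it must have been supplied by the user.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--verbose", type=int)
    parser.add_argument("--output-dir", dest="output_dir")
    explicit = vars(parser.parse_args(argv))
    # Fill in defaults only for options the user left unset.
    values = {**DEFAULTS, **explicit}
    return values, set(explicit)

# The user explicitly sets --verbose to its default value of 0.
values, explicit = parse_with_provenance(["--verbose", "0"])
print(values)
print(explicit)
```

Here verbose is reported as explicitly set even though its value equals the default, while output_dir is reported as unset.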
Subcommands are used to organize different aspects of a program’s functionality: e.g. git status and git log are both provided by git, but each git subcommand takes its own options and flags. Subcommand parsing is currently implemented in the src.cli module but not used: subcommands are manually dispatched in mdtf_framework.py. Full use of subcommands was planned for inclusion in a future release, to avoid excessive changes to the UI.
Currently recognized subcommands are:
mdtf (no subcommand; default), or mdtf run: Run analyses on model data.
mdtf info: Implemented in the src.mdtf_info module. Displays information on currently installed PODs and the variables needed to run individual diagnostics.
mdtf help: display help on command-line options and exit, equivalent to the
In addition, the following subcommands were planned:
mdtf verify: User-facing interface to the src.verify_links module as a standalone script. This parses the HTML pages from a completed run of the package and determines if all linked plots exist.
mdtf install: This would invoke the src.install module to do initial installation of the package, conda environments and supporting POD and model data. This installer script is currently unused, on the grounds that the manual installation process described in the user-facing documentation is less error-prone.
mdtf update: Would invoke a subset of the installer’s functions to ensure that all code, supporting data and third-party dependencies are updated to their current versions.
Additional package manager-like commands could be added to allow users to selectively install the subset of PODs of interest to them (and their corresponding supporting data and conda environments.)
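For readers unfamiliar with subcommand parsing, the sketch below shows how the standard library's argparse handles it, including falling back to a default when no subcommand is given. This is a generic illustration under stated assumptions, not the code in src.cli: the default value for --site and the topic positional are hypothetical.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="mdtf")
    # dest records which subcommand was chosen (None if none was given).
    subparsers = parser.add_subparsers(dest="subcommand")
    run = subparsers.add_parser("run", help="Run analyses on model data.")
    run.add_argument("--site", default="local")  # hypothetical default
    info = subparsers.add_parser("info", help="Show installed PODs.")
    info.add_argument("topic", nargs="?", default="pods")  # hypothetical
    return parser

def parse(argv):
    args = build_parser().parse_args(argv)
    if args.subcommand is None:
        # No subcommand on the command line: treat it as "run".
        args.subcommand = "run"
    return args

print(parse(["info"]).subcommand)
print(parse(["run", "--site", "NOAA_GFDL"]).site)
```

Each subparser owns its options, so --site is only recognized after run in this sketch.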
“Plug-ins” provide different ways to implement the same type of task, following a common API. One example is obtaining model data from different sources: different code is needed for reading the sample model data from a local directory vs. accessing remote data via a catalog interface. In the plug-in system, the code for these two cases would be written as distinct data source plug-ins, and the data retrieval method to use would be selected at runtime by the user via the --data-manager CLI flag. This allows new functionality to be developed and tested independently, without requiring changes to the common logic of the framework.
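The plug-in idea can be sketched as a common base class plus a lookup table keyed by the value the user passes on the command line. All class names, registry keys and the fetch() method below are hypothetical and chosen for illustration; they are not the package's API.

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Common API that every data source plug-in implements."""
    @abstractmethod
    def fetch(self, variable: str) -> str:
        ...

class LocalFileSource(DataSource):
    def fetch(self, variable):
        return f"local:{variable}"

class CatalogSource(DataSource):
    def fetch(self, variable):
        return f"catalog:{variable}"

# Hypothetical registry mapping CLI values to plug-in classes.
DATA_MANAGERS = {"local_file": LocalFileSource, "catalog": CatalogSource}

def make_data_manager(name):
    """Instantiate the plug-in selected at runtime."""
    try:
        return DATA_MANAGERS[name]()
    except KeyError:
        raise ValueError(
            f"Unknown data manager {name!r}; choose from {sorted(DATA_MANAGERS)}"
        ) from None

print(make_data_manager("catalog").fetch("pr"))
```

Adding a new data source then means adding one class and one registry entry, leaving the calling code unchanged.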
The categories of plug-ins are fixed by the framework. Currently these are data_manager, which retrieves model data, and environment_manager, which sets up each POD’s third-party code dependencies. Two other plug-ins are defined but not exposed to the user through the UI, because only one option is currently implemented for them: runtime_manager, which controls how PODs are executed, and output_manager, which controls how the PODs’ output files are collected and processed.
Allowed values for each of these plug-in categories are defined in the cli_plugins.jsonc files: the “base” one in /src, and optionally one in the site-specific directory selected by the user.
As noted in the overview above, for a manageable interface we need to allow each plug-in to define its own CLI options. These are defined in the cli attribute for each plug-in definition in the cli_plugins.jsonc file, following the syntax described below. When the CLI parser is being configured, the user input is first partially parsed to determine which plug-ins the user has selected, and then their specific CLI options are added to the “full” CLI parser.
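The two-stage approach described above can be sketched with argparse's parse_known_args(), which parses the options it recognizes and sets the rest aside. This is a simplified illustration, not the package's code: the PLUGIN_OPTIONS table, the plug-in names local_file and cmip6, and the --sample-dir flag are assumptions made for the example.

```python
import argparse

# Hypothetical plug-in table; in the package this information would come
# from the cli_plugins.jsonc files.
PLUGIN_OPTIONS = {
    "local_file": [("--sample-dir", {"default": "."})],
    "cmip6": [("--version-date", {"default": None})],
}

def parse_two_stage(argv):
    # Stage 1: a minimal parser that only knows the plug-in selector.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--data-manager", choices=sorted(PLUGIN_OPTIONS),
                     default="local_file")
    known, _unrecognized = pre.parse_known_args(argv)

    # Stage 2: the full parser, extended with the selected plug-in's options.
    full = argparse.ArgumentParser(parents=[pre])
    for flag, kwargs in PLUGIN_OPTIONS[known.data_manager]:
        full.add_argument(flag, **kwargs)
    return full.parse_args(argv)

args = parse_two_stage(["--data-manager", "cmip6", "--version-date", "20200101"])
print(args.data_manager, args.version_date)
```

Because only the selected plug-in's options are added, --sample-dir is rejected (and absent from help) when cmip6 is chosen.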
File-based CLI definition
The CLI for the package is constructed from a set of JSONC configuration files. The syntax for these files is essentially a direct JSON serialization of the arguments given to ArgumentParser, with a few extensions described below.
Location of configuration files
The top-level configuration files have hard-coded names:
Files with these names in a site directory will override the contents of the above files in /src if that site is selected, e.g. sites/NOAA_GFDL/cli_subcommands.jsonc.
Plugins define their own CLI options in the cli attribute in their entry in the plugins file, using the syntax described below. On the other hand, each subcommand defines its CLI through a separate file, given in the cli_file attribute. Chief among these is src/cli_template.jsonc, which defines the CLI for running the package in the absence of site-specific modifications.
CLI configuration file syntax
cli_file is a JSONC struct which may contain:
Arguments taken by the constructor for
An attribute named
arguments, containing a list of argument structs not in any argument group;
The arguments attribute referred to above defines a list of CLI options, in the order they’re to be listed in online help (following basic Unix convention, the order in which options are given doesn’t affect their parsing). This is also the syntax used by the cli attribute for each CLI plugin.
Attributes of a struct in the
arguments list can include:
name corresponds to the
add_argument(). It can be either a string, or list of strings, all of which will be taken to define the same flag. Initial hyphens (GNU syntax) are added, and underscores are converted to hyphens:
name: "hyphen_opt" defines an option that can be set with either
If dest is not supplied, the first entry will be taken as the destination variable for the setting.
The following extensions to this set of arguments are supported:
short_name, optional, is used to define single-letter abbreviated flags for the most commonly used options. These are added to the synonymous flags defined via name. Use of full-word (GNU style) flags is preferred, as it makes the set of arguments more comprehensible.
is_positional, default False, is a boolean used to identify positional arguments (as opposed to flag-based arguments, which are identified by their flag rather than their position on the command line.)
hidden, default False, is a boolean used to identify options that are recognized by the parser but not displayed to the user in online help.
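To make the attribute handling above concrete, here is a minimal sketch that translates one argument struct (already loaded into a Python dict) into an add_argument() call. It is an approximation under stated assumptions rather than the logic in src.cli: the helper name add_from_struct and the example option names are hypothetical, and the sketch registers only the hyphenated spelling of each flag.

```python
import argparse

def add_from_struct(parser, struct):
    """Add one argument described by a dict in the style discussed above."""
    kwargs = dict(struct)  # copy so the caller's dict is not modified
    names = kwargs.pop("name")
    if isinstance(names, str):
        names = [names]
    short = kwargs.pop("short_name", None)
    positional = kwargs.pop("is_positional", False)
    if kwargs.pop("hidden", False):
        # Still parsed, but omitted from the generated help text.
        kwargs["help"] = argparse.SUPPRESS
    if positional:
        return parser.add_argument(names[0], **kwargs)
    # The first name doubles as the destination unless dest is given.
    kwargs.setdefault("dest", names[0])
    flags = ["--" + n.replace("_", "-") for n in names]
    if short:
        flags.insert(0, "-" + short)
    return parser.add_argument(*flags, **kwargs)

parser = argparse.ArgumentParser(prog="demo")
for struct in [
    {"name": "hyphen_opt", "short_name": "o", "default": "x"},
    {"name": "debug_mode", "action": "store_true", "hidden": True},
    {"name": "case_name", "is_positional": True},
]:
    add_from_struct(parser, struct)

args = parser.parse_args(["--hyphen-opt", "y", "mycase"])
print(args.hyphen_opt, args.debug_mode, args.case_name)
```

Mapping hidden onto help=argparse.SUPPRESS is one plausible way to get the described behavior with stock argparse; the package may do this differently.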
Use in the code
The src.cli module defines a hierarchy of classes representing objects in a CLI parser specification, which are instantiated by values from the configuration files. At the root of the hierarchy is CLIConfigManager, a Singleton which reads all the files, begins the object creation process, and stores the results. The other classes in the hierarchy are, in descending order:
CLICommand: Dataclass representing a subcommand or a plug-in. This wraps a parser (parser attribute) and objects in the classes below, corresponding to configuration for that parser, which are initialized from the configuration files (cli attribute). It also implements a call() method for dispatching parsed values to the initialization method of the class implementing the subcommand or plug-in.
CLIParser: Dataclass representing arguments passed to the constructor for
ArgumentParser. A parser object (next section) is configured with information in objects in the classes below via this class’s
As described above, the CLI used on a specific run of the package depends on the values of some of the CLI arguments: the
--site, and the values chosen for recognized plug-ins. This introduces a chicken-and-egg level of complexity, in which we need to parse some arguments in order to determine how to proceed with the rest of the parsing. The src.cli module does this by defining several parser classes, all of which inherit from
MDTFArgPreparser: Child class used for partial parsing (“preparsing”). This is used in init_user_defaults() to extract paths to file-based user input, in init_site() to extract the site, and in setup() to extract values for the subcommand and plug-in options before the full CLI is parsed.
MDTFTopLevelArgParser: Child class for the package’s top-level CLI. It has additional methods for formatting help text and for initiating the CLI configuration and parsing process described in detail below.
MDTFTopLevelSubcommandArgParser: Currently unused. Child class which would take care of parsing and dispatch to MDTF package subcommands. This is currently done by manual inspection of
Defaults and argument parsing precedence
Long strings of command-line arguments are cumbersome for users. At the same time, provenance and reproducibility of package runs are simplified if all configuration is handled by the same code. For this reason, we implement multiple ways for users to provide CLI arguments:
1. Options explicitly given on the command line.
2. Option values defined in a JSONC file and passed with the
3. Option values defined in a JSONC file named defaults.jsonc located in the directory of the currently selected site.
4. Option values defined in a JSONC file named defaults.jsonc located in the
5. The default value (if any) specified in each CLI argument’s definition.
The value assigned to every option is determined by the lowest-numbered method that explicitly specifies that value: for example, explicit command-line options override values given in a file passed with
--input-file, which in turn override the option defaults listed in the online help.
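The precedence rule can be illustrated with collections.ChainMap, which looks a key up in a sequence of mappings and returns the first match. This is only a sketch of the idea, not the package's parse_known_args() logic; the function name resolve_options and all option names and values are hypothetical.

```python
from collections import ChainMap

def resolve_options(cli, input_file, site_defaults, base_defaults, builtin):
    """Merge option sources; earlier arguments take precedence.

    Each mapping should contain only the options that source explicitly sets.
    """
    return dict(ChainMap(cli, input_file, site_defaults, base_defaults, builtin))

# Hypothetical option values from each source, highest precedence first.
resolved = resolve_options(
    cli={"pods": ["example"]},
    input_file={"pods": ["all"], "case_name": "test"},
    site_defaults={"conda_root": "/opt/conda"},
    base_defaults={"conda_root": "", "verbose": 1},
    builtin={"pods": [], "case_name": None, "conda_root": None, "verbose": 0},
)
print(resolved)
```

Note that this only works if each source omits options it does not set, which is why recording whether an option was explicitly given matters.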
The intended use case for these different methods is to enable the user to focus on the settings that matter for each run. Continuing the example above, the user could specify the analysis period and desired PODs with explicit flags, options for data from the experiment being analyzed in an input file, and options describing the paths to POD supporting data and conda environments in a site-specific
defaults.jsonc file (see user documentation for site customization.)
File-based input (2, 3 and 4) is read in by the
init_user_defaults() method of
MDTFTopLevelArgParser. The full precedence logic is implemented in the
parse_known_args() method, inherited by
Walkthrough of CLI creation and parsing
Building the CLI
The mdtf wrapper script activates the _MDTF_base conda environment and calls mdtf_framework.py.
mdtf_framework.py manually determines the subcommand from the currently recognized values, and constructs the CLI appropriate to it. In this example, we’re running the package, so the MDTFTopLevelArgParser is initialized and its setup() method is called.
It then calls
init_site(), which parses the value of the selected site and reads the site-specific defaults files (if any).
Now that we know which site we’re using, we know the full set of subcommands and plug-in values (built-in and site-specific).
read_plugins() read this information and parse it into CLICommand objects stored in the
MDTFArgPreparser is created to parse the subcommand and plug-in values. The corresponding plugin-specific arguments are added.
We’re now ready to build the “real” CLI parser, with
At this point the MDTFTopLevelArgParser is fully configured and ready to parse user input.
Parsing CLI arguments
This wraps the parse_args() method, which differs significantly from the method of the same name on the Python ArgumentParser: it inherits from the MDTFArgParser, which implements the precedence logic described above.
Values of configuration that were read from files during CLI configuration are now read from their stored values in
The parsed option values are stored as a dict in the config attribute of the MDTFTopLevelArgParser object. This will be the starting point for further validation of user input done in the
dispatch()then imports the modules for all selected plug-in objects. We do this import “on demand,” rather than simply always importing everything, because a plug-in may make use of third-party modules that the user hasn’t installed (e.g. if the plug-in is site-specific and the user is at a different site.)
call() method on the selected subcommand to hand off execution. As noted above, subcommand functionality is implemented but unused, so currently we always hand off to the first (only) subcommand, mdtf run, regardless of input. The corresponding entry point, as specified in src/cli_plugins.jsonc, is the
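The “on demand” import of plug-in modules mentioned in the steps above can be sketched with importlib from the standard library. The "module:attribute" string format, the helper name load_entry_point and the ENTRY_POINTS table are assumptions made for this example and do not reflect the format used in the package's configuration files; standard library modules stand in for plug-in modules.

```python
import importlib

def load_entry_point(spec):
    """Import 'package.module:attr' lazily and return the named attribute."""
    module_name, _, attr = spec.partition(":")
    # The import happens here, only for the module that was requested.
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# Hypothetical mapping of plug-in names to entry points.
ENTRY_POINTS = {"json_loader": "json:loads", "csv_reader": "csv:reader"}

# Only the selected plug-in's module is imported by this call.
loads = load_entry_point(ENTRY_POINTS["json_loader"])
print(loads("[1, 2]"))
```

With this pattern, a plug-in whose third-party dependencies are missing only causes an ImportError if the user actually selects it.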
Extending the user interface
Currently, the only method for the user to configure a run of the package is the CLI described above, which parses command-line options and configuration files.
In the future it may be desirable to provide additional invocation mechanisms, e.g. from a larger workflow engine or a web-based front end.
Parsing and validation logic is split between the src.cli module and the
MDTFFramework class. In order to avoid duplicating logic and ensure that configuration gets parsed consistently across the different methods, the raw user input should be introduced into the chain of methods in the parsing logic (described above) as early as possible.