src.preprocessor module¶
Functionality for transforming model data into the format expected by PODs once it’s been downloaded; see Data layer: Preprocessing.
-
src.preprocessor.
copy_as_alternate
(old_v, data_mgr, **kwargs)[source]¶ Wrapper for
replace()
that creates a copy of an existing
VarlistEntry
old_v and sets appropriate attributes to designate it as an alternate variable.
-
src.preprocessor.
edit_request_wrapper
(wrapped_edit_request_func)[source]¶ Decorator implementing the most typical (so far) use case for
PreprocessorFunctionBase.edit_request()
, in which we look at each variable request in the varlist separately and, optionally, add a new alternate
VarlistEntry
based on that request.
This decorator wraps a function which either constructs and returns the desired new alternate
VarlistEntry
, or returns None if no alternates are to be added for the given variable request. It adds logic for updating the list of alternates for the pod’s varlist.
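The decorator pattern described above can be sketched as follows. This is a minimal illustration, not the module's implementation: ToyVarlistEntry, ToyPod, ToyRateToFlux and edit_request_sketch are hypothetical stand-ins, and the alternates list is an assumed attribute.

```python
import functools
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyVarlistEntry:
    """Hypothetical stand-in for VarlistEntry; not the framework's class."""
    name: str
    alternates: List["ToyVarlistEntry"] = field(default_factory=list)


@dataclass
class ToyPod:
    """Hypothetical stand-in for a POD that owns a varlist."""
    varlist: List[ToyVarlistEntry]


def edit_request_sketch(wrapped: Callable) -> Callable:
    """Call `wrapped` once per variable and record any alternate it returns."""
    @functools.wraps(wrapped)
    def wrapper(self, data_mgr, pod):
        for v in pod.varlist:
            new_v = wrapped(self, v, pod, data_mgr)
            if new_v is not None:
                # None means "no alternate for this request".
                v.alternates.append(new_v)
    return wrapper


class ToyRateToFlux:
    @edit_request_sketch
    def edit_request(self, v, pod, data_mgr):
        # Offer a flux variable as a fallback for any rate variable.
        if v.name.endswith("_rate"):
            return ToyVarlistEntry(name=v.name.replace("_rate", "_flux"))
        return None


pod = ToyPod(varlist=[ToyVarlistEntry("precip_rate"), ToyVarlistEntry("temperature")])
ToyRateToFlux().edit_request(data_mgr=None, pod=pod)
print([alt.name for alt in pod.varlist[0].alternates])
print(pod.varlist[1].alternates)
```

The wrapped method sees one variable at a time, while callers still invoke it with the (data_mgr, pod) signature of the base class.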
-
class
src.preprocessor.
PreprocessorFunctionBase
(data_mgr, pod)[source]¶ Bases:
abc.ABC
Abstract interface for implementing a specific preprocessing functionality. We prefer to put each set of operations in its own child class, rather than dumping everything into a general Preprocessor class, in order to keep the logic easier to follow.
It’s up to individual Preprocessor child classes to select which functions to use, and in what order to perform them.
-
edit_request
(data_mgr, pod)[source]¶ Edit the data requested in pod’s
Varlist
queue, based on the transformations the functionality can perform. If the function can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
abstract
process
(var, dataset)[source]¶ Apply functionality to the input dataset.
- Parameters
var (
VarlistEntry
) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
dataset – xarray.Dataset instance.
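A rough sketch of how such an abstract interface might look, and how child functions could be chained in a chosen order. All class names below are hypothetical, and the "dataset" is a plain dict rather than an xarray.Dataset so that the example stays self-contained.

```python
import abc


class FunctionBaseSketch(abc.ABC):
    """Hypothetical analogue of PreprocessorFunctionBase."""

    def __init__(self, data_mgr, pod):
        pass  # a real function might cache settings from data_mgr or pod here

    def edit_request(self, data_mgr, pod):
        """Default behaviour: leave the POD's request unchanged."""
        return None

    @abc.abstractmethod
    def process(self, var, dataset):
        """Return the transformed dataset."""


class UppercaseKeys(FunctionBaseSketch):
    """Toy function: upper-case every variable name."""
    def process(self, var, dataset):
        return {k.upper(): v for k, v in dataset.items()}


class StripPrefix(FunctionBaseSketch):
    """Toy function: drop a leading 'RAW_' from variable names."""
    def process(self, var, dataset):
        return {(k[4:] if k.startswith("RAW_") else k): v
                for k, v in dataset.items()}


# The caller decides which functions run, and in what order.
functions = [UppercaseKeys(None, None), StripPrefix(None, None)]
ds = {"raw_pr": [1.0, 2.0]}
for func in functions:
    ds = func.process(var=None, dataset=ds)
print(ds)
```

Because process() is abstract, instantiating FunctionBaseSketch directly raises TypeError; only child classes that implement it can be used.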
-
-
class
src.preprocessor.
CropDateRangeFunction
(data_mgr, pod)[source]¶ Bases:
src.preprocessor.PreprocessorFunctionBase
A
PreprocessorFunctionBase
class which trims the time axis of the dataset to the user-requested analysis period.
-
static
cast_to_cftime
(dt, calendar)[source]¶ Workaround to cast python
datetime
dt to cftime.datetime with given calendar. Python stdlib datetime has no support for different calendars.
-
process
(var, ds)[source]¶ Parse quantities related to the calendar for time-dependent data. In particular,
date_range
was set from user input before we knew the model’s calendar. Workaround here to cast those values into cftime.datetime objects so they can be compared with the model data’s time axis.
-
-
class
src.preprocessor.
PrecipRateToFluxFunction
(data_mgr, pod)[source]¶ Bases:
src.preprocessor.PreprocessorFunctionBase
Convert units on the dependent variable of var, as well as its (non-time) dimension coordinate axes, from what’s specified in the dataset attributes to what’s given in the
VarlistEntry
.
-
edit_request
(v, pod, data_mgr)[source]¶ Edit the POD’s Varlist prior to query. If v has a standard_name in the list above, insert an alternate varlist entry whose translation requests the complementary type of variable (i.e., if given a rate, add an entry for the flux; if given a flux, add an entry for the rate).
-
process
(var, ds)[source]¶ Apply functionality to the input dataset.
- Parameters
var (
VarlistEntry
) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
ds – xarray.Dataset instance.
-
-
class
src.preprocessor.
ConvertUnitsFunction
(data_mgr, pod)[source]¶ Bases:
src.preprocessor.PreprocessorFunctionBase
Convert units on the dependent variable of var, as well as its (non-time) dimension coordinate axes, from what’s specified in the dataset attributes to what’s given in the
VarlistEntry
.
-
process
(var, ds)[source]¶ Convert units on the dependent variable and coordinates of var from what’s specified in the dataset attributes to what’s given in the VarlistEntry var. Units attributes are updated on the
TranslatedVarlistEntry
.
-
-
class
src.preprocessor.
RenameVariablesFunction
(data_mgr, pod)[source]¶ Bases:
src.preprocessor.PreprocessorFunctionBase
-
process
(var, ds)[source]¶ Apply functionality to the input dataset.
- Parameters
var (
VarlistEntry
) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
ds – xarray.Dataset instance.
-
-
class
src.preprocessor.
ExtractLevelFunction
(data_mgr, pod)[source]¶ Bases:
src.preprocessor.PreprocessorFunctionBase
Extract a single pressure level from a Dataset. Unit conversions of pressure are handled by cfunits (see src.units module), but parametric vertical coordinates are not handled: interpolation is not implemented here. If the exact level is not provided by the data, KeyError is raised.
-
edit_request
(v, pod, data_mgr)[source]¶ Edit the pod’s
Varlist
prior to data query. If given a
VarlistEntry
v which specifies a scalar Z coordinate, return a copy with that scalar_coordinate removed to be used as an alternate variable for v.
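The "exact level or KeyError" behaviour can be illustrated with a small sketch. It assumes plain Python lists in place of an xarray.Dataset and a toy conversion table in place of cfunits; extract_level and _TO_PA are hypothetical names.

```python
from typing import List, Sequence

# Toy conversion factors to Pa; the module itself delegates unit handling to cfunits.
_TO_PA = {"Pa": 1.0, "hPa": 100.0, "mb": 100.0}


def extract_level(levels: Sequence[float], level_units: str,
                  data_by_level: Sequence[List[float]],
                  target: float, target_units: str) -> List[float]:
    """Return the slice at exactly `target`; raise KeyError otherwise.

    No interpolation is attempted, mirroring the behaviour described above.
    """
    target_pa = target * _TO_PA[target_units]
    for lev, values in zip(levels, data_by_level):
        if abs(lev * _TO_PA[level_units] - target_pa) < 1e-6:
            return values
    raise KeyError(f"level {target} {target_units} not present in data")


levels_pa = [100000.0, 85000.0, 50000.0]
temps = [[288.0, 289.0], [280.0, 281.0], [255.0, 256.0]]
t500 = extract_level(levels_pa, "Pa", temps, 500, "hPa")
print(t500)
```

Requesting a level the data does not provide, such as 700 hPa here, raises KeyError instead of interpolating between neighbouring levels.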
-
-
class
src.preprocessor.
ApplyScaleAndOffsetFunction
(data_mgr, pod)[source]¶ Bases:
src.preprocessor.PreprocessorFunctionBase
If the variable has
scale_factor
and
add_offset
attributes set, apply the corresponding constant linear transformation to the variable’s values and unset these attributes. By default this function is not applied.
See CF convention documentation on the
scale_factor
and
add_offset
attributes.
-
process
(var, ds)[source]¶ Apply functionality to the input dataset.
- Parameters
var (
VarlistEntry
) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
ds – xarray.Dataset instance.
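Under the CF conventions, packed values are unpacked as packed * scale_factor + add_offset. The sketch below applies that transformation to a plain list and drops the two attributes afterwards; apply_scale_and_offset is a hypothetical helper, not the class's process() method.

```python
from typing import Any, Dict, List, Tuple


def apply_scale_and_offset(values: List[float],
                           attrs: Dict[str, Any]) -> Tuple[List[float], Dict[str, Any]]:
    """Unpack values as packed * scale_factor + add_offset and unset both attributes."""
    new_attrs = dict(attrs)  # do not mutate the caller's dict
    scale = new_attrs.pop("scale_factor", 1.0)  # a missing attribute means no scaling
    offset = new_attrs.pop("add_offset", 0.0)   # a missing attribute means no offset
    return [v * scale + offset for v in values], new_attrs


packed = [0, 10, 20]
attrs = {"units": "K", "scale_factor": 0.5, "add_offset": 273.0}
unpacked, cleaned = apply_scale_and_offset(packed, attrs)
print(unpacked, cleaned)
```

Removing the attributes after the transformation matters: leaving them in place would invite a later reader to apply the same scaling a second time.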
-
-
class
src.preprocessor.
MDTFPreprocessorBase
(*args, **kwargs)[source]¶ Bases:
object
Base class for preprocessing data after it’s been fetched, in order to put it into a format expected by PODs. The only functionality implemented here is parsing data axes and CF attributes; all other functionality is provided by
PreprocessorFunctionBase
functions, which are called in order.
-
edit_request
(data_mgr, pod)[source]¶ Edit pod’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
setup
(data_mgr, pod)[source]¶ Method to do additional configuration immediately before
process()
is called on each variable for pod.
-
property
open_dataset_kwargs
¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
property
save_dataset_kwargs
¶ Arguments passed to xarray to_netcdf().
-
clean_nc_var_encoding
(var, name, ds_obj)[source]¶ Clean up the
attrs
and encoding
dicts of obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
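The four workarounds listed above can be sketched on plain dicts. This is an illustration under stated assumptions rather than the method's code: the ATTR_NOT_FOUND sentinel here is a local placeholder, clean_attrs_and_encoding is a hypothetical name, and the choice to drop a duplicated attribute even when its value disagrees with the encoding (while reporting it) is an assumption.

```python
import math
from typing import Any, Dict, List, Tuple

ATTR_NOT_FOUND = object()  # placeholder sentinel; the real one is defined elsewhere


def _is_nan(x: Any) -> bool:
    return isinstance(x, float) and math.isnan(x)


def clean_attrs_and_encoding(attrs: Dict[str, Any], encoding: Dict[str, Any],
                             is_coordinate: bool = False
                             ) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    """Return cleaned copies of attrs and encoding, plus names whose values disagreed."""
    # 1. Drop entries still holding the "missing attribute" sentinel.
    attrs = {k: v for k, v in attrs.items() if v is not ATTR_NOT_FOUND}
    encoding = {k: v for k, v in encoding.items() if v is not ATTR_NOT_FOUND}
    # 2. and 3. Drop _FillValue for coordinates/bounds, and wherever it is NaN.
    for d in (attrs, encoding):
        if "_FillValue" in d and (is_coordinate or _is_nan(d["_FillValue"])):
            del d["_FillValue"]
    # 4. An attribute may not share a name with an encoding key.
    mismatched = []
    for key in list(attrs):
        if key in encoding:
            if attrs[key] != encoding[key]:
                mismatched.append(key)  # assumption: report the conflict, then drop
            del attrs[key]
    return attrs, encoding, mismatched


attrs = {"standard_name": "air_temperature", "long_name": ATTR_NOT_FOUND,
         "_FillValue": float("nan"), "units": "K"}
encoding = {"units": "K", "dtype": "float32"}
clean_attrs, clean_enc, conflicts = clean_attrs_and_encoding(attrs, encoding)
print(clean_attrs, clean_enc, conflicts)
```

Plain dicts do not model xarray's own defaults when a Dataset is written, so the sketch only shows the bookkeeping, not the resulting file contents.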
-
clean_output_attrs
(var, ds)[source]¶ Call
clean_nc_var_encoding()
on all sets of attributes in the Dataset ds.
-
log_history_attr
(var, ds)[source]¶ Update
history
attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
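A minimal sketch of appending log records to a history attribute in chronological order, assuming a plain attribute dict and a list of already-formatted record strings; update_history is a hypothetical helper.

```python
from typing import Dict, List


def update_history(attrs: Dict[str, str], records: List[str]) -> Dict[str, str]:
    """Append `records` (oldest first) after any existing history text."""
    new_attrs = dict(attrs)  # leave the caller's dict untouched
    lines = [new_attrs["history"]] if new_attrs.get("history") else []
    lines.extend(records)  # chronological order: the newest entry ends up last
    new_attrs["history"] = "\n".join(lines)
    return new_attrs


updated = update_history({"history": "created by model"},
                         ["renamed pr", "converted units"])
print(updated["history"])
```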
-
write_dataset
(var, ds)[source]¶ Writes processed Dataset ds to location specified by
dest_path
attribute of var, using xarray to_netcdf()
-
load_ds
(var)[source]¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset()
.
-
process_ds
(var, ds)[source]¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
write_ds
(var, ds)[source]¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset()
.
-
-
class
src.preprocessor.
SingleFilePreprocessor
(*args, **kwargs)[source]¶ Bases:
src.preprocessor.MDTFPreprocessorBase
A
MDTFPreprocessorBase
for preprocessing model data that is provided as a single netCDF file per variable, for example the sample model data.
-
read_dataset
(var)[source]¶ Read a single file Dataset specified by the
local_data
attribute of var, using
read_one_file()
.
-
clean_nc_var_encoding
(var, name, ds_obj)¶ Clean up the
attrs
and encoding
dicts of obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs
(var, ds)¶ Call
clean_nc_var_encoding()
on all sets of attributes in the Dataset ds.
-
edit_request
(data_mgr, pod)¶ Edit pod’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
load_ds
(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset()
.
-
log_history_attr
(var, ds)¶ Update
history
attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs
¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process
(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds
(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_one_file
(var, path_list)¶
-
property
save_dataset_kwargs
¶ Arguments passed to xarray to_netcdf().
-
setup
(data_mgr, pod)¶ Method to do additional configuration immediately before
process()
is called on each variable for pod.
-
write_dataset
(var, ds)¶ Writes processed Dataset ds to location specified by
dest_path
attribute of var, using xarray to_netcdf()
-
write_ds
(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset()
.
-
-
class
src.preprocessor.
DaskMultiFilePreprocessor
(*args, **kwargs)[source]¶ Bases:
src.preprocessor.MDTFPreprocessorBase
A
MDTFPreprocessorBase
that uses xarray’s dask support to preprocess model data provided as one or several netCDF files per variable.
-
edit_request
(data_mgr, pod)[source]¶ Edit POD’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
read_dataset
(var)[source]¶ Open multi-file Dataset specified by the
local_data
attribute of var, wrapping xarray open_mfdataset().
-
clean_nc_var_encoding
(var, name, ds_obj)¶ Clean up the
attrs
and encoding
dicts of obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs
(var, ds)¶ Call
clean_nc_var_encoding()
on all sets of attributes in the Dataset ds.
-
load_ds
(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset()
.
-
log_history_attr
(var, ds)¶ Update
history
attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs
¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process
(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds
(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_one_file
(var, path_list)¶
-
property
save_dataset_kwargs
¶ Arguments passed to xarray to_netcdf().
-
setup
(data_mgr, pod)¶ Method to do additional configuration immediately before
process()
is called on each variable for pod.
-
write_dataset
(var, ds)¶ Writes processed Dataset ds to location specified by
dest_path
attribute of var, using xarray to_netcdf()
-
write_ds
(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset()
.
-
-
class
src.preprocessor.
SampleDataPreprocessor
(*args, **kwargs)[source]¶ Bases:
src.preprocessor.SingleFilePreprocessor
Implementation class for
MDTFPreprocessorBase
intended for use on sample model data distributed with the package. Assumes all data is in one netCDF file.
-
clean_nc_var_encoding
(var, name, ds_obj)¶ Clean up the
attrs
and encoding
dicts of obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs
(var, ds)¶ Call
clean_nc_var_encoding()
on all sets of attributes in the Dataset ds.
-
edit_request
(data_mgr, pod)¶ Edit pod’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
load_ds
(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset()
.
-
log_history_attr
(var, ds)¶ Update
history
attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs
¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process
(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds
(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_dataset
(var)¶ Read a single file Dataset specified by the
local_data
attribute of var, using
read_one_file()
.
-
read_one_file
(var, path_list)¶
-
property
save_dataset_kwargs
¶ Arguments passed to xarray to_netcdf().
-
setup
(data_mgr, pod)¶ Method to do additional configuration immediately before
process()
is called on each variable for pod.
-
write_dataset
(var, ds)¶ Writes processed Dataset ds to location specified by
dest_path
attribute of var, using xarray to_netcdf()
-
write_ds
(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset()
.
-
-
class
src.preprocessor.
DefaultPreprocessor
(*args, **kwargs)[source]¶ Bases:
src.preprocessor.DaskMultiFilePreprocessor
Implementation class for
MDTFPreprocessorBase
for the general use case. Includes all implemented functionality and handles multi-file data.
-
__init__
(data_mgr, pod)¶ Initialize self. See help(type(self)) for accurate signature.
-
clean_nc_var_encoding
(var, name, ds_obj)¶ Clean up the
attrs
and encoding
dicts of obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs
(var, ds)¶ Call
clean_nc_var_encoding()
on all sets of attributes in the Dataset ds.
-
edit_request
(data_mgr, pod)¶ Edit POD’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
load_ds
(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset()
.
-
log_history_attr
(var, ds)¶ Update
history
attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs
¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process
(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds
(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_dataset
(var)¶ Open multi-file Dataset specified by the
local_data
attribute of var, wrapping xarray open_mfdataset().
-
read_one_file
(var, path_list)¶
-
property
save_dataset_kwargs
¶ Arguments passed to xarray to_netcdf().
-
setup
(data_mgr, pod)¶ Method to do additional configuration immediately before
process()
is called on each variable for pod.
-
write_dataset
(var, ds)¶ Writes processed Dataset ds to location specified by
dest_path
attribute of var, using xarray to_netcdf()
-
write_ds
(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset()
.
-