src.util.dataclass module

Extensions to Python dataclasses, for streamlined class definition.

class src.util.dataclass.RegexPatternBase[source]

Bases: object

Dummy parent class for RegexPattern and ChainedRegexPattern.

__init__()

Initialize self. See help(type(self)) for accurate signature.

class src.util.dataclass.RegexPattern(regex, defaults=None, input_field=None, match_error_filter=None)[source]

Bases: collections.UserDict, src.util.dataclass.RegexPatternBase

Wraps re.Pattern with more convenience methods for the use case of parsing information in a string, using a regex with named capture groups corresponding to the data fields being collected from the string.

__init__(regex, defaults=None, input_field=None, match_error_filter=None)[source]

Constructor.

Parameters
  • regex (str or re.Pattern) – regex to use for string parsing. Should contain named match groups corresponding to the fields to parse.

  • defaults (dict) – Optional. If supplied, any fields not matched by the named match groups in regex will be set equal to their values here.

  • input_field (str) – Optional. If supplied, add a field to the match with the supplied name which will be set equal to the contents of the input string on a successful match.

  • match_error_filter (bool or RegexPattern or ChainedRegexPattern) – Optional. If supplied, determines which exception is raised when the match() method fails to parse a string (see the Raises section of match() below).

Attributes
  • data (dict) – Key:value pairs corresponding to the contents of the matching groups from the last successful call to match(), or empty if no successful call has been made. From collections.UserDict.

  • fields (frozenset) – Set of fields matched by the pattern. Consists of the union of named match groups in regex, and all keys in defaults.

  • input_string (str) – Contains string that was input to last call of match(), whether successful or not.

  • is_matched (bool) – True if the last call to match() was successful, False otherwise.
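As a rough illustration of how regex, defaults and input_field combine into the parsed fields, the following sketch uses only the standard library. It does not use RegexPattern itself: the helper name parse_fields, the example regex and the example field names are assumptions made for this illustration.

```python
import re

def parse_fields(regex, str_, defaults=None, input_field=None):
    """Hypothetical helper (not part of this module): parse str_ into a dict."""
    pattern = re.compile(regex)      # accepts a str or an already-compiled pattern
    m = pattern.fullmatch(str_)      # the whole string must match
    if m is None:
        return None
    # Start from the defaults, then overlay the groups that actually matched.
    fields = dict(defaults or {})
    fields.update({k: v for k, v in m.groupdict().items() if v is not None})
    if input_field:
        # Record the original input string under the requested field name.
        fields[input_field] = str_
    return fields

result = parse_fields(
    r"(?P<model>\w+)_(?P<year>\d{4})(\.(?P<ext>\w+))?",
    "CESM2_1990",
    defaults={"ext": "nc"},
    input_field="raw",
)
```

Here the optional ext group does not participate in the match, so its value comes from defaults, while raw holds the unmodified input string.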

clear()[source]

Erase field values parsed from a pre-existing match.

update_defaults(d)[source]

Update the default values used for the match with the values in d.

match(str_, *args)[source]

Match str_ using Python re.fullmatch() with regex and populate object’s fields according to the values captured by the named capture groups in regex.

Parameters
  • str_ (str) – Input string to parse.

  • args – Optional. Flags (as defined in Python re) to use in the re.fullmatch() method of the regex and match_error_filter (if defined.)

Raises
  • RegexParseError – If match() fails to parse the input string and the failure is not suppressed by match_error_filter. If match_error_filter is not supplied (default), always raise when match() fails. If match_error_filter is a bool, always/never raise. If match_error_filter is a RegexPattern or ChainedRegexPattern, attempt to match() the input string that caused the failed match against match_error_filter: if it matches, do not raise RegexParseError; otherwise raise it.

  • RegexSuppressedError – If match() fails to parse the input string and the conditions above for raising RegexParseError are not met. One of RegexParseError or RegexSuppressedError is always raised on failure.
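The decision between the two exceptions can be sketched as follows. This is a standalone approximation rather than the module's code: the exception classes are local stand-ins, the filter is a plain regex string instead of a RegexPattern, and the reading that a filter value of True means "suppress" is an assumption, since the description above does not say which bool value maps to which behavior.

```python
import re

class ParseErrorSketch(ValueError):
    """Local stand-in for RegexParseError (hypothetical definition)."""

class SuppressedErrorSketch(ValueError):
    """Local stand-in for RegexSuppressedError (hypothetical definition)."""

def match_or_raise(regex, str_, error_filter=None):
    """Return the named groups on success; choose which error to raise on failure."""
    m = re.fullmatch(regex, str_)
    if m is not None:
        return m.groupdict()
    if isinstance(error_filter, bool):
        suppressed = error_filter          # ASSUMPTION: True means "suppress"
    elif error_filter is None:
        suppressed = False                 # no filter: never suppress
    else:
        # Treat the filter as a second regex tried against the same input.
        suppressed = re.fullmatch(error_filter, str_) is not None
    if suppressed:
        raise SuppressedErrorSketch(str_)
    raise ParseErrorSketch(f"Couldn't match {str_!r} against {regex!r}.")
```

A caller can then catch the "suppressed" exception to skip inputs that are known not to match (for example, hidden files in a directory listing) while still treating any other failure as a real parse error.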

copy()
classmethod fromkeys(iterable, value=None)
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v

In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D’s values
class src.util.dataclass.RegexPatternWithTemplate(regex, defaults=None, input_field=None, match_error_filter=None, template=None, log=<Logger>)[source]

Bases: src.util.dataclass.RegexPattern

Adds formatted output to RegexPattern.

__init__(regex, defaults=None, input_field=None, match_error_filter=None, template=None, log=<Logger>)[source]

Constructor.

Parameters

template (str) – Optional. Template string to use for formatting contents of match in format() method. Contents of the matched fields will be substituted using the {}-syntax of Python string formatting.

Other arguments are the same as in RegexPattern.

format()[source]

Return the template string, with its {}-fields filled in from the values obtained in the last successful call to match().
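The substitution itself is ordinary str.format() with the parsed fields as keyword arguments. The sketch below shows only that step with the standard library; the regex, template and field names are made up for illustration and are not taken from this module.

```python
import re

# Hypothetical template and regex, chosen only for this illustration.
template = "{model}.{year}.nc"
m = re.fullmatch(r"(?P<model>\w+?)_(?P<year>\d{4})", "CESM2_1990")

# The named groups play the role of the fields stored by match().
fields = m.groupdict()

# Each {name} placeholder is replaced by the field of the same name.
formatted = template.format(**fields)
```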

clear()

Erase field values parsed from a pre-existing match.

copy()
classmethod fromkeys(iterable, value=None)
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
match(str_, *args)

Match str_ using Python re.fullmatch() with regex and populate object’s fields according to the values captured by the named capture groups in regex.

Parameters
  • str_ (str) – Input string to parse.

  • args – Optional. Flags (as defined in Python re) to use in the re.fullmatch() method of the regex and match_error_filter (if defined.)

Raises
  • RegexParseError – If match() fails to parse the input string and the failure is not suppressed by match_error_filter. If match_error_filter is not supplied (default), always raise when match() fails. If match_error_filter is a bool, always/never raise. If match_error_filter is a RegexPattern or ChainedRegexPattern, attempt to match() the input string that caused the failed match against match_error_filter: if it matches, do not raise RegexParseError; otherwise raise it.

  • RegexSuppressedError – If match() fails to parse the input string and the conditions above for raising RegexParseError are not met. One of RegexParseError or RegexSuppressedError is always raised on failure.

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v

In either case, this is followed by: for k, v in F.items(): D[k] = v

update_defaults(d)

Update the default values used for the match with the values in d.

values() → an object providing a view on D’s values
class src.util.dataclass.ChainedRegexPattern(*string_patterns, defaults=None, input_field=None, match_error_filter=None)[source]

Bases: src.util.dataclass.RegexPatternBase

Class which takes an ‘or’ of multiple RegexPatterns, to parse data that may be represented as a string in one of multiple formats.

Matches are attempted on the supplied RegexPatterns in order, with the first one that succeeds determining the parsed field values. Public methods work the same as on RegexPattern.

__init__(*string_patterns, defaults=None, input_field=None, match_error_filter=None)[source]

Constructor.

Parameters

string_patterns (iterable of RegexPattern) – Individual regexes which will be tried, in order, when match() is called. Parsing will be done by the first RegexPattern whose match() succeeds.

Note

The constructor changes attributes on RegexPattern objects passed as string_patterns, so once the object is created its component RegexPattern objects shouldn’t be accessed on their own.

Other arguments and attributes are the same as in RegexPattern.
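The "first pattern that succeeds wins" behavior can be sketched with plain regexes. This is an illustration of the ordering rule only, not of ChainedRegexPattern's implementation; the function name match_first and the two date formats are assumptions.

```python
import re

def match_first(patterns, str_):
    """Hypothetical helper: try each regex in order, return the first success."""
    for index, regex in enumerate(patterns):
        m = re.fullmatch(regex, str_)
        if m is not None:
            # Later patterns are never consulted once one has matched.
            return index, m.groupdict()
    raise ValueError(f"No pattern matched {str_!r}.")

# Two alternative string representations of the same three fields.
date_patterns = [
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})",
]

dashed = match_first(date_patterns, "1990-01-15")
compact = match_first(date_patterns, "19900115")
```

Both inputs yield the same field values; only the index of the pattern that matched differs.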

property is_matched
property data
clear()[source]
update_defaults(d)[source]
match(str_, *args)[source]
format()[source]
src.util.dataclass.NOTSET = sentinel.NotSet

Sentinel object to detect uninitialized values for fields in mdtf_dataclass() objects, for use in cases where None is a valid value for the field.

src.util.dataclass.MANDATORY = sentinel.Mandatory

Sentinel object to mark all mdtf_dataclass() fields that do not take a default value. This is a workaround to avoid errors with non-default fields coming after default fields in the dataclass auto-generated __init__ method under inheritance: we use the second solution described in https://stackoverflow.com/a/53085935.
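The inheritance problem and the sentinel workaround can be reproduced with plain dataclasses. In this sketch MANDATORY_SKETCH is a local stand-in for MANDATORY, the class names are hypothetical, and the check is placed in __post_init__ purely for brevity (mdtf_dataclass() performs its own check, as described below).

```python
import dataclasses

MANDATORY_SKETCH = object()   # local stand-in for the MANDATORY sentinel

@dataclasses.dataclass
class Parent:
    name: str = "unnamed"     # field with a default

# Plain dataclasses reject a non-default field that follows an inherited default field.
error_message = None
try:
    @dataclasses.dataclass
    class BrokenChild(Parent):
        size: int
except TypeError as exc:
    error_message = str(exc)

# Workaround: give the field a sentinel default and check for it after __init__.
@dataclasses.dataclass
class Child(Parent):
    size: int = MANDATORY_SKETCH

    def __post_init__(self):
        if self.size is MANDATORY_SKETCH:
            raise ValueError("No value supplied for mandatory field 'size'.")

child = Child(size=3)
```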

src.util.dataclass.mdtf_dataclass(cls=None, **deco_kwargs)[source]

Wrap the Python dataclass() class decorator to customize dataclasses to provide rudimentary type checking and conversion. This is hacky, since dataclasses don’t enforce type annotations for their fields. A better solution would be to use the third-party cattrs package, which has essentially the same aim.

The decorator rewrites the class’s constructor as follows:

  1. Execute the auto-generated __init__ method from Python dataclass().

  2. Verify that fields with MANDATORY default have been assigned values. We have to work around the usual dataclass() way of doing this, because it leads to errors in the signature of the auto-generated __init__ method under inheritance (mandatory fields can’t come after optional fields in the signature).

  3. Execute the class’s __post_init__ method, if defined, which can do more complex type coercion and validation.

  4. Finally, check each field’s value to see if it’s consistent with the given type information. If not, attempt to coerce it to that type, using a from_struct method on that type if it exists.

Warning

Unlike dataclass(), all fields must have a default or default_factory defined. Fields which are mandatory must have their default value set to the sentinel object MANDATORY. This is necessary in order for dataclass inheritance to work properly, and is not currently enforced when the class is decorated.

Parameters
  • cls (class) – Class to be decorated.

  • deco_kwargs – Optional. Keyword arguments to pass to the Python dataclass() class decorator.

Raises

DataclassParseError – If we attempted to construct an instance without giving values for MANDATORY fields, or if values of some fields after __post_init__ could not be coerced into the types given in their annotation.
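The four constructor steps listed above can be approximated in a few lines. The sketch below is a simplified reconstruction under stated assumptions, not the decorator's actual code: checked_dataclass, _MANDATORY and FieldError are hypothetical stand-ins for mdtf_dataclass(), MANDATORY and DataclassParseError, only plain classes are handled as annotations, and inheritance and InitVar fields are ignored.

```python
import dataclasses

_MANDATORY = object()                     # stand-in for MANDATORY

class FieldError(ValueError):
    """Stand-in for DataclassParseError (hypothetical definition)."""

def checked_dataclass(cls):
    post_init = getattr(cls, "__post_init__", None)
    # Replace __post_init__ with a no-op so the generated __init__ doesn't run it early.
    cls.__post_init__ = lambda self: None
    cls = dataclasses.dataclass(cls)
    generated_init = cls.__init__

    def __init__(self, *args, **kwargs):
        generated_init(self, *args, **kwargs)              # step 1
        for f in dataclasses.fields(self):                 # step 2
            if getattr(self, f.name) is _MANDATORY:
                raise FieldError(f"No value for mandatory field {f.name!r}.")
        if post_init is not None:                          # step 3
            post_init(self)
        for f in dataclasses.fields(self):                 # step 4
            value = getattr(self, f.name)
            if isinstance(f.type, type) and not isinstance(value, f.type):
                # Prefer a from_struct converter on the type, if it defines one.
                converter = getattr(f.type, "from_struct", f.type)
                try:
                    setattr(self, f.name, converter(value))
                except (TypeError, ValueError) as exc:
                    raise FieldError(f"Can't coerce field {f.name!r}.") from exc

    cls.__init__ = __init__
    return cls

@checked_dataclass
class Variable:
    name: str = _MANDATORY
    level: int = 0

    def __post_init__(self):
        self.name = self.name.strip()

v = Variable(name=" tas ", level="850")
```

In this sketch the string "850" is coerced to an int only after __post_init__ has run, mirroring the order of steps 3 and 4.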

src.util.dataclass.is_regex_dataclass(obj)[source]

Returns True if obj is a regex_dataclass().

src.util.dataclass.regex_dataclass(pattern, **deco_kwargs)[source]

Decorator combining the functionality of RegexPattern and mdtf_dataclass(): dataclass fields are parsed from a regex and coerced to appropriate classes.

Specifically, this is done via a from_string classmethod, added by this decorator, which creates dataclass instances by parsing an input string with a RegexPattern or ChainedRegexPattern. The values of all fields returned by the match() method of the pattern are passed to the __init__ method of the dataclass as kwargs.

Additionally, if the type of one or more fields is set to a class that’s also been decorated with regex_dataclass, the parsing logic for that field’s regex_dataclass will be invoked on that field’s value (i.e., a string obtained by regex matching in this regex_dataclass), and the parsed values of those fields will be supplied to this regex_dataclass constructor. This is our implementation of composition for regex_dataclasses.

Note

Unlike mdtf_dataclass(), type coercion here is done after __post_init__ for these dataclasses. This is necessary due to composition: if a regex_dataclass is being instantiated as a field of another regex_dataclass, all values being passed to it will be strings (the regex fields), and type coercion is the job of __post_init__.
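A reduced version of the from_string idea, including a simple form of composition, might look like the sketch below. It is an assumption-laden illustration rather than the decorator's implementation: from_string_dataclass and the example classes are hypothetical, a plain regex is used in place of RegexPattern, and the nested object is stored directly in the outer field, which simplifies how parsed values are handed to the outer constructor.

```python
import dataclasses
import re

def from_string_dataclass(regex):
    """Hypothetical decorator: a dataclass plus a from_string classmethod."""
    pattern = re.compile(regex)

    def decorator(cls):
        cls = dataclasses.dataclass(cls)

        def from_string(cls_, str_):
            m = pattern.fullmatch(str_)
            if m is None:
                raise ValueError(f"Couldn't parse {str_!r}.")
            kwargs = m.groupdict()
            # Composition: a field whose type also has from_string parses its own substring.
            for f in dataclasses.fields(cls_):
                if f.name in kwargs and hasattr(f.type, "from_string"):
                    kwargs[f.name] = f.type.from_string(kwargs[f.name])
            return cls_(**kwargs)

        cls.from_string = classmethod(from_string)
        return cls

    return decorator

@from_string_dataclass(r"(?P<start>\d{4})-(?P<end>\d{4})")
class YearRange:
    start: int = 0
    end: int = 0

    def __post_init__(self):
        # All regex captures arrive as strings; convert them here.
        self.start = int(self.start)
        self.end = int(self.end)

@from_string_dataclass(r"(?P<model>[A-Za-z0-9]+)_(?P<years>\d{4}-\d{4})")
class RunLabel:
    model: str = ""
    years: YearRange = None

label = RunLabel.from_string("CESM2_1990-2000")
```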

src.util.dataclass.dataclass_factory(dataclass_decorator, class_name, *parents, **kwargs)[source]

Function that returns a dataclass (i.e., a decorated class) whose fields are the union of the fields in parents, which the new dataclass inherits from.

Parameters
  • dataclass_decorator (function) – decorator to apply to the new class.

  • class_name (str) – name of the new class.

  • parents – collection of other mdtf_dataclasses to inherit from. Order in the collection determines the MRO.

  • kwargs – Optional; arguments to pass to dataclass_decorator when it’s applied to produce the returned class.
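The core of such a factory can be sketched with the built-in type() constructor. This is a minimal approximation that assumes the decorator accepts the class as its first positional argument, as dataclasses.dataclass does; the function and class names are hypothetical, and any extra methods the real factory may add are omitted.

```python
import dataclasses

def make_dataclass_from_parents(decorator, class_name, *parents, **kwargs):
    """Hypothetical helper: build an empty subclass of parents, then decorate it."""
    new_cls = type(class_name, tuple(parents), {})
    return decorator(new_cls, **kwargs)

@dataclasses.dataclass
class HasName:
    name: str = "unnamed"

@dataclasses.dataclass
class HasUnits:
    units: str = "1"

Variable = make_dataclass_from_parents(
    dataclasses.dataclass, "Variable", HasName, HasUnits
)
field_names = sorted(f.name for f in dataclasses.fields(Variable))
```

The order of parents is passed straight through to type(), so the first parent comes first in the new class's MRO.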

src.util.dataclass.filter_dataclass(d, dc, init=False)[source]

Return a dict of the subset of fields or entries in d that correspond to the fields in dataclass dc.

Parameters
  • d (dict, dataclass or dataclass instance) – Object to take field values from.

  • dc (dataclass or dataclass instance) – Dataclass defining the set of fields that are returned. Values of fields in d that are not fields of dc are discarded.

  • init (bool or 'all') –

    Optional, default False. Controls whether init-only fields are included:

    • If False: Include only the fields of dc as returned by dataclasses.fields().

    • If True: Include only the arguments to dc’s constructor (i.e., include any init-only fields and exclude any of dc’s fields with init=False.)

    • If ‘all’: Include the union of the above two options.

Returns

The subset of key:value pairs from d such that the keys are included in the set of dc’s fields specified by the value of init.

Return type

dict
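The three settings of init can be illustrated with the standard library. The sketch below handles only the case where d is a dict and uses inspect.signature() to find the constructor arguments; that choice, the name filter_fields and the Record class are assumptions made for this example.

```python
import dataclasses
import inspect

def filter_fields(d, dc, init=False):
    """Hypothetical helper: keep only the entries of dict d relevant to dataclass dc."""
    cls = dc if isinstance(dc, type) else type(dc)
    field_names = {f.name for f in dataclasses.fields(cls)}   # excludes init-only fields
    init_names = set(inspect.signature(cls).parameters)       # constructor arguments
    if init == "all":            # checked first, because "all" is also truthy
        keep = field_names | init_names
    elif init:
        keep = init_names
    else:
        keep = field_names
    return {k: v for k, v in d.items() if k in keep}

@dataclasses.dataclass
class Record:
    name: str = ""
    scale: dataclasses.InitVar[float] = 1.0                   # init-only field
    checksum: int = dataclasses.field(default=0, init=False)  # not a constructor argument

    def __post_init__(self, scale):
        self.checksum = len(self.name)

source = {"name": "tas", "scale": 2.0, "checksum": 7, "extra": True}
as_fields = filter_fields(source, Record)
as_init = filter_fields(source, Record, init=True)
as_all = filter_fields(source, Record, init="all")
```

The key "extra" is dropped in every case; "scale" appears only when constructor arguments are requested, and "checksum" only when the fields reported by dataclasses.fields() are requested.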

src.util.dataclass.coerce_to_dataclass(d, dc, **kwargs)[source]

Given a dataclass dc (may be the class or an instance of it), and a dict, dataclass or dataclass instance d, return an instance of dc’s class with field values initialized from those in d, along with any extra values passed in kwargs.

Because this constructs a new dataclass instance, it copies field values according to the init=True logic in filter_dataclass().

Parameters
  • d (dict, dataclass or dataclass instance) – Object to take field values from.

  • dc (dataclass or dataclass instance) – Class to instantiate.

  • kwargs – Optional. If provided, override field values provided in d.

Returns

Instance of dataclass dc with field values populated from kwargs and d.
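A minimal sketch of this conversion, assuming the init=True filtering can be approximated by reading the constructor signature with inspect.signature(), is shown below. The function name coerce and the two example classes are hypothetical, and d is restricted to a dict or a dataclass instance.

```python
import dataclasses
import inspect

def coerce(d, dc, **kwargs):
    """Hypothetical helper: build an instance of dc's class from d plus overrides."""
    cls = dc if isinstance(dc, type) else type(dc)
    if dataclasses.is_dataclass(d) and not isinstance(d, type):
        # Shallow copy of the instance's field values.
        d = {f.name: getattr(d, f.name) for f in dataclasses.fields(d)}
    init_names = set(inspect.signature(cls).parameters)
    values = {k: v for k, v in d.items() if k in init_names}
    values.update(kwargs)        # explicit keyword arguments take precedence over d
    return cls(**values)

@dataclasses.dataclass
class Full:
    name: str = ""
    units: str = "1"
    comment: str = ""

@dataclasses.dataclass
class Brief:
    name: str = ""
    units: str = "1"

brief = coerce(Full(name="tas", units="K", comment="surface"), Brief, units="degC")
```

The comment field is discarded because Brief's constructor does not accept it, and the units value from kwargs replaces the one taken from the source instance.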