src.util.dataclass module
Extensions to Python dataclasses, for streamlined class definition.
- class src.util.dataclass.ClassMaker
Bases: object
Class to instantiate other classes from strings.
- class src.util.dataclass.RegexPatternBase
Bases: object
Dummy parent class for RegexPattern and ChainedRegexPattern.
- class src.util.dataclass.RegexPattern(regex, defaults=None, input_field=None, match_error_filter=None)
Bases: UserDict, RegexPatternBase
Wraps re.Pattern with more convenience methods for the use case of parsing information in a string, using a regex with named capture groups corresponding to the data fields being collected from the string.
- __init__(regex, defaults=None, input_field=None, match_error_filter=None)
Constructor.
- Parameters:
regex (str or re.Pattern) – Regex to use for string parsing. Should contain named match groups corresponding to the fields to parse.
defaults (dict) – Optional. If supplied, any fields not matched by the named match groups in regex will be set equal to their values here.
input_field (str) – Optional. If supplied, add a field with this name to the match, set equal to the contents of the input string on a successful match.
match_error_filter (bool, RegexPattern or ChainedRegexPattern) – Optional. If supplied, determines whether a ValueError is raised when the match() method fails to parse a string (see below).
- data
Key:value pairs corresponding to the contents of the matching groups from the last successful call to match(), or empty if no successful call has been made. From collections.UserDict.
- Type:
- fields
Set of fields matched by the pattern. Consists of the union of named match groups in regex, and all keys in defaults.
- Type:
- match(str_, *args)
Match str_ using Python re.fullmatch() with regex and populate the object’s fields according to the values captured by the named capture groups in regex.
- Parameters:
str_ (str) – Input string to parse.
args – Optional. Flags (as defined in Python re) to use in the re.fullmatch() method of the regex and match_error_filter (if defined).
- Raises:
RegexParseError – If match() fails to parse the input string, and the following conditions on match_error_filter are met. If match_error_filter is not supplied (default), always raise when match() fails. If match_error_filter is a bool, always/never raise. If match_error_filter is a RegexPattern or ChainedRegexPattern, attempt to match() the input string that caused the failed match against the value of match_error_filter. If it matches, do not raise an error; otherwise raise an error.
RegexSuppressedError – If match() fails to parse the input string and the above conditions involving match_error_filter are not met. One of RegexParseError or RegexSuppressedError is always raised on failure.
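The parsing behaviour described for RegexPattern can be approximated with the standard library alone. The following is a minimal sketch, not the module's implementation: parse_fields is a hypothetical helper, ValueError stands in for RegexParseError, and letting defaults fill in optional groups that did not participate in the match is an assumption about how "fields not matched" is interpreted.

```python
import re


def parse_fields(regex, str_, defaults=None, input_field=None, flags=0):
    """Hypothetical stand-in for the documented match() behaviour."""
    pattern = re.compile(regex, flags)
    m = pattern.fullmatch(str_)
    if m is None:
        # The real class raises RegexParseError or RegexSuppressedError here.
        raise ValueError(f"could not parse {str_!r}")
    # Start from the defaults, then overlay the groups that actually matched.
    fields = dict(defaults or {})
    fields.update({k: v for k, v in m.groupdict().items() if v is not None})
    if input_field is not None:
        # Record the full input string under the requested field name.
        fields[input_field] = str_
    return fields


result = parse_fields(
    r"(?P<year>\d{4})-(?P<month>\d{2})(-(?P<day>\d{2}))?",
    "2021-07",
    defaults={"day": "01"},
    input_field="raw",
)
print(result)
```

Because re.fullmatch() is used, the whole input must conform to the regex; a trailing character that the pattern does not account for is a parse failure rather than a partial match.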
- class src.util.dataclass.RegexPatternWithTemplate(regex, defaults=None, input_field=None, match_error_filter=None, template=None, log=<Logger>)
Bases: RegexPattern
Adds formatted output to RegexPattern.
- __init__(regex, defaults=None, input_field=None, match_error_filter=None, template=None, log=<Logger>)
Constructor.
- Parameters:
template (str) – Optional. Template string used by the format() method to format the contents of the match. Contents of the matched fields will be substituted using the {}-syntax of Python string formatting.
Other arguments are the same as in RegexPattern.
- format()
Return the template string, filled in with the values obtained in the last successful call to match().
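As an illustration of the template step, the sketch below uses only re and str.format to do what the text describes: parse named fields from a string, then substitute them into a {}-style template. The file-name pattern and the template are made up for the example and are not taken from the module.

```python
import re

# Hypothetical pattern and template, chosen only for illustration.
pattern = re.compile(r"(?P<model>\w+)_(?P<year>\d{4})\.nc")
template = "{year}/{model}.nc"

m = pattern.fullmatch("CESM2_1995.nc")
fields = m.groupdict()                 # {'model': 'CESM2', 'year': '1995'}
formatted = template.format(**fields)  # named fields fill the {} placeholders
print(formatted)  # 1995/CESM2.nc
```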
- clear()
Erase field values parsed from a pre-existing match.
- match(str_, *args)
Match str_ using Python re.fullmatch() with regex and populate the object’s fields according to the values captured by the named capture groups in regex.
- Parameters:
str_ (str) – Input string to parse.
args – Optional. Flags (as defined in Python re) to use in the re.fullmatch() method of the regex and match_error_filter (if defined).
- Raises:
RegexParseError – If match() fails to parse the input string, and the following conditions on match_error_filter are met. If match_error_filter is not supplied (default), always raise when match() fails. If match_error_filter is a bool, always/never raise. If match_error_filter is a RegexPattern or ChainedRegexPattern, attempt to match() the input string that caused the failed match against the value of match_error_filter. If it matches, do not raise an error; otherwise raise an error.
RegexSuppressedError – If match() fails to parse the input string and the above conditions involving match_error_filter are not met. One of RegexParseError or RegexSuppressedError is always raised on failure.
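The choice between the two exceptions can be sketched as a small decision function. Everything here is a stand-in: the two exception classes are local placeholders (their real base classes are not stated in this page), failure_exception is a hypothetical helper, a compiled re.Pattern plays the role of the filter pattern, and mapping True to "raise" and False to "suppress" is an assumption, since the text only says "always/never raise".

```python
import re


class RegexParseError(ValueError):
    """Local placeholder for the exception named in the text."""


class RegexSuppressedError(ValueError):
    """Local placeholder for the 'suppressed' exception named in the text."""


def failure_exception(str_, match_error_filter=None):
    """Pick the exception for a string that failed to parse (hypothetical)."""
    if match_error_filter is None:
        return RegexParseError(str_)  # default: always report the failure
    if isinstance(match_error_filter, bool):
        # Assumption: True means "raise", False means "suppress".
        cls = RegexParseError if match_error_filter else RegexSuppressedError
        return cls(str_)
    # Otherwise the filter is itself a pattern: a string it matches is an
    # expected non-match, so the failure is reported as suppressed.
    if match_error_filter.fullmatch(str_):
        return RegexSuppressedError(str_)
    return RegexParseError(str_)


# Example filter: hidden files are expected not to parse.
ignore_hidden = re.compile(r"\..*")
print(type(failure_exception(".DS_Store", ignore_hidden)).__name__)
print(type(failure_exception("notes.txt", ignore_hidden)).__name__)
```

A caller can then catch RegexSuppressedError separately and skip the offending string quietly, while still treating RegexParseError as a genuine problem.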
- update_defaults(d)
Update the default values used for the match with the values in d.
- class src.util.dataclass.ChainedRegexPattern(*string_patterns, defaults=None, input_field=None, match_error_filter=None)
Bases: RegexPatternBase
Class which takes an ‘or’ of multiple RegexPattern objects to parse data that may be represented as a string in one of multiple formats.
Matches are attempted on the supplied RegexPatterns in order, with the first one that succeeds determining the parsed field values. Public methods work the same as on RegexPattern.
- __init__(*string_patterns, defaults=None, input_field=None, match_error_filter=None)
Constructor.
- Parameters:
string_patterns (iterable of RegexPattern) – Individual regexes which will be tried, in order, when match() is called. Parsing will be done by the first RegexPattern whose match() succeeds.
Note
The constructor changes attributes on RegexPattern objects passed as string_patterns, so once the object is created its component RegexPattern objects shouldn’t be accessed on their own.
Other arguments and attributes are the same as in RegexPattern.
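The "first pattern that succeeds wins" rule can be illustrated with plain compiled regexes. This is a sketch of the idea only: match_first is a hypothetical helper, the date formats are invented for the example, and ValueError stands in for the module's own exceptions.

```python
import re


def match_first(patterns, str_):
    """Return (index, fields) for the first pattern that fully matches."""
    for i, pattern in enumerate(patterns):
        m = pattern.fullmatch(str_)
        if m is not None:
            return i, m.groupdict()
    raise ValueError(f"no pattern matched {str_!r}")


# Two alternative spellings of the same information, tried in order.
date_patterns = [
    re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})"),
    re.compile(r"(?P<year>\d{4})(?P<month>\d{2})"),
]
idx, fields = match_first(date_patterns, "199507")
print(idx, fields)
```

Since order decides ties, more specific patterns should generally be listed before more permissive ones.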
- property is_matched
- property data
- src.util.dataclass.NOTSET = sentinel.NotSet
Sentinel object to detect uninitialized values for fields in mdtf_dataclass() objects, for use in cases where None is a valid value for the field.
- src.util.dataclass.MANDATORY = sentinel.Mandatory
Sentinel object to mark all mdtf_dataclass() fields that do not take a default value. This is a workaround to avoid errors with non-default fields coming after default fields in the dataclass auto-generated __init__ method under inheritance: we use the second solution described in https://stackoverflow.com/a/53085935.
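The inheritance problem and the sentinel workaround can be reproduced with the standard dataclasses module. In this sketch the MANDATORY object is a local stand-in built with unittest.mock.sentinel (an assumption suggested by the repr shown above, not confirmed by this page), the classes are invented, and ValueError stands in for the module's own exception.

```python
import dataclasses
from unittest.mock import sentinel

# Local stand-in; its repr matches the documented value.
MANDATORY = sentinel.Mandatory


@dataclasses.dataclass
class Base:
    name: str = MANDATORY
    units: str = "1"


@dataclasses.dataclass
class Child(Base):
    # Declared without a default, this field would make @dataclass raise
    # TypeError, because it follows the defaulted fields inherited from Base.
    axis: str = MANDATORY

    def __post_init__(self):
        # Enforce "mandatory" at construction time instead of in the signature.
        missing = [
            f.name
            for f in dataclasses.fields(self)
            if getattr(self, f.name) is MANDATORY
        ]
        if missing:
            raise ValueError(f"missing mandatory fields: {missing}")


var = Child(name="ta", axis="Z")
print(var)
```

A separate sentinel such as NOTSET serves a different purpose: it lets code distinguish "no value given" from an explicit None.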
- src.util.dataclass.mdtf_dataclass(cls=None, **deco_kwargs)
Wrap the Python dataclass() class decorator to customize dataclasses to provide rudimentary type checking and conversion. This is hacky, since dataclasses don’t enforce type annotations for their fields. A better solution would be to use the third-party cattrs package, which has essentially the same aim.
The decorator rewrites the class’s constructor as follows:
Execute the auto-generated __init__ method from Python dataclass().
Verify that fields with MANDATORY default have been assigned values. We have to work around the usual dataclass() way of doing this, because it leads to errors in the signature of the auto-generated __init__ method under inheritance (mandatory fields can’t come after optional fields in the signature).
Execute the class’s __post_init__ method, if defined, which can do more complex type coercion and validation.
Finally, check each field’s value to see if it’s consistent with the given type information. If not, attempt to coerce it to that type, using a from_struct method on that type if it exists.
Warning
Unlike dataclass(), all fields must have a default or default_factory defined. Fields which are mandatory must have their default value set to the sentinel object MANDATORY. This is necessary in order for dataclass inheritance to work properly, and is not currently enforced when the class is decorated.
- Parameters:
cls (class) – Class to be decorated.
deco_kwargs – Optional. Keyword arguments to pass to the Python dataclass() class decorator.
- Raises:
DataclassParseError – If we attempted to construct an instance without giving values for MANDATORY fields, or if values of some fields after __post_init__ could not be coerced into the types given in their annotation.
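The final coercion step can be sketched as a decorator that wraps the generated constructor. This is an illustration under several assumptions, not the module's code: checked_dataclass and coerce_fields are hypothetical names, only annotations that are plain classes are handled, the MANDATORY check is omitted, and TypeError stands in for DataclassParseError.

```python
import dataclasses
import functools


def coerce_fields(obj):
    """Coerce each field of a dataclass instance to its annotated type."""
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        if not isinstance(f.type, type) or isinstance(value, f.type):
            continue  # skip typing constructs and values that already conform
        # Prefer a from_struct constructor when the target type provides one.
        convert = getattr(f.type, "from_struct", f.type)
        try:
            setattr(obj, f.name, convert(value))
        except (TypeError, ValueError) as exc:
            raise TypeError(f"cannot coerce {f.name}={value!r}") from exc


def checked_dataclass(cls):
    """Hypothetical decorator: dataclass() plus coercion after __post_init__."""
    cls = dataclasses.dataclass(cls)
    generated_init = cls.__init__  # already calls __post_init__ if defined

    @functools.wraps(generated_init)
    def __init__(self, *args, **kwargs):
        generated_init(self, *args, **kwargs)
        coerce_fields(self)

    cls.__init__ = __init__
    return cls


class Level:
    def __init__(self, hPa):
        self.hPa = hPa

    @classmethod
    def from_struct(cls, s):
        return cls(int(s))


@checked_dataclass
class Var:
    name: str = ""
    count: int = 0
    level: Level = "1000"  # coerced through Level.from_struct on construction


v = Var(name="ta", count="3", level="500")
print(v.count, v.level.hPa)
```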
- src.util.dataclass.is_regex_dataclass(obj)
Returns True if obj is a regex_dataclass().
- src.util.dataclass.regex_dataclass(pattern, **deco_kwargs)
Decorator combining the functionality of RegexPattern and mdtf_dataclass(): dataclass fields are parsed from a regex and coerced to appropriate classes.
Specifically, this is done via a from_string classmethod, added by this decorator, which creates dataclass instances by parsing an input string with a RegexPattern or ChainedRegexPattern. The values of all fields returned by the match() method of the pattern are passed to the __init__ method of the dataclass as kwargs.
Additionally, if the type of one or more fields is set to a class that’s also been decorated with regex_dataclass, the parsing logic for that field’s regex_dataclass will be invoked on that field’s value (i.e., a string obtained by regex matching in this regex_dataclass), and the parsed values of those fields will be supplied to this regex_dataclass constructor. This is our implementation of composition for regex_dataclasses.
Note
Unlike mdtf_dataclass(), type coercion here is done after __post_init__ for these dataclasses. This is necessary due to composition: if a regex_dataclass is being instantiated as a field of another regex_dataclass, all values being passed to it will be strings (the regex fields), and type coercion is the job of __post_init__.
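The from_string mechanism can be sketched with a small decorator built on re and dataclasses. The names with_from_string and YearMonth are hypothetical, ValueError stands in for the module's exceptions, and the composition feature described above is not reproduced here.

```python
import dataclasses
import re


def with_from_string(regex):
    """Hypothetical decorator: add a from_string classmethod to a dataclass."""
    pattern = re.compile(regex)

    def decorator(cls):
        cls = dataclasses.dataclass(cls)

        def from_string(klass, str_):
            m = pattern.fullmatch(str_)
            if m is None:
                raise ValueError(f"could not parse {str_!r}")
            # Named groups become keyword arguments of the constructor.
            return klass(**m.groupdict())

        cls.from_string = classmethod(from_string)
        return cls

    return decorator


@with_from_string(r"(?P<year>\d{4})(?P<month>\d{2})")
class YearMonth:
    year: int = 0
    month: int = 0

    def __post_init__(self):
        # All regex fields arrive as strings; convert them here.
        self.year = int(self.year)
        self.month = int(self.month)


ym = YearMonth.from_string("199507")
print(ym)
```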
- src.util.dataclass.filter_dataclass(d, dc, init=False)
Return a dict of the subset of fields or entries in d that correspond to the fields in dataclass dc.
- Parameters:
d (dict, dataclass or dataclass instance) – Object to take field values from.
dc (dataclass or dataclass instance) – Dataclass defining the set of fields that are returned. Values of fields in d that are not fields of dc are discarded.
init (bool or 'all') – Optional, default False. Controls whether init-only fields are included:
If False: Include only the fields of dc as returned by dataclasses.fields().
If True: Include only the arguments to dc’s constructor (i.e., include any init-only fields and exclude any of dc’s fields with init=False).
If ‘all’: Include the union of the above two options.
- Returns:
dict – The subset of key:value pairs from d such that the keys are included in the set of dc’s fields specified by the value of init.
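The three values of init differ only for init-only variables and fields declared with init=False. The sketch below shows the distinction for a dict input; filter_fields is a hypothetical re-implementation that reads constructor arguments from inspect.signature, which may not be how the module does it, and the Point class is invented for the example.

```python
import dataclasses
import inspect


def filter_fields(d, dc, init=False):
    """Hypothetical re-implementation for dict inputs only."""
    field_names = {f.name for f in dataclasses.fields(dc)}
    init_names = set(inspect.signature(dc.__init__).parameters) - {"self"}
    if init == "all":
        keep = field_names | init_names
    elif init:
        keep = init_names
    else:
        keep = field_names
    return {k: v for k, v in d.items() if k in keep}


@dataclasses.dataclass
class Point:
    x: int = 0
    norm: float = dataclasses.field(init=False, default=0.0)  # not an __init__ arg
    scale: dataclasses.InitVar[int] = 1                       # __init__ arg only

    def __post_init__(self, scale):
        self.norm = abs(self.x) * scale


d = {"x": 3, "norm": 9.0, "scale": 2, "color": "red"}
print(filter_fields(d, Point))              # x and norm
print(filter_fields(d, Point, init=True))   # x and scale
print(filter_fields(d, Point, init="all"))  # x, norm and scale
```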
- src.util.dataclass.coerce_to_dataclass(d, dc, **kwargs)
Given a dataclass dc (may be the class or an instance of it), and a dict, dataclass or dataclass instance d, return an instance of dc’s class with field values initialized from those in d, along with any extra values passed in kwargs.
Because this constructs a new dataclass instance, it copies field values according to the init=True logic in filter_dataclass().
- Parameters:
d (dict, dataclass or dataclass instance) – Object to take field values from.
dc (dataclass or dataclass instance) – Class to instantiate.
kwargs – Optional. If provided, override field values provided in d.
- Returns:
Instance of dataclass dc with field values populated from kwargs and d.
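A minimal sketch of this conversion, assuming d is either a dict or a dataclass instance: coerce is a hypothetical helper, and Full and Brief are invented classes. It keeps only the values the target constructor accepts and lets keyword arguments override them, as described above.

```python
import dataclasses
import inspect


def coerce(d, dc, **kwargs):
    """Hypothetical sketch for dict or dataclass-instance inputs."""
    cls = dc if isinstance(dc, type) else type(dc)
    if dataclasses.is_dataclass(d) and not isinstance(d, type):
        # Shallow copy of the source instance's field values.
        d = {f.name: getattr(d, f.name) for f in dataclasses.fields(d)}
    init_names = set(inspect.signature(cls.__init__).parameters) - {"self"}
    values = {k: v for k, v in d.items() if k in init_names}
    values.update(kwargs)  # explicit keyword arguments take precedence
    return cls(**values)


@dataclasses.dataclass
class Full:
    name: str = ""
    units: str = "1"
    long_name: str = ""


@dataclasses.dataclass
class Brief:
    name: str = ""
    units: str = "1"


full = Full(name="ta", units="K", long_name="air temperature")
brief = coerce(full, Brief, units="degC")
print(brief)
```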