src.verify_links module¶
Check output of the files returned by a run of the MDTF framework and determine if any PODs failed to generate files, as determined by non-functional html links in the output webpages.
Based on test_website by Dani Coleman, bundy@ucar.edu
-
class
src.verify_links.Link(origin, target)¶ Bases: tuple
Class representing individual links, to simplify bookkeeping.
- Attributes
origin (str) – URL of the document containing the link.
target (str) – URL referred to by the link.
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
_field_defaults= {}¶
-
_fields= ('origin', 'target')¶
-
_fields_defaults= {}¶
-
classmethod
_make(iterable)¶ Make a new Link object from a sequence or iterable
-
_replace(**kwds)¶ Return a new Link object replacing specified fields with new values
-
property
origin¶ Alias for field number 0
-
property
target¶ Alias for field number 1
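As a brief illustration of the named-tuple interface listed above, the sketch below builds a stand-in `Link` with `collections.namedtuple`. It is not imported from `src.verify_links`, and the example URLs are made up.

```python
from collections import namedtuple

# Stand-in for src.verify_links.Link; only the field names come from the docs.
Link = namedtuple("Link", ["origin", "target"])

# Hypothetical URLs, used purely for illustration.
link = Link(origin="file:///out/index.html", target="file:///out/pod_a/plot.png")

as_dict = link._asdict()                                      # field name -> value
moved = link._replace(target="file:///out/pod_a/plot2.png")   # new object, link unchanged
rebuilt = Link._make(["a.html", "b.html"])                    # build from any iterable
```

Because `Link` is a tuple, instances are hashable and can be stored in sets or used as dict keys, which is convenient when tracking links that have already been checked.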
-
class
src.verify_links.LinkParser(*, convert_charrefs=True)[source]¶ Bases: html.parser.HTMLParser
Custom subclass of HTMLParser which constructs an iterable over each <a> tag.
Adapted from https://stackoverflow.com/a/41663924.
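The general technique of subclassing `HTMLParser` to collect link targets can be sketched as follows. `HrefCollector` and its `hrefs` attribute are hypothetical names chosen for this example. The real `LinkParser` may store or expose links differently.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical stand-in for LinkParser: records the href of each <a> tag."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


parser = HrefCollector()
parser.feed('<p><a href="a.html">A</a> <a name="x">no href</a> <img src="b.png"></p>')
parser.close()
found = parser.hrefs
```

Only `<a>` tags that carry an `href` attribute contribute to `found`; anchors without one and other tags such as `<img>` are skipped.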
-
class
src.verify_links.LinkVerifier(root, verbose=False)[source]¶ Bases: object
-
static
gen_links(f, parser)[source]¶ Generator which parses the contents of an HTML file f and yields targets of all the links it contains.
Adapted from https://stackoverflow.com/a/41663924.
- Parameters
f – urllib.response object of the form returned by urlopen(): either HTTPResponse for http or https, or urllib.response.addinfourl for files.
parser – instance of LinkParser.
- Yields
- Contents of the href attribute of each <a> tag of f, as extracted by LinkParser.
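A minimal sketch of such a generator is shown below, assuming a parser that queues hrefs in a `pending` list. `HrefBuffer`, `pending`, and `iter_links` are hypothetical names, and `io.BytesIO` stands in for the object returned by `urlopen()`. The actual `gen_links` may read and decode its input differently.

```python
import io
from html.parser import HTMLParser


class HrefBuffer(HTMLParser):
    """Hypothetical parser: queues the href of each <a> tag as it is parsed."""

    def __init__(self):
        super().__init__()
        self.pending = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.pending.extend(v for k, v in attrs if k == "href" and v)


def iter_links(f, parser, encoding="utf-8"):
    """Yield link targets from a binary file-like object f, line by line."""
    for raw_line in f:
        parser.feed(raw_line.decode(encoding, errors="replace"))
        # Hand back whatever the parser found in this line before reading on.
        while parser.pending:
            yield parser.pending.pop(0)
    parser.close()  # flush any buffered input
    while parser.pending:
        yield parser.pending.pop(0)


# io.BytesIO stands in for the response object returned by urlopen().
page = io.BytesIO(b'<a href="a.html">A</a>\n<p>text</p>\n<a href="sub/b.html">B</a>\n')
links = list(iter_links(page, HrefBuffer()))
```

Feeding the parser incrementally means links are yielded as soon as they are seen, so a caller can start checking targets without holding the whole document in memory.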
-
breadth_first(root_url)[source]¶ Breadth-first search of all files linked from an initial root_url.
The search handles cycles correctly (e.g., A.html links to B.html and B.html links to A.html) and only examines files in subdirectories of root_url’s directory, so links to external sites are ignored instead of being followed across the whole internet.
- Parameters
root_url (str) – URL of an html file to start the search at.
- Returns
- list of (link_source, link_target) tuples where the file in
link_target couldn’t be found.
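The search strategy described here (a visited set to break cycles, plus a prefix check to stay under the root directory) can be sketched on an in-memory link graph. `find_missing` and the `pages` dict are assumptions made for this example, replacing the real work of opening and parsing each file. The URLs are invented.

```python
from collections import deque, namedtuple
from urllib.parse import urljoin

Link = namedtuple("Link", ["origin", "target"])


def find_missing(root_url, pages):
    """Breadth-first search over pages, a dict mapping URL -> list of hrefs.

    pages is a hypothetical stand-in for fetching files: a URL that is not
    a key of pages is treated as a missing file.
    """
    root_dir = root_url.rsplit("/", 1)[0] + "/"
    queue = deque([root_url])
    seen = {root_url}            # prevents revisiting files, so cycles terminate
    missing = []
    while queue:
        origin = queue.popleft()
        for href in pages[origin]:
            target = urljoin(origin, href)    # resolve relative links
            if not target.startswith(root_dir):
                continue                      # outside root directory: ignore
            if target not in pages:
                missing.append(Link(origin, target))
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    return missing


pages = {
    "http://example.org/out/index.html": ["pod_a/a.html", "https://other.org/x.html"],
    "http://example.org/out/pod_a/a.html": ["../index.html", "fig.png", "b.html"],
    "http://example.org/out/pod_a/b.html": ["a.html"],
}
missing = find_missing("http://example.org/out/index.html", pages)
```

In this toy graph, `a.html` and `b.html` link to each other and back to the index, yet each page is visited once; the external link is skipped, and only `fig.png` is reported.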
-
group_relative_links(missing)[source]¶ Format paths to missing linked files as relative paths, grouped by POD.
- Parameters
missing (list) – List of Link objects found by breadth_first(), whose targets correspond to missing files.
- Returns
- dict, with keys given by the short names of PODs with missing files
and values given by a list of the files that POD is missing. Missing files are listed by their path relative to the POD’s output directory.
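One way to produce a grouping of this shape is sketched below. It assumes, purely for illustration, that each POD writes into a directory named after it directly under the output root. `group_by_pod` and `root_dir_url` are hypothetical names, and the real method may determine POD names and directories differently.

```python
import posixpath
from collections import defaultdict, namedtuple
from urllib.parse import urlsplit

Link = namedtuple("Link", ["origin", "target"])


def group_by_pod(missing, root_dir_url):
    """Group missing targets by their first path component under the root.

    Assumes the (hypothetical) layout <root>/<pod_name>/<relative path>.
    """
    root_path = urlsplit(root_dir_url).path
    grouped = defaultdict(list)
    for link in missing:
        rel = posixpath.relpath(urlsplit(link.target).path, start=root_path)
        pod, _, rest = rel.partition("/")   # "pod_a/fig.png" -> ("pod_a", "fig.png")
        grouped[pod].append(rest)
    return dict(grouped)


missing = [
    Link("http://example.org/out/pod_a/a.html", "http://example.org/out/pod_a/fig.png"),
    Link("http://example.org/out/pod_a/a.html", "http://example.org/out/pod_a/model/x.png"),
    Link("http://example.org/out/pod_b/b.html", "http://example.org/out/pod_b/y.png"),
]
by_pod = group_by_pod(missing, "http://example.org/out/")
```

The resulting dict is keyed by POD name, with each value listing that POD's missing files relative to its own directory.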
-
verify_pod_links(pod_name)[source]¶ Perform search for missing linked files that were supposed to have been output by pod_name.
- Parameters
pod_name – Name of the POD to check for missing files.
- Returns
- A list of the files that POD is missing. Missing files are listed by
their path relative to the POD’s output directory.
-
verify_all_links()[source]¶ Perform search for any missing linked files from a run of the MDTF framework and collect them by POD.
- Returns
- dict, with keys given by the short names of PODs with missing files
and values given by a list of the files that POD is missing. Missing files are listed by their path relative to the POD’s output directory.
-
static