src.verify_links module

Check the output files produced by a run of the MDTF framework and determine whether any PODs failed to generate files, as indicated by broken HTML links in the output webpages.

Based on test_website by Dani Coleman, bundy@ucar.edu

class src.verify_links.Link(origin, target)

Bases: tuple

Class representing individual links, to simplify bookkeeping.

Attributes
  • origin (str) – URL of the document containing the link.

  • target (str) – URL referred to by the link.

_asdict()

Return a new OrderedDict which maps field names to their values.

_field_defaults = {}
_fields = ('origin', 'target')
_fields_defaults = {}
classmethod _make(iterable)

Make a new Link object from a sequence or iterable.

_replace(**kwds)

Return a new Link object replacing specified fields with new values.

property origin

Alias for field number 0

property target

Alias for field number 1
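The members listed above are the ones generated by collections.namedtuple, so Link behaves like a plain tuple whose two positions can also be read by name. The following sketch assumes Link is created with that factory (the module's own definition is not shown on this page) and exercises each documented member:

```python
from collections import namedtuple

# Assumption: Link is a collections.namedtuple with these two fields; the
# documented members (_asdict, _make, _replace, _fields) match that factory.
Link = namedtuple("Link", ["origin", "target"])

link = Link(origin="file:///out/index.html", target="file:///out/podA/podA.html")

print(link.origin == link[0])            # True: field number 0 is `origin`
print(Link._fields)                      # ('origin', 'target')
print(link._replace(target="file:///out/podB/podB.html").target)
# file:///out/podB/podB.html
print(Link._make(["a.html", "b.html"]))  # Link(origin='a.html', target='b.html')
print(dict(link._asdict()))
# {'origin': 'file:///out/index.html', 'target': 'file:///out/podA/podA.html'}
```

Because Link is a tuple, instances compare by value and are hashable, which is what makes them convenient for bookkeeping in lists and sets.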

class src.verify_links.LinkParser(*, convert_charrefs=True)

Bases: html.parser.HTMLParser

Custom subclass of HTMLParser that constructs an iterable over the <a> tags in a document.

Adapted from https://stackoverflow.com/a/41663924.

reset()

Reset this instance. Loses all unprocessed data.

handle_starttag(tag, attrs)
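LinkParser's handle_starttag() override is where <a> tags are picked out. A minimal sketch of such a subclass follows; the links attribute and the exact filtering are assumptions for illustration, not necessarily how src.verify_links implements it. Overriding reset() rather than __init__() works because HTMLParser.__init__() calls reset(), so the list is created on construction and cleared whenever the parser is reused.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Illustrative sketch: collect the href of every <a> tag fed to the parser."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so `links` exists before any feed().
        super().reset()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for valueless attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkParser()
parser.feed('<p><a href="podA/podA.html">A</a> <a name="x">no href</a>'
            '<A HREF="figs/t.png">fig</A></p>')
parser.close()
print(parser.links)  # ['podA/podA.html', 'figs/t.png']
```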

class src.verify_links.LinkVerifier(root, verbose=False)

Bases: object

__init__(root, verbose=False)

Initialize search for broken links.

Parameters
  • root (str) – Either a URL or a path on the local filesystem: the location of the top-level HTML file from which to begin the search.

  • verbose (bool, default False) – Set to True to print each file examined.
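Because root may be either a URL or a local path, a path has to be turned into a URL before it can be passed to urllib.request.urlopen(). The helper root_to_url() below is hypothetical (it is not part of src.verify_links) and shows one way to do this with pathlib:

```python
import os
from pathlib import Path
from urllib.parse import urlsplit

def root_to_url(root: str) -> str:
    """Hypothetical helper: turn `root` (a URL or a local path) into a URL string."""
    if urlsplit(root).scheme in ("http", "https", "file"):
        return root  # already a URL that urlopen() understands
    # Otherwise treat it as a filesystem path; Path.as_uri() needs an absolute path.
    return Path(os.path.abspath(root)).as_uri()

print(root_to_url("https://example.org/mdtf/index.html"))
# https://example.org/mdtf/index.html
print(root_to_url("/tmp/mdtf/index.html"))  # file:///tmp/mdtf/index.html (POSIX)
```

Path.as_uri() also percent-encodes characters such as spaces, so output directories with unusual names still produce valid file:// URLs. The expected values above assume a POSIX filesystem.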

static gen_links(f, parser)

Generator which parses the contents of an HTML file f and yields targets of all the links it contains.

Adapted from https://stackoverflow.com/a/41663924.

Parameters
  • f – urllib.response object of the form returned by urlopen(): either HTTPResponse for http or https, or urllib.response.addinfourl for files.

  • parser – instance of LinkParser.

Yields

Contents of the href attribute of each <a> tag of f, as extracted by LinkParser.
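A sketch of such a generator follows. It assumes the parser collects hrefs in a links list (the LinkParser here is a hypothetical stand-in) and feeds the response to it one line at a time, so links can be yielded before the whole file has been read. HTMLParser buffers incomplete tags between feed() calls, so a tag split across lines is still handled.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    # Hypothetical stand-in for the module's LinkParser: records <a href> values.
    def reset(self):
        super().reset()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def gen_links(f, parser):
    """Yield href targets from the response `f`, feeding it to `parser` line by line."""
    # HTTPResponse.headers and addinfourl.headers are both email.message.Message
    # objects, so get_content_charset() works for http(s) and file URLs alike.
    encoding = f.headers.get_content_charset() or "utf-8"
    for line in f:
        parser.feed(line.decode(encoding, errors="replace"))
        # Hand out whatever this line produced, then forget it.
        yield from parser.links
        parser.links.clear()
    parser.close()  # flush anything still buffered
    yield from parser.links
```

Usage, with a local file: `with urlopen(Path("index.html").resolve().as_uri()) as f: hrefs = list(gen_links(f, LinkParser()))`, where urlopen comes from urllib.request and Path from pathlib.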

check_one_url(link)

Get the list of URLs linked to from the current URL (if any).

Parameters

link (Link) – Instance of Link. Only the URL in link.target is examined.

Returns

Either

  1. None if link.target can’t be opened,

  2. the empty list if link.target is not an html document, or

  3. a list of links contained in link.target, expressed as Link objects.
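The three return cases map naturally onto urllib's behavior: urlopen() raises an exception for a missing file or an HTTP error, and the response headers give the content type. The following is a hedged sketch, not the module's actual code; _HrefCollector is a hypothetical stand-in for LinkParser, and each href is resolved against the page it appears on with urllib.parse.urljoin().

```python
from html.parser import HTMLParser
from typing import List, NamedTuple, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

class Link(NamedTuple):
    origin: str
    target: str

class _HrefCollector(HTMLParser):
    """Hypothetical stand-in for LinkParser: records the href of each <a> tag."""
    def reset(self):
        super().reset()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def check_one_url(link: Link) -> Optional[List[Link]]:
    url = link.target
    try:
        f = urlopen(url)
    except (OSError, ValueError):
        # URLError and HTTPError subclass OSError; ValueError means a malformed URL.
        return None                       # case 1: target can't be opened
    with f:
        if f.headers.get_content_type() != "text/html":
            return []                     # case 2: exists, but isn't an HTML page
        charset = f.headers.get_content_charset() or "utf-8"
        parser = _HrefCollector()
        parser.feed(f.read().decode(charset, errors="replace"))
        parser.close()
    # case 3: resolve each href relative to the page it was found on
    return [Link(origin=url, target=urljoin(url, href)) for href in parser.links]
```

For file:// URLs, urllib guesses the content type from the file extension, so a .png or .txt target that exists falls into case 2.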

breadth_first(root_url)

Breadth-first search of all files linked from an initial root_url.

The search correctly handles cycles (i.e., A.html links to B.html and B.html links to A.html) and only examines files in subdirectories of root_url’s directory, so that links to external sites are ignored rather than trying to trace the link structure of the whole internet.

Parameters

root_url (str) – URL of an html file to start the search at.

Returns

list of (link_source, link_target) tuples where the file in link_target couldn’t be found.
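The traversal described above can be sketched with a FIFO queue and a set of already-queued targets. In this illustration (not the module's code), check_one_url is passed in as a callable following the return convention documented for LinkVerifier.check_one_url(), which makes the search testable without a web server; stripping #fragment suffixes is also an assumption. A target linked from several pages is reported only once, with the first origin found.

```python
from collections import deque
from typing import Callable, List, NamedTuple, Optional
from urllib.parse import urldefrag

class Link(NamedTuple):
    origin: str
    target: str

def breadth_first(root_url: str,
                  check_one_url: Callable[[Link], Optional[List[Link]]]) -> List[Link]:
    # Only follow links that stay under the directory containing root_url.
    root_dir = root_url.rsplit("/", 1)[0] + "/"
    queue = deque([Link(origin=root_url, target=root_url)])
    seen = {root_url}            # every target ever queued; this is what breaks cycles
    missing = []
    while queue:
        link = queue.popleft()   # FIFO order gives breadth-first traversal
        found = check_one_url(link)
        if found is None:        # target couldn't be opened: record the broken link
            missing.append(link)
            continue
        for new_link in found:
            target = urldefrag(new_link.target).url   # "a.html#sec" -> "a.html"
            if target in seen or not target.startswith(root_dir):
                continue         # already queued, or outside the output tree
            seen.add(target)
            queue.append(Link(origin=new_link.origin, target=target))
    return missing

# A tiny fake site: index <-> podA form a cycle, podB's page is missing,
# and one link points outside the output directory.
ROOT = "http://example.org/out/index.html"
A = "http://example.org/out/podA/podA.html"
GONE = "http://example.org/out/podB/podB.html"
site = {
    ROOT: [Link(ROOT, A), Link(ROOT, GONE), Link(ROOT, "https://www.example.com/x.html")],
    A: [Link(A, ROOT), Link(A, A + "#top")],
}
print(breadth_first(ROOT, lambda link: site.get(link.target)))
# [Link(origin='http://example.org/out/index.html', target='http://example.org/out/podB/podB.html')]
```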

group_relative_links(missing)

Format paths to missing linked files as relative paths, grouped by POD.

Parameters

missing (list) – List of Link objects found by breadth_first(), whose targets correspond to missing files.

Returns

dict, with keys given by the short names of PODs with missing files and values given by a list of the files that POD is missing. Missing files are listed by their path relative to the POD’s output directory.
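Grouping by POD depends on how the output tree is laid out. The sketch below assumes each POD writes into a subdirectory named after it, directly under the directory containing the top-level page (<root dir>/<POD name>/...), and takes root_url as an explicit argument rather than reading it from the LinkVerifier instance; both are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Link(NamedTuple):
    origin: str
    target: str

def group_relative_links(missing: List[Link], root_url: str) -> Dict[str, List[str]]:
    # Directory containing the top-level page, e.g. "file:///out/".
    root_dir = root_url.rsplit("/", 1)[0] + "/"
    grouped: Dict[str, List[str]] = defaultdict(list)
    for link in missing:
        # breadth_first() only follows targets under root_dir, so strip that prefix.
        relative = link.target[len(root_dir):]
        # Assumed layout: "<pod>/<path within the POD's output directory>".
        pod, _, path = relative.partition("/")
        grouped[pod].append(path)
    return dict(grouped)

missing = [
    Link("file:///out/podA/podA.html", "file:///out/podA/model/fig1.png"),
    Link("file:///out/podA/podA.html", "file:///out/podA/obs/fig2.png"),
    Link("file:///out/podB/podB.html", "file:///out/podB/table.html"),
]
print(group_relative_links(missing, "file:///out/index.html"))
# {'podA': ['model/fig1.png', 'obs/fig2.png'], 'podB': ['table.html']}
```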

verify_pod_links(pod_name)

Perform search for missing linked files that were supposed to have been output by pod_name.

Parameters

pod_name – Name of the POD to check for missing files.

Returns

A list of the files that POD is missing. Missing files are listed by their path relative to the POD’s output directory.
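verify_pod_links() can be read as the same search started from a single POD's page. In the sketch below the location of that page, <root dir>/<pod>/<pod>.html, is an assumption, and breadth_first and group_relative_links are passed in as callables (rather than called as methods) so the composition can be checked with stubs.

```python
from typing import Callable, Dict, List
from urllib.parse import urljoin

def verify_pod_links(root_url: str, pod_name: str,
                     breadth_first: Callable[[str], list],
                     group_relative_links: Callable[[list], Dict[str, List[str]]]) -> List[str]:
    # Assumed location of the POD's own page, relative to the top-level page.
    pod_url = urljoin(root_url, f"{pod_name}/{pod_name}.html")
    missing = breadth_first(pod_url)
    # Keep only this POD's entry; no entry means nothing is missing.
    return group_relative_links(missing).get(pod_name, [])

# Stubs standing in for the real search and grouping steps.
visited = []

def fake_search(url):
    visited.append(url)
    return ["<broken link>"]

def fake_group(missing):
    return {"podA": ["model/fig1.png"]} if missing else {}

print(verify_pod_links("file:///out/index.html", "podA", fake_search, fake_group))
# ['model/fig1.png']
print(visited)  # ['file:///out/podA/podA.html']
```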

verify_all_links()

Perform search for any missing linked files from a run of the MDTF framework and collect them by POD.

Returns

dict, with keys given by the short names of PODs with missing files and values given by a list of the files that POD is missing. Missing files are listed by their path relative to the POD’s output directory.
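A caller, such as a post-run check, might turn the dict returned by verify_all_links() into a readable report and an exit status. report_missing() below is a hypothetical helper, not part of src.verify_links:

```python
import sys
from typing import Dict, List, Optional, TextIO

def report_missing(missing_by_pod: Dict[str, List[str]],
                   stream: Optional[TextIO] = None) -> int:
    """Hypothetical helper: print missing files by POD; return 0 if none, else 1."""
    if stream is None:
        stream = sys.stdout
    if not missing_by_pod:
        print("All linked files found.", file=stream)
        return 0
    for pod in sorted(missing_by_pod):
        files = missing_by_pod[pod]
        print(f"{pod}: {len(files)} missing file(s)", file=stream)
        for path in files:
            print(f"    {path}", file=stream)
    return 1

status = report_missing({"podB": ["table.html"], "podA": ["model/fig1.png", "obs/fig2.png"]})
# podA: 2 missing file(s)
#     model/fig1.png
#     obs/fig2.png
# podB: 1 missing file(s)
#     table.html
print(status)  # 1
```

In a script this could be used as sys.exit(report_missing(verifier.verify_all_links())), where verifier is a LinkVerifier instance, so that a run with missing output fails visibly.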


©2020, Model Diagnostics Task Force.