Add logging to indicate mismatch between HTML spec version and HTML dumps version
Our extraction logic is generally only correct for a given HTML spec version -- e.g., HTML 2.5 changed how different filetypes are identified in the DOM. Most if not all things should be stable from version to version (breaking changes should be rare), but it would probably be good for our code to have a hard-coded parameter recording which HTML spec version it was built for. That parameter would be compared against the HTML spec version in the article HTML to make sure they match, and maybe a warning message would be emitted on a mismatch so folks know there may be errors.
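
A rough sketch of what the check could look like, using only the Python standard library. Everything named here is hypothetical: `SUPPORTED_SPEC_VERSION`, `VERSION_META_PROPERTY`, `find_spec_version`, and `check_spec_version` are illustrative names, and the assumption that the article HTML carries its spec version in a `<meta property="..." content="...">` tag should be verified against the actual dumps before relying on it.

```python
import logging
from html.parser import HTMLParser

logger = logging.getLogger(__name__)

# Hypothetical: the spec version this extraction code was written against.
SUPPORTED_SPEC_VERSION = "2.5.0"
# Assumption: the article HTML exposes its spec version in a <meta> tag
# whose "property" attribute has this value. Adjust to match the real dumps.
VERSION_META_PROPERTY = "mw:htmlVersion"


class _VersionFinder(HTMLParser):
    """Records the content of the first matching <meta> tag."""

    def __init__(self):
        super().__init__()
        self.version = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta" and self.version is None:
            attr_map = dict(attrs)
            if attr_map.get("property") == VERSION_META_PROPERTY:
                self.version = attr_map.get("content")


def find_spec_version(article_html):
    """Return the spec version string found in the HTML, or None."""
    finder = _VersionFinder()
    finder.feed(article_html)
    return finder.version


def check_spec_version(article_html, expected=SUPPORTED_SPEC_VERSION):
    """Log a warning and return False if the versions do not match."""
    found = find_spec_version(article_html)
    if found is None:
        logger.warning(
            "No HTML spec version found in article; code was built for %s.",
            expected,
        )
        return False
    if found != expected:
        logger.warning(
            "HTML spec version mismatch: article uses %s, code was built "
            "for %s. Extraction results may contain errors.",
            found,
            expected,
        )
        return False
    return True
```

This version requires an exact string match. Since breaking changes should be rare, a looser comparison (for example, on the major and minor components only) might be the better choice once we know how the spec is versioned in practice.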