Revision ID function returns the parent revision ID instead of the current revision ID
Relevant code: https://gitlab.wikimedia.org/repos/research/html-dumps/-/blob/main/src/mwparserfromhtml/parse/article.py?ref_type=heads#L66
Example HTML showing what is needed. We've been extracting 1175301841, but that's the parent revision ID (from the `dc:replaces` link). What we actually need is 1221574718, which is in the `about` attribute of the original `<html>` tag.
...
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/" about="https://en.wikipedia.org/wiki/Special:Redirect/revision/1221574718"><head prefix="mwr: https://en.wikipedia.org/wiki/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:pageId" content="4481033"/><meta property="mw:pageNamespace" content="0"/><link rel="dc:replaces" resource="mwr:revision/1175301841"/><meta property="mw:revisionSHA1" content="3c6beae69a9ceeb3da92cf6c69f418d59a155dca"/><meta property="dc:modified" content="2024-04-30T18:41:07.000Z"/><meta property="mw:htmlVersion" content="2.8.0"/><meta property="mw:html:version" content="2.8.0"/><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Renaissance_in_the_Low_Countries"/><base href="//en.wikipedia.org/wiki/"/><title>Renaissance in the Low Countries</title>
...