More holistic approach to "tables"
Right now, we're trying to identify all the table types and make them into individual instances. There are many types, however, and it isn't clear that this strategy will achieve complete coverage, which is important for deciding what to remove from plaintext. I propose switching to using the `role` attribute alongside a few core classes. I think this would mean:
- class Wikitable: core content tables and nothing else (`<table class="wikitable...`).
- class Infobox: infoboxes and nothing else (`<table class="infobox...`).
- class Note: hatnotes, article stub boxes, disambiguation boxes, and anything else that uses `role=note`. These can be various types of tags (`div`, `table`), and I don't specify a list of classes that they must match, to allow for edge cases.
- class Navigation: navboxes, portalboxes, sister project boxes, and anything else that uses `role=navigation`. These can be various types of tags (`div`, `table`), and I don't specify a list of classes that they must match, to allow for edge cases.
- class Message: message boxes in all their other forms (`<table>` with one of these classes: `ambox`, `fmbox`, `ombox`, `imbox`, `cmbox`, `tmbox`). This one gets tricky though: ideally I'd like to also include anything else that uses `role=presentation`, but in practice that role seems to be used for other things too. For instance, there's a `<table role="presentation">` nested within article-stub boxes, which would make them a Message nested within a Note? Also see the edge case below, which I presume happens in other forms too. And while `dmbox` would seem to fit this based on its class name, it is actually just a `div` element and is considered a Note.
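To make the proposed taxonomy concrete, here is a minimal sketch of the matching rules as a pure function. The function name, the precedence order (roles first, then message-box classes, then `infobox`, then `wikitable`), and matching on whole class tokens are all my assumptions for illustration, not existing code in the library.

```python
from typing import Iterable, Optional

# Message-box classes listed in the proposal above.
MESSAGE_BOX_CLASSES = {"ambox", "fmbox", "ombox", "imbox", "cmbox", "tmbox"}


def classify_table_like(
    tag: str, classes: Iterable[str], role: Optional[str] = None
) -> Optional[str]:
    """Return the proposed category for an element, or None if it matches none.

    Hypothetical helper: the precedence order below is an assumption.
    """
    class_set = set(classes)

    # Note and Navigation match on role alone, for any tag (div, table, ...).
    if role == "note":
        return "Note"
    if role == "navigation":
        return "Navigation"

    # The remaining categories are restricted to <table> elements.
    if tag == "table":
        if class_set & MESSAGE_BOX_CLASSES:
            return "Message"
        if "infobox" in class_set:
            return "Infobox"
        if "wikitable" in class_set:
            return "Wikitable"

    # role=presentation is deliberately not mapped to Message here,
    # since that role is used for other things too.
    return None
```

Under these assumptions, a `dmbox` div with `role=note` comes out as a Note, and a bare `<table role="presentation">` stays unclassified rather than being treated as a Message.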
Note: alternatively, we could keep it mostly as it is and only introduce the roles as a filter in the plaintext functions. I prefer the full refactoring though: I think it's more consistent and therefore simpler, and I don't like the messiness of the current taxonomy of table-like classes anyway.
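For comparison, the lighter alternative (roles used only as a plaintext filter) could look roughly like the sketch below. It uses only the standard library's `html.parser`; the class and function names and the choice of which roles to skip are assumptions for illustration, and it assumes well-formed HTML with matching end tags.

```python
from html.parser import HTMLParser

# Assumption: these are the roles whose content should be dropped from plaintext.
SKIPPED_ROLES = {"note", "navigation"}
# Void elements have no end tag, so they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RoleFilteredText(HTMLParser):
    """Collects text, skipping everything inside elements with a skipped role."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        role = dict(attrs).get("role")
        # Start skipping at a matching role; keep counting nested tags while skipping.
        if self._skip_depth or role in SKIPPED_ROLES:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if self._skip_depth and tag not in VOID_TAGS:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def plaintext_without_roles(html: str) -> str:
    parser = RoleFilteredText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

This keeps the existing class taxonomy untouched, at the cost of the role logic living only in the plaintext path rather than in the element classes themselves.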
Edge-cases that wouldn't be covered:
- https://en.wikipedia.org/wiki/1984%E2%80%9385_FIS_Cross-Country_World_Cup#Overall_standings has a `role=presentation` table that has no messages in it and instead just holds the `Men's standings` and `Women's standings` sections+tables. The tables underneath it don't have `class=wikitable` either.