Port the HTML table filter from section topics
- copy the generic dataframe filter function from https://gitlab.wikimedia.org/repos/structured-data/section-topics/-/blob/a11b6e70f2b00d039b05715167545d6abc284717/section_topics/pipeline.py#L78
- call `F.lower` on the `section_title` column of the passed table filter dataframe to match against `target_heading` of the source one
- select & rename the passed table filter columns to comply with the source ones
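The steps above can be sketched in plain Python to show the intended data flow. This is not the ported section-topics function. It assumes a hypothetical schema: source rows carry `wiki_db` and an already-lowercased `target_heading`, and table filter rows carry `wiki_db` and a mixed-case `section_title`. It also assumes the filter drops matching source rows, i.e. it behaves like a left anti join.

```python
"""Plain-Python sketch of the table filter steps (hypothetical schema).

Assumptions, not taken from the MR:
  - source rows have ``wiki_db`` and ``target_heading`` (already lowercased)
  - table filter rows have ``wiki_db`` and ``section_title`` (mixed case)
  - the filter DROPS source rows whose key appears in the filter table
"""
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def normalize_table_filter(filter_rows: Iterable[Row]) -> List[Row]:
    """Lowercase ``section_title`` and rename it to match the source column."""
    return [
        {"wiki_db": row["wiki_db"], "target_heading": row["section_title"].lower()}
        for row in filter_rows
    ]


def apply_filter(
    source_rows: Iterable[Row],
    filter_rows: Iterable[Row],
    keys: Tuple[str, ...] = ("wiki_db", "target_heading"),
) -> List[Row]:
    """Keep only source rows with no matching key in the filter (left anti join)."""
    blocked = {tuple(row[k] for k in keys) for row in filter_rows}
    return [row for row in source_rows if tuple(row[k] for k in keys) not in blocked]


if __name__ == "__main__":
    source = [
        {"wiki_db": "enwiki", "target_heading": "career", "page_id": "1"},
        {"wiki_db": "enwiki", "target_heading": "filmography", "page_id": "2"},
        {"wiki_db": "itwiki", "target_heading": "filmografia", "page_id": "3"},
    ]
    table_filter = [
        {"wiki_db": "enwiki", "section_title": "Filmography"},
        {"wiki_db": "itwiki", "section_title": "Discografia"},
    ]
    kept = apply_filter(source, normalize_table_filter(table_filter))
    print([row["page_id"] for row in kept])  # ['1', '3']
```

In PySpark the same steps would map onto `F.lower` for the case normalization, `DataFrame.select` with `Column.alias` for the renaming, and `DataFrame.join(..., how="left_anti")` for the filtering itself.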
Note that this MR also introduces changes needed to make the Spark scripts work with Spark 3 on stat boxes.
Bug: T330841
Edited by Marco Fossati