filter main pages by QID from lead images

Marco Fossati requested to merge T325629 into main

Pass a list of Wikidata QIDs to filter out of the lead images dataframe. The default value is FREQUENTLY_UPDATED_PAGE_QIDS = ['Q5296'], which filters out main pages.

Also pull the hardcoded QID threshold value out of the existing filter.
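
For illustration only, the exclusion described above can be sketched in plain Python over a list of row dicts. This is not the code in this merge request: the helper name `filter_qids`, its signature, and the sample rows are hypothetical, and the real job operates on a Spark dataframe, where the same exclusion would typically be written with a negated `Column.isin`. Only the constant name and its default value come from the description.

```python
from typing import Iterable, List, Optional

# Default taken from the description above: Q5296 is the QID shared by main pages.
FREQUENTLY_UPDATED_PAGE_QIDS = ['Q5296']


def filter_qids(rows: Iterable[dict], qids: Optional[List[str]] = None) -> List[dict]:
    """Drop every row whose item_id is in qids.

    Hypothetical helper for illustration; falls back to
    FREQUENTLY_UPDATED_PAGE_QIDS when no list is passed.
    """
    excluded = set(FREQUENTLY_UPDATED_PAGE_QIDS if qids is None else qids)
    return [row for row in rows if row['item_id'] not in excluded]


# Made-up rows, not taken from the real dataset.
rows = [
    {'page_id': 1, 'item_id': 'Q5296'},
    {'page_id': 2, 'item_id': 'Q42'},
]

kept = filter_qids(rows)                   # default list: main pages are dropped
unfiltered = filter_qids(rows, [])         # empty list: nothing is dropped
custom = filter_qids(rows, ['Q42'])        # caller-supplied list replaces the default
```

Treating `None` as "use the default" and an empty list as "filter nothing" keeps the two cases distinct, which is one possible design; the merge request may handle this differently.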

# Run the job from this branch
python image_suggestions/commonswiki_file.py T325629 2024-08-12 64
# Then, in a Spark session, load the production and dev outputs for the same snapshot
prod = spark.read.table('analytics_platform_eng.image_suggestions_lead_image_data').where('snapshot="2024-08-12"')
dev = spark.read.table('T325629.image_suggestions_lead_image_data').where('snapshot="2024-08-12"')

prod.where('item_id="Q5296"').toPandas()
       page_id item_id                                     tag  score       found_on    snapshot
0     59905123   Q5296  image.linked.from.wikipedia.lead_image      1      [dinwiki]  2024-08-12
1      8397621   Q5296  image.linked.from.wikipedia.lead_image      2      [mrjwiki]  2024-08-12
2     60126046   Q5296  image.linked.from.wikipedia.lead_image      1  [brwikiquote]  2024-08-12
3      1145429   Q5296  image.linked.from.wikipedia.lead_image      5      [wuuwiki]  2024-08-12
4       997212   Q5296  image.linked.from.wikipedia.lead_image      1      [chywiki]  2024-08-12
..         ...     ...                                     ...    ...            ...         ...
291    1721280   Q5296  image.linked.from.wikipedia.lead_image      2      [novwiki]  2024-08-12
292     714512   Q5296  image.linked.from.wikipedia.lead_image     40       [euwiki]  2024-08-12
293  121293459   Q5296  image.linked.from.wikipedia.lead_image     10       [ltwiki]  2024-08-12
294      81454   Q5296  image.linked.from.wikipedia.lead_image      5       [kswiki]  2024-08-12
295    1382935   Q5296  image.linked.from.wikipedia.lead_image     11  [tawikiquote]  2024-08-12

[296 rows x 6 columns]

dev.where('item_id="Q5296"').toPandas()
Empty DataFrame
Columns: [page_id, item_id, tag, score, found_on, snapshot]
Index: []

Bug: T325629
