
T404196 Adding DAGs that lead to measurements of WD entity usage

Contributor checklist

Description

  • T404196: The following three DAGs calculate the usage of Wikidata entities on pages by wiki, the usage of Wikidata entities across all wikis, and the number of pages using Wikidata entities by wiki.
    • DAG ID: wd_entity_usage_by_wiki_monthly_dag
    • Destination: wmde.wd_entity_usage_by_wiki_monthly
    • DAG ID: wd_entity_usage_distinct_monthly_dag
    • Destination: wmde.wd_entity_usage_distinct_monthly
    • DAG ID: wiki_page_wd_entity_usage_monthly_dag
    • Destination: wmde.wiki_page_wd_entity_usage_monthly
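As a rough illustration of what the three DAGs compute, here is a minimal Python sketch over in-memory rows. It is not the SQL the DAGs run. The `EntityUsage` row shape and the three helper functions are hypothetical. The sketch also assumes that a "total" counts every page/entity usage row and that a "distinct" count is the number of unique entity IDs. If the source data records several rows per page and entity (for example, one per usage aspect), the real totals would be larger.

```python
from collections import defaultdict
from typing import NamedTuple


class EntityUsage(NamedTuple):
    """Hypothetical usage row: a page on a wiki that uses a Wikidata entity."""
    wiki: str
    page_id: int
    entity_id: str


def usage_by_wiki(rows):
    """Per wiki: (usage rows, distinct entities), like wd_entity_usage_by_wiki_monthly."""
    totals = defaultdict(int)
    distinct = defaultdict(set)
    for r in rows:
        totals[r.wiki] += 1
        distinct[r.wiki].add(r.entity_id)
    return {w: (totals[w], len(distinct[w])) for w in totals}


def usage_across_wikis(rows):
    """All wikis combined: (usage rows, distinct entities), like wd_entity_usage_distinct_monthly."""
    rows = list(rows)
    return len(rows), len({r.entity_id for r in rows})


def pages_by_wiki(rows):
    """Per wiki: distinct pages using at least one entity, like wiki_page_wd_entity_usage_monthly."""
    pages = defaultdict(set)
    for r in rows:
        pages[r.wiki].add(r.page_id)
    return {w: len(p) for w, p in pages.items()}


rows = [
    EntityUsage("enwiki", 1, "Q42"),
    EntityUsage("enwiki", 1, "Q64"),
    EntityUsage("enwiki", 2, "Q42"),
    EntityUsage("dewiki", 7, "Q42"),
]
print(usage_by_wiki(rows))       # {'enwiki': (3, 2), 'dewiki': (1, 1)}
print(usage_across_wikis(rows))  # (4, 2)
print(pages_by_wiki(rows))       # {'enwiki': 2, 'dewiki': 1}
```

Q42 is used on both wikis, so it counts once in the cross-wiki distinct figure but once per wiki in the per-wiki figures.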

Note: I'm also including a DAG for wiki_wd_infobox_usage that will be used once one of its underlying tables is available. This DAG will not be tested or activated as part of this MR. It is currently broken because there is not yet a solution for running a monthly DAG on a dataset that is partitioned weekly.
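To make the weekly/monthly mismatch behind the wiki_wd_infobox_usage DAG concrete, the sketch below lists the weekly partitions that overlap a given month and flags the ones that straddle a month boundary. It assumes, hypothetically, that weekly partitions are keyed by the date their week starts and that weeks start on Monday. `weeks_overlapping_month` is an illustrative helper, not part of this MR.

```python
from datetime import date, timedelta


def weeks_overlapping_month(year: int, month: int, week_start_weekday: int = 0):
    """List weekly partitions overlapping a month as (week_start, fully_inside) pairs.

    Hypothetical assumption: partitions are keyed by the week's start date and
    weeks start on week_start_weekday (0 = Monday, as in date.weekday()).
    """
    first = date(year, month, 1)
    # First day of the next month: day 1 + 32 days always lands in the next month.
    next_first = (first + timedelta(days=32)).replace(day=1)
    # Step back to the week start on or before the 1st of the month.
    start = first - timedelta(days=(first.weekday() - week_start_weekday) % 7)
    weeks = []
    while start < next_first:
        end = start + timedelta(days=6)  # last day of this week (inclusive)
        weeks.append((start, start >= first and end < next_first))
        start += timedelta(days=7)
    return weeks


for week_start, fully_inside in weeks_overlapping_month(2025, 10):
    print(week_start, "full" if fully_inside else "partial")
```

For October 2025 this prints five weeks (2025-09-29 through 2025-10-27), and the first and last are partial. A monthly run therefore cannot simply take "all weekly partitions in the month" without deciding how to treat those boundary weeks.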

Test outputs

Destination table summary
  • wmde.wd_entity_usage_by_wiki_monthly
    • month (DATE)
    • wiki (STRING)
    • total_wd_entities_used (BIGINT)
    • total_distinct_wd_entities_used (BIGINT)
  • wmde.wd_entity_usage_distinct_monthly
    • month (DATE)
    • total_wd_entities_used (BIGINT)
    • total_distinct_wd_entities_used (BIGINT)
  • wmde.wiki_page_wd_entity_usage_monthly
    • month (DATE)
    • wiki (STRING)
    • total_pages_using_wd_entities (BIGINT)

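The three destination tables are derived from the same usage data, so a few invariants between them could serve as a quick sanity check on test outputs. The sketch below is hypothetical (`check_month` and its input shapes are illustrative). It assumes that `total_wd_entities_used` counts individual usage rows, that the distinct columns count unique entity IDs, and that all three tables are built from the same monthly snapshot with the same filters. Under other definitions some of these checks would not hold.

```python
def check_month(by_wiki, across, pages):
    """Return the invariants violated for one month; an empty list means consistent.

    by_wiki: {wiki: (total_wd_entities_used, total_distinct_wd_entities_used)}
    across:  (total_wd_entities_used, total_distinct_wd_entities_used) for all wikis
    pages:   {wiki: total_pages_using_wd_entities}
    """
    problems = []
    total_all, distinct_all = across

    # Usage rows are additive across wikis.
    if sum(total for total, _ in by_wiki.values()) != total_all:
        problems.append("per-wiki totals do not sum to the cross-wiki total")

    # An entity used on several wikis is counted once overall, so the overall
    # distinct count lies between the largest and the sum of per-wiki counts.
    per_wiki_distinct = [distinct for _, distinct in by_wiki.values()]
    if per_wiki_distinct and not (
        max(per_wiki_distinct) <= distinct_all <= sum(per_wiki_distinct)
    ):
        problems.append("cross-wiki distinct count outside [max, sum] of per-wiki counts")

    for wiki, (total, distinct) in by_wiki.items():
        # Distinct entities can never exceed usage rows.
        if distinct > total:
            problems.append(f"{wiki}: more distinct entities than usages")

    for wiki, n_pages in pages.items():
        # Every counted page contributes at least one usage row on that wiki.
        if n_pages > by_wiki.get(wiki, (0, 0))[0]:
            problems.append(f"{wiki}: more pages than entity usages")

    return problems


print(check_month({"enwiki": (3, 2), "dewiki": (1, 1)}, (4, 2), {"enwiki": 2, "dewiki": 1}))  # []
```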
Test screenshots

Note: I'm having issues running the SQLOperator in dev Airflow instances, but the screenshots below show that the sensors succeed for input data that is available.

  • wd_entity_usage_by_wiki_monthly

[Screenshot from 2025-10-01 15:59:26]

  • wd_entity_usage_distinct_monthly

[Screenshot from 2025-10-01 15:59:42]

  • wiki_page_wd_entity_usage_monthly

[Screenshot from 2025-10-01 15:59:54]
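The screenshots cover the sensors, which wait for input data before the SQL step runs. As a loose model of that behaviour (not Airflow's implementation), the sketch below pokes a condition every `poke_interval_s` seconds until it is true or a timeout expires. This is similar in spirit to the `poke_interval` and `timeout` settings on Airflow sensors. The `wait_for` helper, the `FakeClock` class and the 120-second scenario are all hypothetical.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    timeout_s: float,
    poke_interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poke `condition` until it returns True (-> True) or timeout_s elapses (-> False)."""
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poke_interval_s)


class FakeClock:
    """Simulated clock so the example runs instantly instead of really sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Hypothetical scenario: the input data lands 120 simulated seconds in.
clock = FakeClock()
ready = wait_for(lambda: clock.now >= 120, timeout_s=600, poke_interval_s=60,
                 clock=clock.time, sleep=clock.sleep)
print(ready, clock.now)  # True 120.0
```

Injecting the clock and sleep functions keeps the poke loop testable without waiting in real time.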

Edited by Andrew McAllister (WMDE)
