New job: scrape and aggregate page summaries

Note

Reviewers: Please see REVIEW_AIRFLOW_MRS.md for directions on what to check for.

Contributor checklist

Description

  • T418442
    • wiki_page_cite_references_monthly: Monthly job to scrape Enterprise snapshots, summarize per-page usage of Cite references, and aggregate by wiki.
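As a rough illustration of the "aggregate by wiki" step described above, here is a minimal Python sketch. It is not the job's actual implementation, which this description does not show. The function name `aggregate_by_wiki`, the plain-dict row shape, and the toy input rows are hypothetical. The ratio definitions are inferred from the output column names, and they appear consistent with the ffwiki sample row under Test outputs.

```python
from collections import defaultdict


def aggregate_by_wiki(page_rows):
    """Roll per-page summaries up to one row per (database, snapshot_date).

    Hypothetical helper: only a handful of the real output columns are
    computed, and their definitions are assumptions based on their names.
    """
    totals = defaultdict(
        lambda: {"page_count": 0, "pages_with_refs_count": 0, "ref_count": 0}
    )
    for row in page_rows:
        t = totals[(row["database"], row["snapshot_date"])]
        t["page_count"] += 1
        t["ref_count"] += row["ref_count"]
        if row["ref_count"] > 0:
            t["pages_with_refs_count"] += 1

    result = []
    for (database, snapshot_date), t in sorted(totals.items()):
        pages = t["page_count"]
        with_refs = t["pages_with_refs_count"]
        refs = t["ref_count"]
        result.append({
            "dbname": database,
            "snapshot_date": snapshot_date,
            **t,
            "proportion_of_pages_with_refs": with_refs / pages,
            "ref_count_per_page": refs / pages,
            # Avoid dividing by zero for wikis where no page has a reference.
            "ref_count_per_page_having_ref": refs / with_refs if with_refs else None,
        })
    return result


# Made-up per-page rows, for illustration only (not real job output).
pages = [
    {"database": "ffwiki", "snapshot_date": "2026-03-02", "ref_count": 0},
    {"database": "ffwiki", "snapshot_date": "2026-03-02", "ref_count": 3},
    {"database": "ffwiki", "snapshot_date": "2026-03-02", "ref_count": 1},
    {"database": "ffwiki", "snapshot_date": "2026-03-02", "ref_count": 0},
]
summary = aggregate_by_wiki(pages)
```

In a sketch like this, the per-page table keeps one row per page and the per-wiki table is derived entirely from it, so the aggregate metrics can be recomputed without re-scraping the snapshot.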

Test outputs

Please describe the outputs of the tests that were run.

Destination tables summary

If applicable, include sanitized outputs of DAG jobs so that the results can be compared against expected outputs.

dbname  snapshot_date  error_key                     error_count
ffwiki  2026-03-02     cite_error_ref_too_many_keys  4

dbname  snapshot_date  transclusion_name  transclusion_count
ffwiki  2026-03-02     Efn                67

dbname  snapshot_date  transclusion_name   transclusion_count
ffwiki  2026-03-02     Officeholder table  9

dbname  snapshot_date  transclusion_name  transclusion_count
ffwiki  2026-03-02     Cite quran         6

dbname  ffwiki
snapshot_date  2026-03-02
identical_refs_count  1151
identical_refs_on_pages_with_25_or_less_refs_average  1150.9946
identical_refs_on_pages_with_over_25_refs_average  1.4634147
identical_refs_on_pages_with_over_25_refs_count  60
list_defined_ref_per_page_having_ref  4.4535493E-4
list_defined_ref_sum  9
max_ref_reuse_average  1.6768292
nested_ref_sum  3
page_count  26180
pages_with_automatically_named_refs_count  1167
pages_with_identical_refs_and_over_25_refs_count  22
pages_with_identical_refs_count  696
pages_with_multiple_reflists_count  2176
pages_with_named_refs_count  4908
pages_with_nested_refs_count  2
pages_with_over_25_refs_count  41
pages_with_ref_reuse_count  820
pages_with_refs_count  11227
pages_with_similar_refs_count  176
pages_with_subrefs_count  0
proportion_of_named_refs_uniquely_named_average  0.92662185
proportion_of_pages_with_identical_refs  0.06199341
proportion_of_pages_with_nested_refs  1.7814199E-4
proportion_of_pages_with_similar_refs  0.015676495
proportion_of_pages_with_refs  0.42883882
proportion_of_refs_from_transclusion  0.010996527
proportion_of_refs_having_transclusion  0.67252445
proportion_of_refs_named_average  0.299617
proportion_of_refs_reused_average  0.024091247
ref_by_transclusion_average  0.037231673
ref_by_transclusion_count  418
ref_count  38012
ref_count_per_page  1.451948
ref_count_per_page_having_ref  3.3857665
reflist_count  16228
reflists_per_page_having_ref  1.445444
refs_with_solely_transclusion_count  24147
refs_with_transclusions_count  25564
similar_refs_count  312
subrefs_sum  0
transclusion_average  1.8700917
transclusion_sum  48959
wikitext_length_average

automatic_ref_name_usages_count  0
automatic_ref_names_count  0
html_length  2775
identical_ref_count  0
list_defined_ref_count  0
main_ref_count  0
nested_ref_count  0
page_id  27460
page_namespace  0
page_title  The Fall-Down Artist
potential_ref_transclusions  {}
potential_subref_transclusions  {}
potential_transclusions_with_top_level_refs  {}
ratio_subrefs_to_main_refs  0.0
ref_by_top_ref_transclusion_count  0
ref_by_transclusion_count  0
ref_count  0
ref_error_counts_by_type  {}
ref_reuse_count  0
ref_reuse_counts  []
ref_with_name_count  0
reflist_count  0
reflist_item_count  0
reflist_subref_item_count  0
refs_with_solely_transclusion_count  0
refs_with_transclusions_count  0
rev_id  101582
rev_timestamp  NULL
similar_ref_count  0
subref_count  0
subref_error_counts_by_type  {}
subref_reuse_count  0
subrefs_with_errors_count  0
transclusion_count  1
transclusions_inside_refs  {}
transclusions_inside_subrefs  {}
unique_name_count  0
wikitext_length  676
database  ffwiki
snapshot_date  2026-03-02

Test screenshots

Include screenshots of the DAGs in the test Airflow UI as verification of the tests.

  • wiki_page_cite_references_monthly

image

Edited by Awight
