
Add DAGs for video metric aggregations + update test var JSONs

Contributor checklist

Description

  • Tentatively T198628 (T386916 was merged in): The Wiki Loves Broadcast project has been looking for metrics on video plays for some time. These DAGs and the corresponding Hive queries attempt to provide them as well as the current data stack allows.

In summary:

  • We'd like monthly metrics on video views.
  • So far we have only provided metrics on video thumbnails being loaded or entering the viewport.
  • A single click on a thumbnail routinely fires multiple video play requests for different quality levels.
  • The best solution possible with the data currently available is to count unique video plays per day, where a unique play is identified by its user agent and IP combination.
  • These daily aggregates for videos within a category links category are then summed into "monthly unique viewers on a daily basis".
    • This gives a number that more closely reflects a monthly view count: a user agent x IP combination is counted more than once only if its views fall on different days.
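
The counting logic summarized above can be sketched in plain Python. This is only an illustration of the idea, not the Hive queries in this merge request: the event tuple shape, the function names, and the sample values are all hypothetical, and the category filter is left out for brevity.

```python
from collections import defaultdict


def daily_unique_viewers(events):
    """Count distinct (user_agent, ip) pairs per (day, video_filename).

    `events` is a hypothetical iterable of
    (day, video_filename, user_agent, ip) tuples.
    """
    viewers = defaultdict(set)
    for day, video, user_agent, ip in events:
        # Repeated requests from the same user agent x IP on the same day
        # (e.g. several quality levels for one click) collapse into one entry.
        viewers[(day, video)].add((user_agent, ip))
    return {key: len(pairs) for key, pairs in viewers.items()}


def monthly_sum_of_daily_uniques(daily_counts):
    """Sum the daily unique-viewer counts per (month, video_filename)."""
    totals = defaultdict(int)
    for (day, video), count in daily_counts.items():
        month = day[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[(month, video)] += count
    return dict(totals)


# Made-up sample data, assumed for illustration only.
events = [
    ("2025-04-01", "Example.webm", "UA-1", "10.0.0.1"),
    ("2025-04-01", "Example.webm", "UA-1", "10.0.0.1"),  # same click, other quality level
    ("2025-04-01", "Example.webm", "UA-2", "10.0.0.2"),
    ("2025-04-02", "Example.webm", "UA-1", "10.0.0.1"),  # same viewer, different day
]

daily = daily_unique_viewers(events)
monthly = monthly_sum_of_daily_uniques(daily)
```

In this sample the first viewer is counted once on 2025-04-01 despite two requests, and once more on 2025-04-02, so the monthly total for the video is 3.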

DAGs and destination tables:

  • DAG ID: wlb_commons_video_metrics_daily
    • Destination: wmde.wlb_commons_video_metrics_daily
  • DAG ID: wlb_commons_video_metrics_monthly
    • Destination: wmde.wlb_commons_video_metrics_monthly

Test outputs

Destination table summary
  • andrewtavis_wmde.wlb_commons_video_metrics_monthly

| month | video_filename | video_category | sum_daily_unique_viewers |
|---|---|---|---|
| 2025-04 | string | string | bigint |

Test screenshots
  • wlb_commons_video_metrics_daily
  • Note that I only ran it for a few days and then marked the rest of the month as successful so that the monthly DAG sensor would fire appropriately.

Screenshot_from_2025-07-03_12-56-16

  • wlb_commons_video_metrics_monthly

Screenshot_from_2025-07-03_12-56-29

Edited by Andrew McAllister (WMDE)
