T404196 Adding DAGs that lead to measurements of WD entity usage
Contributor checklist
- I have written tests for this DAG that will be merged into data-engineering/airflow-dags/tests/wmde
- I have locally run the above tests and code quality checks as outlined in the tests section of the Airflow DAGs project readme
- I have tested the jobs for this DAG in my local database using the process defined in:
  - wmde/analytics/hql/airflow_jobs/wd_entity_usage/_test_monthly
  - wmde/analytics/hql/airflow_jobs/wiki_page_wd_entity_usage/_test_monthly
  - Note: Job queries were tested separately from the DAG and did insert the needed data.
- I have tested the included DAGs using the process outlined in TEST_AIRFLOW_DAGS.md and the test variable files provided for each DAG
  - Note: Sensors are firing, but for some reason I'm still not able to run the `SQLOperator` in the dev Airflow instance. This shouldn't be a permissions issue, as the Airflow user is my personal user, not `analytics-wmde`. We'll be good to merge though.
- All Hive tables that are needed by the included DAG jobs have been created and are accessible by the `analytics-wmde` Airflow user
- All changes from the `main` branch have been rebased into this branch
Description
- T404196: The following three DAGs calculate the usage of Wikidata entities on pages by wiki, the usage of Wikidata entities across all wikis, and the number of pages using Wikidata entities by wiki.
  - DAG ID: wd_entity_usage_by_wiki_monthly_dag
    - Destination: wmde.wd_entity_usage_by_wiki_monthly
  - DAG ID: wd_entity_usage_distinct_monthly_dag
    - Destination: wmde.wd_entity_usage_distinct_monthly
  - DAG ID: wiki_page_wd_entity_usage_monthly_dag
    - Destination: wmde.wiki_page_wd_entity_usage_monthly
Note: I'm including a DAG for wiki_wd_infobox_usage that will be used once one of the underlying tables is available. This DAG will not be tested or activated as part of this MR, and it is currently broken because there is not yet a solution for running a monthly DAG against a dataset that is partitioned weekly.
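One way to reason about the monthly-versus-weekly mismatch noted above is to enumerate the weekly partition dates that fall inside the month being processed, which a monthly job could then wait on or read from. The following is a minimal sketch only, assuming weekly partitions are keyed by the date of a fixed weekday (Monday here). The function name `weekly_partitions_in_month` and that keying convention are hypothetical and are not taken from the underlying table or from this MR.

```python
from datetime import date, timedelta


def weekly_partitions_in_month(year: int, month: int, weekday: int = 0) -> list[date]:
    """Return every date in the given month that falls on `weekday` (0 = Monday).

    Hypothetical helper: assumes one weekly partition per matching weekday.
    """
    first = date(year, month, 1)
    # First day of the following month, used as an exclusive upper bound.
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    # Days to move forward from the 1st to reach the first matching weekday.
    offset = (weekday - first.weekday()) % 7
    current = first + timedelta(days=offset)

    partitions = []
    while current < next_month:
        partitions.append(current)
        current += timedelta(weeks=1)
    return partitions


if __name__ == "__main__":
    for partition_date in weekly_partitions_in_month(2025, 2):
        print(partition_date.isoformat())
```

Whether a monthly run should use every weekly partition in the month or only the latest one depends on how the source table is populated, which this sketch does not try to decide.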
Test outputs
Destination table summary
wmde.wd_entity_usage_by_wiki_monthly
| month | wiki | total_wd_entities_used | total_distinct_wd_entities_used |
|---|---|---|---|
| DATE | STRING | BIGINT | BIGINT |
wmde.wd_entity_usage_distinct_monthly
| month | total_wd_entities_used | total_distinct_wd_entities_used |
|---|---|---|
| DATE | BIGINT | BIGINT |
wmde.wiki_page_wd_entity_usage_monthly
| month | wiki | total_pages_using_wd_entities |
|---|---|---|
| DATE | STRING | BIGINT |
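To make the relationship between the three destination tables concrete, here is a small sketch that computes the same column names from toy data. It is not the HQL used by the jobs. It assumes that `total_wd_entities_used` counts one row per (page, entity) usage and that the distinct column counts unique entity IDs; the input rows and the `summarize` function are invented for illustration, and the `month` column is omitted for brevity.

```python
from collections import defaultdict

# Toy usage rows: (wiki, page_id, entity_id). Invented for illustration only.
USAGE_ROWS = [
    ("enwiki", 1, "Q42"),
    ("enwiki", 1, "Q5"),
    ("enwiki", 2, "Q42"),
    ("dewiki", 7, "Q42"),
    ("dewiki", 7, "Q64"),
]


def summarize(rows):
    """Return (per-wiki metrics, cross-wiki metrics) for the given usage rows."""
    total_by_wiki = defaultdict(int)
    entities_by_wiki = defaultdict(set)
    pages_by_wiki = defaultdict(set)
    all_entities = set()
    total = 0

    for wiki, page_id, entity_id in rows:
        total += 1
        total_by_wiki[wiki] += 1
        entities_by_wiki[wiki].add(entity_id)
        pages_by_wiki[wiki].add(page_id)
        all_entities.add(entity_id)

    by_wiki = {
        wiki: {
            # Columns of wmde.wd_entity_usage_by_wiki_monthly
            "total_wd_entities_used": total_by_wiki[wiki],
            "total_distinct_wd_entities_used": len(entities_by_wiki[wiki]),
            # Column of the per-wiki page usage table
            "total_pages_using_wd_entities": len(pages_by_wiki[wiki]),
        }
        for wiki in total_by_wiki
    }
    # Columns of wmde.wd_entity_usage_distinct_monthly (all wikis combined)
    overall = {
        "total_wd_entities_used": total,
        "total_distinct_wd_entities_used": len(all_entities),
    }
    return by_wiki, overall


if __name__ == "__main__":
    by_wiki, overall = summarize(USAGE_ROWS)
    print(by_wiki)
    print(overall)
```

Note that the cross-wiki distinct count is not the sum of the per-wiki distinct counts, since the same entity can be used on several wikis. This is presumably why the all-wikis figure is stored in its own table.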


