Airflow DAGs merge requests
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests
Updated: 2024-03-28T15:19:25Z

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/635
Bump spark memory for analytics cassandra top articles
Updated: 2024-03-28T15:19:25Z
Author: Joal

The job is failing due to lack of memory.

Assignee: Gmodena

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/634
Initial commit to support a custom partition format
Updated: 2024-03-28T12:47:31Z
Author: Aleksandar Mastilovic

Assignee: Aleksandar Mastilovic

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/633
Short-circuit transfer_to_es DAGs in case convert_to_esbulk implies it
Updated: 2024-03-25T12:05:00Z
Author: Peter Fischer (temp-email-for-oauth-pfischer@gitlab.localhost)

I was not sure if we rely on reaching the empty `complete` task anywhere. If that's the case I'd have to rewrite the code so it relies on `BranchPythonOperator` instead of `ShortCircuitOperator`, [see docs](https://docs.astronomer.io/learn/airflow-branch-operator).
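
To illustrate the concern above, here is a minimal plain-Python model of how the two operators treat a final `complete` task. It is not Airflow code and not code from this merge request. The three-task graph, the task names `gate` and `transfer`, and the helper functions are hypothetical. The only behaviour assumed from Airflow is that a falsy `ShortCircuitOperator` result skips the downstream tasks, while `BranchPythonOperator` skips only the branches it does not select, so a join task using the `none_failed_min_one_success` trigger rule can still run.

```python
from typing import Dict, List

# Hypothetical toy graph (not the real transfer_to_es DAG):
# gate -> transfer -> complete
DOWNSTREAM: Dict[str, List[str]] = {
    "gate": ["transfer"],
    "transfer": ["complete"],
    "complete": [],
}


def descendants(task: str) -> List[str]:
    """Return every task reachable downstream of `task`."""
    found: List[str] = []
    stack = list(DOWNSTREAM[task])
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(DOWNSTREAM[current])
    return found


def run_short_circuit(should_transfer: bool) -> Dict[str, str]:
    """Model a short-circuit gate: a falsy result skips all downstream tasks."""
    states = {"gate": "success"}
    for task in descendants("gate"):
        states[task] = "success" if should_transfer else "skipped"
    return states


def run_branch(should_transfer: bool) -> Dict[str, str]:
    """Model a branching gate that is wired to both `transfer` and `complete`."""
    chosen = "transfer" if should_transfer else "complete"
    states = {"gate": "success"}
    states["transfer"] = "success" if chosen == "transfer" else "skipped"
    # `complete` is modelled with a none_failed_min_one_success rule:
    # it runs if no upstream task failed and at least one succeeded.
    upstream = [states["gate"], states["transfer"]]
    runs = "failed" not in upstream and "success" in upstream
    states["complete"] = "success" if runs else "skipped"
    return states


if __name__ == "__main__":
    print(run_short_circuit(False))
    print(run_branch(False))
```

In this model the short-circuit variant leaves `complete` skipped when no transfer is needed, whereas the branching variant still reaches it.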
Bug: T358472

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/629
Add Dag to Compute Mediawiki History Data Quality
Updated: 2024-03-28T09:56:15Z
Author: Snwachukwu

1. Check data quality of mediawiki history.
2. Send Alert if alert email is given.
Bug: T354692

Assignee: Snwachukwu

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/625
Modify README docker commands to work locally with new blubbler build.
Updated: 2024-03-08T16:06:50Z
Author: Xcollazo

Assignee: Xcollazo

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/623
Fix canary for gitlab raw SQL reader to not depend on a personal Hive database.
Updated: 2024-03-07T19:24:33Z
Author: Xcollazo

This is a fix due to email https://groups.google.com/a/wikimedia.org/g/platform-eng-alerts/c/51kveiYhqxQ:
> Try 6 out of 6
> Exception:
> SkeinHook Airflow SparkSkeinSubmitHook skein launcher test_generic_artifact_deployment_dag__do_hql__20240305 application_1707226456123_178183
> Log: Link
> Host: an-airflow1004.eqiad.wmnet
@btullis added you as reviewer as you're in opsweek.

Assignee: Xcollazo

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/610
wikidiff external hive table
Updated: 2024-02-10T04:32:14Z
Author: Fabian Kaelin

- Repair Wikidiff external hive table
- Proper external hive table handling

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/583
Draft: Periodically invoke a bunch of section-topics scripts
Updated: 2024-03-06T11:39:49Z
Author: Matthias Mullie

The section-topics pipeline uses a bunch of input data
generated by a variety of scripts. This data should be
recomputed periodically to remain relevant.
Bug: T339129

Assignee: Matthias Mullie

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/581
Update statsd-exporter-mappings
Updated: 2024-01-15T13:21:46Z
Author: Aqu

The file in our docker directory should reflect the puppet configuration:
https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/profile/airflow.yaml
Bug: T343232

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/578
T346278 airflow job that publishes the xml dumps
Updated: 2024-01-11T10:22:47Z
Author: Jennifer Ebe

Assignee: Jennifer Ebe

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/571
readme: fix docker run volumne mount
Updated: 2023-12-22T17:47:04Z
Author: Gmodena

Mount local volumes from the absolute path,
instead of a relative one.
Fixes the following error reported by dockerd:
".: volume name is too short, names should be at least two alphanumeric characters."
cc / @joal @xcollazo @mforns @otto

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/570
Move swift_upload.py in refinery
Updated: 2023-12-21T13:52:09Z
Author: Aqu

As we are removing the refinery oozie directory, we moved this util
script into the python folder.
Needs: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/983674/
Bug: T336739

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/548
Add the sasl module to the pip environment
Updated: 2023-11-29T09:59:12Z
Author: Btullis

Our previous build had problems with the airflow-hive provider, because
it could not find the sasl python module.
Bug: T343232
Bug: T351621
Bug: T344602

Assignee: Btullis

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/530
Add Wikitech link in pageview check email
Updated: 2023-10-27T14:17:22Z
Author: Aqu

The OpsWeek operator would like to know what to do with the alert email.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/477
Click stream: expand list of languages
Updated: 2023-08-14T15:53:38Z
Author: Fabian Kaelin

Expand the list of languages to include for the click stream datasets. As discussed in https://phabricator.wikimedia.org/T289532#8266909.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/467
Equity landscape DAG updates
Updated: 2023-07-27T20:15:13Z
Author: Nmaphophe

Update the DAGs to call the proper updated python packages.

Assignee: Nmaphophe

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/458
Fix for uploading deb pkg to Gitlab registry
Updated: 2023-09-21T14:13:28Z
Author: Aqu

Some env variables are necessary to send artifacts from the build. They were not sent to the buildctl process.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/447
Update conda env to make it Mac+Linux compatible
Updated: 2023-09-21T14:10:09Z
Author: Aqu

Load python-graphviz from pip, not conda.
If loaded from conda, some dependencies are platform specific, preventing the build of the environment on some platforms (Darwin, at least).

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/445
Add diff output when running black from Gitlab-CI
Updated: 2023-09-21T14:06:26Z
Author: Aqu

Without it, the output in CI is only showing you the faulty file and not how black would have formatted it.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/438
Draft: Auto update canonical data database
Updated: 2023-06-20T15:46:33Z
Author: Aqu

Creates an Airflow job to refresh the canonical_data.table from its GitHub repository.
Bug: [T339928](https://phabricator.wikimedia.org/T339928)