platform-airflow-dags merge requests
https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests
2021-12-23T19:43:37Z

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/27
Add cookiecutter_replay to gitignore.
2021-12-23T19:43:37Z
Gmodena

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/28
Add Gitlab CI pipeline config.
2022-02-14T13:26:38Z
Gmodena
This MR is prep work for migrating this repo to the [Generated Data Platform org](https://phabricator.wikimedia.org/T300734).
Once the repo is moved over, we'll be able to run CI using Gitlab runners and won't need to fall back to Github Action workflows.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/5
Add mypy checks.
2021-11-09T09:43:45Z
Gmodena
This PR integrates mypy checks for the `image-matching` project as described in
[[SPIKE] Investigate Different CI Checks](https://phabricator.wikimedia.org/T293382).
Checks can be triggered via `make mypy`, and have been added to the gitlab pipeline config.
Some (minor) fixes to errors detected while implementing these checks are included.
## What changes with this PR
ImageMatching Spark pipelines already had type annotations. This PR adds some initial integration with mypy
to enforce type checking at project build time.
This PR also contains fixes for type errors and false positives discovered during the integration.
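One of the errors fixed here is a `bool` passed where the type stubs expect a `str`. A minimal sketch of that kind of fix follows. `FakeReader` is a hypothetical stand-in for Spark's `DataFrameReader`, introduced only for illustration; the real signature may differ across pyspark versions.

```python
from typing import Dict


class FakeReader:
    """Hypothetical stand-in for a reader whose type stubs only accept string options."""

    def __init__(self) -> None:
        self.opts: Dict[str, str] = {}

    def options(self, **options: str) -> "FakeReader":
        # Store the options and return self so calls can be chained.
        self.opts.update(options)
        return self


# Flagged by mypy: Argument "header" to "options" has incompatible type "bool"; expected "str"
# reader = FakeReader().options(header=True)

# Passing the flag as a string satisfies the annotation.
reader = FakeReader().options(header="true")
```

The commented-out call still runs at runtime; mypy rejects it statically, which is the point of enforcing the check at build time.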
Before:
```
$ mypy spark
spark/transform.py:5: error: Cannot find implementation or library stub for module named "schema"
spark/transform.py:6: error: Cannot find implementation or library stub for module named "instances_to_filter"
spark/search_table.py:34: error: "Column" not callable
spark/raw2parquet.py:3: error: Cannot find implementation or library stub for module named "schema"
spark/raw2parquet.py:3: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
spark/raw2parquet.py:32: error: Argument "header" to "options" of "DataFrameReader" has incompatible type "bool";
expected "str"
```
After:
```
$ mypy spark/
Success: no issues found in 6 source files
```

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/20
Add sample-project to deploy targets.
2021-12-23T14:35:33Z
Gmodena
Adds sample-project to the list of
deployable pipelines.
Luke Bowmaker

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/6
Add tests for airflow dags
2021-11-09T09:43:59Z
Gmodena
This PR adds the capability to test Airflow DAG instances via `pytest`, as described in
[[SPIKE] Investigate Different CI Checks](https://phabricator.wikimedia.org/T293382).
### What changes with this PR
A new test suite for DAG integrity testing has been added.
## Integrity testing
The `tests/dags/test_dag_integrity.py` suite performs validation of all DAG modules in the repo; it follows a recommended validation practice described in https://www.astronomer.io/guides/testing-airflow#dag-validation-testing:
_ensure that your DAG objects are defined correctly, acyclic, and free from import errors_.
The suite can be triggered via: `make test_dags`.
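
The suite itself is not reproduced in this description. In Airflow, this kind of check is typically built on `DagBag` and its `import_errors` attribute. The sketch below illustrates the same two ideas (no import errors, no cycles) using only the standard library (Python 3.9+). The helper names `collect_import_errors` and `is_acyclic` are hypothetical and not part of the repo.

```python
import importlib.util
from graphlib import CycleError, TopologicalSorter
from pathlib import Path
from typing import Dict, Set


def collect_import_errors(dag_dir: Path) -> Dict[str, str]:
    """Import every *.py file in dag_dir and record the ones that fail."""
    errors: Dict[str, str] = {}
    for path in sorted(dag_dir.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            errors[path.name] = "could not build an import spec"
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            # Executes the module body, so import-time side effects surface here.
            spec.loader.exec_module(module)
        except Exception as exc:
            errors[path.name] = repr(exc)
    return errors


def is_acyclic(dependencies: Dict[str, Set[str]]) -> bool:
    """Return True if the task graph (task -> upstream tasks) has no cycle."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True
```

A pytest test could then assert that `collect_import_errors(Path("dags"))` is empty, where the `dags` path is assumed here. Because the module body is executed on import, modules with import-time side effects fail such a check.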
Note: `dags/ima.py` contains side effects that cause the test to (correctly) fail. We should get rid of them, e.g. in T292740. We could `xfail()` or `skip()` the suite before merging.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/4
archive should run natively in gitlab
2021-10-27T09:05:57Z
Gmodena

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/1
Bump pyyaml from 5.3.1 to 5.4 in /similar-users
2021-08-06T13:29:09Z
Gmodena
*Created by: dependabot[bot]*
Bumps [pyyaml](https://github.com/yaml/pyyaml) from 5.3.1 to 5.4.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/yaml/pyyaml/blob/master/CHANGES">pyyaml's changelog</a>.</em></p>
<blockquote>
<p>5.4 (2021-01-19)</p>
<ul>
<li><a href="https://github-redirect.dependabot.com/yaml/pyyaml/pull/407">yaml/pyyaml#407</a> -- Build modernization, remove distutils, fix metadata, build wheels, CI to GHA</li>
<li><a href="https://github-redirect.dependabot.com/yaml/pyyaml/pull/472">yaml/pyyaml#472</a> -- Fix for CVE-2020-14343, moves arbitrary python tags to UnsafeLoader</li>
<li><a href="https://github-redirect.dependabot.com/yaml/pyyaml/pull/441">yaml/pyyaml#441</a> -- Fix memory leak in implicit resolver setup</li>
<li><a href="https://github-redirect.dependabot.com/yaml/pyyaml/pull/392">yaml/pyyaml#392</a> -- Fix py2 copy support for timezone objects</li>
<li><a href="https://github-redirect.dependabot.com/yaml/pyyaml/pull/378">yaml/pyyaml#378</a> -- Fix compatibility with Jython</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/yaml/pyyaml/commit/58d0cb7ee09954c67fabfbd714c5673b03e7a9e1"><code>58d0cb7</code></a> 5.4 release</li>
<li><a href="https://github.com/yaml/pyyaml/commit/a60f7a19c0b418fe95fcf2ec0957005ae39e1090"><code>a60f7a1</code></a> Fix compatibility with Jython</li>
<li><a href="https://github.com/yaml/pyyaml/commit/ee98abd7d7bd2ca9c7b98aa19164fd0306a3f3d2"><code>ee98abd</code></a> Run CI on PR base branch changes</li>
<li><a href="https://github.com/yaml/pyyaml/commit/ddf20330be1fae8813b8ce1789c48f244746d252"><code>ddf2033</code></a> constructor.timezone: <code>__copy_</code> & <code>__deepcopy__</code></li>
<li><a href="https://github.com/yaml/pyyaml/commit/fc914d52c43f499224f7fb4c2d4c47623adc5b33"><code>fc914d5</code></a> Avoid repeatedly appending to yaml_implicit_resolvers</li>
<li><a href="https://github.com/yaml/pyyaml/commit/a001f2782501ad2d24986959f0239a354675f9dc"><code>a001f27</code></a> Fix for CVE-2020-14343</li>
<li><a href="https://github.com/yaml/pyyaml/commit/fe150624146ee631bb0f95e45731e8b01281fed6"><code>fe15062</code></a> Add 3.9 to appveyor file for completeness sake</li>
<li><a href="https://github.com/yaml/pyyaml/commit/1e1c7fb7c09e9149967c208a6fd07276a6140d57"><code>1e1c7fb</code></a> Add a newline character to end of pyproject.toml</li>
<li><a href="https://github.com/yaml/pyyaml/commit/0b6b7d61719fbe0a11f0980489f1bf8ce746c164"><code>0b6b7d6</code></a> Start sentences and phrases for capital letters</li>
<li><a href="https://github.com/yaml/pyyaml/commit/c97691596eec279ef9191a9b3bba583a17139d5a"><code>c976915</code></a> Shell code improvements</li>
<li>Additional commits viewable in <a href="https://github.com/yaml/pyyaml/compare/5.3.1...5.4">compare view</a></li>
</ul>
</details>
<br />
[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pyyaml&package-manager=pip&previous-version=5.3.1&new-version=5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/gmodena/wmf-platform-airflow-dags/network/alerts).
</details>

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/10
Ci demo cleanup
2021-11-11T08:31:58Z
Gmodena

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/33
Deprecate and archive repo
2022-02-17T22:45:18Z
Gmodena

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/14
Fix deploy-local-build target
2021-11-23T19:48:34Z
Gmodena
Fixes name clashes that were causing the wrong targets to be executed, breaking import paths.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/15
Fix tar archive generation.
2021-11-24T20:55:52Z
Gmodena
Fixes an exclude pattern in tar
that was resulting in empty archives.
Cleans up dags and project dirs on the target airflow
instance before copying files over.
Adds comments to deployment commands.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/12
Fix typo in README.md
2021-11-11T14:18:37Z
Gmodena

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/17
Install openjdk in base image
2021-12-15T20:11:23Z
Gmodena
This is a workaround for
https://phabricator.wikimedia.org/T297782.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/18
Install openjdk in Github action.
2021-12-17T14:09:50Z
Gmodena
The Conda-vendored openjdk shows flaky
behaviour with the rest of the build
pipeline.
This change installs AdoptOpenJDK
directly on the host system.
Clarakosi

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/26
Make task id unique and input/output optional.
2022-02-07T16:57:34Z
Gmodena
Unique task ids are necessary to support dynamic dag
creation. Optional input/output paths are not ideal,
but they are useful for backporting spark scripts with different
parametrisation requirements.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/31
Make task_id configurable
2022-02-10T21:45:08Z
Gmodena
This MR adds an optional `task_id` field to `Task`, so that clients can override defaults.
This partially rolls back MR#26. We introduced unique `task_id`s to support dynamic dag creation, but the uuid-based implementation broke the cli and ui utilities.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/13
Modify pipeline to be compatible with algo_v2 changes
2021-12-13T17:28:30Z
Clarakosi
* Modified spark scripts and sql tables to be consistent with algo_v2
changes
* Added search table hql
* Updated tests to be compatible with algo changes
* Updated ima dag to no longer use raw data file sensor and uploader

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/29
Move pypi extra index to requirements.
2022-02-14T13:25:51Z
Gmodena
Gitlab pypi indexes should be configured
in a project's requirements file.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/21
Normalise path layout in factories and cookiecutter
2021-12-22T14:17:04Z
Gmodena
This MR fixes some inconsistencies between dags boilerplate and the
cookiecutter template:
* the expected venv location has moved one level up in the deployed project home
* project dir (cookiecutter config) is not part of pipelines home; we let boilerplate chain dirs together.

https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/25
Onboard sample-project data pipeline
2021-12-23T15:17:31Z
Gmodena
This merge request adds a new `sample-project` data pipeline
to the Generated Data Platform portfolio and deployment targets.
Luke Bowmaker