- 15 Mar, 2022 3 commits
- 14 Mar, 2022 2 commits
- 10 Mar, 2022 8 commits
-
-
Ottomata authored
Fix bug in fsspec exists call. See merge request repos/data-engineering/workflow_utils!8
-
Ottomata authored
-
Ottomata authored
-
Ottomata authored
-
Ottomata authored
Add a Gitlab CI pipeline. See merge request repos/data-engineering/workflow_utils!6
-
Gmodena authored
Adds a GitLab CI pipeline to run tests, mypy, and linting on Python 3.7 and 3.9.
-
- 03 Mar, 2022 1 commit
-
-
Snwachukwu authored
Automate usage of fsspec hdfs URLs via new pyarrow HDFS API. See merge request repos/data-engineering/workflow_utils!5
-
- 01 Mar, 2022 1 commit
-
-
Ottomata authored
  - fsspec_use_new_pyarrow_api - call this to make fsspec always use new pyarrow API with all hdfs:// URLs. This is only needed until https://github.com/fsspec/filesystem_spec/issues/874 is resolved.
  - set_hadoop_env_vars - sets needed env vars to work with new pyarrow HDFS API. This is also called by fsspec_use_new_pyarrow_api() by default.

  https://phabricator.wikimedia.org/T300876
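A minimal sketch of the kind of environment setup this commit message describes. It assumes the variable that matters is CLASSPATH, which pyarrow's Hadoop filesystem needs to contain the Hadoop jars. The names `set_hadoop_env_vars_sketch`, `hadoop_classpath`, `env`, and `classpath_provider` are hypothetical and are not taken from workflow_utils; the real `set_hadoop_env_vars` may set other variables or behave differently.

```python
import os
import subprocess
from typing import Callable, MutableMapping, Optional


def hadoop_classpath() -> str:
    """Ask the local `hadoop` CLI for its expanded classpath.

    Assumes a `hadoop` executable is on PATH.
    """
    return subprocess.check_output(
        ["hadoop", "classpath", "--glob"], text=True
    ).strip()


def set_hadoop_env_vars_sketch(
    env: Optional[MutableMapping[str, str]] = None,
    classpath_provider: Callable[[], str] = hadoop_classpath,
) -> MutableMapping[str, str]:
    """Hypothetical helper: make sure CLASSPATH is set before pyarrow's
    HDFS filesystem is used.

    An existing, non-empty CLASSPATH is left untouched, and in that case
    the provider is never called.
    """
    target = os.environ if env is None else env
    if not target.get("CLASSPATH"):
        target["CLASSPATH"] = classpath_provider()
    return target
```

Passing `env` and `classpath_provider` explicitly keeps the sketch usable on a machine without Hadoop installed, for example in unit tests.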
-
- 15 Feb, 2022 1 commit
-
-
Ottomata authored
-
- 08 Feb, 2022 1 commit
-
-
Ottomata authored
-
- 07 Feb, 2022 6 commits
-
-
Ottomata authored
-
Ottomata authored
-
Ottomata authored
-
Ottomata authored
-
Ottomata authored
There seems to be an issue with Airflow and pyarrow, at least in SequentialExecutor, where a call to pyarrow in the scheduler can cause a deadlock. However, when determining the cached URL for an artifact, we'd like to be able to reference it without checking that it exists (which in HDFS would result in a call through pyarrow). This change skips that check when we don't care whether the artifact exists at this stage.
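The idea in this commit message can be sketched as follows: build the cached URL by plain string handling, and only touch the filesystem when the caller explicitly asks for an existence check. The function `cached_url` and its parameters are hypothetical names for illustration, not the workflow_utils API.

```python
from typing import Callable, Optional


def cached_url(
    artifact_id: str,
    cache_base_uri: str,
    exists: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return the cache location for an artifact.

    When `exists` is None (the default), the URL is computed without any
    filesystem access, so nothing is routed through pyarrow. When a
    callable is given, it is used to verify the artifact is present.
    """
    url = cache_base_uri.rstrip("/") + "/" + artifact_id.lstrip("/")
    if exists is not None and not exists(url):
        raise FileNotFoundError(url)
    return url
```

With this shape, code that runs inside the scheduler can call `cached_url(...)` without an `exists` callable, while task code that actually needs the artifact can pass one in.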
-
Ottomata authored
There is a syntax bug in an older version
-
- 04 Feb, 2022 1 commit
-
-
Ottomata authored
-
- 02 Feb, 2022 1 commit
-
-
Ottomata authored
-
- 12 Jan, 2022 8 commits
- 11 Jan, 2022 4 commits
- 10 Jan, 2022 1 commit
-
-
Ottomata authored
-
- 06 Jan, 2022 2 commits