
wmf_airflow_common: add drop_older_than utility method.

Gmodena requested to merge add-python-executor into main

Adds a decorator to execute Python scripts by wrapping BashOperator.

This approach should allow us to swap in SkeinOperator once the refinery Python code has been extracted into a standalone module.

Adds a utility method to delete partitions older than a given threshold. This method wraps an existing refinery script used to purge data.
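To illustrate what "older than a given threshold" means, here is a minimal sketch in plain Python. It is not the refinery script's logic: the function name `partitions_older_than`, the use of calendar dates as partition keys, and the strict `<` comparison are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, List


def partitions_older_than(
    partition_dates: Iterable[date], days: int, today: date
) -> List[date]:
    """Hypothetical helper: return the partition dates that fall
    strictly before the cutoff (today minus `days`)."""
    cutoff = today - timedelta(days=days)
    # Assumption: a partition dated exactly on the cutoff is kept.
    return sorted(d for d in partition_dates if d < cutoff)


# Made-up partition dates, for illustration only.
partitions = [date(2024, 1, 1), date(2024, 10, 15), date(2024, 11, 20)]
stale = partitions_older_than(partitions, days=90, today=date(2024, 11, 22))
print(stale)
```

With a 90-day threshold, only the January partition is selected here. The actual script may treat the boundary differently.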


Why Bash Scripts

refinery implements extensive logic for purging old data.
However, this logic is implemented as a set of Python-based CLI scripts, which cannot be easily executed via a Python operator. The need for drop-older-than capabilities emerged as part of T379024, but a broader refinery refactoring is out of scope.

These scripts reside in the refinery mono-repo as standalone executable scripts (not importable Python modules). Currently, there is no way to bundle them into a Conda environment, which rules out the skein operator.

See this Slack thread for more details: https://wikimedia.slack.com/archives/CSV483812/p1732276680848229

Script Execution

The search team implemented a wrapper to execute drop-older-than scripts in drop_old_data_daily. This merge request (MR) builds upon that code and promotes it to a public namespace: wmf_airflow_common.operator.python.

The original code already uses a decorator pattern. It has been refactored here so that it applies to any Python script, with some additional Pythonic syntactic sugar.
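The decorator pattern described above can be sketched roughly as follows. This is a self-contained illustration, not the code in this MR: the decorator name `python_script`, the script path, and the flag names are hypothetical, and the sketch only builds the command string instead of creating an Airflow operator.

```python
import shlex
from functools import wraps
from typing import Callable, Mapping, Union

ArgValue = Union[str, int, bool]


def python_script(script_path: str, interpreter: str = "python3") -> Callable:
    """Hypothetical decorator: turn a function that returns CLI arguments
    into a function that returns a shell command line for `script_path`."""

    def decorator(
        build_args: Callable[..., Mapping[str, ArgValue]]
    ) -> Callable[..., str]:
        @wraps(build_args)
        def wrapper(*args, **kwargs) -> str:
            parts = [interpreter, script_path]
            for name, value in build_args(*args, **kwargs).items():
                flag = "--" + name.replace("_", "-")
                if value is True:
                    parts.append(flag)  # boolean switch, no value
                elif value is False:
                    continue  # a disabled switch is omitted
                else:
                    parts.extend([flag, str(value)])
            # Quote every token so the command is safe to hand to a shell.
            return " ".join(shlex.quote(p) for p in parts)

        return wrapper

    return decorator


# Placeholder script path and made-up flags, for illustration only.
@python_script("/path/to/cleanup.py")
def cleanup(table: str, days: int, execute: bool = False) -> dict:
    return {"table": table, "older_than": days, "execute": execute}


command = cleanup("my_db.my_table", 90, execute=True)
print(command)
```

In a DAG, a string built this way could be passed to `BashOperator` through its `bash_command` parameter. Keeping command construction separate from the operator is what would make a later swap to a different operator straightforward.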


Example

A usage example for this method can be found here.

Edited by Gmodena
