wmf_airflow_common: add drop_older_than utility method.
Adds a decorator to execute Python scripts by wrapping BashOperator.
This approach should allow us to swap in SkeinOperator once the refinery Python code is extracted into a standalone module.
Adds a utility method to delete partitions older than a given threshold. This method wraps an existing refinery script used to purge data.
Why Bash Scripts
refinery implements extensive logic for purging old data. However, this logic is implemented as a set of Python-based CLI scripts, which cannot be easily executed via a Python operator. The need for drop-older-than capabilities emerged as part of T379024, but a broader refinery refactoring is out of scope.
These scripts reside in the refinery mono-repo as shell scripts (not Python modules). Currently, there is no way to bundle them into a Conda environment, ruling out the use of the skein operator.
See this Slack thread for more details: https://wikimedia.slack.com/archives/CSV483812/p1732276680848229
Script Execution
The search team implemented a wrapper to execute drop-older-than scripts in drop_old_data_daily. This merge request (MR) builds upon that code and promotes it to a public namespace: wmf_airflow_common.operator.python.
The original code already uses a decorator pattern and has been refactored here to make it generally applicable to any Python script, with additional Pythonic syntactic sugar.
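To illustrate the general idea of the decorator pattern described above, here is a minimal sketch. It is not the code in this MR. The decorator name python_script, the script path, and the --table / --older-than-days options are all hypothetical, and the sketch only builds the shell command line. In a DAG, the resulting string could be passed as the bash_command argument of Airflow's BashOperator.

```python
import shlex
from functools import wraps
from typing import Callable, Mapping


def python_script(script_path: str, interpreter: str = "python3") -> Callable:
    """Hypothetical decorator (illustration only, not the MR's API).

    Wraps a function that returns a mapping of CLI options and turns it
    into a function that returns a shell command line for `script_path`.
    """

    def decorator(func: Callable[..., Mapping[str, object]]) -> Callable[..., str]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> str:
            options = func(*args, **kwargs)
            argv = [interpreter, script_path]
            for name, value in options.items():
                flag = "--" + name.replace("_", "-")
                if value is True:
                    argv.append(flag)  # boolean switch, no value
                elif value is not False and value is not None:
                    argv.append(f"{flag}={value}")
            # shlex.join quotes every argument so the string is shell-safe.
            return shlex.join(argv)

        return wrapper

    return decorator


# Placeholder path and options: assumptions made for this sketch only.
@python_script("/path/to/purge_partitions.py")
def drop_older_than(table: str, older_than_days: int) -> Mapping[str, object]:
    return {"table": table, "older_than_days": older_than_days}


command = drop_older_than("events", 90)
print(command)
```

Keeping the command construction in a plain function makes it testable without a scheduler, and it leaves the choice of operator (BashOperator today, possibly SkeinOperator later) to the caller.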
Example
A usage example for this method can be found here.
Bug: T379024