
analytics_product/cx_suggestions_menu_interactions_daily rework BashOperator as PythonOperator

curl isn't available in the image, so we rely on the PythonOperator to send the HTTP request via `requests`. Since the KubernetesExecutor creates a task pod in which all Hadoop/Spark config files are mounted, we can simply subprocess out to `hdfs dfs` and everything just works.
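A minimal sketch of what the PythonOperator callable might look like, assuming the task downloads a CSV over HTTP and pushes it to the HDFS path shown below (the URL is a placeholder, not the actual DAG code):

```python
import subprocess
import tempfile

import requests

# Hypothetical callable wired into a PythonOperator via python_callable=...
# The source URL is illustrative; the HDFS destination matches the path
# verified later in this MR.
def fetch_and_upload_to_hdfs():
    url = "https://example.org/cx_exclude_users.csv"  # placeholder URL
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    with tempfile.NamedTemporaryFile(suffix=".csv") as tmp:
        tmp.write(resp.content)
        tmp.flush()
        # The task pod has the Hadoop configuration mounted, so the hdfs
        # CLI can talk to the cluster directly.
        subprocess.run(
            ["hdfs", "dfs", "-put", "-f", tmp.name,
             "/wmf/tmp/analytics_product/content_translation/cx_exclude_users.csv"],
            check=True,
        )
```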

After pulling this feature branch onto the instance and running the first DAG task, we can see the file in HDFS:

```
airflow@hadoop-shell-7c8dff47fb-6kxmp:/opt/airflow$ hdfs dfs -ls /wmf/tmp/analytics_product/content_translation
Found 1 items
-rw-r-----   3 analytics-product analytics-privatedata-users        462 2025-04-17 12:08 /wmf/tmp/analytics_product/content_translation/cx_exclude_users.csv
```

Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>
Bug: T386675
