Don't propagate the container SPARK_HOME to the Hadoop workers
When attempting to migrate the airflow-analytics-test scheduler to Kubernetes, I encountered errors when executing Skein jobs.
The Skein spec contains a spark-submit command with --conf spark.executorEnv.SPARK_HOME=/tmp/pyenv/versions/3.10.15/lib/python3.10/site-packages/pyspark, which fails on the Hadoop worker side, as this path does not exist there. It only exists in our airflow container, as Blubber won't let us install anything under /usr/lib.
By detecting that the job is running in Kubernetes, we avoid propagating this environment variable into the Skein spec.
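
For illustration only, a minimal sketch of the idea (the function and variable names below are hypothetical, not the actual patch): Kubernetes injects KUBERNETES_SERVICE_HOST into every pod, so its presence can be used to detect that the scheduler runs in Kubernetes and to skip SPARK_HOME when building the executorEnv conf flags.

```python
import os


def running_in_kubernetes() -> bool:
    """Heuristic: Kubernetes injects KUBERNETES_SERVICE_HOST into every pod."""
    return "KUBERNETES_SERVICE_HOST" in os.environ


def build_executor_env_confs(env: dict[str, str]) -> list[str]:
    """Build --conf spark.executorEnv.* flags for spark-submit, dropping
    SPARK_HOME when running in Kubernetes, since the container's pyspark
    install path does not exist on the Hadoop workers."""
    confs = []
    for key, value in env.items():
        if key == "SPARK_HOME" and running_in_kubernetes():
            continue
        confs.append(f"--conf spark.executorEnv.{key}={value}")
    return confs
```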
Bug: T364389
Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>