Ensure SPARK_HOME=/usr/lib/spark3
Our previous SPARK_HOME value was /tmp/pyenv/versions/3.10.15/lib/python3.10/site-packages/pyspark,
and I have a hunch that it was causing issues when submitting Skein jobs
that launch Spark jobs, as some utility function in airflow-dags was
propagating that environment variable to the Hadoop worker executing the
Skein job.
Looking at the YARN/skein logs, I was seeing:
LogContents:
.skein.sh: line 1: spark-submit: command not found
brouberol@an-test-client1002:~$ SPARK_HOME=/usr/lib/spark3
brouberol@an-test-client1002:~$ ls $SPARK_HOME/bin/spark-submit
/usr/lib/spark3/bin/spark-submit
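For reference, the manual check above could be scripted roughly as
follows, to compare candidate SPARK_HOME values on a given host. This is
only a sketch: has_spark_submit is a hypothetical helper that does not
exist in airflow-dags, and it assumes a POSIX shell.

```shell
# Hypothetical helper: succeeds only if the given SPARK_HOME candidate
# is non-empty and contains an executable bin/spark-submit.
has_spark_submit() {
    [ -n "$1" ] && [ -x "$1/bin/spark-submit" ]
}

# Compare the old and new values on the current host.
for candidate in \
    /tmp/pyenv/versions/3.10.15/lib/python3.10/site-packages/pyspark \
    /usr/lib/spark3
do
    if has_spark_submit "$candidate"; then
        echo "ok: $candidate"
    else
        echo "missing spark-submit: $candidate"
    fi
done
```

Running this on both the client and a Hadoop worker would show whether
each host can resolve spark-submit under the same SPARK_HOME.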
I think that having all parties agree on a common value for
SPARK_HOME might help.
Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>
Bug: T364389