
Add the spark3 yarn shuffler jar to conda-analytics

Btullis requested to merge add_spark3_shuffler_jar into main

We need to provide the correct jar file to each of the hadoop workers, so that this can be made available to the yarn resource manager.

Previously we used a binary distribution of spark2, which contained the jar file, and we distributed that jar with the spark2 package. However, our version of spark3 is based on pyspark, which doesn't contain the jar.

This commit adds an execution step to the build process for our conda-analytics environment, which downloads the required jar file from Maven Central. It verifies that the published checksum for version 3.1.2 of the shuffler jar matches the downloaded file.
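As a rough illustration of the download-and-verify step described above, here is a minimal Python sketch. It is not the code in this merge request: the artifact coordinates (`spark-network-yarn_2.12`), the use of the `.sha1` sidecar file, and the helper names `checksum_matches` and `download_verified` are all assumptions made for the example.

```python
import hashlib
import urllib.request

# Assumed Maven Central location of the shuffler jar; the description only
# states version 3.1.2, so the artifact name and Scala suffix are guesses.
JAR_URL = (
    "https://repo1.maven.org/maven2/org/apache/spark/"
    "spark-network-yarn_2.12/3.1.2/spark-network-yarn_2.12-3.1.2.jar"
)


def checksum_matches(data: bytes, published: str, algorithm: str = "sha1") -> bool:
    """Return True if the digest of `data` equals the published checksum."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # A checksum file may carry trailing whitespace or a file name after
    # the hash, so compare only the first whitespace-separated token.
    tokens = published.split()
    return bool(tokens) and tokens[0].lower() == digest


def download_verified(url: str, dest: str) -> None:
    """Hypothetical helper: fetch the jar and its checksum, write only if they match."""
    with urllib.request.urlopen(url) as resp:
        jar = resp.read()
    # Assumption: the checksum is published next to the jar as `<url>.sha1`.
    with urllib.request.urlopen(url + ".sha1") as resp:
        published = resp.read().decode("ascii")
    if not checksum_matches(jar, published):
        raise RuntimeError(f"checksum mismatch for {url}")
    with open(dest, "wb") as fh:
        fh.write(jar)
```

Failing the build on a mismatch, rather than logging a warning, keeps a corrupted or tampered jar from being packaged into the environment.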

Bug: T332765
