Add common conf args to kwargs of SparkSubmitOperator

Ebernhardson requested to merge work/ebernhardson/spark-submit-max-executors into main

spark.dynamicAllocation.maxExecutors is perhaps the most common Spark conf value that needs to be set on a SparkSubmitOperator. Doing so is tedious because passing conf= as a kwarg prevents Airflow from merging in the conf value provided via the DAG default_args. The result is boilerplate-ish code that looks like:

  SparkSubmitOperator(
      ...
      conf=dict(dag_defaults['conf'], **{
          'spark.dynamicAllocation.maxExecutors': 42,
      }),
  )

Make things simpler by adding a max_executors kwarg that merges into the existing conf instead of overriding it. This also makes typos much harder: a typo'd conf key raises no error at test or runtime, but a typo'd kwarg fails as soon as the operator is created.

While here, add a few other common overrides:

  • max_executors -> spark.dynamicAllocation.maxExecutors
  • sql_shuffle_partitions -> spark.sql.shuffle.partitions
  • executor_memory_overhead -> spark.executor.memoryOverhead
  • driver_memory_overhead -> spark.driver.memoryOverhead
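
The merge behavior described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the CONF_KWARGS mapping and merge_conf_kwargs helper are invented names standing in for whatever the operator subclass does internally.

```python
# Hypothetical sketch of the kwarg-to-conf merging described in this MR.
# Each convenience kwarg maps to its Spark conf key; values are folded
# into any conf dict already provided (e.g. from DAG default_args)
# instead of replacing it.
CONF_KWARGS = {
    'max_executors': 'spark.dynamicAllocation.maxExecutors',
    'sql_shuffle_partitions': 'spark.sql.shuffle.partitions',
    'executor_memory_overhead': 'spark.executor.memoryOverhead',
    'driver_memory_overhead': 'spark.driver.memoryOverhead',
}


def merge_conf_kwargs(conf=None, **kwargs):
    """Return a new conf dict with the convenience kwargs folded in.

    Unlike passing conf= directly, this adds to the existing conf rather
    than replacing it, and an unrecognized kwarg raises immediately,
    which is how a typo gets caught at operator-creation time.
    """
    unknown = set(kwargs) - set(CONF_KWARGS)
    if unknown:
        raise TypeError(f'Unexpected kwargs: {sorted(unknown)}')
    merged = dict(conf or {})
    for name, value in kwargs.items():
        merged[CONF_KWARGS[name]] = value
    return merged
```

With something like this in place, the boilerplate above collapses to passing max_executors=42 directly, while any conf inherited from default_args is preserved.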
