Add common conf args to kwargs of SparkSubmitOperator
spark.dynamicAllocation.maxExecutors is perhaps the most common spark conf value that needs setting on a SparkSubmitOperator. Doing so is tedious because passing conf= as a kwarg replaces, rather than merges with, the conf value provided in the DAG default_args. The result is boilerplate-ish code that looks like:
SparkSubmitOperator(
    ...,
    conf=dict(dag_defaults['conf'], **{
        'spark.dynamicAllocation.maxExecutors': 42,
    }),
)
Make things simpler by adding a max_executors kwarg that merges into the existing conf instead of overriding it. This also makes the setting much harder to typo: a typo'd conf key raises no error at test or runtime, but a typo'd kwarg fails as soon as the operator is instantiated.
While here, add a few other common overrides:
- max_executors -> spark.dynamicAllocation.maxExecutors
- sql_shuffle_partitions -> spark.sql.shuffle.partitions
- executor_memory_overhead -> spark.executor.memoryOverhead
- driver_memory_overhead -> spark.driver.memoryOverhead
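A minimal sketch of how the kwarg-to-conf merging could work, as standalone Python (the names _CONF_KWARGS and build_conf are illustrative, not the actual operator implementation):

```python
# Illustrative sketch only: merge convenience kwargs into a spark conf dict
# without clobbering defaults. Names here are hypothetical, not Airflow API.

# Mapping of convenience kwarg -> spark conf key, per the list above.
_CONF_KWARGS = {
    'max_executors': 'spark.dynamicAllocation.maxExecutors',
    'sql_shuffle_partitions': 'spark.sql.shuffle.partitions',
    'executor_memory_overhead': 'spark.executor.memoryOverhead',
    'driver_memory_overhead': 'spark.driver.memoryOverhead',
}


def build_conf(default_conf=None, conf=None, **kwargs):
    """Merge default conf, an explicit conf, and convenience kwargs.

    Later sources win. An unknown kwarg raises TypeError immediately,
    unlike a typo'd conf key, which would pass through silently.
    """
    merged = dict(default_conf or {})
    merged.update(conf or {})
    for name, value in kwargs.items():
        if name not in _CONF_KWARGS:
            raise TypeError('unexpected kwarg: %s' % name)
        merged[_CONF_KWARGS[name]] = value
    return merged
```

With merging along these lines, the boilerplate above collapses to SparkSubmitOperator(..., max_executors=42) while still inheriting the DAG-level conf.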