Add spark-submit boilerplate.
This MR adds boilerplate to configure spark-submit for Java-based jobs. The change is meant to simplify submitting Spark jobs in cluster mode from Airflow tasks. It is carried out in the context of T296758.
A new `SparkTask` dataclass has been added to our DAG template and factory; it wraps spark-submit in a `BashOperator` Airflow operator.
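For illustration, the dataclass might look roughly like the sketch below. The field names on `SparkConfig` (`master`, `deploy_mode`), the command-assembly logic, and the fallback `operator()` body are assumptions for this sketch, not the actual implementation in the template:

```python
from dataclasses import dataclass

# Sketch of the SparkConfig/SparkTask pair described above. Field names
# and defaults are assumptions, not the real template code.
@dataclass
class SparkConfig:
    master: str = "yarn"
    deploy_mode: str = "cluster"

@dataclass
class SparkTask:
    config: SparkConfig
    main: str              # fully qualified main class of the Java job
    application_jar: str   # jar containing the main class
    main_args: str = ""    # arguments forwarded to the application

    def command(self) -> str:
        # Assemble the spark-submit invocation handed to BashOperator.
        return (
            f"spark-submit --master {self.config.master} "
            f"--deploy-mode {self.config.deploy_mode} "
            f"--class {self.main} {self.application_jar} {self.main_args}"
        ).strip()

    def operator(self):
        # In the real template this would return something like
        # BashOperator(task_id=..., bash_command=self.command());
        # airflow is omitted here so the sketch stays self-contained.
        return self.command()
```

In the actual MR, `operator()` returns the configured `BashOperator` ready to be wired into a DAG.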
Example

The spark-submit command for the canonical SparkPi demo application can be configured as:

```python
config = SparkConfig()
task = SparkTask(
    config=config,
    main="org.apache.spark.examples.SparkPi",
    application_jar="spark-examples_2.11-2.4.5.jar",
    main_args="5",
)
airflow_op = task.operator()
```