
Safely quote spark args in skein script

Ebernhardson requested to merge work/ebernhardson/spark-skein-safe-arguments into main

The skein script created for running spark scripts wasn't quoting the arguments. This prevented special characters such as | from being used, because the shell interpreted them instead of passing them to the application.
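A minimal sketch of the difference, assuming the script line is built by joining an argument list. The function names `render_unquoted` and `render_quoted` and the sample arguments are hypothetical illustrations, not the code changed in this MR:

```python
import shlex


def render_unquoted(args):
    # Old behaviour as described above: arguments are joined verbatim, so the
    # shell sees metacharacters such as | and acts on them (here, a pipe).
    return " ".join(args)


def render_quoted(args):
    # Fixed behaviour: each argument goes through shlex.quote, so the shell
    # passes it to the application unchanged.
    return " ".join(shlex.quote(a) for a in args)


args = ["spark-submit", "job.py", "--sep", "a|b"]
print(render_unquoted(args))  # spark-submit job.py --sep a|b
print(render_quoted(args))    # spark-submit job.py --sep 'a|b'
```

Parsing the quoted line with POSIX shell rules (`shlex.split`) recovers the original argument list, which is the property the fix relies on.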

Changes to existing dags were evaluated with the new skein-spec fixtures. Broadly, three classes of changes happened, and they can be seen in the commit:

  • The empty application arg in SparkSqlOperator is now quoted and passed as an empty arg. Testing shows Spark accepts an empty argument here.

  • Values of the files argument to spark that contain a # are now quoted. This is not strictly required: bash ignores a # in the middle of a word and only treats it as the start of a comment when it begins a word (e.g. when preceded by a space). Python's shlex.quote quotes it anyway, out of caution.

  • Quoting changed on existing dags. shlex.quote uses single quotes, while some spots were previously quoting with double quotes. The rendered script differs slightly, but the application receives the same arguments. Similarly, dags in search were not quoting yet (waiting for this fix) and are now correctly quoted.
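The three classes of fixture changes above can be reproduced directly with `shlex.quote`. This is a sketch only; the file path and SQL string are made-up sample values, not taken from the affected dags:

```python
import shlex

# 1. Empty application argument: rendered as '' and still parsed as one empty arg.
empty_arg = shlex.quote("")

# 2. A files value with a '#' alias: '#' is outside shlex.quote's safe set,
#    so the whole value is wrapped in single quotes.
files_arg = shlex.quote("hdfs:///user/example/conf.ini#conf.ini")

# 3. Single vs double quotes: different text, same parsed argument.
single = shlex.quote("SELECT 1 AS x")
double = '"SELECT 1 AS x"'  # style some dags used before this change

print(empty_arg)   # ''
print(files_arg)   # 'hdfs:///user/example/conf.ini#conf.ini'
print(single)      # 'SELECT 1 AS x'
print(shlex.split(single) == shlex.split(double))  # True
```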

Edited by Ebernhardson
