Commit e44f8b8e authored by Gmodena

Merge branch 'T275162-enable-spark-metrics-collection' of into T275162-enable-spark-metrics-collection
parents 2d706e6e 2554f0cb
@@ -55,7 +55,7 @@ spark2-submit etl/ <raw data> <production data>
`conf/` provides default settings to run the ETL as a [regular size spark job]( on WMF's Analytics cluster.
-spark2-submit --properties-file etl/ <raw data> <production data>
+spark2-submit --properties-file conf/ etl/ <raw data> <production data>
## Metrics collection
@@ -66,7 +66,7 @@ metrics files, that output to the driver and executors stdout, can be found at `
The easiest way to do this is to set `PYSPARK_SUBMIT_ARGS`. For example:
export PYSPARK_SUBMIT_ARGS="--files ./conf/ --conf pyspark-shell"
python3 2020-12-28 hywiki Output
python3 2020-12-28 hywiki Output
This submits the `algorunner` job with additional instrumentation.
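
The same `PYSPARK_SUBMIT_ARGS` value can also be assembled from Python before a Spark session is created. The following is a minimal sketch, not part of this commit: the helper names `build_submit_args` and `apply_submit_args`, the file name `metrics.properties`, and the choice of `spark.metrics.conf` are assumptions made for illustration, since the README's own paths are not shown here. It relies only on the fact that PySpark reads `PYSPARK_SUBMIT_ARGS` at launch and expects the value to end with `pyspark-shell`.

```python
import os
import shlex


def build_submit_args(files, conf):
    """Assemble a PYSPARK_SUBMIT_ARGS string (hypothetical helper).

    files: list of local paths to ship with --files.
    conf:  dict of Spark properties to pass with --conf.
    """
    parts = []
    if files:
        # --files takes a single comma-separated list of paths.
        parts += ["--files", ",".join(files)]
    for key, value in conf.items():
        parts += ["--conf", f"{key}={value}"]
    # The value must end with "pyspark-shell" when launching from plain Python.
    parts.append("pyspark-shell")
    # Quote each token so that paths containing spaces survive shell-style splitting.
    return " ".join(shlex.quote(part) for part in parts)


def apply_submit_args(files, conf):
    """Export the assembled value; call this before creating a SparkSession."""
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(files, conf)
    return os.environ["PYSPARK_SUBMIT_ARGS"]


# Example with placeholder values (assumptions, not the repository's real paths):
# apply_submit_args(
#     ["./conf/metrics.properties"],
#     {"spark.metrics.conf": "metrics.properties"},
# )
```

Building the string in one place keeps the trailing `pyspark-shell` token from being forgotten when options are added or removed.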
