Commit e44f8b8e authored by Gmodena

Merge branch 'T275162-enable-spark-metrics-collection' of github.com:gmodena/ImageMatching into T275162-enable-spark-metrics-collection
parents 2d706e6e 2554f0cb
@@ -55,7 +55,7 @@ spark2-submit etl/transform.py <raw data> <production data>
`conf/spark.properties` provides default settings to run the ETL as a [regular-size Spark job](https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark#Spark_Resource_Settings) on WMF's Analytics cluster.
```bash
spark2-submit --properties-file conf/spark.properties etl/transform.py <raw data> <production data>
```
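Suppose you want to launch the ETL from Python rather than from a shell. The hypothetical helper `build_submit_command` below is a minimal sketch of that, not part of the repository. It assembles the same `spark2-submit` invocation as an argument list. It also makes the ordering explicit. `--properties-file` consumes the path that immediately follows it, so `conf/spark.properties` must come before `etl/transform.py`; otherwise spark-submit would read the script itself as the properties file. `<raw data>` and `<production data>` remain placeholders.

```python
import shlex
from typing import List


def build_submit_command(
    properties_file: str, script: str, raw_data: str, production_data: str
) -> List[str]:
    """Hypothetical helper: return the spark2-submit argv for the ETL job.

    --properties-file takes the next argument as its value, so the
    properties path is placed before the application script.
    """
    return [
        "spark2-submit",
        "--properties-file",
        properties_file,
        script,
        raw_data,
        production_data,
    ]


if __name__ == "__main__":
    # Placeholder data paths; substitute the real input/output locations.
    cmd = build_submit_command(
        "conf/spark.properties", "etl/transform.py", "<raw data>", "<production data>"
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # To actually launch the job: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting issues when the data paths contain spaces.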
## Metrics collection
@@ -66,7 +66,7 @@ metrics files, which output to the driver and executors' stdout, can be found at `
The easiest way to do this is by setting `PYSPARK_SUBMIT_ARGS`. For example:
```bash
export PYSPARK_SUBMIT_ARGS="--files ./conf/metrics.properties --conf spark.metrics.conf=metrics.properties pyspark-shell"
python3 algorunner.py 2020-12-28 hywiki Output
```
This will submit the `algorunner` job with additional instrumentation.
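The same instrumentation can also be set up from Python. This is a minimal sketch, not part of the repository. `build_pyspark_submit_args` and `run_algorunner` are hypothetical helpers. The parameter names `snapshot`, `wiki` and `output_dir` are assumptions about what `algorunner.py`'s three positional arguments mean. Like the `export` line, the generated value ends with the `pyspark-shell` token. PySpark tokenizes `PYSPARK_SUBMIT_ARGS` shell-style, so the helper quotes each argument with `shlex.join`.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Optional


def build_pyspark_submit_args(
    files: List[str], conf: Dict[str, str], extra: Optional[List[str]] = None
) -> str:
    """Hypothetical helper: build a PYSPARK_SUBMIT_ARGS value.

    Each --conf entry becomes key=value, and the string always ends with
    the "pyspark-shell" token, as in the export example.
    """
    args: List[str] = []
    if files:
        # --files takes a comma-separated list of files to ship with the job.
        args += ["--files", ",".join(files)]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args += extra or []
    args.append("pyspark-shell")
    # Quote each token so values with spaces survive shell-style splitting.
    return shlex.join(args)


def run_algorunner(snapshot: str, wiki: str, output_dir: str) -> None:
    """Hypothetical wrapper; the meaning of algorunner.py's arguments is assumed."""
    env = dict(os.environ)  # copy, so the setting only affects the child process
    env["PYSPARK_SUBMIT_ARGS"] = build_pyspark_submit_args(
        files=["./conf/metrics.properties"],
        conf={"spark.metrics.conf": "metrics.properties"},
    )
    subprocess.run(
        ["python3", "algorunner.py", snapshot, wiki, output_dir], env=env, check=True
    )
```

For example, `run_algorunner("2020-12-28", "hywiki", "Output")` mirrors the shell commands above. Passing a copy of `os.environ` through `env=` scopes the setting to that one run instead of exporting it for the whole shell session.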