Creates a packed conda environment as an artifact for deployment across the data engineering cluster. Currently contains PySpark3 and JupyterHub.
A work-in-progress fork of https://github.com/banzaicloud/spark-metrics/, modified to meet WMF conventions.
Experimental configurable generic Flink pipelines
Differentially private analytics with Apache Spark (demo)
PoC for consuming revision and Android interaction streams
Custom Spark metrics classes and sinks (e.g. Prometheus)
Experiments with PrometheusSink and PushGateway