Builds a packed conda environment as an artifact for deployment across the data engineering cluster. Currently contains PySpark 3 and JupyterHub.
Experimental, configurable, generic Flink pipelines
Differentially private analytics with Apache Spark (demo)
This is a work-in-progress fork of https://github.com/banzaicloud/spark-metrics/ modified to meet WMF conventions.
Experiments with PrometheusSink and Pushgateway
PoC for consuming revision and Android interaction streams