Explore GitLab
Discover projects, groups and snippets. Share your projects with others.
PySpark jobs for the image suggestions data pipeline.
Collection of data engineering DAGs to be executed by the WMF Airflow instances.
At the heart of the Section topics data pipeline lies a set of Spark jobs.
This project builds the Docker image required to run the RDF Streaming Updater, a Flink job that replicates updates made to Wikidata and Commons into Blazegraph.
A container registry implemented in Python which supports automatic online garbage collection of unreferenced blobs.