T275165 dataset metrics

Gmodena requested to merge github/fork/clarakosi/T275165_dataset_metrics into main

Created by: clarakosi

Acceptance Criteria

As a PET Data Engineer, I want the ability to generate a CSV file with the following metrics, so that I have a baseline of how the pipeline performs.

  • Total number of records (per wiki)
  • Total number of images per page (per wiki)
  • Summary of population statistics
  • Size and counts of intermediate and final datasets
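
As a rough illustration of the per-wiki metrics listed above, here is a minimal sketch using only the Python standard library. It is not the code in this merge request: the record fields (`wiki`, `page_id`, `image_count`), the sample data, and the helper names `per_wiki_metrics` and `write_metrics_csv` are all hypothetical, and the real pipeline may compute these figures differently (for example with Spark).

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample data: one record per page, with its wiki and image count.
SAMPLE_RECORDS = [
    {"wiki": "enwiki", "page_id": 1, "image_count": 3},
    {"wiki": "enwiki", "page_id": 2, "image_count": 1},
    {"wiki": "ptwiki", "page_id": 7, "image_count": 0},
    {"wiki": "ptwiki", "page_id": 8, "image_count": 4},
    {"wiki": "ptwiki", "page_id": 9, "image_count": 2},
]

FIELDNAMES = [
    "wiki",
    "total_records",
    "total_images",
    "mean_images_per_page",
    "median_images_per_page",
    "max_images_per_page",
]


def per_wiki_metrics(records):
    """Aggregate record counts and images-per-page statistics for each wiki."""
    counts_by_wiki = defaultdict(list)
    for record in records:
        counts_by_wiki[record["wiki"]].append(record["image_count"])

    rows = []
    for wiki in sorted(counts_by_wiki):
        counts = counts_by_wiki[wiki]
        rows.append({
            "wiki": wiki,
            "total_records": len(counts),
            "total_images": sum(counts),
            "mean_images_per_page": statistics.mean(counts),
            "median_images_per_page": statistics.median(counts),
            "max_images_per_page": max(counts),
        })
    return rows


def write_metrics_csv(rows, out):
    """Write the metric rows to a file-like object as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


rows = per_wiki_metrics(SAMPLE_RECORDS)
buffer = io.StringIO()
write_metrics_csv(rows, buffer)
print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch easy to test; passing an open file handle (opened with `newline=""`) instead of `io.StringIO` would produce the CSV on disk.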

A better look at the Python notebook here: