T275165 dataset metrics

Gmodena requested to merge github/fork/clarakosi/T275165_dataset_metrics into main

Created by: clarakosi

Acceptance Criteria

As a PET Data Engineer, I want to be able to generate a CSV file with the following metrics, so that I have a baseline of how the pipeline performs.

  • Total number of records (per wiki)
  • Total number of images per page
    • Per Wiki
  • Summary of population statistics
  • Size and counts of intermediate and final datasets
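
As a rough illustration of the kind of per-wiki output described above, here is a minimal sketch using only the Python standard library. It is not the notebook's implementation: the input schema (one `(wiki, page_id, n_images)` tuple per page), the function names `dataset_metrics` and `write_metrics_csv`, and the column names are all assumptions made for this example.

```python
import csv
import statistics
from collections import defaultdict

# Hypothetical column layout for the metrics CSV (not taken from the notebook).
FIELDNAMES = [
    "wiki",
    "total_records",
    "total_images",
    "mean_images_per_page",
    "median_images_per_page",
    "max_images_per_page",
]


def dataset_metrics(records):
    """Aggregate per-wiki metrics.

    `records` is an iterable of (wiki, page_id, n_images) tuples,
    assumed to contain one record per page.
    """
    images_by_wiki = defaultdict(list)
    for wiki, _page_id, n_images in records:
        images_by_wiki[wiki].append(n_images)

    rows = []
    for wiki in sorted(images_by_wiki):
        counts = images_by_wiki[wiki]
        rows.append({
            "wiki": wiki,
            "total_records": len(counts),
            "total_images": sum(counts),
            "mean_images_per_page": statistics.mean(counts),
            "median_images_per_page": statistics.median(counts),
            "max_images_per_page": max(counts),
        })
    return rows


def write_metrics_csv(rows, out):
    """Write the metric rows to an open text file object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
```

With a real dataset, the rows would more likely come from a Spark or pandas aggregation; the CSV layout could stay the same.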

For a better view of the Python notebook, see: https://github.com/mirrys/ImageMatching/blob/f34ff48e430b0e83261f45fd754ee6f351db959f/Dataset_Metrics/Dataset_metrics.ipynb
