Add extract_embeddings CLI

A job that:

  • parses enterprise structured content
  • extracts passages
  • fixes section names using cirrus dumps
  • extracts embeddings using spark-nlp
  • attempts to re-use pre-computed embeddings from a past partition
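
The re-use step in the list above can be illustrated with a minimal pure-Python sketch. It is not the code in this merge request: the real job runs on Spark with spark-nlp, and this description does not say how passages are matched against the past partition. Keying by a SHA-256 hash of the passage text is an assumption, and `passage_key`, `embed_with_reuse`, and the `compute` callback are hypothetical names used only for illustration.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

Embedding = List[float]


def passage_key(text: str) -> str:
    """Stable key for a passage (assumption: identical text => reusable embedding)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_reuse(
    passages: Iterable[str],
    previous: Dict[str, Embedding],
    compute: Callable[[List[str]], List[Embedding]],
) -> Dict[str, Embedding]:
    """Return key -> embedding, calling `compute` only for unseen passages.

    `previous` stands in for embeddings loaded from a past partition;
    `compute` stands in for the (expensive) embedding model.
    """
    result: Dict[str, Embedding] = {}
    pending: List[Tuple[str, str]] = []
    pending_keys = set()

    for text in passages:
        key = passage_key(text)
        if key in result or key in pending_keys:
            continue  # duplicate passage in the current input
        if key in previous:
            result[key] = previous[key]  # re-use the pre-computed embedding
        else:
            pending.append((key, text))
            pending_keys.add(key)

    if pending:
        # One batched call for everything that could not be re-used.
        vectors = compute([text for _, text in pending])
        for (key, _), vector in zip(pending, vectors):
            result[key] = vector

    return result
```

Under these assumptions, only passages whose text changed since the past partition reach the model, which is the point of the re-use attempt; a Spark version would express the same idea as a join between the current passages and the previous partition.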

Bug: T414070

Edited by DCausse
