Initial draft of refactoring efforts

Gmodena requested to merge github/fork/clarakosi/refactoring into main

Created by: clarakosi

Changes:

  • Adds a Spark UDF
  • Modifies the schema of the top_candidates column so that null image suggestions are represented as an empty array
  • Saves the output as Parquet in HDFS
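
For reviewers, the three changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this merge request: the helper names, the string element type of top_candidates, the overwrite mode, and the output path are all assumptions.

```python
from typing import List, Optional


def normalize_candidates(candidates: Optional[List[str]]) -> List[str]:
    """Return an empty list when there are no image suggestions (hypothetical helper)."""
    return [] if candidates is None else list(candidates)


def write_top_candidates(df, output_path: str) -> None:
    """Apply the UDF to `top_candidates` and save the result as Parquet.

    `df` is an existing Spark DataFrame and `output_path` is an HDFS
    location; both are placeholders. PySpark is imported lazily so that
    `normalize_candidates` can be exercised without a Spark installation.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    # Declaring the return type makes the column schema array<string>
    # (the string element type is an assumption).
    normalize_udf = F.udf(normalize_candidates, ArrayType(StringType()))

    result = df.withColumn(
        "top_candidates", normalize_udf(F.col("top_candidates"))
    )
    # "overwrite" is an assumed write mode, chosen only for the sketch.
    result.write.mode("overwrite").parquet(output_path)
```

Keeping the null handling in a plain Python function makes it easy to unit test on its own, separately from the Spark job.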

This does not enable cluster mode in Spark, because that does not appear to be possible from Jupyter notebooks.
