Initial draft of refactoring efforts
Created by: clarakosi
Changes:
- Adds a Spark UDF
- Modifies the schema of the top_candidates column so that null image suggestions are represented as an empty array
- Saves the output as Parquet in HDFS
Does not enable cluster mode in Spark, because that does not appear to be possible from Jupyter notebooks.
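
For reviewers, here is a rough sketch of how the three changes could fit together. It is not the code in this change: the helper names (normalize_candidates, write_suggestions), the array<string> element type for top_candidates, and the output_path argument are all assumptions made for illustration.

```python
from typing import List, Optional


def normalize_candidates(candidates: Optional[List[str]]) -> List[str]:
    """Return an empty list when a row has no image suggestions."""
    return [] if candidates is None else list(candidates)


def write_suggestions(df, output_path: str) -> None:
    """Apply the normalization as a Spark UDF and save the result as Parquet.

    Assumptions: `df` is a pyspark.sql.DataFrame whose `top_candidates`
    column is array<string>, and `output_path` is a hypothetical HDFS path.
    """
    # Imported lazily so the pure-Python helper above works without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    # Wrap the plain function as a UDF with an explicit return type.
    normalize_udf = F.udf(normalize_candidates, ArrayType(StringType()))

    (
        df.withColumn("top_candidates", normalize_udf(F.col("top_candidates")))
        .write.mode("overwrite")
        .parquet(output_path)
    )
```

Keeping the null handling in a plain Python function means it can be checked without a Spark session, while write_suggestions only runs where PySpark is available.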