T277552 project jdata store as parquet

Gmodena requested to merge T277552-project-jdata-store-as-parquet into main

Created by: gmodena

Project instanceof metadata in the model output.

This PR exports metadata related to the "instance of" property of a Q item. This information is stored as a column appended to the model output, and it propagates to the HDFS and Hive imagerec datasets.
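
As a rough illustration of the appended column, the sketch below adds an `instance_of` field to each row of tab-separated model output. The column names (`page_id`, `image_id`, `instance_of`), the sample rows, and the `append_instance_of` helper are all assumptions made for illustration; they are not the schema or code used in this PR.

```python
import csv
import io


def append_instance_of(tsv_text, instance_of_by_page):
    """Return TSV text with a hypothetical `instance_of` column appended.

    `instance_of_by_page` maps a page id to a Wikidata Q id; pages without
    an entry get an empty value.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Existing columns keep their order; the new column goes last.
    fieldnames = list(reader.fieldnames) + ["instance_of"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["instance_of"] = instance_of_by_page.get(row["page_id"], "")
        writer.writerow(row)
    return out.getvalue()


# Made-up sample rows; Q5 is the Wikidata item for "human".
sample = "page_id\timage_id\n1\tA.jpg\n2\tB.jpg\n"
result = append_instance_of(sample, {"1": "Q5"})
```

Appending the column at the end keeps the existing column positions unchanged, so consumers that read the earlier columns by position would not need to change.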

This PR also adds a Spark job that uploads the model output and converts it to Parquet. This facilitates interop with Spark and makes schema migrations easier to handle.
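
A conversion job of this kind could look roughly like the PySpark sketch below. It is not the job added in this PR: the schema string, the function names, and the assumption that the model output is a tab-separated file with a header row are all hypothetical.

```python
# Hypothetical schema in Spark DDL form; these column names are placeholders,
# not the schema used in this PR.
MODEL_OUTPUT_SCHEMA = "page_id STRING, image_id STRING, instance_of STRING"


def convert_to_parquet(spark, tsv_path, parquet_path, schema=MODEL_OUTPUT_SCHEMA):
    """Read tab-separated model output and rewrite it as Parquet."""
    # An explicit schema avoids type inference and keeps column types stable.
    df = spark.read.csv(tsv_path, sep="\t", header=True, schema=schema)
    df.write.mode("overwrite").parquet(parquet_path)
    return df


def main(argv):
    """Entry point sketch: argv[1] is the input path, argv[2] the output path."""
    # Imported here so the module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("model-output-to-parquet").getOrCreate()
    try:
        convert_to_parquet(spark, argv[1], argv[2])
    finally:
        spark.stop()
```

To run it under `spark-submit`, the script would call `main(sys.argv)` at the bottom. On the read side, Spark's `mergeSchema` option for Parquet (`spark.read.option("mergeSchema", "true").parquet(path)`) can reconcile files written before and after a column was added, which is one way the Parquet format helps with schema migrations.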

Closes: https://phabricator.wikimedia.org/T277552