Implement parsing of “instance of” fields in ImageMatching production datasets

Gmodena requested to merge github/fork/clarakosi/T277555-parse-instancof into main

Created by: clarakosi

The Spark job we use to generate production datasets needs to parse the new "instance of" fields.

Acceptance criteria

  • Logic to parse the "instance of" JSON blob is implemented
  • Tests for this capability have been added
  • The number of articles with and without valid "instance of" metadata is known (add a metric)
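
To make the criteria above concrete, here is a minimal sketch of the parsing and counting logic in plain Python. It is not the code in this PR. The blob format is an assumption: the sketch treats each "instance of" value as a JSON-encoded list of Wikidata item IDs such as `["Q5"]`. The names `parse_instance_of`, `count_instance_of`, and `QID_PATTERN` are hypothetical and chosen only for illustration.

```python
import json
import re
from collections import Counter
from typing import Iterable, List, Optional

# Assumption: a valid item ID is "Q" followed by a positive integer, e.g. "Q5".
QID_PATTERN = re.compile(r"^Q[1-9]\d*$")


def parse_instance_of(blob: Optional[str]) -> Optional[List[str]]:
    """Return the item IDs in the blob, or None if it is missing or invalid."""
    if not blob:
        # Covers None and the empty string.
        return None
    try:
        parsed = json.loads(blob)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    # Assumption: only a non-empty list of well-formed IDs counts as valid.
    if not isinstance(parsed, list) or not parsed:
        return None
    if not all(isinstance(item, str) and QID_PATTERN.match(item) for item in parsed):
        return None
    return parsed


def count_instance_of(blobs: Iterable[Optional[str]]) -> Counter:
    """Count how many blobs carry valid vs. invalid "instance of" metadata."""
    counts = Counter(valid=0, invalid=0)
    for blob in blobs:
        key = "valid" if parse_instance_of(blob) is not None else "invalid"
        counts[key] += 1
    return counts
```

In a Spark job, a function like `parse_instance_of` could be wrapped in a UDF and the two counts derived with an aggregation over the parsed column. Keeping the parser a pure function, as here, also makes it straightforward to unit test without a Spark session.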

