Add enterprise structured content chunker

A small class that parses, flattens, and trims structured content documents. It is not connected to anything yet, but it seemed isolated enough from the rest that a dedicated commit made sense. The trimming strategy is taken from an early experiment in which the following criteria were applied:

  • max passage length: 2800
  • min passage length: 40
  • max list size: 10
  • max number of passages: 150
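
For illustration only, here is a minimal Python sketch of how limits like these might be applied to already-flattened passages. It is not the class added by this commit. The function name `trim_passages`, the input shape (strings, or lists of strings for list blocks), counting length in characters, dropping short passages, and truncating long ones are all assumptions.

```python
# Limits quoted in the list above; measuring them in characters is an assumption.
MAX_PASSAGE_LENGTH = 2800
MIN_PASSAGE_LENGTH = 40
MAX_LIST_SIZE = 10
MAX_PASSAGES = 150


def trim_passages(passages):
    """Hypothetical helper: apply the four limits to flattened passages.

    Each entry is either a string (a text passage) or a list of strings
    (the items of a list block).
    """
    kept = []
    for passage in passages:
        if isinstance(passage, list):
            # Keep only the first MAX_LIST_SIZE items, then join them.
            text = "\n".join(item.strip() for item in passage[:MAX_LIST_SIZE])
        else:
            text = passage.strip()
        if len(text) < MIN_PASSAGE_LENGTH:
            continue  # assumption: passages that are too short are dropped
        # Assumption: passages that are too long are truncated, not dropped.
        kept.append(text[:MAX_PASSAGE_LENGTH])
        if len(kept) == MAX_PASSAGES:
            break  # stop once the passage budget is reached
    return kept
```

Whether the real chunker truncates or discards oversized passages, and whether it splits rather than cuts long lists, is not stated here; the sketch picks the simplest option in each case.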

Bug: T414070
