Split plaintext by sections and paragraphs
Splitting on sections is easy, but to return a more structured plaintext result we also need to identify every HTML element that marks the start of a new paragraph (a new line). This includes <p> tags, list items, and likely other block-level nodes. Doing so will better support people who, for example, want only the first paragraph of an article or want to break the text into chunks for input into language models.
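
As a rough illustration of the paragraph-splitting idea, here is a minimal sketch that uses only Python's standard-library html.parser. The BLOCK_TAGS set, the ParagraphSplitter class, and the split_paragraphs function are hypothetical names chosen for this example, and the list of tags is an assumption: the task only names <p> and list items, so the real set still needs to be worked out.

```python
from html.parser import HTMLParser

# Assumption: tags treated as paragraph boundaries. Only <p> and <li> come
# from the task description; the rest are guesses at "other" block nodes.
BLOCK_TAGS = {"p", "li", "dd", "dt", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphSplitter(HTMLParser):
    """Collects plaintext, starting a new paragraph at each block boundary."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = []

    def _flush(self):
        # Join buffered text, collapse runs of whitespace, skip empty results.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # <br> is treated as a line break here, which is also an assumption.
        if tag in BLOCK_TAGS or tag == "br":
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)


def split_paragraphs(html):
    """Return the plaintext of `html` as a list of paragraph strings."""
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep any trailing text outside a block element
    return parser.paragraphs


sample = "<h2>History</h2><p>First <b>para</b>.</p><ul><li>One</li><li>Two</li></ul>"
paragraphs = split_paragraphs(sample)
first_paragraph = paragraphs[1]  # index 0 is the heading text in this sample
```

In this sketch headings come back as their own entries, so a caller who wants only the first paragraph of an article would need to skip them or track tag types alongside the text. It also does nothing to exclude content such as scripts, styles, or references, which a real implementation would have to handle.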