Sentence break regex doesn't work for non-whitespace languages
The regex expects one or more whitespace characters after sentence-ending punctuation, which doesn't occur in languages written without spaces between sentences (e.g., Chinese or Japanese). For those languages it effectively just splits on paragraphs. This one is tricky: we could leave the regex as-is for whitespace-delimited languages and remove the whitespace requirement for non-whitespace languages. But I'm not sure whether that would cause false positives, since terminating punctuation can also appear mid-sentence, e.g., decimal points in numbers.
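
A rough sketch of the two-pattern idea, in Python for illustration only. The pattern names, the `split_sentences` helper, and the `whitespace_delimited` flag are all hypothetical and not taken from the existing code. The non-whitespace pattern drops the whitespace requirement but refuses to split when the next character is a digit or another terminator, which covers the decimal-point case. It would still split on things like abbreviations written without a following space, so treat it as a starting point rather than a fix.

```python
import re

# Hypothetical stand-in for the current behaviour: split only where
# terminal punctuation is followed by one or more whitespace characters.
WHITESPACE_BREAK = re.compile(r"(?<=[.!?])\s+")

# Hypothetical variant for non-whitespace languages: no whitespace is
# required, but no split happens if the next character is a digit
# (e.g. "3.14") or another terminator (e.g. "？！").
NO_WHITESPACE_BREAK = re.compile(r"(?<=[。！？.!?])(?![。！？.!?\d])\s*")


def split_sentences(text, whitespace_delimited=True):
    """Split text into sentences, keeping the terminating punctuation."""
    pattern = WHITESPACE_BREAK if whitespace_delimited else NO_WHITESPACE_BREAK
    # Zero-width splits need Python 3.7+; drop the empty trailing piece
    # produced when the text ends with a terminator.
    return [piece for piece in pattern.split(text) if piece]


sample = "价格是3.14元。很便宜！好的"

# The whitespace-requiring pattern finds no break at all in this text.
print(split_sentences(sample, whitespace_delimited=True))

# The relaxed pattern splits after "。" and "！" but not inside "3.14".
print(split_sentences(sample, whitespace_delimited=False))
```

One known gap in this sketch: a sentence that starts with a digit right after a terminator (with no space) would not be split off, because the digit check cannot tell that case apart from a decimal point.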