Sentence Tokenization: language-specific sentence joiner
In src/wikinlptools/benchmarking/bmark_sentence.py, instead of joining sentences with a whitespace for all languages, we'll eventually want to replace this hard-coded " " space joiner with a language-specific joiner -- e.g., something like delimiters.get(language, " "), where delimiters is a mapping we build from language codes to joiners, covering languages that don't use whitespace after full stops.
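A minimal sketch of what this could look like. The mapping name, the language codes included, and the helper function are illustrative assumptions, not the module's actual API:

```python
# Hypothetical mapping from language code to sentence joiner.
# Languages listed here are illustrative examples of scripts that
# typically don't put a space after a full stop.
SENTENCE_JOINERS = {
    "zh": "",  # Chinese
    "ja": "",  # Japanese
}

def join_sentences(sentences, language):
    """Join tokenized sentences with a language-appropriate delimiter.

    Falls back to a single space for languages not in the mapping.
    """
    joiner = SENTENCE_JOINERS.get(language, " ")
    return joiner.join(sentences)

print(join_sentences(["你好。", "再见。"], "zh"))   # no space between sentences
print(join_sentences(["Hello.", "Goodbye."], "en"))  # space-joined default
```

Using a dict with a default via .get keeps the whitespace behavior unchanged for every language until it is explicitly added to the mapping.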