Armenian has low sentence segmentation performance because Flores data uses a standard colon
The library correctly handles the Armenian full stop (։, U+0589; see https://www.unicode.org/charts/PDF/U0530.pdf), but the Flores data uses a standard colon (:) most of the time. The two look similar, but the substitution breaks our approach. Options:
- Document: Just caveat the "low" performance in a README somewhere so folks are aware.
- Cover-up: Convert the colons to the official character in our dataset. I'm hesitant about this, though, because standard colons used this way might also appear in data for other languages.
- Fix: Make language-specific exceptions like colons for Armenian (presumably there are other languages where this sort of thing happens). Not sure how complicated this would be though.
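
To get a feel for how complicated the "Fix" option might be, here is a minimal sketch of a per-language terminator table. It is not the library's actual implementation: `DEFAULT_TERMINATORS`, `EXTRA_TERMINATORS`, and `split_sentences` are hypothetical names, the `"hy"` language code is assumed, and the regex split is a simplified stand-in for whatever the library really does.

```python
import re
from typing import List

# Hypothetical defaults; the real library's terminator set may differ.
DEFAULT_TERMINATORS = ".!?"

# Hypothetical per-language exceptions. For Armenian ("hy") we accept both
# the official full stop (U+0589) and the ASCII colon seen in Flores.
EXTRA_TERMINATORS = {
    "hy": "\u0589:",
}


def split_sentences(text: str, lang: str) -> List[str]:
    """Split text after any terminator for `lang` that is followed by whitespace."""
    terminators = DEFAULT_TERMINATORS + EXTRA_TERMINATORS.get(lang, "")
    # Lookbehind keeps the terminator attached to the sentence it ends.
    pattern = "(?<=[" + re.escape(terminators) + "])\\s+"
    return [s for s in re.split(pattern, text.strip()) if s]


if __name__ == "__main__":
    sample = "Aaa: Bbb\u0589 Ccc."
    print(split_sentences(sample, "hy"))  # colon and U+0589 both end a sentence
    print(split_sentences(sample, "en"))  # neither is a terminator here
```

One caveat with this sketch: treating every colon followed by whitespace as a sentence boundary will over-split Armenian text wherever the colon is used as an ordinary colon, so the exception would probably need to be opt-in or paired with extra heuristics.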