Take abbreviations into account when splitting sentences
- Compile a list of abbreviations from Wiktionary
- Add a feature that avoids splitting sentences at abbreviations
- Build a global or language-specific list from Wiktionary; we can collaborate on this in a PySpark Jupyter notebook running against our cluster.
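
As a rough illustration of the splitting feature, here is a minimal sketch of an abbreviation-aware splitter. Everything in it is an assumption for discussion: the function name `split_sentences`, the regex-based boundary rule, and the tiny hard-coded `ABBREVIATIONS` set, which only stands in for the list we would compile from Wiktionary.

```python
import re

# Hypothetical sample set; the real list would be compiled from Wiktionary
# and could be global or keyed by language.
ABBREVIATIONS = {"dr.", "mr.", "e.g.", "i.e.", "etc."}

# Candidate boundary: whitespace that directly follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, skipping boundaries after known abbreviations."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # Last whitespace-delimited token before the candidate boundary.
        tokens = text[start:match.start()].split()
        last_token = tokens[-1].lower() if tokens else ""
        if last_token in abbreviations:
            continue  # the period belongs to an abbreviation, so do not split
        sentences.append(text[start:match.start()])
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = split_sentences("Dr. Smith arrived. He was late, e.g. by an hour. Fine!")
```

With this sample set, `example` contains three sentences, and neither "Dr." nor "e.g." triggers a split. Known limitations of the sketch: an abbreviation that really does end a sentence (such as a trailing "etc.") is not split, and tokens with leading punctuation such as "(e.g." are not recognised.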