Packaging: add sentence segmentation tests

This will help with unit testing and also give us a simple way to play with the code locally without having to set up a special environment. A few edge cases I can think of right now for sentence_tokenization_naive (we could also test pre_processing and the global split pattern separately, but I think it's okay to just "test" those as part of the fuller tokenization function):

empty: ''
just whitespace: ' '
sentence with no text: ' . '
sentence w/o end punctuation: 'This is a sentence'
well-formed sentences: 'This is a sentence. And another.'
sentences w/ abbreviation: 'This is Q.E.D. a sentence. And another.'
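A minimal sketch of how the edge cases above could be written as a stdlib `unittest` test case. The `sentence_tokenization_naive` defined here is a hypothetical stand-in (split on whitespace that follows `.`, `!` or `?`, then drop pieces with no word characters), not the project's implementation. The expected outputs are assumptions about the desired behavior, so swap in the real import and adjust them as needed. The abbreviation case is marked `expectedFailure` because this stand-in splits after "Q.E.D."; whether the real function does the same is not stated here.

```python
import re
import unittest

# HYPOTHETICAL stand-in for the project's sentence_tokenization_naive.
# In the real test module, import the actual function instead, e.g.
#   from <package> import sentence_tokenization_naive
# It splits on whitespace that follows '.', '!' or '?', then drops pieces
# that contain no word characters (so '' and '.' disappear).
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
_HAS_WORD = re.compile(r"\w")


def sentence_tokenization_naive(text):
    pieces = _SENTENCE_BREAK.split(text.strip())
    return [p for p in pieces if _HAS_WORD.search(p)]


# (description, input, expected output). The expected values are ASSUMED
# desired behavior, not taken from the real implementation.
EDGE_CASES = [
    ("empty", "", []),
    ("just whitespace", " ", []),
    ("sentence with no text", " . ", []),
    ("no end punctuation", "This is a sentence", ["This is a sentence"]),
    ("well-formed sentences", "This is a sentence. And another.",
     ["This is a sentence.", "And another."]),
]


class TestSentenceTokenizationNaive(unittest.TestCase):
    def test_edge_cases(self):
        for name, text, expected in EDGE_CASES:
            # subTest reports each failing case separately instead of
            # stopping at the first failure.
            with self.subTest(name):
                self.assertEqual(sentence_tokenization_naive(text), expected)

    # The stand-in's period-plus-whitespace split breaks after "Q.E.D.",
    # so this case fails; expectedFailure records it as a known gap.
    @unittest.expectedFailure
    def test_abbreviation(self):
        self.assertEqual(
            sentence_tokenization_naive("This is Q.E.D. a sentence. And another."),
            ["This is Q.E.D. a sentence.", "And another."],
        )
```

Keeping the cases in one table makes it cheap to add new ones as they come up, and the file runs with `python -m unittest` or with `pytest`, which also collects `unittest.TestCase` classes.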