Bump mwtokenizer version
Bump from 0.1.0 to 0.2.0.
Changes:
- For non-whitespace languages, the output contained "▁" in place of " ", following sentencepiece's convention. The tokenizer now replaces "▁" with " " itself, so end users don't have to.
- Split spaces and punctuation into separate tokens for non-whitespace languages, as is already done for whitespace languages.
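
The two changes above can be illustrated with a minimal sketch. This is not mwtokenizer's actual code: the function name `split_pieces`, the constant `SP_SPACE`, and the choice to treat Unicode category "P" as punctuation are all assumptions made for illustration.

```python
import unicodedata

# sentencepiece's whitespace marker "▁" (U+2581)
SP_SPACE = "\u2581"


def split_pieces(pieces):
    """Hypothetical post-processing of sentencepiece-style pieces.

    Replaces "▁" with " " and emits every space and punctuation
    character as its own token.
    """
    tokens = []
    for piece in pieces:
        buffer = ""
        for ch in piece.replace(SP_SPACE, " "):
            # Assumption: "punctuation" means any Unicode category starting with "P".
            if ch.isspace() or unicodedata.category(ch).startswith("P"):
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


print(split_pieces(["\u2581こんにちは。", "\u2581世界", "!"]))
```

In this sketch, joining the returned tokens gives back the input text with every "▁" replaced by " ", so no characters are lost by the split.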