Filter abbreviations list for each wikiproject
- notebook to generate abbreviation filtering metadata (`FREQUENCY_THRESHOLD > 10`)
- scaling for all wiki-projects (`319`)
- filtered abbreviations csv files for each language (`LIKELIHOOD_RATIO > 0.6`)
- updated naive sentence segmentation code for the language-specific filtered abbreviations list
- updated benchmarking code for naive sentence segmentation (to accommodate language-specific abbr lists)
- upgraded the `.pre-commit-config.yaml` due to flake8 version conflict
- removed the old evaluation results, as they don't make much sense for the current implementation
- uploaded (TO BE) JSON abbreviations list
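
For reviewers, here is a rough sketch of how the two thresholds and the filtered list could fit together. It is illustrative only and not the code in this MR: the record layout (`abbr`, `frequency`, `likelihood_ratio`) and the function names `filter_abbreviations` and `naive_segment` are assumptions, and the likelihood ratio is treated as a precomputed per-candidate score.

```python
import re

# Thresholds named in this MR; everything else below is a hypothetical layout.
FREQUENCY_THRESHOLD = 10
LIKELIHOOD_RATIO = 0.6


def filter_abbreviations(candidates):
    """Keep candidates that exceed both thresholds.

    `candidates` is an assumed list of dicts with the keys
    "abbr", "frequency", and "likelihood_ratio".
    """
    return {
        row["abbr"]
        for row in candidates
        if row["frequency"] > FREQUENCY_THRESHOLD
        and row["likelihood_ratio"] > LIKELIHOOD_RATIO
    }


def naive_segment(text, abbreviations):
    """Split after ., ! or ? followed by whitespace, except after a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Text from the current sentence start up to and including the terminator.
        chunk = text[start:match.start() + 1]
        last_token = chunk.split()[-1]
        if last_token in abbreviations:
            # The period belongs to an abbreviation, so do not split here.
            continue
        sentences.append(chunk.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With a per-language abbreviation set loaded this way, a period after a listed abbreviation no longer triggers a sentence break, which is the behaviour the benchmarking changes are meant to exercise.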
Closes #10