
Filter abbreviations list for each wikiproject

Appledora requested to merge 10-filter-performance-analysis into main
  • Added a notebook that generates abbreviation filtering metadata (FREQUENCY_THRESHOLD > 10).
  • Scaled the filtering to all wiki-projects (319).
  • Added filtered abbreviation CSV files for each language (LIKELIHOOD_RATIO > 0.6).
  • Updated the naive sentence segmentation code to use the language-specific filtered abbreviation lists.
  • Updated the benchmarking code for naive sentence segmentation (to accommodate language-specific abbreviation lists).
  • Upgraded .pre-commit-config.yaml to resolve a flake8 version conflict.
  • Removed the old evaluation results, which no longer make sense for the current implementation.
  • Uploaded the (TO BE) JSON abbreviations list.
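
For reviewers, the filtering and segmentation steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code in this merge request: the column names (`abbreviation`, `frequency`, `likelihood_ratio`), the helper names, and the sample rows are hypothetical, and it assumes a candidate is kept only when both values exceed the thresholds named in the description.

```python
import csv
import io
import re

# Threshold values come from the description; how they are applied is assumed.
FREQUENCY_THRESHOLD = 10
LIKELIHOOD_RATIO = 0.6

# Hypothetical per-language metadata; the real CSV layout may differ.
SAMPLE_CSV = """abbreviation,frequency,likelihood_ratio
Dr.,120,0.95
etc.,40,0.55
No.,8,0.90
"""


def filter_abbreviations(rows):
    """Keep candidates whose frequency and ratio both exceed the thresholds."""
    kept = set()
    for row in rows:
        frequent = int(row["frequency"]) > FREQUENCY_THRESHOLD
        likely = float(row["likelihood_ratio"]) > LIKELIHOOD_RATIO
        if frequent and likely:
            kept.add(row["abbreviation"])
    return kept


def naive_segment(text, abbreviations):
    """Split after ., ! or ? plus whitespace, unless the token is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        # Last whitespace-delimited token before the candidate boundary.
        last_token = text[start:match.end()].split()[-1]
        if last_token in abbreviations:
            continue  # e.g. "Dr." should not end a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


abbreviations = filter_abbreviations(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sentences = naive_segment("Dr. Smith arrived. He left early.", abbreviations)
```

With the sample rows, only `Dr.` passes both thresholds, so the text is split into two sentences rather than three. Loading one such set per language is what makes the segmenter language-specific.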

Closes #10 (closed)

Edited by Appledora
