  • Wiki NLP Tools
  • Merge requests
  • Open 1
  • Merged 14
  • Closed 1
  • All 16
  • Resolve "Sentence Tokenizer: add FLORES dataset for benchmarking"
    !15 · created May 06, 2023 by Appledora   review
    • MERGED
    • 11
    updated May 24, 2023
  • Resolve "Word-tokenization evaluation methodology and datasets"
    !13 · created Apr 16, 2023 by Appledora   high priority ongoing
    • MERGED
    • 37
    updated May 09, 2023
  • Resolve "Update: Migrate to Spark3"
    !14 · created Apr 28, 2023 by Appledora   ongoing
    • MERGED
    • 1
    updated May 01, 2023
  • Resolve "Evaluation: update and modify the sentence evaluation code"
    !12 · created Apr 04, 2023 by Appledora   high priority ongoing
    • MERGED
    • 18
    updated Apr 10, 2023
  • Resolve "train sentencepiece for sample language clusters"
    !7 · created Feb 01, 2023 by Appledora   ongoing
    • MERGED
    • Approved
    • 51
    updated Mar 31, 2023
  • Resolve "Word Tokenization: add abbreviation post-processing"
    !11 · created Mar 25, 2023 by Appledora   low priority ongoing
    • MERGED
    • Approved
    • 5
    updated Mar 29, 2023
  • feature: benchmarking module and deterministic benchmarking
    !3 · created Nov 06, 2022 by Appledora   review
    • MERGED
    • 27
    updated Mar 25, 2023
  • Resolve "Tokenization: Restructure tokenizer code into class"
    !9 · created Feb 21, 2023 by Appledora   high priority ongoing
    • MERGED
    • Approved
    • 32
    updated Mar 20, 2023
  • Resolve "Packaging: set up repo for packaging"
    !10 · created Mar 06, 2023 by Appledora   high priority ongoing
    • MERGED
    • 1
    updated Mar 15, 2023
  • Resolve "Word Tokenization: Whitespace-delimited languages"
    !8 · created Feb 15, 2023 by Appledora   high priority
    • MERGED
    • 31
    updated Mar 08, 2023
  • Filter abbreviations list for each wikiproject
    !6 · created Jan 03, 2023 by Appledora   ongoing
    • MERGED
    • 47
    updated Feb 16, 2023
  • Resolve "take into account abbreviations"
    !5 · created Nov 24, 2022 by Appledora   ongoing
    • MERGED
    • Approved
    • 73
    updated Dec 09, 2022
  • Resolve "Naive Rule-based Sentence Segmenter"   3 of 3 checklist items completed
    !2 · created Oct 27, 2022 by Appledora   ongoing
    • MERGED
    • Approved
    • 30
    updated Nov 03, 2022
  • feature: the basic structure of the project   7 of 7 checklist items completed
    !1 · created Sep 29, 2022 by Appledora
    • MERGED
    • 3
    updated Oct 05, 2022