Benchmark module and deterministic benchmarking for sentence segmentation
Put the benchmarking data into a benchmarking directory. It can be a Jupyter notebook or whatever format makes sense.
1: Sentence one.
2: Sentence two.
3: Sentence three.
Option 1: "Sentence one. Sentence two. Sentence three."
Option 2: "Sentence one. Sentence two. Sentence three. Sentence one."
Option 3:
"Sentence one. Sentence two." -> ["Sentence one.", "Sentence two."]
"Sentence two. Sentence three." -> ["Sentence two.", "Sentence three."]
"Sentence three. Sentence one." -> ["Sentence three.", "Sentence one."]
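A minimal sketch of how the Option 3 test cases could be generated: each sentence is joined with its successor, wrapping around at the end, so every sentence appears once in first position and once in second position. The function name `build_pairs` is hypothetical and not part of any existing module.

```python
def build_pairs(sentences):
    """Build (input_text, expected_segments) cases as in Option 3.

    Hypothetical helper: pairs each sentence with the next one,
    wrapping around so the last sentence is paired with the first.
    """
    n = len(sentences)
    cases = []
    for i in range(n):
        expected = [sentences[i], sentences[(i + 1) % n]]
        # Assumption: sentences are joined with a single space.
        cases.append((" ".join(expected), expected))
    return cases


sentences = ["Sentence one.", "Sentence two.", "Sentence three."]
pairs = build_pairs(sentences)
for text, expected in pairs:
    print(repr(text), "->", expected)
```

Because the pairing depends only on the order of the input list, the generated cases are the same on every run, which keeps the benchmark deterministic as long as the sentence list itself is fixed.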
Open questions:
- How to select the sentences?
  - One set that is just a random sample from a corpus (representative; benchmark)
  - One set that is edge cases (golden rules; unit tests)
- How to evaluate the performance on the selected sample (100) of sentences?
  - Count number of exact matches. Option 3 above is probably the easiest way to do this.
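Counting exact matches could look roughly like the sketch below. It assumes each test case is an (input text, expected segments) tuple, as in Option 3. Both `exact_match_score` and `naive_segment` are hypothetical names; `naive_segment` is only a regex placeholder standing in for whichever segmenter is being benchmarked.

```python
import re


def naive_segment(text):
    # Placeholder segmenter (assumption, not the real one):
    # split on whitespace that follows ., ! or ?
    return re.split(r"(?<=[.!?])\s+", text.strip())


def exact_match_score(cases, segment):
    """Return (matches, total): a case matches only if the segmenter
    output equals the expected list of sentences exactly."""
    matches = sum(1 for text, expected in cases if segment(text) == expected)
    return matches, len(cases)


cases = [
    ("Sentence one. Sentence two.", ["Sentence one.", "Sentence two."]),
    # Edge case: the abbreviation trips up the naive splitter.
    ("Dr. Smith arrived. He sat down.", ["Dr. Smith arrived.", "He sat down."]),
]
matches, total = exact_match_score(cases, naive_segment)
print(f"{matches}/{total} exact matches")
```

With two-sentence inputs, a case fails as a whole if the segmenter splits too early, too late, or not at all, so the exact-match count maps directly onto boundary errors without needing partial-credit scoring.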