Resolve "Evaluation: update and modify the sentence evaluation code"
Updated the old sentence benchmarking code to make it compatible with the current tokenizer implementation. Given a JSON file in the following format:
{
"en": [sentence 1, sentence 2, ..., sentence 100],
"de": [sentence 1, sentence 2, ..., sentence 100],
"bn": [sentence 1, sentence 2, ..., sentence 100],
....
}
the benchmarking code outputs a CSV file with the following columns:
<correct> <partially correct> <incorrect> <missing> <accuracy>
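A minimal sketch of the input and output shapes described above, for illustration only. The function names `load_sentences` and `write_results`, the extra `language` column, and the accuracy formula (correct divided by the total count) are assumptions, not taken from the benchmarking code in this merge request.

```python
import csv
import io
import json

# Column names as listed in the description above.
COLUMNS = ["correct", "partially correct", "incorrect", "missing", "accuracy"]


def load_sentences(text):
    """Parse the input JSON: language code -> list of sentences."""
    data = json.loads(text)
    for lang, sentences in data.items():
        if not isinstance(sentences, list):
            raise ValueError(f"{lang!r} must map to a list of sentences")
    return data


def write_results(counts_by_lang, out):
    """Write one CSV row per language to the file-like object `out`.

    ASSUMPTION: accuracy = correct / total; the real definition may differ.
    ASSUMPTION: a leading "language" column identifies each row.
    """
    writer = csv.DictWriter(out, fieldnames=["language"] + COLUMNS)
    writer.writeheader()
    for lang, counts in counts_by_lang.items():
        total = sum(counts.values())
        accuracy = counts["correct"] / total if total else 0.0
        writer.writerow({"language": lang, **counts, "accuracy": f"{accuracy:.2f}"})
```

For example, `write_results(counts, open("results.csv", "w", newline=""))` would write the table to disk, where `counts` maps each language code to its four counters.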
The code also generates a benchmarking log as a CSV file.
We can identify four types of errors:
- type 1 (2-no-match): splits into two sentences, but neither matches the input sentences
- type 2 (>2-one-match): splits into more than two sentences, with at least one of them matching an input sentence
- type 3 (>2-no-match): splits into more than two sentences, with none of them matching an input sentence
- type 4 (no-split): doesn't split into two sentences
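
The four error types can be sketched as a small classifier. This assumes each test case consists of two reference sentences that were joined and passed to the tokenizer, and that the tokenizer's segments cover the whole input in order. The function name `classify_split` and the exact string comparison are hypothetical, not the code in this merge request.

```python
def classify_split(expected, predicted):
    """Return "correct" or one of the four error labels listed above.

    expected:  the two reference sentences (ASSUMPTION: always a pair).
    predicted: the segments returned by the tokenizer for the joined input.
    """
    expected = [s.strip() for s in expected]
    predicted = [s.strip() for s in predicted]

    # type 4: the tokenizer returned fewer than two segments.
    if len(predicted) < 2:
        return "no-split"

    if len(predicted) == 2:
        # If the two segments cover the whole input, a split that differs
        # from the reference cannot match either input sentence (type 1).
        return "correct" if predicted == expected else "2-no-match"

    # More than two segments: count how many equal an input sentence.
    matches = sum(1 for s in predicted if s in expected)
    # type 2 if at least one segment matches, otherwise type 3.
    return ">2-one-match" if matches >= 1 else ">2-no-match"
```

For instance, with the reference pair `["Hi there.", "Bye."]`, the prediction `["Hi", "there.", "Bye."]` is labelled `>2-one-match` because only `"Bye."` equals an input sentence.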
Closes #27