feature: benchmarking module and deterministic benchmarking
- Generates some statistics from the comparison for each language and stores them in a log file. Sample:
Total boundaries: 100
Actual boundaries identified: 89
Misidentified boundaries: 5
Missing boundaries: 6
- Not all languages in the dataset have more than or equal to a hundred samples
Closes #8 (closed)
Edited by Appledora