Train on wikis and grid search
This MR modifies the code to train and evaluate language-agnostic models.
As an experiment, we select two sets of languages: 52 languages drawn from fallback chains and another 44 randomly selected wikis. For each set, we train one model on all of its languages and evaluate it on each individual language wiki. The performance comparison between the language-agnostic model and the single-language models can be found here: Sheets-Set-1 and Sheets-Set-2. In short: for both sets of languages, the language-agnostic model performs comparably to the single-language versions. This suggests that, in principle, we can select any set of wikis, train on them jointly, and expect comparable results.
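For illustration, here is a minimal sketch of the combined-training setup in Python. `load_wiki_dataset`, `train_model`, and `evaluate` are hypothetical stand-ins for the MR's actual pipeline, which is not shown here:

```python
import random

def load_wiki_dataset(wiki: str) -> list[tuple[str, int]]:
    """Stub loader: returns (text, label) pairs for one wiki."""
    return [(f"{wiki}-sample-{i}", i % 2) for i in range(1000)]

def train_model(samples: list[tuple[str, int]]):
    """Stub trainer: the real MR fits a language-agnostic model here."""
    return object()  # placeholder for a fitted model

def evaluate(model, samples: list[tuple[str, int]]) -> float:
    """Stub evaluator: returns an accuracy-like score."""
    return 0.0

wikis = ["enwiki", "dewiki", "frwiki"]  # 52 (or 44) wikis in the experiment

# Pool every language's data into a single training set ...
combined: list[tuple[str, int]] = []
for wiki in wikis:
    combined.extend(load_wiki_dataset(wiki))
random.shuffle(combined)  # mix languages so training is language-agnostic
model = train_model(combined)

# ... then evaluate the one combined model on each wiki separately,
# which is what the per-language comparison sheets report.
for wiki in wikis:
    score = evaluate(model, load_wiki_dataset(wiki))
    print(f"{wiki}: {score:.3f}")
```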
Another experiment: we train a model on all 317 language wikis, capping each wiki at 100k samples. The evaluations can be found here: Sheets-All.
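The per-wiki cap keeps very large wikis from dominating the combined training set. A minimal sketch of how such a cap could be applied, assuming random downsampling (the MR's actual sampling strategy is not specified here):

```python
import random

CAP = 100_000  # per-wiki sample cap used in the all-wikis experiment

def cap_samples(samples: list, cap: int = CAP) -> list:
    """Downsample one wiki's data to at most `cap` examples.

    Assumes a uniform random subsample; truncation or stratified
    sampling would be alternative choices.
    """
    if len(samples) <= cap:
        return samples
    return random.sample(samples, cap)

# Usage with the hypothetical loader from the previous sketch:
# combined = [s for wiki in all_wikis
#             for s in cap_samples(load_wiki_dataset(wiki))]
```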