Extract and analyze misspellings from all language Wikipedias
We extract and analyze misspellings from all language Wikipedias, using the list of misspellings extracted from enwiktionary.
We also filter out misspellings that occur in any of the ["is_list", "is_table", "is_quote", "is_text_formatting", "is_different_language"]
formattings, on the assumption that text in these formattings typically does not contain genuine misspellings. We then compute the number of misspellings and the Gini coefficient, which measures how evenly the misspellings are distributed across the words of a language.
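
The filtering and the Gini computation could look roughly like the sketch below. It is an illustration only, not the project's actual code. The record layout (a dict with a "word" key and optional boolean formatting flags) and the function names `keep`, `gini` and `summarize` are assumptions made for this example. Only the five flag names come from the description above.

```python
from collections import Counter

# Formatting flags named in the description; a record is dropped if any is set.
EXCLUDED_FLAGS = [
    "is_list",
    "is_table",
    "is_quote",
    "is_text_formatting",
    "is_different_language",
]


def keep(record):
    """Return True if none of the excluded formatting flags is set.

    The dict-based record layout is hypothetical.
    """
    return not any(record.get(flag, False) for flag in EXCLUDED_FLAGS)


def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly even)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the ascending-sorted values.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def summarize(records):
    """Return (number of kept misspellings, Gini over per-word counts)."""
    kept = [r for r in records if keep(r)]
    per_word = Counter(r["word"] for r in kept)
    return len(kept), gini(per_word.values())
```

A Gini coefficient near 0 would mean the misspellings are spread evenly across words, while a value near 1 would mean a few words account for most of them.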