Filter list of common misspellings based on the number of occurrences in Wikipedia
The list of common misspellings extracted from enwiktionary (see issue) might contain entries that are in fact commonly used in Wikipedia. For example, the word alledged is tagged as an English misspelling; as a verb, however, it seems to be a correctly spelled word. We would therefore like to filter out cases where a supposed misspelling is very common.
The aim is to count, for each entry in the list, the number of occurrences of both the misspelled word and its correctly spelled counterpart across all articles of the respective language version of Wikipedia.
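A minimal sketch of the counting step, assuming the article texts are already available as plain strings (wikitext markup removed) and the misspelling list is a mapping from misspelled word to correct word. The function name `count_occurrences`, the tokenizer regex, and the lowercase matching are illustrative assumptions, not decisions made in this task; multi-word entries are not handled.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters, optionally joined by apostrophes.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def count_occurrences(articles, pairs):
    """Count how often each misspelling and its correction appear.

    articles: iterable of plain-text article strings (hypothetical input).
    pairs:    dict mapping lowercase misspelling -> lowercase correct spelling.
    Returns a dict: misspelling -> (misspelled_count, correct_count).
    """
    # Only track words that appear in the list, to keep memory bounded.
    targets = set(pairs) | set(pairs.values())
    counts = Counter()
    for text in articles:
        for token in TOKEN_RE.findall(text.lower()):
            if token in targets:
                counts[token] += 1
    return {wrong: (counts[wrong], counts[right]) for wrong, right in pairs.items()}


# Usage with made-up sample text (not real Wikipedia data).
sample_articles = ["He alledged that it was alleged.", "Alleged again."]
sample_counts = count_occurrences(sample_articles, {"alledged": "alleged"})
```

On a full dump the same loop would run over a stream of articles rather than an in-memory list; how the text is extracted from the dump is left open here.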
We will then need to define a filter for the misspellings, e.g. remove an entry from the list if the ratio of misspelled to correctly spelled occurrences exceeds a threshold. Details to be determined.
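One possible shape for such a filter, as a sketch only: the threshold value, the function name `filter_misspellings`, and the handling of zero counts are all assumptions, since the actual criterion is still to be determined. It takes a mapping from misspelling to a pair of counts (misspelled, correct) and keeps only the entries whose ratio stays at or below the threshold.

```python
def filter_misspellings(counts, max_ratio=0.1):
    """Keep entries whose misspelled/correct ratio is at most max_ratio.

    counts:    dict mapping misspelling -> (misspelled_count, correct_count).
    max_ratio: hypothetical threshold; the real value is not yet decided.
    """
    kept = {}
    for misspelling, (n_wrong, n_right) in counts.items():
        if n_right > 0:
            ratio = n_wrong / n_right
        elif n_wrong > 0:
            # Assumption: only the "misspelling" occurs, so treat it as too common.
            ratio = float("inf")
        else:
            # Assumption: no evidence either way, so keep the entry.
            ratio = 0.0
        if ratio <= max_ratio:
            kept[misspelling] = (n_wrong, n_right)
    return kept


# Usage with made-up counts (not measured on Wikipedia).
example_counts = {
    "alledged": (50, 100),  # ratio 0.5, dropped
    "recieve": (2, 1000),   # ratio 0.002, kept
    "foo": (0, 0),          # no evidence, kept
    "bar": (3, 0),          # correct form never seen, dropped
}
example_kept = filter_misspellings(example_counts, max_ratio=0.1)
```

A pure ratio ignores absolute frequency, so a rare word with counts like (1, 2) is treated the same as (500, 1000); a minimum-count condition could be added if that turns out to matter.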