Find misspellings in Wikipedia text
Having collected a list of misspellings from Wiktionary and compared it with existing approaches (e.g., AWB), we would now like to check Wikipedia itself.
The aim is to find occurrences of these misspellings in Wikipedia and to report, for each one, the count, the section heading, and the paragraph (for context). We manually inspect the counts and positions of misspellings in the text to find places that the search needs to avoid. This is because certain text should not be altered even if it appears to contain misspellings: quotes, references, etc.
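As a rough illustration of the reporting step, the sketch below scans one page's wikitext and records each misspelling with its count, section heading, and paragraph. It is only a minimal sketch under stated assumptions: the function name find_misspellings, the "(lead)" label for text before the first heading, and the skip rules (dropping <ref> and <blockquote> content) are hypothetical choices made for the example, not the project's actual method. How the wikitext is obtained (for example, from a dump) is left out.

```python
import re
from collections import Counter

# Wikitext headings look like "== Title ==" (2 to 6 equals signs on each side).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

# Assumed skip rules: references and block quotes should not be searched.
SKIP_RE = re.compile(
    r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>|<blockquote\b[^>]*>.*?</blockquote>",
    re.DOTALL | re.IGNORECASE,
)

# Words made of letters, allowing inner apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def find_misspellings(wikitext, misspellings):
    """Return one record per (misspelling, paragraph) with count and section.

    `misspellings` is a set of lowercase misspelled words.
    """
    cleaned = SKIP_RE.sub(" ", wikitext)
    heading = "(lead)"  # hypothetical label for text before the first heading
    hits = []
    para_lines = []

    def flush():
        # Close the current paragraph and record any misspellings found in it.
        paragraph = " ".join(para_lines).strip()
        para_lines.clear()
        if not paragraph:
            return
        counts = Counter(w.lower() for w in WORD_RE.findall(paragraph))
        for word in sorted(misspellings):
            if counts[word]:
                hits.append({
                    "word": word,
                    "count": counts[word],
                    "section": heading,
                    "paragraph": paragraph,
                })

    for line in cleaned.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading = match.group(2)
        elif line.strip():
            para_lines.append(line.strip())
        else:
            flush()  # a blank line ends the paragraph
    flush()
    return hits
```

Calling find_misspellings(page_text, {"recieved", "goverment"}) returns a list of dictionaries that can be inspected by hand or written to a table. Regular expressions are used here only for brevity; a real wikitext parser would likely handle templates and nested markup more reliably.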
We start with simplewiki and then move on to enwiki.