fix-regex-for-non-ws-languages
Fixed the regex that turns a detected mention in the text into a link. Previously it matched mentions between word boundaries (`\b`), which in practice relies on whitespace or punctuation around the mention. This does not work for languages that do not separate words with whitespace. The regex now matches the mention as a plain substring. This improved recall for most of the languages that were previously failing, and it does not degrade the performance of the models for other languages. A sample of other languages was run to confirm consistent performance.
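The difference between the two matching strategies can be illustrated with a minimal sketch. This is not the actual link-recommendation code: the function names, the `[[target|mention]]` output format and the sample sentences are assumptions made for illustration only.

```python
import re


def link_with_word_boundaries(text, mention, target):
    """Hypothetical old behaviour: the mention must sit between \\b boundaries."""
    pattern = r"\b" + re.escape(mention) + r"\b"
    # A function is used as the replacement so backslashes in it are never interpreted.
    return re.sub(pattern, lambda m: f"[[{target}|{m.group(0)}]]", text, count=1)


def link_as_substring(text, mention, target):
    """Hypothetical new behaviour: the mention is matched as a plain substring."""
    pattern = re.escape(mention)
    return re.sub(pattern, lambda m: f"[[{target}|{m.group(0)}]]", text, count=1)


# Japanese does not put spaces between words, so there is no \b around the mention.
ja_text = "私は東京に住んでいます"
ja_old = link_with_word_boundaries(ja_text, "東京", "Tokyo")
ja_new = link_as_substring(ja_text, "東京", "Tokyo")

# In a whitespace-delimited language both strategies find the mention.
en_text = "I live in Paris."
en_old = link_with_word_boundaries(en_text, "Paris", "Paris")
en_new = link_as_substring(en_text, "Paris", "Paris")

# Substring matching can also match inside a longer word.
inside_word = link_as_substring("a party", "art", "Art")
```

In this sketch `ja_old` is returned unchanged because the characters on both sides of the mention are themselves word characters, while `ja_new` contains the link. The last line shows the trade-off of substring matching: a mention can match inside a longer word, which is worth keeping in mind when reading the precision columns below.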
Recall improved for 14 of the 22 languages. The rest had similar results, and there was no significant drop in performance.
Languages that were previously failing (previous = the state of link-recommendation as of the last commit):
wiki | previous precision | precision | previous recall | recall | comments |
---|---|---|---|---|---|
aswiki | 0.67 | 0.68 | 0.17 | 0.28 | recall improvement |
bowiki | 0.90 | 0.98 | 0.07 | 0.62 | recall improvement |
diqwiki | 0.92 | 0.88 | 0.35 | 0.49 | recall improvement, slight drop in precision |
dvwiki | 1.0 | 0.88 | 0.02 | 0.49 | recall improvement, slight drop in precision |
dzwiki | 1.0 | 1.0 | 0.07 | 0.23 | recall improvement |
fywiki | 0.82 | 0.82 | 0.45 | 0.459 | similar results |
ganwiki | 0.88 | 0.82 | 0.07 | 0.296 | recall improvement |
hywwiki | 0.78 | 0.75 | 0.20 | 0.30 | recall improvement |
jawiki | 0.85 | 0.82 | 0.06 | 0.35 | recall improvement |
krcwiki | 0.77 | 0.78 | 0.33 | 0.35 | similar results |
mnwwiki | 1.0 | 0.97 | 0.02 | 0.68 | recall improvement |
mywiki | 0.70 | 0.95 | 0.047 | 0.82 | recall improvement |
piwiki | 0 | 0 | nan | nan | only 13 sentences |
shnwiki | 0.99 | 0.99 | 0.77 | 0.88 | recall improvement |
snwiki | 0.67 | 0.69 | 0.16 | 0.18 | similar results |
szywiki | 0.69 | 0.79 | 0.23 | 0.48 | improvement |
tiwiki | 0.796 | 0.796 | 0.48 | 0.48 | similar results |
urwiki | 0.86 | 0.86 | 0.53 | 0.54 | similar results |
wuuwiki | 0.42 | 0.68 | 0.007 | 0.36 | improvement |
zhwiki | 0.82 | 0.78 | 0.04 | 0.47 | improvement |
zh_classicalwiki | 1.0 | 1.0 | 0.0001 | 0.0001 | no improvement |
zh_yuewiki | 0.31 | 0.31 | 0.0006 | 0.0006 | no improvement |
Some other languages, run to check for regressions:
wiki | previous precision | precision | previous recall | recall | comments |
---|---|---|---|---|---|
arwiki | 0.82 | 0.82 | 0.35 | 0.36 | similar |
bnwiki | 0.734 | 0.725 | 0.295 | 0.38 | recall improvement |
cswiki | 0.80 | 0.80 | 0.45 | 0.45 | similar |
dewiki | 0.83 | 0.83 | 0.48 | 0.48 | similar |
frwiki | 0.82 | 0.82 | 0.50 | 0.50 | similar |
simplewiki | 0.79 | 0.79 | 0.43 | 0.43 | similar |
viwiki | 0.91 | 0.91 | 0.67 | 0.67 | similar |