
fix-regex-for-non-ws-languages

AKhatun requested to merge fix-recall into main

Fixed the regex that turns a detected mention in the text into a link. Previously it matched mentions between word boundaries (\b), which in practice relies on whitespace or punctuation around each word. This does not work for languages written without whitespace between words. The regex now matches the mention as a pure substring. This improves recall for most of the languages that were previously failing and does not degrade the performance of the other models; several other languages were run to confirm consistent performance.
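The difference between the two matching strategies can be illustrated with a minimal sketch. The function names, the `[[target|mention]]` output format, and the sample sentences are assumptions made for illustration; this does not reproduce the actual pattern used in the link-recommendation code.

```python
import re


def link_with_word_boundaries(text, mention, target):
    """Old behaviour (sketch): link the mention only where \\b matches on both sides."""
    pattern = r"\b" + re.escape(mention) + r"\b"
    # A callable replacement avoids backslash handling in the replacement string.
    return re.sub(pattern, lambda m: f"[[{target}|{m.group(0)}]]", text, count=1)


def link_as_substring(text, mention, target):
    """New behaviour (sketch): link the first occurrence of the mention as a plain substring."""
    pattern = re.escape(mention)
    return re.sub(pattern, lambda m: f"[[{target}|{m.group(0)}]]", text, count=1)


# Hypothetical Japanese sentence: the mention is surrounded by other word
# characters, so \b never matches next to it and the old pattern finds nothing.
ja_text = "私は東京都に住んでいます"
print(link_with_word_boundaries(ja_text, "東京", "東京"))
print(link_as_substring(ja_text, "東京", "東京"))

# In a whitespace-delimited language both patterns find the same mention.
en_text = "She lives in Paris."
print(link_with_word_boundaries(en_text, "Paris", "Paris"))
print(link_as_substring(en_text, "Paris", "Paris"))
```

In Python, `\b` matches only at a transition between a word character (`\w`) and a non-word character, and CJK ideographs and kana count as word characters, so a mention embedded in running text has no boundary on either side. A plain substring match can also fire inside a longer word, which may be one reason precision dips slightly for a few wikis in the results.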

Recall improved for 14 of the 22 languages. The rest had similar results, and there was no significant drop in performance.

Languages that were previously failing (previous = the state of link-recommendation as of the last commit):

| wiki | previous precision | precision | previous recall | recall | comments |
|---|---|---|---|---|---|
| aswiki | 0.67 | 0.68 | 0.17 | 0.28 | recall improvement |
| bowiki | 0.90 | 0.98 | 0.07 | 0.62 | recall improvement |
| diqwiki | 0.92 | 0.88 | 0.35 | 0.49 | recall improvement, slight drop in precision |
| dvwiki | 1.0 | 0.88 | 0.02 | 0.49 | recall improvement, slight drop in precision |
| dzwiki | 1.0 | 1.0 | 0.07 | 0.23 | recall improvement |
| fywiki | 0.82 | 0.82 | 0.45 | 0.459 | similar results |
| ganwiki | 0.88 | 0.82 | 0.07 | 0.296 | recall improvement |
| hywwiki | 0.78 | 0.75 | 0.20 | 0.30 | recall improvement |
| jawiki | 0.85 | 0.82 | 0.06 | 0.35 | recall improvement |
| krcwiki | 0.77 | 0.78 | 0.33 | 0.35 | similar results |
| mnwwiki | 1.0 | 0.97 | 0.02 | 0.68 | recall improvement |
| mywiki | 0.70 | 0.95 | 0.047 | 0.82 | recall improvement |
| piwiki | 0 | 0 | nan | nan | only 13 sentences |
| shnwiki | 0.99 | 0.99 | 0.77 | 0.88 | recall improvement |
| snwiki | 0.67 | 0.69 | 0.16 | 0.18 | similar results |
| szywiki | 0.69 | 0.79 | 0.23 | 0.48 | improvement |
| tiwiki | 0.796 | 0.796 | 0.48 | 0.48 | similar results |
| urwiki | 0.86 | 0.86 | 0.53 | 0.54 | similar results |
| wuuwiki | 0.42 | 0.68 | 0.007 | 0.36 | improvement |
| zhwiki | 0.82 | 0.78 | 0.04 | 0.47 | improvement |
| zh_classicalwiki | 1.0 | 1.0 | 0.0001 | 0.0001 | no improvement |
| zh_yuewiki | 0.31 | 0.31 | 0.0006 | 0.0006 | no improvement |

Some other languages, run to check that performance stays consistent:

| wiki | previous precision | precision | previous recall | recall | comments |
|---|---|---|---|---|---|
| arwiki | 0.82 | 0.82 | 0.35 | 0.36 | similar |
| bnwiki | 0.734 | 0.725 | 0.295 | 0.38 | similar |
| cswiki | 0.80 | 0.80 | 0.45 | 0.45 | similar |
| dewiki | 0.83 | 0.83 | 0.48 | 0.48 | similar |
| frwiki | 0.82 | 0.82 | 0.50 | 0.50 | similar |
| simplewiki | 0.79 | 0.79 | 0.43 | 0.43 | similar |
| viwiki | 0.91 | 0.91 | 0.67 | 0.67 | similar |

Edited by AKhatun
