Sentence tokenization: sentences ending in a reference are merged with the next sentence
Hello team! While using mwtokenizer for sentence tokenization, I ran into a case where a reference or footnote at the end of a sentence causes that sentence to be grouped with the following one. Here are two (slightly modified) examples from enwiki, each of which contains 3 sentences but is tokenized as 1:
-
He also sent a copy to Watt, who forwarded it to the researchers who were moving to their new research station at Bawdsey Manor.{{sfn|Bowen|1998|p=31}} In a meeting at the Crown and Castle pub, Bowen pressed Watt for permission to form a group to study the possibility of placing a radar on the aircraft itself.{{sfn|Bowen|1998|p=31}} This would mean the CH stations would only need to get the fighter into the general area of the bomber, the fighter would be able to use its own radar for the rest of the interception.
link to the article
-
Many details of Ruth's childhood are unknown, including the date of his parents' marriage.<ref>{{harvp|Creamer|1992|p=11}}</ref> As a child, Ruth spoke [[German language|German]].<ref>{{citation|last=Sowell|first=Thomas|author-link=Thomas Sowell|title=Migrations and Cultures: A World View|publisher=[[Basic Books]]|place=[[New York City|New York]]|year=1996|page=82|quote={{nbsp}}...it may be indicative of how long German cultural ties endured [in the United States] that the German language was spoken in childhood by such disparate twentieth-century American figures as famed writer [[H.L.Mencken]], baseball stars Babe Ruth and [[Lou Gehrig]], and by the Nobel Prize-winning economist [[George Stigler]].|isbn=978-0-465-04589-1}}</ref> When Ruth was a toddler, the family moved to 339 South Woodyear Street, not far from the rail yards; by the time he was six years old, his father had a saloon with an upstairs apartment at 426 West Camden Street.
link to the article
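For context, the only workaround I have come up with so far is to strip the reference markup before tokenizing. Below is a minimal sketch of that idea using only the standard library. The helper name `strip_references` and the list of footnote templates (`sfn`, `sfnp`, `efn`) are my own assumptions, not part of mwtokenizer, and the regexes only handle templates without nested braces.

```python
import re

# <ref ...>...</ref> pairs and self-closing <ref ... /> tags.
REF_TAG = re.compile(
    r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref>",
    re.DOTALL | re.IGNORECASE,
)

# Footnote-style templates. The template names are an assumed,
# non-exhaustive list; templates with nested braces are not matched.
FOOTNOTE_TEMPLATE = re.compile(
    r"\{\{\s*(?:sfn|sfnp|efn)\s*\|[^{}]*\}\}",
    re.IGNORECASE,
)


def strip_references(wikitext: str) -> str:
    """Remove reference markup so sentence boundaries are followed by whitespace."""
    without_ref_tags = REF_TAG.sub("", wikitext)
    return FOOTNOTE_TEMPLATE.sub("", without_ref_tags)


sample = (
    "He also sent a copy to Watt.{{sfn|Bowen|1998|p=31}} "
    "Bowen pressed Watt for permission.<ref>{{harvp|Creamer|1992|p=11}}</ref> "
    "This would mean less work for the CH stations."
)
cleaned = strip_references(sample)
print(cleaned)

# The cleaned text would then be passed to the tokenizer, e.g. (not run here,
# and assuming the API I have been using):
#   Tokenizer(language_code="en").sentence_tokenize(cleaned)
```

This loses the references themselves, so it only helps when the goal is sentence boundaries rather than preserving the original wikitext.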
Is this expected behavior, and are there any recommended workarounds? Thank you!