
AFT5: Character & word count broken for languages other than English
Closed, Resolved, Public

Description

Reported by VIGNERON on frwiki ([[fr:Discussion Wikipédia:Outil de retour des lecteurs#Questions]])

He says that, in [[fr:Spécial:ArticleFeedbackv5/Autel de la Paix Auguste/050ab85686d4bd50ee7e782bcb087d2e]], the feedback is 'les caractéristiques', which is 20 characters long and contains 2 words.

But AFT5 says this string contains 21 characters and 3 words.

I read several feedback posts, and I think AFT5 is counting bytes instead of characters and does not treat 'é' as a valid character within a word.
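The reported numbers are consistent with that hypothesis. The following is a minimal Python sketch, not AFT5's actual code, that reproduces both figures for the feedback string. It assumes the text is stored as UTF-8 with a precomposed 'é', and the ASCII-only regular expression is only an approximation of how a byte-oriented word counter sees the text.

```python
import re

# The feedback text from the report; "\u00e9" is a precomposed "é".
feedback = "les caract\u00e9ristiques"

char_count = len(feedback)                  # Unicode code points
byte_count = len(feedback.encode("utf-8"))  # UTF-8 bytes; "é" takes two

# ASCII-only word matching: "é" is not a word character here,
# so it splits "caractéristiques" into two pieces.
ascii_words = re.findall(r"[A-Za-z'-]+", feedback)

print(char_count)   # 20
print(byte_count)   # 21
print(ascii_words)  # ['les', 'caract', 'ristiques']
```

The byte count (21) and the ASCII-only word count (3) match what AFT5 displays, while the code point count (20) matches what the reporter expects.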


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task to Lowest. (Nov 22 2014, 2:39 AM)
bzimport set Reference to bz58280.
bzimport added a subscriber: Unknown Object (MLST).

[Lowering priority to reflect reality, as AFTv5 is not very actively being worked on anymore.]

Jdforrester-WMF subscribed.

All development work on ArticleFeedback v.5 (and indeed, previous versions) is halted. The project is archived, so having open tasks is inappropriate. Consequently, I'm closing all tasks.

SamanthaNguyen moved this task from Backlog to Bugs on the ArticleFeedbackv5 board.
SamanthaNguyen edited subscribers, added: ashley, SamanthaNguyen; removed: wikibugs-l-list.

Re-investigating per T146253

Change 521407 had a related patch set uploaded (by Jack Phoenix; owner: Jack Phoenix):
[mediawiki/extensions/ArticleFeedbackv5@master] str_word_count() struggles with UTF-8 characters, use a custom implementation instead that understands UTF-8

https://gerrit.wikimedia.org/r/521407

Change 521407 merged by jenkins-bot:
[mediawiki/extensions/ArticleFeedbackv5@master] str_word_count() struggles with UTF-8 characters, use a custom implementation instead that understands UTF-8

https://gerrit.wikimedia.org/r/521407
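
For readers who want to see what a UTF-8-aware count looks like, here is a rough sketch in Python. It is not the code from change 521407 (which is PHP and is not reproduced in this task). The helper names `count_words` and `count_chars` are hypothetical, and the definition of a word used here (a run of Unicode letters, optionally joined by an apostrophe or hyphen) is an assumption.

```python
import re

# Hypothetical helpers for illustration only; not the merged patch.
# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
_WORD = re.compile(r"[^\W\d_]+(?:['\u2019-][^\W\d_]+)*")


def count_words(text: str) -> int:
    """Count runs of Unicode letters, allowing inner apostrophes and hyphens."""
    return len(_WORD.findall(text))


def count_chars(text: str) -> int:
    """Count Unicode code points rather than encoded bytes."""
    return len(text)


sample = "les caract\u00e9ristiques"
print(count_words(sample), count_chars(sample))
```

Under these assumptions the string from the report yields 2 words and 20 characters. Text using combining accents rather than precomposed letters would need normalization (for example with `unicodedata.normalize("NFC", text)`) before counting.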

ashley claimed this task.