
lists.wikimedia.org encoding issues in descriptions
Closed, Resolved, Public

Description

Author: mcdevitd

Description:
The main mailing list directory at https://lists.wikimedia.org/mailman/listinfo has some encoding errors. See the list descriptions for wikiIS-l, wikinews-zh, wikiUK-l, and wikizh-l. If you click through to the lists' own pages, they appear correctly (e.g. https://lists.wikimedia.org/mailman/listinfo/wikiuk-l).


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=40971

Details

Reference
bz37817

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:28 AM)
bzimport set Reference to bz37817.
bzimport added a subscriber: Unknown Object (MLST).

Thehelpfulonewiki wrote:

I tested this, and I think the problem can be fixed simply by copying the existing description to your clipboard, removing it in the admin web interface, saving, then pasting it back and saving again. If you know any of the admins of the lists mentioned above, please ask them to try this.

Related: bug 40971 ("mailman's public list index (listinfo) has the wrong encoding in its Content-Type header")
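Bug 40971 points at the mechanism: a browser decodes the page bytes with whatever charset the Content-Type header declares. The sketch below illustrates this under one assumption that the report does not state: that the index page declared iso-8859-1 while the description bytes were UTF-8. `render_as_browser` is a hypothetical helper name.

```python
# Sketch: how a wrong charset in the Content-Type header garbles UTF-8 text.
# Assumption (not from the report): the header declared iso-8859-1.

def render_as_browser(body: bytes, declared_charset: str) -> str:
    """Decode a response body the way a browser would: with the declared charset."""
    return body.decode(declared_charset, errors="replace")

description = "Wikipédia francophone"
body = description.encode("utf-8")  # what the server actually sends

print(render_as_browser(body, "utf-8"))       # Wikipédia francophone
print(render_as_browser(body, "iso-8859-1"))  # WikipÃ©dia francophone
```

The per-list pages rendering correctly while the index does not is consistent with the index alone carrying the wrong header.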

Created attachment 12189
A list of suggested corrections

Some list descriptions are written correctly in UTF-8 and some are broken. All should be fixed to UTF-8. I am attaching a text file with all the correct spellings in UTF-8 and a few other suggested corrections.
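Sorting the descriptions into "already UTF-8" and "broken" can be sketched as a strict decode test. This is a minimal illustration, not the method used for the attachment; the list names and byte values are hypothetical.

```python
# Sketch: find descriptions whose raw bytes are not valid UTF-8.
# List names and contents below are hypothetical examples.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def broken_descriptions(descriptions: dict[str, bytes]) -> list[str]:
    """Return the names of lists whose description is not valid UTF-8."""
    return sorted(name for name, raw in descriptions.items() if not is_valid_utf8(raw))

descriptions = {
    "example-fr-l": "Comité".encode("utf-8"),           # already UTF-8
    "example-pl-l": "Zapisz się".encode("iso-8859-2"),  # legacy Latin-2 bytes
}
print(broken_descriptions(descriptions))  # ['example-pl-l']
```

A strict decode only catches byte sequences that are invalid UTF-8; legacy text that happens to be valid UTF-8 (for example pure ASCII) passes, which is harmless for ASCII but means the check is a filter, not a proof.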

Attached:

T42971 fixed this for English, but other languages use different encodings on the server; Polish, for example, uses "iso-8859-2".

The description page is a strange mix of UTF-8 and something else.
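One way a single index page ends up as a mix is when it concatenates descriptions stored in different per-language charsets and labels the result with one charset. The sketch below assumes each list's description is stored in its language's charset; iso-8859-2 for Polish comes from this report, while the French entry is illustrative. The list names are hypothetical.

```python
# Sketch: build one UTF-8 index from descriptions stored in different charsets.
# Assumption: each description is stored in its list's language charset.

LISTS = [
    # (list name, raw description bytes, charset of that list's language)
    ("example-pl-l", "Zapisz się".encode("iso-8859-2"), "iso-8859-2"),
    ("example-fr-l", "Comité".encode("iso-8859-1"), "iso-8859-1"),
]

def naive_index(lists) -> str:
    """Wrong: treat every description as if it were already UTF-8."""
    return "\n".join(
        f"{name}: {raw.decode('utf-8', errors='replace')}" for name, raw, _ in lists
    )

def fixed_index(lists) -> bytes:
    """Right: decode each description with its own charset, then emit UTF-8."""
    lines = [f"{name}: {raw.decode(charset)}" for name, raw, charset in lists]
    return "\n".join(lines).encode("utf-8")

print(naive_index(LISTS))  # 'ę' and 'é' both come out as U+FFFD
print(fixed_index(LISTS).decode("utf-8"))
```

The charset really is needed per list: the byte 0xEA is "ę" in iso-8859-2 but "ê" in iso-8859-1, so no single guess decodes every description correctly.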

Our list admins report that they can view the pages but cannot update them...

Using the corrected texts still produced encoding errors.

Basically I had to take:

Comité d'arbitrage de la Wikipédia francophone / French Wikipedia ArbCom

and turn it into:

Comite d'arbitrage de la Wikipedia francophone / French Wikipedia ArbCom

So I realize it's horrible for native French speakers to have to deal with that in the output, but it seems to be the only way to fix the descriptions on lists.wikimedia.org.
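The manual change (dropping the accents) can be automated if more descriptions need the same treatment. This is a minimal sketch using Python's standard `unicodedata` module; `fold_to_ascii` is a hypothetical helper, not something from Mailman. It decomposes each character (NFKD) and drops the combining marks.

```python
import unicodedata

def fold_to_ascii(text: str) -> str:
    """Strip accents ('é' -> 'e'); characters with no ASCII base are dropped."""
    decomposed = unicodedata.normalize("NFKD", text)  # 'é' -> 'e' + U+0301
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.encode("ascii", "ignore").decode("ascii")

print(fold_to_ascii("Comité d'arbitrage de la Wikipédia francophone / French Wikipedia ArbCom"))
# Comite d'arbitrage de la Wikipedia francophone / French Wikipedia ArbCom
```

This is as lossy as the manual workaround, and worse for some languages: Polish "ł" has no decomposition, so `fold_to_ascii("łódź")` returns "odz".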

I'll go ahead and fix the remaining bad encoded short descriptions in a similar fashion.

As of now, the encoding of the short descriptions is fixed, and no more '?' symbols appear on the lists.wikimedia.org landing page.

Of course, nothing stops the individual list admins from changing them back, or setting them in the first place, so this will come up again.

RobH claimed this task.

Let me reopen this:

https://lists.wikimedia.org/mailman/listinfo/wikimediapl-l looks fine at first sight, but the user interface texts, such as "Zapisz się" (meaning "Subscribe"), have broken encoding.
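There are two ways "Zapisz się" can break between Latin-2 and UTF-8, and they look different on the page. The sketch shows both; which one happened on wikimediapl-l is not stated here, although Latin-2 message catalogs served on a UTF-8 page would produce the first.

```python
# Sketch: the two directions of a Latin-2 <-> UTF-8 mix-up for "Zapisz się".
text = "Zapisz się"

# Latin-2 bytes shown on a page labelled UTF-8: 'ę' (0xEA) is not valid UTF-8.
latin2_bytes = text.encode("iso-8859-2")
print(latin2_bytes.decode("utf-8", errors="replace"))  # Zapisz si�

# UTF-8 bytes shown on a page labelled iso-8859-2: 'ę' (C4 99) becomes two characters.
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes.decode("iso-8859-2")))  # 'Zapisz siÄ\x99'
```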

saper set Security to None.
saper added a subscriber: Dzahn.

Is this related to T105756? I thought nothing had been touched yet, but it looks like what I warned about in T110131#1599443.

JohnLewis subscribed.

Closing again, as per the original resolution.

The reopening comment is about an upstream interface message, which we have no control over, while this task is about descriptions on the global list info page caused by list admins.

As for the question of whether this is related to the upgrade: it is not, as the upgrade is not for another week. Please either hold off until the upgrade, since the newer versions have had major i18n fixes, or file an issue upstream with Mailman development or i18n (though they won't take it seriously if we run 6-7 year old software).

Shall I open another bug to track the i18n message issue? The 2.1.20 version I run locally still has ISO-8859-2 encoded messages in its .po files.
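If the Latin-2 .po files are the source, re-encoding a catalog to UTF-8 means decoding it with the charset declared in its header and updating that declaration. Below is a minimal sketch with a hypothetical helper `po_to_utf8` and a made-up two-entry catalog; for real catalogs, GNU gettext's `msgconv --to-code=UTF-8` does this conversion. Whether this is the fix filed later is not stated in this thread.

```python
import re

# Matches the charset declared in the PO header's Content-Type line.
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")

def po_to_utf8(raw: bytes) -> bytes:
    """Re-encode a .po catalog to UTF-8, using the charset its header declares."""
    match = CHARSET_RE.search(raw)
    charset = match.group(1).decode("ascii") if match else "utf-8"
    text = raw.decode(charset)
    # Update the declared charset so tools read the new bytes correctly.
    text = re.sub(r"charset=[A-Za-z0-9_.:-]+", "charset=UTF-8", text, count=1)
    return text.encode("utf-8")

sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=iso-8859-2\\n"\n'
    '\n'
    'msgid "Subscribe"\n'
    'msgstr "Zapisz się"\n'
).encode("iso-8859-2")

print(po_to_utf8(sample).decode("utf-8"))
```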

@JohnLewis I don't want to be a pain, but I am speaking from experience. One mailing list system I happen to use (but not manage) started showing mail headers like this after an upgrade:

=?utf-8?q?Wiadomo=EF=BF=BD=EF=BF=BD_na_Dialog_od_mk=40finansowan?=

which contains the U+FFFD REPLACEMENT CHARACTER (�) twice instead of the real characters.
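Decoding that RFC 2047 header with Python's standard `email.header.decode_header` confirms what it carries: the encoded bytes EF BF BD are the UTF-8 form of U+FFFD, so the replacement characters were already in the header when it was encoded.

```python
from email.header import decode_header

raw = "=?utf-8?q?Wiadomo=EF=BF=BD=EF=BF=BD_na_Dialog_od_mk=40finansowan?="

# decode_header returns (bytes, charset) pairs for RFC 2047 encoded words.
[(payload, charset)] = decode_header(raw)
text = payload.decode(charset)

print(text)                  # Wiadomo�� na Dialog od mk@finansowan
print(text.count("\ufffd"))  # 2
```

Presumably the original word was "Wiadomość" ("message"); one plausible cause is that the software decoded its input with the wrong charset, replaced what it could not read, and then correctly re-encoded the damaged text as UTF-8.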

Currently (I checked only "PL"), the email interface works fine, but the web interface has a latin2<->utf-8 issue in the system messages.

There are two issues really:

  1. A known latin2<->utf-8 issue, probably related to how the Mailman gettext files are encoded. If there is no issue open for this, I'd open one, even if it can't be fixed very easily.
  2. An additional check in the upgrade process that neither the mail interface (which I think is more important than the web) nor the web interface has degraded after the upgrade. Sending U+FFFD would be a showstopper for me, for example. We shouldn't just hope for the best. This is what I meant in https://phabricator.wikimedia.org/T105756#1600624
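The check in point 2 could start as a small script run on decoded headers and page snippets after the upgrade. This is a hypothetical sketch, not part of Mailman: it flags the two symptoms seen in this thread, U+FFFD and UTF-8 bytes decoded with a legacy charset. The list of legacy charsets is an assumption, and the mojibake test is a heuristic that can misfire on unusual but legitimate text.

```python
# Hypothetical post-upgrade smoke check for the two symptoms in this thread.

LEGACY_CHARSETS = ("iso-8859-1", "iso-8859-2")  # assumption: charsets in use

def looks_double_decoded(text: str) -> bool:
    """True if text reads like UTF-8 bytes decoded with a legacy charset."""
    for charset in LEGACY_CHARSETS:
        try:
            repaired = text.encode(charset).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # not representable this way, so not this kind of mojibake
        if repaired != text:
            return True
    return False

def degraded(text: str) -> list[str]:
    """Return the symptoms found in one decoded header or page snippet."""
    problems = []
    if "\ufffd" in text:
        problems.append("replacement character U+FFFD")
    if looks_double_decoded(text):
        problems.append("possible double decoding (mojibake)")
    return problems

for sample in ["Zapisz się", "Wiadomo\ufffd\ufffd", "WikipÃ©dia"]:
    print(sample, "->", degraded(sample) or "ok")
```

Pure ASCII text round-trips unchanged and is never flagged, so the check only reacts to non-ASCII content.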

It seems these are Mailman bugs at heart, which are really outside our scope. If we can fix them easily, document how and we'll do so; otherwise, if they're upstream issues, filing bugs here will do nothing and will likely end in 'invalid' or 'declined' resolutions.

No, this is not related to an upgrade. I want to make sure it's clear that this existed before the move from sodium. I want it fixed if possible, but I also don't want it tied to the migration. Can we wait until the version is 2.1.18 and check again next week, unless the fix is very obvious?

I think I found the fix for #1 (filed T111457).

Point #2 (check that email encoding etc. during/after migration is correct) *is* related to the migration and is not related to T111457.