
Non-latin text broken after import to etherpad lite
Closed, Resolved (Public)

Description

See https://etherpad.wikimedia.org/p/i18n-team-02.
All the question marks you see there were originally non-Latin text.


Version: wmf-deployment
Severity: major

Details

Reference
bz52831

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 2:13 AM
bzimport set Reference to bz52831.
bzimport added a subscriber: Unknown Object (MLST).

I just created a test pad and added some non-Latin characters (Greek), and they were fine. Is there any chance this pad was imported from somewhere else, i.e. one of the old Etherpad instances?

Yes, it was imported from the old instance (same URL).

Non-latin text at the top (Hebrew etc) is displayed correctly though.
How exactly was this "imported" and from where?

(In reply to comment #3)

Non-latin text at the top (Hebrew etc) is displayed correctly though.

That part has been changed after the import.

How exactly was this "imported" and from where?

etherpad.wikimedia.org used to run Etherpad; now it runs Etherpad-lite. Someone from ops is needed to answer this question.

For pads with non-Latin text, this would be a data-loss issue if not fixed.

Adding akosiaris, who I believe did the import.

Most of the pads on etherpad.wikimedia.org (which ran Etherpad) were automatically imported to Etherpad-lite using the convert.js script provided by etherpad-lite. I had to patch the script to make it actually run; more info at https://rt.wikimedia.org/Ticket/Display.html?id=5464.

The old pads are still available at etherpad-old.wikimedia.org (albeit read-only), so this pad's content still exists unmodified at:

http://etherpad-old.wikimedia.org/i18n-team-02

I believe the error is due to the new database's character set. I will investigate further and update this ticket accordingly.

It turns out it was not just the database's character set but the conversion script's as well (it was missing a SET NAMES utf8 declaration at the beginning of the output). I reimported the entire database (it took a long time), and now the pad in question, as well as some others, shows up correctly.

The ones listed at RT 5464 as problematic may still exhibit some problems, but otherwise I consider this fixed. Please confirm and update the ticket accordingly.
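
For anyone hitting the same symptom, here is a minimal sketch of the likely failure mode and of the fix described above. It is an illustration only: the function names are hypothetical and are not part of convert.js or etherpad-lite, and the Latin-1 round trip merely stands in for a connection whose character set cannot represent the text (the exact character set in use on the server is an assumption).

```python
# Hypothetical helpers for illustration; not taken from convert.js or etherpad-lite.


def simulate_lossy_store(text: str) -> str:
    """Mimic writing text through a connection that only understands Latin-1.

    Characters that cannot be represented are replaced with '?', which is
    the symptom seen in the imported pad.
    """
    return text.encode("latin-1", errors="replace").decode("latin-1")


def ensure_set_names(sql_dump: str, charset: str = "utf8") -> str:
    """Prepend a SET NAMES declaration unless the dump already starts with one."""
    if sql_dump.lstrip().upper().startswith("SET NAMES"):
        return sql_dump
    return f"SET NAMES {charset};\n{sql_dump}"


if __name__ == "__main__":
    # Greek letters have no Latin-1 representation, so they turn into '?'.
    print(simulate_lossy_store("abc αβγ"))
    # The dump produced by the conversion step gains the missing declaration.
    print(ensure_set_names("INSERT INTO store VALUES ('k', 'v');"))
```

Note that once the characters have been replaced with question marks the original text cannot be recovered from the new database, which is why a full reimport from the old data was needed rather than an in-place repair.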