
Special:Import cannot import characters like ä, á, ...
Closed, Declined (Public)

Description

Author: friedrich.gelbard

Description:
If you export a German page like http://de.wikipedia.org/wiki/Haus and reimport
the page into MediaWiki, characters like ä, ö, ü, á, <, ... cause an import failure.

So how can you import an exported page that contains these characters?


Version: 1.9.x
Severity: normal
OS: Windows XP
Platform: PC

Details

Reference
bz9906

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:41 PM
bzimport set Reference to bz9906.
bzimport added a subscriber: Unknown Object (MLST).

Please provide very detailed steps to reproduce the problem, as this has always
worked correctly for us.

friedrich.gelbard wrote:

Thanks for the hint, which brought me to an idea.

The bug is as follows:

We have a Linux SuSE 10.2 server with MediaWiki 1.9.2 installed. If you use Linux clients for import and export, everything works fine.

We made the mistake of using a Windows XP SP2 client for import and export.
Exporting wiki pages works fine, but if you save the exported XML source of a page under Windows and reimport it into MediaWiki, the file format is not converted between Windows and Linux.

At least a warning should be issued saying: "You use a different operating system - file format may not be compatible."

Thanks for your help

No such restriction exists. Please do the following:

  1. provide an exact description of how you performed these saves
  2. provide a sample of a damaged file

friedrich.gelbard wrote:

Import this file into a MediaWiki installation and you'll get an import failure.

  1. Export a Wikipedia page.
  2. Copy and paste the resulting XML code into an editor under Windows. (You'll get a DOS/Windows ANSI text file.)
  3. Import this manually created file into MediaWiki. You'll get an import failure.

If you save the file directly from the web browser, the import works. (You'll get a Unix text file encoded as UTF-8.)
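The difference between the two files can be reproduced outside MediaWiki. The following is a minimal Python sketch, not the importer's actual code: it assumes the Windows editor wrote the file as Windows-1252 ("ANSI") and uses a made-up miniature export (`EXPORT_TEXT`) and a hypothetical helper (`parses_as_xml`) to show that the same text is well-formed XML when saved as UTF-8 but not when saved as ANSI.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature export; a real MediaWiki dump has many more elements.
EXPORT_TEXT = "<mediawiki><page><title>Häuser</title></page></mediawiki>"


def parses_as_xml(data: bytes) -> bool:
    """Return True if the bytes parse as XML.

    With no encoding declaration, the parser treats the bytes as UTF-8.
    """
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# What saving directly from the browser is assumed to produce.
utf8_bytes = EXPORT_TEXT.encode("utf-8")
# What an ANSI (Windows-1252) editor is assumed to write: "ä" becomes the
# single byte 0xE4, which is not a valid UTF-8 sequence on its own.
ansi_bytes = EXPORT_TEXT.encode("cp1252")

print("UTF-8 copy parses:", parses_as_xml(utf8_bytes))
print("ANSI copy parses:", parses_as_xml(ansi_bytes))
```

If the ANSI copy is rejected here, the likely cause of the import failure is the character encoding of the saved file rather than the operating system as such.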

Attached:

robchur wrote:

Well, when working with UTF-8, use an editor capable of dealing with it, or the browser's "Save As XML" (or similar) option. Even IE gets this right...
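For a file that has already been saved in the wrong encoding, re-encoding it before import may be enough. The sketch below is one possible approach, not something MediaWiki provides: the function name `reencode_to_utf8` is hypothetical, and the default source encoding of Windows-1252 is an assumption about what the Windows editor wrote.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src to dst.

    If src is already valid UTF-8 it is copied unchanged; otherwise it is
    decoded using source_encoding (assumed, not detected).
    """
    raw = src.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so fall back to the assumed legacy encoding.
        # This raises UnicodeDecodeError if a byte is undefined in it.
        text = raw.decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

A call such as `reencode_to_utf8(Path("export-ansi.xml"), Path("export-utf8.xml"))` (both file names are placeholders) would produce a copy that can then be tried in Special:Import.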

Used correctly with a suitable browser, this seems to work; I'm changing it to WORKSFORME.