
Error when importing pages from English Wikipedia to Portuguese Wikibooks
Closed, Resolved, Public

Description

If we try to use [[b:pt:Special:Import]] to import [[Template:0]] from Wikipedia the page is reporting the following error:
"Import failed: Expected <mediawiki> tag, got"
Could this be fixed?

PS: the error message should also be more descriptive (got what?)


Version: unspecified
Severity: normal
URL: https://secure.wikimedia.org/wikibooks/pt/w/index.php?title=Especial:Importar&action=submit

Details

Reference
bz28277

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 11:35 PM
bzimport set Reference to bz28277.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 8338
screenshot of error

(In reply to comment #2)

Try it again?

Still the same.

Attached:

Template:0.png (464×1 px, 40 KB)

Would you have a chance to track down the commit that makes this work on trunk but not deployment?

You cannot import from w:en on Wikibooks because the URL is not created correctly. The URL is built the same way {{fullurl:}} does it:

{{fullurl:w:en:Template:0}}

gives

http://pt.wikipedia.org/wiki/en:Template:0

and the importer does not resolve that URL to

http://en.wikipedia.org/wiki/Template:0

Resolving local redirects is a bad idea, because then you cannot import a redirect, but resolving "external" redirects should be done.
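To illustrate the URL construction described above, here is a minimal Python sketch. It is not MediaWiki code: the `INTERWIKI` table and the `full_url` helper are hypothetical, and the single entry is an assumption chosen to match the URLs quoted in this comment. It only shows that expanding the first prefix leaves "en:" inside the page title.

```python
# Hypothetical interwiki table; the "w" entry is an assumption chosen to
# match the {{fullurl:}} output quoted above.
INTERWIKI = {
    "w": "http://pt.wikipedia.org/wiki/$1",
}


def full_url(link: str) -> str:
    """Expand only the first interwiki prefix of a link such as 'w:en:Template:0'."""
    prefix, sep, rest = link.partition(":")
    if not sep or prefix not in INTERWIKI:
        raise ValueError(f"unknown interwiki prefix in {link!r}")
    # Everything after the first prefix, including "en:", is treated as the title.
    return INTERWIKI[prefix].replace("$1", rest)


print(full_url("w:en:Template:0"))
```

The second prefix is only interpreted once pt.wikipedia.org receives the request, which is why the importer depends on that server's redirect.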

(In reply to comment #5)

You cannot import from w:en on wikibooks

Actually we (usually) can, as you can check in our import log:
http://pt.wikibooks.org/wiki/Especial:Registo/import?withJS=MediaWiki:Gadget-Filtro_para_listas.js&lifilter=1&lifilterexpr=w:en:

I don't know why [[Template:0]] behaves differently from other pages we imported from English Wikipedia.

(In reply to comment #6)

(In reply to comment #5)

You cannot import from w:en on wikibooks

Actually we (usually) can, as you can check in our import log:
I don't know why [[Template:0]] behaves different from other pages we imported
from English Wikipedia.

The wiki has not had any successful import from w:en under 1.17, so this does not affect Template:0 alone.

Since r67684, redirects are not followed by default ($followRedirects is false). That breaks Special:Import for "external" redirects, because

http://pt.wikipedia.org/wiki/en:Special:Export/Template:0

is not resolved to

http://en.wikipedia.org/wiki/Special:Export/Template:0
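The effect of not following the redirect can be sketched in Python with an in-memory stand-in for the two servers. This is an illustration under stated assumptions: `FAKE_WEB`, `fetch` and `looks_like_export` are hypothetical names, and the status codes and bodies are invented for the example, not captured from the real sites.

```python
# In-memory stand-in for the two servers. Statuses, headers and bodies are
# assumptions made for illustration, not real responses.
FAKE_WEB = {
    "http://pt.wikipedia.org/wiki/en:Special:Export/Template:0": (
        301,
        {"Location": "http://en.wikipedia.org/wiki/Special:Export/Template:0"},
        "",
    ),
    "http://en.wikipedia.org/wiki/Special:Export/Template:0": (
        200,
        {},
        "<mediawiki><page><title>Template:0</title></page></mediawiki>",
    ),
}


def fetch(url: str, follow_redirects: bool = False, max_hops: int = 5) -> str:
    """Return the body for url, following Location headers only if asked to."""
    for _ in range(max_hops + 1):
        status, headers, body = FAKE_WEB[url]
        if follow_redirects and status in (301, 302) and "Location" in headers:
            url = headers["Location"]
            continue
        return body
    raise RuntimeError("too many redirects")


def looks_like_export(body: str) -> bool:
    """True if the body starts with a <mediawiki> root element."""
    return body.lstrip().startswith("<mediawiki")


start = "http://pt.wikipedia.org/wiki/en:Special:Export/Template:0"
print(looks_like_export(fetch(start)))                         # redirect not followed
print(looks_like_export(fetch(start, follow_redirects=True)))  # redirect followed
```

With redirects disabled the caller is left with the body of the redirect response, which in this sketch is empty and therefore contains no <mediawiki> element.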

I don't think the feature was done considering w:en: as valid interwikis. This looks like a hack instead.

I was concerned about importing from languages other than English, since fetching http://pt.wikipedia.org/wiki/Special:Export/Template:0 redirects you to the canonical url http://pt.wikipedia.org/wiki/Special:Export/Template:0
However, after testing you're not redirecting on POSTs, so no problem on that side.

(In reply to comment #7)

(In reply to comment #6)

I don't know why [[Template:0]] behaves different from other pages we imported
from English Wikipedia.

The wiki have not any success import under 1.17 from w:en, so it does not
affect Template:0 alone.

Indeed. I've tried to import [[MediaWiki:Gadget-contribsrange.js]] today and got the error again =(

We are still unable to import templates from English Wikipedia, as reported by another user on this edit summary:
https://secure.wikimedia.org/wikibooks/pt/w/index.php?title=Predefinição:Min&diff=0&uselang=en

From triage, mybugs.mail gave this set of steps for reproducing:

> I've filled the fields with "Source wiki/page: [w:en] [Template:Min]"
> ...and I marked only the option "Copy all history revisions for this page"
> the point is that the en.wp templates are not always available

(In reply to comment #11)

From triage, mybugs.mail gave this set of steps for reproducing:

> I've filled the fields with "Source wiki/page: [w:en] [Template:Min]"
> ...and I marked only the option "Copy all history revisions for this page"
> the point is that the en.wp templates are not always available

Note, from my local install I cannot reproduce this. Just to confirm, importing Template:Min using the interwiki prefix w:en from ptwikibooks and selecting copy entire history and not selecting copy all templates, always fails (As opposed to intermittently fails)?

(In reply to comment #11)

From triage, mybugs.mail gave this set of steps for reproducing:

I've filled the fields with "Source wiki/page: [w:en] [Template:Min]"
...and I marked only the option "Copy all history revisions for this page"
the point is that the en.wp templates are not always available

Just to clarify: this last line was referring to the workaround suggested by Adrignola: to import a page from English Wikibooks instead of the English Wikipedia. This is not always possible because the desired page/template may not exist on English Wikibooks or may be outdated.

(In reply to comment #12)

(In reply to comment #11)

From triage, mybugs.mail gave this set of steps for reproducing:

> I've filled the fields with "Source wiki/page: [w:en] [Template:Min]"
> ...and I marked only the option "Copy all history revisions for this page"
> the point is that the en.wp templates are not always available

Note, from my local install I cannot reproduce this. Just to confirm, importing
Template:Min using the interwiki prefix w:en from ptwikibooks and selecting
copy entire history and not selecting copy all templates, always fails (As
opposed to intermittently fails)?

The bug is not tied to any specific page/template. As already mentioned, it failed with [[Template:0]] and [[Template:Min]] but I can confirm it also happens on a page such as [[1500s]], with the following options:

Source wiki/page: [w:en][1500s]

Destination namespace: [all]

and leaving both "Copy all history revisions for this page" and
"Include all templates" not selected. Enabling either of these two options or changing the destination namespace doesn't change the situation, i.e., the message is always "Import failed: Expected <mediawiki> tag, got".

Tried to reproduce this on my local wiki just now with [[1500s]] and didn't see an error. I wonder if this might require someone with shell to investigate.

It may have been fixed in the meantime. Trunk as of May will be in 1.18.
mybugs.mail, is it still failing?

Still not working with [[1500s]], [[Template:0]] and [[Template:Min]].

Works fine locally, not transwiki

Downloading to file and pushing back works fine

Last 3 have been imported

(In reply to comment #17)

Works fine locally, not transwiki

Downloading to file and pushing back works fine

Last 3 have been imported

Good to know that at least we can request someone to download and upload a page from en.wp when necessary.
Nonetheless, I'm reopening the bug because [[b:pt:Special:Import]] still returns an error when trying to import any random page from English Wikipedia. (or should I wait for some code revision to go live?)

Some other tests which may help finding the problem:

  • In your local tests, have you tried something which involves an interwiki prefix having two parts, such as "w" and "en" in the case of "w:en"? See bug 20552 comment 3.
  • Would it make any difference if "w:en" were changed to "en:w" in the Portuguese Wikibooks wgImportSources?
  • Has anyone tried some other Wikimedia wiki that has an import source in another language/project? E.g.: 'elwikibooks', which also has 'w:en' on its sources list.[1]

[1] http://noc.wikimedia.org/conf/InitialiseSettings.php.txt

  • In your local tests, have you tried something which involves an interwiki prefix having two parts, such as "w" and "en" in the case of "w:en"?

yes, I did try that and was not able to reproduce locally.

I was able to reproduce the problem on my copy of MW 1.18.

First, I downloaded [[mw:Extension:Interwiki]] and put the following in my LocalSettings.php:
require_once("$IP/extensions/Interwiki/Interwiki.php");

$wgGroupPermissions['*']['interwiki'] = false;
$wgGroupPermissions['sysop']['interwiki'] = true;
$wgImportSources = array(
      'w2',
      'en2',
      'w2:en',
      'en2:w'
);

Then, I went to [[Special:Interwikis]] and:

After this, links such as
*[[:w2:Main Page]] points to http://pt.wikipedia.org/wiki/Main Page
*[[:en2:Main Page]] points to http://en.wikibooks.org/wiki/Main Page
*[[:w2:en:Main Page]] points to http://pt.wikipedia.org/wiki/en:Main Page
*[[:en2:w:Main Page]] points to http://en.wikibooks.org/wiki/w:Main Page
(I did this just to check if I configured the extension correctly).

Then, I opened [[Special:Import]] and typed "Main Page" in the field and unselected "Copy all history revisions for this page". The result of trying to import from each of my sources was:

  • w2: ok
  • en2: ok
  • w2:en: Import failed: Expected <mediawiki> tag, got
  • en2:w: Import failed: Expected <mediawiki> tag, got

In these two cases, there were also some warnings at the top of the page. The source "w2:en" caused these:

Warning: XMLReader::read(): uploadsource://0785606dfa54a2a22d337ef4c81f95d3:1: parser error : Extra content at the end of the document in /var/www/mw18/includes/Import.php on line 362

Warning: XMLReader::read(): in /var/www/mw18/includes/Import.php on line 362

Warning: XMLReader::read(): ^ in /var/www/mw18/includes/Import.php on line 362

Warning: XMLReader::read(): An Error Occured while reading in /var/www/mw18/includes/Import.php on line 362

The source "en2:w" also caused the same warnings, but with "uploadsource://c98a4f54d974dd8ad71a24618a955a83:1:" instead.
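The blank after "got" and the parser warnings above are consistent with the importer receiving something other than an export dump. As a rough sketch of a more descriptive check, the Python below inspects the root element and names what it found. `check_export_root` is a hypothetical helper, not part of MediaWiki, and the message wording is only an assumption about what a clearer error could look like.

```python
import xml.etree.ElementTree as ET


def check_export_root(data: str) -> str:
    """Return 'ok', or a message naming what was found instead of <mediawiki>."""
    if not data.strip():
        return "Expected <mediawiki> tag, got an empty response"
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return f"Expected <mediawiki> tag, got unparseable input ({err})"
    # Drop an XML namespace of the form '{uri}mediawiki' if one is present.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag != "mediawiki":
        return f"Expected <mediawiki> tag, got <{tag}>"
    return "ok"


print(check_export_root(""))
print(check_export_root("<html><body/></html>"))
print(check_export_root("<mediawiki/>"))
```

Reporting the empty and non-XML cases separately would make it easier to tell a failed fetch apart from a malformed dump.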

Created attachment 9298
re-allow imports of w:en:Main_Page and the like

Try the attached patch, which was also applied in r101010.


(In reply to comment #21)

Created attachment 9298 [details]
re-allow imports of w:en:Main_Page and the like

Try the attached patch also applied at r101010

This fixes the problem on my local copy of MW 1.18.
