
Special:Import error: "Import failed: Could not open import file"
Open, Low, Public

Description

Author: fran

Description:
Trying to transwiki the page [[Help:ParserFunctions]] and its history from Meta to MediaWiki.org using Special:Import, I ran into an error: it fails with the cryptic message "Import failed: Could not open import file". I have transwiki'd other pages with full histories from Meta to MW.org with no problems, but this page refuses to transfer.

Notably, the transfer works if I choose to copy over only the current revision; this is of little use, though, since the GFDL requires that all revisions be transferred. This is a rather large stumbling block in the process of moving all MediaWiki documentation from Meta to MediaWiki.org - is there any way to work around or fix this error?


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:13 PM
bzimport set Reference to bz15000.
bzimport added a subscriber: Unknown Object (MLST).

Mass component change: <some> -> Export/Import

lambdav wrote:

We have the same problem on the French Wikibooks when importing from the French Wikipedia.
Example: importing the page [[w:fr:Résolution d'un sudoku]] always fails, returning the following error message:
Échec de l’import : Impossible d’ouvrir le fichier à importer ("Import failed: Could not open import file")

The page exists. When we try to import a page which doesn't exist, the message is:
Échec de l’import : Aucune page à importer. ("Import failed: No pages to import.")

With some pages, the import may succeed after many tries (about 10 attempts), but with other pages, it never succeeds.

See also [https://bugzilla.wikimedia.org/show_bug.cgi?id=13807 Bug 13807]

I tried that a while back; I think the page is just too big.

mike.lifeguard+bugs wrote:

*** Bug 12231 has been marked as a duplicate of this bug. ***

mike.lifeguard+bugs wrote:

*** Bug 13807 has been marked as a duplicate of this bug. ***

mike.lifeguard+bugs wrote:

*** Bug 16981 has been marked as a duplicate of this bug. ***

Same problem on el.wikipedia. The import fails when trying to import pages with a lot of revisions, so in fact the tool is mostly useless (nobody is interested in importing stubs).

mike.lifeguard+bugs wrote:

(In reply to comment #7)

Same problem on el.wikipedia. The import fails when trying to import pages with a lot of revisions, so in fact the tool is mostly useless (nobody is interested in importing stubs).

Worse than that, it actually fails ~90% of the time when trying to import only one revision. There is obviously something going on here which has nothing to do with the difficulty in retrieving page text quickly from external storage for all revisions of a page, since that's not required for importing the top revision. See bug 16875.

*** Bug 26493 has been marked as a duplicate of this bug. ***

Any updates on this?

I've tried to import the page again and I'm still getting the error described in Bug 26493.

*** Bug 21409 has been marked as a duplicate of this bug. ***

*** Bug 17281 has been marked as a duplicate of this bug. ***

john wrote:

Please do not change the priority unless you are a developer; raising the severity to blocker won't help move this along.

lambdav wrote:

Why is there no action to resolve this blocking problem?
Should we stop reporting problems, given that nothing has been done about this one since 2008?

john wrote:

This seems to be a Wikimedia problem, moving to the appropriate product.

(In reply to comment #15)

This seems to be a Wikimedia problem, moving to the appropriate product.

Are you sure? Import/export via special pages is notoriously unreliable on all wikis I've heard of.

This is mostly caused by timeouts (e.g. try to import [[Wikipedia:Sandbox]] or [[wikt:water]] with all revisions).

The solution here would be to increase $wgHTTPTimeout on the Wikimedia cluster, or to introduce a separate timeout config variable into MediaWiki that will affect imports only.
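
On a third-party wiki that would be a one-line configuration change; purely as an illustration (the 90-second value is an arbitrary example, not a recommendation for the cluster):

  // LocalSettings.php: raise the client-side timeout for internal HTTP requests.
  // 90 is an arbitrary illustrative value; the MediaWiki default is 25 seconds.
  $wgHTTPTimeout = 90;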

Adding ops keyword and CC'ing Reedy - they should know whether increasing the timeout is OK, and whether it will actually help to solve this problem.

(In reply to comment #17)

The solution here would be to increase $wgHTTPTimeout on the Wikimedia cluster, or to introduce a separate timeout config variable into MediaWiki that will affect imports only.

MediaWiki & Cluster => CC'ing Tim.

In T17000#192142, @TTO wrote:

The solution here would be to increase $wgHTTPTimeout on the Wikimedia cluster, or to introduce a separate timeout config variable into MediaWiki that will affect imports only.

Adding ops keyword and CC'ing Reedy - they should know whether increasing the timeout is OK, and whether it will actually help to solve this problem.

@fgiunchedi: I added Ops to get feedback on the consequences of increasing the value of $wgHTTPTimeout on the production cluster.

Sorry Filippo... I'm adding Ops back per my previous comment. Otherwise this will be stalled forever.

If ops don't answer, I think it's reasonable to proceed by trial and error. The first step would be to set $wgHTTPTimeout at whatever is the current timeout for API requests (at least 60 seconds?).

If ops don't answer, I think it's reasonable to proceed by trial and error. The first step would be to set $wgHTTPTimeout at whatever is the current timeout for API requests (at least 60 seconds?).

Trial and error is not advisable with important parameters that can cause serious issues if mistuned.

But I have a more fundamental question: why do you all think this issue originates from the value of $wgHTTPTimeout? I don't see anything specific here pointing to that being the problem.

Getting into the value of the variable: at the moment the timeout for internal HTTP requests is set to 25 seconds (the MediaWiki default). I think it would be advisable to have a separate timeout for this action specifically, and maybe limit multi-page imports to some specific class of users. Any timeout longer than 60 seconds doesn't really make much sense.

I would downvote any patch increasing $wgHTTPTimeout globally by more than 100% from its current value anyway.

In T17000#2907375, @Joe wrote:

But I have a more fundamental question: why do you all think this issue originates from the value of $wgHTTPTimeout? I don't see anything specific here pointing to that being the problem.

It's not a server timeout: I can download the entire 353 MB XML dump of Wikipedia:Sandbox on enwiki (https://en.wikipedia.org/wiki/Special:Export/Wikipedia:Sandbox?history=1) with no problems. So, by process of elimination, it must be a client timeout.

Indeed, if you try to import all revisions of enwiki's Wikipedia:Sandbox using transwiki import onto another WMF wiki, it fails with "Import failed: Could not open import file" after 26 or 27 seconds! Coincidence or...?
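
The hypothesis is easy to check outside MediaWiki, too. A minimal sketch in plain PHP (this is not the actual import code path, just an illustration; the 25-second cap mirrors the $wgHTTPTimeout default) aborts the download the same way:

  <?php
  // Fetch the full-history export with a 25-second whole-transfer timeout,
  // mirroring MediaWiki's default $wgHTTPTimeout.
  $url = 'https://en.wikipedia.org/wiki/Special:Export/Wikipedia:Sandbox?history=1';
  $ch = curl_init( $url );
  curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
  curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
  curl_setopt( $ch, CURLOPT_TIMEOUT, 25 ); // limit on the whole transfer, in seconds
  $xml = curl_exec( $ch );
  if ( $xml === false ) {
      // For a 353 MB dump this hits the timeout (cURL error 28), which would be
      // the analogue of the "Could not open import file" failure seen above.
      echo 'cURL error: ' . curl_error( $ch ) . "\n";
  }
  curl_close( $ch );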

I think it would be advisable to have a separate timeout for this action specifically

Seems reasonable.

and maybe limit multi-page imports to some specific class of users.

Imports are already limited to sysops, and they're not exactly common actions anyway.
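
That restriction comes from MediaWiki's default rights configuration; roughly, as a sketch of the relevant defaults:

  // Default MediaWiki rights: only sysops can use Special:Import.
  $wgGroupPermissions['sysop']['import'] = true;       // transwiki import
  $wgGroupPermissions['sysop']['importupload'] = true; // import from an uploaded XML file

A wiki that wanted to restrict imports further could move these rights to a dedicated group.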

Any timeout longer than 60 seconds doesn't really make much sense.

I don't see why not; imports can be very long operations by their very nature.

Don't we already have maintenance scripts that allow these large export/imports to be done server side?

Don't we already have maintenance scripts that allow these large export/imports to be done server side?

Technically it could be done server-side, but it would be a bit tedious, and I don't see any benefit to doing it that way. It's still going to take just as long.
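
For the record, the scripts in question are maintenance/dumpBackup.php on the source wiki and maintenance/importDump.php on the target wiki; a server-side run would look roughly like this (a sketch; exact options vary by MediaWiki version):

  # On the source wiki: export all revisions of the pages listed in pages.txt.
  php maintenance/dumpBackup.php --full --pagelist=pages.txt > pages.xml
  # On the target wiki: import the resulting XML dump.
  php maintenance/importDump.php pages.xml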

Change 330171 had a related patch set uploaded (by Filip):
Added $wgHTTPImportTimeout setting

https://gerrit.wikimedia.org/r/330171

Filip subscribed.

@TTO: Added $wgHTTPImportTimeout setting.

Change 330171 merged by jenkins-bot:
Added $wgHTTPImportTimeout setting

https://gerrit.wikimedia.org/r/330171

Patch got merged, so closing as 'resolved'.

Given that the default for $wgHTTPImportTimeout is the same 25 seconds, and that the setting was not changed on Wikimedia, this is not fixed for WMF-General-or-Unknown.
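
In other words, the knob now exists but changes nothing by default; a wiki wanting longer import requests still needs something like the following in its configuration (the 120-second value is only an example):

  // LocalSettings.php: give Special:Import's HTTP fetch its own, longer timeout.
  // 120 is an illustrative value; the default matches $wgHTTPTimeout's 25 seconds.
  $wgHTTPImportTimeout = 120;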

Aklapper subscribed.

The pleasure of having tasks mixing up the general MediaWiki code base and the specific configuration of Wikimedia sites. :(
Should we split off the Wikimedia site request into a separate task? Feels way cleaner.

In T17000#2908075, @TTO wrote:

Any timeout longer than 60 seconds doesn't really make much sense.

I don't see why not; imports can be very long operations by their very nature.

Because we have such timeouts set both at our API and appserver layers, and even shorter timeouts in Varnish IIRC; setting a higher timeout may result in the import action working, but with the user still receiving an error page.

The pleasure of having tasks mixing up the general MediaWiki code base and the specific configuration of Wikimedia sites. :(
Should we split off the Wikimedia site request into a separate task? Feels way cleaner.

Well, the two things were quite entangled, but I think it would be fair to separate the tasks.

As an additional note, it also affects it.wikiversity when importing this page (2000 revisions).

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work on, or plan to work on, this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)