
Importing (transwiki) only the top revision should succeed reliably
Closed, Resolved · Public

Description

Author: mike.lifeguard+bugs

Description:
Currently, importing only the top revision using transwiki import fails for no good reason I can come up with - and in fact it fails more often than it succeeds. This shouldn't happen.

The reason importing many revisions fails is: "The slow part is loading the required data from external storage, but even if it sped it up by, say, 5 times, it would still time out because some transwiki jobs are just that big... to transwiki a page it might have to load up hundreds of megabytes of data from various hard drives - non-contiguous data, it'll have to seek." (Tim Starling) - but that doesn't apply to fetching only one revision, I hope.

So, something must be going wrong here.


Version: 1.14.x
Severity: normal

Details

Reference
bz16875

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:29 PM
bzimport set Reference to bz16875.

Transwiki importing times out very easily :( If the server takes too long to build the XML, the connection will time out. If it's too big, it might time out. Or if it's just transferring slowly (too much network congestion?) it might time out.

Short of rethinking the interwiki imports (which might be nice), the easiest solution is making $wgHTTPTimeout higher than the default 3 seconds and being patient.

mike.lifeguard+bugs wrote:

(In reply to comment #1)

> Transwiki importing times out very easily :( If the server takes too long to build the XML, the connection will time out. If it's too big, it might time out. Or if it's just transferring slowly (too much network congestion?) it might time out.
>
> Short of rethinking the interwiki imports (which might be nice), the easiest solution is making $wgHTTPTimeout higher than the default 3 seconds and being patient.

All true, but for *one revision* there should be no problem. We load pages for viewing (i.e. the top revision only) fast enough; why can't we generate XML for the same data fast enough?
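For a quick sanity check, the single-revision payload can be fetched and measured directly; a minimal sketch in plain PHP, assuming Special:Export's GET form returns only the current revision by default (the page title and timeout value here are illustrative, not part of this task):

```php
<?php
// Fetch the export XML for just the top revision of a page - the same data a
// top-revision-only transwiki import has to transfer.
$url = 'https://en.wikipedia.org/wiki/Special:Export/Fanny_Crosby';

// Give the request an illustrative 30-second limit via a stream context.
$context = stream_context_create( array(
	'http' => array( 'timeout' => 30 ),
) );

$xml = file_get_contents( $url, false, $context );

if ( $xml === false ) {
	echo "Fetch failed or timed out\n";
} else {
	// A current-revision-only export should be far smaller than a
	// full-history dump of the same page.
	echo strlen( $xml ) . " bytes of export XML\n";
}
```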

EN.WP.ST47 wrote:

Is this process still unreliable?

So here we are in 2014; what's the status - does this still fail? I guess I was given the bug by default...

> So here we are in 2014

2015 even...

I just managed to import the top revision of the notoriously long "Fanny Crosby" article from enwiki to testwiki, and it completed very quickly (a few seconds). It may be an intermittent issue; these types of transwiki import bugs tend to be transient...

ArielGlenn set Security to None.
TTO claimed this task.
TTO added a subscriber: demon.

> the easiest solution is making $wgHTTPTimeout higher than the default 3 seconds and being patient.

It seems that $wgHTTPTimeout defaults to 25 seconds in MediaWiki core, and the WMF cluster uses that default. On Labs it is set to 10 seconds.

Not really sure if there is anything to fix here anymore. On the WMF cluster at least, speed is no longer an issue. As for other MW installations, transwiki import of large pages is never going to be lightning-fast.
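For those other installations, raising the timeout is a one-line change in LocalSettings.php; a minimal sketch, with an illustrative value rather than a recommendation:

```php
// LocalSettings.php - give outgoing HTTP requests, such as the export fetch
// behind a transwiki import, more time before they are abandoned.
// 120 seconds is an illustrative value only.
$wgHTTPTimeout = 120;
```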