Upload problems : Slow / timeouts
Closed, Resolved (Public)

Description

I've been having some upload problems recently. I've tested this in different situations:

  • Upload Wizard upload is very slow (see also #30027)
  • Uploads using a bot from my home connection using the API will time out with larger files
  • Uploads using a bot from the Toolserver using the API will time out with larger files
  • Uploads from my (fast) work connection using classic Special:Upload and using the API will time out with larger files

Version: unspecified
Severity: critical

Details

Reference
bz30086

Related Objects

Status | Subtype | Assigned | Task
Resolved | | None |
Resolved | | Reedy |

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 21 2014, 11:53 PM)
bzimport set Reference to bz30086.
*** Bug 30027 has been marked as a duplicate of this bug. ***

Do you have some sample files and sample code to upload them that regularly reproduces the timeouts?

mattwj2002 wrote:

It has been very slow uploading to the Wikimedia Commons on both the basic upload and the regular upload.

In addition, I have been having problems uploading with upload.py. I have the newest version from Subversion.

It eventually uploads, but an upload that should take minutes is taking hours.

I have a 22 Mbps / 7 Mbps connection.

Here is a log of an example upload:

http://pastebin.com/A5Upvr31

Please fix this as soon as possible. Some people probably are not uploading because it is so slow. I think this issue is very important.

mattwj2002 wrote:

Correction, the pastebin should be the following:

http://pastebin.com/A5Upvr31

Sorry!

mattwj2002 wrote:

Bugzilla is having issues. When I post the link it is changing it. I am trying it with a space.

http://pastebin.com/ A5Upvr31

I'm definitely seeing a verrrry slow upload of a 78 MB Ogg file to Commons, though I can't be sure whether it's the server end or the Wikimania network.

It seems to be spiking up briefly, then halting for a while, which could be an indication of lost packets delaying the upload stream as it waits to time out.

Peaks are 60-130 KB/sec, but ongoing rates are often ..... 6, 12, 25.

I cannot upload files of ~2-3 MB from the toolserver either:
Uploading file to commons:commons via API....
<urlopen error timed out>

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...

ralf wrote:

In Commonist 0.4.17: "unexpected response: HTTP/1.0 502 Bad Gateway"

sumanah wrote:

Just heard another report of this from Martina Nolte, who ten minutes ago tried again to upload via Commonist: "Commonist now starts to upload a second image and then fails with "HTTP/1.0 502 Bad Gateway"."

(In reply to comment #7)

I can't be sure whether it's the server end or the Wikimania network.

It's not a Wikimania problem. "Homies" have had the same bug since 3 Aug.:
http://commons.wikimedia.org/wiki/Commons_talk:Tools/Commonist#Upload_problem
http://commons.wikimedia.org/wiki/Commons:Forum#unexpected_response:_HTTP.2F1.0_502_Bad_Gateway

sumanah wrote:

Ryan Lane just mentioned to me that this seems like a problem with the Java app (Commonist?); any issues on the Wikimedia side seem fixed.

(In reply to comment #12)

Ryan Lane just mentioned to me that this seems like a problem with the Java app
(Commonist?); any issues on the Wikimedia side seem fixed.

Mark fixed an issue with one of the API apaches being incorrectly configured earlier today. Waiting to see if that fixes the 502 issues we've been seeing

Still not working with pywikipedia from the toolserver:

Uploading file to commons:commons via API....
HTTPError: 502 Bad Gateway

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server is down. Retrying in 1 minutes...

HTTPError: 502 Bad Gateway

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'.

neilk wrote:

For everyone following this bug -- the 502/504 error issue is being separately tracked in #30201.

The API-specific errors have been fixed; I wonder if this has any benefit for the upload issues.

Commonist upload runs perfectly now. Thanks to all who helped!

inductiveload wrote:

Upload through the upload form is still pretty slow: 16 minutes for a 40MB file (i.e. an average speed of 42 kB/s). My connection is 7+ Mbps upload (tested just after uploading), and the Internet Archive uploads are as fast as I expected (about a minute), so it must be a Commons-related problem.
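
The numbers in this comment can be sanity-checked with a quick calculation. The sketch below is only illustrative: the helper names are hypothetical, and it assumes 1 MB = 1024 KB and 1 Mbps = 1,000,000 bits per second (with 1 MB = 1000 KB the average comes out slightly lower, still about 42).

```python
def avg_speed_kb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer speed in KB/s, assuming 1 MB = 1024 KB."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb * 1024 / seconds


def link_speed_kb_per_s(mbps: float) -> float:
    """Convert a link speed in Mbps to KB/s (1 Mbps = 1,000,000 bit/s)."""
    return mbps * 1_000_000 / 8 / 1024


# 40 MB in 16 minutes, as reported in the comment
observed = avg_speed_kb_per_s(40, 16 * 60)
# What a 7 Mbps uplink could carry in the ideal case
ideal = link_speed_kb_per_s(7)

print(f"observed: {observed:.1f} KB/s")
print(f"ideal on 7 Mbps: {ideal:.1f} KB/s")
print(f"ideal time for 40 MB: {40 * 1024 / ideal:.0f} s")
```

The ideal transfer time works out to a bit under a minute, which is consistent with the Internet Archive upload time mentioned above.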

I have heard, but not yet checked myself, that pywikipedia has upload token issues too.

No guys. This is not resolved at all. I guess we have two problems giving the same result: Upload problems

  1. Bugged proxy giving 502's (solved in #30201)
  2. General slowness of upload

Just take a look at the timeline at https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=BotMultichillT to see how slow it is.
These are pictures uploaded from the toolserver.

Just take a look at the timeline at
https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=BotMultichillT
to see how slow it is.
These are pictures uploaded from the toolserver.

I don't know anything about that bot, but, using the API, I charted
the time between uploads against the size of the uploads (the closest
approximation I could think of for speed). I did notice a little
slowdown yesterday but it seems to be back now.
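
A minimal sketch of that kind of estimate, assuming each upload is available as a (timestamp, size in bytes) pair sorted oldest first. The function name and the sample data are hypothetical; this is not the script that produced the chart. Because the gap between two uploads also includes any idle time, the result is a lower bound on the real speed.

```python
from datetime import datetime


def approx_speeds(uploads):
    """Estimate KB/s for each upload from the gap since the previous one.

    `uploads` is a list of (ISO timestamp, size in bytes), oldest first.
    The gap includes idle time, so this underestimates the real speed.
    """
    speeds = []
    for (prev_ts, _), (ts, size) in zip(uploads, uploads[1:]):
        gap = (datetime.fromisoformat(ts) - datetime.fromisoformat(prev_ts)).total_seconds()
        if gap > 0:
            speeds.append(size / 1024 / gap)
    return speeds


# Made-up sample data, for illustration only
sample = [
    ("2011-08-01T10:00:00", 2_000_000),
    ("2011-08-01T10:01:40", 3_072_000),
    ("2011-08-01T10:02:40", 1_536_000),
]
print(approx_speeds(sample))
```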

The timeline, AFAICT, does not support your assertion that something
is still unresolved.

Feel free to reopen and let me know what I should look for in the timeline of
that bot if you feel like there is still a problem.

neilk wrote:

Mark: how far back does your chart go? Reedy believes this started to become an issue around July 23rd.

I'm more inclined to believe this is a real issue -- we're hearing about this from lots of people. It might be localized to Europe, like the last problems were.

I'll post a chart for as far back as I can once I've generated it.

I'm a user. I have a problem. I open an incident. Once the user confirms it's fixed, you close the incident; don't just close it *twice* because you think it's solved.

Commons upload is slow as hell, so yes, this is still an issue. So please, before you close this incident again: Verify with the user who reported this if it's really solved.

(In reply to comment #22)

Mark: how far back does your chart go? Reedy believes this started to become an
issue around July 23rd.

I have data going back to March, now.

(In reply to comment #24)

Commons upload is slow as hell, so yes, this is still an issue.

I'm just trying to get some numbers to back up these data-less assertions. I know people don't usually keep numbers like this handy, so I'm sympathetic to what you're saying. However, objective numbers are more reliable than user reports of "slowness". I'll work with NeilK to get some.

I still get the following error:
Uploading file to commons:commons via API....
<urlopen error timed out>

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 8 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 16 minutes...
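
The retry intervals in this log double each time (1, 2, 4, 8, 16 minutes). A minimal sketch of that kind of exponential backoff follows. It is not pywikipedia's actual implementation; the function name and its parameters are hypothetical. The `sleep` argument is injectable so the logic can be exercised without waiting.

```python
import time


def retry_with_backoff(action, max_attempts=5, first_wait_min=1, sleep=time.sleep):
    """Call action(); on OSError wait 1, 2, 4, ... minutes before retrying.

    urllib's URLError is a subclass of OSError, so timeouts raised by
    urlopen are covered. The last failure is re-raised to the caller.
    """
    wait = first_wait_min
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except OSError as err:
            if attempt == max_attempts:
                raise
            print(f"WARNING: {err}. Retrying in {wait} minutes...")
            sleep(wait * 60)
            wait *= 2
```

Doubling the wait keeps a client from hammering a struggling server, but as the log shows, it does nothing for an upload that is too slow to finish before the timeout.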

(In reply to comment #25)

I have data going back to March, now.

And now, back to 2009 for BotMultichillT. I've posted the raw data at http://mah.everybody.org/chart.zip (8mb). I have also asked a researcher if she could help with visualizing the data.

There are problems with it, so I'm going to see if I can clean it up some. I also saw problems with the API while generating the report.

I did some tests using my office PC (very fast uplink). I downloaded two +/- 10MB files from http://www.openbeelden.nl/ . That took about 1 second.

Uploading a file through the upload wizard took about 6 minutes = 30 KB/sec
Uploading a file through the old upload seems to take about the same amount of time.

I'm coming from Europe (AS1103 to be exact). I wonder if someone from the USA could do the same test (download a file from http://www.openbeelden.nl/ and upload it to Commons and time it) to see if the problem might be location related.

(In reply to comment #28)

I'm coming from Europe (AS1103 to be exact). I wonder if someone from the USA
could do the same test (download a file from http://www.openbeelden.nl/ and
upload it to Commons and time it) to see if the problem might be location
related.

I did note somewhere before, possibly on another bug, that all the reporters seemed to be EU-based, but it wasn't necessarily a complete survey.

inductiveload wrote:

(In reply to comment #27)

And now, back to 2009 for BotMultichillT.

A simple graph of the data from 2008-2011 with a 1000-element moving average can be seen at http://commons.wikimedia.org/wiki/File:Commons_upload_speeds_2008-2011.png

A graph of the data from 2011 with a 100-element moving average can be seen at http://commons.wikimedia.org/wiki/File:Commons_upload_speeds_2011.png

The moving average is very buggy and a couple of very fast outliers distort it badly, but a dramatic reduction can be seen firstly in January this year and again in July.
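
To illustrate how a few very fast outliers can distort a moving average, here is a small sketch comparing it with a moving median over the same window. The data is invented and the function name is hypothetical; this is not the script used to produce the graphs.

```python
from statistics import mean, median


def rolling(values, window, func):
    """Apply func to each full sliding window of the given size."""
    return [func(values[i:i + window]) for i in range(len(values) - window + 1)]


# Invented speeds in KB/s with one very fast outlier
speeds = [30, 28, 32, 900, 29, 31, 27]

print("moving average:", rolling(speeds, 3, mean))
print("moving median: ", rolling(speeds, 3, median))
```

Every window that contains the 900 KB/s sample has an average above 300, while the median stays close to 30, so a median (or dropping outliers before averaging) may give a more readable trend line.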

If it helps, I am based in the UK, but I have heard about this problem from American editors too.

neilk wrote:

Just tried it from the USA. Uploading an 8.4MB file to Commons took about 5 minutes 20 seconds (320 seconds). So that should be about 26 KB/sec upload.

Download (from http://www.openbeelden.nl/) was much faster, and took about 30 seconds.

Re: the graph -- the reduction in upload speed might coincide with how we introduced UploadWizard. It may be that the API method has always been slower. That would not explain why the upload speed seems to be dramatically slowing down in recent months, since we haven't altered anything about the upload protocol recently.

Multichill -- I would like to see the same graph with outliers removed, if you please?

(In reply to comment #31)

Just tried it from the USA. Uploading an 8.4MB file to Commons took about 5
minutes 20 seconds (320 seconds). So that should be about 26 KB/sec upload.

Download (from http://www.openbeelden.nl/) was much faster, and took about 30
seconds.

Upload vs download is, of course, not the same and depends on your provider.

We did some debugging last night. The chain when uploading is:

Me -> Europe squid -> US squid -> application server (apache) -> NFS -> ms7

Multiple people on different continents have this problem so it's probably not the Europe squids.
NFS copy from the apache to the nfs share on ms7 is fast so that doesn't seem to be the bottleneck either.
Upload to http://test.wikipedia.org is fast, but upload to secure test is very slow (even slower than Commons).
Unsecure test uses different apaches than secure test or secure/unsecure Commons.

Could an operations person please look into this? Bumping this to highest because Commons is becoming unusable. Lots of reports are coming in.

(In reply to comment #33)

Upload to http://test.wikipedia.org is fast, but upload to secure test is very
slow (even slower than Commons).

Slower than Commons, really? How does it compare to Commons via secure? I just wanna know whether we really are on to something here or whether we're just noticing a 'tax' being added by the secure gateway.

I just tried uploading a 3.06MB file from the Toolserver to Commons via the API. It took a little over 2 minutes, so roughly equivalent to the speed Neil was reporting.

juancho2291 wrote:

I'm from Colombia and I have the same problem.

We realize this issue is affecting many users and we're looking into various causes of the problem. If people could avoid "me too" style comments, that would help keep the signal-to-noise ratio up.

Chad: That's what you get if a problem like this stays open for more than two weeks.

Who from the operations team is actually working on getting this fixed right now? The bug is assigned to Sam and AFAIK he's not on it.

(In reply to comment #38)

Chad: That's what you get if a problem like this stays open for more than two
weeks.

I understand the problem is frustrating, but +1s don't help :)

Who from the operations team is actually working on getting this fixed right
now? The bug is assigned to Sam and AFAIK he's not on it.

I was working with Roan, Sam, RobLa, and Asher last night on this (so there's 4 people plus me). We added some additional profiling late last night that today should give us some more insights.

ebe123_wiki wrote:

Commons Helper is going so slow because of commons. Ebe123

ebe123_wiki wrote:

Commons Helper is going so slow because of commons. Ebe123

(In reply to comment #22)

Mark: how far back does your chart go? Reedy believes this started to become an
issue around July 23rd.

I'm more inclined to believe this is a real issue -- we're hearing about this
from lots of people. It might be localized to Europe, like the last problems
were.

I'm in Halifax, and it is taking forever. Ebe123

(In reply to comment #41)

Commons Helper is going so slow because of commons. Ebe123
(In reply to comment #22)

Mark: how far back does your chart go? Reedy believes this started to become an
issue around July 23rd.

I'm more inclined to believe this is a real issue -- we're hearing about this
from lots of people. It might be localized to Europe, like the last problems
were.

I'm in Halifax, and it is taking forever. Ebe123

Halifax, Yorkshire or Halifax Canada?

Also, this is still an issue for you? Operations believe this should be fixed (and was in their tests) as of a couple of hours ago

Uploading speed via the API seems to be about 3 times faster now. It would be nice if a baseline speed were defined which could be tested against (or a range of speeds), so that we don't have to rely on people deciding that uploading is just "too slow" and filing a bug before anyone takes notice.

Over the last couple of days several people turned the whole cluster upside down trying to pinpoint the bottleneck. The Squids were ruled out, and ms7 and NFS were ruled out. It ended up being a low-level problem:

[11:57] mark it was a nasty problem with TSO/GRO being broken with linux 802.1q tagged interfaces
[11:57] multichill So really low level problem?
[11:58] mark yeah
[11:58] mark so, the nic on lvs4 was reassembling tcp packets into jumbo packets before presenting them to the OS
[11:58] mark after which LVS would forward them
[11:58] mark and then they wouldn't be split back up again by the nic after sending out
[11:58] multichill And fragmentation?
[11:58] mark and dropped as jumbo packets
[11:58] mark so, tcp delays, icmp "frag needed" messages being sent
[11:58] mark really hard to see because on the wire, they were < 1500 byte packages as usual
[12:00] mark the fix was disabling GRO on all lvs servers
[12:00] mark no idea why it was on by default anyway, on most servers it isn't
[12:00] mark probably some nic drivers enable it, most don't
[12:01] mark i bet TSO wasn't happening because of the added 802.1q vlan tag
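
For readers who want to check whether generic receive offload (GRO) is enabled on an interface, here is a hedged sketch. It assumes the Linux `ethtool` utility, whose `-k` option lists offload settings. The interface name `eth0` and the helper names are assumptions, and the sample text is illustrative, not captured from the lvs servers.

```python
import subprocess


def parse_offloads(ethtool_output: str) -> dict:
    """Parse 'name: on|off' lines from `ethtool -k` output into a dict of bools."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skip the header line and anything that is not an on/off setting
        if sep and words and words[0] in ("on", "off"):
            settings[name.strip()] = words[0] == "on"
    return settings


def gro_enabled(iface: str = "eth0") -> bool:
    """Run `ethtool -k` for the interface (requires ethtool) and report GRO."""
    out = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout
    return parse_offloads(out).get("generic-receive-offload", False)


# Illustrative sample, not real output from the affected machines
sample = """Features for eth0:
tcp-segmentation-offload: on
generic-receive-offload: on [fixed]
"""
print(parse_offloads(sample))
```

Disabling it by hand would then be `ethtool -K eth0 gro off`, run as root. Whether the setting persists across reboots depends on the distribution's network configuration.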

Thanks everyone for debugging this problem. I confirmed on Commons that upload is fast again (17MB file uploaded in less than 10 seconds).

Closing this bug as resolved.

ebe123_wiki wrote:

(In reply to comment #42)

(In reply to comment #41)

Commons Helper is going so slow because of commons. Ebe123
(In reply to comment #22)

Mark: how far back does your chart go? Reedy believes this started to become an
issue around July 23rd.

I'm more inclined to believe this is a real issue -- we're hearing about this
from lots of people. It might be localized to Europe, like the last problems
were.

I'm in Halifax, and it is taking forever. Ebe123

Halifax, Yorkshire or Halifax Canada?

Also, this is still an issue for you? Operations believe this should be fixed
(and was in their tests) as of a couple of hours ago

Canada, the capital of Nova Scotia. It's still an issue.

(In reply to comment #46)

Canada, the capital of Nova Scotia. It's still an issue.

So how slow are uploads for you?

I doubt this is server side. See for example how fast https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=US+National+Archives+bot is going.

Etienne: How fast is your internet connection (upload and download)? What file size are you trying to upload and how long did it take?

Etienne: You are https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=Ebe123 right? What tool do you use for that? Maybe the tool is just slow (I know commonshelper can be very slow).....

neilk wrote:

Smallman: other people are using the API successfully... I think that issue has to be either transient or local to your own situation.

We can't just keep reopening the same bug any time somebody has a network issue connecting to Commons.

neilk wrote:

I just want to clarify: I'm not saying your problem isn't real. I'm saying that we can't keep abusing Bugzilla so that we keep reopening the same bug for any and all network issues.

Please document your issue in a way we can replicate. Your issue seems to be some asymmetry between secure.wikimedia.org and https://commons, which might be a problem, but it's not THIS problem.

(In reply to comment #49)

Uploads to https://commons.wikimedia.org/w/api.php timeout (response times out)
whereas uploads to https://secure.wikimedia.org/wikipedia/commons/w/api.php
work fine.

That is a different problem than the one described here. Please open a new bug.

Resolved invalid? Don't think so. This was fixed back in August.