Page MenuHomePhabricator

[performance regression]: Web server responds very slow to action=delete submissions in the File namespace
Closed, ResolvedPublic

Description

Author: saibotrash

Description:
For some reason apparently logged actions get done twice... https://commons.wikimedia.org/w/index.php?title=Special:Log&page=File%3AAfrican+Cup+of+Nations+2012.png I just reverted once: two reverts.
... I think I did not click twice.. ;-)

Also noted: the revert takes ages (~ 30 seconds to 2:30 minutes) to return (browser waits...). The long revert duration was confirmed three times (with two different browsers (FF10 and Opera11.6) and accounts) by me at https://commons.wikimedia.org/w/index.php?title=File:AaatestSonnepalmenstrand-portrait_new.jpg&action=history

Reproduce: Just do a revert to a previous file version at this test file and see what you get. Note that the revert is done "fast". Just it takes ages for the server feedback.

In case it is a Europe-only problem: I am in Germany.


Version: unspecified
Severity: major

Details

Reference
bz34717

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 12:15 AM
bzimport set Reference to bz34717.

It does seem a bit slow: [23:06:40.415] POST https://commons.wikimedia.org/wiki/File:AaatestSonnepalmenstrand-portrait_new.jpg [HTTP/1.1 200 OK 22412ms] (and irc entry at 23:07:02)

I can't reproduce the "double actions", though.

In theory, a POST operation should only be attempted once, not retried, regardless of how long it takes. But maybe something in the network path here is getting impatient and sending the request again. It could be on our side, in the ISP or Saibo's browser. But since that's not the subject of the bug, it should be filed as a separate report if it's a problem.

I just tried upload, reupload, and revert on testwiki and it's fairly fast.

saibotrash wrote:

(In reply to comment #3)

https://commons.wikimedia.org/w/index.php?title=File:AaatestSonnepalmenstrand-portrait_new.jpg&action=purge&forceprofile=true

Purge is seems to be fast for now, though I thought it was slow earlier.

Purge is okay - not fast - but about 10 seconds.

(In reply to comment #4)

I just tried upload, reupload, and revert on testwiki and it's fairly fast.

on testwiki?! Sorry, that is out of relevance for me. ;-) But it seems to have returned to normal at Commons again. We can close this... unresolved...

Yeah, it's not really 'fast' so much as 'not terrible' ;)

From https://en.wikipedia.org/w/index.php?diff=479777598

Currently hitting the delete key on the image does not return anything for over 20-30 seconds, often resulting in "wikimedia foundation error..." The tech details are oft similar to "Request: POST http://en.wikipedia.org/w/index.php?title=File:Logo_FEPLP.jpg&action=delete, from 208.80.154.8 via cp1007.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.140 (10.64.0.140) Error: ERR_READ_TIMEOUT, errno [No Error] at Fri, 02 Mar 2012 06:09:30 GMT". The 208.80 is familiar, but the 10.64 isn't (Durham, NC?). A ping from me to 208.80 is consistently 99ms, so it doesn't seem to be a timeout from this end.

Is this in faster after the changes for bug 35047?

(In reply to comment #8)

Is this in faster after the changes for bug 35047?

Pinging this bug for more info.

nurtsch-cv60 wrote:

The problems are still persisting, when deleting a page I'm frequently returned to the '&action=delete' page and have to confirm the action one more time (or I'm just getting the error mentioned by Mark above). Other pages such as my watchlist are loading fairly slow.

I'm experiencing these problems *just* on Commons, please note that I also access from *Germany*.

(updating target milestone from 1.20wmf1 to 1.20wmf3, 1.20wmf has been deployed already)

I just went through a few copyright violations on Wikimedia Commons. Files with only 1 entry in the file history (the original upload), and about 5 page revisions.

Submitting the delete action took almost 20 seconds for any of them to respond (the POST request to get a html response).

I noticed in the recent changes feed in another tab that the action itself was processed long before the server responded with the rest.

I just deleted one file. One image revision, two history entries, no usage.
"Served by mw4 in 18.267 secs"

Normal (uncached) page views take less than 1 second...

This hasn't fallen off of our radar, it's just that we haven't communicated very well about status. Ben, Aaron and I discussed this one last week, and we have some ideas to try. I believe Ben tried something last week that unfortunately didn't help. We have some hardware on order that we think is the correct fix, but we've also got more ideas on short-term solutions. Sorry for the inconvenience here.

Have you profiled it? What step is being slow? swift deletion?

The part that's slow is the container listing, which seems to be slow because the listing is getting paged out of memory. With the SSDs that Ben is installing, the container listings should load much faster off of disk. Assigning this bug to Ben, since this is mostly out of Aaron's hands now.

Container listing? You mean listing the contents of a folder?
We don't have so much files... It seems too slow even if it needs to read it from disk.

bhartshorne wrote:

Graph showing response time for swift container listings

Attached:

Screen_Shot_2012-07-10_at_11.13.41_AM.png (463×812 px, 58 KB)

bhartshorne wrote:

We have moved container listings to SSDs within the swift backend storage nodes. Over the past week (moving the containers little by little) 50th percentile response time dropped from ~2s to ~0.12s and 90th % dropped from 30s to 0.9s. The container listing times were the largest contributor to sporadic long delete times; this change should eliminate that delay. I've attached the graph showing this change (from graphite).