Clearing large (39k+) watchlist @ Special:EditWatchlist/clear fails, HTTP error 503
Closed, Resolved (Public)

Description

Author: ab.zachaeus

Description:
Background: Grow tired of not being able to edit massive (41k+) watchlist by normal means (clicky-clicky or raw) (see bug T41510). Manually whittle watchlist down to 39k+ entries. Dig out https://commons.wikimedia.org/w/api.php?action=query&generator=watchlistraw&gwrlimit=max and hand-craft a list of watchlist entries to be kept after a watchlist purge.
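For anyone repeating the export step, a minimal sketch of paging through that API query follows. It assumes the JSON response carries pages under `query.pages` and a `continue` object while more results remain; `fetch` is a hypothetical callable that performs the authenticated GET and returns the decoded JSON, so nothing here touches the network by itself.

```python
from typing import Callable, Dict, Iterator


def iter_watchlist_titles(fetch: Callable[[Dict[str, str]], dict]) -> Iterator[str]:
    """Yield every watched title, following API continuation.

    `fetch` is a hypothetical helper: it takes the query parameters and
    returns the decoded JSON response as a dict.
    """
    params = {
        "action": "query",
        "generator": "watchlistraw",
        "gwrlimit": "max",
        "format": "json",
    }
    cont: Dict[str, str] = {}
    while True:
        data = fetch({**params, **cont})
        # Assumed response shape: {"query": {"pages": {id: {"title": ...}}}}
        for page in data.get("query", {}).get("pages", {}).values():
            yield page["title"]
        if "continue" not in data:
            break
        # Send the continuation values back unchanged with the next request.
        cont = data["continue"]
```

The titles can then be filtered offline into the keep-list before the watchlist is purged.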

The act: go to https://commons.wikimedia.org/wiki/Special:EditWatchlist/clear and click the button. Watch nothing happen. Get following error message:

Request: POST http://commons.wikimedia.org/wiki/Special:EditWatchlist/clear, from 10.64.0.104 via cp1055 cp1055 ([10.64.32.107]:3128), Varnish XID 3575162694
Forwarded for: 84.250.106.149, 91.198.174.103, 208.80.154.134, 10.64.0.104
Error: 503, Service Unavailable at Thu, 05 Jun 2014 19:07:13 GMT

Version: 1.24rc
Severity: normal
See Also:
T41510: Opening Special:EditWatchlist with a large watchlist hits server timeout (Create watchlist pager)
T66074: Watchlist for bot too long so /raw triggers an error

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:18 AM
bzimport set Reference to bz66212.
bzimport added a subscriber: Unknown Object (MLST).

If you try it again, does it happen consistently?

A /clear is supposed to be more lightweight, but the problem may still be in the database server, because it needs to remove 41k+ rows from a large table in a single transaction. That said, a database server error should normally trigger the usual WSOD ("this site is experiencing technical difficulties") rather than a 503.

ab.zachaeus wrote:

(In reply to Jesús Martínez Novo (Ciencia Al Poder) from comment #1)

If you try it again, does it happen consistently?

A /clear is supposed to be more lightweight, but still, the problem may be
in the database server, because it needs to remove 41k+ rows of a large
table on a single transaction, although a database server error should
normally trigger the usual WSOD ("this site is experiencing technical
difficulties")

Yes, I've tried about five times now, about three or four hours apart.

ab.zachaeus wrote:

Would it be possible to make a .js action thingy (to use on personal .js files) for the /clear page that removes, say, the first 2500 lines of the list at a time, until the bug gets a proper fix?

How exactly does this differ from bug 39510?

ab.zachaeus wrote:

@Andre Klapper, #39510 is about editing the watchlist; this is about clearing it. Those are two different tools. (Three, actually, but it seems the original reporter of #39510 didn't specify which of the editing tools was failing for them, if not both.)

On a semi-related note, I'm on my way to do an experiment to see what thresholds I can get for edit tools (and possibly the clear tool) failing. I'll be back soon.

(In reply to UP from comment #3)

Would it be possible to make a .js action thingy (to use on personal .js
files) for the /clear page that removes, say, the first 2500 lines of the
list at a time, until the bug gets a proper fix?

You can use this script to unwatch items of your watchlist in batch:

https://www.mediawiki.org/wiki/User:Ciencia_Al_Poder/unwatchapi.js

Just preview that script in your personal JS. Progress is written to the browser's debug console.
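For reference, the same batching idea can be sketched outside the browser. This is not the linked script, only an illustration: it assumes the API's `action=watch` module accepts `unwatch` together with a pipe-separated `titles` list and a watch token, and that 50 titles per request is a safe batch size. `post` is a hypothetical callable that sends the authenticated POST.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def unwatch_in_batches(
    titles: Iterable[str],
    post: Callable[[Dict[str, str]], object],
    token: str,
    batch_size: int = 50,  # assumed per-request title limit
) -> int:
    """Unwatch titles a batch at a time; return the number of requests sent.

    `post` is a hypothetical helper that performs the authenticated POST.
    """
    requests_sent = 0
    for batch in chunked(list(titles), batch_size):
        post({
            "action": "watch",
            "unwatch": "1",
            "titles": "|".join(batch),
            "token": token,
            "format": "json",
        })
        requests_sent += 1
    return requests_sent
```

Keeping each request small means no single call has to touch thousands of rows, which is the point of the workaround.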

Sounds like we might need to break very large deletions into batches or something. Bah.
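A rough sketch of what batching could look like at the database level, using Python's built-in `sqlite3` purely for illustration. This is not MediaWiki's code: the column name `wl_user` mirrors the watchlist table, and the batch size of 1000 is an arbitrary assumption.

```python
import sqlite3


def clear_watchlist_in_batches(
    conn: sqlite3.Connection, user_id: int, batch_size: int = 1000
) -> int:
    """Delete one user's watchlist rows in small transactions.

    Returns the total number of rows removed. Each batch is committed
    separately so no single transaction covers tens of thousands of rows.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM watchlist WHERE rowid IN ("
            "SELECT rowid FROM watchlist WHERE wl_user = ? LIMIT ?)",
            (user_id, batch_size),
        )
        conn.commit()  # end the transaction for this batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

The trade-off is that the clear is no longer atomic: if it is interrupted, part of the watchlist has already been removed, and the operation has to be rerun to finish.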

Note that /clear *does not clear, but just responds with /raw*!

Jesús' workaround is one option; I have provided a pywikibot-based option in https://bugzilla.wikimedia.org/show_bug.cgi?id=64074 .

...or rather, that was what I remembered from when I looked at it back then. There seems to be a button now :-)

*** Bug 68559 has been marked as a duplicate of this bug. ***

ab.zachaeus wrote:

I've been whittling my watchlist down (originally 39K+). It just dropped below 36K, and raw editing works fine now. I assume the list could be emptied as well. Normal edit still fails.

Aklapper lowered the priority of this task from High to Medium. Apr 13 2015, 11:54 AM
Aklapper subscribed.

Lowering priority from high to normal to reflect reality (no progress here for months; feel free to increase once an assignee is set); workaround available in T68212#698926.

Would like to mention that somewhere between 33.5K and 35K entries, Special:EditWatchlist began to load and work, if very slowly. Selecting more than 10-20 entries causes increasing lag.

Lowering priority from high to normal to reflect reality (no progress here for months; feel free to increase once an assignee is set); workaround available in T68212#698926.

This only works for clearing the watchlist but does not allow selective editing, and needs to be combined with a manually reduced API raw output feed (to enter in the raw edit).

I'm wondering what kind of "selective editing" you expect to perform on a 39K list.

  • Removal of most non-existing pages, categories, and files.
  • Removal of large sets with a uniform file name pattern, when checking one confirms I'm not interested in watching any of them (for instance, a couple of dozen file names along the lines of Rosa spec. cultivar 'Sommerwind' at Plozhofener Rosegarden in Weldham 0222.JPG)
  • Removal of all files containing a certain string (Plozhofener Zoo)
  • Removal of all IP user talk and user pages
  • Removal of all RfD pages
  • Removal of most RDs
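Some of these removals are mechanical enough to script against an exported list. A minimal sketch, assuming entries are available as `(namespace, title)` pairs with the namespace prefix still in the title, and that namespaces 2 and 3 are User and User talk. The function names are made up for illustration.

```python
import ipaddress
from typing import Iterable, List, Tuple

USER_NAMESPACES = {2, 3}  # assumed: 2 = User, 3 = User talk


def is_ip_user_page(ns: int, title: str) -> bool:
    """True if the entry is a user or user talk page of an IP address."""
    if ns not in USER_NAMESPACES:
        return False
    # Drop the namespace prefix and any subpage part.
    name = title.split(":", 1)[-1].split("/", 1)[0]
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def entries_to_keep(
    entries: Iterable[Tuple[int, str]], banned_substrings: Iterable[str]
) -> List[Tuple[int, str]]:
    """Drop IP user pages and any title containing a banned substring."""
    banned = list(banned_substrings)
    return [
        (ns, title)
        for ns, title in entries
        if not is_ip_user_page(ns, title)
        and not any(s in title for s in banned)
    ]
```

The surviving titles can then be pasted into the raw editor once the list is small enough for it to load.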

Perhaps this is something we should use the job queue for. The downside is that it might sometimes take several hours for the jobs to be executed, but I think users can accept longer waits for heavy operations like this.

Change 277436 had a related patch set uploaded (by Addshore):
Use WatchedItemStore in SpecialEditWatchlist

https://gerrit.wikimedia.org/r/277436

Change 277436 merged by jenkins-bot:
[mediawiki/core@master] Use WatchedItemStore clearing in SpecialEditWatchlist

https://gerrit.wikimedia.org/r/277436

With the above patch merged and T132564 closed, watchlist clearing via the job queue should be enabled with the next train.

I'm marking this as closed: the patches have been deployed, so clearing large watchlists should now be possible.