
Timeout when updating complex pages
Open, HighPublic

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1399/
Reported by: malafaya
Created on: 2012-01-17 00:22:50
Subject: Updating complex pages
Original description:
When updating complex pages, it's common to get a timeout, because the Wikimedia server does not process and return the page within the expected time. In such cases (when a timeout exception is thrown), my suggestion is that pywikipedia should try to fetch the page again and check whether there are any differences against the new page to be saved. If not, it should proceed and not block indefinitely on such pages.
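
A minimal sketch of that suggestion, assuming the modern Pywikibot core API; save_with_check is a hypothetical helper, not an existing function:

import pywikibot
from pywikibot.exceptions import ServerError

def save_with_check(page, new_text, summary):
    """Save once; on a server-side error, verify before retrying."""
    page.text = new_text
    try:
        page.save(summary=summary)
    except ServerError:
        # The server may have applied the edit before timing out,
        # so re-fetch the live text and compare it to what we sent.
        if page.get(force=True) == new_text:
            return  # the edit already went through; nothing to retry
        raise  # a genuine failure; let the caller decide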


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1399

Details

Reference
bz55219

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:27 AM
bzimport set Reference to bz55219.
bzimport added a subscriber: Unknown Object (????).

This is the way the bot works. It tries to put the page several times; the number of attempts is given by maxretries in the (user_)config.py. Edit conflicts are detected (by the MediaWiki API) unless you are using your bot account for multiple edits on the same page at the same time.
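
For reference, a sketch of the relevant entries in a user-config.py, using Pywikibot core's names (the old compat branch called the setting maxretries; the account name here is a placeholder):

family = 'wiktionary'
mylang = 'pt'
usernames['wiktionary']['pt'] = 'ExampleBot'  # placeholder account

max_retries = 5  # how many times a failed API request is retried
retry_wait = 5   # seconds to wait before the first retry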

Hmmm, I'm not sure you understood. I'm not updating the page more than once simultaneously. It's just one bot run. As the page is a complicated one, the server does not respond in time (you can try [[Europa]] at pt.wiktionary). The bot then tries again, but obviously the same thing happens. The difference is that the page has already been updated in the first try, even though the server has not responded. In operations such as replace.py, where it's common to edit long pages, you get into a long loop.

I'm talking about this error:

Updating page [[Sri Lanka]] via API
HTTPError: 504 Gateway Time-out

The page to be updated is quite big so the server does not reply on time.
1) Is there a way to increase the timeout? I believe this is controlled by the server, not the HTTP client...
2) The page was updated on the first try, but as the page is not refreshed between retries, the bot doesn't know and will try to update it "forever"

  • Bug 56884 has been marked as a duplicate of this bug.

Checking this morning with Faebot, 1.6% of get/put transactions failed out of a sample of more than 1,000. These were small category changes rather than file uploads or large page edits. I believe most failures have been on putting pages rather than getting them, though I have seen page gets fail this way too.

As everyone appears affected, not just API users, I have asked for feedback at the Village pump (http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&diff=prev&oldid=109634734).

I am not convinced that this is a pywikipediabot-specific problem; it does not relate to any changes in pywikipediabot, which has never before had this problem with this frequency, so the bug report (1399) above may well be a dead end.

503 is also happening:

Sleeping for 7.9 seconds, 2013-11-13 11:20:55

Updating page [[File:Русский энциклопедический словарь Березина 4.2 077.jpg]] via API

Result: 503 Service Unavailable

Traceback (most recent call last):
(hidden)

File "(hidden)/pywikipedia/wikipedia.py", line 2242, in put
  sysop=sysop, botflag=botflag, maxTries=maxTries)
File "(hidden)/pywikipedia/wikipedia.py", line 2339, in _putPage
  back_response=True)
File "(hidden)/pywikipedia/pywikibot/support.py", line 121, in wrapper
  return method(*__args, **__kw)
File "(hidden)/pywikipedia/query.py", line 138, in GetData
  site.cookies(sysop=sysop))
File "(hidden)/pywikipedia/wikipedia.py", line 6977, in postForm
  cookies=cookies)
File "(hidden)/pywikipedia/wikipedia.py", line 7021, in postData
  f = MyURLopener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
  response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
  'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
  return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
  result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
  raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

urllib2.HTTPError: HTTP Error 503: Service Unavailable

humayunmirza88 wrote:

The problem is still ongoing

warnckew wrote:

I'd like to second this. When saving large complex pages, I frequently get 503 responses. As Daniel Schwen notes in bug 56884, it would be great to be able to tell Pywikibot to _not_ retry and instead manually check if the edit went through.

warnckew wrote:

I patched my local copy of Pywikibot core, adding a max_retries parameter to editpage() to only allow it to attempt an edit once. No changes to other files appear necessary since Page.save() passes on any additional parameters. Should I propose that as a patch? If so, what format is preferred?
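
A hedged sketch of how that local patch would be used; the max_retries keyword on Page.save() is warnckew's proposed addition, not a released Pywikibot API, and the page title is a placeholder:

import pywikibot

site = pywikibot.Site('commons', 'commons')
page = pywikibot.Page(site, 'File:Example.jpg')  # placeholder title
page.text += '\n[[Category:Example]]'
# With the patch, the keyword passes through save() to editpage():
page.save(summary='Add category', max_retries=1)  # attempt once only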

If you could upload it to gerrit (either via git directly, or via the patch uploader at https://tools.wmflabs.org/gerrit-patch-uploader/ ), that would be really nice.

I'm a bit confused, however, as data.api.Request seems to get max_retries from the config file. Does it get passed another value of max_retries somewhere? I can't find where that would be...

warnckew wrote:

data.api.Request does kwargs.pop(), so if it gets instantiated with a max_retries parameter it will use that value, otherwise it reads the config parameter.
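
That is, roughly this pattern (an illustrative stub, not the actual data.api.Request code):

from pywikibot import config

class Request(object):
    def __init__(self, **kwargs):
        # An explicit keyword wins; otherwise fall back to the config value.
        self.max_retries = kwargs.pop('max_retries', config.max_retries)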

In my case I found that I can just set pywikibot.config.max_retries instead of passing it as a parameter to Page.save(). Arguably nicer than passing a parameter around, which requires some way of handling a default value. Sorry about not figuring that out earlier.
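
In other words, something like this short sketch (the site and page are placeholders):

import pywikibot

# Override the config default at runtime instead of threading a
# parameter through Page.save():
pywikibot.config.max_retries = 1  # attempt each API request once

site = pywikibot.Site('wiktionary', 'pt')
page = pywikibot.Page(site, 'Europa')
page.text = page.get() + '\n'  # some edit
page.save(summary='test edit')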

I'm still a bit confused by Daniel's comment:

Now pywikipediabot tries again by itself an apparently infinite amount of times
Despite having set max_retries to 2 in my user-config.py

but this does seem to work for me (at least: setting max_retries in user-config.py sets pywikibot.config.max_retries). Strange.

daniel wrote:

Ahhrgh! I changed the max_retries setting in ./user-config.py but core reads ~/.pywikibot/user-config.py

Sorry. Will try again with the new setting.

On the wikimedia side see also bug 57026. (Not a dupe since Pywikipedia should also handle these situations gracefully.)

  • Bug 55162 has been marked as a duplicate of this bug.
  • Bug 55179 has been marked as a duplicate of this bug.