
Write attempts to wiki API from toolserver timeout
Closed, ResolvedPublic

Description

Author: dr.trigon

Hello everybody!

Since 7 June (maybe the 6th) my bot has not been able to write pages larger
than ~130 KB to dewiki. The bot code has not changed, but it now raises
MaxTriesExceeded errors with <urlopen error timed out> every time. Before
that it could write pages of more than 600 KB. It looks like the uplink has
slowed down (a bottleneck?) or the timeout was reduced... I am using the
pywikipedia framework and know of at least one other user with the same
issue. Any idea what the problem might be here?

Thanks a lot and greetings!
DrTrigon

ps: This was originally reported at http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005013.html
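
A quick way to check whether the symptom is in pywikipedia or in the plain HTTP path is to POST progressively larger bodies to the API directly. The following is a minimal sketch (Python 2 urllib2, as pywikipedia used at the time; the 'padding' field is a made-up parameter that only inflates the request body, and the sizes bracket the ~130 KB threshold reported above):

/* ----------------------

# Minimal sketch: POST bodies of increasing size to the API and see at which
# size the request starts timing out, independent of the bot code.
import socket
import urllib
import urllib2

socket.setdefaulttimeout(30)  # fail fast instead of hanging indefinitely

for kb in (64, 128, 256, 512):
    # 'padding' is a hypothetical field used only to inflate the POST body
    body = urllib.urlencode({'action': 'help', 'padding': 'x' * kb * 1024})
    try:
        urllib2.urlopen('http://de.wikipedia.org/w/api.php', body).read(100)
        print '%4d KB: ok' % kb
    except IOError, e:  # urllib2 wraps socket timeouts in URLError
        print '%4d KB: %s' % (kb, e)

---------------------- */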


Version: wmf-deployment
Severity: major
OS: Solaris

Details

Reference
bz37536

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 12:30 AM
bzimport set Reference to bz37536.
bzimport added a subscriber: Unknown Object (MLST).

A simple test script, based on http://nl.wikipedia.org/wiki/Lijst_van_alle_Radio_2_Top_2000's (1,189,202 bytes). The runs below were done from willow.toolserver.org.

/* ----------------------

import wikipedia
import datetime

p_get = wikipedia.Page('nl', "Lijst_van_alle_Radio_2_Top_2000's")
p_put = wikipedia.Page('nl', 'Gebruiker:Valhallasw/lange pagina')
text = p_get.get()
print len(text)
text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
p_put.put(text)

---------------------- */

Under IPv6 (default), the output is the following:

/* --------------------
(...snip...)

print len(text)

1189202

text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
p_put.put(text)

Sleeping for 3.8 seconds, 2012-06-13 21:50:10
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
<urlopen error timed out>

WARNING: Could not open 'http://nl.wikipedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...

-------------------- */

Under IPv4 (with the patch shown below), the output is the following:

/* --------------------
(...snip...)

print len(text)

1189202

text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
p_put.put(text)

Sleeping for 4.0 seconds, 2012-06-13 21:48:27
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
(302, 'OK', {u'pageid': 2846006, u'title': u'Gebruiker:Valhallasw/lange pagina', u'newtimestamp': u'2012-06-13T21:49:21Z', u'result': u'Success', u'oldrevid': 31455180, u'newrevid': 31455194})

-------------------- */

The hack to test this is the following:

Index: families/wikipedia_family.py
===================================================================
--- families/wikipedia_family.py	(revision 10117)
+++ families/wikipedia_family.py	(working copy)
@@ -44,7 +44,7 @@
         if family.config.SSL_connection:
             self.langs = dict([(lang, None) for lang in self.languages_by_size])
         else:
-            self.langs = dict([(lang, '%s.wikipedia.org' % lang) for lang in self.languages_by_size])
+            self.langs = dict([(lang, '91.198.174.225') for lang in self.languages_by_size])

         # Override defaults
         self.namespaces[1]['ja'] = [u'ノート', u'トーク']

Index: wikipedia.py
===================================================================
--- wikipedia.py	(revision 10117)
+++ wikipedia.py	(working copy)
@@ -5437,6 +5437,7 @@
             'User-agent': useragent,
             'Content-Length': str(len(data)),
             'Content-type':contentType,
+            'Host': 'nl.wikipedia.org',
         }
         if cookies:
             headers['Cookie'] = cookies

Index: pywikibot/comms/http.py
===================================================================
--- pywikibot/comms/http.py	(revision 10117)
+++ pywikibot/comms/http.py	(working copy)
@@ -54,6 +54,7 @@
     headers = {
         'User-agent': useragent,
+        'Host': 'nl.wikipedia.org',
         #'Accept-Language': config.mylang,
         #'Accept-Charset': config.textfile_encoding,
         #'Keep-Alive': '115',
Note, however, that this could also be a bug in the Python HTTP stack...
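
Taken together, the three patches amount to one trick: dial the literal IPv4 address so the resolver cannot hand back an IPv6 address, and keep sending the real hostname in the Host header so MediaWiki's virtual hosting still routes the request. A standalone sketch of the same trick, assuming Python 2 urllib2 (the IP and hostname are the ones from the patches above; this is not the original test code):

/* ----------------------

# Force IPv4 by dialing the literal address (91.198.174.225, from the patch),
# while the Host header makes MediaWiki's virtual hosting route the request
# to nl.wikipedia.org. Sketch only, not the original test code.
import urllib2

req = urllib2.Request('http://91.198.174.225/w/api.php'
                      '?action=query&meta=siteinfo&format=json')
req.add_header('Host', 'nl.wikipedia.org')  # as in the wikipedia.py patch
req.add_header('User-agent', 'mtu-test/0.1')
print urllib2.urlopen(req, timeout=30).read()[:200]

---------------------- */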

These issues sound a lot like MTU (network) problems. I've just made some (manual) changes on the servers involved. Could you please check if the situation is different now?
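
An IPv6 path-MTU blackhole would fit the pattern above: small requests fit in a single packet and succeed, while large POSTs need fragmentation and get silently dropped. One rough way to probe for this, sketched here with assumed GNU/Linux ping flags ('-M do' sets the don't-fragment bit; Solaris ping, as on willow, uses different options):

/* ----------------------

# Probe for a path-MTU blackhole: send don't-fragment pings of increasing
# payload size; sizes that come back as dropped suggest an MTU limit on the
# path. Assumes GNU/Linux iputils ping.
import os
import subprocess

devnull = open(os.devnull, 'w')
for size in (1200, 1400, 1452, 1472):
    rc = subprocess.call(['ping', '-c', '1', '-M', 'do', '-s', str(size),
                          'nl.wikipedia.org'], stdout=devnull, stderr=devnull)
    print '%d byte payload: %s' % (size,
                                   'ok' if rc == 0 else 'dropped (possible MTU limit)')

---------------------- */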

Yes, this has improved the situation. The behaviour over IPv4 and IPv6 is now comparable:

(IPv6)
Time to get page: 1.133143 s
Time to put page: 63.166557 s

(IPv4)
Time to get page: 1.369060 s
Time to put page: 57.909367 s

Although the transfer rate (1.1 MB in 60 s ≈ 19 kB/s) is not very spectacular, at least it is consistent for the two, and there is no timeout.
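
For reproducibility, the get/put timings above can be obtained with a small harness around the same pywikipedia calls as the test script earlier in this report; the following is a hypothetical reconstruction, not the original measurement code:

/* ----------------------

# Hypothetical reconstruction of the timing harness behind the numbers above,
# built from the same pywikipedia calls as the earlier test script.
import time
import wikipedia

p_get = wikipedia.Page('nl', "Lijst_van_alle_Radio_2_Top_2000's")
p_put = wikipedia.Page('nl', 'Gebruiker:Valhallasw/lange pagina')

t0 = time.time()
text = p_get.get()
print 'Time to get page: %f s' % (time.time() - t0)

t0 = time.time()
p_put.put(text)
print 'Time to put page: %f s' % (time.time() - t0)

---------------------- */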

dr.trigon wrote:

(In reply to comment #2)

> These issues sound a lot like MTU (network) problems. I've just made some
> (manual) changes on the servers involved. Could you please check if the
> situation is different now?

This seems to solve the get issues; my bot has not complained since the morning of the 15th! Thanks so far! (I did not check what maximum page size is the limit now.)

dr.trigon wrote:

(In reply to comment #3)

> Yes, this has improved the situation. The behaviour over IPv4 and IPv6 is now
> comparable:
>
> (IPv6)
> Time to get page: 1.133143 s
> Time to put page: 63.166557 s
>
> (IPv4)
> Time to get page: 1.369060 s
> Time to put page: 57.909367 s
>
> Although the transfer rate (1.1 MB in 60 s ≈ 19 kB/s) is not very spectacular,
> at least it is consistent for the two, and there is no timeout.

Indeed, it looks good (or at least better ;) now!

Could it be that the get times increased too, e.g. compared to May?