Page MenuHomePhabricator

core hangs on killed http(s) connection
Closed, ResolvedPublic

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1657/
Reported by: valhallasw
Created on: 2013-08-22 15:14:10.237000
Subject: core hangs on killed http(s) connection
Original description:
Steps to reproduce:

p = pywikibot.Page(pywikibot.getSite('nl', 'wikipedia'), 'Wikipedia')
p.get()

now kill the connection, e.g. using tcpview in windows

p = pywikibot.Page(pywikibot.getSite('nl', 'wikipedia'), 'Wikipedia')
p.get()

hangs

Expected result:
pywikibot reconnects and executes the .get()

Actual result:
complete hang (deadlock?)


Version: core-(2.0)
Severity: major
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1657

Details

Reference
bz55127

Related Objects

StatusSubtypeAssignedTask
ResolvedNone
ResolvedNone

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 2:21 AM
bzimport set Reference to bz55127.
bzimport added a subscriber: Unknown Object (????).

Stack trace during hang:

  • STACKTRACE - START ***

ThreadID: 140208457426688

File: "/usr/lib/python2.7/threading.py", line 524, in __bootstrap

self.__bootstrap_inner()

File: "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner

self.run()

File: "connectionkilltest.py", line 16, in run

for filename, lineno, name, line in traceback.extract_stack(stack):

ThreadID: 140208472311552

File: "/usr/lib/python2.7/threading.py", line 524, in __bootstrap

self.__bootstrap_inner()

File: "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner

self.run()

File: "pywikibot/comms/threadedhttp.py", line 343, in run

item.data = self.http.request(*item.args, **item.kwargs)

File: "pywikibot/comms/threadedhttp.py", line 219, in request

max_redirects, connection_type

File: "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1444, in request

(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)

File: "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1196, in _request

(response, content) = self._conn_request(conn, request_uri, method, body, headers)

File: "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1166, in _conn_request

response = conn.getresponse()

File: "/usr/lib/python2.7/httplib.py", line 1030, in getresponse

response.begin()

File: "/usr/lib/python2.7/httplib.py", line 407, in begin

version, status, reason = self._read_status()

File: "/usr/lib/python2.7/httplib.py", line 365, in _read_status

line = self.fp.readline()

File: "/usr/lib/python2.7/socket.py", line 430, in readline

data = recv(1)

File: "/usr/lib/python2.7/ssl.py", line 241, in recv

return self.read(buflen)

File: "/usr/lib/python2.7/ssl.py", line 160, in read

return self._sslobj.read(len)

ThreadID: 140208510514944

File: "pwb.py", line 103, in <module>

run_python_file(fn, args)

File: "pwb.py", line 49, in run_python_file

exec compile(source, filename, "exec") in main_mod.__dict__

File: "connectionkilltest.py", line 36, in <module>

p.put_async('test edit')

File: "pywikibot/page.py", line 887, in put_async

async=True, callback=callback, **kwargs)

File: "pywikibot/page.py", line 872, in put

async=async, callback=callback, **kwargs)

File: "pywikibot/page.py", line 782, in save

if not force and not self.botMayEdit():

File: "pywikibot/page.py", line 710, in botMayEdit

username = self.site.user()

File: "pywikibot/site.py", line 206, in user

if self.logged_in(True):

File: "pywikibot/site.py", line 842, in logged_in

if sysop and 'sysop' not in self.userinfo['groups']:

File: "pywikibot/site.py", line 910, in getuserinfo

uidata = uirequest.submit()

File: "pywikibot/data/api.py", line 284, in submit

body=paramstring)

File: "pywikibot/comms/http.py", line 118, in request

request.lock.acquire()

File: "/usr/lib/python2.7/threading.py", line 323, in acquire

self.__cond.wait()

File: "/usr/lib/python2.7/threading.py", line 243, in wait

waiter.acquire()
  • STACKTRACE - END ***

So: probably not a deadlock, just the ssl library hanging.

Works without any issues in windows - could be either related to linux (v. windows) or killing the connection locally or upstream (the original was run from a VM). Reducing priority.

  • Group: confirmed --> rewrite

I do not know whether this is similar this bug reported above and am unable to investigate the reason of that bug. I only found the last output where the bot checks a wiki page every 15 seconds:

defendant: Textkontrolle accuser: Neozoon time: 2013 Okt 20 21:26

05:20:37:

defendant: Textkontrolle accuser: Neozoon time: 2013 Okt 20 21:26

05:20:54:

...
a lot of lines deleted
...

05:42:19:
05:42:33:

...

05:53:41:
05:53:55:
05:54:10:
05:54:24:
05:54:38:
05:54:53:
05:55:07:
05:55:21:

now it stops, CTL-C does not work, I have to kill the process via task manager.
I checked my router and found it had an autoreset of the internet connection 5 minutes before 6 a.m. this morning and I guess this was also yesterday. On the other hand the same bot worked well the three days before.

Additional comment:
pwb.py version

Pywikibot: pywikibot/__init__.py (r-1 (unknown), ???????, 2013/09/21, 15:53:03,
OUTDATED)
Release version: 2.0b1
Python: 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
unicode test: ok

Yes my bug depends on a killed internet connection while my router was off for 20 seconds!

Here are the last log entries before the bot hanged:

2013-11-20 05:50:37 api.py, 685 in iter: DEBUG PropertyGenerator received [u'7165976']; limit=-1
2013-11-20 05:50:51 vm-auto-erl-neu.py, 79 in output: INFO >> 05:50:51:
2013-11-20 05:50:51 threadedhttp.py, 97 in pop_connection: DEBUG Retrieved connection from 'https:de.wikipedia.org' pool.
2013-11-20 05:50:51 threadedhttp.py, 215 in request: DEBUG ('https://de.wikipedia.org/w/api.php', 'POST', 'maxlag=5000&format=json&bkprop=user%7Cby%7Ctimestamp%7Cexpiry%7Creason&list=blocks&bkusers=&meta=userinfo&action=query&bkend=20131120005959&uiprop=blockinfo%7Chasmsg', {'cookie': 'centralauth_Token=8c1df3ec61f7a92677f4c0b561c30585; centralauth_User=Xqbot; forceHTTPS=1; forceHTTPS=true; dewikiUserName=Xqbot; dewikiSession=3144518b40881659814792e3c0fc34cc; dewikiUserID=627628', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': 'pwb/r-1 (unknown) Pywikipediabot/2.0'}, 5, None)
2013-11-20 05:50:51 api.py, 308 in submit: DEBUG API response received:
{"query":{"blocks":[],"userinfo":{"id":627628,"name":"Xqbot"}}}
2013-11-20 05:50:51 api.py, 611 in update_limit: DEBUG PropertyGenerator: Set query_limit to 5000.
2013-11-20 05:50:51 threadedhttp.py, 97 in pop_connection: DEBUG Retrieved connection from 'https:de.wikipedia.org' pool.
2013-11-20 05:50:51 threadedhttp.py, 215 in request: DEBUG ('https://de.wikipedia.org/w/api.php', 'POST', 'maxlag=5000&format=json&rvprop=ids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Ccontent&prop=info%7Crevisions&titles=Wikipedia%3AVandalismusmeldung&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo%7Chasmsg', {'cookie': 'centralauth_Token=8c1df3ec61f7a92677f4c0b561c30585; centralauth_User=Xqbot; forceHTTPS=1; forceHTTPS=true; dewikiUserName=Xqbot; dewikiSession=3144518b40881659814792e3c0fc34cc; dewikiUserID=627628', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': 'pwb/r-1 (unknown) Pywikipediabot/2.0'}, 5, None)

Change 108325 had a related patch set uploaded by Ladsgroup:
[BUGFIX] fixing bug 55127

https://gerrit.wikimedia.org/r/108325

Minimal test case for the SSL issue:

import httplib2

h = httplib2.Http()
resp, content = h.request("https://en.wikipedia.org/", "GET")
raw_input("kill the connection...")
resp, content = h.request("https://en.wikipedia.org/", "GET")
print "done"

So there are three issues:

  • A connection can hang indefinitely if the connection is killed. There should be some timeout for receiving some data here.
  • The framework does not respond to ctrl-c when a thread doesn't return. This can be solved by changing pywikibot/comms/http.py to be non-blocking (e.g. in a loop with a 100ms time.sleep)
  • Threads do not receive KeyboardInterrupts (which is OK!) but this means a non-returning call will hang forever (i.e. will not be killed when python exits)

Amirs solution might be useful for the last issue, but I'm not 100% sure about that. I think I'd rather hook the atexit handler to kill all processing threads. Maybe we can also start threads as daemons, which should kill them automatically when the main thread exits...

I tried this on _windows_ but I didn't got it hanging:

Traceback (most recent call last):

File "<pyshell#4>", line 1, in <module>
  resp, content = h.request("https://en.wikipedia.org/", "GET")
File "C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\__init__.py", line 1591, in request
  (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\__init__.py", line 1333, in _request
  (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "C:\Pywikipedia\ssh\pywikibot\core\externals\httplib2\python2\httplib2\__init__.py", line 1289, in _conn_request
  response = conn.getresponse()
File "C:\Python27\lib\httplib.py", line 1030, in getresponse
  response.begin()
File "C:\Python27\lib\httplib.py", line 407, in begin
  version, status, reason = self._read_status()
File "C:\Python27\lib\httplib.py", line 365, in _read_status
  line = self.fp.readline()
File "C:\Python27\lib\socket.py", line 430, in readline
  data = recv(1)
File "C:\Python27\lib\ssl.py", line 241, in recv
  return self.read(buflen)
File "C:\Python27\lib\ssl.py", line 160, in read
  return self._sslobj.read(len)

error: [Errno 10060] Ein Verbindungsversuch ist fehlgeschlagen, da die Gegenstelle nach einer bestimmten Zeitspanne nicht richtig reagiert hat, oder die hergestellte Verbindung war fehlerhaft, da der verbundene Host nicht reagiert hat

Change 110853 had a related patch set uploaded by Merlijn van Deen:
(bug 55127) Implement network timeouts

https://gerrit.wikimedia.org/r/110853

This solves the basic issue, but does not fix the other two issues (the main thread will not respond to ctrl-c until the timeout is reached; also the thread will not be stopped until the timeout is raised)

Change 110866 had a related patch set uploaded by Merlijn van Deen:
(bug 55127) Allow user to cut off pending network requests at shutdown

https://gerrit.wikimedia.org/r/110866

Change 110867 had a related patch set uploaded by Merlijn van Deen:
(bug 55127) Allow user to abort during http request

https://gerrit.wikimedia.org/r/110867

And with the latest two patches, this should also be fixed.

Change 110853 merged by jenkins-bot:
(bug 55127) Implement network timeouts

https://gerrit.wikimedia.org/r/110853

Change 110866 merged by jenkins-bot:
(bug 55127) Allow user to cut off pending network requests at shutdown

https://gerrit.wikimedia.org/r/110866

Change 110867 merged by jenkins-bot:
(bug 55127) Allow user to abort during http request

https://gerrit.wikimedia.org/r/110867