en.wikipedia.org load takes 12+ seconds in specific Source IP range, as one specific call to bits.wikimedia.org/.../load.php times out
Closed, Resolved (Public)

Details

Reference
bz42653

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:13 AM
bzimport set Reference to bz42653.
bzimport added a subscriber: Unknown Object (MLST).

Works for me - is this still an issue, and on which continent are you based? There were some load issues but they are mostly fixed now.

bugzilla.wikimedia.org.76374 wrote:

Works from 81.169.0.0/16, but *NOT* from 78.52.0.0/16 (Europe/Berlin/dynamic IP for DSL endpoint)

(In reply to comment #2)

Works from 81.169.0.0/16, but *NOT* from 78.52.0.0/16 (Europe/Berlin/dynamic IP for DSL endpoint)

Is the routing to the bits servers ok from both ip address ranges?

I'm wondering if this is a dup of bug 41130?

bugzilla.wikimedia.org.76374 wrote:

This appears to be a *script bug* in http://bits.wikimedia.org/en.wikipedia.org/load.php because the server itself shows no problems at all.

As we've changed data centers in late January and the related setup, could you please test if this is still a problem?

bugzilla.wikimedia.org.76374 wrote:

Yes, it is still a problem. See the screenshot.

bugzilla.wikimedia.org.76374 wrote:

screenshot of timeout in firebug

Attached:

wiki.en.test.png (986×1 px, 240 KB)

Just realizing that this question still stands:

Is the routing to the bits servers ok from both ip address ranges?

bugzilla.wikimedia.org.76374 wrote:

Look at the screenshot. Just one out of multiple calls to the same script fails. So the routing is definitely ok.
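
The observation above (only one of several calls to the same script times out) can be checked against a saved network log rather than a screenshot. The following is a minimal sketch, assuming the Firebug Net panel data has been exported in HAR format; the function name `slow_requests`, the threshold, and the sample entries (including their query strings) are hypothetical and only illustrate the idea.

```python
def slow_requests(har, url_substring, threshold_ms):
    """Return (url, time_ms) for matching HAR entries at or above threshold_ms."""
    hits = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        # In HAR, "time" is the total elapsed time of the request in milliseconds.
        if url_substring in url and entry["time"] >= threshold_ms:
            hits.append((url, entry["time"]))
    return hits


# Hypothetical sample data, not taken from the reporter's screenshot.
BASE = "http://bits.wikimedia.org/en.wikipedia.org/load.php"
sample = {
    "log": {
        "entries": [
            {"request": {"url": BASE + "?modules=a"}, "time": 180},
            {"request": {"url": BASE + "?modules=b"}, "time": 12000},
            {"request": {"url": "http://en.wikipedia.org/wiki/Main_Page"}, "time": 300},
        ]
    }
}

slow = slow_requests(sample, "load.php", 10000)
```

If only one `load.php` entry shows up in the result while its siblings on the same host finish quickly, that points away from a general routing failure, though it does not rule out a problem tied to one address family or one cache server.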

bugzilla.wikimedia.org.76374 wrote:

Andre: we have already determined that it is a problem related to the source ip range.

Bug 39322 came to my mind, so I'd still appreciate the output of the following commands:

traceroute bits.wikimedia.org

curl -i http://bits.wikimedia.org/

This seems to be something like my ongoing Wikipedia/Wikimedia problem. In the last 12 hours most, but not all, queries to http://bits.wikimedia.org from my home IP have failed. However, https://bits.wikimedia.org seems to work normally. I also asked on the fiwiki community portal whether Wikipedia had been slow, and the answer was that it had been working normally.

Commands asked:

  • CLIP ----

curl -i http://bits.wikimedia.org/
curl: (7) Failed to connect to 2620:0:862:ed1a::a: Network is unreachable

  • CLIP ----

curl -i https://bits.wikimedia.org/
HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Wed, 08 May 2013 05:45:54 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Last-Modified: Thu, 12 Aug 2010 16:12:20 GMT
ETag: "b2-48da2a1772100"
X-Varnish: 693529009
Via: 1.1 varnish
Accept-Ranges: bytes
X-Varnish: 1488468153
Age: 0
Via: 1.1 varnish
X-Cache: strontium miss (0), cp3021 miss (0)

<html>
<head><title>bits and pieces</title>

		<meta http-equiv="refresh" content="1;url=http://www.wikimedia.org/" />

</head>
<body>
bits and pieces live here!
</body>
</html>

  • CLIP ----

curl -i http://bits.wikimedia.org/geoiplookup
HTTP/1.1 200 geoiplookup
Server: Varnish
Content-Type: text/javascript
Content-Length: 111
Last-Modified: Wed, 08 May 2013 05:58:57 GMT
Cache-Control: private, max-age=86400, s-maxage=0
Accept-Ranges: bytes
Date: Wed, 08 May 2013 05:58:57 GMT
X-Varnish: 2987799549
Age: 0
Via: 1.1 varnish
Connection: close
X-Cache: cp3022 miss (0)

Geo = {"city":"REMOVED","country":"FI","lat":"REMOVED","lon":"REMOVED","IP":"91.153.84.229","netmask":"24"}

  • CLIP ----

traceroute bits.wikimedia.org
traceroute to bits.wikimedia.org (91.198.174.233), 30 hops max, 60 byte packets
1 kotiboksi.Elisa (192.168.100.1) 0.662 ms 0.861 ms 1.079 ms
2 tkueur1.fi.elisa.net (91.153.48.1) 14.876 ms 15.553 ms 15.781 ms
3 ge1-0-0.tkubra-p1.fi.elisa.net (139.97.9.13) 13.879 ms 15.328 ms 16.106 ms
4 so7-2-0.esptnl-p1.fi.elisa.net (139.97.6.1) 19.842 ms 21.924 ms 22.199 ms
5 ip-rr.hkika234.fi.elisa.net (139.97.0.162) 22.454 ms 23.130 ms 23.885 ms
6 213.192.184.109 (213.192.184.109) 25.150 ms 17.379 ms 18.065 ms
7 as0-0.bbr1.ams1.nl.eunetip.net (213.192.191.226) 54.601 ms 50.913 ms 47.484 ms
8 * * *
9 * * *
10 * * *
....

Discussion in community portal:
http://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_%28tekniikka%29#Wikipedian_hidastelu

A little bit more: in my case it may just be a DNS update problem. From my home IP, bits.wikimedia.org resolves to 91.198.174.233, 2620:0:862:ed1a::a, bits.esams.wikimedia.org.

From .kapsi.fi, where everything is working fine, bits.wikimedia.org resolves to 2620:0:861:ed1a::a, 208.80.154.234, bits-lb.eqiad.wikimedia.org
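
To compare the two vantage points it can help to separate the resolved addresses by address family, since the failing curl call above tried the IPv6 address. The following is a minimal sketch using only the Python standard library; `group_by_family` and `resolve` are hypothetical helper names, and `resolve` needs network access, so it is defined but not called here.

```python
import ipaddress
import socket


def group_by_family(addresses):
    """Split address strings into IPv4 and IPv6 lists, preserving order."""
    groups = {"IPv4": [], "IPv6": []}
    for text in addresses:
        ip = ipaddress.ip_address(text)
        groups["IPv6" if ip.version == 6 else "IPv4"].append(text)
    return groups


def resolve(host, port=80):
    """Return the unique addresses the local resolver gives for host (needs network)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    seen = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


# Addresses quoted in the comment above, one list per vantage point.
home = group_by_family(["91.198.174.233", "2620:0:862:ed1a::a"])
kapsi = group_by_family(["2620:0:861:ed1a::a", "208.80.154.234"])
```

Running `group_by_family(resolve("bits.wikimedia.org"))` from each network would give the same breakdown for live data; connecting to each family separately then shows whether only one of them is unreachable.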

Yeah, bits was having issues earlier today. https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&oldid=69483 and http://status.wikimedia.org/8777/156486/Static-assets-(CSS/JS) seem to confirm this. I think separate issues are getting mixed in to a single bug, though.

Correct, comment 14 - comment 17 are a different issue.

Every time I look (e.g. on webpagetest.org), I find one or two "failing" load.php calls, however I test it. Usually it's CentralNotice taking too long; I don't see what can be done in this component.

(In reply to Macavity from comment #19)

This happens reliably on the IP range my ISP uses (looks like 86.141.0.0/16), when jqueryMsg is in the query string:

This times out:

Works for me as I get a cached response /* cache key: enwiki:resourceloader:filter:minify-js:7:519abda9a1640250a7c561ccfd0f22f8 */
I don't see ongoing CentralNotice campaigns for UK.

We no longer use bits to serve this content, so this can probably be closed now?

What makes you think the issue was caused by the fact of being served from that domain?

Aklapper changed the task status from Open to Stalled. Jun 30 2015, 2:48 PM
Aklapper lowered the priority of this task from Medium to Low.

Does this problem still happen nowadays?

Who are you asking? The reporter is one of those made into ghosts by the bugzilla murder.

@Nemo_bis: I'm asking anybody in the CC list as someone might have faced the same problem in that IP range.
If I had asked anybody specifically, I would have prefixed a name. :)

Krinkle claimed this task.
Krinkle edited projects, added Performance-Team; removed Performance Issue.
Krinkle subscribed.

Closing for now as most likely the original cause can no longer cause a page to take 12 seconds to load given the following major changes that have happened since then.

Every time I look (e.g. on webpagetest.org), I find one or two "failing" load.php calls, however I test it. Usually it's CentralNotice taking too long

From looking at the original description, I tend to think it was indeed a CentralNotice issue. On the other hand, the bannercontroller code has been refactored in the meantime.