
Purging does not work on deployment-prep / beta labs
Closed, ResolvedPublic

Description

I've enabled ULS on all deployment-prep projects, but ?action=purge and anything else I can think of is ineffective at getting the ULS trigger to show up.

Tried on:


Version: unspecified
Severity: major

Details

Reference
bz48203

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:17 AM
bzimport set Reference to bz48203.

Just setting priority (to High) as this is a pretty important one for ULS testing.

I do not really have any time right now to look at it. If someone wants to investigate:

  • text cache is a Squid on deployment-squid instance
  • logs are in /var/log/squid (need to be a project sysadmin to look at them)
  • MediaWiki logs are written to an NFS server. They can be read by connecting to deployment-bastion and looking under /data/project/logs.

From there one could look at the debug log or use eval.php via: mwscript eval.php --wiki=enwiki

The SquidPurgeClient class has a log() method which uses the 'squid' logging group. Might want to enable logging for that debug group on beta.

Related URL: https://gerrit.wikimedia.org/r/66073 (Gerrit Change I25e7e77c8b3d3e5dbf8ce4bc9f6bd8ca8aa22d1c)

This issue seems to be rapidly gaining attention after having had high priority and no action for almost a month. WMF QA and LangEng are involved.

Tried again tonight:

$ mwscript purgeList.php --wiki=enwiki
http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page
Purging 1 urls
Done!
$

On the squid side, /var/log/squid/squid.log has:

2013/06/28 21:29:32| Parser: retval 1: from 0->31: method 0->4; url 6->20; version 22->30 (1/1)
2013/06/28 21:29:32| The request PURGE http://deployment.wikimedia.beta.wmflabs.org:80/wiki/Main_Page is ALLOWED, because it matched 'web'
2013/06/28 21:29:32| storeLocateVary: accept-encoding=, cookie=
2013/06/28 21:29:32| storeLocateVaryRead: MATCH! 7774E3A88F0B8BB19DE820258F110479 (null)
2013/06/28 21:29:32| clientCacheHit: Vary detected!
2013/06/28 21:29:32| clientProcessVary: HIT key=7774E3A88F0B8BB19DE820258F110479 etag=NONE
2013/06/28 21:29:32| The reply for PURGE http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page is ALLOWED, because it matched 'all'
2013/06/28 21:29:32| storeLocateVaryCallback: DONE

So it is definitely receiving the PURGE request.
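To check this across a longer log without reading it by eye, the access-check lines can be filtered out with a short script. This is a rough sketch: the regular expression is modeled only on the two lines quoted above, and other Squid versions or debug levels may word these lines differently.

```python
import re

# Pattern modeled on the two access-check lines quoted in this task.
# Assumption: the verdict is a single upper-case word (ALLOWED here).
PURGE_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\| "
    r"The (?P<kind>request|reply for) PURGE (?P<url>\S+) "
    r"is (?P<verdict>[A-Z]+), because it matched '(?P<acl>[^']*)'"
)


def purge_decisions(lines):
    """Return (timestamp, kind, url, verdict, acl) for each PURGE access check."""
    found = []
    for line in lines:
        match = PURGE_LINE.match(line.strip())
        if match:
            found.append(
                (match["ts"], match["kind"], match["url"],
                 match["verdict"], match["acl"])
            )
    return found


# Three of the lines from the log excerpt above.
sample = [
    "2013/06/28 21:29:32| The request PURGE http://deployment.wikimedia.beta.wmflabs.org:80/wiki/Main_Page is ALLOWED, because it matched 'web'",
    "2013/06/28 21:29:32| storeLocateVary: accept-encoding=, cookie=",
    "2013/06/28 21:29:32| The reply for PURGE http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page is ALLOWED, because it matched 'all'",
]

decisions = purge_decisions(sample)
for ts, kind, url, verdict, acl in decisions:
    print(ts, kind, verdict, acl, url)
```

On a real host the lines would come from /var/log/squid/squid.log instead of the inline sample.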

I ran curl requests against the Main Page using:

  1. curl -I
  2. curl -I -H 'Accept-Encoding:gzip,deflate'

Then I made an edit and ran both curl requests again. The one without Accept-Encoding was properly purged; the compressed one was not.
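One reading consistent with this result and with the "storeLocateVary: accept-encoding=, cookie=" log line is that the cache stores one object per Accept-Encoding variant and the PURGE only hits the variant matching its own (empty) header. The toy model below illustrates that hypothesis only; it is not Squid's code, and the class and method names are made up for the example.

```python
class ToyVaryCache:
    """Toy cache keyed by (url, accept_encoding). Hypothetical, not Squid."""

    def __init__(self):
        self._store = {}

    def put(self, url, accept_encoding, body):
        self._store[(url, accept_encoding)] = body

    def get(self, url, accept_encoding):
        return self._store.get((url, accept_encoding))

    def purge(self, url, accept_encoding=""):
        # Drops only the variant that matches the purge request's own header.
        return self._store.pop((url, accept_encoding), None) is not None

    def purge_all_variants(self, url):
        # What one would want instead: drop every stored variant of the URL.
        keys = [key for key in self._store if key[0] == url]
        for key in keys:
            del self._store[key]
        return len(keys)


URL = "http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page"

cache = ToyVaryCache()
cache.put(URL, "", "old page")
cache.put(URL, "gzip,deflate", "old page (gzip)")

# A PURGE sent without Accept-Encoding removes only the uncompressed variant.
cache.purge(URL)
plain_after_purge = cache.get(URL, "")
gzip_after_purge = cache.get(URL, "gzip,deflate")
print(plain_after_purge, "|", gzip_after_purge)

# Purging every variant clears the compressed copy as well.
removed = cache.purge_all_variants(URL)
```

If the hypothesis holds, the stale compressed copy would keep being served to any browser sending Accept-Encoding: gzip, which matches what the two curl requests showed.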

Will attach full headers.

Created attachment 12710
curl requests with and without Accept-Encoding: gzip, deflate


I have migrated the text cache from squid to varnish. Mark Bergsma did the puppet changes a while back; I just had to change the public IP for *.beta.wmflabs.org to point to the new instance deployment-cache-text1.pmtpa.wmflabs.

Purge is definitely not working there. There is an access list that only allows purges from 127.0.0.1, whereas in beta they will be sent by the application servers.
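The effect of that access list can be sketched in a few lines. This is only an illustration of the check, not the actual varnish ACL; the application server address and the wider network below are hypothetical placeholders.

```python
import ipaddress


def purge_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


# Access list as described above: localhost only.
CURRENT_ACL = ["127.0.0.1/32"]

# Hypothetical application server address and widened ACL, for illustration.
APP_SERVER = "10.4.0.17"
WIDENED_ACL = ["127.0.0.1/32", "10.4.0.0/24"]

localhost_ok = purge_allowed("127.0.0.1", CURRENT_ACL)
app_ok_before = purge_allowed(APP_SERVER, CURRENT_ACL)
app_ok_after = purge_allowed(APP_SERVER, WIDENED_ACL)
print(localhost_ok, app_ok_before, app_ok_after)
```

With the localhost-only list, a purge arriving from an application server is rejected, which is the situation on beta.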

I got rid of the old $wgSquidServers on beta (https://gerrit.wikimedia.org/r/71348) and replaced it with the HTCP multicast routing feature: purge requests are sent to a host chosen by regex rules. The change is https://gerrit.wikimedia.org/r/71345

Still have to send the PURGE requests to both text and mobile caches :( That is not supported by MediaWiki right now.

That needs $wgHTCPMulticastRouting to be able to send purges to several IP / groups. https://gerrit.wikimedia.org/r/#/c/71597/
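The idea behind that change can be sketched as follows. This is a Python illustration of regex-based routing with several destinations per rule, not the MediaWiki implementation; the "first matching rule wins" behavior is an assumption of the sketch, and the cache addresses are hypothetical. Port 4827 is the HTCP port.

```python
import re

# Hypothetical unicast addresses for the two beta caches.
TEXT_CACHE = ("10.4.1.10", 4827)
MOBILE_CACHE = ("10.4.1.20", 4827)

# Each rule is (regex, list of destinations). An empty regex matches any URL.
ONE_HOST_ROUTING = [(r"", [TEXT_CACHE])]
MULTI_HOST_ROUTING = [(r"", [TEXT_CACHE, MOBILE_CACHE])]


def route_purge(url, routing):
    """Return the destinations of the first rule whose regex matches url."""
    for pattern, destinations in routing:
        if re.search(pattern, url):
            return list(destinations)
    return []


URL = "http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page"

before = route_purge(URL, ONE_HOST_ROUTING)
after = route_purge(URL, MULTI_HOST_ROUTING)
print(before)
print(after)
```

With a single host per rule the mobile cache never sees the purge; allowing a list per rule lets one purge fan out to both caches.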

I found out last week that the resource loader url (load.php) was pointing to the text cache ( en.wikipedia.beta.wmflabs.org/w/load.php ) instead of bits ( bits.beta.wmflabs.org/en.wikipedia.beta.wmflabs.org/w/load.php ).

When the ResourceLoader cache is invalidated, no purge is sent to the text cache, so an old JavaScript version was being delivered. That most probably caused the issue reported here.

I deployed a change on beta a few minutes ago that points load.php to bits.beta.wmflabs.org: https://gerrit.wikimedia.org/r/#/c/70322/ . I guess that solves it.

So at least we have proper cache invalidation for bits material :-]

This is a never-ending mess. deployment-cache-text1 has a vhtcpd purge daemon running with the following options:

cat /etc/default/vhtcpd
DAEMON_OPTS="-F -m 239.128.0.112 -c 127.0.0.1:80 -c 127.0.0.1:3128"

The -m option is a multicast address to subscribe to. Since beta does not have multicast, we need vhtcpd to handle requests sent over unicast. The daemon does listen on UDP:

netstat -ulnp|grep vhtcpd

udp 0 0 0.0.0.0:4827 0.0.0.0:* 21339/vhtcpd
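To test the unicast path by hand, one could send a single HTCP CLR packet straight to that UDP port. The sketch below is based on my reading of how MediaWiki packs its HTCP purge packets; the byte layout should be treated as an assumption and checked against RFC 2756 before relying on it. The function names are made up for the example.

```python
import random
import socket
import struct

HTCP_OP_CLR = 4  # HTCP opcode for "clear" (purge)


def build_htcp_clr(url, trans_id=None):
    """Build an HTCP CLR packet for url (layout is an assumption, see above)."""
    if trans_id is None:
        trans_id = random.getrandbits(32)
    raw_url = url.encode("utf-8")
    # Specifier: length-prefixed method, URL, HTTP version, empty headers.
    specifier = (
        struct.pack("!H4sH", 4, b"HEAD", len(raw_url))
        + raw_url
        + struct.pack("!H8sH", 8, b"HTTP/1.0", 0)
    )
    data_len = 8 + 2 + len(specifier)
    total_len = 4 + data_len + 2
    header = struct.pack(
        "!HxxHBxIxx", total_len, data_len, HTCP_OP_CLR, trans_id
    )
    # Trailing auth section of length 2, meaning no authentication data.
    return header + specifier + struct.pack("!H", 2)


def send_purge(url, host, port=4827):
    """Send one CLR packet over unicast UDP and return the bytes sent."""
    packet = build_htcp_clr(url)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return len(packet)


URL = "http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page"
packet = build_htcp_clr(URL, trans_id=1)
print(len(packet), packet[:14].hex())
```

Calling send_purge(URL, "127.0.0.1") on the cache host itself would exercise the listener shown by netstat; whether vhtcpd accepts it depends on the packet layout being right.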

Filing another bug.

vhtcpd was not an issue (bug 51874).

(In reply to comment #12)

That needs $wgHTCPMulticastRouting to be able to send purges to several IP /
groups. https://gerrit.wikimedia.org/r/#/c/71597/

Merged.

The configuration change https://gerrit.wikimedia.org/r/#/c/76918/ does send purge requests to both text and mobile caches. That seems to fix the remaining issue we had.