Page MenuHomePhabricator

Outdated JS served for ext.echo modules
Closed, DeclinedPublic

Description

This is another incarnation of bug 42452.

Right now on MediaWiki.org this URL...
https://bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=ext.echo.base%2Cicons%2Coverlay&skin=vector&version=20130109T025749Z&*
serves JS that was replaced with new code via a scap over 6 hours ago.

The correct code can be seen here:
https://bits.wikimedia.org/test.wikipedia.org/load.php?debug=false&lang=en&modules=ext.echo.base%2Cicons%2Coverlay&skin=vector&version=20130109T025749Z&*

The old code includes the string 'html-light'. The new code includes the string 'flyout'. Using debug=true always serves the newer code.

Here are all of the headers from the bad version on mediawiki.org:

Response Headers
Accept-Ranges bytes
Age 74247
Cache-Control public, max-age=2592000, s-maxage=2592000
Connection keep-alive
Content-Encoding gzip
Content-Type text/javascript; charset=utf-8
Date Wed, 09 Jan 2013 23:44:49 GMT
Expires Fri, 08 Feb 2013 03:07:22 GMT
Last-Modified Wed, 09 Jan 2013 02:57:49 GMT
Server nginx/1.1.19
Vary Accept-Encoding
Via 1.1 varnish
X-Cache strontium hit (135)
X-Content-Type-Options nosniff
X-Powered-By PHP/5.3.10-1ubuntu3.4+wmf1
X-Varnish 3394214363 3019335717
X-Vary-Options Accept-Encoding;list-contains=gzip
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Cache-Control max-age=0
Connection keep-alive
DNT 1
Host bits.wikimedia.org
If-Modified-Since Wed, 09 Jan 2013 02:57:49 GMT
Response Headers From Cache
Accept-Ranges bytes
Age 74247
Cache-Control public, max-age=2592000, s-maxage=2592000
Content-Encoding gzip
Content-Length 5764
Content-Type text/javascript; charset=utf-8
Date Wed, 09 Jan 2013 23:44:49 GMT
Expires Fri, 08 Feb 2013 03:07:22 GMT
Last-Modified Wed, 09 Jan 2013 02:57:49 GMT
Server nginx/1.1.19
Vary Accept-Encoding
Via 1.1 varnish
X-Cache strontium hit (135)
X-Content-Type-Options nosniff
X-Powered-By PHP/5.3.10-1ubuntu3.4+wmf1
X-Varnish 3394214363 3019335717
X-Vary-Options Accept-Encoding;list-contains=gzip

Here are all of the headers from the good version on test.wikipedia.org:

Response Headers
Accept-Ranges bytes
Age 0
Cache-Control public, max-age=2592000, s-maxage=2592000
Connection keep-alive
Content-Encoding gzip
Content-Type text/javascript; charset=utf-8
Date Wed, 09 Jan 2013 23:52:22 GMT
Expires Fri, 08 Feb 2013 23:52:22 GMT
Last-Modified Wed, 09 Jan 2013 08:13:35 GMT
Server nginx/1.1.19
Vary Accept-Encoding
Via 1.1 varnish
X-Cache niobium miss (0)
X-Content-Type-Options nosniff
X-Powered-By PHP/5.3.10-1ubuntu3.4+wmf1
X-Varnish 3151962049
X-Vary-Options Accept-Encoding;list-contains=gzip
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Cache-Control max-age=0
Connection keep-alive
DNT 1
Host bits.wikimedia.org
If-Modified-Since Wed, 09 Jan 2013 08:13:35 GMT
Response Headers From Cache
Accept-Ranges bytes
Age 0
Cache-Control public, max-age=2592000, s-maxage=2592000
Content-Encoding gzip
Content-Length 5800
Content-Type text/javascript; charset=utf-8
Date Wed, 09 Jan 2013 23:52:22 GMT
Expires Fri, 08 Feb 2013 23:52:22 GMT
Last-Modified Wed, 09 Jan 2013 08:13:35 GMT
Server nginx/1.1.19
Vary Accept-Encoding
Via 1.1 varnish
X-Cache niobium miss (0)
X-Content-Type-Options nosniff
X-Powered-By PHP/5.3.10-1ubuntu3.4+wmf1
X-Varnish 3151962049
X-Vary-Options Accept-Encoding;list-contains=gzip

It doesn't seem to matter which varnish servers are hit--arsenic, stronium, niobium, and palladium all give the same results consistently: the old code on mediawiki.org and the new code on test.wiki. Both mediawiki.org and test.wiki are running the same version of MediaWiki (wmf7) and should both be updated by the scap.


Version: unspecified
Severity: major

Details

Reference
bz43805

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:32 AM
bzimport set Reference to bz43805.
bzimport added a subscriber: Unknown Object (MLST).

test2.wikipedia.org also serves the correct up-to-date code.

This particular instance of the bug is no longer reproducible.

The exact urls are cached for 30 days, and can do so because they include the version timestamp. If the module has been updated it will have a newer timestamp in the startup module and the request url would be different.

If ResourceLoader (In reply to comment #0)

Right now on MediaWiki.org this URL...
https://bits.wikimedia.org/www.mediawiki.org/load.
php?debug=false&lang=en&modules=ext.echo.
base%2Cicons%2Coverlay&skin=vector&version=20130109T025749Z&*
serves JS that was replaced with new code via a scap over 6 hours ago.

The correct code can be seen here:
https://bits.wikimedia.org/test.wikipedia.org/load.
php?debug=false&lang=en&modules=ext.echo.
base%2Cicons%2Coverlay&skin=vector&version=20130109T025749Z&*

Did you manually construct one of these urls or the browser make a request to these exact urls from mediawiki.org and test.wikipedia.org? (meaning they both got the exact same version timestamp)?

Because if they did in fact both get the same timestamp from the startup module but serve different versions of of the javascript, that means the module was changed in such a way that the timestamp wasn't updated, and one of them was simply not cached during the previous version (i.e. nobody requested that module on test.wikipedia.org in the last few days).

bug 37812?

These URLs were not constructed manually. They are the automatically constructed URLs that RL uses to load the JavaScript (from actual browser requests). What specifically triggers RL to realize that the module has been updated and change the version timestamp? In this case the JS file was heavily modified and has been in constant use on both mediawiki.org and test.wiki. Does a change have to be made to the module definition itself?

This isn't bug 37812 as we didn't add or remove any files. We just changed a bunch of existing JS and CSS.

I think the timestamp was correctly updated in this case, but the contents that were served for the new URL did not include the new code until a day later.

If you look at https://bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=ext.echo.base%2Cicons%2Coverlay&skin=vector&version=20130109T025749Z it has the new up-to-date code now, but that exact same URL served old code for at least 10 hours after it was first generated. I realize this doesn't make any sense, but that's what happened.

(In reply to comment #6)

https://bits.wikimedia.org/www.mediawiki.org/load.
php?debug=false&lang=en&modules=ext.echo.
base%2Cicons%2Coverlay&skin=vector&version=20130109T025749Z
it has the new up-to-date code now, but that exact same URL served old code
for at least 10 hours after it was first generated.

It being incorrectly generated at first might a race condition (it being generated from a request cache-miss during the scap).

What is more strange however is that it changed after 10 hours. They should be fixed once cached. Did you have the url purged in Varnish?

Yeah, the fact that it changed is mysterious. We didn't do anything on our end to try to purge it from Varnish. In fact we were hoping the bug would persist for long enough to have other people take a look at it.

rkaldari: Can this still be seen, or was this intermittent (plus now that we've moved to a new datacenter)?

@Andre: What do you mean by 'this'?

If you mean the specific instance described in comment 1, no, that isn't happening anymore.

If you mean the bug in general, I have no idea, but someone should investigate. I imagine the migration to git-deploy would affect this bug more than the move to Ashburn.

Three months passed. Still happening? Still 'high critical'?

Happened again today. A bug fix for TimedMediaHandler was deployed today as part of the 1.22wmf5 train. The updated JS is served if you use debug=true, but otherwise outdated JS is served.

To see the difference, go to https://en.wikipedia.org/wiki/American_Sign_Language and play one of the videos. After the video plays, the title will be linked to the Commons page if you use debug=true, but will not be linked if you're using the regular page.

The regular page loads:
https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=fullScreenApi|jquery.debouncedresize%2CembedMenu%2ChoverIntent|jquery.ui.touchPunch|mediawiki.UtilitiesTime%2CabsoluteUrl%2Cclient|mw.Api%2CEmbedPlayer%2CEmbedPlayerNative%2CEmbedTypes%2CMediaElement%2CMediaPlayer%2CMediaPlayers%2CMediaSource%2CMediaWikiPlayerSupport%2CPlayerSkinKskin%2CTextSource%2CTimedText%2CajaxProxy&skin=vector&version=20130531T021616Z&*

The debug=true page loads:
https://bits.wikimedia.org/static-1.22wmf5/extensions/TimedMediaHandler/resources/mw.MediaWikiPlayerSupport.js

If you search both files for 'mwe-embedplayer-credit-title' (with single quotation marks), you'll see the difference in the code.

I've tried purging the article and hard-refreshing the JS pages, but no luck.

(In reply to comment #12)

Happened again today. A bug fix for TimedMediaHandler was deployed today as
part of the 1.22wmf5 train. The updated JS is served if you use debug=true,
but otherwise outdated JS is served.

This is most likely just bug 37812. For whatever reason the file in that module didn't increase in timestamp when the file was deployed. A simple touch/sync should fix it.

(In reply to comment #3)

The exact urls are cached for 30 days

Krinkle: Do you know where exactly this is defined/configured?

(In reply to comment #15)

(In reply to comment #3)

The exact urls are cached for 30 days

Krinkle: Do you know where exactly this is defined/configured?

This is configured by $wgResourceLoaderMaxage['versioned']['server'].

However this number should not be modified. If anything, it could be made bigger because the versioned urls are unique. If there is a newer version, the client will know about it through the startup module and construct a different url, thus avoiding the old cache. This is a central piece in the design of resourceloader.

If a url has the correct version number but serves old content, it is likely due to a race condition in the wmf deployment script where one server has the new version, and within seconds another server still having the old version gets a request for a module and thus serves it under the new timestamp. Thus causing some cache servers to have the correct version and others the incorrect, even though they are at the same url.

Anyway, to avoid these different generic "Outdated module served" bugs from mixing up and being re-used in something that sounds equally generic, I'm closing this one because it's several months old, the immediate problem was solved, and there's no way to find out now that it was or whether a future bug will be the same.