Page MenuHomePhabricator

ResourceLoaderModule::getDefinitionMtime fails to use value from cache, always returns current time
Closed, DeclinedPublic

Description

ResourceLoaderModule.php's getDefinitionMtime function contains the following:

		$data = $cache->get( $key );
		if ( is_int( $data ) && $data > 0 ) {
			// We've seen this hash before, re-use the timestamp of when we first saw it.
			wfProfileOut( __METHOD__ );
			return $data;
		}

When the $key's value is, for example, wiki:resourceloader:moduledefinition:jquery.client:b86c8a395c01a1a635e65f1f03de1194,the $cache->get( $key ) call returns a string of digits (not an integer primitive).

Thus, $data is never returned. Instead, the function returns a timestamp that's guaranteed to be late in the future (the current time, as per time()).

		$timestamp = time();
		$cache->set( $key, $timestamp );

		wfProfileOut( __METHOD__ );
		return $timestamp;

When ResourceLoaderFileModule.php's getModifiedTime ( ResourceLoaderContext $context ) relies upon ResourceLoaderModule.php's timestamp comparison

		$this->modifiedTime[$context->getHash()] = max(
			$filesMtime,
			$this->getMsgBlobMtime( $context->getLanguage() ),
			$this->getDefinitionMtime( $context )
		);

this means that the getDefinitionMtime argument will always win in the max() determination.

In production, the (far) future dated cache settings on the ResourceLoader responses mean that in general there shouldn't be lots of unnecessary origin hits. So even though the current timestamp will version the URL at that very instant, it doesn't seem to matter too much from what I can tell.

That said, I'm mostly interested in the context of the main startup scripts. While trying to generate an accurate AppCache that only fluctuates URLs and datetime stamps when needed, I observed the following behavior.

Look at the last part of the ISO datetime stamp toward the end of the URLs.

Change I7250d819628dfa9ee8c41941d843e9d7bd24bf3f (https://gerrit.wikimedia.org/r/103407) has been submitted to try to work around the case where the returned value is int-like. It may make more sense to look into RedisBagOStuff.php's unserialize ( $data ) function if this behavior is generally problematic, but I'm not clear on the other non-RL contexts where messing with RedisBagOStuff.php could have unintended consequences. It seems that adjusting things in ResourceLoader makes sense in this case, but I'm not sure.

This currently blocks a dependent change for getting the startup URL for supporting an HTML5 AppCache concept, which in turn is necessary for offline back/forward functionality, useful especially on places with connectivity issues.


Version: 1.23.0
Severity: normal

Details

Reference
bz59623

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:43 AM
bzimport set Reference to bz59623.
bzimport added a subscriber: Unknown Object (MLST).

Marking as unconfirmed for now, updating bug title case.

The problem seems to stem from type cast errors. To my knowledge it is supposed to be reliable that our object caching layer takes care of this. Both locally for me and in production, integers passed to $wgMemc->set( $key, $value ) come back out the same with ->get(), they're not implicitly converted to strings.

They are most likely converted to strings for storage purposes (Redis, Memcached, MySQL etc.), but the object cache interface should take care to serialise and unserialise this properly, and to my knowledge that is happening correctly.

If that isn't happening correctly, I would recommend looking into the object cache interface first as that will most likely a wide range of other significant problems. Definitely something that must absolutely not be handled in some random method (getDefinitionMtime in this case).

Plus, in that case we'd probably not want to keep separate code paths for is_int and ctype_digit, as is_int would be redundant and even incorrect if they are "expected" to be strings.

RedisBagOStuff returns it as a string. Like you suggest, MemcachedPeclBagOStuff returns it as an integer.

The underlying Redis library's get() is documented as returning stuff as a String type, and PECL memcached's get() is documented as returning mixed types.

Modification of RedisBagOStuff's serialize and unserialize to make it more like PECL memcached-like would break the Redis INCR/DECR commands.

Given that we know getDefinitionMtime needs to return an integer type, can we move forward with this? If not, what do you propose?

I'm fine unbasing the dependent change for the URL given that we know in production things will continue working, but I also want to future proof this in case the memory cache provider changes.

(In reply to comment #2)

The underlying Redis library's get() is documented as returning stuff as a
String type, and PECL memcached's get() is documented as returning mixed
types.

Modification of RedisBagOStuff's serialize and unserialize to make it more
like PECL memcached-like would break the Redis INCR/DECR commands.

Can you elaborate on this? Afaik memcached has incr/descr like commands as well. However afaik this shouldn't be an issue. It is up to the BagOStuff subclass to ensure stuff roundtrips properly.

In case of Memcached, which supports scalars natively, it does PHP serialise/unserialise for anything else, thus turning it into a string and ensuring proper roundtrip. And integers presumably stay integers within Memcached as well.

If Redis doesn't support integers separate from strings, would it be sensible to abstract this limitation in our BagOStuff subclass and serialise them in such a way that we get back out what we put in?

I take it serialising integers in a string (e.g. "int/12345" or using php serialise() format) is problematic as one can then no longer do INCR/DECR on it. But I'm confused though as to how Redis does this internally. Surely it has to distinguish between numbers and strings in some way in order to support INCR/DECR at all?

Would it perhaps be sensible to do it the other way around (have the Redis BagOStuff get() method convert the string to integer before returning if it looks like something that should be an integer). Might als cause side effects, as a string might come out as an integer that way, but then again, the opposite is causing side effects as well (as proven by the existence of this bug).

I'm fine with work arounds, but the reason I'm somewhat leery of implementing this workaround is because it seems the wrong place to deal with this.

The code I wrote there assuming an integer comes out when an integer is put in, I'm happy to be proven wrong, but I wasn't aware of that being unreliable. From my perspective it is our RedisBagOStuff implementation that is inconsistent with the status quo (I understand now why RedisBagOStuff does this, but that doesn't make it right to break backwards compatibility).

Basically, if we accept this work-around, we should start actively discouraging usage of actual integers in cache storage (in that, we shouldn't assume they come out as integers, regardless of which cache storage). So perhaps we should then modify the MemcachedBagOStuff to convert integers to strings in get()?

Given that we know getDefinitionMtime needs to return an integer type, can we
move forward with this? If not, what do you propose?

Short of changing RedisBagOStuff::get just for this, I'd recommend changing the expression assigned to $data from $cache->get( $key ); to (int) $cache->get( $key ) in:
https://gerrit.wikimedia.org/r/#/c/103407/3/includes/resourceloader/ResourceLoaderModule.php

And then have only one if branch, not two. Having two if-branches there is highly confusing in my opinion and bound to get messed up in future maintenance as there is no obvious reason for the $data type to be variable. This should be abstracted by the cache storage, just like we do for database abstractions. That sometimes means we can't make use of all the features a storage system has to offer, but I guess that's worth it for the benefit of consistency and predictability (in this case we'd give up getting integers from Memcached, we could still keep them as integers in Memcached internally and probably have to for INCR/DECR to work there, but we'd patch our get() to always return a string, like Redis does).

Anyway, let's handle the cache storage system inconsistency in a separate bug.

I'm fine unbasing the dependent change for the URL given that we know in
production things will continue working, but I also want to future proof this
in case the memory cache provider changes.

I'm not sure what you mean here.

Per face-to-faces with Timo and Roan, this issue is non-impactful for RL in production due to use of PECL memcached. It would also technically be masked if Redis were used, due to edge cache stuff with Varnish. But for direct connections with Redis backing, the inconsistency would surface (and does surface). Timo T and Tim S are aware of the issue. Where's the other bug covering this issue? I want to mark this as a duplicate and close it out, and defer to the other. I guess we should probably cross-reference.

Closing as this will be dealt with by bug 60563.

Not really sure what to close it as, sort of a "current code works-for-me", "issue is duplicate of other bug" and "actual problem is not in ResourceLoader".