Page MenuHomePhabricator

BagOStuff should behave consistently with regards to integer values
Closed, ResolvedPublic

Description

Backstory:

When using Redis as main cache backend (e.g. like mediawiki-vagrant does), various things in MediaWiki (at least ResourceLoader) break in unexpected ways due to $cache->get() returning a string even if an integer was given to $cache->set().

A patch (to just ResourceLoader) to mitigate this was submitted:

https://gerrit.wikimedia.org/r/#/c/103407

This works in Wikimedia production because we use Memcached (specifically, a client that preserves the integer type in such a way that doesn't involve php serialise() and still maintains compatibility with native INCR/DECR).

I would prefer not to have to deal with this in random parts of the code base where someone ran into a bug when they use an object cache implementation that doesn't properly roundtrip integers.

I'm proposing one of two things:

  • Change our cache interfaces to always return a string, even for clients of Memcached that support maintaining it. This will avoid code from being written like in ResourceLoader from working in one environment and breaking in another. Instead it will never work and thus force us to consistently deal with this after retrieval. We'd get patterns like proposed in https://gerrit.wikimedia.org/r/#/c/103407 in more places, but at least it would become part of expected behaviour and we'd have to do it consistently.
  • Figure out a way to support integers even for backends like Redis that don't support it fully. Note that though Redis doesn't support storing integers in a way that comes back out, it does have integer operations (e.g. it has native INCR, and it will work on values that look like integers, regardless of whether PHP provided these as integers or strings, they all become strings).

See Also: T58069: incr destroys expiration on RedisBagOStuff

Details

Reference
bz60563

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:53 AM
bzimport set Reference to bz60563.

A discussion on #wikimedia-dev between TimStarling, superm401 and myself came up with the following:

  • Krinkle and superm401 mentioned that serialising integer values would be a problem as that would mean we'd have to resort to BagOStuff's emulation of increment().
  • TimStarling proposed we instead serialise all non-integer values, including simple strings (even if they look like strings). And on the retrieval end, always cast to an integer if the value contains only digits, and unserialise otherwise.

This latter is close to what we have already, except that we don't cast it right now.

"(even if they look like strings)" should read "(even if they look like integers)".

Change 103407 had a related patch set uploaded by Krinkle:
resourceloader: Account for cache backend casting int value to string

https://gerrit.wikimedia.org/r/103407

You could uses hashes. RedisBagOStuff::set would perform:

$redis->hMset( $key, array( 'type' => gettype( $value ), 'value' => $value ) );

RedisBagOStuff::get would use $redis->hmGet( $key, array( 'type', 'value' ) ) and cast as needed.

Hash fields can be incremented atomically using either HINCRBYFLOAT or HINCRBY. We could decline to type-check, on the principle of GIGO.

If we pick this approach, we should be careful and make sure the overhead is acceptable. The Redis documentation provides some cause for optimism, though: "A hash with a few fields (where few means up to one hundred or so) is stored in a way that takes very little space, so you can store millions of objects in a small Redis instance."

Change 103407 abandoned by Dr0ptp4kt:
resourceloader: Account for cache backend casting int value to string

Reason:
Although there is a bug in the sense of inconsistent behavior across Bag O' Stuff subclasses, in production it's currently a non-issue for RL due to use of PECL memcached and the Varnish edge case. If it were using Redis, it would be problematic, although the symptom would be masked by Varnish. If using Redis without an edge cache, say like on a localhost installation with a direct connection, it would and does surface. We're going to close this, and Timo and Tim Starling are aware of the issue. For anyone running with direct connections using Redis as the backing store, they will need to make sure that PECL memcached is used instead for the time being (presuming no other fixing patch has sneaked in since this started).

https://gerrit.wikimedia.org/r/103407

Summary from bug 59623:

So yeah, I think the best approach would be to go for the pure integer vs. non-integer approach.

If giving the cache store an integer, we keep it as it. The cache store may or may not support this type natively, but at least we send it to the cache store as-is, and it will store it as either an integer or string.

Then, anything that isn't an integer (not just arrays an objects, but even booleans and plain strings) we php serialise().

So that on the way out, we'll either get a serialised packet that we unserialise, or a number-like value which either is an integer already (if the backend supports it) or a string that we'll then cast to a number within the bagostuff class before returning it to the consuming code.

Again, the reason we don't want to serialise just everything is to still allow the backend to optimise by being able to store number values efficiently and allow operations on them like INT/DECR and what not.

A minor down side to this is that simple strings will inflate a bit in terms of occupied space (s:3:"foo"; vs "foo"), but has the upside of never being ambiguous. And besides, there is no advantage in storing strings as bare strings (other than saving a few bytes) as there are (afaik) no operations we need the backends to do on the values themselves.

This seems like a perfect solution, but I foresee two possible issues:

  1. Are there other types that we need the backends to perform native operations on? For example, for integers there is INT/DECR which is why we use that is the deciding factor. But this works only as long as there are no other types we need to distinguish (e.g. do some backends have native boolean value store types with toggle commands on them, and if so, is that worth it?)
  1. We'll still need to keep the fallback code for interpreting non-serialise non-number values because production environments will have existing cache values that aren't serialised the same way yet.

Seems RedisBagOStuff has had php serialize for all. So, I'm not sure how it could've returned strings for integers.

Change I3e0d1c4888062544 made it use native integers where possible but.. Ah, right. It makes two mistakes:

  • encode: Uses ctype_digit to let strings with numbers pass as numbers
  • decode: Doesn't cast them to int. And the is_int check is pointless as they always come out as strings.

Change 142450 had a related patch set uploaded by Krinkle:
objectcache: Actually inserialize integers as integers

https://gerrit.wikimedia.org/r/142450

Change 142450 merged by jenkins-bot:
objectcache: Actually unserialize integers as integers in RedisBagOStuff

https://gerrit.wikimedia.org/r/142450

RedisBagOStuff was fixed. Do any other other BagOStuff sub classes have the same bug? If not, this can be closed.

I don't know. I've worked on and tested Redis because MediaWiki-Vagrant uses it.

Krinkle removed a project: MW-1.24-release.
Krinkle set Security to None.
Krinkle assigned this task to aaron.