
Mobile pages out of date
Closed, ResolvedPublic

Description

side by side showing mobile site delay

Mobile pages are occasionally out of sync with the regular pages. In one instance that I know of (see attachment), a change made 11 hours earlier had still not appeared on the mobile site. This was noticed after an email was sent to OTRS.


Version: .5
Severity: normal
URL: http://en.m.wikipedia.org/wiki/Photo_Booth

Attached:

untitled.PNG (737×1 px, 116 KB)

Details

Reference
bz20653

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:50 PM
bzimport set Reference to bz20653.

The problem is that Wikimedia Mobile uses a naive caching system with a fixed expiry time and no method, manual or automatic, to purge cache entries following an article update. There were further reports on IRC today when an image was deleted from the main page, causing long-term breakage of the mobile main page.
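To illustrate the behaviour described above, here is a minimal sketch of a fixed-expiry cache with no purge hook. It is not the actual Wikimedia Mobile code; the names `FixedExpiryCache`, `render_mobile`, and `fetch_article` are hypothetical, and the clock is injectable only so the staleness window is easy to demonstrate.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FixedExpiryCache:
    """Hypothetical cache whose entries leave only by age; nothing can purge them."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # key -> (time stored, cached value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # dropped because it is too old
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self.clock(), value)


def render_mobile(cache: FixedExpiryCache, title: str,
                  fetch_article: Callable[[str], str]) -> str:
    """Serve from cache when possible; otherwise fetch and store the page."""
    cached = cache.get(title)
    if cached is not None:
        return cached  # may be stale: article edits never touch the cache
    html = fetch_article(title)
    cache.set(title, html)
    return html
```

With this design, an edit made just after a page is cached stays invisible on the mobile side until the entry ages out, which matches the symptom in the report.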

mobile1 is currently not reaching 100% CPU even when the cache is empty. So I reduced the cache size by a factor of 5 to reduce the effective expiry time. This will reduce the impact of this bug.

I suggest either removing the cache altogether or implementing a proper invalidation scheme. We can easily scale up the CPU usage by adding more servers.

hcatlin wrote:

The aggressive caching came from using our initial server, which was extremely
slow and not at all prepared to handle traffic. Moving the system over to the
mobile1 box has given us much more power and a lot more space.

Mobile does use a very standard caching system. Since it is not integrated
with the main Wikipedia, Brion and I felt that simple caching would avoid a lot
of complex integration work.

Personally, I don't believe this is as big of an issue as it's being made out
to be. It's been this way for months and, besides people commenting that the
homepage is different, there doesn't seem to be a giant problem here.

The problem with *not* caching isn't CPU, it's page loading time. The server
must download the entire page, parse it, modify it, and then jam it into the
layout, which takes about 0.3 seconds, as opposed to grabbing it from cache and
throwing it in the layout, which takes about 9ms.

Certainly though, we can tweak the cache settings. A smaller cache is A-OK with
me. It's doing about 50% cache hit rate right now, which is totally fine. I'd
even be fine with doing only an hour of cache.
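As a rough worked example of what the hit rate means for page loading time, the sketch below combines the figures quoted in this comment (about 9ms on a cache hit, about 0.3 seconds on a miss). The function name `expected_latency_ms` is hypothetical, and the model assumes every request is either a pure hit or a pure miss.

```python
def expected_latency_ms(hit_rate: float,
                        hit_ms: float = 9.0,
                        miss_ms: float = 300.0) -> float:
    """Average time per request for a given cache hit rate.

    hit_ms and miss_ms default to the approximate figures from the thread.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


if __name__ == "__main__":
    for rate in (0.0, 0.5, 1.0):
        print(f"hit rate {rate:.0%}: {expected_latency_ms(rate):.1f} ms")
```

Under these assumptions a 50% hit rate averages roughly 154.5 ms per request, about halfway between the cached and uncached cases.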

I just don't think either "removing the cache altogether or implementing a proper
invalidation scheme" is a really good option. The first one means a slower site,
the second one means a ton of breakable, complex integrations.

It isn't that complex to implement invalidation. The MediaWiki app writes invalidation streams, and listening to them isn't that difficult: all you have to do is map the stream events to memcached objects and delete them.
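A minimal sketch of that mapping, under loudly stated assumptions: the event shape (a dict with a `title` field), the key scheme in `cache_key_for`, and the `CacheClient` interface are all hypothetical and are not taken from MediaWiki or any particular memcached client. Any client exposing a `delete(key)` method could stand in for the cache.

```python
from typing import Dict, Iterable, Protocol


class CacheClient(Protocol):
    """Anything with a delete(key) method, e.g. a memcached client wrapper."""

    def delete(self, key: str) -> object: ...


def cache_key_for(title: str) -> str:
    # Hypothetical key scheme; the real mobile cache keys are not in this thread.
    return "mobile:page:" + title.replace(" ", "_")


def apply_invalidations(events: Iterable[Dict[str, str]],
                        cache: CacheClient) -> int:
    """Delete the cached mobile page for each event; return deletes issued."""
    issued = 0
    for event in events:
        title = event.get("title")  # assumed field name
        if not title:
            continue  # skip events that do not name a page
        cache.delete(cache_key_for(title))
        issued += 1
    return issued


class InMemoryCache:
    """Dict-backed stand-in so the sketch runs without a memcached server."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def delete(self, key: str) -> bool:
        return self.store.pop(key, None) is not None
```

The listener that feeds `events` is deliberately left out, since how the invalidation stream is consumed depends on the deployment.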

hcatlin wrote:

I've moved us to a 1 hour expiration time and we can keep the 5GB cache.

Caching is much, much less aggressive now.

Is this satisfactory to everyone for the time being?

hcatlin wrote:

I'm closing this bug. 2 hours seems to be pretty satisfactory.