
Cache multimedia limn JSON datasources
Closed, Declined · Public

Description

Second view shouldn't reload all that JSON stuff: http://www.webpagetest.org/result/141119_VN_MP8/6/details/cached/


Version: wmf-deployment
Severity: normal
URL: http://multimedia-metrics.wmflabs.org/dashboards/mmv

Details

Reference
bz73611

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:54 AM
bzimport set Reference to bz73611.
bzimport added a subscriber: Unknown Object (MLST).

Is that a feature in limn that we could enable by getting the configuration right? Or a webserver configuration issue? Or neither, in which case this should be a feature request (which would probably get ignored as limn is on its way out)?

limn right now just cache-busts every datasource (at the time, that's what everyone using it wanted). But yeah, it would be pretty simple to add options to the datasource that would make it stop cache-busting, or maybe do something like daily cache-busting. If you can describe the feature very clearly, and if this is really useful to someone, I would probably be able to do it in my volunteer time.
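
For illustration, here is a minimal TypeScript sketch of the two approaches mentioned above (per-request vs. daily cache-busting); the function names and the `_` query parameter are made up for the example and are not limn's actual code:

// Sketch only: a per-request buster defeats all caching, while a daily
// buster lets the browser reuse the same response for up to a day.
function perRequestBuster(): string {
  // New value on every page load, so the browser can never reuse a cached copy.
  return Date.now().toString();
}

function dailyBuster(): string {
  // Same value for a whole UTC day, so repeat visits within a day hit the cache.
  return new Date().toISOString().slice(0, 10); // e.g. "2014-11-22"
}

// Hypothetical query parameter name; the real one may differ.
function datasourceUrl(base: string, buster: string): string {
  return `${base}?_=${buster}`;
}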

Thanks both.

Krinkle> General practice: either make sure requests for data have a version or timestamp in them and cache very long (e.g. 30+ days), or purge it when you detect a change server side (ideal), or cache 5-10 minutes server side (s-maxage, varnish) and client-side

$ curl -I http://multimedia-metrics.wmflabs.org/dashboards/mmv
HTTP/1.1 200 OK
Server: nginx/1.5.0

Just add something to your nginx config à la

location ~* \.(json|csv)$ {
    expires 10m;
}

?

Though, you could even do "expires @1h00m;" or something like that, since the data only needs updating after a cronjob, doesn't it?

That particular solution does not apply here, because nginx is just a proxy; the data is served through Apache from Node.js. And besides, limn bypasses anything the server does with its client-side cache busting. But as I say, the fix is not too bad and I can do it if this is important for someone (keep in mind a lot of people ask for a lot of important stuff).
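
For what it's worth, here is a hedged sketch (not limn's actual server code) of what short server-side caching, roughly Krinkle's third option, could look like in the Node layer behind Apache/nginx. It only helps once the client-side cache-buster is gone:

import * as http from 'http';

const server = http.createServer((req, res) => {
  const path = (req.url ?? '').split('?')[0];
  if (/\.(json|csv)$/.test(path)) {
    // 10 minutes in the browser (max-age) and in Varnish/nginx (s-maxage).
    res.setHeader('Cache-Control', 'public, max-age=600, s-maxage=600');
  }
  res.end('datasource body would go here'); // placeholder for the real handler
});

server.listen(8000); // hypothetical port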

The slowness of limn is a major usability problem; if we are going to use it for a long time, I think it's important to improve it. My understanding is that it is going to be replaced soon-ish, though, in which case I don't think it's worth spending time on fixing it. (Also, I don't know how much the lack of caching contributes to the performance problems, although the 20 sec request linked by Nemo sounds pretty bad.)

As for implementing caching, retrieving the data does not seem to be much of a performance concern. In the network log linked by Nemo, only the last request (the tsv file) contains actual data; all others are limn configuration files. Those are always local so limn could just use their last modification date as a cache-buster string. I don't think it is a big deal to leave the tsv file uncached (it is generally updated once a day, but when working on the dataset-generating code, not being able to see updates immediately would be a major inconvenience), although maybe the cache buster could be removed so that normal ETag- or Last-Modified-based caching works.
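
A hedged sketch of the mtime-as-cache-buster idea, assuming a hypothetical file path and `_` query parameter; not actual limn code:

import { statSync } from 'fs';

// Use the local config file's last-modified time as the buster, so the URL
// only changes when the file does and the browser can cache it in between.
function mtimeBuster(localPath: string): string {
  return String(statSync(localPath).mtime.getTime());
}

// e.g. /graphs/mmv.json?_=1416626040000 (hypothetical path and parameter name)
const url = `/graphs/mmv.json?_=${mtimeBuster('./graphs/mmv.json')}`;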

The slowness of limn is a complicated problem. It's not the caching; it has more to do with how it renders all the graphs even if they're not on a visible tab. I've tried to solve this but it leads to other problems. Limn will be replaced by a dashboarding system that we're trying to design right now.

So it doesn't sound like there's anything simple we can do right now to make anyone's life better in the short term. One suggestion, though, is that we might want to try to make the metadata-inferring logic in Limn a little smarter and able to handle a few more parameters.

Right now, most graphs are added to the dashboard as a graphId. That graphId is looked up on the server and fetched; the graph then loads one or more datasources, which are fetched in turn, and each of those loads a datafile. That's why Limn takes forever.

If you add "some valid URL" instead of a graphId to the dashboard, limn will infer the graph and datasource metadata, making it much faster. So one thing we could do is, instead of just the URL, pass some bare-minimum parameters to make it draw what we want, like:

{url: '...', type: 'bar|world|line', title: 'Custom'}
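
For illustration, a hypothetical TypeScript sketch of the difference (the type and the identifiers are assumptions for the example, not limn's actual config schema):

// A graphId triggers extra round-trips for the graph and datasource configs;
// an inline definition with a URL lets limn infer that metadata directly.
type GraphRef =
  | string                                                         // graphId, looked up on the server
  | { url: string; type: 'bar' | 'world' | 'line'; title: string }; // inferred from the datafile

const dashboardTab: GraphRef[] = [
  'mmv-actions',                                                    // hypothetical graphId: several requests
  { url: '/data/mmv/actions.tsv', type: 'line', title: 'Custom' },  // hypothetical URL: one request
];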

Like I say, we're working on a better way to dashboard, but if that's happening too slowly, this is the most bang for our buck and I'd be happy to help make it happen.

milimetric explained this in more detail, and it looks really neat: basically the kind of TSV files we have can be added to a dashboard without any metainformation as long as we make sure the files are partitioned the same way the graphs should be. Example: http://mobile-reportcard.wmflabs.org/dashboards/reportcard.json?pretty

If we switched the MediaViewer schema to use a string instead of an enum as the action name, adding new actions would be as simple as writing the logging code and adding a new field to the SQL query; neither the union all list nor the limn config would need to be modified at all. (Now to build a time machine, go back to last May and tell this to ourselves...)

Milimetric claimed this task.

At this point we're very unlikely to spend more effort making limn better. We made sure Dashiki deals with caching very well, and we'll be migrating limn dashboards to Dashiki more actively next year as part of the project code-named {frog}.