search pageimages feature causes significant extra infrastructure load
Closed, ResolvedPublic

Description

This is the top 50 misses from a random Varnish mobile frontend:

110.52 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=&pithumbsize=80&pilimit=50
 33.80 TxURL          /wiki/List_of_NCAA_Men's_Division_I_Basketball_champions
 10.76 TxURL          /wiki/NCAA_Men's_Division_I_Basketball_Championship
  6.56 TxURL          /wiki/Special:Search?search=
  5.65 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Race+and+ethnicity+in+the+United+States+Census%7CRussia%7CRomanization%7CRock+music%7CRomania%
  5.54 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Rick+Ross%7CRicky+Martin%7CRick+Rubin%7CRick+Perry%7CRick+Leach%7CRicky+Gervais%7CRick+Wakeman
  4.80 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Mollusca%7CMoth%7CMexico%7CMidfielder%7CMarriage%7CMember+of+Parliament%7CMajor+League+Basebal
  4.68 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Keyboard+instrument%7CKenya%7CKentucky%7CKerala%7CKorea%7CKansas%7CKent%7CKolkata%7CKorean+War
  4.58 TxURL          /wiki/Special:RecordImpression?result=hide&reason=empty&country=XX&uselang=en&project=wikipedia&db=enwiki&bucket=1&anonymous=false&device=android
  4.09 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=John+Cage%7CJohn+Calvin%7CJohn+Cale%7CJohn+Carpenter%7CJohn+Cassavetes%7CJohn+Carradine%7CJohn
  3.48 TxURL          /w/index.php?title=Special:UserLogin&returnto=Special%3AUploads
  3.18 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Brazil%7CBakhsh%7CBBC%7CBillboard+(magazine)%7CBass+guitar%7CBelgium%7CBoston%7CBerlin%7CBaske
  3.18 TxURL          /w/index.php?title=Wikispecies_talk:Village_Pump/Archive_17&action=edit&redlink=1
  3.17 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=La+La%7CLA+Law%7CLa-La+Land+Records%7CLA+Lakers%7CLa+La+Anthony%7CLa+Laguna%7CLa+La+La+(Naught
  3.12 TxURL          /wiki/Special:CentralAutoLogin/start?type=1x1
  2.90 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Anatolia%7CAnarchism%7CAnatomical+terms+of+location%7CAnaheim%2C+California%7CAnatomy%7CAnahei
  2.80 TxURL          /wiki/2012_NCAA_Men's_Division_I_Basketball_Tournament
  2.74 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Thumba%7CThumba+Equatorial+Rocket+Launching+Station%7CThumbay+Moideen%7CThumbavi%7CThumbay+Gro
  2.66 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=The+New+York+Times%7CTurkey%7CTexas%7CThe+Guardian%7CToronto%7CTaiwan%7CThailand%7CTokyo%7CThe
  2.62 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Estados+Unidos%7CEspa%C3%B1a%7CEspecie%7CEstado+de+los+Estados+Unidos%7CEuropa%7CEndopterygota
  2.54 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=X-Men%7CX-Men+(TV+series)%7CX-Men%3A+The+Last+Stand%7CX-Men+(film)%7CX-Men%3A+First+Class%7CX-
  2.53 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=S%7CSMAP%7CSuica%7CSD%E3%83%A1%E3%83%A2%E3%83%AA%E3%83%BC%E3%82%AB%E3%83%BC%E3%83%89%7CSOD%E3%
  2.49 TxURL          /
  2.43 TxURL          /wiki/Special:Random
  2.39 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=District+1%7CDistrict+13%7CDistrict+13%3A+Ultimatum%7CDistrict+1%2C+D%C3%BCsseldorf%7CDistrict
  2.38 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Boston%7CBosnia+and+Herzegovina%7CBolivia%7CBoxing%7CBollywood%7CBoston+Red+Sox%7CBob+Dylan%7C
  2.32 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Nicholas%7CNicholas+II+of+Russia%7CNicholas+I+of+Russia%7CNicholas+Briggs%7CNicholas+Monroe%7C
  2.29 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=John+Calipari&pithumbsize=80&pilimit=50
  2.14 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Andrew+Jackson%7CAndrew+Lloyd+Webber%7CAndrew+Johnson%7CAndrew+Carnegie%7CAndrew+the+Apostle%7
  2.11 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Louisiana%7CLouisville%2C+Kentucky%7CLouis+XIV+of+France%7CLouisiana+State+University%7CLouis+
  2.04 TxURL          /wiki/Pandora's_box
  2.01 TxURL          /w/index.php?title=Special:RecentChanges&feed=atom
  2.00 TxURL          /w/api.php?action=featuredfeed&feed=onthisday&feedformat=atom
  1.99 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Stan+(South+Park)&pithumbsize=80&pilimit=50
  1.97 TxURL          /w/api.php?hidebots=1&days=7&limit=50&hidewikidata=1&action=feedrecentchanges&feedformat=atom
  1.95 TxURL          /w/index.php?title=%D9%88%DB%8C%DA%98%D9%87:%D8%AA%D8%BA%DB%8C%DB%8C%D8%B1%D8%A7%D8%AA_%D8%A7%D8%AE%DB%8C%D8%B1&feed=atom
  1.95 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Jeopardy!%7CJeopardy!+broadcast+information%7CJeopardy!+Tournament+of+Champions%7CJeopardy!+Ul
  1.95 TxURL          /wiki/Special:BannerRandom?uselang=es&sitename=Wikipedia&project=wikipedia&anonymous=false&bucket=1&country=XX&device=android&slot=8
  1.95 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Kyrgyzstan%7CKyoto%7CKylie+Minogue%7CKyushu%7CKyiv%7CKyushu+Railway+Company%7CKyoto+Protocol%7
  1.94 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Lieutenant%7CLie%7CLieutenant+colonel%7CLieutenant+general%7CLi%C3%A8ge%7CLiechtenstein%7CLieu
  1.92 TxURL          /wiki/Asia's_Next_Top_Model_(cycle_2)
  1.92 TxURL          /wiki/1996_NCAA_Men's_Division_I_Basketball_Tournament
  1.92 TxURL          /wiki/Bah%C3%A1'%C3%AD_Faith
  1.84 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=CuSO4&pithumbsize=80&pilimit=50
  1.82 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Christian%7CChristianity%7CChristmas%7CChristchurch%7CChrist+Church%2C+Oxford%7CChristina+Agui
  1.80 TxURL          /wiki/Harry_Potter_and_the_Philosopher's_Stone_(film)
  1.80 TxURL          /wiki/Ballon_d'Or_(1956%E2%80%932009)
  1.78 TxURL          /w/api.php?format=json&action=query&prop=pageimages&titles=Gumbo%7CGumby%7CGumball+3000%7CGumboot%7CGumboro%7CGumbo+fil%C3%A9%7CGumbranch%7CGumbasia%7CGu

This is just one Varnish server *backend*; compare the hits that / is getting (at place 23) to get a sense of the scale. The Varnish req/s graphs are broken on the yearly scale right now, but based on the above I'd bet pageimages accounts for a significant portion of our requests per second.
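As a rough way to put a number on that hunch, here is a minimal sketch that tallies a handful of rows copied from the dump above by whether they are pageimages API calls. It assumes the first column can be treated as a relative weight; `classify` and `tally` are illustrative helper names, and the resulting share describes only the sampled rows, not overall traffic.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# A few rows copied from the dump above: weight, tag, URL.
SAMPLE = """\
110.52 TxURL /w/api.php?format=json&action=query&prop=pageimages&titles=&pithumbsize=80&pilimit=50
 33.80 TxURL /wiki/List_of_NCAA_Men's_Division_I_Basketball_champions
  6.56 TxURL /wiki/Special:Search?search=
  2.49 TxURL /
  2.29 TxURL /w/api.php?format=json&action=query&prop=pageimages&titles=John+Calipari&pithumbsize=80&pilimit=50
"""


def classify(url: str) -> str:
    """Label a request URL as 'pageimages' or 'other'."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    props = [p for value in params.get("prop", []) for p in value.split("|")]
    if parts.path == "/w/api.php" and "pageimages" in props:
        return "pageimages"
    return "other"


def tally(dump: str) -> dict:
    """Sum the first-column weights per request class."""
    totals = defaultdict(float)
    for line in dump.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3 or fields[1] != "TxURL":
            continue  # skip anything that is not a dump row
        totals[classify(fields[2])] += float(fields[0])
    return dict(totals)


totals = tally(SAMPLE)
share = totals["pageimages"] / sum(totals.values())
print(f"pageimages share of the sampled misses: {share:.0%}")
```

Feeding the full dump through `tally` instead of the five sample rows would give the share for this one server.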

Even further down the stack, the API appservers' network usage (and the corresponding load) has tripled over the last two months:
http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=API+application+servers+eqiad&m=ap_busy_workers&s=by+name&mc=2&g=network_report
and RPS have risen by 40-50% since January:
http://ganglia.wikimedia.org/latest/stacked.php?m=ap_rps&c=API%20application%20servers%20eqiad&r=year&st=1396059695&host_regex=

Correct me if I'm wrong, but from the URLs this looks like the pageimages search feature that was moved from beta to stable with commit I1c657fd46a2b5be4f27aa508a5cc0d946d6b98a8 (Story 1462: Move new search overlay to stable). The commit was merged in 1.23wmf12, which was deployed between Jan 30th (group 0) and Feb 6th (group 2), so it correlates well with the graphs above.

Besides infrastructure problems, I also imagine that this is killing UA performance/total page load time for search on mobile. I'll Cc Ori.

Moreover, I suppose that this feature creates yet another thumbnail size that we didn't have before, which could potentially be problematic (not so much in bytes, as they're tiny, but in object count). I haven't noticed any significant difference in object creation count or bytes stored in Swift, however, so that's probably OK for now. It would have been nice to get some estimates and advance warning about this, though.

I'd revert this commit while you investigate, but the commit is

20 files changed, 103 insertions(+), 319 deletions(-)

and I see no obvious config tweak that would allow me to turn it off gracefully. It may be a good idea to do such beta->stable rollouts in two steps in the future, with a small turning-the-knob commit at the end, so that they are easier to revert in emergencies or when severe problems arise.
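To illustrate the two-step idea, here is a minimal sketch of gating the extra request behind a single flag, so that the final "knob" commit only flips a default. The flag name `show_search_thumbnails` and both helpers are hypothetical and do not correspond to any existing MobileFrontend setting.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchConfig:
    # Hypothetical flag: step 1 ships the feature code with this off,
    # step 2 is a one-line commit that flips the default.
    show_search_thumbnails: bool = False


def requests_for_search(config: SearchConfig) -> List[str]:
    """Return the logical API requests a single search would issue."""
    requests = ["search"]
    if config.show_search_thumbnails:
        requests.append("pageimages")
    return requests
```

With this shape, an emergency revert means restoring one default value rather than backing out a 20-file change.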


Version: unspecified
Severity: critical

Details

Reference
bz63248

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 2:55 AM).
bzimport set Reference to bz63248.
bzimport added a subscriber: Unknown Object (MLST).

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1820

Change 121930 had a related patch set uploaded by MaxSem:
Don't request pageimages for 0 pages

https://gerrit.wikimedia.org/r/121930

^^^ kills the most popular cache miss.
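The idea behind that change can be sketched as a guard that refuses to build a request when there are no titles to ask about. This is not the actual patch, only an illustration: `pageimages_url` is a made-up helper, and the parameter values are the ones visible in the URLs of the dump above.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def pageimages_url(titles: Sequence[str], thumb_size: int = 80,
                   limit: int = 50) -> Optional[str]:
    """Build a pageimages query URL, or return None if there is nothing to ask for."""
    titles = [t for t in titles if t]
    if not titles:
        # An empty search result set needs no thumbnails, so skip the request
        # instead of sending "titles=" to the API.
        return None
    query = urlencode({
        "format": "json",
        "action": "query",
        "prop": "pageimages",
        "titles": "|".join(titles),
        "pithumbsize": thumb_size,
        "pilimit": limit,
    })
    return f"/w/api.php?{query}"
```

A caller would simply skip the API call whenever the helper returns `None`.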

I don't think the cacheable bit will help much. The pageimages API parameter will be for a more-or-less unique combination of articles OR'ed with each other, as they resulted from the search query. I'm not sure of the distribution of our search queries, but I'm guessing caching those combinations won't make a huge difference (and it will be a waste of cache memory/disk).

It won't help client performance either, as clients will still do 17 extra requests on every search for this feature. I think the design of this whole feature honestly sounds a bit naive to me. We should really go a step backwards and rethink the best way to do this with server- and client-side performance in mind.

Assuming images in search results are not harmful from a performance PoV and are a requirement for where mobile is going, we should probably have one single search endpoint in the API which returns the JSON result set enriched with image thumbnail URLs. There is no reason for this to be in an extension or a separate request, AIUI.

The dozen upload.wikimedia.org requests to fetch a 1-2KB thumb each are still going to kill client-side performance though. There's a tremendous overhead of data for the requests and round-trips that will delay the total page load time from a viewer's perspective. These are also high-latency requests, as these 50px images will in most cases need to be scaled on the fly. Unfortunately, our architecture makes it difficult to return this directly in the result set as e.g. data URIs, but if that would work, maybe we could find a way.

Let's think of other ways to improve this feature in general. I'd be happy to discuss performance with you, as well as help in case some deeper infrastructure change (Swift/Varnish) could help make this feature better for everyone. I'm sure Ori could be of help here as well, with his client-side performance expert hat on.

Ideally the search results API would return these results. This would remove the need for the OR query and the additional GET itself. Could the results of the search API query be piped into page images?

As an interim measure, would it make any difference if we just fetched the top 5 results and threw in some client-side caching so that additional images are not retrieved?
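A minimal sketch of that interim idea, assuming the client keeps a per-title cache and only asks about the first few results. `ThumbnailCache` and the `fetch` callable are hypothetical; `fetch` stands in for whatever performs the pageimages request and returns a title-to-URL mapping.

```python
from typing import Callable, Dict, Iterable, List, Optional


class ThumbnailCache:
    """Remember thumbnail lookups so repeated searches do not re-request them."""

    def __init__(self, fetch: Callable[[List[str]], Dict[str, str]],
                 top_n: int = 5) -> None:
        self._fetch = fetch    # performs one batched lookup for the given titles
        self._top_n = top_n    # only the first top_n results get thumbnails
        self._known: Dict[str, Optional[str]] = {}

    def thumbnails_for(self, results: Iterable[str]) -> Dict[str, Optional[str]]:
        wanted = list(results)[: self._top_n]
        missing = [t for t in wanted if t not in self._known]
        if missing:
            found = self._fetch(missing)
            for title in missing:
                # Remember misses too, so pages without an image are not re-requested.
                self._known[title] = found.get(title)
        return {t: self._known[t] for t in wanted}
```

As the user types "ri", "ric", "rick", many of the top results overlap, so most keystrokes would request only the titles not seen before, or nothing at all.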

If we need to completely turn this off we will have to quickly check with design what is preferable and whether we can have a placeholder image there so it doesn't look strange.

(In reply to Faidon Liambotis from comment #5)

> I don't think the cacheable bit will help much. The pageimages API parameter
> will be for a more-or-less unique combination of articles OR'ed with each
> other, as they resulted from the search query. I'm not sure of the
> distribution of our search queries, but I'm guessing caching those
> combinations won't make a huge difference

The frequency dump says otherwise:

  • The top line would have been cached (but we're killing it with 121930 anyway).
  • From line 5 onwards there are a bunch of popular requests with titles from prefix search for "r", "ri", "m", "k", "john" etc - these are not random fluctuations and will always be present with some noticeable frequency so caching them will be highly beneficial.

> (and it will be a waste of cache memory/disk).

API results with cache mode = 'public' have a lifetime of 12 hours, which should be enough to prevent infinite hoarding of cache objects. Anyway, most query modules' results are public, so this will not change the situation much.

> It won't help client performance either, as clients will still do 17 extra
> requests on every search for this feature. I think the design of this whole
> feature honestly sounds a bit naive to me. We should really go a step
> backwards and rethink the best way to do this with server- and client-side
> performance in mind.

> The dozen upload.wikimedia.org requests to fetch a 1-2KB thumb each are
> still going to kill client-side performance though. There's a tremendous
> overhead of data for the requests and round-trips that will delay the total
> page load time from a viewer's perspective. These are also high-latency
> requests, as these 50px images will in most cases need to be scaled on the
> fly. Unfortunately, our architecture makes it difficult to return this
> directly in the result set as e.g. data URIs, but if that would work, maybe
> we could find a way.

The current implementation tries to work around this by waiting for 500 ms after the search results are output before displaying the images. The problem is that it doesn't retrieve the page image information in the same request as the search, resulting in extra API requests.

(In reply to Jon from comment #6)

> Ideally the search results API would return these results. This would remove
> the need for the OR query and the additional GET itself. Could the results
> of the search API query be piped into page images?

Currently it's not possible, as prefix search is available only with the non-query opensearch module. It should be trivial to add a similar generator module, however. I wonder why it hasn't been done yet :)

> As an interim measure, would it make any difference if we just fetched the
> top 5 results and threw in some client-side caching so that additional
> images are not retrieved?

How reliable would it be to detect which list items are actually visible on screen? This would increase cache fragmentation, though, because currently you're always requesting as many results as you have.

Change 121930 merged by jenkins-bot:
Don't request pageimages for 0 pages

https://gerrit.wikimedia.org/r/121930

Change 122582 had a related patch set uploaded by MaxSem:
Don't request pageimages for 0 pages

https://gerrit.wikimedia.org/r/122582

Change 122583 had a related patch set uploaded by MaxSem:
Don't request pageimages for 0 pages

https://gerrit.wikimedia.org/r/122583

Change 122582 merged by jenkins-bot:
Don't request pageimages for 0 pages

https://gerrit.wikimedia.org/r/122582

Change 122583 merged by jenkins-bot:
Don't request pageimages for 0 pages

https://gerrit.wikimedia.org/r/122583

Change 122594 had a related patch set uploaded by MaxSem:
Update MobileFrontend, PageImages and TextExtracts for bug 63248

https://gerrit.wikimedia.org/r/122594

Change 122596 had a related patch set uploaded by MaxSem:
Update MobileFrontend, PageImages and TextExtracts for bug 63248

https://gerrit.wikimedia.org/r/122596

Change 122596 merged by jenkins-bot:
Update MobileFrontend, PageImages and TextExtracts for bug 63248

https://gerrit.wikimedia.org/r/122596

Change 122594 merged by jenkins-bot:
Update MobileFrontend, PageImages and TextExtracts for bug 63248

https://gerrit.wikimedia.org/r/122594

(In reply to Max Semenik from comment #7)

> (In reply to Jon from comment #6)
>
> > Ideally the search results API would return these results. This would
> > remove the need for the OR query and the additional GET itself. Could the
> > results of the search API query be piped into page images?
>
> Currently it's not possible, as prefix search is available only with the
> non-query opensearch module. It should be trivial to add a similar generator
> module, however. I wonder why it hasn't been done yet :)

https://gerrit.wikimedia.org/r/123118 adds an API module needed for this. After it's merged, this will need some FE work to use the module.
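For the FE side, a single combined request could look roughly like the sketch below. It assumes the new module is usable as `generator=prefixsearch` with `gps`-prefixed parameters, which this thread does not spell out; the helper name and the default limit are illustrative only.

```python
from urllib.parse import urlencode


def search_with_thumbnails_url(prefix: str, limit: int = 15,
                               thumb_size: int = 80) -> str:
    """Build one query that runs the prefix search and asks for page images."""
    query = urlencode({
        "format": "json",
        "action": "query",
        "generator": "prefixsearch",  # assumed generator name
        "gpssearch": prefix,          # assumed parameter: the typed prefix
        "gpslimit": limit,            # assumed parameter: number of results
        "prop": "pageimages",
        "pithumbsize": thumb_size,
        "pilimit": limit,
    })
    return f"/w/api.php?{query}"
```

Because the URL then depends only on the typed prefix rather than on a list of result titles, popular prefixes such as "r" or "john" map to one cacheable object each.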

Change 123811 had a related patch set uploaded by MaxSem:
Don't request page images separately for search results

https://gerrit.wikimedia.org/r/123811

Change 123811 merged by jenkins-bot:
Don't request page images separately for search results

https://gerrit.wikimedia.org/r/123811

Looks like this should be taken care of now...