Page MenuHomePhabricator

Find previous/next links in the navbar more efficiently
Closed, ResolvedPublic

Description

Author: molly.white5

Description:
Each time a page is loaded, currently, the prev/next links in the navbar are generated by looping through the entire sections block of the JSON and finding the current page. This is probably quite slow for very large books.

Relevant IRC discussion:
14:33 < GorillaWarfare> mwalker: If you look at BookManagerv2.hooks.php on line 399, you'll see it's getting the prev/next sections by looping through each section
14:33 < GorillaWarfare> mwalker: But some books have literally 40,000 sections
14:33 < GorillaWarfare> mwalker: Any thoughts on how to deal wiht that better?
14:33 < mwalker> looking
14:34 < GorillaWarfare> Sure
14:38 < mwalker> GorillaWarfare: so two things come to mind -- first that this if you keep it this way could be cached
14:38 < mwalker> the second thing is; why is it unsuitable to use the list order given in the JSON?
14:39 < mwalker> that way it could be numerically indexed and you wouldn't have this problem
14:39 < GorillaWarfare> Use the list order?
14:40 < mwalker> the order in which they were listed in the raw file
14:42 < mwalker> you have an implicit ordering there which you can actually use as an index -- and if you wanted on page save; you could turn that into an explicit ordering which you can then pass down to JS
14:43 < mwalker> or... because I'm an idiot and now see what you're trying to do
14:43 < mwalker> this hook gets run for a specific page
14:44 < mwalker> so... find the page you're rendering for; and then use that implicit index to construct just those forward/backward links?
14:54 < GorillaWarfare> How would it know which page was which index?
14:56 < mwalker> GorillaWarfare: for more generic cases it's in_array; but in this case you can use array_walk()
14:57 < mwalker> or
14:59 < mwalker> hurm... that's not impressively efficient; your best bet might just be to make an array of index => section name on json page save
14:59 < mwalker> save that in cache
14:59 < mwalker> and then retrieve it
15:00 < mwalker> this is where I wish php had proper trees and tree searching
15:03 < GorillaWarfare> Wait, how is index => section name different from what it's doing now?
15:05 < mwalker> it's not; except for the bit where you can't search for a section name to find it's index
15:05 < mwalker> what you're doing right now is actually probably totally fine so long as you cache it
15:05 < mwalker> it's just that you dont want to do this operation for every page
15:05 < GorillaWarfare> Even for 40,000 sections?
15:05 < GorillaWarfare> Ohh; it is doing it for every page
15:06 < GorillaWarfare> And the navbars aren't currently cached
15:06 < mwalker> check
15:06 < GorillaWarfare> So it could be slooow
15:06 < mwalker> indeed!
15:06 < GorillaWarfare> Hmm... so how would we do this..
15:06 < GorillaWarfare> Create each navbar at once and cache them sometime?
15:06 < mwalker> the smaller advantage of just having a simple lookup table without the rest of the metadata is that then you dont have to store all the metadata into memcache
15:07 < mwalker> well; pre-generate the metadata required for each navbar
15:08 < mwalker> but with 20k sections you're going to run into other problems -- like how do you even display that
15:09 < GorillaWarfare> Yeaaaah
15:09 < GorillaWarfare> When Raylton was talking about "big books", I don't think I quite envisioned the true scale
15:09 < mwalker> do they have subsections?
15:09 < mwalker> because then you could cache it sort of as a tree
15:11 < GorillaWarfare> We can't count on it
15:12 < mwalker> grrr
15:12 < mwalker> ok -- so we know we have some limited structured data here
15:13 < mwalker> it might make sense to put this into a database table and then it's persistent and queryable
15:13 < GorillaWarfare> Eek
15:13 < GorillaWarfare> That... doesn't sound simple
15:13 < mwalker> probably easier than you think; but it will force you to make some modifications to your algorithms
15:14 < mwalker> because you'll pull in the section ordering data from the database; and the rest of the book metadata from json
15:14 < mwalker> this might help http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
15:14 < mwalker> just for a concept


Version: unspecified
Severity: normal

Details

Reference
bz52435

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:45 AM
bzimport set Reference to bz52435.

Could this bug be good candidate for Google Code-in? Meaning, could an eperienced contributor like e.g. mwalker write a atch to fix this problem within 2-3 hours?

If so, can you describe in a new comment the recommended solution? Sending GCI students (or anybody external to this project) to read between the lines in an IRC log is not a good idea.

Thank you!

(In reply to comment #1)

Could this bug be good candidate for Google Code-in? Meaning, could an
eperienced contributor like e.g. mwalker write a atch to fix this problem
within 2-3 hours?

I don't think this is a good GCI task.

Change 134627 had a related patch set uploaded by Deepali:
Efficient navigation through sections

https://gerrit.wikimedia.org/r/134627