Find previous/next links in the navbar more efficiently
Closed, ResolvedPublic
Actions

Assigned To

None

Authored By

	• bzimport
	Aug 2 2013, 2:07 AM

Description

Author: molly.white5

Description:
Each time a page is loaded, currently, the prev/next links in the navbar are generated by looping through the entire sections block of the JSON and finding the current page. This is probably quite slow for very large books.

Relevant IRC discussion:
14:33 < GorillaWarfare> mwalker: If you look at BookManagerv2.hooks.php on line 399, you'll see it's getting the prev/next sections by looping through each section
14:33 < GorillaWarfare> mwalker: But some books have literally 40,000 sections
14:33 < GorillaWarfare> mwalker: Any thoughts on how to deal wiht that better?
14:33 < mwalker> looking
14:34 < GorillaWarfare> Sure
14:38 < mwalker> GorillaWarfare: so two things come to mind -- first that this if you keep it this way could be cached
14:38 < mwalker> the second thing is; why is it unsuitable to use the list order given in the JSON?
14:39 < mwalker> that way it could be numerically indexed and you wouldn't have this problem
14:39 < GorillaWarfare> Use the list order?
14:40 < mwalker> the order in which they were listed in the raw file
14:42 < mwalker> you have an implicit ordering there which you can actually use as an index -- and if you wanted on page save; you could turn that into an explicit ordering which you can then pass down to JS
14:43 < mwalker> or... because I'm an idiot and now see what you're trying to do
14:43 < mwalker> this hook gets run for a specific page
14:44 < mwalker> so... find the page you're rendering for; and then use that implicit index to construct just those forward/backward links?
14:54 < GorillaWarfare> How would it know which page was which index?
14:56 < mwalker> GorillaWarfare: for more generic cases it's in_array; but in this case you can use array_walk()
14:57 < mwalker> or
14:59 < mwalker> hurm... that's not impressively efficient; your best bet might just be to make an array of index => section name on json page save
14:59 < mwalker> save that in cache
14:59 < mwalker> and then retrieve it
15:00 < mwalker> this is where I wish php had proper trees and tree searching
15:03 < GorillaWarfare> Wait, how is index => section name different from what it's doing now?
15:05 < mwalker> it's not; except for the bit where you can't search for a section name to find it's index
15:05 < mwalker> what you're doing right now is actually probably totally fine so long as you cache it
15:05 < mwalker> it's just that you dont want to do this operation for every page
15:05 < GorillaWarfare> Even for 40,000 sections?
15:05 < GorillaWarfare> Ohh; it is doing it for every page
15:06 < GorillaWarfare> And the navbars aren't currently cached
15:06 < mwalker> check
15:06 < GorillaWarfare> So it could be slooow
15:06 < mwalker> indeed!
15:06 < GorillaWarfare> Hmm... so how would we do this..
15:06 < GorillaWarfare> Create each navbar at once and cache them sometime?
15:06 < mwalker> the smaller advantage of just having a simple lookup table without the rest of the metadata is that then you dont have to store all the metadata into memcache
15:07 < mwalker> well; pre-generate the metadata required for each navbar
15:08 < mwalker> but with 20k sections you're going to run into other problems -- like how do you even display that
15:09 < GorillaWarfare> Yeaaaah
15:09 < GorillaWarfare> When Raylton was talking about "big books", I don't think I quite envisioned the true scale
15:09 < mwalker> do they have subsections?
15:09 < mwalker> because then you could cache it sort of as a tree
15:11 < GorillaWarfare> We can't count on it
15:12 < mwalker> grrr
15:12 < mwalker> ok -- so we know we have some limited structured data here
15:13 < mwalker> it might make sense to put this into a database table and then it's persistent and queryable
15:13 < GorillaWarfare> Eek
15:13 < GorillaWarfare> That... doesn't sound simple
15:13 < mwalker> probably easier than you think; but it will force you to make some modifications to your algorithms
15:14 < mwalker> because you'll pull in the section ordering data from the database; and the rest of the book metadata from json
15:14 < mwalker> this might help http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
15:14 < mwalker> just for a concept

Version: unspecified
Severity: normal

Details

Reference: bz52435

Related Objects
Search...

Status	Subtype	Assigned	Task
Open	Feature	None	T66475 Make crosswiki bits and pieces truly global (tracking)
Open	Feature	None	T17073 Create a way for handling meta-organization of books
Open		None	T179790 Offer PDF export of entire wikisource proofread books
Open	Feature	None	T28448 Make Collection extension to automatically create collections for existing books on Wikibooks/Wikisources
Open		None	T14908 Option to protect all subpages and talk pages when protecting the page
Open	Feature	None	T17072 Protect, watchlist or delete a whole book at once
Open	Feature	None	T33254 Provide "Random page of this book" feature for use on Wikibooks projects
Open	Feature	None	T12092 Separate reference page for glossaries, bibliographies, etc
Resolved		None	T17070 Allow <ref>s from multiple pages to be collected into one <references/> on another page
Open	Feature	None	T17075 Per book, category and/or template CSS and JavaScript
Open	Feature	None	T17074 List, count and search all books
Open	Feature	None	T17071 Wikibooks/Wikisource needs means to associate separate pages with books
Declined		None	T52386 Create basic tool for reading books (group of pages)
Resolved		None	T54435 Find previous/next links in the navbar more efficiently

Event Timeline

• bzimport raised the priority of this task from to High.Nov 22 2014, 1:45 AM

• bzimport added a project: MediaWiki-extensions-BookManagerv2.

• bzimport set Reference to bz52435.

• bzimport created this task.Aug 2 2013, 2:07 AM

Could this bug be good candidate for Google Code-in? Meaning, could an eperienced contributor like e.g. mwalker write a atch to fix this problem within 2-3 hours?

If so, can you describe in a new comment the recommended solution? Sending GCI students (or anybody external to this project) to read between the lines in an IRC log is not a good idea.

Thank you!

(In reply to comment #1)

Could this bug be good candidate for Google Code-in? Meaning, could an
eperienced contributor like e.g. mwalker write a atch to fix this problem
within 2-3 hours?

I don't think this is a good GCI task.

Change 134627 had a related patch set uploaded by Deepali:
Efficient navigation through sections

https://gerrit.wikimedia.org/r/134627

Find previous/next links in the navbar more efficientlyClosed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Find previous/next links in the navbar more efficiently
Closed, ResolvedPublic
Actions

Related Objects
Search...