Page MenuHomePhabricator

CategoryTree is inefficient
Closed, ResolvedPublic

Description

categorytree forces scans of millions of rows (for large categories) to return 0 row datasets quite often.

this has been discussed in the past a lot, I'll write it down yet again.

there're few fixes, one is by prefixing cl_sortkey with character not used in category names and filtering by it - would need rewriting sortindex only for subset of rows.

another is creating separate columns, creating or replacing indexes, etc - in categorylinks table

another is maintaining subcategories table per category


Version: unspecified
Severity: enhancement

Details

Reference
bz23682

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:59 PM
bzimport set Reference to bz23682.

In fact this feature has been disabled today: Please check if it's really necessary to disable this in all projects. Maybe this ist usefull for large projects such as wikipedias, but I don't believe this feature must be kicked off in the small ones.

On the wiki's it is disabled it would be nice for the extension to contain some dummy code or empty string return to prevent errors all over the place.

This extension is/was widely used. And to discourage it from being unlinked everywhere (since we optimists all asume it might be effecient one day) a blank return on the wiki's it's disabled on is better then an unexpected red error message or raw return of the <categorytree /> syntax

ayg wrote:

We also want to mark subcategories specially in categorylinks so that we can page subcategories separately from actual category members. Currently in a large category, you'd never find any subcategories, because they're on page 236 under "D" or whatever. People hack around this by setting manual sort keys like " ".

Stepro,
small problems doing inefficient stuff means there are thousand projects doing inefficient stuff :(

Aryeh,
sure, we could just use automatic sort key hacks too - or have separate column for this or ... (I prefer sortkey hacks :-)

Krinkle,
good idea - do you have any placeholder code for filling in extension parser tags with ""?

ayg wrote:

I also prefer sortkey hacks. Much easier to deploy.

Thanks Domas, your point is taken. I've re-enabled CategoryTree so that we can have functional wikis while a solution is being developed.

dpld wrote:

Thanks Tim for putting common sense first and re-enabling the extension until a real solution is provided.

Proposed fix for the problem of expensive queries returning zero rows in r66987.

I knew it!!! :-)

zero rows is just one of cases, there are other expensive queries that return 10, 20, 30 rows - all with low selectivity. I have no idea why anyone thinks that is a feasible fix.

I don't know the technical issues here, but from a user perspective, disabling this without notification when it's widely used across many wikis is less than ideal. If there are performance issues, wouldn't it be best to work them out and put a change in... not just turn it off, unless there is critical reason to do so.

Philippe, developers should've known that subcategory fetches are absolutely inefficient (and features like that have been disabled quite a few times, starting with simple categories listing, then API/ajax, then this).

Instead of fixing the root problem, some people prefer to hide this functionality in 'extensions', probably expecting to have different roll-out and less review/attention (just like with DPL :)

OTOH, users don't announce about their million-page-sized actions beforehand, do they?

(In reply to comment #9)

I knew it!!! :-)

zero rows is just one of cases, there are other expensive queries that return
10, 20, 30 rows - all with low selectivity. I have no idea why anyone thinks
that is a feasible fix.

I didn't say it was a feasible fix for the whole bug, just that it was a feasible fix for the case where there are zero subcategories.

makes sense! :) Thanks Tim! This would cover some edge cases like 'Living people', until, of course, there's a single subcategory there.

(In reply to comment #11)
I remember asking before at bug 19640 if this was an issue with CategoryTree too, but never got a response there. Apparently it is, then. Maybe now that Brion is no longer CTO, someone could finally implement the sortkey hack you proposed? ;)

I've committed r67179 now, which comes closer to a complete workaround. I'll deploy it shortly.

For a final fix, we should aim to have a carefully considered design, possibly including solutions for bug 164, bug 1211, etc.

ayg wrote:

I'll be working on this bug. I hope to have a (proper) solution coded up within a couple of weeks. I wrote a post to wikitech-l about it, and encourage people to respond there rather than here (since this involves several bugs):

http://lists.wikimedia.org/pipermail/wikitech-l/2010-July/048399.html

ayg wrote:

This should be fixed in r71174, although I don't understand CategoryTree well enough to claim that it's fixed for all uses of the extension. Running the query that MediaWiki now produces, on a category with 1000 subcategories and 1000 other pages in the category:

mysql> SHOW STATUS LIKE 'Handler\_%';
+----------------------------+-------+

Variable_nameValue

+----------------------------+-------+

Handler_commit0
Handler_delete0
Handler_discover0
Handler_prepare0
Handler_read_first0
Handler_read_key0
Handler_read_next0
Handler_read_prev0
Handler_read_rnd0
Handler_read_rnd_next62
Handler_rollback0
Handler_savepoint0
Handler_savepoint_rollback0
Handler_update0
Handler_write60

+----------------------------+-------+
15 rows in set (0.00 sec)

mysql> SELECT page_id,page_namespace,page_title,page_is_redirect,page_len,page_latest,cl_to,cl_from,cat_id,cat_title,cat_subcats,cat_pages,cat_files FROM page JOIN categorylinks ON ((cl_from = page_id)) LEFT JOIN category ON ((cat_title = page_title AND page_namespace = 14)) WHERE cl_to = 'Test' AND cl_type = 'subcat' ORDER BY cl_type, cl_sortkey LIMIT 200;
...snip...

mysql> SHOW STATUS LIKE 'Handler\_%';
+----------------------------+-------+

Variable_nameValue

+----------------------------+-------+

Handler_commit1
Handler_delete0
Handler_discover0
Handler_prepare0
Handler_read_first0
Handler_read_key404
Handler_read_next199
Handler_read_prev0
Handler_read_rnd0
Handler_read_rnd_next62
Handler_rollback0
Handler_savepoint0
Handler_savepoint_rollback0
Handler_update0
Handler_write60

+----------------------------+-------+
15 rows in set (0.00 sec)