Multiple ranges can be specified in backlinks query, also implicit equality propagation used
Closed, Resolved (Public)

Description

First of all, page_id is not in the pagelinks table index; pl_from should be used as the predicate instead (I assume people enjoyed the 5.0 behavior here ;-)

Also, if multiple pl_namespace/pl_title values are specified, it should not be possible to limit pl_from/page_id: the index read is split into multiple reads at the pl_namespace/pl_title level, so an efficient range read from the index is not possible.
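
A minimal sketch of the two points above, in Python rather than the actual MediaWiki code. The helper name `build_backlinks_where` and the `?` placeholder style are assumptions made for illustration only. It filters on pl_from rather than page_id, and refuses to add the range condition when more than one (pl_namespace, pl_title) pair is requested.

```python
from typing import List, Optional, Tuple

Target = Tuple[int, str]  # (pl_namespace, pl_title)


def build_backlinks_where(
    targets: List[Target], continue_from: Optional[int] = None
) -> Tuple[str, list]:
    """Hypothetical helper: build a WHERE clause for a pagelinks lookup."""
    if not targets:
        raise ValueError("at least one (pl_namespace, pl_title) pair is required")

    # One equality pair per target; these columns are the ones the report
    # says the index is read by.
    pair_sql = " OR ".join(["(pl_namespace = ? AND pl_title = ?)"] * len(targets))
    params: list = [value for pair in targets for value in pair]
    where = f"({pair_sql})"

    if continue_from is not None:
        if len(targets) > 1:
            # With several targets the index is read once per target, so a
            # single pl_from range cannot be served as one range read.
            raise ValueError("pl_from range is not allowed with multiple targets")
        # Use pl_from, which lives in pagelinks, rather than page_id.
        where += " AND pl_from >= ?"
        params.append(continue_from)

    return where, params
```

Rejecting the combination outright is only one possible policy; silently dropping the range, or continuing on a compound key, are alternatives.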


Version: 1.14.x
Severity: major
URL: http://p.defau.lt/?jn_bId8zq_f3pcb3ugLkWA

Details

Reference
bz16076

Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 10:17 PM)
bzimport set Reference to bz16076.

Could you be a little more clear in your suggestions for improvement? From the first paragraph I deduced that page_id>=123 should be pl_from>=123, which I assume is what you mean by "implicit equality propagation" in the summary.

As to the second paragraph: does this mean that if multiple values for (pl_namespace, pl_title) are queried, the page_id>=123 (or pl_from>=123) clause shouldn't be there?

Fixed in r42494 and r42512.

For your entertainment: it turns out the filesort that happened when pl_from>=123 was set and multiple (pl_namespace, pl_title) pairs were queried actually pointed me to a more fundamental bug, which involved list=backlinks&blredirect dropping results under certain conditions. The lesson here is that because we have sane indices on the pagelinks table, non-indexed (and therefore inefficient) queries are usually buggy. I never expected this database performance hell to actually fix my bugs for me...

About the "certain conditions": assume B and C are redirects to A, D and E link to B, and F links to C. Also, E's page ID is larger than D's, while F's is smaller than D's. If bllimit is set in such a way that the result is cut off after D (i.e. D is the last result), the continued query (with query-continue) will list E but not F, even though F should be listed. I know this is complex like hell and it took me about 20 minutes to come up with this example. The moral of this story is that you should always build your queries around indices and never throw in unindexed stuff unless you really know what it does.
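
The scenario can be reproduced with a toy model in Python. This is only a sketch: the page IDs (F=10, D=20, E=30), the function names and the in-memory list are made up, and it does not claim to show what r42494 or r42512 actually changed. `fetch_by_id` continues on the page ID alone and loses F; `fetch_by_key` continues on the (redirect title, page ID) pair, which matches the order the rows are read in.

```python
from typing import Callable, List, Optional, Tuple

# (redirect title, linking page ID, linking page name); IDs are invented.
Row = Tuple[str, int, str]
ROWS: List[Row] = sorted([("B", 20, "D"), ("B", 30, "E"), ("C", 10, "F")])


def fetch_by_id(limit: int, cont: Optional[int] = None):
    """Buggy variant: the continuation filters on the page ID only."""
    rows = [r for r in ROWS if cont is None or r[1] >= cont]
    page, rest = rows[:limit], rows[limit:]
    next_cont = rest[0][1] if rest else None
    return [r[2] for r in page], next_cont


def fetch_by_key(limit: int, cont: Optional[Tuple[str, int]] = None):
    """Variant that continues on the compound (title, page ID) key."""
    rows = [r for r in ROWS if cont is None or (r[0], r[1]) >= cont]
    page, rest = rows[:limit], rows[limit:]
    next_cont = (rest[0][0], rest[0][1]) if rest else None
    return [r[2] for r in page], next_cont


def collect(fetch: Callable, limit: int) -> List[str]:
    """Follow continuation values until the result set is exhausted."""
    names: List[str] = []
    cont = None
    while True:
        page, cont = fetch(limit, cont)
        names.extend(page)
        if cont is None:
            return names


print(collect(fetch_by_id, 1))   # F is dropped
print(collect(fetch_by_key, 1))  # D, E and F are all listed
```

With a limit of 1 the first request ends at D. The ID-only continuation then asks for page IDs from 30 upwards, which excludes F (ID 10) even though F sorts after E in the (title, page ID) order.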