
Random page in this category feature
Closed, Resolved, Public

Description

Author: hemanshu_desai

Description:
MediaWiki should have a "random page in this category" feature.


Version: unspecified
Severity: enhancement

Details

Reference
bz2170

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 8:29 PM)
bzimport set Reference to bz2170.
bzimport added a subscriber: Unknown Object (MLST).

Wouldn't [[Special:Random/Category]] do that?

robchur wrote:

No. That retrieves a random category. The request is for a way of generating a
random page from a specified category.

Wiki.Melancholie wrote:

*** Bug 5399 has been marked as a duplicate of this bug. ***

anthony.bentley wrote:

My thoughts are that this could be done either by adding arguments to
SpecialRandompage or by creating a new special page for this purpose.
SpecialRandompage currently takes as its argument the namespace in which it
searches.
If the search is carried out within a category, then a namespace argument
shouldn't be required, as the search would always be over the main namespace.
(I'm not aware that users etc. can assign themselves to categories; can they?)
Would this only be required over the main namespace, or is there a good
reason for allowing a choice of namespace?
Is category (cl_from) the only index that is worth filtering on, or is it
worth having additional functionality to choose the index?

badock wrote:

I had the same idea, but with Portals instead of Categories.

dan.bolser wrote:

The DPL extension can do this (random n pages in category X).

DPL should be a part of MediaWiki :D

ayg wrote:

This probably needs a cl_random column. Marking schema-change.

dan.bolser wrote:

How about some docs as to how the feature works? Should I add a 'documentation' bug?

ayg wrote:

Note, the patch might not be acceptable for efficiency concerns. It should stay in the software but might not be enabled on Wikimedia sites.

Wiki.Melancholie wrote:

*** Bug 17068 has been marked as a duplicate of this bug. ***

Wiki.Melancholie wrote:

As this is not fixed for Wikimedia wikis, here is a nice toolserver link:

happy.melon.wiki wrote:

*** Bug 23181 has been marked as a duplicate of this bug. ***

Vasiliev's implementation was reverted in r27436.

However, I don't think we can make it faster without adding a category_random column, and it seems quite good as it is:

EXPLAIN SELECT page_namespace, page_title FROM page USE INDEX(page_random) JOIN categorylinks ON page_id = cl_from WHERE page_is_redirect = 0 AND page_random >= 0.15564 AND cl_to = 'GFDL' ORDER BY page_random LIMIT 1;

Both select_types are SIMPLE:
+---------------+--------+-------------+---------+---------------+--------------------------+
| table         | type   | key         | key_len | ref           | Extra                    |
+---------------+--------+-------------+---------+---------------+--------------------------+
| page          | range  | page_random | 8       | NULL          | Using where              |
| categorylinks | eq_ref | cl_from     | 261     | page_id,const | Using where; Using index |
+---------------+--------+-------------+---------+---------------+--------------------------+

It would change to "Using temporary; Using filesort" if we weren't using a LIMIT, but we are.
Accessing a page's row in categorylinks is O(1); the problem is that for every result it has to go to the page table to read page_random, and for large categories the worst case would be checking thousands of entries.
My testing shows that in practice it runs in a fraction of a second, probably thanks to the index and the uniformly distributed random numbers.
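A minimal sketch of the query pattern above, assuming an in-memory SQLite database through Python's sqlite3 module instead of MySQL. Table and column names follow the thread (page, categorylinks, page_random, cl_from, cl_to), but the schema is trimmed to those columns, and the helper names (build_wiki, random_page_in_category) and the wraparound retry are illustrative assumptions, not necessarily what the reverted patch did.

```python
import random
import sqlite3


def build_wiki(pages):
    """Build a toy wiki. `pages` maps a page title to a list of category names."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (
            page_id          INTEGER PRIMARY KEY,
            page_namespace   INTEGER NOT NULL DEFAULT 0,
            page_title       TEXT    NOT NULL,
            page_is_redirect INTEGER NOT NULL DEFAULT 0,
            page_random      REAL    NOT NULL
        );
        CREATE INDEX page_random ON page (page_random);
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,
            cl_to   TEXT    NOT NULL,
            PRIMARY KEY (cl_from, cl_to)
        );
    """)
    rng = random.Random(0)  # fixed seed so the toy data is reproducible
    for title, categories in pages.items():
        cur = db.execute(
            "INSERT INTO page (page_title, page_random) VALUES (?, ?)",
            (title, rng.random()),
        )
        for category in categories:
            db.execute("INSERT INTO categorylinks VALUES (?, ?)",
                       (cur.lastrowid, category))
    return db


# Same shape as the MySQL query in the thread, minus the USE INDEX hint.
QUERY = """
    SELECT page_namespace, page_title
    FROM page JOIN categorylinks ON page_id = cl_from
    WHERE page_is_redirect = 0 AND page_random >= ? AND cl_to = ?
    ORDER BY page_random
    LIMIT 1
"""


def random_page_in_category(db, category, r=None):
    """Return (namespace, title) of a random page in `category`, or None."""
    if r is None:
        r = random.random()
    row = db.execute(QUERY, (r, category)).fetchone()
    if row is None:
        # Nothing at or above r: wrap around to the lowest page_random.
        row = db.execute(QUERY, (0.0, category)).fetchone()
    return row
```

As with page_random in general, a page is chosen whenever r falls in the gap just below its random value, so the pick is only approximately uniform.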

ayg wrote:

AFAICT, that will have to scan the entire page_random index in the worst case, e.g., if there are no actual pages in the category. You left out part of the EXPLAIN; this is the full output for me on enwiki (on toolserver).

    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: page
             type: range
    possible_keys: page_random
              key: page_random
          key_len: 8
              ref: NULL
             rows: 10001064
            Extra: Using where
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: categorylinks
             type: eq_ref
    possible_keys: cl_from,cl_timestamp,cl_sortkey
              key: cl_from
          key_len: 261
              ref: enwiki.page.page_id,const
             rows: 1
            Extra: Using where; Using index

Note rows: 10001064. Try running that on a large database with 'GFDL' replaced by 'Nonexistent category' and you'll see it takes forever. It's O(N) in the number of pages in the worst case, which is only acceptable for very small sites.
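To make the worst case concrete, here is a rough Python model (not MySQL itself) of the plan in the EXPLAIN output: seek into the page_random index at r, then walk forward, probing categorylinks for each page until one matches. The sorted list searched with bisect stands in for the B-tree, and a Python set stands in for the categorylinks lookup; the function name scan_from and the data sizes are hypothetical.

```python
import bisect
import random


def scan_from(index, members, r):
    """Model a range scan of the page_random index starting at r.

    `index` is a list of (page_random, page_id) pairs sorted by page_random;
    `members` is the set of page_ids in the category. Each step reads one
    index entry and does one membership probe (the eq_ref lookup).
    Returns (rows_examined, page_id or None).
    """
    start = bisect.bisect_left(index, (r,))  # first entry with page_random >= r
    examined = 0
    for _, page_id in index[start:]:
        examined += 1
        if page_id in members:
            return examined, page_id
    return examined, None


rng = random.Random(42)
n = 100_000
index = sorted((rng.random(), page_id) for page_id in range(n))

big_category = set(range(0, n, 2))  # half of all pages are members
print(scan_from(index, big_category, 0.15564))  # typically stops after a few rows
print(scan_from(index, set(), 0.0))             # (100000, None): every row examined
```

The empty-category case reads the whole index, which is the O(N) behaviour described above; a large category usually stops almost immediately, matching the fast timings reported earlier.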

Yes, I trimmed it to fit the Bugzilla width.
Hmm. You are right. Is it checking the page table before categorylinks? If it checked categorylinks first, it would return immediately when the category is empty, and otherwise be O(N) in the number of pages in the category.
MySQL should have implemented a "choose a random row matching this" :(

ayg wrote:

If it checks category first, it can't use the page_random index, so it's O(N log N) in the size of the category to sort its contents. You may as well skip the page table join and ORDER BY RAND() in that case.
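For comparison, a sketch of the category-first alternative: skip the page_random join and let the database shuffle the category's rows with ORDER BY RANDOM() (SQLite's counterpart to MySQL's RAND()). The toy schema, the (cl_to, cl_from) primary key, and the helper names are assumptions for illustration, not MediaWiki's actual tables.

```python
import sqlite3


def build_category_db(members):
    """Toy tables: `members` maps a category name to a list of page titles."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (
            page_id        INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL DEFAULT 0,
            page_title     TEXT    NOT NULL
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,
            cl_to   TEXT    NOT NULL,
            PRIMARY KEY (cl_to, cl_from)  -- lets the lookup start from the category
        );
    """)
    for category, titles in members.items():
        for title in titles:
            cur = db.execute("INSERT INTO page (page_title) VALUES (?)", (title,))
            db.execute("INSERT INTO categorylinks VALUES (?, ?)",
                       (cur.lastrowid, category))
    return db


def random_page_category_first(db, category):
    """Pick a random member by considering every categorylinks row for `category`.

    The cost grows with the size of the category: every matching row must be
    read and assigned a random key before one can be returned.
    """
    row = db.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ? "
        "ORDER BY RANDOM() LIMIT 1",
        (category,),
    ).fetchone()
    if row is None:
        return None  # empty category: answered without touching page at all
    return db.execute(
        "SELECT page_namespace, page_title FROM page WHERE page_id = ?", row
    ).fetchone()
```

This is fast for an empty or small category but scales with category size, which is why neither query shape is satisfactory on its own for large wikis.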

I don't think it would have been trivial for MySQL to implement efficient "pick a random row" without some kind of special index. In any event, they don't, so we need cl_random if we really want this enough.
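A sketch of how a cl_random column could work, again assuming SQLite via Python's sqlite3. The column type, the index name cl_to_random, the helper names, and the wraparound retry are hypothetical; this shows the general technique of storing a per-row random key next to the category, not MediaWiki's schema.

```python
import random
import sqlite3

SCHEMA = """
    CREATE TABLE categorylinks (
        cl_from   INTEGER NOT NULL,
        cl_to     TEXT    NOT NULL,
        cl_random REAL    NOT NULL,  -- hypothetical per-link random key
        PRIMARY KEY (cl_from, cl_to)
    );
    -- Equality on cl_to, then a range on cl_random: one index seek per pick.
    CREATE INDEX cl_to_random ON categorylinks (cl_to, cl_random);
"""

PICK = """
    SELECT cl_from FROM categorylinks
    WHERE cl_to = ? AND cl_random >= ?
    ORDER BY cl_random
    LIMIT 1
"""


def add_link(db, page_id, category, rng):
    """Insert a category link with a fresh random key."""
    db.execute("INSERT INTO categorylinks VALUES (?, ?, ?)",
               (page_id, category, rng.random()))


def random_member(db, category, r):
    """Return the page_id of a random member of `category`, or None."""
    row = db.execute(PICK, (category, r)).fetchone()
    if row is None:
        # Nothing at or above r in this category: wrap around.
        row = db.execute(PICK, (category, 0.0)).fetchone()
    return None if row is None else row[0]


rng = random.Random(1)
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
for page_id in (1, 2, 3):
    add_link(db, page_id, "This_weeks_TAFIs", rng)
add_link(db, 4, "Other", rng)

print(random_member(db, "This_weeks_TAFIs", rng.random()))  # one of 1, 2, 3
for plan_row in db.execute("EXPLAIN QUERY PLAN " + PICK, ("x", 0.5)):
    print(plan_row[-1])  # should mention the cl_to_random index
```

Putting cl_to first in the index lets the database jump straight to the category and read a single entry, avoiding both the full page_random scan and the sort over the whole category.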

*** This bug has been marked as a duplicate of bug 15824. ***

*** Bug 15824 has been marked as a duplicate of this bug. ***

Either there is no documentation for this fix, or the fix was never reinstated after it was reverted (see http://www.mediawiki.org/wiki/Help_talk:Random_page/Archive_1). Others are asking for it, and I have now found a need for something like this myself on en.wikipedia. There is a banner for Today's Article For Improvement that is currently populated using a bot and a lot of "template" style pages with less than optimal code; it could be greatly simplified if I could use [[:Special:Random/Category:This_weeks_TAFIs]] to pick a random article from a category for the week. The only "extra" I would ask for is that the page be picked when the page loads rather than when the link is clicked, so that a link using the "piping trick" would show the name of the article it will take you to. Can this be done?

(In reply to comment #22)


http://en.wikipedia.org/wiki/Wikipedia_talk:Today%27s_articles_for_improvement#Teahouse_TAFI_banner is the link to the full discussion for using this feature.

(In reply to comment #22)

> Either there is no documentation for this fix or the fix was never reinstated
> after it was reverted.

No, the extension in comment 20 was simply never deployed to WMF sites.

(In reply to comment #24)

> (In reply to comment #22)
>
>> Either there is no documentation for this fix or the fix was never reinstated
>> after it was reverted.
>
> No, the extension in comment 20 was simply never deployed to WMF sites.

Can this bug be re-opened since there was no actual implemented fix?

(In reply to comment #25)

> Can this bug be re-opened since there was no actual implemented fix?

No, this bug (implementing such a feature in MediaWiki) is fixed. If you want it deployed on enwiki or another WMF site, file a new bug under the Wikimedia category.