Limiting sitemap items by namespace
Closed, Resolved (Public)

Description

Author: sergey.chernyshev

Description:
Namespace limit patch for generateSitemap.php

Sometimes, in addition to restricting crawling with robots.txt, it's a good idea to limit what goes into the sitemap.

E.g. it might be helpful when some extensions create non-content namespaces (similar to the MediaWiki namespace) and it doesn't make sense to include them in the sitemap.

Attached is a patch that allows the user to specify a list of namespaces for which to generate sitemaps.


Version: unspecified
Severity: enhancement

attachment sitemaps_namespace_limit.patch ignored as obsolete

Details

Reference
bz12860

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:02 PM)
bzimport set Reference to bz12860.
bzimport added a subscriber: Unknown Object (MLST).

Perhaps do this via command-line options instead of a site config var?

sergey.chernyshev wrote:

Not sure. There are usually quite a lot of namespaces in the configuration, and it's a pain in the neck to put them all on the command line. Besides, LocalSettings.php is usually changed a lot anyway, and adding more to it is quite usual (I have tons of things in there).

Also, it seems that having a blacklist instead of a whitelist might be a good idea as well, because there are fewer namespaces to exclude and this list is usually constant: if someone adds a new namespace, it's most probably content and should be indexed by crawlers.

I've added changes for exclusion and fixed a bug with undefined variable.
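
For illustration, the whitelist/blacklist idea discussed above could be sketched roughly as follows. This is not the attached patch: the function name selectSitemapNamespaces() and the $exclude parameter are hypothetical, and only the whitelist corresponds to the $wgSitemapNamespaces setting mentioned in this report. Plain integer namespace IDs are used so the snippet runs outside MediaWiki.

```php
<?php
// Sketch only: $include mirrors a whitelist such as $wgSitemapNamespaces;
// $exclude is a hypothetical blacklist, not an existing MediaWiki setting.

/**
 * @param int[]       $all     Every namespace ID that has pages.
 * @param int[]|false $include Whitelist, or false for "no restriction".
 * @param int[]       $exclude Blacklist, applied after the whitelist.
 * @return int[] Namespace IDs to generate sitemaps for.
 */
function selectSitemapNamespaces( array $all, $include, array $exclude = array() ) {
	// With no whitelist, start from every namespace.
	$selected = is_array( $include ) ? array_intersect( $all, $include ) : $all;
	// array_values() reindexes after the filters drop entries.
	return array_values( array_diff( $selected, $exclude ) );
}

// Core namespace IDs: 0 = main, 1 = talk, 8 = MediaWiki, 14 = category.
$all = array( 0, 1, 8, 14 );

// Blacklist only: everything except the MediaWiki namespace.
print_r( selectSitemapNamespaces( $all, false, array( 8 ) ) );

// Whitelist only: main and category namespaces.
print_r( selectSitemapNamespaces( $all, array( 0, 14 ) ) );
```

With a blacklist, a newly added namespace is included by default, which matches the argument above that new namespaces are most probably content.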

sergey.chernyshev wrote:

Patch to implement blacklisting and whitelisting namespaces for sitemap generation

sergey.chernyshev wrote:

Oops. Patch also adds full server URL to sitemap - I believe I saw a bug for it, but can't remember where.
Feel free to remove the change if you don't feel like adding it.

robert wrote:

Fixed in r33498 using a slightly modified version of the first patch. Exclusion may be added later, but it seems a rather large jump from no discrimination whatsoever; I suggest you open another bug for that.

Perhaps add a usage example. Say: in LocalSettings.php put:

    $wgSitemapNamespaces = array(
        NS_MAIN,
        NS_TALK,
        // ...
        NS_CATEGORY,
        NS_CATEGORY_TALK,
    );

Actually, it seems a waste to put it in LocalSettings.php, as it might be used only 1/999999 of the times that file is read...