
Sitemap doesn't count language variant entries against the url_limit, causing it to be rejected
Open, Low, Public, Feature

Description

I am currently using the script described at Manual:GenerateSitemap.php to generate a sitemap.

Is it possible to disable language variant links in the sitemap?
For example, the sitemap contains a separate URL for each language variant, but all of them point to the same page.

I only want to keep the link to "http://zh.moegirl.org/Help:DynamicPageList". How can I do that?

<url>
  <loc>http://zh.moegirl.org/Help:DynamicPageList</loc>
  <lastmod>2013-03-25T01:55:56Z</lastmod>
  <priority>0.5</priority>
</url>
<url>
  <loc>http://zh.moegirl.org/zh-cn/Help:DynamicPageList</loc>
  <lastmod>2013-03-25T01:55:56Z</lastmod>
  <priority>0.5</priority>
</url>
<url>
  <loc>http://zh.moegirl.org/zh-tw/Help:DynamicPageList</loc>
  <lastmod>2013-03-25T01:55:56Z</lastmod>
  <priority>0.5</priority>
</url>
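Until the script offers such an option, one workaround is to post-process the generated URLs and keep only the unprefixed ones. The sketch below is not a generateSitemap.php feature: it assumes variant URLs follow the `http://host/<variant>/<Title>` layout shown in the example, and the `VARIANTS` set and `canonical_only` helper are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical set of enabled variant codes; a real zh wiki may enable more (e.g. zh-hk).
VARIANTS = {"zh-cn", "zh-tw"}

def canonical_only(urls, variants=VARIANTS):
    """Drop URLs whose first path segment is a language variant code.

    Assumes variant URLs look like http://host/<variant>/<Title>; wikis that
    expose variants differently (e.g. via ?variant=) need a different test.
    """
    kept = []
    for url in urls:
        path = urlsplit(url).path                    # e.g. "/zh-cn/Help:DynamicPageList"
        first_segment = path.lstrip("/").split("/", 1)[0]
        if first_segment not in variants:
            kept.append(url)
    return kept

urls = [
    "http://zh.moegirl.org/Help:DynamicPageList",
    "http://zh.moegirl.org/zh-cn/Help:DynamicPageList",
    "http://zh.moegirl.org/zh-tw/Help:DynamicPageList",
]
print(canonical_only(urls))  # ['http://zh.moegirl.org/Help:DynamicPageList']
```

The trade-off is that search engines then learn nothing about the variant URLs, which is what this request asks for but may not suit every wiki.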


Version: 1.22.4
Severity: enhancement

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:03 AM
bzimport set Reference to bz63098.
bzimport added a subscriber: Unknown Object (MLST).

Hi zoglun. This does not sound like something is wrong in the code of MediaWiki (a so-called "bug"), but rather like a support request (how to change settings, questions about how to do something, etc.). bugzilla.wikimedia.org is only for specific bug reports and enhancement requests.
Please use https://www.mediawiki.org/wiki/Project:Support_desk for support requests to make sure that the functionality does not already exist; afterwards, an enhancement request can be filed in Bugzilla. Thanks!

This is quite confusing: I asked the same question at the support desk, and people said I should report it here: https://www.mediawiki.org/wiki/Thread:Project:Support_desk/How_to_separate_sitemap_into_series_small_file%3F

Neither Google nor Baidu (the largest search engine in China) accepts my sitemap, because it exceeds the maximum number of links allowed.

There is a "$wgDefaultLanguageVariant = "zh-cn";" setting, which lets me set the default language variant while still keeping the variant feature. But generateSitemap.php still generates three links for three identical pages.

The purpose of a sitemap is to let search engines discover pages. When these engines refuse to accept the links in sitemap.xml, that's a bug, isn't it?

(In reply to zoglun from comment #2)

I asked the same question at the support desk and people said I should report it here

That's very good, and it is information that is welcome in a ticket from the start. :)

Phrasing this ticket as a question makes it sound as if you simply do not know whether it's possible and have a support question; however, the comments on the Support Desk ("I don't think you can split the sitemap up like that") make clear that it is very likely not currently possible, so this is a valid feature request.

Nbdd0121 subscribed.

This is indeed a bug. The 50,000-URL limit is currently applied to the number of articles rather than to the number of entries generated.
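The difference between the two counts can be illustrated with a short Python sketch. This is not the PHP code in generateSitemap.php; `sitemap_entries` and `split_sitemaps` are hypothetical helpers, and the `<variant>/<Title>` path layout is an assumption taken from the example earlier in this task. Every generated entry, base URL or variant, counts toward the limit.

```python
URL_LIMIT = 50_000  # maximum number of URLs allowed in one sitemap file

def sitemap_entries(titles, variants):
    """Yield every path the sitemap would list: the base title plus one per variant."""
    for title in titles:
        yield title
        for variant in variants:
            yield f"{variant}/{title}"

def split_sitemaps(titles, variants, limit=URL_LIMIT):
    """Split the generated entries (not the pages) into chunks of at most `limit`."""
    entries = list(sitemap_entries(titles, variants))
    return [entries[i:i + limit] for i in range(0, len(entries), limit)]

titles = [f"Page{i}" for i in range(20_000)]
variants = ["zh-cn", "zh-tw"]

# Counting pages suggests a single file is enough ...
print(len(titles) <= URL_LIMIT)    # True
# ... but each page yields 3 entries (60,000 in total), so two files are needed.
files = split_sitemaps(titles, variants)
print([len(f) for f in files])     # [50000, 10000]
```

With two variants enabled, a single file checked by page count can silently hold up to three times the allowed number of URLs.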

Change 290143 had a related patch set uploaded (by Nbdd0121):
Count language variant sitemap entries for url_limit

https://gerrit.wikimedia.org/r/290143

Ciencia_Al_Poder renamed this task from Separate sitemap into a series of smaller files (for search engines) to Sitemap doesn't count language variant entries against the url_limit, causing it to be rejected. Aug 1 2019, 6:16 PM

Change 609513 had a related patch set uploaded (by VulpesVulpes825; owner: VulpesVulpes825):
[mediawiki/core@master] Write language varaint link as child element rather than individual entry in sitemap
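The idea behind this change, listing variants as child elements of one `<url>` entry instead of as separate entries, can be sketched with the `xhtml:link rel="alternate" hreflang="…"` annotation that Google documents for localized sitemap entries. This is not the code from change 609513. The `build_urlset` helper, the use of variant codes directly as `hreflang` values, and marking the unprefixed URL as `x-default` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SM_NS)          # sitemap elements use the default namespace
ET.register_namespace("xhtml", XHTML_NS)  # alternates are written as xhtml:link

def build_urlset(base, titles, variants):
    """One <url> per page; each variant becomes an xhtml:link child, not a new entry."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for title in titles:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = f"{base}/{title}"
        # Assumption: the unprefixed URL is listed as the x-default alternate.
        alternates = [("x-default", f"{base}/{title}")]
        alternates += [(v, f"{base}/{v}/{title}") for v in variants]
        for hreflang, href in alternates:
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          {"rel": "alternate", "hreflang": hreflang, "href": href})
    return urlset

root = build_urlset("http://zh.moegirl.org", ["Help:DynamicPageList"], ["zh-cn", "zh-tw"])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

With this layout the number of `<url>` entries equals the number of pages, so the existing per-article count would match the protocol limit again.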

https://gerrit.wikimedia.org/r/609513

Change 609513 abandoned by VulpesVulpes825:

[mediawiki/core@master] Write language varaint link as child element rather than individual entry in sitemap

Reason:

https://gerrit.wikimedia.org/r/609513

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:13 AM
Aklapper removed a subscriber: wikibugs-l-list.

Change 808009 had a related patch set uploaded (by PleaseStand; author: PleaseStand):

[mediawiki/core@master] generateSitemap.php: Fix a couple limit checking bugs

https://gerrit.wikimedia.org/r/808009

Aklapper added a subscriber: VulpesVulpes825.

@VulpesVulpes825: Removing the task assignee, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 22nd, 2023.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator. Thanks!