
generateSitemap.php neglects newlines for unpopulated namespaces
Closed, Resolved (Public)

Description

Gentlemen, let us observe what happens to the report that
generateSitemap.php prints to STDOUT (not the sitemaps themselves) when
not all of the namespaces listed in, e.g.,

$wgSitemapNamespaces=array(NS_MAIN,NS_PROJECT,NS_TEMPLATE_TALK,NS_HELP_TALK,NS_CATEGORY_TALK,NS_USER);

in one's LocalSettings.php are populated:

0 () /home/jidanni/mediawiki/sitemap-mwabj-NS_0-0.xml.gz
4 (ABJ)11 (Template talk)13 (Help talk)15 (Category talk)2 (User) /home/jidanni/mediawiki/sitemap-mwabj-NS_2-0.xml.gz

The problem is that if there are no items to print for a given namespace, the line

$this->output( "\t$this->fspath$filename\n" );

will not fire, so no "\n" gets printed. The next line then gets stuck
onto the end of the current one, garbling the report!
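
The failure mode can be reproduced with a minimal sketch. This is a simplified model, not the actual generateSitemap.php source: the function name buggyReport, the input array shape, and the file name pattern are hypothetical, and it assumes only that the namespace label is printed unconditionally while the path and its trailing newline are printed inside a conditional.

```php
<?php
// Hypothetical stand-in for the report loop; NOT the real generateSitemap.php code.
// $pagesPerNamespace maps namespace ID => [ label, number of pages ].
function buggyReport( array $pagesPerNamespace, string $fspath ): string {
    $out = '';
    foreach ( $pagesPerNamespace as $ns => [ $label, $count ] ) {
        // The label is always printed, with no newline of its own.
        $out .= "{$ns} ({$label})";
        if ( $count > 0 ) {
            // The only newline lives in this branch, so an empty
            // namespace never terminates its line.
            $out .= "\t{$fspath}sitemap-NS_{$ns}-0.xml.gz\n";
        }
    }
    return $out;
}

// Namespace 4 has no pages, so its label runs into the next entry.
echo buggyReport( [ 0 => [ '', 3 ], 4 => [ 'ABJ', 0 ], 2 => [ 'User', 1 ] ], '/tmp/' );
```

With this input the second output line starts with "4 (ABJ)2 (User)", the same kind of run-together line shown in the report above.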


Version: 1.17.x
Severity: trivial

Details

Reference
bz26134

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 11:22 PM)
bzimport set Reference to bz26134.
bzimport added a subscriber: Unknown Object (MLST).

Care to provide a patch, if this is as easy as your tag says (and I do believe your tag)?

reedy@ubuntu64-esxi:~/mediawiki/trunk/phase3/maintenance$ php generateSitemap.php
0 () /home/reedy/mediawiki/trunk/phase3/maintenance/sitemap-wikidb-mw_-NS_0-0.xml.gz
2 (User) /home/reedy/mediawiki/trunk/phase3/maintenance/sitemap-wikidb-mw_-NS_2-0.xml.gz
6 (File) /home/reedy/mediawiki/trunk/phase3/maintenance/sitemap-wikidb-mw_-NS_6-0.xml.gz
14 (Category) /home/reedy/mediawiki/trunk/phase3/maintenance/sitemap-wikidb-mw_-NS_14-0.xml.gz

I'm failing to see an issue here. Please elaborate.

I dare not look at any more code, but instead just prove it to you again:
22:46 ~$ cd transgender-taiwan.org/maintenance/
22:46 maintenance$ php generateSitemap.php
0 () /home/jidanni/mediawiki/maintenance/sitemap-transgender-NS_0-0.xml.gz
4 (蝶園)22:46 maintenance$
Note ^the neglected newline?
That's because my namespace 4 just happens to have 0 articles on my wiki.

So try again with an unpopulated namespace or two added to the list of candidates.

Ok, I can see that.

I'm just trying to replicate it. If I just add a new namespace, it doesn't generate a sitemap for it.

It just does the ones that have pages.

Have you done anything else to get it to appear there? Set it as a content namespace or anything?

Seemingly, the fix is
Change
$this->output( "\t$this->fspath$filename\n" );
to
$this->output( "\t$this->fspath$filename" );

And then add

$this->output( "\n" );

after the if.

I'll attach a patch, can you confirm if this fixes your issue?
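
The change described above (drop the "\n" from the path line and emit it once after the conditional) can be sketched as follows. This is a simplified model under stated assumptions: fixedReport, its input array shape, and the file name pattern are hypothetical, and the real script may have additional control flow that this does not capture.

```php
<?php
// Hypothetical stand-in illustrating the proposed change; NOT the real
// generateSitemap.php code. $pagesPerNamespace maps ID => [ label, page count ].
function fixedReport( array $pagesPerNamespace, string $fspath ): string {
    $out = '';
    foreach ( $pagesPerNamespace as $ns => [ $label, $count ] ) {
        $out .= "{$ns} ({$label})";
        if ( $count > 0 ) {
            // The path is still printed only for populated namespaces...
            $out .= "\t{$fspath}sitemap-NS_{$ns}-0.xml.gz";
        }
        // ...but the newline is now unconditional, one per namespace.
        $out .= "\n";
    }
    return $out;
}

echo fixedReport( [ 0 => [ '', 3 ], 4 => [ 'ABJ', 0 ], 2 => [ 'User', 1 ] ], '/tmp/' );
```

In this model every namespace gets exactly one line, whether or not a path follows its label.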

Created attachment 8121
add newline

attachment bug26134.patch ignored as obsolete

Now it's
$ php generateSitemap.php
0 () /home/jidanni/mediawiki/maintenance/sitemap-transgender-NS_0-0.xml.gz

4 (蝶園)01:08 maintenance$

If you can tell me how to replicate the lack of a path being printed, I can try to help more.

Fill up your $wgSitemapNamespaces=array(NS_MAIN,NS_PROJECT... with lots of never-used namespaces, then run generateSitemap.php, and post what you get.

It's ok that no path is printed. It is not OK that no newline is printed.

What if we formatted the output like this? (I have a local patch; it looks OK to me.)

maintenance chad$ php generateSitemap.php
0 ()
/www/maintenance/sitemap-wikidb-mw_-NS_0-0.xml.gz
2 (User)
/www/phase3/maintenance/sitemap-wikidb-mw_-NS_2-0.xml.gz
14 (Category)
3 (User talk)
/www/phase3/maintenance/sitemap-wikidb-mw_-NS_3-0.xml.gz
maintenance chad$

Chad, that looks like a reasonably sane way of outputting it...

Why is it 0 2 14 3, and not 0 2 3 14?

Because $wgSitemapNamespaces isn't sorted, and I put them in my list in a semi-random order.
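
If a numerically ordered report is wanted, one option is to sort the array before the script reads it. This is an assumption for illustration, not something proposed in this thread. The sketch below uses plain integers in place of the NS_* constants so that it runs outside MediaWiki.

```php
<?php
// Stand-in for the LocalSettings.php value; integers replace the NS_* constants.
$wgSitemapNamespaces = [ 0, 2, 14, 3 ];

// sort() reorders the array in place, ascending by numeric value.
sort( $wgSitemapNamespaces );

echo implode( ' ', $wgSitemapNamespaces ), "\n"; // prints "0 2 3 14"
```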

Created attachment 8215
hard to understand

I still say this is very hard to understand. Especially when done for more than one site. Perhaps a header could be added saying what the columns mean or something.
