
add comment to generateSitemap.php saying what is encoded
Closed, Resolved; Public

Description

In generateSitemap.php kindly add a comment explaining what

$title = Title::makeTitle( $namespace, str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83" );

is all about.
Say what is encoded in the string, or which encoding it uses.


Version: 1.15.x
Severity: trivial

Details

Reference
bz17961

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:31 PM
bzimport set Reference to bz17961.
bzimport added a subscriber: Unknown Object (MLST).

It seems to be constructing a title consisting of 255 bytes, each of which needs to be URL-encoded: that should be about as long as a URL for a page in the given namespace can get.
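The byte count above can be checked directly. This is a minimal sketch, not code from generateSitemap.php: it rebuilds the same string literal and measures it before and after URL encoding. The variable names are made up for illustration, and rawurlencode() is only assumed to be a reasonable stand-in for whatever encoding the script applies to URLs.

```php
<?php
// The same byte string that is passed to Title::makeTitle() in the description.
$text = str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83";

// 63 repeats of a 4-byte sequence plus one 3-byte sequence: 63 * 4 + 3 bytes.
$byteLength = strlen( $text );

// Every byte here is >= 0x80, so rawurlencode() turns each one into "%XX".
$encodedLength = strlen( rawurlencode( $text ) );

echo "bytes: $byteLength, URL-encoded length: $encodedLength\n";
// prints "bytes: 255, URL-encoded length: 765"
```

Since no byte in the string survives encoding unchanged, the encoded form is exactly three times the raw length, which fits the reading that the title is meant to be a worst case for URL length.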

The title is valid UTF-8, and consists of 63 repeats of the 4-byte character

WTF... Bugzilla truncated my comment when I tried to include the 4-byte Unicode character in it. Trying again with the actual characters removed:

The title is valid UTF-8, and consists of 63 repeats of the 4-byte character U+28B81 followed by the 3-byte character U+5583. My browser can't display the first character, but Googling for it led me to [[Prince of Tang (Shaowu)]] where it is used in the subject's personal name and discussed in a footnote. The second character apparently means "keep talking, chattering; mumble" according to http://en.wiktionary.org/wiki/%E5%96%83
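To confirm the two code points named above, the byte sequences can be decoded by hand. The helper below, utf8CodePoint(), is hypothetical and written only for this illustration. It assumes its input is exactly one well-formed UTF-8 sequence and does no validation.

```php
<?php
// Hypothetical helper: decode one well-formed UTF-8 sequence to its code point.
function utf8CodePoint( string $bytes ): int {
	$len = strlen( $bytes );
	// Payload bits carried by the lead byte, keyed by sequence length.
	$masks = [ 1 => 0x7F, 2 => 0x1F, 3 => 0x0F, 4 => 0x07 ];
	$codePoint = ord( $bytes[0] ) & $masks[$len];
	for ( $i = 1; $i < $len; $i++ ) {
		// Each continuation byte contributes its low six bits.
		$codePoint = ( $codePoint << 6 ) | ( ord( $bytes[$i] ) & 0x3F );
	}
	return $codePoint;
}

printf( "U+%04X\n", utf8CodePoint( "\xf0\xa8\xae\x81" ) ); // prints "U+28B81"
printf( "U+%04X\n", utf8CodePoint( "\xe5\x96\x83" ) );     // prints "U+5583"
```

The 4-byte sequence decodes to a code point above U+FFFF, which is why it needs four bytes in UTF-8, while U+5583 fits in three.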

P.S. Explanatory comment added in r79769.

OK, so they are just using some fun arbitrary characters maybe.
Perhaps add a note somewhere saying how such strings will be used. It's not too clear from the code.