
$wgCapitalLinks should be a per-namespace setting
Closed, ResolvedPublic

Description

During a discussion on IRC, it was suggested that customizing link capitalization per namespace might be helpful. While trying to implement this with a new method (Namespace::isUpperCaseNS), I ran into several issues:

  1. At a few points, $wgCapitalLinks is consulted before a namespace has been selected, since a ucfirst/lcfirst is often applied to the text before it is passed to Title::newFromText(). In some cases (such as Special:Upload and FileRepo), the namespace is easy to guess. However, a few places make it hard to determine.
  2. I haven't encountered a problem with this yet, but I don't know whether making it /per/ namespace might cause one. If someone made their content namespace upper, and the associated talk namespace free-form, would this cause any issues in seemingly unrelated areas?

I'm not sure what to do about the places where no namespace has been determined yet. As for the second issue, to prevent mismatched cases between content and talk, I was thinking of requiring that each content/talk pair of namespaces be set together to either upper or free-form.
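
The pairing idea above could be checked with something like the following. This is a minimal sketch, not code from any patch on this bug: findMismatchedPairs() is a hypothetical name, it assumes the usual MediaWiki numbering (subject namespaces have even IDs, the talk namespace is subject + 1), and it treats a missing entry as capitalized.

```php
<?php
// Hypothetical helper (not part of any attached patch): report subject
// namespaces whose talk namespace has a different capitalization setting.
// Assumes subject IDs are even and the matching talk ID is subject + 1.
function findMismatchedPairs( array $capitalByNs ): array {
    $mismatched = array();
    foreach ( array_keys( $capitalByNs ) as $ns ) {
        if ( $ns < 0 ) {
            continue; // special namespaces have no talk counterpart
        }
        $subject = $ns - ( $ns % 2 );
        $talk = $subject + 1;
        // Assumption for this sketch: an unset namespace counts as capitalized.
        $subjectCapital = $capitalByNs[$subject] ?? true;
        $talkCapital = $capitalByNs[$talk] ?? true;
        if ( $subjectCapital !== $talkCapital ) {
            $mismatched[$subject] = true; // keyed by subject ID to avoid duplicates
        }
    }
    return array_keys( $mismatched );
}

// Namespace 0 is capitalized but its talk namespace 1 is not.
$bad = findMismatchedPairs( array( 0 => true, 1 => false, 2 => true, 3 => true ) );
```

A setup step could refuse to start, or just log a warning, when the returned list is not empty.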

Hopefully I can submit a patch for this later today.


Version: unspecified
Severity: enhancement

Details

Reference
bz13750

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:06 PM
bzimport set Reference to bz13750.

Created attachment 4817
Patch against 33344

This takes care of everything except for AjaxFunctions, still some debate over whether that's needed.

attachment diff.patch ignored as obsolete

Created attachment 4818
Update against 33356

This also includes the fixes to AjaxFunctions.php (it seems $wgCapitalLinks isn't needed there after all).

attachment diff.patch ignored as obsolete

Adding this to the list of things which will be solved with the Title Rewrite.

Under the current title setup, trying to alter case sensitivity like this is going to result in either an extremely hacky method which cannot be relied on, or complete failure.

The title rewrite allows for extending of the Normalizing system, so when that is all offloaded into a single extensible and modifiable system it becomes possible to define things which should only be done in certain namespaces by testing the data being passed around.

I'll also tag bug 5134.

I'll mark this as LATER for us to come back to. If the Title rewrite will fix a lot of this, then we can wait for that.

Thanks.

I see no reason to believe this requires a massive rewrite to implement.

Most likely, all it needs is a tweak to Title::secureAndSplit(), and some cleanup of a few bits that flip titles around manually.

Adding some more dependencies. Should be fixable pretty soon, cleaning up the work on a patch right now.

Created attachment 5156
Updated patch

This updated patch should take care of all remaining uses of $wgCapitalLinks and make it a per-namespace setting. Setting it to a bool retains the old behavior of a site-wide setting.

The only breakage I see relative to current behavior is that using the new namespace-based settings changes API and export behavior. The siteinfo "case" attribute is no longer shown on sites not using the global bool setting. However, this information is now output in the namespace information, in addition to each namespace's ID and subpage support.

Also checked parser tests to make sure it didn't mess up linking, and it all appears to be fine (16 failed tests before and after patching).

attachment Capital.patch ignored as obsolete

Requires XML schema update to export data:

- 'id' => $ns
+ 'id' => $ns,
+ 'case' => MWNamespace :: isCapitalizedNamespace( $ns ) ? 'first-letter' : 'case-sensitive',

Patch would change current behavior and links -- would make any custom namespaces etc change to the broken case-sensitive behavior. Default needs to retain backwards-compatibility.
-$wgCapitalLinks = true;
+$wgCapitalLinks[ NS_MAIN ] = true;
+$wgCapitalLinks[ NS_PROJECT ] = true;
+$wgCapitalLinks[ NS_IMAGE ] = true;
+$wgCapitalLinks[ NS_TEMPLATE ] = true;
+$wgCapitalLinks[ NS_HELP ] = true;
+$wgCapitalLinks[ NS_CATEGORY ] = true;

PHP 5.3 breakage w/ 'Namespace' references:
+ if ( $this->initialCapital != Namespace::isCapitalizedNamespace( NS_IMAGE ) ) {
+ if( Namespace::isCapitalizedNamespace( NS_IMAGE ) ) {
+ if( Namespace::isCapitalizedNamespace( NS_MAIN ) ) { // Only searching the mainspace anyway
+ if( Namespace::isCapitalizedNamespace( $this->mNamespace ) && $this->mInterwiki == '') {
etc

This doesn't make sense to me; part of the point of making it configurable might presumably be to allow non-caps usernames on offsite wikis:

	/**

+ * These namespaces should always be first-letter capitalized, now and
+ * forevermore. Historically, they could've probably been lowercased too,
+ * but some things are just too ingrained now. :)
+ */
+ private static $alwaysCapitalizedNamespaces = array( NS_SPECIAL, NS_USER, NS_MEDIAWIKI );

I don't like this function name; it's long and redundant (we know it's a namespace, since there's a big fat "MWNamespace::" right before it every time we call)
+ public static function isCapitalizedNamespace( $index ) {

This is incompatible with the array/bool dichotomy, and would also spew errors in cases where there's an array but no specific entry for NS_IMAGE:

- 'initialCapital' => $wgCapitalLinks,
+ 'initialCapital' => $wgCapitalLinks[ NS_IMAGE ], // No namespace class yet :(

Created attachment 5212
Updated patch

Tweaked the previous patch to incorporate feedback from Brion.

(In reply to comment #9)

Requires XML schema update to export data:

- 'id' => $ns
+ 'id' => $ns,
+ 'case' => MWNamespace :: isCapitalizedNamespace( $ns ) ? 'first-letter' : 'case-sensitive',

Patch would change current behavior and links -- would make any custom
namespaces etc change to the broken case-sensitive behavior. Default needs to
retain backwards-compatibility.
-$wgCapitalLinks = true;
+$wgCapitalLinks[ NS_MAIN ] = true;
+$wgCapitalLinks[ NS_PROJECT ] = true;
+$wgCapitalLinks[ NS_IMAGE ] = true;
+$wgCapitalLinks[ NS_TEMPLATE ] = true;
+$wgCapitalLinks[ NS_HELP ] = true;
+$wgCapitalLinks[ NS_CATEGORY ] = true;

I wouldn't think so. If a particular namespace is undefined in $wgCapitalLinks, it ends up returning true. Also, with the is_bool() check, current true/false settings for people will remain as they currently are.
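
The lookup described in this reply might look roughly like the sketch below. The function name and signature are hypothetical (the patch's own method is not shown in this thread); it only illustrates the is_bool() check and the "undefined means true" fallback.

```php
<?php
// Sketch only: hypothetical name and signature, not the patch's code.
// $capitalLinks is either the old site-wide bool or a per-namespace array.
function isCapitalizedSketch( $capitalLinks, int $ns ): bool {
    if ( is_bool( $capitalLinks ) ) {
        // Existing true/false configurations keep their site-wide meaning.
        return $capitalLinks;
    }
    // Array form: a namespace without an entry falls back to true,
    // so custom namespaces keep the first-letter behavior.
    return isset( $capitalLinks[$ns] ) ? (bool)$capitalLinks[$ns] : true;
}
```

Using isset() rather than reading the index directly also avoids a notice when the array has no entry for the requested namespace.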

PHP 5.3 breakage w/ 'Namespace' references:
+ if ( $this->initialCapital != Namespace::isCapitalizedNamespace( NS_IMAGE ) ) {
+ if( Namespace::isCapitalizedNamespace( NS_IMAGE ) ) {
+ if( Namespace::isCapitalizedNamespace( NS_MAIN ) ) { // Only searching the mainspace anyway
+ if( Namespace::isCapitalizedNamespace( $this->mNamespace ) && $this->mInterwiki == '') {
etc

Fixed, oops.

This doesn't make sense to me; part of the point of making it configurable
might presumably be to allow non-caps usernames on offsite wikis:

/**

+ * These namespaces should always be first-letter capitalized, now and
+ * forevermore. Historically, they could've probably been lowercased too,
+ * but some things are just too ingrained now. :)
+ */
+ private static $alwaysCapitalizedNamespaces = array( NS_SPECIAL, NS_USER, NS_MEDIAWIKI );

Made two tweaks to User where names are force-capitalized to require that the capitalization of said namespace be checked. Thus, setting $wgCapitalLinks[ NS_USER ] allows for lowercase usernames now.

I don't like this function name; it's long and redundant (we know it's a
namespace, since there's a big fat "MWNamespace::" right before it every time
we call)
+ public static function isCapitalizedNamespace( $index ) {

Fixed. Now called MWNamespace::isCapitalized().

This is incompatible with the array/bool dichotomy, and would also spew errors
in cases where there's an array but no specific entry for NS_IMAGE:

- 'initialCapital' => $wgCapitalLinks,
+ 'initialCapital' => $wgCapitalLinks[ NS_IMAGE ], // No namespace class yet :(

$wgLocalFileRepo is now configured in Setup without establishing its initialCapital setting. Instead, FileRepo defaults to NS_IMAGE's capitalization during setup rather than defaulting to true (which was broken; it should've defaulted to $wgCapitalLinks).

attachment Capital.patch ignored as obsolete

Bryan.TongMinh wrote:

Still applies cleanly and works :)

There are several calls to Namespace::isCapitalized -- these will fail on PHP 5.3 and later. Make sure all calls are to the MWNamespace class.

It may be worth adding a standard function for normalizing a title prefix -- either on MWNamespace or Title -- since I see a lot of these are checks around a ucfirst() call for a term to be used in a prefix search.
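
A shared helper of the kind suggested here could be as small as the sketch below. The name capitalizePrefixSketch() and the boolean parameter are assumptions for illustration; plain ucfirst() works on single bytes, so a real implementation would need a multibyte-aware variant.

```php
<?php
// Hypothetical helper: normalize a prefix-search term for one namespace.
// $nsIsCapitalized stands in for whatever per-namespace check is used.
function capitalizePrefixSketch( string $prefix, bool $nsIsCapitalized ): string {
    if ( !$nsIsCapitalized ) {
        return $prefix; // case-sensitive namespace: leave the term untouched
    }
    // ucfirst() only uppercases a single-byte first character; titles that
    // start with a multibyte letter are returned unchanged by this sketch.
    return ucfirst( $prefix );
}
```

Call sites that currently wrap ucfirst() in their own check would then pass the term and the namespace's setting to this one function.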

The change to the export format requires an update to the export schema -- new version and updated schema file.

DefaultSettings.php fails to establish an initial value for $wgCapitalLinks -- register_globals vulnerability and may show an E_NOTICE warning.

Because the default for $wgCapitalLinks is set only for specific namespaces, all custom namespaces will end up being fully case-sensitive (not enforcing the initial caps), which is an unacceptable change in behavior.

No longer returning the site-wide case setting in API siteinfo and Special:Export siteinfo could lead to compatibility problems with bot tools.

(In reply to comment #12)

No longer returning the site-wide case setting in API siteinfo and
Special:Export siteinfo could lead to compatibility problems with bot tools.

Perhaps at some point we should add a title normalization or other type of title handling module to the API. And recommend bots start to make use of the module rather than trying to imitate MW's Title class and normalize everything on their own for comparison.

Apart from per-namespace case sensitivity, there are a number of other requests to alter title normalization floating around: complete case insensitivity, changing how underscores and spaces are treated (in some cases wikis want - rather than _), and so on.

(In reply to comment #13)

(In reply to comment #12)

No longer returning the site-wide case setting in API siteinfo and
Special:Export siteinfo could lead to compatibility problems with bot tools.

Perhaps at some point we should add a title normalization or other type of
title handling module to the API. And recommend bots start to make use of the
module rather than trying to imitate MW's Title class and normalize everything
on their own for comparison.

We already have that. http://en.wikipedia.org/w/api.php?action=query&titles=uSeR_tAlK:catrope|main_page will normalize your titles just fine. And with &redirects it'll even resolve redirects.
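
For a bot, building that request could look like the sketch below. buildNormalizeUrl() is a hypothetical helper; the action, titles and redirects parameters come from the URL above, while format=json is an addition here to get a machine-readable reply.

```php
<?php
// Hypothetical helper: build an action=query URL that asks the wiki to
// normalize the given titles instead of reimplementing Title rules locally.
function buildNormalizeUrl( string $apiBase, array $titles, bool $resolveRedirects = false ): string {
    $params = array(
        'action' => 'query',
        'titles' => implode( '|', $titles ), // the API separates titles with "|"
        'format' => 'json',                  // assumption: not in the URL quoted above
    );
    if ( $resolveRedirects ) {
        $params['redirects'] = '1';
    }
    // http_build_query() URL-encodes the values, including "|" and ":".
    return $apiBase . '?' . http_build_query( $params );
}

$url = buildNormalizeUrl(
    'http://en.wikipedia.org/w/api.php',
    array( 'uSeR_tAlK:catrope', 'main_page' ),
    true
);
```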

Bryan.TongMinh wrote:

Updated patch

  • Added Title::capitalize function
  • Return at least something for the case setting in ApiQuerySiteInfo and Export
  • If nothing set for a namespace it will default to capitalize

I have no clue about XML schemas so somebody else'd better do that.

attachment patch.txt ignored as obsolete

Created attachment 5863
Updated patch

Handles everything identical to Bryan's patch, plus a few things

  • Updated code to current standards (and applies cleanly to head)
  • Update XML schema from 0.3 to 0.4 (updated Export to indicate this).

attachment wgcapitallinks.patch ignored as obsolete

+<schema xmlns="http://www.w3.org/2001/XMLSchema"
+ xmlns:mw="http://www.mediawiki.org/xml/export-0.3/"
+ targetNamespace="http://www.mediawiki.org/xml/export-0.3/"
+ elementFormDefault="qualified">

^ Either the version number should be bumped here or we should go ahead and clean up the schema/namespace.... Probably the namespace URL should *not* include the version, and the version should be in a separate 'version' element. It *should* be in the XML Schema URL, of course.

+ $wgCapitalLinks is a per namespace setting
+
Return something sensible so that bots don't choke
+ $data['case'] = 'per-namespace';

If the default site behavior is going to be unchanged from the old default, we probably shouldn't change our output here. Either the default should still be a blanket 'true', or we should go ahead and output a firm answer here if everything's set to true.

+ * @since 1.14 - This can now be set per-namespace. Some special namespaces (such
^ needs updating to 1.15 :)

-$wgCapitalLinks = true;
+$wgCapitalLinks = array();
+$wgCapitalLinks[ NS_MAIN ] = true;
+$wgCapitalLinks[ NS_USER ] = true;
+$wgCapitalLinks[ NS_PROJECT ] = true;
+$wgCapitalLinks[ NS_FILE ] = true;
+$wgCapitalLinks[ NS_TEMPLATE ] = true;
+$wgCapitalLinks[ NS_HELP ] = true;
+$wgCapitalLinks[ NS_CATEGORY ] = true;
^ If the default is 'true' for all unset namespaces, there's no need to list any explicitly.

It may make more sense to keep $wgCapitalLinks as a way to set the default, and have a second config variable with per-namespace overrides. That would also allow a site which is mostly case-sensitive to define a single forced-capital namespace without explicitly setting every other standard and custom namespace.

Created attachment 5866
Update again

  • $wgCapitalLinks is now a boolean again, setting the default across all namespaces ('per-namespace' has been dropped from XSD and API/Export output)
  • $wgCapitalLinkOverrides is the array that allows per-namespace values. This info is still given in the case attribute of namespace XML.
  • XSD updated to 0.4. Tweaked URLs and added version attribute.
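
The split into a default plus overrides can be pictured as in the sketch below. The function names are hypothetical, the settings are passed as parameters rather than read from globals, and any always-capitalized namespaces are left out; the two 'case' strings are the ones used in the export diff earlier in this thread.

```php
<?php
// Sketch only: $default plays the role of $wgCapitalLinks (bool) and
// $overrides the role of $wgCapitalLinkOverrides (namespace ID => bool).
function isCapitalizedWithOverrides( int $ns, bool $default, array $overrides ): bool {
    return isset( $overrides[$ns] ) ? (bool)$overrides[$ns] : $default;
}

// Value for the per-namespace 'case' attribute in API/export output.
function caseAttributeSketch( int $ns, bool $default, array $overrides ): string {
    return isCapitalizedWithOverrides( $ns, $default, $overrides )
        ? 'first-letter'
        : 'case-sensitive';
}

// A mostly case-sensitive site with one forced-capital custom namespace
// (100 is an arbitrary example ID).
$case = caseAttributeSketch( 100, false, array( 100 => true ) );
```

With this shape, a site only lists the namespaces that differ from its default, which is the case raised in the previous review comment.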


No more feedback? Should ^demon go ahead and apply?

If this is ready then feel free to add to the newly created export-0.4.xsd created on r54472. I'd like to release it this week to make the snapshots fully documented by the new schema definition.

Need Brion to give this a once-over one last time...never got any feedback after the February 25 patch.

Done in r57558. Merged into 0.4 XSD rather than bumping to 0.5

gangleri wrote:

Thanks for fixing this!

*** Bug 9254 has been marked as a duplicate of this bug. ***