SiteLinkTable should apply lightweight normalization to page titles before storing them. This would avoid issues with titles being specified with or without spaces, e.g. as parameters to API calls.
The following normalization should be applied:
- strip leading and trailing whitespace
- Unicode normalization
- converting underscores to spaces (note that the items_per_site table currently stores page titles with spaces, which departs from the convention used elsewhere in the database schema)
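The three steps above can be sketched as follows. This is a hypothetical illustration, not SiteLinkTable's actual code: the function name `normalize_page_title` is invented, NFC is assumed as the Unicode normalization form (the report does not name one), and underscores are converted before stripping so that a leading or trailing underscore is trimmed like any other whitespace (an ordering choice of this sketch).

```python
import unicodedata


def normalize_page_title(title: str) -> str:
    """Lightweight, wiki-agnostic normalization of a page title.

    Hypothetical helper; NFC is an assumed normalization form.
    Deliberately NOT done here: namespace normalization, first-letter
    capitalization and redirect resolution, since all of them need
    knowledge of the target wiki.
    """
    # Unicode normalization (NFC assumed).
    title = unicodedata.normalize("NFC", title)
    # Convert underscores to spaces first, so that a leading or trailing
    # underscore becomes whitespace and is removed by the strip below.
    title = title.replace("_", " ")
    # Strip leading and trailing whitespace.
    return title.strip()


if __name__ == "__main__":
    # Both spellings map to the same stored key.
    print(normalize_page_title("  Foo_bar "))  # Foo bar
    print(normalize_page_title("Foo bar"))     # Foo bar
```

With this in place, an API caller passing `Foo_bar` and one passing ` Foo bar` would look up the same row, while `foo` and `Foo` stay distinct because case folding is left to the target wiki.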
The following normalization should not be applied:
- namespace normalization (this requires knowledge of the target wiki's config)
- first-letter capitalization (requires knowledge of the target wiki's content language, and also of its namespaces)
- redirect resolution (requires access to the target wiki's database)
Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=45111