
meta=siteinfo&siprop=namespaces|namespacealiases should return canonical names as well
Closed, Resolved, Public

Description

Current output (excerpt of cswiki):

<query>

<namespaces>
  <ns id="-2">Média</ns>
  <ns id="-1">Speciální</ns>
  <ns id="0" subpages="" />
  <ns id="1" subpages="">Diskuse</ns>

...

  <ns id="6">Soubor</ns>
  <ns id="7" subpages="">Soubor diskuse</ns>

...

</namespaces>
<namespacealiases>
  <ns id="4">WP</ns>
  <ns id="6">Image</ns>
  <ns id="7">Image talk</ns>
</namespacealiases>

</query>

Suggested output:

<query>

<namespaces>
  <ns id="-2" canonical="Media">Média</ns>
  <ns id="-1" canonical="Special">Speciální</ns>
  <ns id="0" canonical="" subpages="" />
  <ns id="1" canonical="Talk" subpages="">Diskuse</ns>

...

  <ns id="6" canonical="File">Soubor</ns>
  <ns id="7" canonical="File talk" subpages="">Soubor diskuse</ns>

...

</namespaces>
<namespacealiases>
  <ns id="4" canonical="Project">WP</ns>
  <ns id="6" canonical="File">Image</ns>
  <ns id="7" canonical="File talk">Image talk</ns>
</namespacealiases>

</query>

Alternatively, create a new siprop=canonicalnamespaces.

Any solution to deliver canonical namespaces via API is welcome.
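As a rough illustration of what a client could do with the suggested `canonical` attribute, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It parses a trimmed copy of the suggested output (the `...` gaps dropped) and maps each canonical name to the cswiki localized name. `canonical_to_local` is a hypothetical helper, and the input follows this proposal rather than any released API format.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the *suggested* output above (hypothetical format, "..." gaps removed).
SUGGESTED = """<query>
<namespaces>
  <ns id="-2" canonical="Media">Média</ns>
  <ns id="-1" canonical="Special">Speciální</ns>
  <ns id="0" canonical="" subpages="" />
  <ns id="1" canonical="Talk" subpages="">Diskuse</ns>
  <ns id="6" canonical="File">Soubor</ns>
  <ns id="7" canonical="File talk" subpages="">Soubor diskuse</ns>
</namespaces>
<namespacealiases>
  <ns id="4" canonical="Project">WP</ns>
  <ns id="6" canonical="File">Image</ns>
  <ns id="7" canonical="File talk">Image talk</ns>
</namespacealiases>
</query>"""


def canonical_to_local(xml_text):
    """Map each canonical namespace name to the wiki's localized name."""
    root = ET.fromstring(xml_text)
    mapping = {}
    # Only the <namespaces> section is used; aliases are ignored here.
    for ns in root.find("namespaces"):
        # ns.text is None for the main namespace (id="0"), so fall back to "".
        mapping[ns.get("canonical", "")] = ns.text or ""
    return mapping


print(canonical_to_local(SUGGESTED)["File"])
```

With such a map, a tool written against English canonical names (e.g. `File`) could find the local prefix (`Soubor`) without hard-coding per-language tables.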


Version: 1.14.x
Severity: enhancement

Details

Reference
bz16672

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 10:29 PM).
bzimport set Reference to bz16672.

soxred93 wrote:

Assigning to me, won't be too difficult to fix.

Not sure about putting them in the aliases too; perhaps just in the normal namespaces. Aliases don't have canonical names; they redirect to the localized name.

Please also consider adding the aliases defined in $wgContLang->namespaceAliases (i.e. in MessagesXX.php). AFAICT they are missing from the current output.

Thanks for fixing this, but the way it is now seems grossly inconsistent. Specifically, the following points seem odd:

a) canonical names use underscores while the rest uses space
b) custom namespaces shouldn't have a canonical name
c) having canonical names in attributes vs. aliases in a separate section.

Why and how canonical names differ from aliases is something users (and even users of the API) should not need to care about. It's an artificial, technical distinction which might go away at some point.

I suggest unifying the output format:

<namespaces>

<ns id="-2" preferred="yes">Média</ns>
<ns id="-2" canonical="yes">Media</ns>
<ns id="-1" preferred="yes">Speciální</ns>
<ns id="-1" canonical="yes">Special</ns>
<ns id="0" canonical="yes" preferred="yes"/>
<ns id="1" preferred="yes">Diskuse</ns>
<ns id="1" canonical="yes">Talk</ns>
<ns id="4" preferred="yes">WP</ns>
<ns id="4" canonical="yes">Project</ns>
<ns id="6" preferred="yes">Soubor</ns>
<ns id="6" canonical="yes">File</ns>
<ns id="6" >Image</ns>
<ns id="7" preferred="yes">Soubor diskuse</ns>
<ns id="7" canonical="yes">File talk</ns>
<ns id="7" >Image talk</ns>

</namespaces>

That is, for each entry, just say whether it's preferred (i.e. what the system itself generates) and whether it's canonical. That info can easily be ignored, though, and I can build a list of all namespaces and a map from names to namespace IDs without having to think about canonical names, aliases, etc.
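A minimal Python sketch of the benefit described here, assuming the unified format proposed above (a proposal, not actual API output): every `<ns>` entry is treated the same way, so building a name-to-ID map takes one pass. The excerpt keeps only namespaces 6 and 7, and `name_to_id` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Excerpt of the proposed unified format (hypothetical, never an actual API format).
UNIFIED = """<namespaces>
<ns id="6" preferred="yes">Soubor</ns>
<ns id="6" canonical="yes">File</ns>
<ns id="6">Image</ns>
<ns id="7" preferred="yes">Soubor diskuse</ns>
<ns id="7" canonical="yes">File talk</ns>
<ns id="7">Image talk</ns>
</namespaces>"""


def name_to_id(xml_text):
    """Map every namespace name (preferred, canonical or alias) to its numeric ID."""
    root = ET.fromstring(xml_text)
    # The preferred/canonical flags are simply ignored; every name counts.
    return {(ns.text or ""): int(ns.get("id")) for ns in root.iter("ns")}


print(name_to_id(UNIFIED))
```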

Whether this info is split into separate sections or not, I don't really care.

Info about subpages would have to be delivered separately, I guess. Or we could use a more complex scheme:
<namespaces>

<ns id="6" subpages="">
  <nsname canonical="yes">File</nsname>
  <nsname deprecated="yes">Image</nsname>
  <nsname preferred="yes">Soubor</nsname>
  <nsname >Obraz</nsname>
</ns>

</namespaces>

That should cover everything, but it would be quite incompatible with what we have now.
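Under this nested scheme, a client looking for one particular name would scan the `<nsname>` children of the matching `<ns>`. A minimal Python sketch, assuming the proposed markup (never an actual API format) and a hypothetical `find_name` helper:

```python
import xml.etree.ElementTree as ET

# The nested scheme proposed above (hypothetical markup).
NESTED = """<namespaces>
<ns id="6" subpages="">
  <nsname canonical="yes">File</nsname>
  <nsname deprecated="yes">Image</nsname>
  <nsname preferred="yes">Soubor</nsname>
  <nsname>Obraz</nsname>
</ns>
</namespaces>"""


def find_name(xml_text, ns_id, flag):
    """Return the <nsname> text carrying flag="yes" for namespace ns_id, or None."""
    root = ET.fromstring(xml_text)
    for ns in root.iter("ns"):
        if ns.get("id") == str(ns_id):
            # Linear scan over all names of this namespace.
            for name in ns.iter("nsname"):
                if name.get(flag) == "yes":
                    return name.text
    return None


print(find_name(NESTED, 6, "preferred"))
```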

(In reply to comment #5)

Thanks for fixing this, but the way it is now seems grossly inconsistent.
Specifically, the following points seem odd:

a) canonical names use underscores while the rest uses space

That's a bug, which I fixed in r44769.

b) custom namespaces shouldn't have a canonical name

Maybe, maybe not; I see arguments for and against. But since $wgCanonicalNames contains canonical names for custom namespaces too and since removing the canonical attribute for some namespaces but not others would violate expectations and be a breaking change, I'll just keep stuff the way it is. Regardless of whether custom namespaces should or shouldn't have a canonical name, removing it from the API output isn't worth the trouble.

c) having canonical names in attributes vs. aliases in a separate section.

That makes sense because a namespace can have multiple aliases, but has only one canonical name.

Why and how canonical names differ from aliases is something users (and even users of the API) should not need to care about. It's an artificial, technical distinction which might go away at some point.

I disagree. One major distinction is that canonical names are guaranteed to work on every wiki regardless of language or configuration (except of course when they change, like with Image -> File; ideally, though, canonical names don't change), while aliases and localized names only work if the configuration is right.
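This distinction matters for clients that resolve a title prefix to a namespace. A minimal Python sketch, assuming hypothetical lookup tables filled from the cswiki excerpts in this report: it tries the localized names, then the canonical names, then the aliases. It only converts underscores to spaces and ignores the case folding and other normalization MediaWiki itself applies; `namespace_of` is a hypothetical helper.

```python
# Hypothetical cswiki name tables, taken from the excerpts in this report.
LOCAL = {"Diskuse": 1, "Soubor": 6, "Soubor diskuse": 7}
CANONICAL = {"Talk": 1, "File": 6, "File talk": 7}
ALIASES = {"WP": 4, "Image": 6, "Image talk": 7}


def namespace_of(title):
    """Return (namespace ID, rest of title); unknown prefixes mean main namespace 0."""
    prefix, sep, rest = title.partition(":")
    if sep:
        key = prefix.strip().replace("_", " ")
        # Localized names first, then canonical names, then aliases.
        for table in (LOCAL, CANONICAL, ALIASES):
            if key in table:
                return table[key], rest
    return 0, title


print(namespace_of("File:Example.jpg"))
```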

I suggest unifying the output format:

<namespaces>

<ns id="-2" preferred="yes">Média</ns>
<ns id="-2" canonical="yes">Media</ns>
<ns id="-1" preferred="yes">Speciální</ns>
<ns id="-1" canonical="yes">Special</ns>
<ns id="0" canonical="yes" preferred="yes"/>
<ns id="1" preferred="yes">Diskuse</ns>
<ns id="1" canonical="yes">Talk</ns>
<ns id="4" preferred="yes">WP</ns>
<ns id="4" canonical="yes">Project</ns>
<ns id="6" preferred="yes">Soubor</ns>
<ns id="6" canonical="yes">File</ns>
<ns id="6" >Image</ns>
<ns id="7" preferred="yes">Soubor diskuse</ns>
<ns id="7" canonical="yes">File talk</ns>
<ns id="7" >Image talk</ns>

</namespaces>

That is, for each entry, just say whether it's preferred (i.e. what the system itself generates) and whether it's canonical. That info can easily be ignored, though, and I can build a list of all namespaces and a map from names to namespace IDs without having to think about canonical names, aliases, etc.

I don't like this scheme, especially because the number of <ns id="-2" ...> tags is variable. Also, in JSON and other formats, the ns ID is used as an array key, which this scheme would break.
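To see why a variable number of entries per ID clashes with formats that use the ID as an array key, consider this minimal Python sketch. The dict stands in for a JSON object keyed by namespace ID; the `(id, flags, name)` tuples and the `keyed_by_id` helper are hypothetical.

```python
# Hypothetical entries in the proposed unified format: (id, flags, name).
entries = [
    (6, {"preferred": "yes"}, "Soubor"),
    (6, {"canonical": "yes"}, "File"),
    (6, {}, "Image"),
]


def keyed_by_id(rows):
    """Key each entry by namespace ID, as formats using the ID as array key would."""
    out = {}
    for ns_id, flags, name in rows:
        # A later entry with the same ID silently overwrites the earlier one.
        out[ns_id] = {"name": name, **flags}
    return out


print(keyed_by_id(entries))
```

Only the last entry for namespace 6 survives, whereas the current scheme has exactly one entry per ID and carries the canonical name as an attribute.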

Whether this info is split into separate sections or not, I don't really care.

Info about subpages would have to be delivered separately, I guess.

That makes no sense. We might as well add the subpages attribute to the <ns> tags. The fact that doing so would duplicate information is just another indicator that the aforementioned scheme is poorly designed.

Or we could use a more complex scheme:
<namespaces>

<ns id="6" subpages="">
  <nsname canonical="yes">File</nsname>
  <nsname deprecated="yes">Image</nsname>

Note that deprecatedness isn't defined anywhere, so this particular bit of information (deprecated="yes") is not available.

  <nsname preferred="yes">Soubor</nsname>
  <nsname >Obraz</nsname>
</ns>

</namespaces>

That should cover everything, but it would be quite incompatible with what we have now.

This scheme makes finding the canonical or preferred name for a namespace more difficult, because a client would have to iterate over all <nsname> elements and find the right one. The current scheme doesn't have that problem.

The fact that namespaces and namespacealiases are separate siprops may be weird, but other than that the current scheme is not bad at all. All the other schemes you've suggested make it harder for a client to extract one particular piece of information, especially if it uses a non-XML format (because of array indices).