Fix translateAndCapitalizeNamespaces for Portuguese
Closed, Resolved; Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1323/
Reported by: heldergeovane
Created on: 2011-06-30 14:48:20

Per discussion on
https://pt.wikipedia.org/wiki/Wikipédia:Esplanada/propostas/Incentivar_o_uso_de_"Imagem"_em_vez_de_"Arquivo"_ou_"Ficheiro"_(12mar2011)?uselang=en
please change the function translateAndCapitalizeNamespaces (from cosmetic_changes.py) so that the bots stop making the following changes:

  • Image --> Ficheiro
  • File --> Ficheiro
  • Arquivo --> Ficheiro
  • Imagem --> Ficheiro

This is necessary to avoid linguistic problems, considering that "Arquivo" is the preferred word in Brazil but "Ficheiro" is preferred in Portugal.

For image files, the word "Imagem" is common to both Portuguese variants, and as such it is preferred, so this should be the name used when changing the namespace name of images. The use of "Ficheiro" and "Arquivo" is preferred only for other kinds of files (such as PDF or OGG), which are not images.

So, in short, the bots should make the following changes:

  • For images (i.e. files with one of the following extensions: png, gif, jpg, jpeg, svg, tiff, tif), change:
    • Image --> Imagem
    • File --> Imagem
    • Ficheiro --> Imagem
    • Arquivo --> Imagem
  • For other files (i.e. files with one of the following extensions: xcf, pdf, mid, ogg, ogv, djvu, oga):
    • Arquivo --> Do not change (we should respect the variant used by the editors)
    • Ficheiro --> Do not change (we should respect the variant used by the editors)
    • File --> Do not change (or change randomly to "Ficheiro" or "Arquivo", since it is indeed a "file" and both pt and pt-BR are acceptable)
    • Image --> Do not change (or change randomly to "Ficheiro" or "Arquivo", since it is indeed a "file" and both pt and pt-BR are acceptable)
    • Imagem --> Do not change (or change randomly to "Ficheiro" or "Arquivo", since it is indeed a "file" and both pt and pt-BR are acceptable)
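The rules above can be sketched as a small lookup by file extension. This is only an illustration of the request, not code from cosmetic_changes.py: the names `IMAGE_EXTENSIONS`, `FILE_NAMESPACE_ALIASES` and `preferred_pt_namespace` are hypothetical, and the sketch takes the "do not change" option for non-image files. It also assumes that extensions not listed above are left unchanged.

```python
import os

# Extensions treated as images, as listed in the request above.
IMAGE_EXTENSIONS = {'png', 'gif', 'jpg', 'jpeg', 'svg', 'tiff', 'tif'}

# File namespace names and aliases mentioned in this task (compared in lowercase).
FILE_NAMESPACE_ALIASES = {'image', 'file', 'ficheiro', 'arquivo', 'imagem'}


def preferred_pt_namespace(namespace, filename):
    """Return the namespace prefix to use for a file link on a pt.* wiki.

    Hypothetical helper: images get 'Imagem'; every other file keeps the
    prefix chosen by the editor.
    """
    if namespace.strip().lower() not in FILE_NAMESPACE_ALIASES:
        return namespace  # not a file link, leave it untouched
    extension = os.path.splitext(filename)[1].lstrip('.').lower()
    if extension in IMAGE_EXTENSIONS:
        return 'Imagem'
    # Non-image files (pdf, ogg, ...) and unknown extensions: do not change.
    return namespace
```

For example, `preferred_pt_namespace('File', 'Foo.PNG')` gives `'Imagem'`, while `preferred_pt_namespace('Arquivo', 'Doc.pdf')` keeps `'Arquivo'`.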

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 2:29 AM
bzimport set Reference to bz55242.
bzimport added a subscriber: Unknown Object (????).

Raising the priority since this bug is still affecting bots on all Portuguese wikis.

The bot doesn't look at the file extension of those links. To implement this behavior, that code needs to be redesigned. Maybe a future feature. If there is a way to fix namespace aliases without looking at the extension, we could do it sooner. I've deactivated translateAndCapitalizeNamespaces for the file namespace for now.
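As a rough idea of what looking at the extension could involve, here is a minimal sketch that scans wikitext for file links and rewrites the prefix only when the target is an image. It is an assumption-laden illustration, not the Pywikibot implementation: the pattern `FILE_LINK` and the function `fix_pt_image_links` are hypothetical names, the alias list is hard-coded rather than taken from the site's namespace data, and nested templates or unusual link syntax are not handled.

```python
import re

# Extensions treated as images, as listed in the task description.
IMAGE_EXTENSIONS = ('png', 'gif', 'jpg', 'jpeg', 'svg', 'tiff', 'tif')

# Hypothetical pattern: matches '[[<alias>:<name>' up to the first '|' or ']'.
FILE_LINK = re.compile(
    r'\[\[\s*(?P<ns>Image|File|Ficheiro|Arquivo|Imagem)\s*:'
    r'\s*(?P<name>[^|\]\n]+?)\s*(?=[|\]])',
    re.IGNORECASE)


def fix_pt_image_links(text):
    """Rewrite file-link prefixes to 'Imagem' for image files only."""
    def replace(match):
        name = match.group('name')
        _, dot, extension = name.rpartition('.')
        if dot and extension.lower() in IMAGE_EXTENSIONS:
            return '[[Imagem:' + name
        # Not an image (or no extension): keep the editor's wording.
        return match.group(0)
    return FILE_LINK.sub(replace, text)
```

With this sketch, `[[File:Foo.png|thumb]]` becomes `[[Imagem:Foo.png|thumb]]`, while `[[Arquivo:Doc.pdf]]` is returned unchanged.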

  • assigned_to: nobody --> xqt
  • status: open --> open-later

I guess the priority could be lowered since the code is deactivated.

It was reported against compat, but probably exists in the same code that appears in core.

jayvdb set Security to None.
jayvdb moved this task from Backlog to Wikimedia prod/Cloud Services issues on the Pywikibot board.
jayvdb removed a subscriber: Unknown Object (????).
jayvdb removed a project: Pywikibot-compat.

Removed compat, as this feature request is unlikely to be implemented there.

Hi. This bug is rather old. Is the issue reported here still happening? Thanks.

MarcoAurelio raised the priority of this task from Medium to Needs Triage. Dec 2 2017, 6:55 PM

(retriage needed, no action on this for years)

Dvorapa moved this task from Backlog to Needs Review on the Pywikibot board.
Xqt triaged this task as Lowest priority. May 18 2018, 10:30 AM
Xqt added a project: good first task.

It's not a bug, it's a feature request.

@Xqt I cannot see this script in pywikibot-core. Does this still exist? Thanks.

I was looking for translateAndCapitalizeNamespaces.py, but indeed it sounded like something in cosmetic_changes.py. Thanks.

So, to sum up, you want cosmetic_changes not to change links containing Imagem.

Change 441180 had a related patch set uploaded (by MarcoAurelio; owner: MarcoAurelio):
[pywikibot/core@master] [IMPR|WIP] cosmetic_changes: skip changing 'Imagem' links for pt.* wikis

https://gerrit.wikimedia.org/r/441180

But only for these files: png, gif, jpg, jpeg, svg, tiff, tif
For xcf, pdf, mid, ogg, ogv, djvu, oga the behavior should stay as before, except that Arquivo shouldn't be changed (or none of them should be)

Maybe it'd be easier to just ignore the whole NS:6 for pt.* wikis?

This is the current behavior of cosmetic_changes. This task is basically waiting for file link extension parsing to be added to cc.

Change 441180 abandoned by MarcoAurelio:
[IMPR|WIP] cosmetic_changes: skip changing 'Imagem' links for pt.* wikis

https://gerrit.wikimedia.org/r/441180

Dvorapa raised the priority of this task from Lowest to Low.
Dvorapa moved this task from Backlog to Deactivated code on the Pywikibot-cosmetic-changes.py board.
Dvorapa moved this task from Backlog to Doing on the good first task board.

Change 441200 had a related patch set uploaded (by Dvorapa; owner: Dvorapa):
[pywikibot/core@master] [IMPR] Fix Portuguese file namespace translation in cc

https://gerrit.wikimedia.org/r/441200

Change 441200 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Fix Portuguese file namespace translation in cc

https://gerrit.wikimedia.org/r/441200