Translation pages: display outdated translations as such, without removing them
Closed, Resolved, Public

Description

Wikis with the Translate extension enabled suffer this problem:

  1. Contributor translates all translation units of page ABC to language XYZ.
  2. Master (English) version changes one word in one paragraph (one translation unit).
  3. After marking for translation, the whole paragraph is marked as outdated (good), and then the whole paragraph appears in English on the page in language XYZ (bad most of the time).

Currently there are only two options when updating translation units:

  • Ignore the changes. Translators won't be notified about the change. Not good when you add or change meaning.
  • Apply changes. This will mark translation units affected as outdated, removing them from the displayed text. Not good if you just made a little change, but the old version in language XYZ is still more useful than a disruptive paragraph updated in English.

SOLUTION

Allow a third option: apply changes, keep translations. The banner will still say that the translation is outdated. It could be even possible to provide a special look to the outdated strings e.g. light pink background (which would perhaps invite the reader to complete the translation). The banner has the link to the original English version, if someone is really curious about the details.

This way the translation admins would have full control deciding how a change in the English text should affect all current translations, without stressing out the translators because now a page looks half broken because of one word changed.
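The three options can be sketched as a small decision function. This is only an illustrative model, not Translate's actual code: the names `UpdateMode` and `render_unit` are hypothetical, and the only detail taken from this task is the `mw-translate-fuzzy` class used to flag outdated text.

```python
from enum import Enum
from typing import Optional


class UpdateMode(Enum):
    """Hypothetical names for the choices a translation admin has when marking."""
    IGNORE = "ignore"          # do not invalidate: translators are not notified
    INVALIDATE = "invalidate"  # current behaviour: outdated units show the source text
    KEEP_OUTDATED = "keep"     # proposed: keep the old translation, but flag it


def render_unit(source: str, translation: Optional[str],
                source_changed: bool, mode: UpdateMode) -> str:
    """Return the text shown on the translation page for one translation unit."""
    if translation is None:
        # Nothing translated yet: the source text is the only thing to show.
        return source
    if not source_changed or mode is UpdateMode.IGNORE:
        return translation
    if mode is UpdateMode.INVALIDATE:
        return source
    # KEEP_OUTDATED: show the old translation, wrapped so CSS can highlight it.
    return '<span class="mw-translate-fuzzy">' + translation + '</span>'


print(render_unit("Hello", "Hola", True, UpdateMode.INVALIDATE))
print(render_unit("Hello", "Hola", True, UpdateMode.KEEP_OUTDATED))
```

With the second option the reader of the XYZ page sees "Hello"; with the proposed third option they still see "Hola", marked as outdated.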

ADDITIONAL INFORMATION

We asked for more information from the people with direct knowledge of the topic: 500 translators who had real-life experience of invalidated translations. We got about 50 responses and the outcome is quite clear. https://meta.wikimedia.org/wiki/Meta_talk:Babylon/Archives/2016#Should_FuzzyBot_remove_all_potentially_outdated_translations.3F

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 2:19 AM
bzimport set Reference to bz58429.
bzimport added a subscriber: Unknown Object (MLST).

It would be useful to have more opinions about the relevance of this problem in Wikimedia multilingual wikis. Also about the complexity of the solution proposed.

Is this as relevant and as complex as to become a GSoC project?

(In reply to Quim Gil from comment #1)
> It would be useful to have more opinions about the relevance of this problem
> in Wikimedia multilingual wikis. Also about the complexity of the solution
> proposed.
>
> Is this as relevant and as complex as to become a GSoC project?

I don't think it's complex at all, it's only about reverting bug 44328. It was only introduced because Wikidata translation administrators used to do particularly stupid things which broke the HTML of all translation pages, and they were loud enough; nobody likes the current situation, as far as I know, or at least I've never heard anyone happy about it, but of course people happy with changes tend to stay silent.
Definitely not suitable as a GSoC project, but anyone interested might venture to Wikidata and check whether the translation administrators there have learnt to behave in the meantime or whether they'd still oppose translation working as it always did.

Thanks Nemo, I didn't know about the history of this feature.

So... I don't know how important this bug was, and I don't know whether the only way to fix it was to completely disable fuzzy translations, but I really wonder whether the consequences of this "fix" were properly considered.

Also, should all the wikis using Translate lose this feature because it was a problem for one specific project? Would it make sense to propose a preference allowing wikis to enable/disable fuzzy translations?

You seem to suggest that part of the problem was a specific practice of the Wikidata translation admins? Is that so? The examples provided are bullet lists...

Then again Translate already organizes translation units in a way that seems to be wary of HTML consistency. For instance, a bullet list will be included in a single translation unit, therefore it should be possible to mark the entire list as fuzzy without breaking it?

(In reply to Nemo from comment #2)
> (In reply to Quim Gil from comment #1)
> > It would be useful to have more opinions about the relevance of this problem
> > in Wikimedia multilingual wikis. Also about the complexity of the solution
> > proposed.
> >
> > Is this as relevant and as complex as to become a GSoC project?
>
> I don't think it's complex at all, it's only about reverting bug 44328. It
> was only introduced because Wikidata translation administrators used to do
> particularly stupid things which broke the HTML of all translation pages,
> and they were loud enough; nobody likes the current situation, as far as I
> know, or at least I've never heard anyone happy about it, but of course
> people happy with changes tend to stay silent.
> Definitely not suitable as GSoC project, but anyone interested might venture
> to wikidata and check if the translation administrators there have learnt to
> behave in the meanwhile or they'd still oppose the translation working as it
> always did.

Yes, reverting bug 44328 sounds like a great idea. I hear the WMF Legal team is observing related problems with their current multilingual community consultations on Meta, which this might mitigate.

Not only the legal team. Solving this bug would remove a huge source of distress for translation administrators. In their routine work of marking new versions for translation, they are regularly accused of hostility against random languages by translators who see their translations "deleted" by FuzzyBot (in fact, "just" temporarily replaced by English text until confirmed). This is sometimes compared to a USA drone killing people in foreign countries... luckily only a few users reach such levels of aggressiveness and madness over this issue.

Even in a small wiki with only a handful of translators who know each other, I often refrain from updating source (English) texts with small improvements because

a) I don't want such a small change to turn translated paragraphs back into English, without knowing when the translators will have time to go through the changes

b) I don't want to mark those paragraphs as "Don't invalidate translations", because in that case the translated versions won't pick up the changes at all.

This is happening right now in a small wiki with only a handful of editors/translators who know each other. I can imagine the distress and the amount of English text in translated paragraphs this is causing in, say, mediawiki.org.

It looks like reverting the change creates less hassle than keeping it. If the previous situation caused problems in the formatting of certain types of content (long bullet lists? it is still unclear to me which format caused the problem), then maybe those pages could be formatted differently?

Can we come up with a solution which works for everyone? I see a bunch of (conflicting?) requirements, including:

  1. Access to translation, even if possibly outdated
  2. Showing original text when translation source has changed in significant ways
  3. Simple, intuitive user interface

(In reply to Niklas Laxström from comment #7)
> 1. Access to translation, even if possibly outdated
> 2. Showing original text when translation source has changed in significant
> ways
> 3. Simple, intuitive user interface

I don't think (2) is a requirement: if the unit is now very different, just make a new one (i.e. remove the old marker); this happens very rarely.
(3) is not a problem with the old system, i.e. if we fix this bug and revert bug 44328, because the pink background was very intuitive and there was no need to worry about all the details and implications of invalidating translations.

(2) is a requirement. Whether it is done by removing the tag and instructed in documentation or somehow else is an implementation detail.

If we just revert, then we will also have to solve all the inconsistent state issues as well as the bugs where the pink color does not appear correctly or breaks things.

(In reply to Niklas Laxström from comment #9)
> (2) is a requirement. Whether it is done by removing the tag and instructed
> in documentation or somehow else is an implementation detail.

Good. Then it's satisfied already and in any case, the docs already instruct translation administrators to remove the unit marker and re-create the translation unit upon radical changes.

> If we just revert, then we will also have to solve all the inconsistent
> state issues

Inconsistent state?

> as well as the bugs where the pink color does not appear
> correctly or breaks things.

Are those bugs filed?

I'm not an HTML/CSS ninja, but let me share this hypothesis anyway:

Bug 44328 - Replace outdated translations with source text on translation pages

The problem reported:

"When a section of a translated page is fuzzied, it is enclosed in spans with the id mw-translate-fuzzy. Currently these span tags have a line break after the <span> and before the </span> in wikitext. This breaks line-sensitive wikimarkup, such as lists."

The attached wikitext file shows the line breaks:

*{{anchor|Triplet}}<span class="mw-translate-fuzzy">
'''Triplet''' is how to store data as a single data entry in linked data. It consists of a ''subject'', a ''predicate'' and an ''object''. In Wikidata this is approximately ''item'', ''property'' and the ''value'' from the statement.
</span>
*...

These line breaks render as <p></p> in the HTML source code, causing the mess in lists and other elements where an inline span is required.

Now, looking at the patch:

https://gerrit.wikimedia.org/r/#/c/52215/1/resources/css/ext.translate.css

.mw-translate-fuzzy {
	background-color: #FDD;
}

Only background color. Nothing to see here.

https://gerrit.wikimedia.org/r/#/c/52215/1/tag/PageTranslationHooks.php

746 			if ( $fuzzy ) {
747 				// Only show if there is fuzzy messages
748 				$wrap = '<div class="mw-translate-page-info mw-translate-fuzzy">$1</div>';
749 				$wgOut->wrapWikiMsg( $wrap, array( 'tpt-translation-intro-fuzzy' ) );

Alright, the fuzzy string is wrapped by a <div></div>. A div is a block level element, it is designed to create blocks by default. Perhaps the "<span> + line break" in wikitext is the interpretation of the "<div>" done by MediaWiki, resulting in "<p>" when it is rendered to HTML again?

Is there a need to wrap the fuzzy strings with <div>? Why not use <span> directly, which is an inline element that can carry the same class and styling? If <div> must be used for some reason, it is possible to specify "display:inline;" in the CSS, and it should work.

Could someone test what happens when you revert the patch, changing the line 748 to define a span instead of a div? (sorry, I'm not a git ninja either)

(In reply to Quim Gil from comment #11)
> Could someone test what happens when you revert the patch, changing the line
> 748 to define a span instead of a div? (sorry, I'm not a git ninja either)

That's not the culprit, because it's only the header of the translation page (where there is the link "Translate this page" etc.). For the actual text of the translation page, a span was indeed used (https://gerrit.wikimedia.org/r/#/c/52215/1/tag/TPParse.php,cm). The problem is twofold: a bullet not preceded by a newline doesn't make a list, and a bullet's "content" ends at the first newline.
See https://meta.wikimedia.org/w/index.php?title=Meta:Sandbox&oldid=7853938 (https://translatewiki.net/w/i.php?title=Sandbox&oldid=5390009, without tidy, behaves the same).
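To make the line-break problem concrete, here is a toy model of line-sensitive list markup. It is not the MediaWiki parser, just a sketch under the assumption that a list item is any line starting with `*` and that its content ends at the newline; the helper names are made up for illustration.

```python
from typing import List

FUZZY_OPEN = '<span class="mw-translate-fuzzy">'
FUZZY_CLOSE = '</span>'


def list_items(wikitext: str) -> List[str]:
    """Toy model: each line starting with '*' is one item ending at the newline."""
    return [line[1:].strip() for line in wikitext.split("\n") if line.startswith("*")]


def wrap_with_newlines(text: str) -> str:
    """Wrap the way the attached wikitext shows: line breaks inside the span."""
    return FUZZY_OPEN + "\n" + text + "\n" + FUZZY_CLOSE


def wrap_inline(text: str) -> str:
    """Wrap without adding any line breaks."""
    return FUZZY_OPEN + text + FUZZY_CLOSE


unit = "'''Triplet''' is how to store data as a single data entry."
broken = "*" + wrap_with_newlines(unit) + "\n*next item"
fixed = "*" + wrap_inline(unit) + "\n*next item"

# In the broken case the first bullet holds only the opening tag; the
# translated text and the closing tag fall outside the list.
print(list_items(broken))
# In the inline case the whole wrapped unit stays inside the first bullet.
print(list_items(fixed))
```

In this model the first item of `broken` is just the opening `<span>` tag, which matches the symptom described in bug 44328.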

Change 118959 had a related patch set uploaded by Nemo bis:
[WIP] Display outdated page translations as such, without removing them

https://gerrit.wikimedia.org/r/118959

Change 118959 had a related patch set uploaded (by Amire80):
Display outdated page translations as such, without removing them

https://gerrit.wikimedia.org/r/118959

It's been a while, and I still think the impact of lost translations is far higher than the small problems in specific pages that might be caused by reverting the patch. What about reverting and seeing what happens? We can always go back to the current situation (as a worst-case scenario) or file bugs for the specific problems reported (none of them looked urgent).

I think the silence in this task happens only because the readers and translators suffering the problem don't come here to complain -- or are not even aware that a simple action would make their experience in multilingual wikis a lot better.

In T60429#604604, @Qgil wrote:
> You seem to suggest that part of the problem was a specific practice of the Wikidata translation admins? Is that so? The examples provided are bullet lists...
>
> Then again Translate already organizes translation units in a way that seems to be wary of HTML consistency. For instance, a bullet list will be included in a single translation unit, therefore it should be possible to mark the entire list as fuzzy without breaking it?

Usually the items in a bullet list should each be a single translation unit.
Having intro text and a list in one translation unit is a nuisance to me.

(In reply to Niklas Laxström from comment #7)
> [...]
> 3. Simple, intuitive user interface

> [...]
> (3) is not a problem with the old system, i.e. if we fix this bug and revert bug 44328, because the pink background was very intuitive and there was no need to worry about all the details and implications of invalidating translations.

I disagree for some cases. If something is inserted into the translated text, such as "[$link anchor]" or "{{PLURAL:...}}", and it is altered by a text change, a problem may arise if the source of the inserted string becomes unavailable.

If you do not want to make a new translation unit (maybe good for page translation and almost never advisable for interface translation) you must be able to mark an amended text as "do not use" - yet that is a comparatively rare case.

According to https://www.mediawiki.org/wiki/Special:MessageGroupStats/agg-Help_pages combined with a sample check of the "outdated" translations (which are typically just minor markup changes), I estimate this bug is disrupting 3k translations on mediawiki.org help pages alone.

Qgil set Security to None.
Nikerabbit lowered the priority of this task from High to Medium. Mar 23 2016, 8:01 AM

What happened that suddenly reduced the priority/severity of this task? As far as I know, thousands of translations continue to be disrupted.

Is any additional information needed?

Please see the workboard of MediaWiki-extensions-Translate . This task has lower priority compared to the high priority problems in that column.

Lower than T47096? Sounds unlikely. Especially as this one is a simple fix which would improve thousands of translations at once.

> Lower than T47096? Sounds unlikely. Especially as this one is a simple fix which would improve thousands of translations at once.

Of the three tasks above this one, two are feature requests (less likely to be worked on in the near future) and one is a definitely higher-priority issue, which is close to being resolved thanks to Glaisher. I have been proposing to address this task as well, given the time saved on that one, but that is in hard contention with TTMServer updates that other teams might force on us this quarter.

It is not a simple fix. The last thing I want to do is to introduce new bugs in this brittle code.

> The last thing I want to do is to introduce new bugs

In the worst case we would be reintroducing old bugs, as we only need to restore the previous code.

> > The last thing I want to do is to introduce new bugs
>
> In the worst case we would be reintroducing old bugs, as we only need to restore the previous code.

Other code around it has changed considerably, so that is not a valid assumption.

Copying from the chat on freenode:

[11:52] <mboisson> Hi Nikerabbit. We've observed something strange recently. When the source page of the translated version is modified, and then marked for translation, FuzzyBot brings the modified source translation unit into the translated page, instead of keeping the old one and mark it outdated. Is this something new, a bug, or expected ?
[11:53] <mboisson> I don't recall FuzzyBot doing this in the past. It would keep the translated (and outdated) unit, which I think is preferable.

Change 118959 merged by jenkins-bot:
Display outdated page translations as such, without removing them

https://gerrit.wikimedia.org/r/118959

Was this included in 1.29.0-wmf.7? I can't see any difference? Am I missing something? If not, when will this be in production?

FYI: This causes translation units used inside unnamed template parameters to break the template, because the equals sign in the span causes the text before it to be interpreted as a parameter name.

See template parameters on pages having outdated translation
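The template breakage can be illustrated with a toy model of how template arguments are split. This is not MediaWiki's parser; it only assumes the general rule that an argument containing an equals sign is treated as `name=value`, and `parse_template_args` is a made-up helper.

```python
from typing import Dict, List


def parse_template_args(args: List[str]) -> Dict[str, str]:
    """Toy model: an argument containing '=' is named, anything else is positional."""
    params: Dict[str, str] = {}
    position = 1
    for arg in args:
        name, sep, value = arg.partition("=")
        if sep:
            # Everything before the first '=' becomes the parameter name.
            params[name.strip()] = value.strip()
        else:
            params[str(position)] = arg.strip()
            position += 1
    return params


wrapped = '<span class="mw-translate-fuzzy">Hola</span>'

plain = parse_template_args(["Hola"])
fuzzy = parse_template_args([wrapped])
named = parse_template_args(["1=" + wrapped])

print(plain)  # the translation arrives as parameter "1"
print(fuzzy)  # parameter "1" is missing; the name is '<span class'
print(named)  # naming the parameter explicitly keeps the value intact
```

The last call shows the usual workaround for equals signs in template values: name the parameter explicitly (`1=...`) so that the first equals sign belongs to the parameter syntax rather than to the span.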

Nemo_bis claimed this task.

Thanks; documented at https://www.mediawiki.org/w/index.php?title=Help%3AExtension%3ATranslate%2FPage_translation_administration&type=revision&diff=2373581&oldid=2359386

I think there's nothing left to do here, except making sure we add all the necessary release notes and announcements for the people to enjoy this long-awaited change.