
Translation page does not contain the latest translations/last translation
Closed, ResolvedPublic

Description

NOTE: T53731#2151432 (and secondarily T63353#2202206, T56579#2151653) were the main issues. The main fix is 052bd7d93, included in the REL1_27 branch or higher and in the 2016.04 tag or higher. Please upgrade to MLEB 2016.04 or higher to fix issues with updates to translation pages.

Example: saving this unit: https://meta.wikimedia.org/w/index.php?title=Translations:FDC_portal/CentralNotice2013-2/2/es&oldid=5347994 ("Comenta cuatro ...") caused this simultaneous edit in the translated page as usual: https://meta.wikimedia.org/w/index.php?diff=5347995&oldid=5347984 (see also the matching edit summary: 'Created page with "Comenta cuatro ...'), which however only changed the preceding unit ("Ayuda al Comité de ...", see diff).

The problem is particularly severe when saving the translation of the last unit, because the translation page will remain incomplete even though the translator has actually provided translations of the entire text.

Example: After https://meta.wikimedia.org/w/index.php?diff=5326458&oldid=5326448 (the most recent edit to that page at the time of filing this bug), the page remained at "98% complete" status and the very last paragraph is still in English ("Wikimedia Sweden has started ..."), even though the translation is at https://meta.wikimedia.org/wiki/Translations:Wikimedia_Highlights,_February_2013/61/da ("Wikimedia Sverige har indledt ..."); this misled me into not distributing that translation to the corresponding communities yet because I erroneously assumed that the translator was not done yet.

Purging the cache (e.g. https://meta.wikimedia.org/w/index.php?title=Wikimedia_Highlights,_February_2013/da&action=purge ) did not have an effect in resolving the situation (the last unit still displays the English original instead of the existing translation).

Related but less severe bug: T41415: Deleting a translation unit page doesn't remove the corresponding content from the translation page


Version: master
Severity: major
See Also:

Details

Reference
bz46716

Related Objects

Event Timeline


@Nikerabbit: Would it be possible to create an automated process which "marks the pages" again in the background, or a similar workaround?

Probably better ask one of our beloved gadget authors.

He7d3r set Security to None.

Nope. The issue still exists but nobody is working on it currently.

Let's see how much precious translation admin time we're wasting by requiring basically every single "mark for translation" action to be done twice, with a dummy edit in between. This catches many such edits (but possibly not the majority), with nearly no false positives, by looking for common edit summaries:

SELECT count(rev_id)
  FROM revision
  JOIN logging
  ON rev_page = log_page
  AND log_type = 'pagetranslation'
  WHERE rev_comment RLIKE '.*(null|dummy|trigger|propagate|46716).*';

+---------------+
| count(rev_id) |
+---------------+
|          5489 |
+---------------+
1 row in set (9.32 sec)
MariaDB [metawiki_p]> SELECT COUNT(rev_id)
    ->   FROM revision
    ->   JOIN logging
    ->   ON rev_page = log_page
    ->   AND log_type = 'pagetranslation'
    ->   WHERE rev_comment RLIKE '.*(null|dummy|trigger|propagate|46716).*';
+---------------+
| count(rev_id) |
+---------------+
|         24718 |
+---------------+
1 row in set (21.29 sec)

But it looks like this query is counting the same revision many times. When we add DISTINCT:

MariaDB [metawiki_p]> SELECT COUNT(DISTINCT rev_id)
    ->   FROM revision
    ->   JOIN logging
    ->   ON rev_page = log_page
    ->   AND log_type = 'pagetranslation'
    ->   WHERE rev_comment RLIKE '.*(null|dummy|trigger|propagate|46716).*';
+------------------------+
| COUNT(DISTINCT rev_id) |
+------------------------+
|                   1390 |
+------------------------+
1 row in set (14.34 sec)

That is, I'm not sure about your stats.
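
The inflation seen above can be reproduced on a toy dataset. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of the MariaDB replica, made-up rows, and a plain LIKE in place of the RLIKE pattern (all of these are assumptions). One dummy-edit revision on a page with three 'pagetranslation' log entries is counted three times by COUNT(rev_id) and once by COUNT(DISTINCT rev_id).

```python
import sqlite3


def count_matching_revisions():
    """Return (plain_count, distinct_count) for a toy revision/logging join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_comment TEXT);
        CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_page INTEGER, log_type TEXT);
        -- Two revisions on page 1; only the first has a "dummy" summary.
        INSERT INTO revision VALUES (10, 1, 'dummy edit'), (11, 1, 'expanded intro');
        -- Page 1 was marked for translation three times.
        INSERT INTO logging VALUES (100, 1, 'pagetranslation'),
                                   (101, 1, 'pagetranslation'),
                                   (102, 1, 'pagetranslation');
        """
    )
    query = """
        SELECT COUNT(rev_id), COUNT(DISTINCT rev_id)
          FROM revision
          JOIN logging
            ON rev_page = log_page
           AND log_type = 'pagetranslation'
         WHERE rev_comment LIKE '%dummy%'
    """
    plain, distinct = conn.execute(query).fetchone()
    conn.close()
    return plain, distinct


print(count_matching_revisions())  # (3, 1)
```

Each matching revision is repeated once per log entry on the same page, so the plain count grows with the number of times a page was marked for translation.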

However, I do see people regularly making dummy edits on Meta-Wiki to apparently work around this Translate bug (e.g., changing <languages /> to <languages/>).

How is this task only normal priority?

Adding Wikimedia Product Management and line management to this issue.

Can any of you please assist in getting this issue moved forward? It's clear from comments by @Nikerabbit that this is a major issue, that he is not allocated to work in it, and doesn't have the tools to work on it properly. Besides that, user impact on Wikimedia wikis is high, as also explained in previous comments.

AFAICS, that's only one of the byproducts of this bug, which are listed in the "See also" section of the description.

jayvdb raised the priority of this task from Medium to High.Mar 22 2016, 4:23 PM

While resourcing seems like an issue, this extension is increasingly mission critical for our multilingual communication, and this bug needs to be high on the agenda.

siebrand lowered the priority of this task from High to Medium.Mar 22 2016, 4:36 PM

@jayvdb Please do not change priority if it does not conform with https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities. I think the only way to raise attention for this now, as this thread is apparently being ignored by those with the actual power to put it on the agenda, is to email the responsible people, product manager @Amire80 and budget holders @Arrbee and @TrevorParscal. Email addresses are tparscal@ wikimedia, rbhattacharjee@ wikimedia and aaharoni@ wikimedia.

I sent a mail to Trevor on January 28, with CC to @Nikerabbit. Niklas replied to it on February 8th, confirming the issue as high impact and supporting a high priority. The emails have not been replied to. This issue has been going on for a long time, and it is the single highest impact issue that Translate currently has in the WMF wikis, in the opinion of myself and Niklas.

Those waiting for signals from the rest of the Language engineering team might be interested to learn that the movement-wide mailing list Wikimedia-l recently saw an opinionated posting by @Amire80 where he called upon the Wikimedia Foundation to improve support for translation of meta-level movement communications in case of the trustee selection process (and correctly pointed out that volunteer translations are usually the preferred way), albeit without mentioning the LE team itself. And that the Language-Engineering April-June 2016 board currently contains an item addressing the Translate extension's technical debt.

This bug is reaching its third anniversary next week and (along with its various relatives) has caused countless hours of wasted time for volunteer translators and also for WMF staff who are supporting multilingual communication.

Nikerabbit raised the priority of this task from Medium to High.Mar 23 2016, 9:21 AM

Like I mentioned above, I was tracking T53731: After re-marking an updated page for translation, FuzzyBot does not react, or only ports over the previous update instead of this bug. I have now moved this together with that to make it more clear.

Apparently, the code in TPParse::getTranslationPageText() that calls MessageCollection::loadTranslations( DB_MASTER ) on line 182, and that should see the latest version, is not using the same SQL session. The read therefore happens in a different database transaction from the one that started when the translation was saved and that is still not committed. Wikimedia visibly wants to have persistent SQL sessions in order to commit all pending inserts/updates/deletes in one operation.
However, because this is not always possible (it would create too many deadlocks when updates occur in multiple tables), some transactions are "pseudo-committed": not really committed; instead there is visibly a hook somewhere that creates a new, separate session for some intermediate updates that can be committed separately (e.g. for logging or debugging purposes), while the old session stays idle. All the sessions are closed only at the end of the execution of all scripts for the currently viewed page.
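
The general mechanism suspected here (a read on one session not seeing a write that another session has not yet committed) can be shown in isolation. This is a minimal sketch under stated assumptions, not a reproduction of the Translate code path: it uses SQLite through Python's sqlite3 module instead of MariaDB, and the table and column names are hypothetical.

```python
import os
import sqlite3
import tempfile


def read_before_and_after_commit():
    """Return the row counts a second connection sees before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE translation_units (unit_key TEXT, content TEXT)")
        writer.commit()

        # The INSERT implicitly opens a transaction on the writer; it is not committed yet.
        writer.execute(
            "INSERT INTO translation_units VALUES ('61/da', 'Wikimedia Sverige har indledt ...')"
        )

        # A separate connection plays the role of the "other session".
        reader = sqlite3.connect(path)
        count = "SELECT COUNT(*) FROM translation_units"
        before = reader.execute(count).fetchall()[0][0]

        writer.commit()
        after = reader.execute(count).fetchall()[0][0]

        reader.close()
        writer.close()
    return before, after


print(read_before_and_after_commit())  # (0, 1)
```

The reader only sees the new row once the writer has committed, which is the kind of ordering problem the comment above hypothesizes for the translation page update.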

To see if this is the case, it would be interesting to trace, on the Wikimedia farm, all SQL sessions that are opened/closed and all transactions started/committed/rolled back in those sessions, by tracing it from a user account that has a special "SQL debugger" role: the test could also be activated on a specific part of the namespaces, and on a low-traffic wiki that has this extension enabled (e.g. on a Wikiversity).

The alternative would be to have on all wikis a special namespace used exclusively for debugging purposes (whose content could be eradicated at any time without asking, such eradication killing even the history and dropping pages instead of just masking them, in order to keep the storage space low: such killing of course should be done only by administrators): we could test translations in this "Debug:" namespace. This namespace could be useful for testing advanced features including new extensions (that would first be activated only in this namespace but doing nothing and remaining invisible in all other namespaces, even for wikilinks to them and transclusions):

  • only full URL links could be used, for example in some talkspace of a wikiproject discussing proposed or pending changes;
  • those "Debug:" pages would also not be categorised, or only in categories with a specific namespace such as "Debug Category:" instead of "Category:" (any attempt to categorise those debug categories would not pollute in the standard categories).
  • However transclusions from normal namespaces to the "Debug:" pages or "Debug Category:" description pages would still work normally (meaning also that "What links to this page?" could list those transclusions, but these could be hidden by default in the list except if we enable it in a checkbox to also include pages in this namespace).
  • Linking from a "Debug:" page or "Debug Category:" description page to other namespaces would also work normally (meaning also that "What links to this page?" could list those links; same remark for hiding by default pages in this namespace)

Reminder for myself to run refresh-translatable-pages.php (T53731#2160828), but we can wait for the other fixes currently in progress by @Glaisher to be reviewed and deployed.

NOTE: I would like to give a huge thank you to @Glaisher for doing the hard work (see the blocked by tasks) to fix these.

As far as I can see, all the high priority issues in page translation are now addressed. If you disagree, please nominate them now. Since Glaisher saved us a lot of time, we could try to close some tasks with patches in review:

Since the train does not run next week, and I am on vacation the week after that, the run of that script is going to happen in early May.

I looked around on some RCs but haven't seen any occurrence of the original problem stated on the task description. Can anyone tell whether they've experienced this issue recently?

We can run the script again after a month, and see if FuzzyBot makes any changes (which would indicate there are still some issues left).

Nikerabbit changed the task status from Open to Stalled.Apr 20 2016, 1:28 PM
Nikerabbit changed the task status from Stalled to Open.May 11 2016, 7:13 AM

Planning to run the script tomorrow.

wiki | pages | edits by FuzzyBot | time taken
bewikimedia | skipped, no pages
brwikimedia | 1 | 0 | 1s
cawikimedia | 1730s
collabwiki | skipped, restricted wiki
commonswiki | 451 | ~800 | 82m
incubatorwiki | 271913m
legalteam | skipped, restricted wiki
otrs_wikiwiki | skipped, restricted wiki
outreachwiki | 104847m
ruwikimedia | 3 | 0 | 2s
uawikimedia | 6012s
specieswiki | 26022s
testwiki | 6325s
testwikidatawiki | 2 | 0 | 6s
wikidatawiki | 351872211m

Rest of the wikis (wikimanias, meta, mediawiki.org) did not fit and I plan to do them next week.

Remember that even if the bot does not run on your wiki, you can fix any translated page by pseudo-editing a single existing translation unit in the translation interface (add a space, remove it with backspace, apply the "change").

This will force the regeneration of the page and its statistics.

However this works only when translating one page.

But translation groups frequently contain dozens of pages or more, and it is not easy to scan them all and select at least one translation unit for each page in the group (translation groups do not show the separation between the pages whose translation units are displayed).
In large groups, it is difficult to locate which page has out-of-sync statistics or generated content.

So there are many translation groups still showing low completion statistics, even if they were all completed. The bug is less critical for pages that are outside any group and that are translated individually, as you can easily apply the pseudo-edit workaround manually. For example, if you intend to publish a translated newsletter and send it by email, first make sure that the page is in sync by applying the manual workaround.

This only concerns the Wikimedia server farm: wikis running on a single server or a server with a single master database (such as translatewiki.net) are not affected by this desynchronization bug, which affects wikis running with slave servers and various levels of caches for data or web content.

If you have an automated newsletter, a very basic bot can run on each listed language to make a single pseudo-edit. For this reason, and to avoid errors when manually pseudo-editing a page, include a *leading* optional translation unit (one that will not be part of the newsletter) whose "translation" is nearly automatic. For example, allow translating the "display page name" when submitting a source page for translation: it is safe to force it to initially use the same text as English if it is still empty, and as a benefit the page will be listed for the languages you want to target, even if its completion remains at or near 0% when this is the only thing "translated".
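
As a starting point for such a bot, the list of unit pages to touch can be derived from the title pattern visible in the links in the task description (Translations:<page>/<unit>/<language>). The helper below is a hypothetical sketch: the function name and its arguments are assumptions, and the pseudo-edit itself is deliberately left out.

```python
def unit_titles(page, unit, languages):
    """Build 'Translations:<page>/<unit>/<lang>' titles, one per language code."""
    return [f"Translations:{page}/{unit}/{lang}" for lang in languages]


# Example with the page and unit number quoted in the task description.
for title in unit_titles("Wikimedia_Highlights,_February_2013", 61, ["da", "es"]):
    print(title)
```

A bot would then make one pseudo-edit on each returned title, one per target language.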


If the display page name is used for the content you want to publish (e.g. for the subject of the newsletter), just add another leading translation unit that will be invisible when rendered, such as a fake word inside an #if that returns no content, as in

{{#if:<translate>_</translate>}}

(place this near the top of the page, so that it becomes the first translation unit in the interface, just below the display page title whose effective translation is requested).

You'll get a translation unit containing only that fake "_", for which the effective translation does not really matter in the rendered page, and its translation can be automated in all languages by submitting "_" as the translation. Prefilling this pseudo-translation unit automatically will prefill the language bar for all languages you want to target in your newsletter, but translators won't have to care about it as it is already translated, and it will not block the translated page from reaching 100% in the translation statistics (where you generally want to avoid keeping pages listed when their translation is complete).

You may also automate the "translation" of the "/qqq" doc, saying that translation is optional but should be identical to the English text (in order to not pollute the translation memory).


Unfortunately, there's still no way to explicitly mark a page with translation units that are in fact optional (and that should be ignored in completion statistics). It would help to have a syntax like:

<translate optional>technical text</translate>

(with the "optional" kept inside the translation marker of the converted template).

That translation unit would not require running the fuzzybot to refresh the completion statistic, but it would run only to regenerate the page.


[OT]Supplemental idea:
Similar attributes to specify the behavior of the translation interface directly in the "translate" or "tvar" tags could also be used to disambiguate the style or format expected for the translated content. These options would be useful in the translation interface to better tune the validation pass being performed by the UI. For example,

  • for "tvar" placeholders, whether "$1" is really a placeholder or a currency amount,
  • empty content allowed in translation,
  • valid (or existing) URL syntax,
  • valid (or existing) page name for a designated wiki,
  • valid (or existing) name in its local "File:" namespace,
  • text using a valid C-style format (useful on translatewiki.net for translating C projects), or for other programming languages like Java or Python,
  • text using a valid PHP-style formatting string,
  • text with valid HTML markup, or MediaWiki+HTML markup, or JSON syntax, or strict XML markup
  • text using the MediaWiki markup syntax,
  • valid MediaWiki template parameters list (if this template includes metadata for parameters description, as used on the Visual Editor when editing templates).
  • or any plain-text (with no interpretation of markup and no placeholders),
  • or value in a restricted enumerated list (possibly described in another special tag with a reference ID), outside <translate> tag itself
  • or value matching a regexp pattern

@Verdy_p I would appreciate if you posted your thoughts and feature requests in a more appropriate place and not here.

So there are many translation groups still showing low completion statistics, even if they were all completed.

T49864 was recently fixed so this will (hopefully) not happen anymore. It looks like refresh-translatable-pages doesn't update the statistics so we might want to run another script to fix the statistics as well. There may also be other bugs with translation statistics and if we do want to run a script to update the statistics across all the wikis, we should probably resolve those bugs before doing a mass update.

wiki | pages | edits by FuzzyBot | time taken
wikimania2012-2014wiki | skipped, closed wiki
wikimania2015wiki | 73627m
wikimania2016wiki | 55536m
wikimania2017wiki | 5921m
mediawikiwiki | 3029 | ~11000 | 237m + 288m
metawiki | 3134 | ~24000 | 72m + 1082m

Now, that was a huge amount of edits. I will re-run refresh-translatable-pages.php in two weeks on commonswiki to see if the script makes any edits and re-open this bug if necessary.

Apparently not resolved in Metawiki (tested this morning).

  • The last edit translation unit is still not visible and needs a pseudo-edit to refresh the page
  • When we update the translation source and commit it, the translated pages are not refreshed (also need a pseudo-edit in at least one translation unit).

Do you have links for inspection?

There was a request to translate on Meta a message (already posted, but only in English, in several Wikimedia lists), speaking about interactive maps and inviting users to comment on the proposed feature. The intent is to let users of other languages comment on the new Wikipedia feature (and notably the design style).

When updating the source page (without changing any translation unit, only changing the text around them or reordering them), none of the translation units need any update (so statistics also don't need to be updated). But the page must still be regenerated to use the new layout.

For now we still need to pseudo-edit one translation unit in each language to force the page regeneration.

I'm guessing you are referring to this edit (and the corresponding mark for translation) to the source page which supposedly didn't trigger an update by FuzzyBot as you made this edit to the translation page within the same minute. I think what happened here was that before FuzzyBot could update (note that the updates are done through the job queue, which can take several minutes) the source page, you made an edit to that page so there was nothing to edit by the time the job was run so FuzzyBot only made a null edit causing nothing to be recorded in the revision history. I demonstrated this at https://meta.wikimedia.org/w/index.php?title=User:Glaisher/translate-sandbox&action=history and https://meta.wikimedia.org/w/index.php?title=User:Glaisher/translate-sandbox/fr&action=history which worked correctly there but note again that the update by FuzzyBot was about one minute after I marked the page for translation. I can't say for sure that this is what happened because there was no other translatable page at that time except the fr page but this is the most likely explanation for what happened unless there is a bug again somewhere. See also T53731#2206665.

Also can you provide an example of "The last edit translation unit is still not visible and needs a pseudo-edit to refresh the page" (which is what this task was originally about)? I'm interested in seeing whether this task has actually been fixed.

Thanks a lot to Nikerabbit, btw.

I reran the script on Commons (because it had a great number of edits last time compared to running time).

Results were 459 pages, 0 edits, 90m running time. I am now highly confident that Glaisher has fixed this bug for good.

As a final test we can run the script once more in a couple of weeks for mediawiki.org.

As promised, I ran the script on MediaWiki.org:

Refreshed 3053 translatable pages.

real    287m40.554s

There are four edits by FuzzyBot that would indicate issues:

The corresponding translation unit changes:

Might be useful to investigate FUZZY addition.

Hi, can you run the script on Meta-Wiki too? Thanks.

I recently noticed that I need to continue performing dummy edits so changes in the main translatable page get propagated.

All the info I have gathered says the opposite. You may be doing it out of habit or due to a misunderstanding of the delays, for example.

Can you provide examples where this is not the case, because running the script all the time is not a solution?

The expected time for a change to be propagated can now be seen at https://grafana.wikimedia.org/dashboard/db/job-queue-health?panelId=7&fullscreen&var-jobType=TranslationsUpdateJob

The worst case is rarely above 10 seconds, but try to wait at least as much before going the dummy edit way.
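
For scripted workflows, the advice above amounts to polling for a short while before resorting to a dummy edit. The following is a minimal sketch of that idea: the 10-second default comes from the comment above, while the function name and the is_propagated callback (whatever check the caller uses to tell that the translation page has been updated) are hypothetical.

```python
import time


def wait_for_propagation(is_propagated, timeout=10.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll is_propagated() until it returns True or `timeout` seconds pass.

    Returns True if the change showed up in time, and False if the caller
    may want to fall back to a dummy edit.
    """
    deadline = clock() + timeout
    while True:
        if is_propagated():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example: the change becomes visible on the third check.
answers = iter([False, False, True])
print(wait_for_propagation(lambda: next(answers), interval=0.01))  # True
```

The sleep and clock parameters exist only so the helper can be exercised without real waiting.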

Currently, "/en" subpages have not been created for all translatable category pages.

Example in Meta-Wiki - Category:Talk_header_templates (last Marked for translation at the 2015-09-11) and /en subpage was not created yet

I am confused when you say "today" and then give one year old example. It might be an issue that our refresh script is not creating missing /en pages, but if you re-mark the page now, it should be created.

Excuse my English :(
"Currently", no "today".... I thought that after fixing this bug, bot will find and create the lost subpages. So not planned?

https://gerrit.wikimedia.org/r/#/c/298470/ should fix your report. I was lazy and did not create a separate bug report.