
Splitting should be done after alignment at Special:PageMigration?
Closed, Declined (Public)

Description

Author: pr4tiklahoti

Description:
Currently, Special:PageMigration first splits the translation units on headers of any level and then aligns them at the h2 level. Because aligning can 'collapse' the preceding sections into one unit to make the h2 headers match, it undoes the splitting. Alignment at the h2 level should therefore be done first, and the splitHeaders() function should then be called on the resulting translation units.
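
To illustrate why the order matters, here is a rough sketch in Python (not the extension's PHP). Everything in it is an assumption made for illustration: `split_headers` is a hypothetical stand-in for splitHeaders(), and `collapse_to_h2` is a deliberately simplified model of the h2 alignment that only merges units up to the next h2 header. The real alignment compares source and translation and can also add empty units.

```python
import re

# A wikitext header line: "== Title ==" (h2) through "====== Title ======" (h6).
HEADER_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def header_level(line):
    """Return the header level (2-6) of a line, or 0 if it is not a header."""
    match = HEADER_RE.match(line)
    return len(match.group(1)) if match else 0


def split_headers(unit):
    """Hypothetical stand-in for splitHeaders(): headers and text runs become separate units."""
    units, buf = [], []

    def flush():
        # Emit the buffered text run as one unit, skipping blank runs.
        text = "\n".join(buf).strip()
        if text:
            units.append(text)
        buf.clear()

    for line in unit.split("\n"):
        if header_level(line):
            flush()
            units.append(line.strip())
        else:
            buf.append(line)
    flush()
    return units


def collapse_to_h2(units):
    """Simplified model of h2 alignment: merge each run of units up to the next h2 header."""
    sections, buf = [], []
    for unit in units:
        if buf and header_level(unit.split("\n")[0]) == 2:
            sections.append("\n\n".join(buf))
            buf = []
        buf.append(unit)
    if buf:
        sections.append("\n\n".join(buf))
    return sections


# Made-up page text, split into units on double newlines.
page = "== A ==\n\nintro\n\n=== A1 ===\n\ntext one\n\n== B ==\n\ntext two"
units = page.split("\n\n")

# Current order: split first, then align. The collapse undoes the split.
split_then_align = collapse_to_h2([p for u in units for p in split_headers(u)])

# Proposed order: align first, then split the collapsed units again.
align_then_split = [p for s in collapse_to_h2(units) for p in split_headers(s)]

print(len(split_then_align), len(align_then_split))  # 2 6
```

With the current order the h3 header ends up buried inside one collapsed unit. With the proposed order it is a unit of its own again.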


Version: unspecified
Severity: major

Details

Reference
bz69310

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:32 AM
bzimport set Reference to bz69310.

This bug report lacks steps to reproduce and a user story like "As a translation administrator, I want ... so that ...".

Suggested example: https://www.mediawiki.org/w/index.php?title=Manual:LocalSettings.php/ru&oldid=611990 vs. https://www.mediawiki.org/w/index.php?title=Manual:LocalSettings.php&oldid=1095587

pr4tiklahoti wrote:

To reproduce:

  1. Go to Special:PageMigration
  2. Enter Manual:LocalSettings.php in the title field and ru in the language code field.
  3. Press the Import button

I. Observed:

  1. The translation units are imported
  2. Due to h2 alignment, many units (containing h3+ headers) are collapsed into one

II. Expected:

  1. The units should be split on headers after the alignment on h2 is done.

This avoids many sections getting collapsed into one if there was no h2 header in them, like for the example above.

gerritadmin wrote:

Change 153064 had a related patch set uploaded by BPositive:
Splitting of units on headers done after h2 alignment at Special:PageMigration

https://gerrit.wikimedia.org/r/153064

Thanks for comment 2 and the patch, but I'm not going to spend time looking into it until I'm sure we have a shared understanding of the problem.


19.22 < Nemo_bis> 19.19 < Nemo_bis> So what would you look for if you needed a second one [page] to test with?
19.22 < Nemo_bis> The steps to reproduce ideally are able to be followed on any wiki from scratch
19.23 < Nemo_bis> A good portion, if not the majority, of the work needed to fix a bug is describing it well
19.23 < Nemo_bis> Isolate the steps to reproduce
19.23 < Nemo_bis> Produce a minimal test case
19.23 < Nemo_bis> Ensure the test case covers the actual issues that originated the report
19.24 < Nemo_bis> That page is huge but to describe the problem we actually only need to look at the TOC

[Only then]

19.23 < Nemo_bis> Devise a solution
19.23 < Nemo_bis> Implement it
19.23 < Nemo_bis> Test it against the test case
19.23 < Nemo_bis> (Build a unit test)


So your next task for today is figuring out this bug in a more general way: again, steps to reproduce (and minimal test case), user story.

There is also https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools/Design , which was never really polished. You may need to update this page in order to make it clear to yourself and others what the expected behaviour is and why it failed here.

pr4tiklahoti wrote:

As a user of Special:PageMigration, a minimal requirement from my side would be that the imported translation units are well separated from each other. And even if there is no 100% correct alignment, it should be easy for me to align them using the add, swap and delete features.

As per https://bugzilla.wikimedia.org/show_bug.cgi?id=66162, the translation units are adjusted by adding empty units or collapsing multiple units into one so that the headers are aligned on h2 level.

For examples like https://www.mediawiki.org/wiki/Manual:LocalSettings.php vs https://www.mediawiki.org/w/index.php?title=Manual:LocalSettings.php/ru&oldid=611990 , in which the number of h2 headers and the h3 headers under them don't match (because newer sections were added or the layout of the source page changed), just aligning on h2 level would collapse all the previous sections into one huge chunk. For the example quoted above, ==Security== would get aligned with ==Настройки БД== as per the h2 alignment, collapsing all the h3 headers (1.1 to 1.13) into a single unit.

As a user, it would be tedious to do the splitting manually and create the corresponding units. The Special page would indeed be useless for this particular case. So we could make use of the automation provided by https://gerrit.wikimedia.org/r/#/c/136334/ so that headers and their text are split and the new units are available from the start. Given that this Special page goes hand in hand with Special:PagePreparation, and PagePreparation takes care of having headers as separate translation units, this should require less adjustment, as it would only be a matter of moving all the units together.

As per the current code, this splitting is done before alignment. But aligning results in units being collapsed in cases like the one above, which defeats the purpose of splitting. For cases in which there are no headers in the collapsed text, doing this before or after has the same effect. But by doing it later, we also cover cases in which h3+ headers got collapsed into one unit, and separate them out again.

So, to summarize and reproduce this bug:

  1. Go to Special:PageMigration on your wiki
  2. Fill in the input fields with a page that has h3+ headers and whose number of h2 headers differs between the source and the translation, which should result in units being collapsed. You can easily find such a page by comparing the TOC of both pages.
  3. Try adjusting the units and see how painful it would be if there were many h3+ headers that got collapsed into one.

Expected (or rather desired):

  1. The units containing headers should be separated so that adjusting takes less effort from the user's perspective.
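
For step 2 of the steps to reproduce, comparing the two TOCs can be sketched as below. This is only an illustration: `outline` and `is_repro_candidate` are hypothetical helper names, not part of the Translate extension, the header regex handles plain wikitext headers only, and the sample texts are made up.

```python
import re

# Matches wikitext header lines such as "== Title ==" or "=== Sub ===".
HEADER_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def outline(wikitext):
    """Return (level, title) for every header line, in page order (a rough TOC)."""
    return [(len(m.group(1)), m.group(2)) for m in HEADER_RE.finditer(wikitext)]


def is_repro_candidate(source_text, translation_text):
    """True if the h2 counts differ and at least one page has h3+ headers."""
    src, old = outline(source_text), outline(translation_text)
    h2_src = sum(1 for level, _ in src if level == 2)
    h2_old = sum(1 for level, _ in old if level == 2)
    has_deeper = any(level >= 3 for level, _ in src + old)
    return h2_src != h2_old and has_deeper


# Made-up sample texts, not the content of any real page.
source = "== Intro ==\n=== Basics ===\ntext\n== Security ==\ntext\n"
old_translation = "== Intro ==\ntext\n"

print(outline(source))
print(is_repro_candidate(source, old_translation))
```

A pair of pages for which this returns True should show the collapsing described above when imported.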

What would happen then if the old version of the translation page had no double newlines between the h2? Does alignment still work?

pr4tiklahoti wrote:

We had discussed this on IRC (wish I had the logs). You had clearly told me that there would be double newlines at least between two h2's and that such situations are rare. We had discussed this while working on the h2 patch, when I had misinterpreted it.

However, if we still wish to cover this rare case as well, the splitting function would need to be applied before as well as after the headers are aligned.

You should wish for up to date docs at https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools/Design , not for chat logs. :-)

It's fine if you've considered this drawback; you had just forgotten to say so.

Nemo_bis claimed this task.

I think the idea here was worth exploring, but we now have few pages left on which to test it and, in general, I feel we can improve little where there is a big mismatch in structure.