Page MenuHomePhabricator

updateCollation.php should sanity check the collation before proceeding
Closed, ResolvedPublic

Description

updateCollation.php should sanity check the collation before proceeding.

This could prevent some silly mistakes like the one seen on bug 46005 comment 4.


Version: 1.22.0
Severity: minor

Details

Reference
bz46615

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 1:23 AM
bzimport set Reference to bz46615.

Basically, updateCollation didn't actually check if chosen collation is supported by both the underlying ICU library and MediaWiki's list of first-letters for this language, and so processed all the pages without complaining; the error was only revealed when actually browsing the wiki.

That change then breaks at update.php:

Updating category collations...PHP Fatal error: Call to undefined method UppercaseCollation::getFirstLetterData() in mediawiki/core/maintenance/updateCollation.php on line 87

UppercaseCollation extends Collation, but getFirstLetterData() is provided by IcuCollation, so UppercaseCollation doesn't have such method.

It may be better to break at update instead of when browsing the category page but it shouldn't fail at all.

Whoops that method isnt in the base class and i had only tested on uca collations. I submitted a revert for now (can I +2 a revert if its reverting code I +2'd in the first place)

That's gerrit change Ib7b9597ff842a76185ba5c153922834ffb741237 for reference

Maybe we should wrap it in an if instanceof IcuCollation instead of reverting?

The revert was meant as a temporary measure.

In this case maybe the constructor should be changed to throw an exception. MatmaRex had previously tried to do that, at the time I didnt think it was a good idea, but now I thimk it makes more sense.

Ughhhh. Sorry for the fail.

Throwing in the constructor will require us to either:

  • keep a hard-coded list of allowed collations (bad, see bug 44667 and bug 46058)
  • duplicate the loading logic to check if all will be OK without loading (bad for obvious reasons)
  • or just actually load the data in it, which seems sane, but I assume we're not doing for a reason.

Tim wrote the code - can you explain? I suppose loading all the data
would be slow, but it's not like we're instantiating Collations left
and right.

What about changing $collation->getFirstLetterData(); to $collation->getFirstLetter(); ?
On a IcuCollation, it will call getFirstLetterCount() which will load getFirstLetterData(), but it's a member of Collation class, to it should work everywhere.

The other IcuCollation members should probably be changed to protected, to prevent future issues like this.

https://gerrit.wikimedia.org/r/57500 (Gerrit Change Ib7b9597ff842a76185ba5c153922834ffb741237) | change APPROVED and MERGED [by Tim Starling]

(In reply to comment #5)

(can I +2 a revert if its reverting code I +2'd in the first place)

Yes. I would have preferred it if you had done that instead of leaving the maintenance script horribly broken for 6 weeks.

(In reply to comment #9)

Tim wrote the code - can you explain? I suppose loading all the data
would be slow, but it's not like we're instantiating Collations left
and right.

In the code I wrote, only uca-default was allowed, and I imagined that there would be one class per additional locale, so the developer would be required to check if the relevant locale actually worked before adding a class. You were the one who added the ability to use any ICU locale, in I838484b9.

Why not call getFirstLetter() from updateCollation.php and see if that throws an exception?

(In reply to comment #9)

Tim wrote the code - can you explain? I suppose loading all the data
would be slow, but it's not like we're instantiating Collations left
and right.

In the code I wrote, only uca-default was allowed, and I imagined that there
would be one class per additional locale, so the developer would be required
to
check if the relevant locale actually worked before adding a class. You were
the one who added the ability to use any ICU locale, in I838484b9.

This doesn't answer my question about why we are not loading the first-letter data in IcuCollation's consturctor, but only lazy-loading it using getFirstLetterData().

Why not call getFirstLetter() from updateCollation.php and see if that throws
an exception?

I77de040f.

Related URL: https://gerrit.wikimedia.org/r/64503 (Gerrit Change I77de040f97080653fe0d1734d38490eaa2d322db)

Change 64503 merged by jenkins-bot:
updateCollation.php: sanity check the collation before proceeding

https://gerrit.wikimedia.org/r/64503