Page MenuHomePhabricator

Job queue improvements contain unannounced schema changes
Closed, ResolvedPublic

Description

I think it was agreed upon that code requiring schema changes would temporarily need to work without the schema change being applied. However in https://gerrit.wikimedia.org/r/#/c/13194/ Aaron writes this is not the case. I also don't see the schema change announced in wikitech-l nor http://wikitech.wikimedia.org/view/Schema_changes nor is there any mention in RELEASE-NOTES nor even in the commit message.

It was pure luck we didn't break the production site of translatewiki.net. I don't inspect most commits beyond the commit message.


Version: 1.21.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=60719

Details

Reference
bz41100

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 12:53 AM
bzimport set Reference to bz41100.
bzimport added a subscriber: Unknown Object (MLST).

Actually it's been on http://wikitech.wikimedia.org/view/Schema_changes for a while and is still there right now.

(In reply to comment #0)

I think it was agreed upon that code requiring schema changes would temporarily
need to work without the schema change being applied. However in
https://gerrit.wikimedia.org/r/#/c/13194/ Aaron writes this is not the case. I
also don't see the schema change announced in wikitech-l nor
http://wikitech.wikimedia.org/view/Schema_changes nor is there any mention in
RELEASE-NOTES nor even in the commit message.

It was pure luck we didn't break the production site of translatewiki.net. I
don't inspect most commits beyond the commit message.

I'm sure Niklas is very sorry about mentioning the one location where it was documented. Please fix the cases where it wasn't and should have been. Unless we're deciding to really drop third party support of course,

There was more:

  1. MessageGroupStatesUpdaterJobTest::testHooks

DBQueryError: A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: UPDATE unittest_job SET job_token = '2a19f741b629f083508dcbe2499a1c03',job_token_timestamp = '20121116185507',job_attempts = job_attempts+1 WHERE job_cmd = 'MessageGroupStatesUpdaterJob' AND job_id = '3' AND job_token = ''
Function: JobQueueDB::claimRandom
Error: 1054 Unknown column 'job_attempts' in 'field list' (localhost)

Setting milestone: of course release notes have to be fixed before release.
Is the job queue stable again after bug 41656 and friends were fixed?

Does probably need a mention in the release notes. Probably a moot point, but a token mention on wikitech-l would still be nice.

This is not one of the more egregious cases, since fields were only added as near as I can tell, but not changed. That affords us some ability to roll back if necessary, which was the main point of the conservative approach to schema changes.

(In reply to comment #3)

There was more:

  1. MessageGroupStatesUpdaterJobTest::testHooks

DBQueryError: A database error has occurred. Did you forget to run
maintenance/update.php after upgrading? See:
https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: UPDATE unittest_job SET job_token =
'2a19f741b629f083508dcbe2499a1c03',job_token_timestamp =
'20121116185507',job_attempts = job_attempts+1 WHERE job_cmd =
'MessageGroupStatesUpdaterJob' AND job_id = '3' AND job_token = ''
Function: JobQueueDB::claimRandom
Error: 1054 Unknown column 'job_attempts' in 'field list' (localhost)

And? The commit message mentioned the change and https://github.com/wikimedia/mediawiki-core/blob/master/RELEASE-NOTES-1.21 already mentions job queue schema changes (it would be odd to add another entry since the whole job queue stuff was done in 1.21 so it makes no difference to third parties).

Updating the code without running update.php will always have problems sometimes. One could add wikitech-l comments...although that starts to become more denormalization and busy work. Keeping an eye on https://github.com/wikimedia/mediawiki-core/blob/master/includes/installer/MysqlUpdater.php would be easier than watching the summaries of all commits. Maybe date comments could be added there or something too.

(In reply to comment #6)

(it would be odd to add another entry
since the whole job queue stuff was done in 1.21 so it makes no difference to
third parties).

Updating the code without running update.php will always have problems
sometimes. [...]

Then fix the documentation.
https://www.mediawiki.org/wiki/Manual:Upgrading#Basic_overview contains no "check includes/installer/MysqlUpdater.php because the release notes don't contain everything".
https://www.mediawiki.org/wiki/Manual:Upgrading#Using_Git doesn't say that running update.php is needed/better even when nothing suggests so.

(In reply to comment #7)

(In reply to comment #6)

(it would be odd to add another entry
since the whole job queue stuff was done in 1.21 so it makes no difference to
third parties).

Updating the code without running update.php will always have problems
sometimes. [...]

Then fix the documentation.
https://www.mediawiki.org/wiki/Manual:Upgrading#Basic_overview contains no
"check includes/installer/MysqlUpdater.php because the release notes don't
contain everything".
https://www.mediawiki.org/wiki/Manual:Upgrading#Using_Git doesn't say that
running update.php is needed/better even when nothing suggests so.

Are we looking at the same page? It mentions running update.php in "Basic overview" and in "Run the update script".

We don't mention MysqlUpdater.php or anything since we already mention update.php. Checking MysqlUpdater.php and release notes is useful when people want zero downtime and want to run the changes before updating the code (even then one must be careful not to upgrade more than one version at once unless they know what they are doing). This area could be documented better, though it mostly applies to large sites like WMF and Wikia. As a side note, for avoiding downtime for small wikis, one could also check out the new code to a new directory but with the same LocalSettings and run update.php there (assuming the upgrade is only up by one version). It would be nice for these tricks to be documented in the UPGRADE file or mw.org.

In any case none of what I mentioned is specific to job queue changes at all and nor was anything. It sounds like something for wikitech-l.

tchay wrote:

Is there something in our process that needs to be fixed here?

I assume TranslateWiki is moving faster than the point releases, what stuff needs to be changed about the wmf ones that isn't there already that would keep something like this from slipping through and breaking your site?

It seems to me even if wikitech-l made a token mention that this might be prone to error, but would that be enough?

This has the 1.21.0 target milestone set as it should be in the rel-notes (comment 4 and 5), but this is already the case (comment 6).
Other docs were brought up in comment 7 by Nemo, but seem to not apply as per comment 8.

Is anything else required here that should be done before 1.21.0? Or should that TM be removed?
Clarification welcome; wondering how to proceed. Thanks.

Bug 46934 looks like it is directly related to the un-announced schema changes discussed here. I'd like to have something for the release notes, at least.

These have been in the 1.21 notes for some time.