Page MenuHomePhabricator

[l10n] [scap] Are Scap and LocalisationUpdate actually fighting over l10nupdates?
Closed, ResolvedPublic

Description

So Bryan and I were looking at the l10nupdate scripts, noticing that it's essentially a reproduction of what's in scap.

For a little work, it was possible to make l10nupdate quicker, but there's a lot of code in it that should be replaceable with a scap call.

When we started digging into this a bit further, we have come to a conclusion that any scap run after an l10nupdate is possibly wiping out any changes made by them. This is due to different cache directories being used to build l10n caches from... Causing untold bugs?


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=70752

Details

Reference
bz70446

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:51 AM
bzimport added a project: Deployments.
bzimport set Reference to bz70446.
bzimport added a subscriber: Unknown Object (MLST).

[15:42:30] <Reedy> hey bd808
[15:43:09] -*- bd808 waves and reads backscroll
[15:43:39] <Reedy> TLDR; https://github.com/wikimedia/operations-puppet/blob/production/files/misc/l10nupdate/l10nupdate-1#L90-L106
[15:43:44] <Reedy> Can be replaced by a call to scap?
[15:45:37] <bd808> Doesn't it use a different source dir somehow?
[15:45:41] -*- bd808 reads more closely
[15:46:35] <_joe_> bd808: there are path issues in beta
[15:46:48] -*- bd808 is not surprised
[15:46:53] <_joe_> I discovered them when I flushed the bits cache this morning
[15:47:12] <bd808> _joe_: apache path, bash path, scap soemthing?
[15:47:26] <_joe_> filesystem path
[15:47:27] <_joe_> :)
[15:47:33] <_joe_> there are broken links
[15:47:35] <_joe_> 1 sec
[15:48:10] <bd808> Reedy: That looks a lot like the l10nupdate bits of scap for sure.
[15:48:52] <Reedy> The only overall difference really seems to be the localisation update call before, and hte rl cache update after
[15:49:10] <bd808> Reedy: Does mwscript extensions/LocalisationUpdate/update.php add new i18n contents to the branches in /usr/local/apache?
[15:49:27] <bd808> I read all this code once but it was quite a while ago (March I think)
[15:50:36] <Reedy> $wgLocalisationUpdateDirectory
[15:50:42] <Reedy> $wgLocalisationUpdateDirectory = "/var/lib/l10nupdate/cache-$wmfVersionNumber";
[15:51:17] <Reedy> Which I think might be the only real difference to where scap uses
[15:52:28] <bd808> Right... it makes it's own pile of cdbs and then syncs them out. But doesn't it pull in new messages from master somehow?
[15:52:50] <bd808> Which is all that gitrepo stuff above
[15:53:08] <Reedy> yup
[15:53:15] <_joe_> bd808: Symbolic link not allowed or link target not accessible: /usr/local/apache/common/docroot/bits/static-master/skins, referer: http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page
[15:53:22] <Reedy> That's the /usr/local/bin/mwscript extensions/LocalisationUpdate/update.php above
[15:53:48] <Reedy> bd808: my initial patch of switching to sync-dir and calling scap-rebuild-cdbs N times will still be a load faster than currentl l10nupdate
[15:53:50] <Reedy> ;)
[15:54:14] <bd808> Cool. I saw the email but haven't looked a the patch yet.
[15:54:26] <Reedy> I was going to look at moving it around... But when I noticed that the whole block could just be replaced with a scap call...
[15:54:34] <_joe_> bd808: also, on the FS: # ls -la /usr/local/apache/common/docroot/bits/static-master/skins
[15:54:38] <_joe_> lrwxrwxrwx 1 mwdeploy mwdeploy 28 Sep 3 23:37 /usr/local/apache/common/docroot/bits/static-master/skins -> ../../../../php-master/skins
[15:54:42] <_joe_> is broken
[15:55:05] <bd808> _joe_: Oh. The new relative links and beta don't play nicely together then
[15:55:20] <-- csteipp_afk (~csteipp@wikipedia/csteipp) has quit (Ping timeout: 276 seconds)
[15:55:55] <bd808> I bet that has something to do with /usr/local/apache/common being a symlink in beta
[15:56:10] <bd808> but... it's a symlink everywhere
[15:56:16] <_joe_> oh I have absolutely no idea
[15:56:20] <_joe_> just reporting the problem
[15:57:05] <bd808> ori changed the link targets from absolute to relative as a step in moving /usr/local/apache/common-local to /srv/mediawiki
[15:57:18] <bd808> That merged a couple of days ago.
[15:58:00] <-- MaxSem (~max@wikimedia/MaxSem) has quit (Quit: This computer has gone to sleep)
[15:58:45] <bd808> Reedy: The guts of that if block sure do look just like a scap call. It would be pretty easy to make an --l10n-only scap action I think.
[15:58:57] <bd808> If we were worried about syncing other things
[15:59:01] <Reedy> bd808: It'd be nice, I'm not sure it's so needed
[15:59:06] <Reedy> Most of the rest will be a no-op
[15:59:13] <bd808> otherwise, I don't see why just calling scap would be bad
[15:59:14] <Reedy> needed isn't the right word
[15:59:35] <Reedy> branch_dir = os.path.join(cfg['stage_dir'], 'php-%s' % version)
[15:59:41] <Reedy> cache_dir = os.path.join(branch_dir, 'cache', 'gitinfo')
[15:59:54] <Reedy> so I guess that's completely different
[16:01:01] <Reedy> ignore that
[16:01:02] <Reedy> cache_dir = os.path.join(
[16:01:02] <Reedy> cfg['stage_dir'], 'php-%s' % version, 'cache', 'l10n')
[16:01:02] <bd808> Yeah. Scap builds l10n from /a/common/... so that it can build l10n for branches that haven't been synced with the cluster yet
[16:01:33] <Reedy> what's stage_dir? /a/common?
[16:02:08] <bd808> Yeah -- https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap.cfg#L22
[16:02:53] <Reedy> So, we could just change $wgLocalisationUpdateDirectory to...
[16:03:21] <Reedy> $wgLocalisationUpdateDirectory = "/a/common/php-$wmfVersionNumber/cache/l10n";
[16:03:34] <Reedy> replace most of l10nupdate with scap
[16:03:35] <Reedy> profit
[16:04:31] <bd808> so... does all the work done by l10nupdate get wiped out in the next scap?
[16:04:42] <bd808> That seems ... dumb
[16:05:11] <Reedy> haha
[16:05:22] <Reedy> I'm not sure
[16:05:45] <Reedy> it sorta seems like that
[16:05:53] <_joe_> btw, mod_php is definitively off on beta - everything is served through HHVM
[16:06:00] <bd808> If I'm understanding this it updates the i18n files in /usr/local/apache/common, makes cdbs and syncs them
[16:06:37] <Reedy> yeah, as one big rsync from tin
[16:06:40] <bd808> _joe_: Could you open a bugzilla bug about the path issues? I'll forget otherwise.
[16:07:09] <_joe_> bd808: ok
[16:07:14] <bd808> Reedy: So then when scap runs next it builds l10n from /a/common which doesn't have the updates and syncs that
[16:07:41] <bd808> So the l10n from master only stays around between when l10nupdate runs and the next scap
[16:07:50] <Reedy> :/
[16:07:51] <bd808> Which could be minutes or days later
[16:07:57] <Reedy> And no one has noticed?
[16:08:12] <bd808> Well maybe they have and we just didn't connect the dots
[16:08:27] <bd808> there have been tons of bugs reports about lagged l10n changes
[16:08:28] <_joe_> bd808: "wikimedia labs", right?
[16:08:38] <_joe_> LOL
[16:08:42] <_joe_> this was the reason?
[16:08:50] <bd808> I think so yeah
[16:08:52] <Reedy> it's looking likely
[16:09:21] <bd808> _joe_: yeah product wm-labs component deployment-prep
[16:09:37] <_joe_> yes I've seen
[16:09:54] <_joe_> well if _this_ is the reason, it's kinda funny
[16:09:56] <_joe_> :)
[16:10:00] <bd808> Reedy: So the obvious problem here is that if we update the i18n source in /a/common we will dirty the git clone
[16:10:27] <Reedy> Well, I was thinking of leaving it to clone stuff to the same place
[16:10:39] <Reedy> I was wondering if scap should be running localisationupdate as part of its process
[16:10:49] <Reedy> so we get the up to date json files to sync
[16:11:11] <bd808> It's a messy dance. We sync out from /a/common
[16:11:43] <bd808> So we'd have to sync-common on tin, update, copy the files back to /a/common
[16:11:50] <bd808> which is doable
[16:11:57] <bd808> but not the current process
[16:12:26] <bd808> right now we build in place from the source in /a/common
[16:13:01] <bd808> Switching isn't too hard though I don't think... how is it going to break...
[16:13:04] <_joe_> bd808: https://bugzilla.wikimedia.org/show_bug.cgi?id=70445
[16:13:30] <bd808> _joe_: thanks
[16:14:26] <Reedy> bd808: Should we merge what I have in 158623/158624 to gain some benefits?
[16:14:37] <Reedy> Cause this reworking is going to take a little bit longer
[16:14:43] -*- bd808 nods
[16:15:01] <bd808> let me review and then yeah lets make it incrementally better
[16:15:24] -*- Reedy repurposes the bug a little
[16:15:25] <bd808> but I think we should open a bug to investigate this idea of scap and l10nupdate fighting over the cdbs
[16:16:05] <Reedy> I'll do that
[16:16:16] <_joe_> I'm off for now
[16:16:25] <bd808> This is a lot like when I found the root cause of the bug with the json switch. REally complex crap that had been looked at in pieces but not as a whole process
[16:19:57] <Reedy> :)

Additional analysis at http://etherpad.wikimedia.org/p/l10nupdate. I also sent an email to Ori, Roan, Sam, Dan and Mukunda to see if they were interested in discussing a deeper fix for this problem. Roan wrote most of the existing l10nupdate code that is in oeprations/puppet.git.

Is it really broken?

I chatted with Tim and he explained something about the records from the files on disk being merged with the existing cdb content that has been generated by scap and/or l10nupdate. So I may have just made this whole problem up by not understanding all the parts in play.

Is it really broken?

I don't know about one year ago, but it currently is: T103879: l10n-update not updating Vector and extensions.

T72443 has been resolved. With l10nupdate now using Scap's sync-l10n, this maybe isn't an issue any more. They should now use the same directory structure, and there is also Scap's locking mechanism to avoid (human-)scap vs l10nupdate conflicts.

I'm going to close this per @Krinkle's comment. If anyone sees anything to prove otherwise, please reopen.