Currently, when we make site software updates with scap, sync-common-all, etc., the web servers keep serving requests while the sync runs.
This has the unfortunate side effect that some web requests arrive at a server whose copy of MediaWiki is only partially updated, which can cause transient but very scary-looking errors. A common case is when files in different directories have both changed and depend on each other. This is especially problematic with skin files, since skins may be synced out ahead of time. Either way, the result can be big, scary PHP fatal errors or exceptions.
We want the updates to be atomic, so any given request will get _either_ the old deployment version _or_ the new version, but never a mix.
There are two main ways we could implement this:
- Shut down Apache before the rsync and restart it afterward.
Simple, but it could make updates slower, or leave most machines out of service at the same time for a minute or two.
- rsync to a staging directory, then swap the entire staging directory in for the live one.
(I'm not sure whether it's possible to swap two directories fully atomically under POSIX semantics.)
Or maybe also:
- rsync to a staging directory, then change which directory the .conf files refer to and do a graceful restart (apachectl graceful).
This would avoid gaps in response time, but we'd end up with a magical moving directory, which could be confusing madness. :)
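On the POSIX question in the staging-directory options: rename() cannot replace a non-empty directory, but it does replace a symlink in a single step. (Linux's renameat2() with RENAME_EXCHANGE can swap two directories outright, but that is Linux-specific, not POSIX.) So one common pattern is to rsync each deployment into its own directory and make the live path a symlink that gets flipped. Below is a minimal Python sketch of that flip, assuming a hypothetical layout of <root>/releases/<version> plus a <root>/current symlink that Apache or MediaWiki would be configured to use; the layout and the function name are illustrative only, not how scap currently works.

```python
import os


def activate_release(root: str, version: str) -> str:
    """Atomically point <root>/current at <root>/releases/<version>.

    Hypothetical layout: each deployment is rsynced into its own directory
    under releases/, and the web server only ever refers to 'current'.
    Returns the new symlink target.
    """
    target = os.path.join(root, "releases", version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)

    live_link = os.path.join(root, "current")
    # Build the new symlink under a temporary name in the same directory,
    # so the rename below happens within a single filesystem.
    tmp_link = os.path.join(root, f".current.tmp.{os.getpid()}")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted earlier run
    os.symlink(target, tmp_link)

    # os.replace() uses rename(2), which swaps the symlink itself in one step:
    # a path lookup sees either the old target or the new one, never neither.
    os.replace(tmp_link, live_link)
    return os.readlink(live_link)
```

The flip is atomic per path lookup, but a request that is already running and includes more files through the symlinked path after the flip could still pick up files from the new tree. That is one reason the graceful-restart variant may still be attractive, or the code would need to resolve the symlink once per request.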
(Another thing to consider: keep the 'live' skin and extension JS/CSS files in a separate subdirectory, so we can update those en masse first without any code-safety issues, then run the code updates, atomically per server, which guarantees that every new hit gets the new CSS/JS.)
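The ordering in the asset-subdirectory idea matters: every server must have the new JS/CSS before any server switches code. A minimal Python sketch of that two-phase ordering, where sync_assets and swap_code are hypothetical hooks standing in for "rsync the shared asset subdirectory to one server" and "do that server's atomic code switch":

```python
from typing import Callable, Iterable


def two_phase_deploy(
    servers: Iterable[str],
    sync_assets: Callable[[str], None],
    swap_code: Callable[[str], None],
) -> None:
    """Push static skin/extension JS/CSS everywhere before any code changes.

    sync_assets and swap_code are hypothetical per-server hooks; this
    function only enforces the order in which they run.
    """
    servers = list(servers)
    # Phase 1: new assets reach every server while the old code is still live.
    for server in servers:
        sync_assets(server)
    # Phase 2: only now switch code, one server at a time. Each switch is
    # atomic for that server, and every server already has the new assets.
    for server in servers:
        swap_code(server)
```

Keeping the two phases as separate loops, rather than doing assets and code together per server, is what provides the guarantee: no server runs new code until all servers can serve the new CSS/JS.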
Version: unspecified
Severity: enhancement
Whiteboard: deploysprint-13