Odd session bugs
Closed, Declined (Public)

Description

(When I say "logged-in", I'm referring to what the MediaWiki interface is showing in the sidebar/toolbar/headers etc. e.g. "Krinkle / My talk" or "Login/Signup").

  1. I'm logged in on MediaWiki.org (running 1.20wmf1)
  2. Navigating works fine, I remain logged-in
  3. Go to an edit page, I remain logged-in
  4. Making a few additions, then hitting "Show preview"
  5. I'm suddenly logged out
  6. Page shows the diff but starts with error "Session lost, try logging out and logging back in again" (which is odd in itself since I'm allegedly logged out already)
  7. Click "Show preview", again
  8. Now I'm magically logged back in
  9. Diff shows as usual
  10. Click "Show preview", again
  11. Logged out again... like 4)
  12. Click "Show preview", again
  13. Still logged out...
  14. Clicking "Login/Signup" on the top right brings me to Special:UserLogin, *but* while there I notice that I'm now magically logged-in already, so there is no need to actually use the login form; the header shows "Krinkle / My talk"
  15. Pressing back shows the session error again and I'm back at 4)

Version: unspecified
Severity: major

Details

Reference
bz35900

Event Timeline

Also, sometimes when I press "Save page" (on an edit page where I appear to be logged in) I am POST-Redirect-GET'ed as usual, but the target view shows me logged out.

Then it is up to random choice whether my edit was saved as Krinkle or under my IP address.

https://www.mediawiki.org/w/index.php?title=ResourceLoader/Default_modules&oldid=523749 and https://www.mediawiki.org/w/index.php?title=ResourceLoader/Default_modules&oldid=523747 were unexpectedly saved under my IP address, exposing it.

Created attachment 10406
Logged-out edit view (with Cookies inspector in Chrome)

Attached:

Screen_Shot_2012-04-12_at_12.41.04_AM.png (788×1 px, 276 KB)

Created attachment 10407
Logged-in edit view (with Cookies inspector in Chrome)

This shot was taken directly after the previous one, only a mere refresh was in between.

Attached:

Screen_Shot_2012-04-12_at_12.41.29_AM.png (788×824 px, 234 KB)

This is also being reported on de.wp (which runs 1.19wmf1 (Version 114429) at the moment). It does not sound like a 1.20wmf1 problem.

There was a dead server in the memcached pool, which would presumably have sent 1 in every 78 sessions nowhere. I swapped out the dead memc server with a spare, so this should be fixed now. You may still experience problems if your session *started* before I fixed it.
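To illustrate the "1 in every 78" figure, here is a minimal sketch of how a dead pool member affects a fixed share of session keys. It assumes a pool of 78 equally weighted servers and a simple hash-modulo key mapping; the server names, the key format, and the hashing scheme are all hypothetical and are not taken from the production client.

```python
import hashlib


def pick_server(key: str, servers: list[str]) -> str:
    """Map a key to one server by hashing it (simple modulo scheme, an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def fraction_lost(num_servers: int, dead_index: int, num_sessions: int) -> float:
    """Return the share of session keys that map to the one dead server."""
    servers = [f"mc{i}" for i in range(num_servers)]  # hypothetical server names
    dead = servers[dead_index]
    lost = sum(
        1
        for n in range(num_sessions)
        if pick_server(f"session:{n}", servers) == dead  # hypothetical key format
    )
    return lost / num_sessions


if __name__ == "__main__":
    # With 78 servers, roughly 1/78 of sessions should land on the dead one.
    print(f"expected share: {1 / 78:.4f}")
    print(f"simulated share: {fraction_lost(78, 5, 50_000):.4f}")
```

Under this model the affected users are always the same ones, because a given session key keeps hashing to the same server. That would be consistent with some users seeing repeated failures while others saw none.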

This appears to be fixed. There may still be corrupted sessions from before this was fixed (around the time of comment 5, 2012-04-18 19:27:47 UTC), so if you haven't logged out/in since that time, try that before reopening. Bumping up the priority on this so that it doesn't fall off of our list should it be reopened.

Still happening six days on, so reopening. Suggestion that it is "worse than ever" - indeed, from the commentary I suspect the problem actually alleviated somewhat, then came back with a vengeance.

Consequently, it's far from obvious to me that this issue has been resolved.

Recent report: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#System_logging_me_out

(Also, rm "1.20wmf1" from the name of the bug, since it started occurring on en.wp before en.wp received the update.)

KilliondudeWP wrote:

I'm experiencing something similar currently (Firefox 11) where I'm *navigating* (not editing) the site and all of a sudden I notice my gadgets aren't working and I'm logged out. It happened about 4 times in a 10-minute window.

(In reply to comment #1)

then it is up to random choice whether my edit was saved as Krinkle or under my IP address.

That should never have happened. If you were shown as logged in on the Save page, it should never save as IP; you should have hit a session lost message. If the edit page opened as IP but you didn't notice, it could happen, of course.
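The expected behaviour described above can be sketched with a simplified model in which the edit form carries a token derived from the session. This is not MediaWiki's actual code: `make_edit_token`, `save_edit`, `SessionLost`, and the `ANON_TOKEN` placeholder are hypothetical names introduced only for illustration.

```python
import hashlib
import hmac

ANON_TOKEN = "anon"  # hypothetical placeholder for the token given to logged-out users


class SessionLost(Exception):
    """Raised when the submitted token cannot be matched to a live session."""


def make_edit_token(session_secret: str) -> str:
    """Derive the edit token embedded in the edit form from a per-session secret."""
    return hmac.new(session_secret.encode("utf-8"), b"edit", hashlib.sha256).hexdigest()


def save_edit(sessions: dict, session_id: str, submitted_token: str, ip: str) -> str:
    """Return the name the edit is attributed to, or raise SessionLost."""
    session = sessions.get(session_id)
    if session is None:
        # No session on the server side. Only a form that was rendered for a
        # logged-out user (carrying the anonymous token) may be saved under the IP.
        if submitted_token != ANON_TOKEN:
            raise SessionLost("form was rendered for a logged-in user")
        return ip
    expected = make_edit_token(session["secret"])
    if not hmac.compare_digest(submitted_token, expected):
        raise SessionLost("token does not match the current session")
    return session["user"]


if __name__ == "__main__":
    sessions = {"s1": {"user": "Krinkle", "secret": "example-secret"}}
    token = make_edit_token("example-secret")
    print(save_edit(sessions, "s1", token, "192.0.2.1"))
    del sessions["s1"]  # simulate the session vanishing from the store
    try:
        save_edit(sessions, "s1", token, "192.0.2.1")
    except SessionLost as err:
        print("session lost:", err)
```

In this model, an edit is saved under the IP address only when the form itself was rendered for a logged-out user, which matches the distinction drawn in the comment above.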

OK, here's my understanding of things so far. We suspect there's a bug in the memcached client implementation that we use, which would result in occasional session corruption. Roan and Asher have taken this about as far as they can, and now Roan is asking that Tim take a look at this.

More details: Roan ran mctest.php against our production servers. Rather than returning 100% success or 0% (down server), for many servers, this was somewhere in between. Roan and Asher then looked at the results in tcpdump, and found that the servers were returning the correct values, but the client wasn't processing them correctly for whatever reason.

Tim, could you take a look at this, and see what is up here?

(In reply to comment #10)

More details: Roan ran mctest.php against our production servers. Rather than returning 100% success or 0% (down server), for many servers, this was somewhere in between. Roan and Asher then looked at the results in tcpdump, and found that the servers were returning the correct values, but the client wasn't processing them correctly for whatever reason.

mctest.php has always returned such results, even when there are no user-visible site issues.

I gathered output from strace and tcpdump while mctest.php was running. It suggests that there is packet loss, and that it is leading to read timeouts (after 100ms). The client does not correctly handle a read timeout: it continues regardless, leading to protocol violations.

I don't know yet if that has anything to do with this bug, but I can try increasing the timeouts.
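The read-timeout failure mode can be sketched with a toy model of one client connection. This is not the actual client code: `FakeConnection`, `naive_get`, and `careful_get` are hypothetical, time is counted in abstract ticks standing in for milliseconds, and replies are assumed to come back strictly in request order.

```python
from collections import deque


class FakeConnection:
    """Stand-in for one client connection; replies arrive in request order, possibly late."""

    def __init__(self, data: dict):
        self.data = data
        self.pending = deque()  # unread replies as (ready_at_tick, value)
        self.now = 0

    def send_get(self, key: str, delay: int) -> None:
        """Queue a reply for `key` that becomes readable `delay` ticks from now."""
        self.pending.append((self.now + delay, self.data.get(key)))

    def read_reply(self, timeout: int):
        """Return the oldest unread reply, or raise TimeoutError if it is not ready in time."""
        ready_at, value = self.pending[0]
        if ready_at > self.now + timeout:
            self.now += timeout
            raise TimeoutError
        self.now = max(self.now, ready_at)
        self.pending.popleft()
        return value

    def reconnect(self) -> None:
        """Drop the connection; any late replies on the old stream are discarded."""
        self.pending.clear()


def naive_get(conn: FakeConnection, key: str, delay: int, timeout: int):
    """Treat a timeout as a miss but keep using the same connection."""
    conn.send_get(key, delay)
    try:
        return conn.read_reply(timeout)
    except TimeoutError:
        return None  # the late reply is still sitting in the stream


def careful_get(conn: FakeConnection, key: str, delay: int, timeout: int):
    """Treat a timeout as a miss and discard the connection before reusing it."""
    conn.send_get(key, delay)
    try:
        return conn.read_reply(timeout)
    except TimeoutError:
        conn.reconnect()
        return None


if __name__ == "__main__":
    data = {"session:alice": "alice-data", "session:bob": "bob-data"}
    for get in (naive_get, careful_get):
        conn = FakeConnection(data)
        first = get(conn, "session:alice", delay=150, timeout=100)
        second = get(conn, "session:bob", delay=1, timeout=100)
        print(get.__name__, first, second)
```

In this toy model the naive client's second request reads the late reply to the first one, so a lookup for one session receives another session's data. Dropping the connection after a timeout avoids that, at the cost of a reconnect.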

The timeout in production was 500ms, whereas in mctest.php it was 100ms; I guess that's why mctest.php fails so often. I raised it to 3000ms in both.

Another user reported very similar symptoms this morning, unfortunately.

elenoftheroads wrote:

(In reply to comment #14)

Another user reported very similar symptoms this morning, unfortunately.

That was me - around 11.00 UTC. Taking four or five attempts to save an edit, with the "loss of session data" error message, occurred in edits on three pages over about a 15-20 minute period. It didn't log me out though.

Chris, per our conversation this morning, could you take a look at this one?

Recapping (Tim/Rob/etc., please jump in with corrections/clarifications):

Our memcached cluster (which handles session information) has been experiencing stability issues in the last few weeks. Over the weekend, we rebooted all memcached boxes to deal with a particular kernel issue. Unfortunately there's no graceful failover, which caused session issues to be temporarily exacerbated.

As far as we know the cluster should be stable at this point in time. However, you may need to log out and log back in in order to fix the issue for your account.

We're planning to add database backing to session handling and improve the MediaWiki integration, to improve stability in the long term.
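As a rough sketch of what database backing could look like, the following keeps the cache as the fast path and falls back to a database when the cache has lost an entry. It is only an illustration under stated assumptions: `BackedSessionStore` is a hypothetical class, a plain dict stands in for memcached, and an in-memory SQLite database stands in for the real database.

```python
import sqlite3


class BackedSessionStore:
    """Session store that reads from a cache first and falls back to a database."""

    def __init__(self):
        self.cache = {}  # stand-in for memcached; entries may vanish at any time
        self.db = sqlite3.connect(":memory:")  # stand-in for the persistent database
        self.db.execute(
            "CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def save(self, session_id: str, data: str) -> None:
        """Write the session to the database first, then to the cache."""
        self.db.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, data),
        )
        self.db.commit()
        self.cache[session_id] = data

    def load(self, session_id: str):
        """Return session data, repopulating the cache after a cache miss."""
        if session_id in self.cache:
            return self.cache[session_id]
        row = self.db.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[session_id] = row[0]
        return row[0]


if __name__ == "__main__":
    store = BackedSessionStore()
    store.save("abc123", "user=Krinkle")
    store.cache.clear()  # simulate a cache server reboot
    print(store.load("abc123"))
```

With this arrangement a cache server reboot costs one database read per session instead of logging the user out.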

If you're still experiencing session issues after logging out, please report them in this bug. Let's keep the bug open for a few days at least.

elenoftheroads wrote:

A new effect of this problem is being reported today, 1 May, at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#April_30. Two users (one on enwp, one on Commons) report extreme slowness, failure to save edits, and the skin loading without .css or .js. It is not clear whether either user has tried logging out/in.

afeldman wrote:

Not loading .css / .js; sounds like those reports are of a different issue. We had a site outage yesterday resulting from a code deploy failure. All js/css access was disrupted for some users for a time.

http://wikitech.wikimedia.org/view/Incident_documentation/April_30,_2012

I was able to reproduce the effects that have been reported ("not logged in" error when saving a page, staying logged into the site, seeing edits from my IP address instead of my user) by yanking out memcached from under live sessions. It looks to me like that was likely the cause.

Chris, can you provide the steps you took?

1-Click edit
2-Change page
3-Delete the session file
4-Press save
5-The change is saved as your IP and you appear as logged in???

Neither of the outcomes in step 5 should be possible.