
Redirects to oldid URI sometimes to wrong wiki
Closed, Resolved (Public)

Description

Report by Marek Blahuš (Blahma on IRC) about content from another wiki with correct oldid being returned when requesting pages without oldid at parsoid-lb.eqiad:

Yesterday, the problem reappeared with another user, and I think I could positively identify it as related to the old problem you mentioned in our discussion:

The user requested "skwiki/Dolný Vadičov", Parsoid identified it correctly as "oldid=5594681", but it returned content for "plwiki/Bartłomiej Sochański" – actually an old version of that page, the one with the same oldid.

When I explored the first problem again, I realized that the most recent oldid for "skwiki/Turie" is 5596738, which is the same as the oldid of a 1 Sep 2004 version of "enwiki/Alvin and the Chipmunks" (the article shown to the users). The fact that the English article was edited at the same time was also most probably irrelevant and only a coincidence.

As a precaution, I decided to switch my tool back to using http://parsoid.wmflabs.org/ because I never experienced this erroneous behavior with this instance. I have asked users to inform me of any future problems and will report those to you if you so wish.


Version: unspecified
Severity: normal

Details

Reference
bz60372

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:58 AM
bzimport added a project: Parsoid-Web-API.
bzimport set Reference to bz60372.

The redirect response itself is never cached, and requesting the correct URL with oldid seems to work fine. That leads me to believe that Parsoid sometimes redirects to the wrong full URI (e.g. /plwiki/Dolný Vadičov?oldid=5594681). Interestingly, http://parsoid-lb.eqiad.wikimedia.org/plwiki/Doln%C3%BD_Vadi%C4%8Dov?oldid=5594681 was in cache:

X-Cache:cp1045 hit (1), cp1058 frontend miss (0)

So somebody requested that page and version recently. Since that is a very old revision and the title is actually from skwiki, it seems unlikely that this happened by chance. The oldid always wins over the title, which explains why an article with a different name is returned.

A possible work-around is to pass in the oldid explicitly. That avoids the redirect altogether.
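A minimal sketch of that work-around, assuming the client already knows the current revision ID (for example from the wiki's own API). The helper name parsoid_url is hypothetical, and the host and URL shape are taken from the URLs quoted in this report:

```python
from urllib.parse import quote

# Host as it appears in this report; adjust for other Parsoid instances.
PARSOID_BASE = "http://parsoid-lb.eqiad.wikimedia.org"


def parsoid_url(wiki, title, oldid):
    """Build a Parsoid URL that names the revision explicitly.

    Hypothetical helper: with ?oldid=... in the request, no redirect
    is needed, so the faulty Location header is never consulted.
    """
    # Spaces become underscores; non-ASCII characters are percent-encoded as UTF-8.
    path_title = quote(title.replace(" ", "_"), safe="/:()")
    return f"{PARSOID_BASE}/{wiki}/{path_title}?oldid={int(oldid)}"


print(parsoid_url("skwiki", "Dolný Vadičov", 5594681))
```

The trade-off is one extra lookup for the revision ID before each Parsoid request.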

A user has reported a new occurrence of this bug, and my new debugging code has logged additional data. Here's what I have:

On Wed, 29 Jan 2014 at 14:07:05 UTC, the following URL was requested:
http://parsoid-lb.eqiad.wikimedia.org/skwiki/Hrabovka%20%28okres%20Tren%C4%8D%C3%ADn%29

The response from the server, however, included the following incorrect header (change of wiki):
Location: /itwiki/Hrabovka_(okres_Tren%C4%8D%C3%ADn)?oldid=5594963

Therefore, the returned content (the redirect was automatically followed) from that new URI was an article from itwiki with the oldid of the intended article on skwiki, i.e. this article: https://it.wikipedia.org/w/index.php?title=Discussioni_utente:Kynoppy&oldid=5594963
Indeed, apart from the different PAGENAME (the provided title "Hrabovka (okres Trenčín)" was used during the rendering and this modified the output of the "welcome" template on that particular user talk page), the returned content is the same as received by calling http://parsoid.wmflabs.org/itwiki/Discussioni%20utente:Kynoppy?oldid=5594963

Therefore, I can confirm that the bug occurs already when constructing the redirect header or earlier.

As a work-around, I can check the Location header and repeat the request if it was redirected off-wiki. This could prevent the user from receiving incorrect output and would tell us whether the issue is purely random or persists across several requests made close together in time.
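The check described above could look roughly like the following sketch. It assumes the wiki prefix is the first path segment of both the requested URL and the Location header, as in the examples in this report. The function names and the caller-supplied get_location callable (which performs the request without following redirects and returns the Location header) are hypothetical:

```python
from urllib.parse import urlsplit


def wiki_prefix(url_or_path):
    """Return the first path segment, e.g. 'skwiki' for '/skwiki/Turie'."""
    return urlsplit(url_or_path).path.lstrip("/").split("/", 1)[0]


def redirected_off_wiki(requested_url, location):
    """True if the Location header points at a different wiki prefix."""
    return wiki_prefix(requested_url) != wiki_prefix(location)


def first_on_wiki_location(requested_url, get_location, max_attempts=3):
    """Repeat the request until the redirect stays on the requested wiki.

    get_location is a caller-supplied function (an assumption of this
    sketch) that issues the request and returns its Location header.
    Returns None if every attempt was redirected off-wiki.
    """
    for _ in range(max_attempts):
        location = get_location(requested_url)
        if not redirected_off_wiki(requested_url, location):
            return location
    return None
```

Counting how many attempts were needed before an on-wiki redirect came back would also show whether the fault repeats across consecutive requests.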

Thanks for investigating this. So it is indeed the redirect that is at fault here. I'll look into it.

Change 110233 had a related patch set uploaded by GWicke:
Bug 60372: Work around express bug that clobbered res.locals

https://gerrit.wikimedia.org/r/110233

Change 110233 merged by jenkins-bot:
Bug 60372: Fix use of res.local function

https://gerrit.wikimedia.org/r/110233

This is now fixed in master. The fix will likely be deployed to production next week.

This seems fixed now, or at least the issue does not reappear in my case.

Thank you, Gabriel, for taking care of this!