
Certain entries in parser cache on ca.wikipedia.org do not have Tidy enabled
Closed, ResolvedPublic

Description

Author: prlpz

Description:
Some users see the left-side menus and upper tabs overlapping table content on several pages of Catalan Wikipedia. I uploaded a screenshot of the problem at https://commons.wikimedia.org/wiki/File:Men%C3%BA_solapat.png

Several users reported at https://ca.wikipedia.org/wiki/Viquip%C3%A8dia:La_taverna/Ajuda#Viquip.C3.A8dia_desquadrada


Version: wmf-deployment
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=38273

Details

Reference
bz58042

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 2:21 AM
bzimport set Reference to bz58042.
bzimport added a subscriber: Unknown Object (MLST).

Andre, Greg, this is the bug report I mentioned on IRC.

One of the pages where users have seen this problem is https://ca.wikipedia.org/wiki/Guerres_p%C3%ADrriques . The editor reported "seeing the Wikipedia globe in the middle of the screen".

Some editors touched/removed a template thinking that that was the cause, but I can't see how a template can move the site logo and make the content overlap with the left navigation. In general it seems that editing the page and saving fixes the issue.

I can't reproduce it, but several editors with tens of thousands of edits behind them are reporting it.

Hi Pere,

Can you please disable all gadgets, just to see? I'm curious what happens with a plain account.

Can you reproduce this when you are not logged in?

I can reproduce the issue at https://ca.wikipedia.org/wiki/Viquip%C3%A8dia:La_taverna with my staff account and also logged out (in Firefox and Chromium, respectively).

Now, what's special about that page?

I can now reproduce the problem at https://ca.wikipedia.org/wiki/Viquip%C3%A8dia:La_taverna with both Firefox and Chrome, and from two different accounts, one with Beta Features enabled and one without. Beta Features was my second suspect after gadgets, but if you can see the problem logged out, both are out of the question.

Note that the problem is happening in other pages, regular articles. I will ask for specific URLs.

Some corrupted cache file somewhere affecting CSS?

When I run the page through an HTML validator while it is displaying incorrectly, I get a bunch of unclosed-element errors. When I run the same page through the validator while logged in, where I get the non-broken version, there are no errors about unclosed elements.
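The validator comparison above can be approximated locally. The following is a minimal sketch, not the validator used in this report: it relies only on Python's standard `html.parser`, and the class and function names are made up for illustration. It treats every non-void element as needing an explicit end tag, so it will over-report on HTML that legitimately omits optional end tags such as `</p>` or `</li>`.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Hypothetical helper: tracks open elements on a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []      # elements currently open
        self.problems = []   # elements closed implicitly, or stray end tags

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag in self.stack:
            # Anything opened after the matching start tag was never closed.
            while self.stack[-1] != tag:
                self.problems.append(self.stack.pop())
            self.stack.pop()
        else:
            self.problems.append("/" + tag)  # end tag with no matching start


def unclosed_tags(html):
    """Return the tags that were left open, plus any stray end tags."""
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    return finder.problems + finder.stack
```

Running this on the HTML served logged out and on the HTML served logged in should show the same contrast the validator did, assuming the broken render really does skip the cleanup step.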

I think one of the servers has a broken version of Tidy and is not running the pages through it (or, alternatively, some extension is parsing pages and putting them in the parser cache without enabling Tidy). Invalid HTML in a template can thus cause the issue you are seeing. It would only appear for some people, as different people might have different preferences (in my example, I think I have different thumbnail preferences logged in and logged out).

It would only appear for some people, as different people might have different preferences (In my example, logged in and logged out I have different thumbnail preferences I think)

Sorry, I read that over, and realized it didn't quite make sense. The broken version would be saved in parser cache. People with different prefs get served different versions from parser cache. An ?action=purge or edit would appear to (temporarily) fix the issue, since that would flush the current version from parser cache.
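To make the mechanism described above concrete, here is a toy model of a render cache keyed by page and user options. It is only a sketch under stated assumptions: `Options`, `ToyParserCache` and the key layout are hypothetical and do not reflect MediaWiki's real classes or key format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for the preferences that vary the rendered HTML."""
    thumb_size: int = 4
    lang: str = "ca"

    def key(self, page_id: int) -> str:
        # Illustrative key layout only; not MediaWiki's real key format.
        return f"pcache:{page_id}:{self.lang}:{self.thumb_size}"


@dataclass
class ToyParserCache:
    store: dict = field(default_factory=dict)

    def get(self, page_id, opts, render):
        """Serve the cached variant for these options, rendering on a miss."""
        key = opts.key(page_id)
        if key not in self.store:
            self.store[key] = render(page_id, opts)
        return self.store[key]

    def purge(self, page_id):
        """Drop every cached variant of the page, as a purge or edit would."""
        prefix = f"pcache:{page_id}:"
        for key in [k for k in self.store if k.startswith(prefix)]:
            del self.store[key]


def good_render(page_id, opts):
    return "<table></table>"


cache = ToyParserCache()
logged_out = Options(thumb_size=4)
logged_in = Options(thumb_size=6)

# Some other code path stores an untidied render under the logged-out key.
cache.store[logged_out.key(1)] = "<table>"

broken = cache.get(1, logged_out, good_render)  # served from cache, still broken
fine = cache.get(1, logged_in, good_render)     # different key, rendered fresh
cache.purge(1)
fixed = cache.get(1, logged_out, good_render)   # re-rendered after the purge
```

In this model only readers whose options map to the poisoned key see the broken page, and a purge clears it until something stores a bad render again.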

One thing I noticed: on the broken pages, no limit report is shown. This further suggests that somewhere pages are being parsed with the wrong ParserOptions and then saved to the parser cache.

Created attachment 13993
HTML of a broken page

I was able to reproduce when logged out as well. Attaching the HTML of a broken page. It is a little funky: it is missing the parser report. Such a bug was reported before (can't find it now :( ) and was fixed.


Found it! Bug 38273. (Parser report, aka NewPP limit report.)

cc'ing maxsem in case it's mobile related, given its similarity to bug 38273 per comment 10.

Translating a comment at the Catalan village pump by Vriullop (CCed here), just in case it's useful:

"What these corrupted pages have in common is that their source, instead of showing 'NewPP limit report', has 'Saved in parser cache with key cawiki:pcache:idhash:16532-0!*!0!!ca!4!*'."

Also, is this problem really happening only on ca.wiki? I guess we would have heard about it if en.wiki and other major projects had run into the same problem.

I just wonder whether ca.wiki editors are using some gadget, template or something else that is causing this problem, or whether it is purely a server-side problem and all that ca.wiki editors can do is wait.

I don't see how anything client-side could affect this. The bug is only visible on pages with misnested HTML, and probably only for pages which were updated since it started occurring (cached renders should still be okay). It might also depend on which server the code is running on internally (if the suspicion about missing/broken Tidy is correct).

We really need to get some parser people on this to check if this is caused by something similar to bug 38273 in some extension. Tim?

(In reply to comment #12)

Translating a comment at the Catalan village pump by Vriullop (CCed here), just in case it's useful:

"What these corrupted pages have in common is that their source, instead of showing "NewPP limit report" they have "Saved in parser cache with key cawiki:pcache:idhash:16532-0!*!0!!ca!4!*".

All pages have the "Saved in parser cache" line. The missing "NewPP limit report" is what identifies pages parsed with incorrect ParserOptions.
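As a rough way to triage reported pages using that distinction, a check along these lines could be run against saved page source. It is a sketch only: the function name is hypothetical, and it assumes both marker strings appear literally in the served HTML, as quoted in this thread.

```python
def looks_misparsed(page_source: str) -> bool:
    """True if the page came from the parser cache but carries no limit report.

    Assumes the two marker strings quoted in this report appear verbatim
    in the page source.
    """
    has_cache_line = "Saved in parser cache with key" in page_source
    has_limit_report = "NewPP limit report" in page_source
    return has_cache_line and not has_limit_report


broken_source = (
    "<table><tr><td>...\n"
    "<!-- Saved in parser cache with key "
    "cawiki:pcache:idhash:16532-0!*!0!!ca!4!* -->"
)
healthy_source = "<!-- NewPP limit report -->\n" + broken_source

broken_flag = looks_misparsed(broken_source)
healthy_flag = looks_misparsed(healthy_source)
```

The cache line alone is not a signal, since every cached page has it; only its presence without the limit report is.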

(In reply to comment #14)

I don't see how client-side anything could affect this. The bug is only visible on pages with misnested HTML, and probably only for pages which were updated since it started occurring (cached renders should be still okay). It might also be dependent on which server the code is running on internally (if the suspicion about missing/broken Tidy is correct).

We really need to get some parser people on this to check if this is caused by something similar to bug 38273 in some extension. Tim?

Someone could make the parser cache throw an exception if a page without the limit report is saved. That would draw attention to the issue rather fast (if you are going to fail, you might as well fail hard instead of subtly).
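The fail-hard idea can be sketched as a guard on the cache's save path. This is an illustration in Python rather than MediaWiki code; `StrictParserCache` and `MissingLimitReport` are hypothetical names, and the check assumes the limit report is identifiable by the literal string "NewPP limit report".

```python
class MissingLimitReport(Exception):
    """Raised when a render without the limit report is about to be cached."""


class StrictParserCache:
    """Hypothetical cache that refuses renders lacking the limit report."""

    def __init__(self):
        self._store = {}

    def save(self, key: str, html: str) -> None:
        # Fail loudly at write time instead of serving a subtly broken page.
        if "NewPP limit report" not in html:
            raise MissingLimitReport(f"refusing to cache {key}: no limit report")
        self._store[key] = html

    def get(self, key: str):
        return self._store.get(key)
```

With a guard like this, the offending caller would show up in an exception trace at the moment it tried to store the bad render, rather than days later through reader reports.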

Just a reminder: this is still happening in ca.wiki. We have asked editors to link from time to time to new pages where they see this problem:

https://ca.wikipedia.org/wiki/Viquip%C3%A8dia:La_taverna/Ajuda#Viquip.C3.A8dia_desquadrada

(In reply to comment #14)

We really need to get some parser people on this to check if this is caused by something similar to bug 38273 in some extension. Tim?

I'm pretty sure this is due to how CirrusSearch used to work with the parser cache (so yeah, bug 38273 in new form).

We fetched our content from WikiPage::getParserOutput() with default ParserOptions. This was mostly incorrect since we (A) want the canonical options and (B) don't want to actually *save* to the pcache, just read from it.

Both have been fixed since then, which would explain why reindexing the pages doesn't fix the problem and purging a broken page does fix it.
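The difference between the old and the fixed behaviour can be illustrated with a read-only lookup. This is a toy sketch, not the CirrusSearch code: the dict-based cache, the `"canonical"` key label and `read_for_indexing` are all assumptions made for the example.

```python
def read_for_indexing(cache: dict, page_id: int, render) -> str:
    """Return the canonical render for indexing without writing to the cache."""
    key = (page_id, "canonical")
    if key in cache:
        return cache[key]
    # Cache miss: render for our own use, but do not store the result.
    return render(page_id)


# A broken entry left behind under some reader's options key.
cache = {(1, "user-prefs-A"): "<table>"}

text = read_for_indexing(cache, 1, lambda page_id: "<table></table>")
```

Because the indexer no longer writes, reindexing leaves existing broken entries exactly where they are; only a purge or an edit replaces them.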

We haven't heard any complaints from editors in the past week. Broken pages that were purged got fixed and have stayed fixed so far.

I'm closing this bug for now. We will reopen if the issue reappears.