Page MenuHomePhabricator

Headings appear twice, instead of edit section links
Closed, ResolvedPublic

Assigned To
Authored By
Tbleher
Jan 9 2011, 11:13 AM
Referenced Files
F7145: LocalSettings_public.php
Nov 21 2014, 11:16 PM
F7143: Technikworkshop-correct.html
Nov 21 2014, 11:16 PM
F7144: Technikworkshop-incorrect.html
Nov 21 2014, 11:16 PM
F7142: correct-headings.png
Nov 21 2014, 11:16 PM

Description

Screenshot showing the buggy behaviour

Using current phase3 trunk (r79878), headings appear twice, and section edit links are missing. It worked correctly on r79519. Therefore I suspect the parser changes made by Daniel Friesen. Attached are two screenshots that show the buggy and intended behaviour.

I should also add that I use a small Javascript function to move edit section links, like it is or was used on the german wikipedia. It is included below.

function moveEditsection() {

if (typeof oldEditsectionLinks == 'undefined' || oldEditsectionLinks == false) {
  var spans = document.getElementsByTagName("span");
  for(var i = 0; i < spans.length; i++) {
    if(spans[i].className == "editsection") {
      spans[i].style.fontSize = "x-small";
      spans[i].style.fontWeight = "normal";
      spans[i].style.cssFloat = "none";
      spans[i].style.marginLeft = "0px";
      spans[i].parentNode.appendChild(document.createTextNode(" "));
      spans[i].parentNode.appendChild(spans[i]);
    }
  }
}

}


Version: 1.18.x
Severity: normal

Attached:

double-headings.png (397×829 px, 31 KB)

Details

Reference
bz26639

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:16 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz26639.

Created attachment 7963
Screenshot showing the correct behaviour

Attached:

correct-headings.png (400×823 px, 32 KB)

Link to url with the issue please.

I can't identify the cause of the issue without page source.

And... does the problem occur without the script?

The page that is shown in the screenshot is http://spiele.j-crew.de/wiki/Technikworkshop . I have however already reverted to the old version because of the bug (so you won't see the issue directly, but you can examine the page source). It occured on multiple pages, so it doesn't seem too page specific.
I should also note that I had to explicitly purge the page to see the new behaviour, which I found a bit strange - after reading the commit messages, I had assumed that old parsed pages would be evicted from the cache automatically.

I downloaded the page using wget, and the headings are already doubled in the page source so it seems to be unrelated to the Javascript code I posted.

Old version of first header:
<h2><span class="editsection">[<a href="/w/index.php?title=Technikworkshop&amp;action=edit&amp;section=1" title="Abschnitt bearbeiten: Kurzbeschreibung">Bearbeiten</a>]</span> <span class="mw-headline" id="Kurzbeschreibung"><a href="/wiki/Datei:Info.png" class="image"><img alt="" src="/images/b/b3/Info.png" width="32" height="32" /></a> Kurzbeschreibung</span></h2>

New version of the first header:
<h2>Kurzbeschreibung <span class="mw-headline" id="Kurzbeschreibung"><a href="/wiki/Datei:Info.png" class="image"><img alt="" src="/images/b/b3/Info.png" width="32" height="32" /></a> Kurzbeschreibung</span></h2>

As you can see, the word "Kurzbeschreibung" is doubled.

Created attachment 7965
The correct html output, from phase3 r79519

Attached:

Created attachment 7966
The incorrect html output, from phase3 r79878

Attached:

Hmmm... this is strange behavior...

It seams like for some reason something is stripping the <editsection [...]>Text</editsection> down to "Text".

But everything worked fine when I coded it... parser tests to. and those included stuff in other languages.

Created attachment 7985
LocalSettings.php file which exhibits the problem

In the hope of making progress on this bug: here is a (slightly redacted for passwords and such) version of LocalSettings.php, with which I can reliably reproduce the bug. I tested without any extensions, and the bug is still there. Any ideas what could cause the bug?

Attached:

(In reply to comment #8)

Created attachment 7985 [details]
LocalSettings.php file which exhibits the problem

In the hope of making progress on this bug: here is a (slightly redacted for
passwords and such) version of LocalSettings.php, with which I can reliably
reproduce the bug. I tested without any extensions, and the bug is still there.
Any ideas what could cause the bug?

I tried some of those settings related to the parser but didn't have any luck. Try disabling tidy to see if that's causing the issue. I tried using the same setting but I'm not sure if I actually have tidy configured on my server.

Attached:

(In reply to comment #9)

Try disabling tidy to see if that's causing the issue. I tried using the same
setting but I'm not sure if I actually have tidy configured on my server.

Yes, this is it! When disabling tidy, everything works correctly.

So the next step is probably to find out why tidy messes up the page this way...

Daniel, how did tidy cause this? Can you track it down?

"how" tidy is causing the issue is fairly obvious without even browsing the code. It sounds like tidy is being run after we insert the editsection tokens, but before we replace them with real editsection tokens. The editsection tokens are made in the format <editsection page="..." section="...">...</editsection> specifically because that's the only way to get the LanguageConverter to treat the information in the exact way we want it to. However, unfortunately editsection is not a real html tag, so when tidy sees it before we replace it with editsection markup (which we want to let be customized) it decides to strip out the tag part and leave nothing but the heading hint behind, which is usually the same text as the header's text, leading to that text being displayed twice instead of having a header and an editsection link.

Now the "solution" is less obvious. Perhaps I'll experiment and see if I can use something like a mw: or mediawiki: xml/xhtml namespace and use something like <mw:editsection [...]> instead of <editsection> which is in the default xhtml/html namespace and trick tidy into leaving those tags alone.

Experimenting a little, Tidy seams to completely ignore the fact that xml and xml namespaces are valid in xhtml and decide to strip out valid namespaced xml anyways. So even when using <mw:editsection> and defining a valid xhtml namespace in the code we send to tidy, it still decides to ignore that and strip it out.

The only thing I've seen to have an effect is adding 'input-xml: yes' to the tidy.conf, however that might have other unwanted effects someone who knows tidy better should point out.

Hmm, so there seem to be several ways forward:

  1. Modify tidy config and add 'input-xml: yes'
  2. Remove $wgUseTidy, and use the non-tidy-mode in MediaWiki
  3. Adapt the parser and the LanguageConverter so they can work together, without interfering with tidy
  4. Revert the feature for now

Option 1 sounds a bit dangerous, as tidy is explicitly used to clean up faulty markup; using an XML parser that doesn't correct errors seems wrong in this context (The option is described as "This option specifies if Tidy should use the XML parser rather than the error correcting HTML parser.") That said: I haven't experimented with this config option, so I may be completely wrong on this point.

Option 2 seems quite unrealistic (see eg bug 12221 for what can happen without tidy installed). This change could break page rendering in lots of "wrong-but-worked-until-now" cases.

I do not know the code well enough to say if option 3 is viable, but from the comments so far it doesn't seem so easy.

IMHO, it would therefore be best to go for option 4 and revert the changes for now, until a better solution is found. This would at least make trunk work again in this common configuration.

Any other options/opinions?

  1. Make tidy run on the parser output text each time it's fetched, instead of running on the text before saving it to the parser output

Though looking at the code we have ParserBeforeTidy and ParserAfterTidy and trying to remove tidy from Parser::parse might not be totally desirable.

  1. Hack up MWTidy so that it replaces things like <mw:editsection> with something like the parser's UNIQ tokens, runs tidy (making tidy ignore the tokens) and then replaces the tokens with the original blocks.

May have some effects like making tidy not work inside anything inside a <mw:...> block, and it might fall apart with nesting, but it's probably the best option.

Are you still seeing this bug in trunk? I could swear I still have trunk installed, I don't have any special local changes to fix the bug left, but I can't reproduce the bug anymore.

Still happens with r81507.
$wgUseTidy = false --> everything is OK
$wgUseTidy = true --> headings messed up as described

Hmmm, ok, either this has something to do with me upgrading some stuff on my server probably not or there's something wrong with the output in the pages I'm using to test on my test wiki causing XHTML errors tidy can't handle and instead results in tidy being ignored... guess I need to play with test pages.

I tried making a change in r81583 to hide these markers from tidy.

See if this fixed bug.

Thanks, that fixes the issue for me!