Page MenuHomePhabricator

Quotes and whitespace normalized in unedited <ref> tags
Closed, ResolvedPublic

Description

Filing this since subbu wanted to take a closer look ("Yes, VE/Parsoid shouldn't be making dirty diffs -- and these kind of dirty diffs is an exception rather than the rule. This is a bug."):

<<VE seems to make unnecessary changes. I wouldn't mind so much if it always helped to maintain the layout of invisible elements, but normally it just screws them (like infoboxes, or position of templates), without any effect on the rendered page but with a less easily readable wikitext editing window.
What I notice now though are totally unnecessary and hard to explain changes to ref names. For some reason, VE decides that all ref names need to be in quotes, which is not true at all and doesn't help one bit. It also decides that between the final quote and the forward slash, there needs to be a space. All this has no effect on the rendered page, and doesn't make the wikitext page any easier (or harder) to read, so why does VE bother doing this, which only helps to make diffs a lot longer.
Take a look at e.g. [10]: can you easily see if any significant change has been made, and what that may be?
First difference; <ref name="globsec-arjun"/> becomes
<ref name="globsec-arjun" /> Further: <ref name=nomorearjuns> becomes <ref name="nomorearjuns">
and so on. I have not found the actual significant difference between the old and new version. What is the purpose and benefit of this? If VE would always make the "invisible" layout clearer and better, fine, but VE doesn't care one bit about wikitext layout, so I don't get this useless editing. Fram (talk) 09:27, 7 October 2013 (UTC)>>

https://en.wikipedia.org/w/index.php?title=Arjun_%28tank%29&curid=7648336&diff=576092971&oldid=575064244


Version: unspecified
Severity: normal

Details

Reference
bz55461

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 2:15 AM
bzimport set Reference to bz55461.

Yup, that's a bug. VE should not be normalizing <ref name="globsec-arjun"/> to <ref name="globsec-arjun" /> nor should it be normalizing <ref name=nomorearjuns> to <ref name="nomorearjuns">.

Since the revision wasn't tagged with the visualeditor-check tag, VE didn't detect any corruption on its end, which makes me suspect that this is a Parsoid bug.

This seems to be related to bugs in selser (Parsoid) when a cache URL is set. I've set a cache URL locally and will investigate.

Gabriel and I now think this happened when Parsoid incurred a cache miss for the HTML (rare) and also suffered a timeout on reparsing (possible on large pages since we had a timeout of 16 sec). But, a couple commits today should address this and further reduce the likelihood of selser not getting triggered on the edit.

The relevant commits are:

  1. https://gerrit.wikimedia.org/r/#/c/93896/ (increases cache timeout to 60 sec)
  2. https://gerrit.wikimedia.org/r/#/c/93902/ (improves parse time when selser suffers a cache miss fetching original HTML).

The new code I mentioned in #c3 hasn't been deployed to production yet. Will update this ticket when that happens.

New code has been deployed with a bunch of fixes related to cache misses that would cause Parsoid to run the regular serializer on the entire dom. This should fix this issue on most pages.

We'll also make some fixes to minimize dirty diffs even when there is a cache miss -- we've identified some improvements we could make to the DOM diff algorithm.

Wikifram wrote:

Not just the quotes, also the additional spaces: [https://en.wikipedia.org/w/index.php?title=Gray_wolf&curid=33702&diff=581450926&oldid=581450732] I can't judge whether this is fixed on most pages or not, but it certainly seems to happen quite reguilarly still.

Yes, those were probably from cache misses. We deployed newer code y'day that should eliminate these unrelated normalizations in most cases (barring bugs).

Note however that if a <ref> is edited, Parsoid will still normalize the quotes and whitespace in the attributes of the edited ref.

Let us wait for a week to see if we still see unrelaed normalizations and if not, close this as resolved.

This last diff seems to have happened in the context of some error the editor encountered. Recording here for further debugging.

https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=581952520#Error:Unknown_error

The diff in comment #10 might have been caused by an unrelated issue with one of the Varnish caches in production.

I believe this was fixed by 67fca5bdc7 and 923c79102a.

These test cases from comments above now round-trip fine:

Please reopen if this problem still exists.

  • Bug 64197 has been marked as a duplicate of this bug. ***
  • Bug 64162 has been marked as a duplicate of this bug. ***

This is just a note that Bug 64197, merged here, is about {{sfn}} templates, which sort of are and sort of aren't ref tags. See https://en.wikipedia.org/w/index.php?title=Jack_Parsons_%28rocket_engineer%29&diff=604788783&oldid=604787564

Here's another diff that is probably the same bug:

https://en.wikipedia.org/w/index.php?title=Rights&curid=51490&diff=603988349&oldid=601154438

In this one, an oddly named-but-unnamed ref tag (<ref name="">) has its whitespace normalized.

Arlolra added a project: Parsoid.
Arlolra set Security to None.
ssastry claimed this task.

Closing this task since there is nothing much more to be done here since we cannot protect against varnish cache failures -- these should be a fairly rare occurrence. Once we move to RESTbase, this should not happen anymore.