
Removing some whitespace is not detected as a change in some circumstances (selser issue?)
Closed, Resolved · Public

Description

https://en.wikipedia.org/wiki/Michael_Lowry_%28actor%29?veaction=edit - removing the spaces before the reference at the end of the second paragraph leads to "Could not start the review because your revision matches the latest version of this page." when you click "review my changes". If you instead try to save, the save silently succeeds, but the spaces are not actually removed.


Version: unspecified
Severity: normal

Details

Reference
bz50756

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 1:44 AM
bzimport set Reference to bz50756.

joedecker wrote:

This is, for what it's worth, a surprisingly common operation when cleaning up articles and creating articles for new users at Articles for Creation and the like. Nobody gets [[:w:WP:PAIC]] right the first time, and extra spacing is one of the two most common errors relative to that policy.

joedecker wrote:

Still seeing this. For example, it is impossible, using the Visual Editor, to remove the incorrect space before the first reference at https://en.wikipedia.org/w/index.php?title=Jim_Carroll_(author)&oldid=539841137

Can confirm that the HTML going from VE to Parsoid doesn't include the whitespace - it's being re-inserted by Parsoid for some reason, possibly due to a selser bug?
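For readers unfamiliar with selser (selective serialization), here is a toy Python sketch of the general idea: text for nodes judged unchanged is copied from the original wikitext, and only changed nodes are serialized afresh. This is not Parsoid's code. The node model, the `selser` and `serialize` helpers, and the whitespace-insensitive comparison are all hypothetical, chosen only to show how a whitespace-only edit could be lost if the change detection ignored it. The thread does not confirm that this is the actual cause.

```python
from typing import Callable, List


def serialize(node: str) -> str:
    """Stand-in for a full HTML-to-wikitext serializer (identity in this toy)."""
    return node


def strict_same(old: str, new: str) -> bool:
    """Treat nodes as unchanged only if they match exactly."""
    return old == new


def lenient_same(old: str, new: str) -> bool:
    """Hypothetical buggy comparison: ignores leading/trailing whitespace."""
    return old.strip() == new.strip()


def selser(old_nodes: List[str], old_src: List[str], new_nodes: List[str],
           same: Callable[[str, str], bool]) -> str:
    """Reuse the original source for nodes judged unchanged; serialize the rest."""
    out = []
    for i, new in enumerate(new_nodes):
        if i < len(old_nodes) and same(old_nodes[i], new):
            out.append(old_src[i])       # judged unchanged: copy original wikitext
        else:
            out.append(serialize(new))   # judged changed: serialize the edited node
    return "".join(out)


# Toy model of the page: a text node followed by a reference.
old_nodes = ["foo ", "<ref>bar</ref>"]
old_src = ["foo ", "<ref>bar</ref>"]
new_nodes = ["foo", "<ref>bar</ref>"]    # the editor removed the trailing space

print(selser(old_nodes, old_src, new_nodes, strict_same))   # foo<ref>bar</ref>
print(selser(old_nodes, old_src, new_nodes, lenient_same))  # foo <ref>bar</ref>
```

With the strict comparison the edit survives. With the lenient one the original "foo " is reused and the space comes back, which matches the symptom reported here.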

This seems to be working on latest master (see below). I'll test this against VE in my local MediaWiki install after installing the Cite extension and report back later today.


[subbu@earth lib] echo "foo <ref>bar</ref>\n\n<references />" > /tmp/wt
[subbu@earth lib] node parse --fetchConfig false --extensions ref,references < /tmp/wt > /tmp/old.html
[subbu@earth lib] sed 's/foo <span/foo<span/g;' < /tmp/old.html > /tmp/new.html
[subbu@earth lib] node parse --selser --oldhtmlfile /tmp/old.html --oldtextfile /tmp/wt < /tmp/new.html
foo<ref>bar</ref>

<references />

joedecker wrote:

As the person who pointed Oliver at this example originally, I went back to the version I'd had trouble with, and verified that the problem does not seem to be occurring now. You can see my test attempt with this diff:

https://en.wikipedia.org/w/index.php?title=Michael_Lowry_%28actor%29&diff=583277441&oldid=542893928

In summary: I haven't done any wider testing, but at least the example that caused me to report this appears resolved.

After some more digging and playing around in VE, I was able to reproduce it with the following snippet: "'''foo''' <ref>bar</ref>\n\n<references />"

With this example, I can indeed reproduce this on the command line. Investigating.

This bug proved to be a great opportunity to fix a whole bunch of related things in Parsoid. It led, indirectly, to about 10-15 patches.

The final patch that fixes the bug (with full history) is:
https://gerrit.wikimedia.org/r/#/c/97746/ (patch before Parsoid repo split)
https://gerrit.wikimedia.org/r/#/c/102012/ (latest version after migration into new repo)

This should get merged soon, but the code will not get deployed until January since we are now entering the holiday code freeze.