Spaces added before colons and links in some cases
Closed, Resolved, Public

Description

See https://fr.wikipedia.org/w/index.php?title=COE&diff=prev&oldid=93166267 for example.

I only removed the link, but VisualEditor added a space before the colons.

Another example: https://fr.wikipedia.org/w/index.php?title=MSS&diff=prev&oldid=93166411

There were already two spaces, but VisualEditor added a third.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=51024

Details

Reference
bz48570

Event Timeline

bzimport raised the priority of this task to High (Nov 22 2014, 1:43 AM).
bzimport added a project: Parsoid.
bzimport set Reference to bz48570.

The same happens when you edit the first word of a sequence "word + word + wikilink": an unnecessary space is added before the link. Examples:
http://it.wikipedia.org/w/index.php?title=Amazon.com&diff=prev&oldid=59604951
http://it.wikipedia.org/w/index.php?title=Google&diff=59622535&oldid=59377529 (<<talmente popolare ch in [[lingua inglese|inglese]] >>).
The last one was deliberate vandalism on my part to test the issue; I rolled it back afterwards.

(In reply to comment #0)

> See https://fr.wikipedia.org/w/index.php?title=COE&diff=prev&oldid=93166267 for example.
>
> I only removed the link, but VisualEditor added a space before the colons.
>
> Another example: https://fr.wikipedia.org/w/index.php?title=MSS&diff=prev&oldid=93166411
>
> There were already two spaces, but VisualEditor added a third.

These have now been fixed; sorry for the difficulty.

(In reply to comment #1)

> The same happens when you edit the first word of a sequence "word + word + wikilink": an unnecessary space is added before the link. Examples:
> http://it.wikipedia.org/w/index.php?title=Amazon.com&diff=prev&oldid=59604951
> http://it.wikipedia.org/w/index.php?title=Google&diff=59622535&oldid=59377529 (<<talmente popolare ch in [[lingua inglese|inglese]] >>).
> The last one was deliberate vandalism on my part to test the issue; I rolled it back afterwards.

I *think* this is also fixed but I'm not absolutely sure; please re-open if you can still reproduce this.

We unfortunately can:
http://en.wikipedia.org/w/index.php?title=Little_Mosque_on_the_Prairie&diff=prev&oldid=566820635

It seems to be triggered by removing an apostrophe (').
Another user verified this (in preview mode only, without saving) at http://en.wikipedia.org/wiki/Indian_Bank : removing the ' after "Binny&Co" causes a space to be added right before the next wikilink.

It's not just ' either; I also saw it when removing - and " in my sandbox: https://en.wikipedia.org/w/index.php?title=User%3AThryduulf%2Fsandbox&diff=566841354&oldid=566840716

Spaces were added before internal and external links, except in the first instance, where a space was added before a colon instead.
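
For anyone triaging further reports, the symptom described in the comments above (extra spaces appearing before a link or a colon) can be checked mechanically by comparing two revisions. The following is only a rough sketch: the function names, the marker regex, and the sample strings are assumptions made for illustration and are not part of Parsoid or VisualEditor. It also assumes both revisions contain the same sequence of markers.

```python
import re

# Markers the reports mention: internal links, external links, colons.
# Hypothetical pattern for illustration; it only counts plain ASCII spaces.
MARKER = re.compile(r"( *)(\[\[|\[https?://|:)")


def spaces_before_markers(text):
    """Return (marker, number of spaces directly before it) for each marker."""
    return [(m.group(2), len(m.group(1))) for m in MARKER.finditer(text)]


def grown_gaps(old, new):
    """List markers whose preceding run of spaces is longer in `new`.

    Each entry is (marker index, marker, spaces in old, spaces in new).
    Raises ValueError when the marker sequences differ, because the
    two revisions cannot then be aligned by position.
    """
    before = spaces_before_markers(old)
    after = spaces_before_markers(new)
    if [marker for marker, _ in before] != [marker for marker, _ in after]:
        raise ValueError("marker sequences differ; cannot align revisions")
    return [
        (index, marker, old_count, new_count)
        for index, ((marker, old_count), (_, new_count))
        in enumerate(zip(before, after))
        if new_count > old_count
    ]


if __name__ == "__main__":
    # Made-up strings modelled on the "Binny&Co" report, not real diffs.
    old_revision = "Binny&Co' [[Indian Bank]]"
    new_revision = "Binny&Co  [[Indian Bank]]"
    print(grown_gaps(old_revision, new_revision))
```

A position-based comparison like this is deliberately crude: it ignores non-breaking spaces and breaks down as soon as an edit adds or removes a link, so it suits small test cases rather than arbitrary diffs.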

Could the fix to bug 2035 be related to this?

Ignore last comment, should be: Could the fix to bug 52035 be related to this?

This looks like 1. a Parsoid issue, 2. fixed in master. Reassigning to Parsoid to confirm.

*** Bug 51024 has been marked as a duplicate of this bug. ***

Bug 51024 has several more examples of this bug occurring if anyone is looking for them.

Confirmed on simple test case. Investigating.

[subbu@earth tests] echo "foo : bar" > /tmp/wt
[subbu@earth tests] node parse < /tmp/wt > /tmp/old.html
[subbu@earth tests] cat /tmp/old.html | sed 's/bar/bars/g;' > /tmp/new.html
[subbu@earth tests] node parse --html2wt --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html
foo : bars
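
The property this test case exercises can be stated as: after a selective-serialization round trip, the output should equal the original wikitext with only the intended edit applied. Below is a minimal sketch of that comparison. The helper names are hypothetical, and the "buggy" string in the usage lines is an invented illustration, not output captured from Parsoid.

```python
import re


def apply_edit(wikitext, old, new):
    """Mirror the `sed 's/bar/bars/g'` step: the only change the user intended."""
    return wikitext.replace(old, new)


def whitespace_only_regression(expected, actual):
    """Return True when `actual` differs from `expected` solely in whitespace."""
    if actual == expected:
        return False

    def without_whitespace(text):
        return re.sub(r"\s+", "", text)

    return without_whitespace(expected) == without_whitespace(actual)


if __name__ == "__main__":
    original = "foo : bar"
    expected = apply_edit(original, "bar", "bars")
    # A clean round trip reproduces the expected text exactly.
    print(whitespace_only_regression(expected, "foo : bars"))
    # Hypothetical faulty output with one extra space before the colon.
    print(whitespace_only_regression(expected, "foo  : bars"))
```

Comparing against the expected text, rather than eyeballing the serializer output, makes a single stray space visible even where a rendered page would collapse it.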

Change 84701 had a related patch set uploaded by Subramanya Sastry:
(Bug 48570) Fix subtle selser bug handling separator-only nodes

https://gerrit.wikimedia.org/r/84701

*** Bug 50637 has been marked as a duplicate of this bug. ***

Change 84701 merged by jenkins-bot:
(Bug 48570) Fix subtle selser bug handling separator-only nodes

https://gerrit.wikimedia.org/r/84701

Could this be happening again? At cs.wp they are filtering edits that add extra spaces, like this or this (I don't have much info at the moment).

@Elitre: This doesn't look like a space added before colons or links. Could these be null edits or edit undos that left spaces behind in VE, which Parsoid is then preserving?

(Sorry if I'm adding to the wrong task; I think a more relevant one was closed as a dupe of this one. I haven't heard from cs.wp yet, but there seem to be too many occurrences for that to be the case, IMHO.)

According to a cs.wp editor, such changes are intentional, so there is possibly no need to investigate further. Thanks.