
Excessive punctuation highlighting in wikidiff2
Closed, Resolved, Public

Description

Author: mr.heat

Description:
Short: Do not colorize characters red that did not change.

Long: A few weeks ago the diff algorithm was changed in all Wikipedia projects. In earlier versions, if a single character was changed, only this single character was marked red. Now, all non-whitespace characters around this character become red. In many cases so much is marked red that it becomes impossible to see what really changed, especially when editing templates, links or images.

I think the idea was to make edited whitespace visible (it was invisible in earlier versions of the diff). If this is true, why don't you make the whitespace visible? Only the whitespace? Not surrounding text that did not change?
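To illustrate the effect described above, here is a minimal sketch, not wikidiff2's actual code. It assumes two hypothetical tokenizers (`split_on_whitespace` and `split_words_and_punctuation`, both invented for this example) and uses Python's `difflib.SequenceMatcher` as a stand-in for the real diff engine. When tokens are delimited by whitespace only, a one-character change inside a template call marks the whole call as changed; when punctuation characters are separate tokens, only the edited value is marked.

```python
import re
from difflib import SequenceMatcher


def split_on_whitespace(text):
    # Hypothetical tokenizer: every run of non-whitespace is ONE token,
    # so punctuation stays glued to the text around it.
    return re.findall(r"\S+|\s+", text)


def split_words_and_punctuation(text):
    # Hypothetical tokenizer: word runs, whitespace runs, and each single
    # punctuation character become separate tokens.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def changed_tokens(old, new, tokenize):
    """Return the tokens of `new` that a token-level diff would highlight."""
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    highlighted = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            highlighted.extend(b[j1:j2])
    return highlighted


old = "{{Infobox|name=Foo|year=2011}} text"
new = "{{Infobox|name=Foo|year=2012}} text"

print(changed_tokens(old, new, split_on_whitespace))
# ['{{Infobox|name=Foo|year=2012}}']
print(changed_tokens(old, new, split_words_and_punctuation))
# ['2012']
```

The sample wikitext is made up for the example; the linked diffs below show the real cases.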

Here are a few really bad examples:

http://de.wikipedia.org/w/index.php?title=Giovanni_Kessler&diff=prev&oldid=86716346
http://de.wikipedia.org/w/index.php?title=Terraria&curid=6244640&diff=97103203&oldid=97093972
http://de.wikipedia.org/w/index.php?title=Hans_Bentzien&diff=prev&oldid=78225726

I created a script to fix this issue (I know it can't work in all cases, it's just a bad hack to fix at least some of the issues):

http://de.wikipedia.org/wiki/Benutzer_Diskussion:TMg/cleanDiff.js


Version: unspecified
Severity: major
URL: http://de.wikipedia.org/wiki/Benutzer_Diskussion:TMg/cleanDiff.js

Details

Reference
bz33331

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:05 AM
bzimport added a project: wikidiff2.
bzimport set Reference to bz33331.
bzimport added a subscriber: Unknown Object (MLST).

https://en.wikipedia.org/wiki/User:Cacycle/wikEdDiff is a gadget on enwiki that does something similar. If you're interested in fixing wikidiff the source is available.

mr.heat wrote:

Before editing source that may or may not be the cause, there are some questions: When was this changed and why? Is this a configuration issue or something in the CPP source of the extension? In which revision was this bug introduced?

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/wikidiff2/?view=log

I think it was intended to be a feature but from my point of view it's a bug. At wikipedia.org the diff was better in October 2011 (as described above) and got worse in November 2011. Why isn't it possible to simply go back to the old version (as requested in bug #32601)?

wikEdDiff is no solution (neither is my script).

(In reply to comment #2)

Why isn't it possible to simply go back to the old
version (as requested in bug #32601)?

Because Bug 26038, bug 25725 and bug 27993 as well as maybe some Thai support (see r67994) depend on the new version. It is better to fix the problem that was introduced instead of re-introducing four old problems.

*** This bug has been marked as a duplicate of bug 32601 ***

Reopening. Thank you for the report, I didn't know there was an issue with punctuation highlighting. The word splitting algorithm was rewritten in version 1.1.0 of wikidiff2 which was deployed on November 2 per bug 27720.

mr.heat wrote:

Does this mean you still colorize full words in red even if only a single character changed? Why? The previous version was good, it was marking single characters only. What was the problem? Where was this change discussed?

Neither bug 26038 (replacing a dash with another dash) nor bug 25725 (removing some whitespace from the HTML output) nor bug 27993 (bad diff in the last line, as far as I understand) nor some Thai support (that's a Wikipedia with 70,000 articles) can explain why the diff algorithm was changed so drastically for all languages (for a total of over 10 million articles!).

(In reply to comment #6)

Does this mean you still colorize full words in red even if only a single
character changed? Why? The previous version was good, it was marking single
characters only. What was the problem? Where was this change discussed?

Yes, the full word will be highlighted even if only a single character is changed. This has been the behaviour of the diff engine on Wikipedia since 2002, except in Chinese, Japanese and Thai text. Before that, a line-by-line diff was used. We've never had character-level diffs for European languages.
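The difference between the two granularities can be sketched as follows. This is only an illustration under stated assumptions: `mark_changes`, `word_level` and `char_level` are hypothetical helpers, words are taken to be whitespace-delimited runs, and Python's `difflib.SequenceMatcher` stands in for the diff engine. It is not how wikidiff2 is implemented.

```python
import re
from difflib import SequenceMatcher


def mark_changes(old_units, new_units):
    """Return the new text with changed or inserted units wrapped in [ ]."""
    matcher = SequenceMatcher(None, old_units, new_units, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = "".join(new_units[j1:j2])
        if tag in ("replace", "insert") and chunk:
            out.append("[" + chunk + "]")
        else:
            # "equal" keeps the text as is; "delete" contributes nothing
            # to the new side.
            out.append(chunk)
    return "".join(out)


def word_level(old, new):
    # Assumption: a "word" is any run of non-whitespace characters.
    def split(s):
        return re.findall(r"\S+|\s+", s)
    return mark_changes(split(old), split(new))


def char_level(old, new):
    # Every character is its own unit.
    return mark_changes(list(old), list(new))


old = "born 1961 in Trento"
new = "born 1951 in Trento"

print(word_level(old, new))  # born [1951] in Trento
print(char_level(old, new))  # born 19[5]1 in Trento
```

With word-level units the whole token containing the edit is marked, even though only one digit differs; with character-level units only that digit is marked.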

mr.heat wrote:

You are right. I'm sorry, I mixed something up. I will wait, and when the fix is live in the Wikipedia projects I will try to improve my user script as far as I can and ask other users what they think about character-level diff. I will post my results in another report. Thank you so far.

I came here to report this same bug, but found this, and am glad that it is taken care of. Will it come live on Wikimedia wikis with 1.19?

(In reply to comment #9)

I came here to report this same bug, but found this, and am glad that it is
taken care of. Will it come live on Wikimedia wikis with 1.19?

You can check the beta site: http://beta.wmflabs.org/

Okay, thanks. I guess it is not in 1.19, then, given the following diffs:

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Problem_reports&curid=10&diff=265&oldid=264 (would expect only ":" to be highlighted on the right)

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Problem_reports&curid=10&diff=261&oldid=260 (would expect only "337121" to be highlighted on the left)

PS! Your script is really handy, TMg!

mr.heat wrote:

According to
http://labs.wikimedia.beta.wmflabs.org/wiki/Special:Version
that wiki is running 1.19alpha (r109243) but obviously the bug is not fixed.

All non-whitespace characters left and right of a change are highlighted including all punctuation characters.
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Talk&diff=1492&oldid=1491
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=112

And why is the space at the end of this change highlighted? The space was not changed.
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=116

Fix deployed. Note that wikidiff2 is deployed separately to MediaWiki, so the MediaWiki version doesn't tell you anything. Labs was not running the new version.

mr.heat wrote:

Is this some kind of test or game to you? Breaking features and basically ignoring all complaints by marking them as invalid or fixed? Aren't you able to look at the examples? The bug is not fixed.

One of the examples looks better now:

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=116

Everything else is still broken:

http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Talk&diff=1492&oldid=1491
http://labs.wikimedia.beta.wmflabs.org/w/index.php?title=Test_cases&diff=prev&oldid=112

How hard can it be to revert the single change that broke the diff algorithm?

I'm very sorry, I don't want to be personal. But the diff is an essential feature for me and this really drives me crazy.

mr.heat wrote:

It seems you are talking about the Wikipedia projects and not about Labs. How should I know?

http://de.wikipedia.org/w/index.php?title=Wikipedia:Spielwiese&diff=98698173&oldid=98698134

I'm sorry. Resolved. Fixed.

Labs is new and I haven't ever logged into it or fixed anything on it. At present, it's not really my problem if something is broken on it. I'm sorry Mark gave you a link to it; anything you see there in relation to wikidiff2 is very unlikely to be related to what is going on on the main cluster.

Looks fixed now. Ryan Lane told me to restart memcached.