Parsoid: Links created with VE are piped links even when linktrail could be used
Open, Medium, Public

Description

Currently the VisualEditor creates links like this: [[Glas|Glases]]
The German-language Wikipedia prefers this though: [[Glas]]es


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=51438
https://bugzilla.wikimedia.org/show_bug.cgi?id=49940
https://bugzilla.wikimedia.org/show_bug.cgi?id=52240

Details

Reference
bz48463

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 1:35 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz48463.

Parsoid issue, moving to Parsoid.

Would like some input from James as to what he believes the behavior should be (piped links always? linktrails where possible? configurable either way?).

(In reply to comment #1)

Canonically, linktrails should be preferred where possible (so, whenever the target is a case-sensitive substring of the link text; except on a wiki with $wgCapitalLinks=true, where it's a tad more complicated). I believe that linkheads should also be preferred where possible for the languages to which that applies.
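
The rule described above can be sketched roughly as follows. This is only an illustration, not Parsoid's actual serializer: the function name `serialize_link`, the `capital_links` flag (standing in for `$wgCapitalLinks`), and the `LINKTRAIL` pattern are all assumptions made for this example, and the sketch only handles the case where the link text begins with the target.

```python
import re

# Assumption: a lowercase-letter linktrail that also allows German umlauts and
# eszett. Real wikis configure their own per-language pattern.
LINKTRAIL = re.compile(r"[a-zäöüß]+")


def serialize_link(target: str, text: str, capital_links: bool = True) -> str:
    """Return wikitext for a link, preferring a linktrail over a piped link."""
    prefix = text[:len(target)]
    trail = text[len(target):]

    same = prefix == target
    if not same and capital_links and target and prefix:
        # With first-letter capitalisation, "glas" and "Glas" name the same
        # page, so only the first character is compared case-insensitively.
        same = (prefix[0].upper() == target[0].upper()
                and prefix[1:] == target[1:])

    if same and (trail == "" or LINKTRAIL.fullmatch(trail)):
        # Keep the capitalisation the writer used in the visible text.
        return f"[[{prefix}]]{trail}"
    return f"[[{target}|{text}]]"


print(serialize_link("Glas", "Glases"))
print(serialize_link("Baudenkmal", "Baudenkmale"))
print(serialize_link("Target", "Target"))
print(serialize_link("Glas", "Fenster"))
```

Under these assumptions the first three calls yield `[[Glas]]es`, `[[Baudenkmal]]e` and `[[Target]]`, while the last falls back to the piped form `[[Glas|Fenster]]` because the text does not start with the target.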

[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]

See also Change-Id: I78c8fc87be1777f63c84f8945ead1cc54161fb76

Arlolra set Security to None.
ssastry raised the priority of this task from Low to Medium. Jul 27 2016, 3:52 PM

Jdforrester-WMF, thanks for merging T153107. Could you elaborate how these two are related? How is it possible that the parser does not detect two identical strings as identical?

Jdforrester-WMF, thanks for merging T153107. Could you elaborate how these two are related?

They are both issues where Parsoid is apparently creating wikitext sub-optimally.

How is it possible that the parser does not detect two identical strings as identical?

Your question assumes that it is looking for that. I don't know whether it is.

I just looked through what I believe to be the relevant parts of the source code, and it indeed seems that this is not how Parsoid handles this. I haven't looked at the VE implementation though. Is there no general cleanup at the end of a VE editing session?

VE is an HTML editor. What kind of clean-up of HTML do you expect VE to do that would affect this?

Not HTML, but apparently editing pages with the VE creates links like

[[Target|Target]]

in the Wikitext. I would assume that before Wikitext is saved (or delivered to the user), this is automatically cleaned to

[[Target]]

Parsoid currently has code to generate simple links where necessary, but clearly there was a bug that affected the diff you highlighted in T153107. In general, though, Parsoid does generate [[Foo]] instead of [[Foo|Foo]].

Thanks for the clarification. This specific case of this bug understandably decreases acceptance of the VisualEditor on de-wp, whereas the general problem regarding trailing characters is generally seen as acceptable. I asked several complaining users to provide more diffs of this specific case which I will add here.

Let us unmerge T153107, retitle it, and add diffs there. [[Foo|Foo]] shouldn't be happening and is either a regression, a corner case in the HTML, or something else going on that we have somehow missed all this while.

Since there was another complaint on the de-wp VE feedback page, I'm reviving this bug report. Here the previously correctly formatted link

[[Baudenkmal]]e

was converted to

[[Baudenkmal|Baudenkmale]]

Is there any hope that this problem will be fixed soon? If that's not possible, a brief explanation of why it is difficult (which we can relay to our community when this question comes up) would be helpful.

There are several similar reports.

One consideration is that shortening the overall link makes it easier to spot and read in wikitext; it is less cluttered, and that is a good reason to enforce use of linktrails.

Use of linktrails can also be troublesome. Assume a link uses an inflected word, where the inflection goes in the linktrail. Then the link is changed to point to another page title, but that page title takes a different inflection. Detecting this and changing it accordingly with the current bot frameworks is pretty difficult.

In English the plural of book is books, but in Norwegian the forms are bok and bøker.
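
To make the retargeting problem concrete, here is a small sketch. It is not taken from any existing bot framework: the helper names, the simplified `LINK` pattern, and the English example words are all assumptions chosen for illustration. It contrasts a naive target swap, which leaves the old linktrail attached to the new title, with a swap that freezes the displayed text into a piped link.

```python
import re

# Assumption: a simplified link pattern with an ASCII lowercase linktrail.
# Group 1 is the target, group 2 the optional label, group 3 the linktrail.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]([a-z]*)")


def retarget_naive(wikitext: str, old: str, new: str) -> str:
    """Swap only the target; any linktrail stays attached to the new title."""
    return wikitext.replace(f"[[{old}]]", f"[[{new}]]")


def retarget_keep_text(wikitext: str, old: str, new: str) -> str:
    """Swap the target but preserve the text the reader sees, using a pipe."""
    def fix(match: re.Match) -> str:
        target, label, trail = match.group(1), match.group(2), match.group(3)
        if target != old:
            return match.group(0)  # not the link being retargeted
        shown = (label if label is not None else target) + trail
        return f"[[{new}|{shown}]]"

    return LINK.sub(fix, wikitext)


print(retarget_naive("Two [[book]]s.", "book", "mouse"))
print(retarget_keep_text("Two [[book]]s.", "book", "mouse"))
```

The naive version produces `Two [[mouse]]s.`, which renders the wrong plural, while the second produces `Two [[mouse|books]].` and keeps the original wording at the cost of a piped link. Neither version can work out the correct inflection of the new title on its own, which is the difficulty described above.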

Note that use of linktrails is an indication of a deeper problem. It should be possible for writers to inflect the links themselves. That is, instead of [[book]]s they should be able to write [[books]].

Page titles are usually in a normalized form, and rewriting a link into this normalized form is pretty straightforward. In some cases there are alternate normalized forms, but as long as all of them identify the same page there should be no real problem. Only if different inflections target different pages would there be a problem.

@Esanders I think the idea (Visual Editor should automatically fix [[text|text]]) could be a goal and should have its own task. This task is related, but it is not exactly what I suggested.

The problem with [[text|text]] was resolved in T153107: Parsoid is generating [[Foo|Foo]] instead of [[Foo]] for some VE edits. If it has re-appeared, that task should be re-opened. From what you wrote in [[T195215#4220491]], the problem is link trail detection/generation, which is what this task is about.

@Cirdan I mean it should fix the existing ones. So if somebody edits an article with [[text|text]], the software would fix that (like AWB does). It could slowly but surely solve the problem.

I'm not a VE developer, but from what I learned from reporting Parsoid bugs, the VE/Parsoid try very hard to not alter existing wikitext. This is where quite a few of the copy & paste bugs come from.

I believe syntax cleaning should be done by bots which are controlled by the local community, not through VE.

As @Cirdan says, Parsoid and the visual editor try very hard to avoid changing wikitext that already exists in the article. Different wikis have different conventions, and the most reliable way to respect those conventions is to avoid digging into bits of wikitext that don't relate directly to what the user has done.

The visual editor is not intended to be a general purpose wikitext cleanup tool. As you suggest, AWB is good for that.