Preprocessor/Parser irregularities with -{...}- variant constructs.
Open, Medium, Public

Description

This is a parent task for various subtasks having to do with irregularities parsing -{...}- constructs, especially if they contain embedded vertical bar characters. See the subtasks for specific bugs having to do with different places in the parser and preprocessor where irregularities have been found.

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:01 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz52661.
bzimport added a subscriber: Unknown Object (MLST).

From IRC:

(07:36:15 PM) TimStarling: but it wouldn't be too hard to make the preprocessor annotate it, the same way it does with links
(07:36:33 PM) TimStarling: you know the preprocessor is responsible for expanding templates
(07:36:55 PM) TimStarling: but it marks up links for the sole purpose of getting correct template DOM
(07:37:28 PM) TimStarling: e.g. for parameter splitting in {{ a [[b|c]] }}
(07:37:45 PM) TimStarling: it would probably be beneficial for -{}- to be handled in the same way
(07:38:15 PM) TimStarling: then {{ a -{b|c}- }} would work in the intuitive way
(07:38:10 PM) cscott-free: yes. i think i'm going to add [[File:foobar.jpg|-{R|rawcaption}-]] as a parser test and open a bugzilla for that. for the future.
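
The parameter-splitting point from the IRC excerpt can be sketched as follows. This is a minimal illustration, not MediaWiki's preprocessor: the function name `split_template_args` and the delimiter table are hypothetical, and only `[[...]]` and `-{...}-` are treated as protected spans.

```python
def split_template_args(body: str) -> list[str]:
    """Split a template body on top-level '|' only.

    Bars inside [[...]] or -{...}- are kept with their argument.
    Hypothetical helper for illustration; not the real preprocessor.
    """
    pairs = {"[[": "]]", "-{": "}-"}  # opener -> expected closer
    parts, current, stack = [], [], []
    i = 0
    while i < len(body):
        two = body[i:i + 2]
        if stack and two == stack[-1]:
            # Closing delimiter of the innermost open construct.
            stack.pop()
            current.append(two)
            i += 2
        elif two in pairs:
            # Opening delimiter: remember which closer we expect.
            stack.append(pairs[two])
            current.append(two)
            i += 2
        elif body[i] == "|" and not stack:
            # A bar outside any protected construct separates arguments.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts
```

With this sketch, `split_template_args(" a -{b|c}- ")` keeps the whole string as one argument, whereas a plain `" a -{b|c}- ".split("|")` cuts it in the middle of the variant construct.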

Change 78330 had a related patch set uploaded by Cscott:
Add parserTests for language converter markup.

https://gerrit.wikimedia.org/r/78330

btw There's another wikitext snippet that isn't handled well currently:

;-{zh-cn:AAA;zh-tw:BBB}-

Is this resolvable with the preprocessor change?

(In reply to comment #3)

> ;-{zh-cn:AAA;zh-tw:BBB}-
> Is this resolvable with the preprocessor change?

Yes, I believe this has the same root cause.

(In reply to comment #4)

> (In reply to comment #3)
> > ;-{zh-cn:AAA;zh-tw:BBB}-
> > Is this resolvable with the preprocessor change?
>
> Yes, I believe this has the same root cause.

Lists are not handled by the preprocessor. The issue here is that the list handler (doBlockLevels) is not aware of -{ }- either and (wrongly) recognizes the embedded colon as a single-line dt/dd pair.

Right. But if the preprocessor lifts out the -{...}- constructs, then doBlockLevels won't get confused. So yes, same root cause.

If you reintroduce language conversion blocks only after doBlockLevels is done, then you'll need to find a different way to parse the contents of those blocks independently of the main content.
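
A rough sketch of the "lift out, then reintroduce" idea discussed above, under loud assumptions: the placeholder format, the helper names, and `naive_definition_split` (a toy stand-in for the dt/dd rule in doBlockLevels) are all hypothetical, and the regex does not handle nested `-{...}-`. It also does nothing about the caveat in the previous comment: the stashed contents would still need to be parsed separately.

```python
import re

# Non-nested -{...}- blocks only; an assumption made for brevity.
LC_BLOCK = re.compile(r"-\{.*?\}-", re.DOTALL)


def lift_variant_blocks(text):
    """Replace each -{...}- with an opaque placeholder; return text and stash."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"\x7fLC{len(saved) - 1}\x7f"

    return LC_BLOCK.sub(stash, text), saved


def restore_variant_blocks(text, saved):
    """Put the stashed -{...}- blocks back in place of their placeholders."""
    return re.sub(r"\x7fLC(\d+)\x7f", lambda m: saved[int(m.group(1))], text)


def naive_definition_split(line):
    """Toy dt/dd rule: ';term:definition' splits at the first colon."""
    if not line.startswith(";"):
        return None
    term, sep, definition = line[1:].partition(":")
    return term, (definition if sep else None)


line = ";-{zh-cn:AAA;zh-tw:BBB}-"
# Without lifting, the colon inside the variant block is taken as the dt/dd separator.
broken = naive_definition_split(line)
# With lifting, the block-level rule sees no colon at all.
lifted, saved = lift_variant_blocks(line)
term, definition = naive_definition_split(lifted)
term = restore_variant_blocks(term, saved)
```

Here `broken` is split inside the variant block, while `term` comes back as the intact `-{zh-cn:AAA;zh-tw:BBB}-` with no definition part.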

One more thing being broken:

{|
|-
|-{R|B}-
|}

Also:
-{zh-cn:[[Category:A]];zh-tw:[[Category:B]];}-

This shouldn't be in both A and B (should it?). We don't want the category to depend on the variant. So maybe it *should* be in both?

I think it should be in neither. (gwicke agrees.)

[[Category:foo]] would add the page to the 'foo' category. In a variant where foo=>bar, it might be displayed as [[Category:foo|bar]], and be edited that way by VE, but that wouldn't change the category of the page. Category links inside -{...}- would be forbidden (that is, parsed as plain text).
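
To make the proposed rule concrete, here is a small sketch that collects categories only from text outside `-{...}-`. It is an illustration of the proposal, not existing parser behavior; `page_categories` and both regexes are hypothetical and ignore nesting.

```python
import re

# Non-nested -{...}- blocks; simplified for illustration.
LC_BLOCK = re.compile(r"-\{.*?\}-", re.DOTALL)
# [[Category:Name]] or [[Category:Name|sortkey]]; simplified pattern.
CATEGORY = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def page_categories(wikitext):
    """Return category names found outside any -{...}- block."""
    outside = LC_BLOCK.sub("", wikitext)
    return [name.strip() for name in CATEGORY.findall(outside)]
```

Under this rule, `-{zh-cn:[[Category:A]];zh-tw:[[Category:B]];}-` puts the page in neither A nor B, while `[[Category:foo|bar]]` outside a variant block still yields `foo`.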

Change 78330 merged by jenkins-bot:
Add parserTests for language converter markup.

https://gerrit.wikimedia.org/r/78330

*** Bug 72875 has been marked as a duplicate of this bug. ***
*** Bug 72010 has been marked as a duplicate of this bug. ***

Change 311849 had a related patch set uploaded (by C. Scott Ananian):
WIP: protect language converter markup in the preprocessor.

https://gerrit.wikimedia.org/r/311849

cscott renamed this task from "Preprocessor should handle -{...}- variant constructs." to "Preprocessor/Parser irregularities with -{...}- variant constructs.". Sep 21 2016, 6:17 PM
cscott updated the task description.

Change 312066 had a related patch set uploaded (by C. Scott Ananian):
Other language converter bugs (test case tweaks)

https://gerrit.wikimedia.org/r/312066

There are also irregularities in how lists and tables with language converter markup are handled; see https://gerrit.wikimedia.org/r/312066

Change 312066 abandoned by C. Scott Ananian:
Other language converter bugs (test case tweaks)

Reason:
Squashed into https://gerrit.wikimedia.org/r/327127

https://gerrit.wikimedia.org/r/312066

I added T153761 as blocker since it would be nice to have a test for that case (I see things are moving: T146305#2891350).

There's an issue with autolink URLs, like:

-{en-us:http://elevator.com;en-gb:http://lift.net}-

because the autolink regexp won't stop at the semicolon and thus will grab the en-gb and break the language converter nesting. See T166429: Getting a unclean output with {{#property:P856}} on site which enables Language Converter.

Not entirely sure this is fixable; it seems to be a genuine priority mismatch between autolink and language converter constructs.
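
The mismatch can be reproduced with a deliberately simplified autolink pattern. The regex below is an assumption made for illustration and is not MediaWiki's actual autolink regexp; it only shares the property that `;` is accepted as a URL character.

```python
import re

# Simplified stand-in for an autolink pattern: runs until whitespace or <>[].
NAIVE_AUTOLINK = re.compile(r"https?://[^\s<>\[\]]+")
# Variant that also stops at ';' and '}'. It fixes this input but would
# truncate legitimate URLs, since ';' is a valid URL character.
STRICT_AUTOLINK = re.compile(r"https?://[^\s<>\[\];}]+")


def find_autolinks(text, pattern=NAIVE_AUTOLINK):
    """Return every substring the given pattern would treat as a URL."""
    return pattern.findall(text)


text = "-{en-us:http://elevator.com;en-gb:http://lift.net}-"
greedy = find_autolinks(text)
strict = find_autolinks(text, STRICT_AUTOLINK)
```

With the naive pattern, the single match swallows the semicolon, the `en-gb:` prefix, and the closing `}-`. The stricter pattern separates the two URLs here, but only by refusing characters that real URLs may contain, which is the priority conflict described above.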