
Bold/italic markup handled differently depending on leading whitespace
Open, Low, Public, Feature

Description

The following markup gives different results from normal when inside table markup:

'''Look at ''this edit'''s more complicated bold/italic markup!'''

In normal text, you get:

'<i>Look at</i> this edit<b>s more complicated bold/italic markup!</b>

Within a table, you get:

<b>Look at <i>this edit'</i>s more complicated bold/italic markup!</b>

To me, the latter is the intended output and, from prior knowledge of the parser, what I would expect. However, the important point is that the two cases are currently rendered differently when they should not be!


Version: 1.15.x
Severity: enhancement
URL: http://test.wikipedia.org/wiki/Bug_18765

Details

Reference
bz18765

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:37 PM
bzimport set Reference to bz18765.
bzimport added a subscriber: Unknown Object (MLST).

ssanbeg wrote:

Bold/italic markup uses a fairly complex heuristic to determine how the quotes match. It's not dependent on tables, but it is (apparently) sensitive to whitespace. Your test case shows the difference one extra space in the line can make. I don't think the table is relevant, other than in how it causes the whitespace to render.

Whitespace may affect things in order to ensure proper handling of the 's and l' sort of cases... but start-of-line and whitespace probably should look the same there.

Needs to be checked against the other test cases...

Interestingly, I thought the parser used to format this kind of example in the manner described for when there is whitespace at the start, rather than the example without. However, it now seems to use the non-whitespace formatting as standard, with the whitespace version only appearing in the described edge case. This is what I was alluding to in the last paragraph of my original post.

Is there a possibility that this behaviour has changed in a parser update (which could have some serious implications), or is my memory just faulty?

It has worked this way since MediaWiki 1.3

MediaWiki 1.2 produces the same HTML for both cases, but in a third way:
<strong>Look at <em>this edit</em></strong><em>s complicated bold/italic markup!</em>

OK - just a bit of faulty wiring then... damn this broken brain of mine! :-)

Parser change

MediaWiki handles unbalanced quotes by looking at the lengths of the surrounding words and making a guess.

The test case showed several issues:
-MediaWiki treated the beginning of a line as a multi-letter word.
-Markup such as <span> or | is treated as a "word".

There's also the parser assumption that words are separated by spaces, which is not true for all languages.

The patch fixes just the first issue (and adds a parsertest and release notes).
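To make the guess described above concrete, here is a minimal Python sketch of how such a heuristic could pick which bold run to reinterpret as a literal apostrophe plus italic. It is not MediaWiki's code: the function name bold_to_demote and the flag line_start_is_space are hypothetical, the priority order (after a single-letter word, then after a multi-letter word, then after a space) is an assumption not stated in this thread, and runs of four or of more than five apostrophes are ignored.

```python
import re


def bold_to_demote(line, line_start_is_space=False):
    """Return the token index of the ''' run to demote, or None.

    Hypothetical sketch: tokens at odd indices are apostrophe runs,
    tokens at even indices are the text between them.
    """
    tokens = re.split(r"('{2,})", line)
    runs = range(1, len(tokens), 2)

    # A run of 2 is italic, 3 is bold, 5 is both.
    num_italic = sum(1 for i in runs if len(tokens[i]) in (2, 5))
    num_bold = sum(1 for i in runs if len(tokens[i]) in (3, 5))
    if num_italic % 2 == 0 or num_bold % 2 == 0:
        return None  # this sketch only guesses when both counts are odd

    first_single = first_multi = first_space = None
    for i in runs:
        if len(tokens[i]) != 3:
            continue
        before = tokens[i - 1]
        if i == 1 and before == "" and line_start_is_space:
            before = " "  # assumption: treat start of line like whitespace
        last, second_last = before[-1:], before[-2:-1]
        if last == " ":
            if first_space is None:
                first_space = i
        elif second_last == " ":
            if first_single is None:
                first_single = i
        elif first_multi is None:
            first_multi = i  # an empty `before` also lands here

    # Assumed priority: single-letter word, multi-letter word, space.
    for candidate in (first_single, first_multi, first_space):
        if candidate is not None:
            return candidate
    return None


markup = "'''Look at ''this edit'''s more complicated bold/italic markup!'''"
print(bold_to_demote(markup))        # opening run (index 1) is demoted
print(bold_to_demote(" " + markup))  # run after "edit" (index 5) is demoted
```

With the flag left off, the empty text before the opening run is classified as a multi-letter word, so the opening bold is demoted, as in the "normal text" output in the description. With a leading space, or with the flag set, the run after "edit" is demoted instead, as in the table output.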

Many usages now work, but
<span>'''Look at ''this edit'''s complicated bold/italic markup!'''</span>
and
{|
|
'''Look at ''this edit'''s complicated bold/italic markup!'''
|}

still fail, since the parser treats <span> and | as text instead of markup. I don't think it's worth trying to teach it otherwise.

The behavior of the parsertest "Mixing markup for italics and bold" changed, since it begins a line with bold quotes.
I modified the rule "If there are more than 5 apostrophes in a row, assume they're all text except for the last 5." so that 6 apostrophes produce the original <b>bold</b><b>bold<i>bolditalics</i></b>. It still emits literal single quotes to balance open italic and bold, but the general behavior seems closer to what a human would expect. See the new 'Six quotes' parsertest for all the cases.
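For reference, the unmodified rule quoted above can be sketched in a few lines of Python. This illustrates only the original "all text except for the last 5" rule, not the patched six-quote behavior, whose details are not given in this thread. The function name split_long_runs and the ("text" / "markup") tuple format are hypothetical.

```python
import re


def split_long_runs(line):
    """Split a line into ("text", s) and ("markup", s) pieces.

    In a run of more than five apostrophes, everything except the
    last five is treated as literal text (the rule quoted above).
    """
    pieces = []
    for i, tok in enumerate(re.split(r"('{2,})", line)):
        if i % 2 == 1:  # odd indices are apostrophe runs
            if len(tok) > 5:
                pieces.append(("text", tok[:-5]))
                pieces.append(("markup", tok[-5:]))
            else:
                pieces.append(("markup", tok))
        elif tok:  # skip empty text between or around runs
            pieces.append(("text", tok))
    return pieces


print(split_long_runs("''''''bold"))
```

Under this rule, six apostrophes at the start of a line become one literal apostrophe followed by a five-apostrophe bold-italic opener.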

attachment bug_18765.patch ignored as obsolete

Parser change

Fix the heuristic for the case with six quotes.
Added another parsertest for that.

Attached:

Full grabbing in regex

Cumulative patch to move the quote-grabbing logic from PHP code to the regex.
It doesn't change the parser behavior, just the implementation.

The regex is faster than the PHP code, but the path that speeds up the most is an uncommon one, and the regex is more complex. Needs benchmarking.

Attached:

(In reply to comment #8)

Fix the heuristic for the case with six quotes.
Added another parsertest for that.

Committed in r61052

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM
Aklapper removed a subscriber: wikibugs-l-list.