Multiple comments on a single line are interpreted as a blank line
Closed, Resolved (Public)

Description

When a line contains a single comment and nothing else, the line is correctly ignored.
When a line contains two comments, the line is not ignored but is interpreted as a blank line.

See this page for an example that illustrates the issue:
http://en.wikipedia.org/wiki/User:Patrick87/comments

It's not a big problem, and there should be only a few cases where someone actually writes two separate comments on a single line. However, formatting shouldn't change depending on whether there are one or two comments on the line.


Version: 1.21.x
Severity: minor

Details

Reference
bz41756

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:11 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz41756.

Another test case:

*a
<!-- x -->
*b
<!-- x --> <!-- y --> <!-- z -->
*c

The PHP parser treats 'a' and 'b' as part of the same list, but treats item 'c' as the start of a completely different list, because the line with three comments is interpreted as a blank line.

There are other examples of this sort in the parserTests. It's becoming a source of diffs between PHP and Parsoid.
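
To make the test case above concrete, here is a minimal Python sketch that models the behaviour described in this report. It is not the MediaWiki preprocessor code; the function name `strip_comment_lines` and the `legacy` flag are hypothetical, and the regular expressions assume comments are padded or separated by spaces only.

```python
import re

# One HTML-style comment whose body cannot run past its own closing "-->".
COMMENT = r"<!--(?:(?!-->).)*-->"
# A line holding exactly one comment, optionally padded with spaces.
SINGLE_COMMENT = re.compile(rf"^ *{COMMENT} *$")
# A line holding one or more comments, optionally padded/separated by spaces.
ONLY_COMMENTS = re.compile(rf"^ *(?:{COMMENT} *)+$")


def strip_comment_lines(wikitext: str, legacy: bool = False) -> str:
    """Drop comment-only lines (hypothetical model, not the real parser).

    With legacy=True, a line with several comments is kept as a blank
    line, which mimics the behaviour reported in this task.
    """
    out = []
    for line in wikitext.split("\n"):
        if SINGLE_COMMENT.match(line):
            continue  # a lone comment line is always ignored
        if ONLY_COMMENTS.match(line):
            if legacy:
                out.append("")  # reported bug: the line turns into a blank line
            continue
        out.append(line)
    return "\n".join(out)


sample = "*a\n<!-- x -->\n*b\n<!-- x --> <!-- y --> <!-- z -->\n*c"
print(repr(strip_comment_lines(sample, legacy=True)))  # blank line before *c
print(repr(strip_comment_lines(sample)))               # a, b, c stay adjacent
```

In the legacy mode the blank line before `*c` is what splits the list in two; in the default mode all three items stay on consecutive lines.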

Change 77988 had a related patch set uploaded by Cscott:
Preprocessor: Don't treat a line containing multiple comments as a blank line.

https://gerrit.wikimedia.org/r/77988

Change 78248 had a related patch set uploaded by Cscott:
Add '-m' option to dumpGrepper; add patterns for bug 41756.

https://gerrit.wikimedia.org/r/78248

Change 78248 merged by jenkins-bot:
Add '-m' option to dumpGrepper; add patterns for bug 41756.

https://gerrit.wikimedia.org/r/78248

subbu notes that Parsoid accepts both tabs and spaces around the comments, whereas the PHP parser accepts only spaces. Is it worth tweaking my patch so that PHP accepts tabs as well? I don't think it will make much difference to content, but it would be nice to converge the parsers.
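
The tabs-versus-spaces difference can be illustrated with a small Python sketch. This is only a model of the difference as described in the comment above, not code from either parser; the helper `comment_only_pattern` and its `allow_tabs` parameter are hypothetical.

```python
import re

# One HTML-style comment whose body cannot run past its own closing "-->".
COMMENT = r"<!--(?:(?!-->).)*-->"


def comment_only_pattern(allow_tabs: bool) -> re.Pattern:
    """Build a matcher for lines made up solely of comments and padding."""
    ws = r"[ \t]" if allow_tabs else r" "
    return re.compile(rf"^{ws}*(?:{COMMENT}{ws}*)+$")


SPACES_ONLY = comment_only_pattern(allow_tabs=False)      # spaces only, as described for PHP
SPACES_AND_TABS = comment_only_pattern(allow_tabs=True)   # tabs too, as described for Parsoid

line = "<!-- x -->\t<!-- y -->"
print(bool(SPACES_ONLY.match(line)))      # False: the tab is not accepted as padding
print(bool(SPACES_AND_TABS.match(line)))  # True
```

Under these assumptions the two matchers disagree only on lines where a tab sits outside the comments, which is consistent with the expectation that the change would affect little content.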

I've grepped through the 20130708 enwiki dump to see how many pages this change would affect. I found only 414 affected pages in the article namespace; the full list is at http://en.wikipedia.org/wiki/User:Cscott/bug41756

There are an additional 1,913 articles in the File:, Wikimedia:, or Portal: namespaces which have lines with more than one space-separated comment. These appear to be mostly bot-generated and mostly harmless. I've put this list on the above page as well.
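
For readers who want to run a similar scan on their own text, the following Python sketch finds lines that consist of two or more comments, optionally separated by spaces. It is not the pattern added to dumpGrepper; the function name `multi_comment_lines` and the exact regular expression are assumptions made for illustration.

```python
import re

# One HTML-style comment whose body cannot run past its own closing "-->".
COMMENT = r"<!--(?:(?!-->).)*-->"
# A line made up of at least two comments with only spaces around them.
MULTI_COMMENT_LINE = re.compile(rf"^ *{COMMENT}(?: *{COMMENT})+ *$")


def multi_comment_lines(text: str) -> list:
    """Return (line_number, line) pairs for lines holding 2+ comments only."""
    return [
        (number, line)
        for number, line in enumerate(text.split("\n"), start=1)
        if MULTI_COMMENT_LINE.match(line)
    ]


page = "*a\n<!-- x -->\n*b\n<!-- x --> <!-- y --> <!-- z -->\n*c"
for number, line in multi_comment_lines(page):
    print(number, line)
```

A line with a single comment is deliberately not reported, since such lines were already handled correctly.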

Change 77988 merged by jenkins-bot:
Preprocessor: Don't treat a line containing multiple comments as a blank line.

https://gerrit.wikimedia.org/r/77988

Verified fixed in beta and test.