
<p> tags are inserted between transcluded pages, if they contain images
Open, Low, Public

Description

Author: beau

Description:
Screenshot of a rendered page

Let's assume:

  • Page 1 contains only text (no paragraphs)
  • Page 2 contains only text (no paragraphs)
  • Page 3 contains only text (no paragraphs), with an image inserted in the middle of the text.

When these pages are transcluded with the <pages> tag, MediaWiki inserts a <p> tag between pages, which incorrectly splits the text.

The URL field contains the address of a sample page on pl.wikisource that demonstrates the issue. I have also attached a screenshot.

I don't know what causes the parser to change the way the text is rendered when it contains an image.

I think no \n should be inserted between pages, which is related to bug #27637, closed as INVALID.

As it stands, inserting any image lowers the overall quality of the document instead of raising it.


Version: unspecified
Severity: normal
URL: https://pl.wikisource.org/wiki/Encyklopedia_staropolska/Handel

Attached:

paragraph.png (673×663 px, 239 KB)

Details

Reference
bz30262

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 11:55 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz30262.
bzimport added a subscriber: Unknown Object (MLST).

Similar testcase (http://pl.wikisource.org/wiki/Wikiskryba:Ankry/brudnopis0):

  • Line 1 contains only text
  • Line 2 contains text with images inserted in the middle
  • Line 3 contains only text

There are no empty lines between them. However, MediaWiki wraps EACH line in <p> and </p>. IMO this contradicts the wiki rules (where an empty line between lines of text starts a new paragraph).
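To make the expected behaviour concrete, here is a minimal Python sketch of the simplified rule "an empty line starts a new paragraph". It is not MediaWiki's parser; the function name `split_paragraphs` and the file name `Example.png` are hypothetical and used only for illustration.

```python
import re


def split_paragraphs(wikitext: str) -> list[str]:
    """Split text into paragraphs on blank lines (simplified rule, not MediaWiki's code)."""
    blocks = re.split(r"\n\s*\n", wikitext.strip())
    # Lines inside one block are joined with a space: they belong to the same paragraph.
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
        if block.strip()
    ]


# Three consecutive lines, no empty line between them (hypothetical image name).
three_lines = "Line 1\nLine 2 with [[File:Example.png|thumb]] inside\nLine 3"

print(len(split_paragraphs(three_lines)))         # one paragraph under this rule
print(len(split_paragraphs("Line 1\n\nLine 2")))  # two paragraphs
```

Under this rule the test case would be a single paragraph, whereas the rendered page shows three.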

As the second example shows, this is a parser bug, not a ProofreadPage extension bug. I first thought it would be possible to work around it in the extension by inserting a space instead of a linefeed between pages, but that breaks pages whose final linefeed is protected from removal by an empty template. In that case the generated code is "\n<space>first line on the next page", which MediaWiki renders as <pre>first line</pre>.

beau wrote:

You can use the space equivalent: &#32;

beau wrote:

Simple patch replacing \n with &#32;

attachment proofread.patch ignored as obsolete

Patch tested on my local wiki; it works with the {{nop}} template on en.ws, which was broken by using a simple space instead of the proposed &#32;. Besides that, can someone ping a parser maintainer, since comment 2 shows this is a parser bug?
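The difference between the two separators can be sketched in Python as follows. This is only a model of the text handed to the parser, assuming that a wikitext line starting with a literal space is treated as preformatted; `join_pages` and `lines_with_leading_space` are hypothetical helpers, not ProofreadPage code.

```python
def join_pages(pages, separator):
    """Concatenate page texts with the given separator (model of the extension's output)."""
    return separator.join(pages)


def lines_with_leading_space(text):
    """Return the lines that begin with a literal space character."""
    return [line for line in text.split("\n") if line.startswith(" ")]


# Page 1 ends with a linefeed that an empty template keeps from being trimmed
# (modelled here by simply leaving the "\n" in place).
pages = ["last line on page 1\n", "first line on the next page"]

with_space = join_pages(pages, " ")
with_entity = join_pages(pages, "&#32;")

print(lines_with_leading_space(with_space))   # [' first line on the next page']
print(lines_with_leading_space(with_entity))  # []
```

With a plain space the second page's first line starts with a space in the source text; with the entity it starts with "&", so the leading-space rule is not triggered.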

beau wrote:

The image thumb is created using <div> (a block element), so it cannot be placed inside <p> (which may only contain inline content). The parser closes the open paragraph, inserts the <div>, and then reopens the paragraph.
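The close/insert/reopen behaviour can be illustrated with a toy Python model. It is an assumption-laden sketch, not the MediaWiki parser: the fragment representation and the function `wrap_paragraph` are invented for this example.

```python
def wrap_paragraph(fragments):
    """Wrap text fragments in <p>, closing the paragraph around block elements.

    fragments is a list of ("text", s) or ("block", s) tuples (hypothetical format).
    """
    out = []
    paragraph_open = False
    for kind, value in fragments:
        if kind == "block":
            # A block element cannot live inside <p>: close the paragraph first.
            if paragraph_open:
                out.append("</p>")
                paragraph_open = False
            out.append(value)
        else:
            # Text (re)opens a paragraph if none is currently open.
            if not paragraph_open:
                out.append("<p>")
                paragraph_open = True
            out.append(value)
    if paragraph_open:
        out.append("</p>")
    return "".join(out)


fragments = [
    ("text", "before "),
    ("block", '<div class="thumb">image</div>'),
    ("text", " after"),
]
print(wrap_paragraph(fragments))
# <p>before </p><div class="thumb">image</div><p> after</p>
```

In this model a single run of text containing one thumb always ends up as two paragraphs, which matches the split seen in the screenshot.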

john wrote:

Beau, I applied your patch to ProofreadPage as a workaround; however, I think we're all in agreement that this is a parser issue. I'll update the bug to reflect that and mark your patches as obsolete.

I didn't think enough about the side effects of this patch. It will be reverted from trunk: first, it doesn't solve the problem described, and second, it was meant as a no-op but is not one. If you start a Page: with a LF, you get the expected <p> from the parser, but when transcluding with the <pages> command this LF only terminates the last line of the previous page, so we don't get a <p> at the page boundary. This means there is no clean way to get the same HTML when viewing a Page: as when transcluding two pages where the second page starts with a LF. It's a bit odd, but a LF between page transclusions is more neutral than any other character.

(In reply to comment #8)

My bad: the first part of comment 8 is right, this patch doesn't fix anything, but the rationale that follows for reverting it is wrong. Before the parser is called, the code generated by the extension is <span>\n{{:MediaWiki:Proofreadpage_pagenum_template|page=Page:.......}}</span>{{:Page:....djvu/97}}
The span between page transclusions means that a LF at the start of a Page: can't combine with the LF inserted by the extension to produce a <p>.