
let empty metadata-elements pass through tidy
Closed, Resolved, Public

Description

Author: a.d.bergi

Description:
The English and German Wikipedias use the COinS microformat (see the Wikipedia article) to embed citation metadata. It specifies a span with class="Z3988" and a title attribute that carries the data. Usually the span is empty, but that does not work in MediaWiki: Tidy removes empty spans unless they have an id. There are some hacks, but all of them involve an element that is displayed on screen and shows its (ugly) title when you hover over it.
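For readers unfamiliar with COinS, here is a minimal Python sketch of the kind of empty span the format calls for. The helper name `coins_span` and the sample book fields are hypothetical; the `title` value is an OpenURL ContextObject written as URL-encoded key/value pairs (`ctx_ver=Z39.88-2004` plus `rft.*` keys), then HTML-escaped so it is safe inside an attribute. A span like this has a class and a title but no content and no id, which is exactly the case Tidy strips.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(fields: dict[str, str]) -> str:
    """Build an empty COinS span (hypothetical helper): class="Z3988", metadata in title."""
    # The title carries an OpenURL ContextObject as URL-encoded key=value pairs.
    query = urlencode({"ctx_ver": "Z39.88-2004", **fields})
    # Escape for use inside an HTML attribute value (& becomes &amp;, " becomes &quot;).
    return f'<span class="Z3988" title="{escape(query, quote=True)}"></span>'


# Sample (made-up) book citation.
span = coins_span({
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
    "rft.btitle": "Example Title",
    "rft.au": "Doe, Jane",
})
print(span)
```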
I'm not sure how far a solution to this problem can go, but I propose that Tidy let through any empty element that has at least one attribute. If that is not desired, you could implement an exception that matches exactly this class attribute.
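The proposed rule can be stated as a simple predicate: an empty `span` or `div` is dropped only if it also has no attributes. Below is a minimal Python sketch using the standard library's `html.parser` that reports, for a fragment, which empty elements the rule would keep. It is an illustration under stated assumptions, not a model of what Tidy actually does: it treats any text (even whitespace) as content, and it does not re-check a parent that becomes empty after a child is dropped. The names `EmptyElementAudit` and `audit` are hypothetical.

```python
from html.parser import HTMLParser

TARGETS = {"span", "div"}  # elements this sketch cares about


class EmptyElementAudit(HTMLParser):
    """Apply the proposed rule: an empty element survives iff it has attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.decisions: list[tuple[str, bool]] = []  # (tag, kept?) per empty element
        self._pending = None  # (tag, attrs) of the last opened element while still empty

    def _decide(self, tag, attrs) -> None:
        if tag in TARGETS:
            # Proposed rule: keep the empty element if it has at least one attribute.
            self.decisions.append((tag, len(attrs) > 0))

    def handle_starttag(self, tag, attrs):
        self._pending = (tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Treat <span ... /> as an empty element for this sketch.
        self._decide(tag, attrs)
        self._pending = None  # the enclosing element now has a child, so it is not empty

    def handle_data(self, data):
        self._pending = None  # any text, even whitespace, means the element is not empty

    def handle_endtag(self, tag):
        if self._pending is not None and self._pending[0] == tag:
            self._decide(*self._pending)
        self._pending = None


def audit(fragment: str) -> list[tuple[str, bool]]:
    parser = EmptyElementAudit()
    parser.feed(fragment)
    parser.close()
    return parser.decisions


print(audit('<span></span><span class="Z3988" title="ctx_ver=Z39.88-2004"></span>'))
# [('span', False), ('span', True)]
```

Under this rule a bare `<span></span>` may still be removed (which the expected results above allow), while the COinS span survives.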


Version: unspecified
Severity: normal
URL: http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Vorlagen/Werkstatt#Vorlage:Literatur

Details

Reference
bz27786

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:28 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz27786.
bzimport added a subscriber: Unknown Object (MLST).

Actual result:

“<span></span>” → “”
“<span id="…"></span>” → “<span id="…"></span>”
“<span class="…"></span>” → “”
“<span style="…"></span>” → “”
“<span title="…"></span>” → “”

Expected result:

“<span></span>” → “<span></span>” or “”
“<span id="…"></span>” → “<span id="…"></span>”
“<span class="…"></span>” → “<span class="…"></span>”
“<span style="…"></span>” → “<span style="…"></span>”
“<span title="…"></span>” → “<span title="…"></span>”

a.d.bergi wrote:

(In reply to comment #2)

MediaWiki strips empty SPAN elements, but not empty DIVs.

Sure, but this bug is about empty spans.
The COinS metadata format (http://en.wikipedia.org/wiki/COinS) even /specifies/ the use of <span> tags...

Btw, <div style="display:inline;"> is a horrible workaround for <span>.

<div class="…" style="display:inline;" />

is not a workaround for

<span class="…" />

because

A<div class="…" style="display:inline;" />B

generates

A
B

Okay, okay, okay. Before anyone else posts another example, I just want to clarify that my comment was meant to:

A) Point out the inconsistency between stripping empty SPANs but not empty DIVs. I would expect either both or neither to be stripped.

B) Note that DIVs can be used WHERE APPROPRIATE as a replacement (for example, to insert an empty element with an ID or CLASS for presentational changes or script manipulation).

And about the last example,

<div>A<div class="…" style="display:inline;" />B</div>

generates

A B

Notice it's all surrounded by an additional DIV.</rant>

(In reply to comment #5)

<div>A<div class="…" style="display:inline;" />B</div>

generates

A B

Yes, but the expected result is

AB

without any whitespace.

Copied from T134423#2266304:

Re the stripping: Tidy actually does not strip every self-closing tag, and on the other hand it does strip some empty paired tags.

<span/> - stripped
<span class="foo" /> - stripped
<span></span> - stripped
<span class="bar"></span> - stripped
<div/> - stripped
<div class="foo" /> - NOT stripped
<div></div> - stripped
<div class="bar"></div> - NOT stripped

So, Remex no longer strips the way Tidy did (notably, the empty div and span in case #3 are not removed). That's good.

As an aside, I was under the impression that Remex would strip self-closed tags that cannot be self-closed in HTML (both div and span), which is why we have the Linter category for them. Instead, the output I'm getting is:

<span></span>
<span class="foo"></span>
<span></span>
<span class="bar"></span>
<div></div>
<div class="foo"></div>
<div></div>
<div class="bar"></div>

Spans and divs 1 and 2 (as opposed to 3 and 4 of each) are clearly not being stripped, and I'm pretty sure they are supposed to be (or else passed through as self-closing elements). @ssastry, do you know why? 😃

We decided to provide a backward-compatibility (b/c) fix for self-closing tags to reduce editors' workload. However, the plan is indeed to remove that fix once the use of self-closed tags has declined, as editors fix the Linter issues in that category. The b/c fix converts <tag /> to <tag></tag>.
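The b/c fix described above (turning `<tag />` into `<tag></tag>`) can be sketched in a few lines of Python. This is an illustrative approximation, not the actual Remex/Parsoid code: it uses a regular expression instead of a real HTML tokenizer, so it would misfire on a `>` inside a quoted attribute value, and it leaves HTML void elements such as `<br/>` untouched because those may legitimately be self-closed. `expand_self_closed` and `VOID_ELEMENTS` are hypothetical names.

```python
import re

# HTML void elements may legitimately be written self-closed (e.g. <br/>); leave them alone.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

# Matches <name attrs... /> ; naive: does not handle '<' or '>' inside quoted attribute values.
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)((?:\s[^<>]*?)?)\s*/>")


def expand_self_closed(html: str) -> str:
    """Rewrite <tag ... /> as <tag ...></tag> for non-void elements."""
    def repl(m: re.Match) -> str:
        name, attrs = m.group(1), m.group(2)
        if name.lower() in VOID_ELEMENTS:
            return m.group(0)  # keep <br/>, <hr />, etc. unchanged
        return f"<{name}{attrs}></{name}>"
    return SELF_CLOSED.sub(repl, html)


print(expand_self_closed('<span class="foo" />'))  # <span class="foo"></span>
```

Applied to the self-closed cases from the Tidy table, this yields `<span></span>`, `<span class="foo"></span>`, `<div></div>` and `<div class="foo"></div>`, matching the output reported for Remex with the b/c fix in place.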

Izno claimed this task.

Okay, I think this is resolved then.