
Whitelist global HTML5 semantic attributes and inline meta element
Closed, ResolvedPublic

Description

Could we get the global microdata ( http://www.w3.org/TR/html5/microdata.html ) attributes, @itemscope, @itemid, @itemtype, @itemprop, and @itemref whitelisted, as well as allow meta tags ( http://www.w3.org/TR/html5/semantics.html#meta ), now allowable in the body of a document with HTML5, with @name and @content attributes (in addition to the global ones just mentioned available on all elements)?

The HTML5 spec even specifies a (Mediawiki) wiki for making official extensions (at http://wiki.whatwg.org/wiki/MetaExtensions ), so extensions could become "standard extensions" in convenient wiki fashion (while allowing @itemtype to indicate namespaced extensions), in cases where the Wikimedia community wished to reuse a particular meta property.

I should point out too that microdata is not only something which those using custom client-side jQuery or XQuery or one-off server-side parsers can take advantage of--it is already implemented by prominent crawlers such as Google: http://www.google.com/webmasters/tools/richsnippets (see also http://googlewebmastercentral.blogspot.com/2010/03/microdata-support-for-rich-snippets.html ).


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63099

Details

Reference
bz28776

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:31 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz28776.
bzimport added a subscriber: Unknown Object (MLST).

Please also consider allowing <link/> elements in the body for expressing metadata (as with the in-body <meta/> mentioned earlier).

For the original issue (microdata), that's already available as an option (set $wgAllowMicrodataAttributes = true; in LocalSettings.php). If I recall, the reason it's not enabled by default is a concern that once it's enabled, we'll never be able to disable it again [since disabling it would then break user content], so we want to be sure we really want to whitelist those attributes before actually whitelisting them. (Don't quote me on that though; I could be wrong about the reason, since this is from half-remembered mailing list threads.)

As for <link> and <meta> in the body: first off, whitelisting them wholesale is probably a bad idea, since certain uses can be dangerous (<link rel="stylesheet" ...> can be used to load JS, <meta http-equiv="refresh" ...> is also evil, etc.). To get them [or the safe parts] whitelisted, it would probably help to provide concrete use cases where the tags would be useful. (Just going from experience on other bugs where people wanted things whitelisted, concrete examples go a long way.)

Hi,

@Bawolff: Thanks very much for the info on enabling microdata attributes.

Although HTML5 may technically still be an in-progress specification, Microdata has been endorsed by Google, Microsoft, and Yahoo, through http://schema.org . My feeling is that the time has come for higher-level semantics to be made available, especially for sites like Wikisource, which could particularly benefit from allowing richly semantic markup, such as we are now discussing in preparing an HTML5 Microdata serialization for TEI (Text Encoding Initiative).

The Microdata specification (at http://www.w3.org/TR/html5/microdata.html ) demonstrates <link/> being used with @itemprop and @href, and http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#the-link-element explains that a <link/> will not be treated as a link without @rel, but the element can exist without @rel using just @itemprop. So perhaps disallowing @rel (and probably @type) would be sufficient to limit this tag to carrying purely semantic information.

Likewise, for in-body meta tags, one only needs to whitelist @itemprop and @content (and ideally harmless global attributes like @id, @title, and @lang).
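To make the proposed restriction concrete, here is a minimal Python sketch of the rule described above: an in-body <link/> or <meta/> is kept only if it carries @itemprop, and everything except a short attribute whitelist (so @rel, @type, @http-equiv, @name) is stripped. The function name and the attribute sets are hypothetical and chosen for illustration; this is not how MediaWiki's Sanitizer.php is written, and it does not validate the URL in @href.

```python
# Illustrative policy only; all names here are assumptions, not MediaWiki APIs.
GLOBAL_SAFE = {"id", "title", "lang"}
MICRODATA = {"itemprop", "itemscope", "itemtype", "itemid", "itemref"}
PER_TAG = {
    "link": {"href"},     # no rel/type, so the element cannot act as a real link
    "meta": {"content"},  # no http-equiv/name, so no refresh or head-only metadata
}


def filter_metadata_tag(tag, attrs):
    """Return the allowed attributes for an in-body <link>/<meta>, or None to drop the tag."""
    tag = tag.lower()
    if tag not in PER_TAG:
        return None
    attrs = {name.lower(): value for name, value in attrs.items()}
    # Without itemprop the element has no microdata purpose in the body.
    if "itemprop" not in attrs:
        return None
    allowed = GLOBAL_SAFE | MICRODATA | PER_TAG[tag]
    return {name: value for name, value in attrs.items() if name in allowed}
```

Under this sketch, <link rel="stylesheet" href="x.css"> is dropped entirely, while <link itemprop="resp" href="#MDMH" rel="stylesheet"> survives with @rel removed. Stripping rather than rejecting is a design choice made here for simplicity; a real sanitizer might prefer to reject such tags outright.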

The following is a sample of a proposed serialization approach for TEI (a language used in the academic world for marking up classical literature, and which I think ought to be allowable on the likes of Wikisource). The following is adapted from code within the first example at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-note.html , preserving all of the semantics for round-tripping.

<aside itemprop="note">
  <meta itemprop="place" content="bottom"/>
  <meta itemprop="type" content="gloss"/>
  <link itemprop="resp" href="#MDMH"/>
  <dfn xml:lang="de" lang="de" itemprop="term">Malerisch</dfn>. This word has, in the German, two
  distinct meanings, one objective, a quality residing in the object,
  the other subjective, a mode of apprehension and creation. To avoid
  confusion, they have been distinguished in English as
  <span itemprop="mentioned">picturesque</span> and
  <span itemprop="mentioned">painterly</span> respectively.
</aside>

<div itemprop="respStmt" id="MDMH" style="display:none;">
<div itemprop="resp">translation from German to English</div>
<div itemprop="name">Hottinger, Marie Donald Mackie</div>
</div>

Note that <meta/> is here used to reflect attributes from TEI (e.g., to indicate that this note is a gloss), while <link/> is used to reference additional hidden metadata (in this case, information about who is responsible for the note). <link/> might also be used for the likes of TEI's <ptr/> element, which can indicate a relationship to one or more targets from and to anywhere (including optionally using XPointer to indicate the semantic relationship) but which need not be visible. This is one way to allow, for example, what TEI calls stand-off markup: the ability to reference a text (e.g., a famous authoritative work) from another text (e.g., a commentary).
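As a rough illustration of how a consumer could read the hidden properties back out of such markup, the following Python sketch uses the standard library's html.parser to collect the @itemprop values carried by <meta/> (via @content) and <link/> (via @href). The class and function names are made up for this example, and it deliberately ignores item scoping, @itemref, and text-content properties, so it is not an implementation of the full microdata extraction algorithm.

```python
from html.parser import HTMLParser


class HiddenPropertyCollector(HTMLParser):
    """Collect (itemprop, value) pairs from <meta> and <link> elements only."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also end up here, because the
        # default handle_startendtag delegates to handle_starttag.
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "meta":
            self.properties.append((prop, attrs.get("content")))
        elif tag == "link":
            self.properties.append((prop, attrs.get("href")))


def collect_hidden_properties(html):
    """Return the list of (itemprop, value) pairs found in the given markup."""
    parser = HiddenPropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.properties


# Shortened version of the note sample from this comment.
SAMPLE = """<aside itemprop="note">
<meta itemprop="place" content="bottom"/>
<meta itemprop="type" content="gloss"/>
<link itemprop="resp" href="#MDMH"/>
<dfn lang="de" itemprop="term">Malerisch</dfn>.
</aside>"""

if __name__ == "__main__":
    for name, value in collect_hidden_properties(SAMPLE):
        print(name, "=", value)
```

The point is only that the TEI attributes survive as machine-readable name/value pairs without being rendered, which is what round-tripping would rely on.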

You may be interested in this mailing list post from earlier this month: http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053720.html

I imagine the scope of this bug has shifted to enabling $wgAllowMicrodataAttributes by default.

Anyways, cc'ing Aryeh Gregor on this bug since he knows all about this stuff.

ayg wrote:

(In reply to comment #0)

> Could we get the global microdata ( http://www.w3.org/TR/html5/microdata.html )
> attributes, @itemscope, @itemid, @itemtype, @itemprop, and @itemref
> whitelisted

This just requires turning $wgAllowMicrodataAttributes on. Note that $wgHtml5 must also be on for it to do anything.

> as well as allow meta tags (
> http://www.w3.org/TR/html5/semantics.html#meta ), now allowable in the body of
> a document with HTML5, with @name and @content attributes (in addition to the
> global ones just mentioned available on all elements)?

<link> and <meta> can only be used in the body if itemprop is specified. In those cases, we should whitelist them (assuming microdata is enabled) but currently don't. It should be pretty easy to add support to Sanitizer.php. I could probably do this if microdata is actually going to be enabled by default, especially if it's going to be enabled on Wikimedia wikis. As long as it happens soon -- I'll be unavailable starting a few months from now.

> The HTML5 spec even specifies a (Mediawiki) wiki for making official extensions
> (at http://wiki.whatwg.org/wiki/MetaExtensions ), so extensions could become
> "standard extensions" in convenient wiki fashion (while allowing @itemtype to
> indicate namespaced extensions), in cases where the Wikimedia community wished
> to reuse a particular meta property.

That's only for <meta name="">, which is only allowed in the head, so it's not relevant to us. The spec mandates no central repository for microdata vocabularies.

> I should point out too that microdata is not only something which those using
> custom client-side jQuery or XQuery or one-off server-side parsers can take
> advantage of--it is already implemented by prominent crawlers such as Google:
> http://www.google.com/webmasters/tools/richsnippets (see also
> http://googlewebmastercentral.blogspot.com/2010/03/microdata-support-for-rich-snippets.html
> ).

And Bing and Yahoo!, yes. I think it should be enabled by default, but I'm not going to do it unless I get the okay from someone in charge, since previously there was disagreement about it.

We have an option to enable all the microdata markup. <meta> and <link> are supported.

Is there anything left missing for this bug? It looks like it's already FIXED.

If the attributes and meta/link are now supported, that sounds good to me... Thanks!

(In reply to comment #7)

> We have an option to enable all the microdata markup. <meta> and <link> are
> supported.
>
> Is there anything left missing for this bug? It looks like it's already
> FIXED.

Is there any reason not to have them on by default? I imagine such properties could be useful for people making infoboxes or license templates.