mobile view
On https://ja.wikipedia.org/wiki/%E6%B0%B4%E4%B8%AD%E3%80%81%E3%81%9D%E3%82%8C%E3%81%AF%E8%8B%A6%E3%81%97%E3%81%84 there's a <br clear="both" /> in the article.
That br tag shows up in the extract as well, causing some issues for downstream users (example: https://musicbrainz.org/artist/6fb627d9-983e-43c5-bf73-efcf8e81926b).
There is also extra whitespace in the mobile view (see attachment), which I think comes from related code.
MusicBrainz bug report is http://tickets.musicbrainz.org/browse/MBS-7948
Version: unspecified
Severity: normal
Attached:
Replication steps
On http://en.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exchars=100000000&titles=Test%20br%20tags%20in%20extracts
the <br> tag shows up in the extract. This is expected, since HTML is requested, but it leads to a stray empty space.
With the explaintext flag set, the <br> tag does not show:
http://en.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exchars=100000000&explaintext=&titles=Test%20br%20tags%20in%20extracts
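For scripted reproduction, the two sandbox queries above can be built as plain API URLs. This is a minimal Python sketch. It assumes the beta wiki serves the standard MediaWiki API at /w/api.php (the beta host may no longer be online), and build_extract_url is a hypothetical helper name. Only the query parameters come from the replication steps.

```python
from urllib.parse import urlencode

# Assumption: the beta wiki exposes the standard MediaWiki API at /w/api.php.
API = "http://en.wikipedia.beta.wmflabs.org/w/api.php"


def build_extract_url(title: str, plaintext: bool = False) -> str:
    """Hypothetical helper: the replication-step query as a direct API URL."""
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "exchars": 100000000,
        "titles": title,
    }
    if plaintext:
        # Boolean API flags are enabled by their presence; the value is ignored.
        params["explaintext"] = ""
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    title = "Test br tags in extracts"
    print(build_extract_url(title))                  # HTML extract request
    print(build_extract_url(title, plaintext=True))  # plain-text extract request
```

Fetching both URLs and comparing the "extract" fields shows the difference described above.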
We would like to rethink this behaviour.
AC (acceptance criteria)
- All br tags are removed from the HTML extract of a page.
- All br tags are replaced by a space character in the plaintext extract of a page.
- The decision should be documented in the codebase.
- Link to the archived mediawiki-api thread: https://lists.wikimedia.org/pipermail/mediawiki-api/2017-June/004001.html
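The first two criteria can be illustrated with a short Python sketch. This is not the TextExtracts implementation. It is a regex-based illustration, and strip_br_from_html, br_to_space_plaintext and the naive tag stripper are hypothetical stand-ins for whatever HTML-to-text step the extension actually uses.

```python
import html
import re

# Matches <br>, <br/>, <br /> and variants with attributes such as
# <br clear="both" />, case-insensitively. \b stops it matching e.g. <bra>.
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)
# Naive matcher for any remaining tag; a stand-in for a real HTML-to-text step.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_br_from_html(extract_html: str) -> str:
    """Criterion 1: remove every br tag from an HTML extract."""
    return BR_TAG.sub("", extract_html)


def br_to_space_plaintext(extract_html: str) -> str:
    """Criterion 2: turn each br into a single space, then drop the remaining
    markup to get plain text (naive; assumes well-formed input)."""
    text = BR_TAG.sub(" ", extract_html)
    text = ANY_TAG.sub("", text)
    return html.unescape(text)


if __name__ == "__main__":
    sample = '<p>Intro</p><br clear="both" /><p>Next</p>'
    print(strip_br_from_html(sample))     # <p>Intro</p><p>Next</p>
    print(br_to_space_plaintext(sample))  # Intro Next
```

Replacing br with a space in the plain-text case, rather than deleting it, keeps words on either side of a line break from running together.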