
Improve html2html newline normalization
Closed, Resolved, Public

Description

We currently have a regexp-based newline serialization hack that converts non-IEW whitespace (whitespace that is not inter-element whitespace) to a single space. The regexp no longer works well with the XML serializer, because > is no longer entity-escaped. We should normalize this on the DOM instead, where the IEW vs. non-IEW distinction is readily available.
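
A rough sketch of what DOM-based normalization could look like, written in Python with the standard-library xml.dom.minidom rather than Parsoid's own JavaScript DOM code. The helper names is_iew and normalize_newlines are hypothetical, and the IEW test used here (a whitespace-only text node with an adjacent element sibling) is a simplifying assumption, not Parsoid's actual definition. Leaving IEW nodes untouched is likewise an assumption.

```python
import re
from xml.dom import minidom, Node

# Runs of spaces, tabs, and newlines that should collapse to one space.
WS_RUN = re.compile(r"[ \t\r\n]+")


def is_iew(node):
    """Hypothetical IEW check (assumption): a whitespace-only text node
    that has an element as its previous or next sibling."""
    if node.nodeType != Node.TEXT_NODE or node.data.strip(" \t\r\n"):
        return False
    siblings = (node.previousSibling, node.nextSibling)
    return any(s is not None and s.nodeType == Node.ELEMENT_NODE
               for s in siblings)


def normalize_newlines(node):
    """Collapse whitespace runs in non-IEW text nodes to a single space.
    IEW text nodes are left as they are."""
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            if not is_iew(child):
                child.data = WS_RUN.sub(" ", child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            normalize_newlines(child)


# Example: the text inside the first <p> contains a newline and a literal '>'.
doc = minidom.parseString("<body><p>a\n  b &gt; c</p>\n<p>d</p></body>")
normalize_newlines(doc.documentElement)
first_p_text = doc.getElementsByTagName("p")[0].firstChild.data
between_paragraphs = doc.documentElement.childNodes[1].data
```

Because the walk operates on text nodes rather than on serialized markup, a literal > in the content cannot be mistaken for a tag boundary, which is the failure mode of the regexp approach described above.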


Version: unspecified
Severity: normal

Details

Reference
bz55588

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:25 AM
bzimport added a project: Parsoid-Tests.
bzimport set Reference to bz55588.

See also bug 63195, which is a more general issue that may also include IEW normalization.

The newline normalization was turned off in commit fc153752d4a8cf3c865f02d0303c8bd1529b3162

Change 155639 had a related patch set uploaded by Cscott:
Don't strip newlines within text content.

https://gerrit.wikimedia.org/r/155639

Change 155639 merged by Cscott:
Improve whitespace normalization for parser tests.

https://gerrit.wikimedia.org/r/155639