GeSHi uses a highly recursive regex for number highlighting
Closed, Resolved | Public

Description

The page Module:Convertdata, i.e. the data intended to be used by Module:Convert, recently crossed 200 kB. Since then, the Module page appears to truncate it and displays only a small fraction of the content when viewed as a reader.

Compare:

Truncated revision is displayed:
http://en.wikipedia.org/w/index.php?title=Module:Convertdata&oldid=541459412

Much longer Source for that revision:
http://en.wikipedia.org/w/index.php?title=Module:Convertdata&action=edit&oldid=541459412

Prior revision, showing full module:
http://en.wikipedia.org/w/index.php?title=Module:Convertdata&direction=prev&oldid=541459412

Is this truncation necessary? If it is technically necessary for some reason, then I would suggest that there should at least be some message warning users about the truncation. If it is not necessary, then it should be fixed to show the whole page.

Also, I would note that the truncation is rather strange: it isn't a straight truncation; rather, a large block in the middle was removed, leaving parts of the beginning and the end. Perhaps this is related to the syntax highlighting being unhappy with very large pages for some reason?


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=36839

Details

Reference
bz45669

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:15 AM
bzimport set Reference to bz45669.
bzimport added a subscriber: Unknown Object (MLST).

This seems to be due to Gerrit change 49985, which also explains why it suddenly screwed up upon crossing 200K.

For number highlighting, GeSHi uses a regex that includes "(?!(?:<DOT>|(?>[^\<]))+>)". If the text contains too long a run in which nothing other than numbers is highlighted, matching can easily exceed the PCRE recursion limit (which is currently set very low on WMF wikis; see bug 36839 for a similar issue), causing GeSHi to lose the entire chunk.

Possible fixes include changing that regex (defined on geshi/geshi.php line 2135) to "(?!(?:<DOT>|(?>[^\<]+))+>)", which is much less likely to hit the recursion limit, or disabling number highlighting along with string highlighting.
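
To illustrate why "[^\<]+" helps compared with "[^\<]", here is a rough Python sketch. It is not GeSHi code and does not use PCRE: it drops the negative lookahead and the atomic-group markers and only counts how many times the outer "(?:...)+" group would have to repeat over a sample string. The assumption (mine, not verified against the PCRE build on WMF servers) is that the recursion depth grows roughly with that repetition count. The sample text and the helper name outer_iterations are made up for the illustration.

```python
import re

# Simplified forms of the two alternations discussed above, without the
# surrounding (?! ... >) lookahead and without the (?> ... ) atomic group.
PER_CHAR = re.compile(r"<DOT>|[^<]")   # current: one non-"<" character per repetition
PER_RUN = re.compile(r"<DOT>|[^<]+")   # proposed: a whole run of non-"<" characters per repetition


def outer_iterations(pattern, text):
    """Count how often the outer (?:...)+ group repeats while consuming text from the start."""
    count = 0
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            break  # the group cannot consume anything further
        pos = m.end()
        count += 1
    return count


# Hypothetical sample: a long run with no "<", then one placeholder, then a short tail.
sample = "x" * 200_000 + "<DOT>" + "y" * 10

print(outer_iterations(PER_CHAR, sample))  # one repetition per character
print(outer_iterations(PER_RUN, sample))   # one repetition per run
```

With the per-character form the repetition count scales with the length of the run, while with the per-run form it scales with the number of "<" placeholders, which would explain why only very large pages were affected.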

  • Bug 46753 has been marked as a duplicate of this bug.
  • Bug 47026 has been marked as a duplicate of this bug.

Related URL: https://gerrit.wikimedia.org/r/58306 (Gerrit Change I27203c767d1d3f2f0999b1b1d8a06e8cf68c19ed)

  • Bug 39498 has been marked as a duplicate of this bug.
  • Bug 45953 has been marked as a duplicate of this bug.
  • Bug 29677 has been marked as a duplicate of this bug.

https://gerrit.wikimedia.org/r/58306 (Gerrit Change I27203c767d1d3f2f0999b1b1d8a06e8cf68c19ed) | change APPROVED and MERGED [by Tim Starling]

Change merged. Note the fix should be deployed on WMF wikis with 1.22wmf3; see https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap for the schedule.