Page MenuHomePhabricator

jsdiff uses excessive time / memory on pages with a lot of lines
Closed, ResolvedPublic

Description

jsdiff runs out of memory when diffing pages with a lot of wikitext lines. Test case:

node roundtrip-test.js --prefix zhwiki '2011年香港區議會選舉結果'

After a cursory inspection it looks as if the algorithm will use memory bounded by O(l^2) with l being the number of lines in the page.

We could either try to optimize (afaik there is no bug) jsdiff, or we could consider switching to another diff library such as https://github.com/paulgb/simplediff.


Version: unspecified
Severity: normal

Details

Reference
bz59840

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:30 AM
bzimport added a project: Parsoid-Tests.
bzimport set Reference to bz59840.

This is split out of bug 58952, which has some more example pages.

bebirchall wrote:

Resolved by patch "Use simplediff to diff rt-server test results" https://gerrit.wikimedia.org/r/#/c/115985/

(In reply to bebirchall from comment #2)

Resolved by patch "Use simplediff to diff rt-server test results"
https://gerrit.wikimedia.org/r/#/c/115985/

This resolves the issue in roundtrip-test.js.

The various round-trip end points in the web service however still suffer from this problem, so it would be great to move to simplediff there too. This would also give us consistent diff behavior between rt testing and the web service. Some differences are discussed in https://gist.github.com/bebebebebe/9213456.

It would be great to move all diff-related functionality into a separate diff module rather than dropping it all into Util. This makes it easy to reuse the diffing functionality separately.

bebirchall wrote:

Patch to resolve issue in web service: "Use simplediff in web service and move diff functions into diff module": https://gerrit.wikimedia.org/r/#/c/118023/

Change 391707 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] T61840: Remove top level dependence on jsDiff

https://gerrit.wikimedia.org/r/391707

Change 391707 merged by jenkins-bot:
[mediawiki/services/parsoid@master] T61840: Remove top level dependence on jsDiff

https://gerrit.wikimedia.org/r/391707