
post expand size counted multiple times for nested transclusions
Closed, Declined · Public

Description

Author: cbm.wikipedia

Description:
This is an issue with the way that the new preprocessor computes template limits. Suppose that page A transcludes B and B does nothing but transclude C. The size of C will be counted twice towards the post-expand counter on page A. This causes pages that have a setup similar to the one described to run into template limits much sooner than expected.
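
A minimal Python sketch of the accounting being described, assuming nothing about the real preprocessor internals (toy pages, deliberately naive brace matching):

```
# Toy model (not MediaWiki code) of the double counting described above.
PAGES = {
    "C": "x" * 1000,   # C: 1000 bytes of plain text
    "B": "{{C}}",      # B does nothing but transclude C
}

post_expand = 0

def substitute(text):
    """Replace each {{X}} in the text with the charged expansion of X."""
    while "{{" in text:
        i = text.index("{{")
        j = text.index("}}", i)
        text = text[:i] + transclude(text[i + 2:j]) + text[j + 2:]
    return text

def transclude(title):
    """Expand a transcluded page and charge its full expanded length."""
    global post_expand
    expanded = substitute(PAGES[title])
    post_expand += len(expanded)   # charged once at every nesting level
    return expanded

substitute("{{B}}")   # render page A, which simply transcludes B
print(post_expand)    # 2000: C's 1000 bytes are charged for both C and B
```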


Version: 1.11.x
Severity: major

Details

Reference
bz13260

Event Timeline

bzimport raised the priority of this task to Lowest. · Nov 21 2014, 10:01 PM
bzimport set Reference to bz13260.

cbm.wikipedia wrote:

Affects parser functions as well: {{#ifexpr: 1 > 0 | {{Foo}} }} will add twice the size of Foo to the post-expand counter.

Increased severity; this drastically affects labeled section transclusion on Meta, where it is used for language-specific subpages of large multilingual pages. In that case, a simple template that transcludes a localization (with English fallback) multiplies the counted size of each localization by a factor of 4, or 8 if it is part of a meta-template like {{language subpage|pt}}. This glitch makes it virtually impossible to cleanly subdivide large pages, which are the ones most in need of subdivision for usability, even if the resulting page has relatively little output.

(Multiplied by 4: in {{#if:{{#lst:page|section}}|{{#lst:page|section}}|<fallback>}} the section is expanded twice, and that is counted twice again because it is inside a template; this in turn is counted twice more if the template is part of a meta-template.)
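
Spelling out the arithmetic in that parenthetical (this is my reading of it, not part of the original report):

```
\[
  \underbrace{2}_{\text{section expanded twice in the \#if}}
  \times \underbrace{2}_{\text{template level}} = 4,
  \qquad
  4 \times \underbrace{2}_{\text{meta-template level}} = 8
\]
```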

cbm.wikipedia wrote:

Confirmed still an issue.

Steps to verify:

  • Create a big page
  • Transclude it to page B, check post-expand size of B
  • Transclude only B (one time) to page C, check post-expand size of C
  • The post-expand size of C is twice the post-expand size of B; it should be the same as B's (see the replay sketch just below)
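
Those steps can be replayed against the toy counting model sketched above; the sizes here are hypothetical:

```
# Replay of the verification steps with a hypothetical 10,000-byte page.
big_page_size = 10_000

size_on_B = big_page_size                 # B charges the big page once
size_on_C = big_page_size + size_on_B     # C re-charges B's whole expansion

assert size_on_C == 2 * size_on_B         # observed: double; expected: equal
print(size_on_B, size_on_C)               # 10000 20000
```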

Geometryguy wrote:

This continues to be a severe inconvenience. Templates have to be written with opaque code in order to minimize nested transclusions, and some dedicated maintainers of dynamic pages (e.g. content review) have to be constantly vigilant about post-expand sizes to avoid breaking pages which would be nowhere near the limit were it not for this bug.

cbm.wikipedia wrote:

This is still an issue with workflows on enwiki, particularly the content review (Featured Article, Peer Review) pages, which transclude conversations from subpages onto a "master" page.

New parser and PHP improvements (HipHop?) are slated that may alleviate the problems. This is not something we are going to attack in the current parser.

(In reply to comment #8)

New parser and PHP improvements (HipHop?) are slated that may alleviate the
problems. This is not something we are going to attack in the current parser.

How is HipHop going to help with the fact that the hard-coded limit programmed into the parser is calculated incorrectly? (OTOH, that limit is pretty huge. It scares me to think that people are reaching it, double counting notwithstanding.)

HipHop would hopefully avoid some problems, but the error in calculation is something that would be fixed in the new parser.

Post-expand size is built to hold the size of all expansions done by the parser on the way to the HTML (including sub-expansions like templates and parser functions). So this works as expected.

An example:

[[A]] contains "Big Text" (wikitext length: 8)

[[B]] contains "{{:A}}" (wikitext length 6)

[[C]] contains "{{:B}}" (wikitext length 6)

The parser starts at C, expanding B to "{{:A}}", which must in turn be expanded to "Big Text". This sub-expansion adds its expanded size (8) to the post-expand include size. After that the parser returns the expanded text, and the expansion of B adds its expanded length (also 8) to the post-expand include size, resulting in a final post-expand include size of 16.

This way is needed to handle the following scenario:

[[A]] contains "Big Text" (wikitext length: 8)

[[B]] contains "{{#if:{{:A}}|A|B}}" (wikitext length 18)

This gives a size of 9 (8 from the sub-expansion of A and 1 from the expansion of B). Without adding each sub-expansion, the post-expand size in this scenario would be 1, which would make the limit useless, because the limit is built to prevent overly large expansions during the parsing of a page.

There is no error in the calculation of the post-expand size; it simply also includes the size of each sub-expansion.
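
For concreteness, here is a small Python sketch of the accounting this comment describes, using the sizes from its examples (toy code; the function names are mine):

```
post_expand = 0

def charge(expanded):
    """Every finished expansion adds its own length to the counter."""
    global post_expand
    post_expand += len(expanded)
    return expanded

# Scenario 1: C transcludes B, which transcludes A ("Big Text", 8 bytes).
a = charge("Big Text")   # expanding {{:A}} charges 8
b = charge(a)            # expanding {{:B}} charges its result, another 8
print(post_expand)       # 16

# Scenario 2: B contains {{#if:{{:A}}|A|B}}.
post_expand = 0
cond = charge("Big Text")      # sub-expansion {{:A}} charges 8
charge("A" if cond else "B")   # expansion of B charges 1
print(post_expand)             # 9; without sub-charges it would be 1
```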

My position on this is:

  • Size multiplied by depth is a defensible cost metric, since there will be a factor in the parse-time equation proportional to it: PHP needs to copy the data at each level when it concatenates the outputs from sibling subtrees (see the formalisation after this list).
  • I'm not keen on lifting traditional parser limits such as post-expand include size, since judging by the parse time of existing large articles, the limits were too high to begin with. Lowering the limits would break existing articles, but refusing to raise them (by a factor of expansion depth in this case) is feasible and will help to limit CPU time.
  • The limit impacts most strongly on the use of deeply nested metatemplates, and that's a design pattern I'd like to discourage anyway, especially given that Lua will soon be introduced.
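
A rough formalisation of the first point (notation mine, not from this task): a leaf expansion of size s_i at transclusion depth d_i is re-copied, and so re-charged, at every enclosing level:

```
\[
  \text{post-expand include size}
    \;=\; \sum_{\text{expansions } e} |e|
    \;\approx\; \sum_{\text{leaves } i} s_i \, d_i
\]
```

This sum is proportional to the copying work PHP does when concatenating sibling subtree outputs at each level.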

After Lua is introduced globally and the more complex templates have been migrated to it, then I think it would be reasonable to consider a severe reduction in parse limits, aimed at a reduction in maximum parse time to 10 seconds or so. In the context of such a project, redefinition or removal of the post expand include size would probably make sense. But by then, we might be switching to Parsoid anyway. So I'm resolving this as "later" for reconsideration at that time.

mr.heat wrote:

Maybe it's a good idea to change the name to something that fits the current calculation method?

cbm.wikipedia wrote:

@Tim Starling (comment 12): Thanks for the update. I want to point out that this affects more than deeply nested metatemplates. More importantly, it affects "split" discussion pages, for example when they are divided by day: if there are 10 discussion pages transcluded on "A (2012-11-5)" and 10 more transcluded on "A (2012-11-6)", and page B then transcludes both of those "A" pages, the nesting is trivial (just depth 2), but the cost on page B is double what it should be. On enwiki this affects e.g. [[Wikipedia:Peer review]], where there is again a shallow nesting of large-ish discussion pages. If there is a workaround for this particular use case, it would be very helpful.

[Using keyword instead of tracking bug for HipHop issues as requested in bug 40926 comment 5. Filter bugmail on this message.]

A current use case that this bug bungles magnificently: template X is just a user-friendly wrapper around a module #invoke; it is used many times on subpages (used to create indices) on Wikisource, which are transcluded from the global book page. Having the output of all of those module calls counted repeatedly breaks tables of contents for even mid-sized books, even though the /actual/ size of the table is a full order of magnitude below the limit and the parse time is subsecond.