
API parser does not use HTMLTidy on non-cached pages
Closed, Resolved (Public)

Description

Author: rockmfr

Description:
When using the API parser (see the URL below for an example), HTMLTidy is not used on non-cached pages, so the results differ depending on whether or not the page happens to be in the cache at the time. Also, the limit report is not displayed for non-cached pages, though I don't particularly care about this.

From what I can see, it seems like it would be a simple fix in ApiParse.php.

Change...

$p_result = $wgParser->parse($articleObj->getContent(), $titleObj, new ParserOptions());

to...

$popts = new ParserOptions();
$popts->setTidy(true);
$popts->enableLimitReport();
$p_result = $wgParser->parse($articleObj->getContent(), $titleObj, $popts);


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/w/api.php?action=parse&format=xml&page=Wikipedia:Selected_anniversaries/June_6

Details

Reference
bz14471

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:12 PM
bzimport set Reference to bz14471.

rockmfr wrote:

use HTMLTidy and enable limit report

This enables HTMLTidy and the limit report in the following cases (all cases):

  1. Parsing an old revision
  2. Parsing the current revision
  3. Parsing some arbitrary text

The second case is the one which currently gives inconsistent results. The other two cases don't currently use Tidy at all, but I can't imagine why anyone wouldn't want to use it. Perhaps add another parameter to make it optional? And another parameter to make the limit report optional?

Attached:

Since it's using the same sequence of several calls in three distinct places, it might be wise to encapsulate that into a function and call it consistently. This'll make it easier to maintain in the future, if something changes.
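
A minimal sketch of that encapsulation, for illustration only. `StubParserOptions` is a stand-in so the snippet runs outside MediaWiki, and `makeApiParserOptions()` is a hypothetical helper name, not something that exists in ApiParse.php; only `setTidy()` and `enableLimitReport()` come from the change proposed above.

```php
<?php
// Stand-in for MediaWiki's ParserOptions so this sketch runs on its own.
// Only the two setters named in the report are modelled.
class StubParserOptions {
    public $tidy = false;
    public $limitReport = false;

    public function setTidy( $enabled ) {
        $this->tidy = (bool)$enabled;
    }

    public function enableLimitReport( $enabled = true ) {
        $this->limitReport = (bool)$enabled;
    }
}

// Hypothetical helper: the single place where API parse options are configured.
function makeApiParserOptions() {
    $popts = new StubParserOptions();
    $popts->setTidy( true );
    $popts->enableLimitReport();
    return $popts;
}

// Each of the three call sites would then obtain its options the same way.
$popts = makeApiParserOptions();
```

With something like this, a later change to the defaults would only need to touch the helper.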

(In reply to comment #2)
> Since it's using the same sequence of several calls in three distinct places,
> it might be wise to encapsulate that into a function and call it consistently.
> This'll make it easier to maintain in the future, if something changes.

The real solution here is to create the ParserOptions object before differentiating between the three cases mentioned in comment #1. I'll do that tomorrow.
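
Roughly, that restructuring might look like the sketch below. The stub class and `stubParse()` stand in for MediaWiki's `ParserOptions` and `$wgParser->parse()` so the snippet runs standalone; `parseForApi()`, the `$params` keys, and the two loader functions are hypothetical and only illustrate the control flow. The parser cache is ignored entirely here.

```php
<?php
// Stand-in for MediaWiki's ParserOptions; only the setters from the report are modelled.
class StubParserOptions {
    public $tidy = false;
    public $limitReport = false;

    public function setTidy( $enabled ) {
        $this->tidy = (bool)$enabled;
    }

    public function enableLimitReport( $enabled = true ) {
        $this->limitReport = (bool)$enabled;
    }
}

// Stand-in for $wgParser->parse(): records which options it was given.
function stubParse( $text, StubParserOptions $popts ) {
    return array(
        'text' => $text,
        'tidy' => $popts->tidy,
        'limitReport' => $popts->limitReport,
    );
}

// Hypothetical loaders standing in for revision and page lookups.
function loadRevisionText( $oldid ) {
    return "text of revision $oldid";
}

function loadPageText( $page ) {
    return "current text of $page";
}

function parseForApi( array $params ) {
    // Build the options once, before deciding which of the three cases applies.
    $popts = new StubParserOptions();
    $popts->setTidy( true );
    $popts->enableLimitReport();

    if ( isset( $params['oldid'] ) ) {
        // 1. Parsing an old revision
        $text = loadRevisionText( $params['oldid'] );
    } elseif ( isset( $params['page'] ) ) {
        // 2. Parsing the current revision
        $text = loadPageText( $params['page'] );
    } else {
        // 3. Parsing some arbitrary text
        $text = isset( $params['text'] ) ? $params['text'] : '';
    }

    // All three cases reach the parser with identical options.
    return stubParse( $text, $popts );
}

$results = array(
    parseForApi( array( 'oldid' => 123 ) ),
    parseForApi( array( 'page' => 'Main Page' ) ),
    parseForApi( array( 'text' => "''hello''" ) ),
);
```

Because the options are created ahead of the branch, no case can end up without Tidy or the limit report by accident.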