Page MenuHomePhabricator

Use Preprocessor_Hash by default to avoid missing DOM module errors
Closed, ResolvedPublic

Description

Although PHP 5's DOM module is included by default when you compile PHP yourself, a surprising number of Linux and Unix-y distros like to remove it.

(Reported cases include FreeBSD ports and Fedora Core 6.)

This results in MediaWiki 1.12 and 1.13-svn failing due to DOMDocument class being missing, so Preprocessor_DOM gets you nowhere.

Assuming Preprocessor_Hash is functioning correctly (passes tests, and seems ok...), we should consider using it by default and dropping the DOM-based processor.


Version: unspecified
Severity: normal

Details

Reference
bz13770

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:08 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz13770.

ayg wrote:

(In reply to comment #0)

(Reported cases include FreeBSD ports and Fedora Core 6.)

And RHEL 5.

Created attachment 4879
Fallback to Preprocessor_DOM on install

I think it makes more sense to default to Preprocessor_DOM, then fall back to Preprocessor_Hash if DOMDocument doesn't exist. This patch will add a check to the LocalSettings.php generation to add $wgParserConf['preprocessorClass'] = 'Preprocessor_Hash' to the configuration if class_exists() returns false.

Attached:

ayg wrote:

This doesn't belong in the generated LocalSettings.php. That means it won't take effect for upgraded installs. Instead, it should gracefully fall back to the hash preprocessor when you try to load the DOM preprocessor, right before it would normally give a fatal error about DOMDocument not existing (or some suitable time period before that, anyway).

This is assuming there's a reason to keep the DOM-based preprocessor at all.

Setting the default appropriately in DefaultSettings.php (based on DOM module presence) should do the job fine.

Probably someone should do a quick benchmark to compare performance; if the hash version is about as fast or faster, then just using it by default would be easiest. :)

Damn the torpedoes!

Switched to hash by default in r34283 (trunk) and r34284 (1.12 branch for 1.12.1). Would be nice to do more testing of memory and CPU usage.

(In reply to comment #4)

Setting the default appropriately in DefaultSettings.php (based on DOM module
presence) should do the job fine.

Probably someone should do a quick benchmark to compare performance; if the
hash version is about as fast or faster, then just using it by default would be
easiest. :)

I considered that, but figured a 1-time check-and-set would be less expensive in the long run rather than doing the check each time Mediawiki is run.

  • Bug 14885 has been marked as a duplicate of this bug. ***

ayg wrote:

*** Bug 14884 has been marked as a duplicate of this bug. ***

  • Bug 14946 has been marked as a duplicate of this bug. ***

Bryan.TongMinh wrote:

It appears that if php-xml is not installed, another class with the same name but a different constructor http://www.php.net/manual/en/class.domdocument.php is present. The current fix checks for class_exists('DOMDocument') which will pass but fail on construction of the DOMDocument.

The detection mechanism should probably be a bit more sophisticated...

ayg wrote:

Any suggestions?

Should this block 1.13 release, since this completely breaks MediaWiki out of the box on default RHEL installations?

Bryan.TongMinh wrote:

I think extension_loaded('xml') should work, but I'm not exactly sure whether 'xml' is the correct string. I can not test this as there appears to be no way to disable php-xml on Windows.

It should probably be fixed in 1.13 branch before the release.

Somebody here who can test this? Replace

} elseif ( class_exists( 'DOMDocument' ) ) {

by

} elseif ( extension_loaded( 'xml' ) ) {

in Parser.php on line 136.

Bryan.TongMinh wrote:

*** Bug 14911 has been marked as a duplicate of this bug. ***

I'm pretty sure 'xml' is the SAX parser and utf8_encode/utf8_decode functions (which we rely on in lots of low-level code and check for in the installer).

I believe 'dom' is the current PHP 5 DOM module.

This is not to be confused with 'domxml' which existed in PHP 4 and is not compatible.

ayg wrote:

I can confirm that the package I needed to install to get this to work *seems* to give 'dom' loaded if it's installed, 'dom' not loaded if it's not installed. So that would seem to be correct.

Assigning to Tim to double-check the current status of this before 1.13 final release.

Fixed in 39158, backported for release in 1.13.0rc3

dimitar.tsonev wrote:

strange, the error still exist on windows install of 1.13.0

I can post the error and information about XAMPP version and included software, if needed.

ayg wrote:

I'm guessing phpinfo() would be most useful.

Got a report from some folks on IRC whose server reported true for both:

extension_loaded('dom')

*and*

extension_loaded('domxml')

It looks like the DOMXML extension is overriding the class, probably by loading second, so "wins", and we get the broken one that doesn't work.

(They say they installed XAMPP and it's PHP 5.2.6, so should be a recent package. Sounds like they're just blindly shipping both modules without realizing they conflict!)

We can either check to make sure they're not both loaded:
} elseif ( extension_loaded( 'dom' ) && !extension_loaded( 'domxml' ) ) {

or we can check in more detail which class we're going to get when we instantiate it, maybe with some reflection or some crap.

Bryan.TongMinh wrote:

Checking on instantiation sounds like an incredible difficult way to do it.

I would go for the ( extension_loaded( 'dom' ) && !extension_loaded( 'domxml' ) ) solution.

  • Bug 15326 has been marked as a duplicate of this bug. ***

Fixed in r40419 using the method in comment #21. Tested with a conflicting install of both dom and domxml. I had to hack the config.m4 in domxml, disabling conflict checks, to force it to create this unholy combination. Will backport to 1.13.1.