
php error in LinkHolderArray
Closed, DeclinedPublic

Description

Author: gero.scholz

Description:
If an extension tries to clone the MW parser, MW later wants to merge the LinkHolderArrays.

This leads to a PHP error because the clone statement (something like "$myParser = clone $parser;") obviously does not call the constructor of the LinkHolderArray. The reason is probably that the LinkHolderArray is in general only constructed on demand.

The following bugfix works, but maybe a better solution can be found:

<pre>
	/**
	 * Merge another LinkHolderArray into this one
	 */
	function merge( $other ) {
		// add this!!
		// protect against uninitialized instance
		// (which can happen if the parser is 'cloned')
		if ( !isset( $this->interwikis ) ) {
			$this->internals = array();
			$this->interwikis = array();
			$this->size = 0;
			$this->parent = $other->parent;
		}
		// end !!

		foreach ( $other->internals as $ns => $entries ) {
			$this->size += count( $entries );
			if ( !isset( $this->internals[$ns] ) ) {
				$this->internals[$ns] = $entries;
			} else {
				$this->internals[$ns] += $entries;
			}
		}
		$this->interwikis += $other->interwikis;
	}
</pre>
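The situation the guard protects against can be reproduced in a tiny standalone script. This is a minimal sketch (plain PHP, not MediaWiki code; LazyHolder is a hypothetical stand-in for LinkHolderArray) showing that clone never invokes __construct(), so on-demand properties can be missing on the clone:

```php
<?php
class LazyHolder {
	public $items; // intentionally not set in a constructor: initialized on demand

	public function initOnDemand() {
		$this->items = array();
	}

	public function merge( LazyHolder $other ) {
		// Guard against an uninitialized instance, as in the proposed fix.
		if ( !isset( $this->items ) ) {
			$this->items = array();
		}
		$this->items += $other->items;
	}
}

$a = new LazyHolder();
$a->initOnDemand();
$a->items[0] = 'x';

$b = new LazyHolder();  // never initialized on demand
$c = clone $b;          // clone skips __construct(), so $c->items is still null
$c->merge( $a );        // without the guard this line would raise a PHP error

var_dump( $c->items );  // an array containing only 'x'
```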


Version: 1.15.x
Severity: normal

Details

Reference
bz20036

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:48 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz20036.

That fix makes extensions work; I can confirm it for 1.16alpha. I can't confirm the interwiki part.
There is a bad line break in the "// protect against..." comment, though.

I'm a little unclear on what's going on in this bit... Tim can you take a peek and confirm? Thanks!

The clone operation in PHP is a shallow copy, it copies member object handles instead of whole objects. So this:

<pre>
$a = (object)array( 'm' => (object)array( 'n' => 1 ) );
$b = clone $a;
$a->m->n = 2;
print $b->m->n;
</pre>

will print "2". It's basically equivalent to assigning each member variable in turn:

<pre>
$b = new stdClass;
$b->m = $a->m;
</pre>

This means that if you do this from a tag hook:

<pre>
$parser = clone $wgParser;
$lineStart = true;
$clearState = false;
$parser->parse( $text, $title, $options, $lineStart, $clearState );
</pre>

then you mess up the state of any child objects that $wgParser was storing. I think the reporter is triggering this total destruction of parser state from an extension, and misdiagnosing the result.

If so, I think this is a WONTFIX. I used to recommend cloning the parser, but it has since become obvious that it's a really bad idea, especially after MW 1.12. You can do it if you call clearState(), wiping out any problematic objects; that is basically how MessageCache manages it. But you can't just call random functions and expect them to work. My current thinking on this is that cloning should be replaced with calls to appropriate re-entrant functions, such as recursiveTagParse().
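The re-entrant pattern Tim recommends might look roughly like this in a tag extension. This is a hedged sketch following the classic MW 1.15-era tag-hook convention; the hook and function names (wfMyTagInit, wfMyTagRender, 'mytag') are hypothetical placeholders:

```php
<?php
// Register the tag hook instead of cloning the parser later.
$wgHooks['ParserFirstCallInit'][] = 'wfMyTagInit';

function wfMyTagInit( $parser ) {
	$parser->setHook( 'mytag', 'wfMyTagRender' );
	return true;
}

function wfMyTagRender( $input, $args, $parser ) {
	// Re-entrant: parses wikitext within the current parse.
	// No "clone $wgParser", so the parser state stays intact.
	return $parser->recursiveTagParse( $input );
}
```

Because recursiveTagParse() runs inside the current parse, link holders and other child objects are never shared between two parser instances in the first place.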

Nx.devnull wrote:

This is probably a stupid question, but why don't you implement the __clone function to deep copy the parser? recursiveTagParse() is not always sufficient, especially with something as complicated as DPL.

In my case I wanted to save the html rendered inside my tag extension to the database, but recursiveTagParse() stripped out all the internal links. These were later reinserted properly on the original page, but the html saved to the database didn't have them. Switching to parse() fixed it.

Deep cloning could be very slow indeed, due to the need to copy complex data structures in mTplDomCache. Also it would be subject to a high rate of bitrot, since the code would never be used on Wikimedia, and it would have to be updated every time someone added an object to the parser state. If parse() works for you then I don't see a problem, I think I'd have to hear a more convincing argument.
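For reference, a deep-copying __clone would have to re-clone every member object by hand, which is exactly the maintenance burden described above. A minimal sketch of the idea in plain PHP (ClonableParser and State are hypothetical stand-ins, not MediaWiki classes):

```php
<?php
class State {
	public $n = 1;
}

class ClonableParser {
	public $state;

	public function __construct() {
		$this->state = new State();
	}

	public function __clone() {
		// Without this, the clone would share $state with the original
		// (PHP's default clone only copies the object handle).
		// Every new member object added later must be cloned here too.
		$this->state = clone $this->state;
	}
}

$p = new ClonableParser();
$q = clone $p;
$p->state->n = 2;
echo $q->state->n, "\n"; // prints 1: the child object was copied, not shared
```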

bugs wrote:

I was discussing this yesterday with Tim on IRC. The "fix" is much simpler than that (and much more secure): the real problem is in Parser::__destruct():

<pre>
154 	function __destruct() {
155 		if ( isset( $this->mLinkHolders ) ) {
156 			$this->mLinkHolders->__destruct();
157 		}
158 		foreach ( $this as $name => $value ) {
159 			unset( $this->$name );
160 		}
161 	}
</pre>

There is no need to call LinkHolderArray::__destruct() in line 156, since the member will be unset() anyway in the following loop. In fact, probably the whole function is needless, but the doc comment says something about circular references... so I don't know. PHP does reference counting, and its garbage collector also handles cycles; it will call __destruct() automatically if, and only if, the reference count drops to zero. In fact, this code will call LinkHolderArray::__destruct() twice under almost all conditions.
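The double destruction is easy to demonstrate in a standalone script (plain PHP, no MediaWiki; Holder is a hypothetical stand-in): calling __destruct() by hand does not stop PHP from calling it again when the object is actually destroyed.

```php
<?php
class Holder {
	public static $destructions = 0;

	public function __destruct() {
		self::$destructions++;
	}
}

$h = new Holder();
$h->__destruct();  // manual call, as Parser::__destruct() does for mLinkHolders
unset( $h );       // refcount drops to zero: PHP calls __destruct() again

echo Holder::$destructions, "\n"; // prints 2
```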

If lines 155-157 above are erased, this specific issue will be solved. The problem is that when the cloned parser dies, the call in line 156 destroys the LinkHolderArray that is shared between the clone and the original.

I suggest these lines be removed (which will also resolve this bug), not to promote cloned Parser usage (which, as Tim explained, is deprecated), but because the code is really unnecessary and does something hardly justifiable: the PHP documentation says that __destruct() is a magic method that is called automatically, and it does not endorse calling it manually:

http://br2.php.net/manual/en/language.oop5.decon.php#language.oop5.decon.destructor

So, would it be possible to reconsider this bug, but with this different fix above?

In my case, I fixed my extension to use Parser::recursiveTagParse() conditionally (when included) and the global $wgParser when not included, where it is known to be safe. I also had to avoid and work around some MW functions that destroy the state of the global parser (wfMsgWikiHtml(), wfMsgExt() with the 'parse' option, etc.).