Page MenuHomePhabricator

preg_match cannot compile pattern in includes/IP.php
Closed, ResolvedPublic

Description

Author: p.oranje

Description:
After svn update to r76651 the following warning is shown:

Warning:  preg_match() [function.preg-match]: Compilation failed: assertion expected after (?( at offset 157 in [my server]/mediawiki-trunk/includes/IP.php on line 85

The actual regex="/^((::|:(:([0-9A-Fa-f]{1,4})){1,7})|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){0,6}::|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){7}|([0-9A-Fa-f]{1,4})(:(?P<abbr>(?(abbr)|:))?([0-9A-Fa-f]{1,4})){1,6}(?(abbr)|^))(\/(12[0-8]|1[01][0-9]|[1-9]?\d)|)$/"

Used software versions:
mediawiki: 1.17alpha (r76658)
PHP: 5.2.8 (but somehow does not accept the syntax "(?<name>)" which should work as of version 5.2.2, see http://php.net/manual/en/function.preg-match.php)


Version: 1.17.x
Severity: normal

Details

Reference
bz25920

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:20 PM
bzimport set Reference to bz25920.
bzimport added a subscriber: Unknown Object (MLST).

CC'ing aaron. I think he might be the one to have broken this (he's been working on it recently)

(In reply to comment #0)

After svn update to r76651 the following warning is shown:

Warning:  preg_match() [function.preg-match]: Compilation failed: assertion

expected after (?( at offset 157 in [my server]/mediawiki-trunk/includes/IP.php
on line 85

The actual
regex="/^((::|:(:([0-9A-Fa-f]{1,4})){1,7})|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){0,6}::|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){7}|([0-9A-Fa-f]{1,4})(:(?P<abbr>(?(abbr)|:))?([0-9A-Fa-f]{1,4})){1,6}(?(abbr)|^))(\/(12[0-8]|1[01][0-9]|[1-9]?\d)|)$/"

Used software versions:
mediawiki: 1.17alpha (r76658)
PHP: 5.2.8 (but somehow does not accept the syntax "(?<name>)" which should
work as of version 5.2.2, see http://php.net/manual/en/function.preg-match.php)

Are you saying there might be a PHP bug?

From: http://php.net/manual/en/function.preg-match.php

5.2.2 Named subpatterns now accept the syntax (?<name>) and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>).

I choose ?P since it was the oldest. It *should* be fine...

OK, I think the problem is that the backreference (not just the name) also needs the "P". *sigh of relief*.

(In reply to comment #4)

OK, I think the problem is that the backreference (not just the name) also
needs the "P". *sigh of relief*.

I do think it's something about the conditional and not the named ref. that 5.2.8 dislikes. Worst case, I can use numbered references as a workaround, but it makes the expression less versatile for outside use (like combining it with other regexes).

The following is also valid syntax (perl style):
define( 'RE_IPV6_ADD',
'(' . // starts with "::" (includes the address "::")

		'(::|:(:' . RE_IPV6_WORD . '){1,7})' .

'|' . // ends with "::" (not including the address "::")

		RE_IPV6_WORD . '(:' . RE_IPV6_WORD . '){0,6}::' .

'|' . // has no "::"

		RE_IPV6_WORD . '(:' . RE_IPV6_WORD . '){7}' .

'|' . // contains one "::" in the middle ("^" check always fails if no "::" found)

		RE_IPV6_WORD . '(:(?P<abbr>(?(<abbr>)|:))?' . RE_IPV6_WORD . '){1,6}(?(<abbr>)|^)' .

')'
);

Does this work on your setup?

p.oranje wrote:

(In reply to comment #2)

(In reply to comment #0)

[cut]
PHP: 5.2.8 (but somehow does not accept the syntax "(?<name>)" which should
work as of version 5.2.2, see http://php.net/manual/en/function.preg-match.php)

Are you saying there might be a PHP bug?

No, but the new named pattern syntax does not work with hosting provider,
though the reported version of PHP > 5.2.2. So I thought the same could be true
for the backreferences.

(In reply to comment #5)
A test with as part of the pattern set set on line 45 like proposed by you

RE_IPV6_WORD . '(:(?P<abbr>(?(<abbr>)|:))?' . RE_IPV6_WORD . '){1,6}(?(<abbr>)|^)' .

produces the error message:

Compilation failed: assertion expected after (?( at offset 157 in [my

server]/mediawiki-trunk/includes/IP.php on line 85

An other test with as part of the pattern set set on line 45 like

RE_IPV6_WORD . '(:(?P<abbr>(?P(abbr)|:))?' . RE_IPV6_WORD .

'){1,6}(?P(abbr)|^)'

produces the error message:

Compilation failed: unrecognized character after (?P at offset 157 in [my

server]/mediawiki-trunk/includes/IP.php on line 85

My knowledge of PHP is superficial, so likely the other pattern I tested isn't right.
What would be the correct pattern (for the back reference "?(abbr)")?
It seems to me a good choice that the oldest still supported syntax be used.

Before PHP 5.3, PCRE was an extension. The host might be using an old version.

From: http://www.pcre.org/changelog.txt
...Version 7.0 19-Dec-06...

(d) A conditional reference to a named group can now use the syntax
    (?(<name>) or (?('name') as well as (?(name).

It looks like (?(name)x|y) was actually the oldest named group assertion syntax, from "Version 4.0 17-Feb-03".

Do you know what version of PCRE the host uses as the PHP extension?

p.oranje wrote:

Version of PCRE according to phpinfo() is "6.6 06-Feb-2006".

(In reply to comment #8)

It looks like (?(name)x|y) was actually the oldest named group assertion
syntax, from "Version 4.0 17-Feb-03".

I found another log entry in the changelog. It shows that named subpatterns in conditionals were added in 6.7:

...Version 6.7 04-Jul-06...

  1. Added the ability to use a named substring as a condition, using the Python syntax: (?(name)yes|no). This overloads (?(R)... and names that are numbers (not recommended). Forward references are permitted.

They must have been left out in 4.0 then. This now makes sense.

OK, I changed the regex in r76876. It now avoids conditionals on whether a named group matched.

This should be fine on 4.0+, does this give you any trouble?

p.oranje wrote:

Sorry, but it doesn't work yet.

The error msg:

Compilation failed: reference to non-existent subpattern at offset 163 in [myserver]/mediawiki-trunk/includes/IP.php on line 92.

(In reply to comment #13)

Sorry, but it doesn't work yet.

The error msg:

Compilation failed: reference to non-existent subpattern at offset 163 in

[myserver]/mediawiki-trunk/includes/IP.php on line 92.

"6.7...14. Added forward references in named backreferences (if you see what I mean)."

[Sigh]...I guess named refs were really half-assed before 6.7. Maybe there is some weird workaround.

(In reply to comment #13)

Sorry, but it doesn't work yet.

The error msg:

Compilation failed: reference to non-existent subpattern at offset 163 in

[myserver]/mediawiki-trunk/includes/IP.php on line 92.

Here is another variant. The forward reference is now a nested reference (nested refs supported for non-named groups since 2.0). If named groups support it too, then this should work:

define( 'RE_IPV6_ADD',
'(?:' . // starts with "::" (includes the address "::")

		'::|:(?::' . RE_IPV6_WORD . '){1,7}' .

'|' . // ends with "::" (not including the address "::")

		RE_IPV6_WORD . '(?::' . RE_IPV6_WORD . '){0,6}::' .

'|' . // has no "::"

		RE_IPV6_WORD . '(?::' . RE_IPV6_WORD . '){7}' .

'|' . // contains one "::" in the middle (awkward regex for PCRE 4.0+ compatibility)

		RE_IPV6_WORD . '(?::(?P<abn>(?!(?P=abn)):(?P<iabn>))?' . RE_IPV6_WORD . '){1,6}(?P=iabn)' .
		// NOTE: (?!(?P=abn)) fails iff "::" used twice; (?P=iabn) passes iff a "::" was found.
		
		// Better regexp (PCRE 7.2+ only), allows intuitive regex concatenation
		#RE_IPV6_WORD . '(?::((?(-1)|:))?' . RE_IPV6_WORD . '){1,6}(?(-2)|^)' .

')'
);

p.oranje wrote:

Your last variant does not produce errors.

Whether the regex actually works cannot be tested by me - my hosting provider does not have IPv6 connectivity.

p.s.
This has been an interesting bug sofar in grasping the development of PCRE, tou qualify as historian. Good luck.

(In reply to comment #16)

Your last variant does not produce errors.

Whether the regex actually works cannot be tested by me - my hosting provider
does not have IPv6 connectivity.

p.s.
This has been an interesting bug sofar in grasping the development of PCRE, tou
qualify as historian. Good luck.

You can run tests as follows:
WINDOWS: execute maintenance/tests/phpunit/run-tests.bat
UNIX: run 'make test' from a shell in the maintenance/tests/phpunit directory

All the stuff I was proposing above passed the tests for me.

Of course, phpunit has to be installed first :)

p.oranje wrote:

Besides the above requirement, the makefile contains a severe warning that the tests should not be run on a production system. For that reason I shall not run the test. Sorry.

When you can instruct me on a non-destructive and sufficiently restrictive test suitable for this bug, I'll be happy to consider running such a test.

(In reply to comment #19)

Besides the above requirement, the makefile contains a severe warning that the
tests should not be run on a production system. For that reason I shall not run
the test. Sorry.

When you can instruct me on a non-destructive and sufficiently restrictive test
suitable for this bug, I'll be happy to consider running such a test.

Are you running /trunk on a production system? I just figured it was a test wiki.

I don't know of a clean way to run IPTests.php by itself. I suppose one could edit the file to have require("MW DIR/includes/IP.php") at the top and then run "phpunit IPTests" from shell.

In either case, you've been MORE than helpful enough here :)... I'll commit the last regexp.

(In reply to comment #21)

Committed in r76928.

Reverted by Sam. Apparently there is a PCRE 8.02 segfault bug (I'm running 7.9). Yay PCRE!

(In reply to comment #22)

(In reply to comment #21)

Committed in r76928.

Reverted by Sam. Apparently there is a PCRE 8.02 segfault bug (I'm running
7.9). Yay PCRE!

PCRE bug filed upstream. Workaround done in r77067.

p.oranje wrote:

The workaround works also with my old PCRE.