
ISBN-recognition fails on (older) ISBN divided by spaces
Closed, Resolved (Public)

Description

Author: interlingue

Description:
Older ISBNs may be internally divided by hyphens or spaces. The software
recognizes ISBNs divided by hyphens correctly, but not ISBNs divided by spaces.

If a space were added to the character class in the search pattern (e.g. the
regex "ISBN ([0123456789-Xx ]+)" instead of "ISBN ([0123456789-Xx]+)"), this
bug could be fixed.
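
To illustrate the report, here is a minimal Python sketch comparing a hyphen-only pattern with one that also accepts spaces. It is not the MediaWiki parser code. The two patterns are assumptions modelled on the regexes quoted above, rewritten with the hyphen placed last in the character class, because in Python's re module a sequence like "9-X" inside a class is read as a character range rather than a literal hyphen. The sample number 0-306-40615-2 is only an illustration.

```python
import re

# Assumed stand-ins for the two patterns discussed in the report.
# The hyphen is placed last in the class so it is taken literally.
HYPHEN_ONLY = re.compile(r"ISBN ([0-9Xx-]+)")
HYPHEN_OR_SPACE = re.compile(r"ISBN ([0-9Xx -]+)")


def captured(pattern, text):
    """Return the text captured as the ISBN, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


hyphenated = "ISBN 0-306-40615-2"
spaced = "ISBN 0 306 40615 2"

# The hyphen-only pattern captures the hyphenated form in full ...
print(captured(HYPHEN_ONLY, hyphenated))
# ... but stops at the first space of the spaced form.
print(captured(HYPHEN_ONLY, spaced))
# Allowing a space in the class captures the spaced form in full.
print(captured(HYPHEN_OR_SPACE, spaced))
```

Note that the wider character class also accepts trailing spaces and stray hyphens, which is the problem raised in the comments below.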


Version: unspecified
Severity: normal
OS: Linux
Platform: PC
URL: http://de.wikipedia.org/wiki/Benutzer:Sebastiano/ISBN

Details

Reference
bz8110

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:34 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz8110.
bzimport added a subscriber: Unknown Object (MLST).

[[ISBN]] says both hyphens and spaces are valid, so I guess this should be
fixed. But in the meantime, you could work around it by just using hyphens
instead -- it's not as if the choice of delimiter actually means anything.

jepe wrote:

A hyphen with a space on each side is also sometimes used as a delimiter between
the ISBN and the other text on a line. So a good fix should also ensure that a
space and a hyphen cannot occur next to each other within a link. For example:

ISBN 123456789 - 2005

In this case the year must not become part of the ISBN link.

jepe wrote:

The example above now produces an ISBN link that includes the year, so it looks
up ISBN 1234567892005 on Special:Booksources.
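
The behaviour described here can be reproduced with a small Python sketch. It assumes a loose pattern that accepts digits, X, spaces and hyphens in any order, and a hypothetical helper booksources_number that strips the separators before lookup. Neither is taken from the MediaWiki source.

```python
import re

# Assumed loose pattern: any run of digits, X/x, spaces and hyphens.
LOOSE = re.compile(r"ISBN ([0-9Xx -]+)")


def booksources_number(text):
    """Hypothetical helper: return the number a lookup would receive."""
    match = LOOSE.search(text)
    if match is None:
        return None
    # Drop separators, keeping only digits and the X check character.
    return re.sub(r"[^0-9Xx]", "", match.group(1))


# " - " consists only of allowed characters, so the year is swallowed.
print(booksources_number("ISBN 123456789 - 2005"))
```

Because the character class places no constraint on how separators alternate with digits, the match runs straight through the " - " delimiter.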

See [[User:JePe/temp]]

ayg wrote:

It could be expanded to

(?:[0-9]{3}[-\ ]?)?[0-9]+[-\ ]?[0-9]+[-\ ]?[0-9]+[-\ ]?[0-9xX]

but that's quite a bit ickier than

[0-9Xx\ \-]+

I don't think it would make a noticeable difference to execution time, though.
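
As a rough check of the two patterns above, the following Python sketch runs both against the earlier example and against a space-separated number. The "ISBN " prefix and the capture group are added here as assumptions, and whether the committed fix uses exactly this expression is not stated in this thread.

```python
import re

# The two patterns quoted above, wrapped in an assumed "ISBN " prefix
# and a capture group for comparison.
STRICT = re.compile(
    r"ISBN ((?:[0-9]{3}[-\ ]?)?[0-9]+[-\ ]?[0-9]+[-\ ]?[0-9]+[-\ ]?[0-9xX])"
)
LOOSE = re.compile(r"ISBN ([0-9Xx\ \-]+)")


def captured(pattern, text):
    """Return the text captured as the ISBN, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


for sample in ("ISBN 123456789 - 2005", "ISBN 0 306 40615 2"):
    print(sample)
    print("  strict:", captured(STRICT, sample))
    print("  loose: ", captured(LOOSE, sample))
```

The stricter pattern allows at most one separator between digit groups, so it stops before " - 2005", while still accepting single spaces or hyphens inside the number.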

Fixed in r18415. This change makes the magic ISBN detection much stricter; it
should still match all valid ISBNs, but there might be some breakage somewhere
if someone had previously relied on it matching invalid ones as well.