Page MenuHomePhabricator

Split magic linking out of core; create new magic linking extension
Open, LowestPublic

Description

The ISBN/RFC/whatever magic linking voodoo really ought not be in core. It's a goofy feature that dates way, way back to r40.

I suggest splitting the magic linking voodoo out into an extension and adding a genericized functionality that could resolve bugs like T2224: IPA or SAMPA module (T33221: Audio pronunciation: Automatic text-to-speech to convert IPA to sound).


See Also:

Details

Reference
bz26207

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:13 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz26207.
bzimport added a subscriber: Unknown Object (MLST).

Bryan.TongMinh wrote:

I don't think the magic linking in core is necessarily evil. However, if it currently isn't, then it should be made configurable and disableble.

Either you would have to manage to keep the syntax identical or you will have to come up with some transition plan for updating what must by now be millions of instances of the old syntax to whatever new syntax you choose.

(In reply to comment #2)
Nobody has proposed to change the syntax. Only that the current usage might be moved into an extension, where also new things similar to this could be added (although I would recommended against it).

(In reply to comment #3)

(In reply to comment #2)
Nobody has proposed to change the syntax. Only that the current usage might be
moved into an extension, where also new things similar to this could be added
(although I would recommended against it).

Actually, Aryeh has suggested eliminating the syntax in favor of templates (e.g. {{ISBN|XXXXX}}). A few other bugs in this tracker have suggested using ParserFunction-type syntax (e.g. {{#ISBN:XXXX}}). An extension has the advantage of being able to be installed across all wikis easily, rather than relying on users to create and maintain templates (there are still no global templates). An extension also has an inherent disable/enable feature (in reply to comment #1). If you were creating this feature today, you'd probably do it in JavaScript (or at least I would).

Whether Wikimedia or any individual wiki would install this extension to replace the functionality is not as important as the current system degrades gracefully. There are a lot of options to move forward here, but I think the first fundamental question to answer is whether this functionality is appropriate for core.

I'm not sure what you're recommending against: adding more cases of magic link syntax or moving this functionality into an extension.

The first, adding more magic link syntax. It's also open question if we even can move this to an extension without causing too big disruption.

We could provide a way to hook into that piece of the parser and add the current ones to something called CoreMagicLinking, just as was done with parserfunctions.

Such thing is purposefully not hookable for being evil.

wsd wrote:

Magical Extension

It does seem to me that having magic linking in the core is inappropriate for a number of reasons, many noted above. Attached is a simple extension that replaces the core functionality. I have tested it with a modified MediaWiki that lacks the magic linking in the core.

A few notes about the extension:

  • it easily supports "legacy" magic linking (i.e. ISBN, RFC, PMID) via $wgMagicalClassic
  • it can be easily extended (example below)
  • the code is a little more general, and less hackey :)

An example of a custom linker (I have tested this in my LocalSettings.php):

function myCustomLinker( $matches ) {

$num = ltrim($matches[1], '0');
$url = 'http://bugzilla.wikimedia.org/show_bug.cgi?id=' . $num;
return '<a href=\'' . $url . '\'>' . $matches[0] . '</a>';

}

$wgMagicalLinkers[] = array(

'linker' => 'myCustomLinker',
'pattern' => '@bug ([0-9]+)@'

);

The name of the extension is "Magical".

attachment Magical.php ignored as obsolete

Seems to do the job. Although I'm not convinced about the efficiency of using preg_match_all.

Thanks for your contribution.

Some notes:

Do not use == null to test if the entry is present, that will produce a notice. Use isset() instead.

The name of the main function (Magical::pbt) should be much clear.

ISBN is not normalised.

You are not providing the css classes.

The callbacks are safe thanks to the limited scope of the regex. Maybe add some redundant htmlspecialchars?

If you want to consider a design change, I would have used per-parser state added in ParserFirstCallInit, like parser functions and tag hooks, probably making a single combined regex there (so you don't need to scan the text N times).

wsd wrote:

Magical Extension (rv2)

Thanks for the feedback, Platonides.

I fixed many of the things you mentioned. (I wasn't quite sure what you meant by "redundant htmlspecialchars".)

Regarding performance: I did a quick and dirty test, and my version seems to go approximately two times as fast as the core version. While I am pleased with this result, I have no idea why it is the case...

If you get a chance, it would be great if some could double check the performance.

attachment Magical.php ignored as obsolete

wsd wrote:

I did some additional testing, and my version is significantly worse for certain types of input. I'll do some more hacking...

wsd wrote:

Magical Extension (rv3)

This should fix performance problems considerably. It is now comparable performance wise with core magic linking.

attachment Magical.php ignored as obsolete

You are not doing it in the structured way of placing extensions into files (no problem, it's easy to add later), but things like
if( ! isset( $wgMagicalLinkers ) )
are since they produce a vulnerability if the host is configured with register globals.

Also, you should check that MEDIAWIKI is defined before executing anything.

It makes no sense having this line:
$wgMagicalLinkers = array_reverse( $wgMagicalLinkers );

Magical::createPattern() should be lazy-loaded, or done by the function if there's a function for registering magic words.

Using Xml::escapeTagsOnly is worse than doing no escaping at all.

With no escaping, it looks like a html injection vector but when looking at the regex it is actually secure (my previous mention to redundant calls to htmlspecialchars) http://www.mediawiki.org/wiki/Security_for_developers#Demonstrable_security

Escaping with Xml::escapeTagsOnly it looks like it is secure, so that would encourage eg. changing the regex for a more insecure one, as it is escaped, but the delimiter you use is ' but escape for ".

A comment on the $unsafePattern would be appropiate. It's quite clever.

Note that you no longer need to use the x modifier. Also, I would use / instead of @ as it is more common. The fact that @ has to be escaped on those regex may bite some people.

wsd wrote:

Magical Extension (rv4)

I think this fixes most of your concerns. Many thanks for your extensive feedback. :)

attachment Magical.php ignored as obsolete

wsd wrote:

Disable magic linking in core (temporary)

In case anyone is interested in testing this extension out, here's a quick patch to disable magic linking in Parser.php.

attachment disablecoreml.patch ignored as obsolete

Removing 1378, 2340, 2391, 3663, 4595, 6535, 8160, 8758, 10866, 10867, 12900, 13195, 13873, 18813, 18814, 19439, 20441, 26644, 28950, 29025 as depends on... they have nothing to do with moving magic linker to a extension.

-tracking, there is nothing tracking about this bug. +patch.

@William Demchick: do you currently have commit access? it might be handy (and get a few more eyes on it) if you work on it directly in svn.

test5555 wrote:

Magical Extension (rv5) with 30360

Sample patch based on rv4 to include [[Bugzilla:30360]] (please double check).

Attached:

  • Bug 30360 has been marked as a duplicate of this bug. ***

sumanah wrote:

Docu, could you please attach your patch as a plaintext diff per https://www.mediawiki.org/wiki/Patch#Posting_a_patch ? Thank you.

(In reply to comment #20)

Docu, could you please attach your patch as a plaintext diff per
https://www.mediawiki.org/wiki/Patch#Posting_a_patch ? Thank you.

Since it's a new file / new extension that isn't really necessary.

audreyt wrote:

Hi Docu, thank you for the patch!

As you may already know, MediaWiki is currently revamping its PHP-based parser
into a "Parsoid" prototype component, to support the rich-text Visual Editor
project:

https://www.mediawiki.org/wiki/Parsoid
https://www.mediawiki.org/wiki/Visual_editor

Folks interested in enhancing the parser's capabilities are very much welcome
to join the Parsoid project, and contribute patches as Git branches:

https://www.mediawiki.org/wiki/Git/Tutorial#How_to_submit_a_patch

Compared to .diff attachments in Bugzilla tickets, Git branches are much easier
for us to review, refine and merge features together.

Each change set has a distinct URL generated by the "git review" tool, which
can be referenced in Bugzilla by pasting its gerrit.wikimedia.org URL as a
comment.

If you run into any issues with the patch process, please feel free to ask on
irc.freenode.net #wikimedia-dev and the wikitext-l mailing list. Thank you!

(In reply to comment #1)

I don't think the magic linking in core is necessarily evil. However, if it
currently isn't, then it should be made configurable and disableble.

This got split out to bug 45942 ("Magic links" RFC, PMID and ISBN should be gated by configuration).

Jdforrester-WMF lowered the priority of this task from Medium to Lowest.EditedOct 27 2015, 6:32 PM
Jdforrester-WMF subscribed.

Though it would be nice to move this out of core, I don't think I can justify it as a priority, sorry.

We had an Extension created for this in May 2011, and we have v5 of it at T28207#294990 .

Can we find out who "wsd" was from the Bugzilla database, in order to credit them, and find out what license they wish to put on it? It could at least be copied to the wiki, were people will find it.

@Legoktm resolved T47942: "Magic links" RFC, PMID and ISBN should be configurable and disableable with https://gerrit.wikimedia.org/r/309528, so there's now $wgEnableMagicLinks in MediaWiki core. I left a note on Gerrit expressing my disagreement with the approach taken.

Just to let you know that I've wrapped wsd's patch to https://www.mediawiki.org/wiki/Extension:MagicalLinkers
It has been developed against MW 1.31. I think that with this extension, the core functionality can be removed.