
Enable Lua extension on WMF wikis
Closed, Resolved, Public

Description

bug 6455 to enable the string parsing capabilities of ParserFunctions (ex-StringFunctions) was closed with a comment saying they are a horrible mess and the Lua extension is a saner alternative. If that is the case, please enable it on WMF wikis. Reasons for why string processing would be useful can be found in the discussion for that bug.


Version: unspecified
Severity: enhancement
URL: http://www.mediawiki.org/wiki/Extension:Lua

Details

Reference: bz19298

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:43 PM
bzimport set Reference to bz19298.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

Not very likely to happen on Wikimedia, as discussed elsewhere (mainly IRC that I remember). The Lua extension requires installation of a PHP extension and/or the ability to use exec(). If it were enabled on Wikimedia, all templates would use it pretty soon, and anyone on shared hosting without either of these rights would be unable to use large chunks of Wikimedia content. PHP extension installation requires root access, and exec() is unsafe on shared hosts that have all PHP executed by a single user (using mod_php, FastCGI, etc.).

So it's a Catch-22 then? Sane solutions involve compiled interpreters, and won't be used by WMF for security and accessibility reasons, while solutions which use a PHP-based interpreter are deemed insane and thus won't be used by WMF?

ayg wrote:

That seems to be the current situation, yes. Maybe at some point we can give up on the requirement that you be able to fully use Wikipedia content without exec() rights; then we could use Lua (which is *way* preferable to StringFunctions for sure). You'd have to ask Brion about that.

Other alternative: the Abuse Filter parser could be modified to be embeddable.

ayg wrote:

Interesting idea. That would make a lot of sense. Not as powerful or "nice" as Lua, but it's vastly saner syntax than StringFunctions. How easy would that be to write up?

(In reply to comment #5)

> Interesting idea. That would make a lot of sense. Not as powerful or "nice" as Lua, but it's vastly saner syntax than StringFunctions. How easy would that be to write up?

It wouldn't be difficult to make the abuse filter parser generic enough to include inline in wikitext.

There would be a few things to clean up enough to actually deploy it inline on Wikimedia:

  • We'd want a more comprehensive testing suite to make sure nothing regressed.
  • We'd want to reimplement the parser with a shunting-yard algorithm and/or in C/C++, to handle the much heavier load the feature would undoubtedly get compared with the parser's current use in the abuse filter.
  • I understand there are a few potential security holes with user-supplied regexes, including at least denial-of-service attacks that run computationally expensive regexes against very large test strings. In the past there have been remote code execution vulnerabilities with user-supplied regexes. We'd need to find some way to work around this, or disable regexes.
  • Generally speaking, there are other ways to DoS the servers (or worse) with untrusted code.
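The shunting-yard idea from the second point can be sketched briefly. The Python below is a minimal illustration, not the AbuseFilter parser's actual implementation; it assumes only the four binary operators `+`, `-`, `*`, `/` on non-negative numeric literals plus parentheses, and deliberately does not handle unary minus, functions, or strings.

```python
import operator
import re

# Assumed operator set for illustration: symbol -> (precedence, function).
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}

# A token is a number, an operator, or a parenthesis, after optional whitespace.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(expr):
    """Split an expression string into number, operator and parenthesis tokens."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def to_rpn(tokens):
    """Shunting-yard: convert infix tokens to reverse Polish notation."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPS:
            # All operators are left-associative: pop while the stacked
            # operator has greater or equal precedence.
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # numeric literal
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("mismatched parentheses")
        output.append(op)
    return output


def eval_rpn(rpn):
    """Evaluate an RPN token list with an explicit operand stack."""
    stack = []
    for tok in rpn:
        if tok in OPS:
            if len(stack) < 2:
                raise ValueError("missing operand")
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok][1](a, b))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def evaluate(expr):
    return eval_rpn(to_rpn(tokenize(expr)))
```

Because the sketch uses explicit stacks instead of recursion, each token is pushed and popped at most once, so work is linear in the input length and deeply nested input cannot exhaust the call stack. Division by zero surfaces as Python's `ZeroDivisionError`.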

catlow wrote:

As an occasional amateur programmer (not php unfortunately) and bemused onlooker, I really don't understand how there can possibly be a problem with this. String functions are such a basic thing - surely there's someone among the devs with the elementary competence to write them into parser functions in an efficient way without a whole song and dance. (I mean, finding the length of a string and so on, even in Unicode, is surely much simpler than reformatting numbers and doing arithmetic.) Start off with the simple things at least - those which don't require any special extension or create vulnerabilities - and then move up to the trickier stuff as and when.

(In reply to comment #7)

> As an occasional amateur programmer (not php unfortunately) and bemused onlooker, I really don't understand how there can possibly be a problem with this. String functions are such a basic thing - surely there's someone among the devs with the elementary competence to write them into parser functions in an efficient way without a whole song and dance. (I mean, finding the length of a string and so on, even in Unicode, is surely much simpler than reformatting numbers and doing arithmetic.) Start off with the simple things at least - those which don't require any special extension or create vulnerabilities - and then move up to the trickier stuff as and when.

It is not a question of whether it's possible, but of whether it's a good idea. Most developers agree that parser functions for string manipulation are the wrong path to go down, and that we should consider other ways of providing useful functionality to users without adversely affecting their sanity.

catlow wrote:

Why should parser functions that do strings affect users' sanity? That attitude makes no sense to me. If formatnum and padleft are parser functions, then why shouldn't len and subst be parser functions? The existing parser functions involve string manipulation anyway (plus some arithmetic) - why not use the same route for functions that are even simpler because they don't involve any arithmetic?

(In reply to comment #9)

> Why should parser functions that do strings affect users' sanity? That attitude makes no sense to me.

Wikitext is supposed to be a *markup* language, not a Turing-complete programming language. It is designed for presentation, not computation. If you want computation, you should write a server-side parser function which implements the functionality you need.

For example, somebody recently implemented a full hexadecimal-to-decimal converter *in wikitext*. That is totally insane; it should have been its own parser function, once an appropriate use case had been explained.
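To make the contrast concrete: written as native code, such a converter is only a few lines. The sketch below is in Python purely for illustration (MediaWiki parser functions are actually written in PHP); the name `hex_to_dec` is hypothetical, and the input rules (optional `0x` prefix, surrounding whitespace ignored) are assumptions.

```python
import re

# Accept only hexadecimal digits; Python's int() would otherwise also
# tolerate things like underscores, which a wiki function probably should not.
HEX_RE = re.compile(r"[0-9A-Fa-f]+")


def hex_to_dec(value):
    """Hypothetical parser-function body: convert "1F" to "31"."""
    text = value.strip()
    if text[:2].lower() == "0x":
        text = text[2:]
    if not HEX_RE.fullmatch(text):
        raise ValueError(f"not a hexadecimal number: {value!r}")
    # Return text, since a parser function's output is wikitext.
    return str(int(text, 16))
```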

The *point* is that you should be able to, with relative ease, edit a page, template or whatever. That's what Wikipedia is about. The syntax for parser functions, especially with string functions, is totally insane, and is difficult to edit even for experienced programmers. This is a serious usability issue, which would doubtless be made ten times worse by string manipulation parser functions.

> If formatnum and padleft are parser functions, then why shouldn't len and subst be parser functions?

Because formatnum and padleft are for markup/formatting, not computation. Giving users the ability to format numbers and values is something that we are more than willing to support. Giving users the ability to write their own natural language parsers (as many have expressed the desire to do) is not something we are willing to support, from a resources perspective (you think it's cheap to parse the intended use cases of these parser functions?), nor from a philosophical perspective (making wikitext even harder to edit is *not* part of our mission).

We have not seen a single suggested use case that we do not object to on one of the above grounds. Therefore, we will not be activating StringFunctions.

The reason that we are even *considering* a Lua-based inline expression parser is because it would be difficult to now remove the existing parser functions (which were a bad idea in the first place), and their syntax is terrible and causing serious usability issues on Wikimedia projects.

catlow wrote:

But we already have functions under "expr", which *do* do computation and are found in practice to be very useful. Surely len(x) or x[3:5] is far cheaper, no more confusing to users and just as potentially useful as ((x+50)/0.456). Have you seen the ugly and costly hacks that people are forced to use with padleft/right just to get the length of a string? And substring retrieval is impossible, as far as I know, which means some templates end up far more complex and harder to use than they need be. I agree that the existing syntax is bad, but the functionality it produces is extremely useful, and the addition of a few more functions using reasonable syntax (I'm not saying we need to have *every* function that's been requested) is not going to make the overall syntax problem any worse.

(In reply to comment #11)

> But we already have functions under "expr", which *do* do computation and are found in practice to be very useful. Surely len(x) or x[3:5] is far cheaper, no more confusing to users and just as potentially useful as ((x+50)/0.456).

Yes, as I said in my previous message, introducing expr and so on was a mistake in my opinion.

> Have you seen the ugly and costly hacks that people are forced to use with padleft/right just to get the length of a string? And substring retrieval is impossible, as far as I know, which means some templates end up far more complex and harder to use than they need be.

"Forced" to use? Nobody is forcing you to do ugly things with wikitext that it was never intended to be used for.

Those padleft hacks are just as likely to stop working sooner or later, because they're horrible and I haven't seen a single good use case for them.

> I agree that the existing syntax is bad, but the functionality it produces is extremely useful, and the addition of a few more functions using reasonable syntax (I'm not saying we need to have *every* function that's been requested) is not going to make the overall syntax problem any worse.

So the existing syntax sucks, but we should encourage it by adding more functions that use it, instead of rethinking our syntax altogether. Your justification for this seems to be that it will take less time and you need your string functions *now*. I don't buy that.

catlow wrote:

Well, you said above it would be difficult to remove the existing parser functions - I assume that means it's not going to happen. So there's no point in waiting for it to happen (i.e. forever) before adding simple string functions, which users do want and would find useful (see past discussion). If people are finding other ways to do useful things with wikitext besides what it was originally conceived for, that's a GOOD thing. (Some devs seem to think they know what users want better than the users themselves do.)

matthew.britton wrote:

(In reply to comment #4)

> Other alternative: the Abuse Filter parser could be modified to be embeddable.

eurgh, I hope not. Give administrators the ability to block reading pages as well as editing them...

(In reply to comment #14)

> (In reply to comment #4)

> > Other alternative: the Abuse Filter parser could be modified to be embeddable.

> eurgh, I hope not. Give administrators the ability to block reading pages as well as editing them...

I don't think you get it. I mean adopting the parser, not the filter itself.

Am I the only one who looks at Lua and wonders how on Earth that is better?

I don't think embedding an entire programming language, complete with its own syntax conventions, function library, recursion, multi-threading, and everything else can possibly qualify as a usability enhancement. Yes, the code will almost certainly be arranged in a cleaner and more logical fashion, but the barrier to be able to edit that code will be just as high if not worse simply because you are forcing people to learn a whole new syntax and function set.

In addition, Lua is basically a pipe dream anyway, because it would be quite hard to make it secure enough to be usable. With a full programming language, it would be trivial to write code that consumes as much CPU and memory as you let it have and floods Apache with hundreds of megabytes of output. Even if one sandboxes it sufficiently to deal with these obvious cases, one would still have to spend a lot of time considering less obvious abusive code and ways it can interact inappropriately with the parser. And that's on top of the portability problems others have already mentioned. I certainly can't see Lua being viable any time soon.

Template syntax has a lot of problems, but I don't see how dropping an entire programming language into wikicode is the answer. Personally, I'd rather have string functions now and Lua never.

I'll reserve judgment on the hypothetical abuse filter approach until there is a more concrete proposal to discuss. At least with that there is a chance to integrate it in a safe and reasonable way, but I still worry that a whole new programming syntax would be a hindrance rather than a boon to usability. However, unless Andrew is really gung-ho to work on such a thing, it would also seem to be a long way off.

I think the entire discussion is horribly off target. Nobody wants to "install Lua" or "install StringFunctions". The three real use cases that made me subscribe to the original bug were:

  • Simplify templates like FormatDate, which transform "2009-06-22" into other formats, by removing the weird math behind them, so that they can again be understood (and edited! It's a wiki! Or at least it should be.) by "normal" users who want to solve problems, not puzzles.
  • Simplify templates like the geographical-coordinate manglers, which currently use large switches to group "US-WA" under "US". (I really doubt these switches perform better than adequate use of StringFunctions would.)
  • Allow testing whether a parameter ends with a "." so that the template can decide whether it has to append one itself.
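For comparison, each of these three use cases reduces to a line or two once basic string operations are available natively. The Python sketch below is illustrative only: the function names are hypothetical and do not correspond to any MediaWiki, ParserFunctions, or Scribunto API, and the default date format is an assumption (month names from `%B` depend on the process locale; the default C locale gives English names).

```python
import datetime


def format_date(iso_date, fmt="%d %B %Y"):
    """Use case 1: reformat an ISO date such as "2009-06-22" via strftime codes."""
    return datetime.date.fromisoformat(iso_date).strftime(fmt)


def country_of(region_code):
    """Use case 2: group a code such as "US-WA" under its country, "US"."""
    return region_code.split("-", 1)[0]


def ensure_trailing_period(text):
    """Use case 3: append a "." unless the text already ends with one."""
    text = text.rstrip()
    return text if text.endswith(".") else text + "."
```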

How these (and others) are solved, whether with a generic Lua extension, StringFunctions that only sysops can use, or single-purpose extensions, I do not care. But I do not think that a potential threat of abuse should stand in the way of good use.

ayg wrote:

(In reply to comment #16)

> Am I the only one who looks at Lua and wonders how on Earth that is better?

> I don't think embedding an entire programming language, complete with its own syntax conventions, function library, recursion, multi-threading, and everything else can possibly qualify as a usability enhancement. Yes, the code will almost certainly be arranged in a cleaner and more logical fashion, but the barrier to be able to edit that code will be just as high if not worse simply because you are forcing people to learn a whole new syntax and function set.

I'm not sure whose sanity is supposed to be preserved by Lua, either. Clearly, it's not the average user, who can't program and won't be able to work in Lua any more than in parser functions. But if we're trying to preserve the sanity of template hackers, then it seems a little unreasonable to do so even over their own objections. They'd all love StringFunctions, apparently, sanity-threatening or not.

I feel like the issue here is that some people, as programmers, don't want to have anything to do with encouraging a hacky, awful macro language that's painful for *them* to even think about. I really can't find anyone's sanity being preserved other than the MediaWiki developers' here. I haven't seen usability objections to StringFunctions from *anyone* but MW developers, none of whom are actually involved in template editing on the wikis in question.

But that's just my opinion. :)

We might look into this at some point, but it's definitely not a near-term thing. Marking LATER since we might actually poke at it one day. :)

I have to agree with the editors here. We are using the tools we have. Yes, people will write all sorts of stuff in template code, just as I wrote a set of arithmetic functions in regex, not because we want to, but because we are trying to achieve a goal and that's the only way to do it in a reasonable time. However, and despite [[WP:PERF]], these hacks and kludges must be generating a substantial server load. And the wiki way means it will get worse: once someone has hacked together a replacement for a given string-handling function, it becomes part of the repertoire. I started using some of these yesterday, and only curiosity made me dig into the various template levels to see what was going on; it is not a pretty sight. I can see no way that native parser functions could be worse than template-hacked parser functions, which often have to iterate character by character at best.

Was this intended to be filed under the fundraising-misc component?

EN.WP.ST47 wrote:

Oops, I missed the "extension setup" button!

Changing this from "RESOLVED LATER" to "RESOLVED FIXED", since Scribunto exists and has been deployed.