Switch AbuseFilter to using Lua
Closed, Declined
Public

Assigned To
None
Authored By
Reedy
Apr 22 2013, 7:30 PM
Referenced Files
None
Tokens
"Love" token, awarded by Danny_B. "Love" token, awarded by Ricordisamoa. "Like" token, awarded by Dalba. "Like" token, awarded by werdna.

Description

As a maintainer of Lua modules and abuse filters, I want to be able to use the same language for both tasks, so that the learning curve is not so steep.

Rather than using AbuseFilter's own language, with its own bugs and limitations, could/should we switch it to Lua? Simply dropping support for the current language would be bad, but could we prefer something more standard?

See Also:

Details

Reference
bz47512

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 1:27 AM
bzimport set Reference to bz47512.
bzimport added a subscriber: Unknown Object (MLST).
Bug 49248 has been marked as a duplicate of this bug.

It is a very good idea. Currently the conditions of all filters are tested against every edit; with Lua we could group the conditions and skip, for example, the filters aimed at anonymous users when the edit comes from a registered user.
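The grouping idea above can be sketched roughly as follows. This is only an illustration in Python, not AbuseFilter or Lua code; the `Filter` class, the group labels ("anonymous", "registered", "all") and the `user_is_anonymous` key are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Edit = Dict[str, Any]  # hypothetical: an edit described as a plain dict


@dataclass
class Filter:
    name: str
    group: str  # hypothetical labels: "anonymous", "registered" or "all"
    condition: Callable[[Edit], bool]


def group_filters(filters: List[Filter]) -> Dict[str, List[Filter]]:
    """Bucket the filters once, by the kind of user they apply to."""
    groups: Dict[str, List[Filter]] = {}
    for f in filters:
        groups.setdefault(f.group, []).append(f)
    return groups


def matching_filters(groups: Dict[str, List[Filter]], edit: Edit) -> List[str]:
    """Evaluate only the buckets relevant to this edit; skip the rest."""
    user_group = "anonymous" if edit["user_is_anonymous"] else "registered"
    matched = []
    for key in ("all", user_group):
        for f in groups.get(key, []):
            if f.condition(edit):
                matched.append(f.name)
    return matched
```

With this layout, the conditions of filters in the "anonymous" bucket are never evaluated for an edit made by a registered user, which is the saving described above.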

Not sure whether it's good to add bug 50454 to "see also" and/or reopen it, but this Lua (for AbuseFilter) needs to include a regex library (see ngx.re in the nginx version of Lua), or it will be impossible to migrate current filters.

He7d3r set Security to None.

I think it's worth considering letting the next generation of AbuseFilter work on the HTML DOM rather than on wikitext.

Advantages:

  • Matching semantic HTML can be more precise and simpler than matching wikitext.
  • We can speed up HTML saves by returning success after checking the HTML and saving it to the DB only, without waiting for the re-parse from wikitext to HTML.
  • We can continue to use AbuseFilter with HTML-only wikis and other HTML-primary features.

Issues:

  • On wikitext edit, we'd need to parse the new revision from the edited wikitext before feeding it to AbuseFilter and then saving it. While the total time this takes should be unaffected by the reordering, there are potential issues with {{REVISION*}} magic words. Those would need to be updated in the HTML after the parse, which is possible but not implemented so far. This might already be partly solved by T66767.
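To illustrate the first advantage listed above (matching semantic HTML instead of wikitext), here is a minimal Python sketch that collects external links from rendered HTML. It assumes that external links are marked with `rel="mw:ExtLink"`, as in Parsoid's output; the class and function names are hypothetical, and this is not how AbuseFilter works today.

```python
from html.parser import HTMLParser
from typing import List


class ExternalLinkCollector(HTMLParser):
    """Collect href values of <a> elements marked as external links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Assumption: external links carry rel="mw:ExtLink" (Parsoid-style).
        rel_tokens = (attr.get("rel") or "").split()
        if "mw:ExtLink" in rel_tokens and attr.get("href"):
            self.links.append(attr["href"])


def external_links(html: str) -> List[str]:
    """Return the targets of all external links found in the HTML."""
    collector = ExternalLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

A wikitext-based filter would have to recognise bracketed links, bare URLs and links produced by templates separately; on the HTML side they all show up as the same element, so a single check covers them.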

@Ricordisamoa added a project: Scribunto.

I'm not sure what this has to do with Scribunto, besides that AbuseFilter would likely want to reuse Scribunto's classes for interfacing with Lua and that might want some refactoring.

Not sure whether it's good to add bug 50454 to "see also" and/or reopen it, but this Lua (for AbuseFilter) needs to include a regex library (see ngx.re in the nginx version of Lua), or it will be impossible to migrate current filters.

Scribunto isn't going to get a regex library, for the reasons described in T52454#555457.

But that doesn't mean AbuseFilter can't create its own instance of Scribunto_LuaEngine or Scribunto_LuaInterpreter and inject extra libraries, including one that provides regular expression matching, if it's deemed safe for that context.
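As a rough analogy for the injection idea above, the following Python sketch shows a host application creating its own engine instance and registering an extra regex library that a default instance does not have. It does not reproduce Scribunto's actual classes or API; `FilterEngine`, `register_library` and `abuse_filter_engine` are hypothetical names, and real sandboxing concerns (such as limits on regex running time) are left out.

```python
import re
from types import MappingProxyType
from typing import Any, Callable, Dict, List, Mapping


class FilterEngine:
    """Toy stand-in for an interpreter instance owned by the host."""

    def __init__(self) -> None:
        self._libraries: Dict[str, Mapping[str, Callable[..., Any]]] = {}

    def register_library(self, name: str, functions: Mapping[str, Callable[..., Any]]) -> None:
        # Store a read-only copy so filter code cannot replace the functions.
        self._libraries[name] = MappingProxyType(dict(functions))

    def library_names(self) -> List[str]:
        return sorted(self._libraries)

    def run(self, filter_func: Callable[..., Any], variables: Mapping[str, Any]) -> bool:
        # The filter sees only the injected libraries and its variables.
        libs = MappingProxyType(dict(self._libraries))
        return bool(filter_func(libs, MappingProxyType(dict(variables))))


def regex_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


def abuse_filter_engine() -> FilterEngine:
    """Build an engine with the extra library deemed safe for this context."""
    engine = FilterEngine()
    engine.register_library("regex", {"match": regex_match})
    return engine
```

The point of the design is that the decision to expose regular expressions is made per engine instance by the host, so other users of the same interpreter classes are unaffected.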

@Anomie wrote:

I'm not sure what this has to do with Scribunto, besides that AbuseFilter would likely want to reuse Scribunto's classes for interfacing with Lua and that might want some refactoring.

Not only that: I would also suggest adding Wikidata here, since Lua-related work is becoming more and more closely tied to it.

Daimona subscribed.

My assessment:

  1. We only need a fairly simple syntax for filters; there's no need to allow complicated stuff.
  2. Our dedicated parser is probably faster than Lua's (precisely because it's dedicated), and this will be especially true once CachingParser is implemented.
  3. Higher expressive power means potential performance trouble from badly written filters.
  4. Adapting Lua to our needs (removing potentially dangerous stuff, injecting our variables, a regex library, etc.) could nullify any performance gain.
  5. AbuseFilter's syntax is actually simpler than Lua's, and switching would only make writing filters harder for non-tech people (they'd have to learn a full programming language instead of a basic one).

Personally, I have never needed to do anything in AbuseFilter that would require Lua, nor do I see possible use cases. Also, bugs in the AF parser should be fixed in the AF parser, so they are not a good reason for the switch.
If someone can find a use case for Lua syntax, a simple way to adapt Lua to our needs without performance loss, and it turns out that Lua is easier to learn for non-tech people, then feel free to reopen the task.