Page MenuHomePhabricator

[Regression] IRC string output for log messages no longer compatible
Closed, ResolvedPublic

Description

Author: seahorseruler

Description:
A change in 1.19 to the irc.wikimedia.org recent changes feed has broken multiple bots and tools that rely on it. It appears that the change was made to log reports, but I am not sure exactly what was changed in the format. The Countervandalism network bots are failing to report log actions, and StewardBot is not properly reporting user right changes. I have heard of other tools being affected by this too. If this can be looked into, that would be great. Thanks.


Version: 1.19
Severity: critical

Details

Reference
bz34508

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 12:11 AM
bzimport set Reference to bz34508.

If you could mention what has actually changed (before/after examples would also be helpful) would be great.

I'm going to guess this is due to the rewrite of how we manage our logging system on the MW end as well.

Can someone point to the web pages/maintainer information for at least one or two of the most important bots in question?

Could be related to the special page name ("Log") being translated. Although that was supposed to be reverted and afaik was.

There is an API to translate special page names but right now bot maintainers don't know about this and it has worked for years.

The the CVN[1] this is breaking all after-log-action hooks they have (including those for reporting of local blocks and preventing cross-wiki vandalism). As well as our hooks for the patrol-action, delete-
action, upload-action (commons upload patrol!) on others.

  • Krinkle

[1] https://meta.wikimedia.org/wiki/CVN

Krinkle, is r97711 the one you were thinking of?

The only difference I see are in the actual log message text, which comes from MediaWiki i18n files: the username is now prepended in the message, and the exact wording of some logs have changed.

If that is the case it's not a simple one line fix, either you revert the whole new logging system (which you can't, since it stores logs in format not compatible with 1.18), or write new code that only emulates the old log format output. Would this be a sign that you could help us to make proper replacement like XMPP?

Anyway, please confirm what is the actual cause, Krinkle already asked before this bug was reported.

(In reply to comment #5)

Could be related to the special page name ("Log") being translated. Although
that was supposed to be reverted and afaik was.

That was bug 30245 and it's reverted in 1.19, so the problem is elsewhere.
I've seen mention of section names and other things which might have changed, can't we possibly have a description of what is actually broken?


(I've seen Philippe, robla, Jamesofur, Pathoschild, Bsadowski1 and Tanvir discussing it on IRC, I don't know where they put the result of their investigation.)

This has been told to be the old format:
<Bsadowski1> <rc-pmtpa> [[Special:Log/block]] block * MikeLynch * blocked User:116.240.175.168 (anon. only, account creation blocked) with an expiry time of 24 hours: [[WP:Vandalism|Vandalism]]

And this appear to be the new one:
<Nikerabbit> [[Special:Log/block]] block * Nemo bis * Nemo bis blocked [[User:QoyuxeLetoho]] with an expiry time of indefinite (account creation disabled, e-mail blocked, cannot edit own talk page): Spamming links to external sites

(In reply to comment #8)

Would this be a sign that you could help us to make proper replacement
like XMPP?

And this is [[m:Recentchanges via XMPP]] aka bug 17450.

(In reply to comment #8)

The only difference I see are in the actual log message text, which comes from
MediaWiki i18n files: the username is now prepended in the message, and the
exact wording of some logs have changed.

If that is the case it's not a simple one line fix, either you revert the whole
new logging system (which you can't, since it stores logs in format not
compatible with 1.18), or write new code that only emulates the old log format
output.

Can't we just revert the messages?

(In reply to comment #11)

Can't we just revert the messages?

Niklas said «or write new code that only emulates the old log format output» because apparently the two things (RC output and logging system) are currently bundled.

They're not that closely bundled. The $actionComment parameter to RecentChange::notifyLog() only exists to supply a comment for IRC, this was always the case. So all you have to do to fix it is to set the right thing for $actionComment, or to update the interpretation of it in RecentChange::getIRCLine() to account for the new system.

All of the logentry-* messages in English happen to start with "$1 ". The duplicated username in the IRC log line could be fixed by taking the log message and substituting the $1 with getIRCLine()'s formatted username, something like "\00303$user\003 \0035*\003 \00310".

Any minor changes in the wording of the log entries can be reverted. As I said on bug 30245, I'm not opposed to localisation changes in the IRC line format, as long as sufficient notice is given to bot authors. That notice hasn't been given.

I think everyone would be happy with the revert by anyone who knows how to do it (also in trunk), if it kept the new log structure on wiki.

Krinkle has started looking into this, and found that log actions weren't making it from irc://irc.wikimedia.org/#meta.wikimedia to irc://irc.freenode.net/#cvn-meta. At least move-log has changed in 1.19, and probably other actions as well.

This is being discussed on #wikimedia-dev as of this writing.

(In reply to comment #7)

Krinkle, is r97711 the one you were thinking of?

Indeed, but it appears that's not the problem (which would've been weird, as that rev has been reverted).

So after a little investigation just now with the CVNBots and robla and Nikerabbit on IRC, it appears we have good news, bad news and more good news.

Good news:
The format for IRC/UDP message notifications has not been changed. As such the bots are still reporting edits, just not log messages. The log messages are quite important though (for CVN especially: block, delete, upload, move and patrol).

Most bots do (and all bots should) support changes to these i18n messages. And they do. They support this by regularly emptying their cached version of the downloaded MediaWiki interface message used for the various log actions. They extract information from log actions (such as move-origin and move-target in a 'move' log action) like so (sample taken from the C# source of SWMTBot[1], [2], [3])

  1. Project: generateRegex("MediaWiki:1movedto2_redir", 2, ref moveredirRegex, false);
  2. RCReader: Match mrm = ((Project)Program.prjlist[rce.project]).rmoveredirRegex.Match(rce.comment); rce.movedTo = Project.translateNamespace(rce.project, mrm.Groups["item2"].Captures[0].Value);

Bad news:
The messages didn't just change (since that would not break the bots as wikis sometimes change the message locally as well and the bots support i18n updates), but the message-keys themselves and the order and meaning of the $1, $2 parameters changed. And according to Nike this is pretty much unrevertable as it is a fairly crucial part of how the new logging system works.

More good news:
The IRC line is easily overwritten in the logging system. The plan is to build a simple extension that will hook into the code (this hook doesn't exist yet but is easily added) and overwrite the irc-line with the old method (using the old message key and value and the old variable replacements).

  • Bug 34495 has been marked as a duplicate of this bug. ***

Created attachment 10055
bug34508.1.patch

Nikerabbit posted this suggested framework here:
http://translatewiki.net/static/llogs.txt

I've extended this with the attached patch, to see if it is a workable approach. I've handled exactly one case first: move_redir, and I think I've got something that's identical to 1.18 in this case.

Attached:

Created attachment 10056
bug34508.2.patch

bug34508.2.patch represents another attempt at fixing this problem, after talking with Tim on IRC. Tim isn't a big fan of only going halfway with splitting out the IRC code into an extension; either move it all out, or figure out how to fix the core code.

So, I took a stab at fixing the core code. This version is a bit naive about where else LogFormatter::getPlainActionText() is used. So, this had the unintended side effect of also changing the format that shows up in page history (though not recent changes).

If one wanted to keep going on this approach, it wouldn't be hard to create LogFormatter::getIRCActionText() and adapt this patch accordingly, having LogEvent::publish() call that instead.

Someone is going to have to take one of these approaches the rest of the way to fixing all of the log messages. I'm indifferent about which one, but we need to get this done prior to pushing to commons.

Attached:

Created attachment 10057
bug34508.3.patch

This is another stab at the problem, this time with the getIRCActionText() approach described in a previous comment. I started the process of actually stepping through the messages, but it turned out to be a little harder than I thought. delete/delete as found in 1.18 needed to be constructed a little bit differently than the 1.19 version. I got stuck on delete/restore, which looks like it's coming in from a different call path than what I'm expecting. The other messages may be a bit easier.

Attached:

Just for the record: Path 2 and 3 as it currently is would still break most bots since the parameter numbers are still incompatible with the previous messages. Patch 1 does fix the bots (not tested though, but pretty sure it is).

The following is a list of things that MediaWiki does not provide to bots and as such is hardcoded:

  • Log action name (e.g "move")
  • Log message edit-summary message-key (e.g. "1movedto2_redir")
  • Which variable replacements represent what (e.g. movedTo = $2)

Anything else is variable and the bots support changes to that. So contrary to what I think Tim is thinking, it's totally fine if the log message edit summary reads "TimStarling renamed awesome article [[Foo]] to [[Bar]]" instead of "moved [[Foo]] to [Bar]]".... as long as the i18n message that produces that is still stored at the same message-key and replaces the variables with the values.

(In reply to comment #21)

Just for the record: Path 2 and 3 as it currently is would still break most
bots since the parameter numbers are still incompatible with the previous
messages. Patch 1 does fix the bots (not tested though, but pretty sure it is).

I see what you're saying about the old messages. What I was thinking about doing was some semi-automated approach to taking the old messages and turning them into new messages.

Anything else is variable and the bots support changes to that. So contrary to
what I think Tim is thinking, it's totally fine if the log message edit summary
reads "TimStarling renamed awesome article [[Foo]] to [[Bar]]" instead of
"moved [[Foo]] to [Bar]]".... as long as the i18n message that produces that is
still stored at the same message-key and replaces the variables with the
values.

Well, if we're going to maintain backwards compatibility, let's do it up, especially since we're also talking about using old messages.

I think a hybrid approach of patch #1 and patch #3 might be the way to go here. I was initially enthusiastic about a new hook, but I've come to realize we don't want to create a new hook in haste that we have to support in perpetuity. It's better to butcher the core a little bit in 1.19, and then potentially create a proper "external notification" interface down the road that would be implemented by a LegacyLogs extension.

I'm inclined to agree.

Patch #3 is a pretty direct "back compat hack" which is at least plainish to read.

Would love to have a cleaner extension interface for 1.20, where we can implement legacy IRC logging *and* saner machine interfaces like XMPP with structured data and no tiny line length limit.

This bug should be fixed by the set of revs merged into 1.19 as r112073. There may be individual cases that may not work, still, but the framework is in place now to fix any problems fairly quickly.

Thanks guys for doing the dirty job!

Progress: r112045 fixes action upload.

Looks like 'patrol' is also having a big problem. May not be related to this bug, but the line in IRC is horribly broken:

rc-pmtpa: [[Special:Log/patrol]] patrol * Tiptoety * marked <!-- ERROR -->revision 3486813 of <a href="/wiki/Meta:Requests_for_adminship/mmovchin_(temp)" title="Meta:Requests for adminship/mmovchin (temp)">Meta:Requests for adminship/mmovchin (temp)</a> patrolled $3

  • what's <!-- ERROR -->
  • It's inserting HTML instead of a [[wikilink]]
  • $3 is undefined

Still need to resolve the lingering issues, but bumping down from highest.

I have spend sometime today figuring out issue mentioned by Krinkle in comment 28. And I could not reproduce the issue, all the code is fine.

The patrol issue was fixed by Aaron with r112079 and merged to live with r112130.

So it looks like we can finally mark this bug as fixed.

The test suite I wrote today was added with r112415.

The EtherPad state that delete/restore does not work, the action is bound to 'undeletedmessage'. By looking at the bot source code and our hack in 1.19wmf1, I am not sure what might prevent it from running.

Trying to get some bot user to get a status about delete/restore and the newusers/create2 new users/autocreate actions as well.

Hashar and I did a little more research, in no particular order:

  • Changed 'delete/restore' from 'fails' to 'pending test', CVNBot doesn't monitor the restore event (apparently they want/need it in the countervandalism feed)
  • Of the events on the list, CVNBot does not monitor delete/restore, newusers/create2 and newusers/autocreate.
  • There are no more events on the etherpad list marked as "FAIL", only "Works" and "pending test" (create2, autocreate, restore)
  • It turns out 'autocreate' event doesn't get pushed to IRC at all (not to irc.wikimedia.org either), that's a separate bug, not related to this one
  • Since our unit tests as passing (including for delete/restore) and we can't find bots monitoring delete/restore to test, marking this bug as fixed.