
VisualEditor: Set up an IRC bot in #mediawiki-visualeditor to report visualeditor-needcheck edits on WMF wikis
Closed, Resolved (Public)

Description

This will alert us to things going wrong quickly. Hopefully. (Yes, there's a risk of noise, but there should be 0 of these, anyway.)


Version: unspecified
Severity: enhancement

Details

Reference
bz62860

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:05 AM
bzimport set Reference to bz62860.

The CVNBot software used for #cvn-sw, #cvn-commons, #cvn-meta etc. can't provide this because of limitations in the source feed, irc.wikimedia.org, which doesn't expose revision change tags.

Looks like, realistically, the fast way to make this happen is probably a little Node.js project run from Tool Labs that polls wikis using one of these approaches:

1a) Open a socket to irc.wikimedia.org, join all channels, filter for lines that look like edits/page creations (any line that includes a URL with rcid), extract the rcid from the URL, then make an API request to retrieve the change tags.

Pros:

  • Only one socket for events.
  • No API polling.

Cons:

  • It's friggin IRC
  • Still requires API requests, lots of them (one for every edit/new page across all of Wikimedia). They could be sent in batches by adding a short local delay/buffer before output, but that's still a ton of requests.
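
The filtering step in option 1a could look roughly like the sketch below: strip the IRC colour codes, find a URL, and pull out its rcid parameter. The function name extract_rcid and the shape of the sample line are assumptions for illustration, not taken from the actual feed or from any existing bot.

```python
import re
from urllib.parse import urlsplit, parse_qs

# mIRC formatting: \x03 plus optional colour numbers, and bold/reset/etc. control bytes.
_IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")
_URL = re.compile(r"https?://\S+")


def extract_rcid(line):
    """Return the rcid from the first URL in the line that carries one, else None."""
    clean = _IRC_FORMATTING.sub("", line)
    for url in _URL.findall(clean):
        values = parse_qs(urlsplit(url).query).get("rcid")
        if values and values[0].isdigit():
            return int(values[0])
    return None
```

Lines without an rcid (log entries, for example) return None and can be dropped before any API request is made.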

1b) Alternative: Like #1a, but do the change tag retrieval via an SQL query to labsdb instead of an API request.

Pros:

  • Only one socket for events.
  • No API polling.
  • No API requests at all.

Cons:

  • dbreplag might cause problems.

2) Have the app generate a list of API entry points for all Wikimedia wikis (using either operations/mediawiki-config data or the centralauth/sitematrix API), and poll each of them periodically with action=query&list=recentchanges, taking care that we don't miss edits (a lower limit is faster/cheaper, but if more than N edits happen between two queries, the extra ones are missed).

Pros:

  • Edit information included in main event stream (ApiRecentChanges).

Cons:

  • API polling.
  • One API request for each wiki, at an interval.
  • Not missing events is going to be hard.
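
A rough sketch of what one poll in option 2 might send, assuming the standard list=recentchanges parameters (rcprop, rctag, rcdir, rcstart, rclimit). The helper names build_rc_url and needs_followup are hypothetical, and the check for a "continue" key assumes the API's current continuation format.

```python
from urllib.parse import urlencode


def build_rc_url(api_url, since, limit=500, tag="visualeditor-needcheck"):
    """Build one recentchanges request for changes newer than `since` (ISO 8601)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "ids|title|user|timestamp|tags",
        "rctag": tag,          # let the server filter by change tag
        "rcdir": "newer",      # oldest first, starting at rcstart
        "rcstart": since,
        "rclimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def needs_followup(response):
    """True if the decoded JSON response says more results are waiting."""
    return "continue" in response
```

If the server-side tag filter is used, the limit should rarely be hit, since there should be close to zero such edits; needs_followup is there for the case where it is.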

3) Have the app fetch a list of wikis from labsdb.meta.wiki, open one connection per db shard, and start polling recentchanges for each wiki (using a WHERE clause to find everything since the last poll, potentially still with a LIMIT to keep things rate limited).

Pros:

  • Edit information included in main event stream (recentchanges table).
  • Only a few sockets needed (7 or 8) to be able to query all 100s of wikis.
  • No API polling.
  • No API requests at all.

Cons:

  • Slight delay due to dbreplag to labs, but we don't use anything else, so it's consistent and the app should be blind to it.
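
The "everything since the last poll" query in option 3 might be sketched as follows. This is not the script that was eventually deployed. It assumes the replicas expose recentchanges.rc_id and a change_tag table with ct_rc_id and ct_tag columns, and it uses an in-memory sqlite3 database as a stand-in so the example runs anywhere; a real MySQL client would use a different placeholder style. The names poll_tagged_changes and make_demo_db are hypothetical.

```python
import sqlite3

POLL_SQL = """
SELECT rc.rc_id, rc.rc_title
FROM recentchanges AS rc
JOIN change_tag AS ct ON ct.ct_rc_id = rc.rc_id
WHERE ct.ct_tag = ? AND rc.rc_id > ?
ORDER BY rc.rc_id
LIMIT ?
"""


def poll_tagged_changes(conn, last_rc_id, tag="visualeditor-needcheck", limit=50):
    """Return (rows, new_last_rc_id) for tagged changes newer than last_rc_id."""
    rows = conn.execute(POLL_SQL, (tag, last_rc_id, limit)).fetchall()
    if rows:
        last_rc_id = rows[-1][0]  # remember the highest rc_id already reported
    return rows, last_rc_id


def make_demo_db():
    """Tiny stand-in for one wiki's replica database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY, rc_title TEXT);
        CREATE TABLE change_tag (ct_rc_id INTEGER, ct_tag TEXT);
        INSERT INTO recentchanges VALUES (1, 'Page_A'), (2, 'Page_B'), (3, 'Page_C');
        INSERT INTO change_tag VALUES (2, 'visualeditor-needcheck'), (3, 'visualeditor');
    """)
    return conn
```

Keeping the last seen rc_id per wiki means a second poll returns nothing until a new tagged change replicates, which is why replication lag only delays reports rather than losing them.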

As I said in IRC, option 3 seems sanest to me, but that's not really my call as I'm not the one that'll write it. :-)

Should vepawns edits be tracked as well?

Went with approach #3. It's a Python script run under the 'wm-ve-needcheck-reporter' service user in Tool Labs. Cron is set up to run it every day at 16:30 (UTC).

(In reply to Elitre from comment #3)

Should vepawns edits be tracked as well?

Perhaps we should have a separate bug for this - it sounds like a good idea to mark edits which add new pawn characters to the page as needcheck.

@Alex, that filter doesn't seem to be catching anything. I have no idea if it's still functional or not; that's why I asked.

(In reply to Elitre from comment #7)

The tag vepawns, i.e.:
https://en.wikipedia.org/w/index.php?title=Special:RecentChanges&tagfilter=vepawns

That's not done by VisualEditor; this must be a local AbuseFilter written by the community…