Page MenuHomePhabricator

RCFeed should include change tags
Open, MediumPublic

Description

Author: matthew.britton

Description:
Currently, the irc.wikimedia.org and RCStream feeds do not contain any information about change tags.

Given that

  • change tags can be (and are) applied after the change is made
  • feed lines are already too long sometimes
  • preserving backwards compatibility with existing clients is important

the act of tagging would need to be sent to the feed as a separate line, in contrast to Recent Changes where tags are displayed alongside changes.

All feed clients that I am aware of apply line-based pattern matching and ignore unrecognized lines, so this should not break anything.

See related T20080, which requests that the AbuseFilter extension includes details of filter hits. Since change tags are a core feature, it would be cleaner to include this in core, and not have to re-implement it for any other extensions that start using change tags.


This is mostly a task for RCFeed as emitted from MediaWiki core. This is not a task for RCStream or EventBus, those interfaces will naturally contain this as part of the core RCFeed format.

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:57 PM
bzimport set Reference to bz22509.
bzimport added a subscriber: Unknown Object (MLST).

No point IMHO. I would direct efforts to bug 17450.

matthew.britton wrote:

(In reply to comment #1)

No point IMHO. I would direct efforts to bug 17450.

How would that resolve this? Implementing that feature request would change the UDP output format from one intended for IRC to one intended for XMPP, and send it to a Wikimedia XMPP server. You wouldn't get any additional information beyond what is already being sent to IRC.

matthew.britton wrote:

(In reply to comment #2)

(In reply to comment #1)

No point IMHO. I would direct efforts to bug 17450.

How would that resolve this? Implementing that feature request would change the
UDP output format from one intended for IRC to one intended for XMPP, and send
it to a Wikimedia XMPP server. You wouldn't get any additional information
beyond what is already being sent to IRC.

Added tags to XMLRC's query in r62453. That extension is still nowhere near finished, though. I've been waiting over a year for MediaWiki to have proper change tagging support for external tools and I really don't want to wait much longer.

I don't think the act of tagging an edit should be in RC. A log might make sense, but its questionable which user preformed the action of tagging an edit, since things like the abusefilter (who aren't users) often are the people tagging.

@Bawolff: I agree they don't need to be recentchanges events. However if we don't want to do that, we need to remove the ability to add tags at arbitrary points in the future.

Right now changes tags are added after the revision and rc entry are created. I don't think this was done to allow tags to associated with a revision afterwards. That could be an interesting feature in theory, but is a very different kind of system and we're not using it that way now.

Instead, this was done because amending the revision table is scary. It was implemented as a separate table, which meant it needed the revision ID as foreign key, which meant it had to created afterwards. However we can hide that implementation detail without making it a public interface by requiring it to be part of the Revision and/or RecentChange object and internally apply it after insertion in the database. Thus removing the ability in the software to insert them manually.

This would allow us to cary on that information to RCFeed implementations (along with other generated or provided context data not part of the recentchanges row, such as server name and user details etc.).

Our options:

  • Broadcast change tag updates as recentchanges events. This means an edit with tags emits two events instead of one.
  • Forbid creation of change tags afterwards and require them to be part of the Revision and RecentChange objects (but kept in a separate table internally).

Are there other ways to provide the information to RCFeed consumers?

Krinkle renamed this task from Change tags in RC feed to Change tags should be included in the RC feed.Jan 5 2015, 4:06 PM
Krinkle raised the priority of this task from Low to Medium.
Krinkle updated the task description. (Show Details)
Krinkle set Security to None.
Krinkle edited subscribers, added: ori; removed: Unknown Object (MLST).

Change 195872 had a related patch set uploaded (by TTO):
Output change tags in the machine-readable RCFeeds

https://gerrit.wikimedia.org/r/195872

Per discussion on https://gerrit.wikimedia.org/r/195872 I propose we solve this as follows. I'm seeking feedback from others (especially @Anomie) for how feasible this is and for possible issues with this approach.

  • Allow tags to be passed into the creation step of Revision and/or RecentChange objects. Perhaps similar to what we do with $patrolled in RecentChange::notifyEdit and notifyNew.
  • When tags are created at any other point in time, they always emit an RCFeed event. This includes all tags associated through the current way by adding right after creating the RC object in the same PHP process. Similar to tags added/removes later on through the API or SpecialTags (but without the RC and Log entries presumably).

Thus resulting in the end use case that RCFeed subscribers have a complete, accurate, and up to date reflection of which tags are associated with a revision – without querying the API.

I'm seeking feedback from others (especially @Anomie) for how feasible this is and for possible issues with this approach.

Seems generally ok to me, I can't think of any insurmountable issues.

  • Allow tags to be passed into the creation step of Revision and/or RecentChange objects. Perhaps similar to what we do with $patrolled in RecentChange::notifyEdit and notifyNew.

A potential difficulty of adding it to a Revision constructor is that we need to know the rc_id as well as the rev_id to properly add it to the change_tag and tag_summary tables. And we'd need to do the same for log entries, as those can be tagged too. But we could just have RecentChange update those tables once the rc_id is assigned, if we go that route.

Either one would be liable to require extra parameters to methods such as WikiPage::doEditContent() to pass the tags-to-be-added down. Not a blocker, certainly, but could get annoying when adding user-tagging to various other logged actions (e.g. page moves, deletions, undeletions, blocks, unblocks, and so on). If we do do that, BTW, let's also look into passing the 'bot' flag down into RecentChange::newLogEntry() instead of it looking at $wgRequest.

  • When tags are created at any other point in time, they always emit an RCFeed event. This includes all tags associated through the current way by adding right after creating the RC object in the same PHP process. Similar to tags added/removes later on through the API or SpecialTags (but without the RC and Log entries presumably).

We might wind up in a situation where a single action throws off several RCFeed events (although not all of these on a single request):

  • one for the action itself
  • one from user-specified tagging when performing the action, if it's not being done as in your first bullet for this action.
  • one from OAuth's 'RecentChange_save' hook, if the action was from an OAuth client
  • one from AbuseFilter's 'RecentChange_save' hook adding tags for hit filters
  • one from TorBlock's 'RecentChange_save' hook adding the 'tor' tag
  • one from MobileApp's 'RecentChange_save' hook, if the agent starts with "WikipediaApp/"
  • two possible from VE, one if the request includes 'veswitched' and one if it's API action=visualeditoredit
  • two possible from MobileFrontend's 'RecentChange_save' hook, although those could easily be combined into one event that adds both possible tags.
  • one from WikiLove, if it's API action=wikilove
  • one from ContentTranslation, if it's action=cxpublish
  • if Gerrit change 194458 gets merged, one or two from that if it doesn't get updated as in your first bullet
  • stuff like the 'hhvm' tag we used when testing that out
  • and I didn't even check non-WMF-deployed extensions for calls to ::addTags()

Most of these could use a "RecentChange_create" or "RecentChange_beforeSave" hook to somehow allow for injecting additional tags before the RC gets saved and published to the feed to avoid this. VE's, ContentTranslation's, and WikiLove's API actions would be more difficult, though, since they internally call API action=edit but probably don't want their tags to be considered user-supplied as would necessarily be the case if they just pass them in.

Or I suppose you might decide that the multiple RCFeed events are ok.

I'm not a fan of the multiple RCFeed events, but it's OK. It's semantically correct and unavoidable (because of the second bullet; users can add/remove them) so clients need to support it either way.

We should avoid adding tags after the fact as much as possible and move this into relevant constructors, factory methods and hooks as appropriate.

We could also, as an additional migration smoothening, somehow delay the RCFeed event until the WebRequest shutdown and buffer all tag additions until then. That would magically migrate 99% of cases without needing any code changes.

Not some super prefect solution, just something that could show abusefilter edit tags would suffice for now IMO.

Krinkle renamed this task from Change tags should be included in the RC feed to RCFeed should include change tags.Dec 16 2015, 11:28 PM

Ideally both tag itself and its appearance, maybe even its description?

Perhaps something like (tags taken from mediawiki.org):

"tags": [
	{
		"tag": "visualeditor",
		"appearance": "[[{{MediaWiki:visualeditor-descriptionpagelink}}|Visual edit]]", /* see comment below */
		"description": "Edit made using the [[{{MediaWiki:visualeditor-descriptionpagelink}}|visual editor]]" /* see comment below */
	}, {
		"tag": "HHVM",
		"appearance": "[[mw:Special:MyLanguage/HHVM/About|HHVM]]",
		"description": "Revisions made with the HipHop Virtual Machine enabled instead of the Zend PHP interpreter (expected to improve performance, tagged for debugging/analysis)"
	}
]

Comment: Decide, whether use the source wikitext of such messages or expanded templates & parsertags.

Tag content (such as appearance and description; as opposed to the tag ID) changes with time and has no place in the RCFeed. That information should be retrieved from the API and may be cached accordingly by a consumer.

Would a standalone stream of revision-tag-change events separate from recentchanges / revision-create help here? Something like:

{
   ...
   rev_id: 12345

   prior_state: {
       tags: [...]
   },
   tags: [ ... ]
}

To indicate that tags have been changed for a revision

Within MediaWiki itself, some changes are required to make this possible. Right now the way MediaWiki works is that first it creates the revision, then it creates the recent change event, and *then* it adds any change tags. This means that when the RecentChange object is sent to EventStreams, the change tags are not known yet. But a few milliseconds later, by the time any human is realistically looking at the diff or the history page, the tags are indeed displayed.

To make this possible, MediaWiki needs to be altered so that it will fetch the added tags after RecentChange event is created but before it is sent to EventStreams. In addition, all extensions that provide tags (such as AbuseFilter) will need to be altered so that they make use of this new program interface of providing tags together with the edit. Because right now the only way to add change tags for extensions is to wait until after the Revision and RecentChange logic is done. Once we add a way to do this earlier, extensions can be updated to make use of it, then, once all is done, we will have change tags in RecentChange event streams.

There is a work-in-progress patch from a while back:

Change 195872 had a related patch set uploaded (by TTO):
Output change tags in the machine-readable RCFeeds

https://gerrit.wikimedia.org/r/195872

@Anomie I'd like to get a plan of action around this:

User stories:

  • Consuming from RCFeed provides me with all tags that apply to a recent change.
    • There are events for the edit or page create, which MAY contain an initial set of tags.
    • There are events for when tags are added/removed after the the RC object is published. This is mainly intended for when another user explicitly adds a tag after the fact. Although we can also use this event type to publish tag changes that are added by extensions too late in the request when the RC object is already published to the feeds.
    • The same tag addition from the same user action MUST NOT be published both ways.

Implementation proposal:

  1. Add method RecentChange::addTags(). This allows adding of tags without being concerned about how or when database insertion happens without needing to know about the rc_id dependency.
  2. Add $tags parameter to static RecentChange creation methods (e.g. notifyEdit and notifyNew). Further simplifies interaction.
  3. Ensure external modifications of to change tags (e.g. via ChangeTags::addTags) get published to the RCFeed.
  4. Reduce noise and consumer complexity by making sure use of addTags and $tags parameters before RC save follows a path of publishing tags via RCFeed together with the original event (via a new tags property for RC events), not in a separate event afterwards.
  5. Migrate popular extensions to use RecentChange::addTags() or $tags parameters where possible.

Implementation proposal:

[...]

  1. Ensure external modifications of to change tags (e.g. via ChangeTags::addTags) get published to the RCFeed.

My patch from more than 8 years ago (https://gerrit.wikimedia.org/r/195872) implements this, but clearly needs a thorough rebase and refactor. Although I'm formally the patch "owner" in Gerrit, I'm no longer in a position to pursue it. I'd be grateful if somebody else would take it over.