Page MenuHomePhabricator

Echo notifications may display oversighted content
Closed, ResolvedPublic

Description

As reported by MBisanz:

On my Special:Notifications, it includes article titles of articles that link to pages I created, the subject lines of discussions that mention me or that are made on my talk page, and the usernames of all people who trigger notices. This is true even if the article is deleted, the discussion is renamed, or the user is renamed. This would seem to be bad from an oversight perspective, because if a discussion subject or article title contained private information, like Jenny at 867-5309 or ==Matt Bisanz from 34th Street in New York is being mean to me== and was properly oversighted, it would still appear in any number of people's Special:Notifications page, defeating the purpose of oversight. This would also be true if the action was done by User:MBisanz is at 867-5309 and the username was suppressed or forcibly renamed. MBisanz talk 19:31, 3 May 2013 (UTC)


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:36 AM
bzimport set Reference to bz48059.
bzimport changed Security from none to Software security bug.
Restricted Application changed the visibility from "Public (No Login Required)" to "acl*security (Project)". · View Herald TranscriptNov 22 2014, 1:36 AM
Restricted Application changed the edit policy from "All Users" to "acl*security (Project)". · View Herald Transcript

Not sure if this should be tagged as Echo or Security, but for the time being I prioritized it as high, because of its serious impact on Oversight, according to Philippe.

The description is not quite accurate. Any content that is actually changed in the database at the time the user sees the notification will also be changed in the web notification - section titles, article titles, usernames, etc. The web notifications themselves just contain ID pointers to the relevant information. There may be some types of information, such as edit summaries, however, that are not actually changed or removed in the database, but just flagged. These we would need to add checks for.

For email notifications, I'm not sure anything can be done as you can't unsend an email.

I tried to look up the documentation for how revision data is actually marked for suppression. Unfortunately, the header at the top of the documentation says "This section is largely, if not wholly, inaccurate." (https://www.mediawiki.org/wiki/Bitfields_for_rev_deleted#Bitfield_values_for_rev_deleted) Who would be the best person to get a technical rundown from on interpreting the bitfield values?

Actually, it looks like we are storing the page titles themselves rather than the IDs, so we'll probably want to change that.

Nevermind about the bitfield values, looks like I can just use isDeleted() and the Revision deletion constants in the Revision class :)

lwelling wrote:

Yes I think that the bug is accurate for page titles.

It would be one of a number of valid reasons to store page.page_id instead of titles.

It looks like titles are handled in the same way in watchlist.

Is there hook functionality that cleans up renamed pages in watchlists that could be duplicated?

Or if there is not, and this has not in practice generated issues with watchlists we should bear that in mind when deciding how urgently to tackle this.

lwelling wrote:

Watchlist has some code to address page moves, but it does not look like it was written to handle oversighted data, just user convenience.

If you move a page and suppress creating a redirect, then people watching the problem page get a new watchlist entry created to watch the new title, but they still have a watchlist entry for the old title too.

I think we ended up storing page titles so we could avoid doing expensive joins against the page table, and also so the notifications could be more self-contained to facilitate storing them outside the database (in Reddis for example).

Since watchlists and notifications are both private data, is it even a requirement that oversighting apply to them? Frankly it seems a bit pointless to make sure the data isn't exposed in the web notifications when they're going to get the information by email anyway.

(In reply to comment #8)

I think we ended up storing page titles so we could avoid doing expensive
joins
against the page table, and also so the notifications could be more
self-contained to facilitate storing them outside the database (in Reddis for
example).

Since watchlists and notifications are both private data, is it even a
requirement that oversighting apply to them? Frankly it seems a bit pointless
to make sure the data isn't exposed in the web notifications when they're
going
to get the information by email anyway.

My understanding from talking to Philippe is that this is very much a requirement despite the fact that it goes by email (if an only if they wanted that specific message by email).

Personally I think it's a bit insane not to do it. I read the message on the feedback page and immediately thought of a bunch of ways I could abuse that (and not in 'ways that I could do because I know the system' but methods of vandalism already done that would be made worse). Honestly the biggest issue is user names (a popular way to cause vandalism is to create a user name such as 'Jamesofur is a childmolesting retard!' or something like that) though it also could come in play with page edits (especially edit summaries), discussion subjects and page titles to name a few. Page titles are the 'less' problematic of that list because you have a couple hoops to jump through before you can create pages but they are certainly not impossible and are certainly used for vandalism.

Given the relatively large opening for abuse here I think we would be greatly remiss in not blocking whatever we can. We can't stop something once it's gone out of our system (emails) but my uneducated guess is that the most of the notifications will not be seen by email but will be web. Those web notifications are still in our power and we can, and should, control so that if something had to be removed from view it is 'really' removed from view and not floating around.

We have to remember WHO is seeing the messages. Yes, the messages are private (and not public) but who is getting a message from a vandal user name for example? Probably the person being targeted, so no we've actually defeated much of the purpose of the oversight in the first place because the victim STILL has to see this attack despite it being deleted.

That's especially true since, as far as I can tell, that notification will be sitting around (with the attack) for at least a length of time in the notifications history page (I'm not sure back that goes? But I assume at least a week or two, or a certain # of notifications?) and as far as I can tell there is no way to remove those notifications from the history. Now they get to see the attack for even longer :). We probably need to look at that as well, if something is deleted or suppressed it probably should be deleted or suppressed in the notification history as well or at least that notification removed.

(In reply to comment #7)

Watchlist has some code to address page moves, but it does not look like it
was
written to handle oversighted data, just user convenience.

If you move a page and suppress creating a redirect, then people watching the
problem page get a new watchlist entry created to watch the new title, but
they
still have a watchlist entry for the old title too.

While this is true (except if you actually do the move where you have to check the 'watchlist source and target page' or you don't won't watch the target page and you remove the source page' ) it is less connected with what happens here. The watchlist respects all suppressing/oversight as far as I can tell (I was pretty sure already so just did a couple tests). If we moved a page because of a bad page title for example we would suppress the log entry so that others can't see the bad title. If you do that the pieces you suppressed are struck out on the watchlist as well ( example http://prntscr.com/13h90b ) up to and including a full suppression of the entry removing it from 'normal' view and view on the watchlist. I assume this is because the watchlist is pulling it from live code and so knows if something should be removed or not.

Just in case it isn't clear because of conflicting terminology we use 'suppress' to mean oversight (or deletion from admin view) but it is in no way the same as 'suppressing a redirect' in regards to a page move. In a page move any admin (or bot) can move a page without creating a redirect from the source page, that is always expected and supposed to be a 'public' action and therefore isn't trying to hide anything at all. It's just saying that there does not need to be a redirect on the source page (when the default is that you want to make old links keep working). Many users would indeed want to keep watching a deleted page (or a page where they suppressed a redirect being made) because they want to see if it was made again so that they could protect it.

Let's see what we can do with Revision::isDeleted() without changing our database schema. Hopefully this will cover most of the issue. We can revisit the page name issue again later (which I think will be a hairier problem to resolve).

I have reviewed Echo and started implementing one possible solution(described as EASY SOLUTION below). This is 75% of the way there but I am starting to have some concerns as I solved it. Due to these concerns I tried to draw up a document going over what is affected and possible solutions. I would appreciate feedback from other developers more familiar with oversight and mediawiki to comment on the concerns i address below:

Summary:

This document attempts to look at what parts of the current Echo notifications need to be adjusted to handle oversight. First we will
do a quick review of what all oversight can change. Then an overview of all the events in the Echo extension and what parts of the
oversight applys to them. Finally we will review a few possible solutions ranging from easy but low performance, to higher performance
but more complex. The reason for investigating a range of solutions is because oversighting is applied to only a small number of the
total revisions but must be checked for each and every item displayed. Various optimizations currently existing in Echo are invalidated
by this oversight.

Oversight needs to be able to redact:

Revision::DELETED_TEXT 
        revision content must not be shown when this is set. Echo does not display any revision content currently.
Revision::DELETED_COMMENT
        Revision summary must not be displayed when this is set.  Currently the NotificationFormatter hits the db with a primary key query
        to load revisions 1 at a time and uses $revision->getComment( Revision::FOR_THIS_USER, $user).  As such
        No summary is displayed when oversighted.  This is a possible performance issue due to an extra  query for every
        summary that needs to be displayed. This is a version of the performance problem commonly known as N+1
Revision::DELETED_USER
        The username (agent in many cases) needs to be redacted in a variety of cases throughout Echo.  
Revision::DELETED_RESTRICTED - ???
        Not sure what this is or how its used.  Users with the 'supressrevision' permission are excempt from this restriction.
Revision::SUPPRESSED_USER - ???
        Not sure what this is or how its used
Page rename without redirect: 
        If a page was renamed without a redirect to the new page then it was (most likely?) performed as a part of oversight.
        Anything in echo which refers to a page by its namespace+title will have to first make sure the page was not removed in this manner

QUICK AUDIT OF CURRENT EVENTS IN ECHO FOR OVERSIGHTED DATA:

welcome
        - nothing to suppress this is a static message sent only to the new user.
edit-user-talk
        - agent writing on the talk page may need to be suppressed, not currently done
        - revision summary may need to be suppressed, already handled within echo
        - formatters which need updating: agent
reverted
        - agent doing the revert may need to be suppressed, not currently done
        - page title may need to be suppressed if it has been renamed without a redirect, not currently done
        - formatters which need updating: agent, title, difflink(?)
page-linked
        - agent doing the linking may need to be suppressed, not currently done
        - link from page namespace/title may need to be updated to its new oversighted title, not currently done
        - link to page title string may need to be updated to its redirected page, not currently done
        - formatters which need updating: agent, title, link-from-page, 
mention
        - agent doing the mention may need to be suppressed, not currently done
        - the page title where you were mentioned ( a user talk page ) may need to be updated to 
          its redirected page, not currently done
        - The subject line of a mention is the section-title within the user talk page.  This should
          be redacted if the revision content was redacted?
        - formatters which need updating: agent, subject, title
user-rights
        - zomg, nothing to oversight!

REQUIRED FOR ALL SOLUTIONS:

Anything that is created by a particular revision *MUST* reference that revision from the event, as
        the oversight flags are stored in the revision. 
Currently I think page-linked is the only event created by a revision that does not also store that revision id.

EASY SOLUTION:

Load the revisions on demand in the formatter and use the appropriate methods on revision which take into account
        current users permissions regarding the display of information.
The title string stored in echo_event cannot be trusted, and should be replaced with the id of the title so we can
        find and display the correct page name on pages that have been renamed by oversight.
Downsides:
        Lots of extra queries for every notifications API call / Special:Notification render
        Oversight effects (guessing) <1% of all revisions, so its alot of query overhead that in most cases does nothing.
        Current optimizations like storing the link-from-namespace/title within the page-linked should be removed as the
                authoritative information must be pulled from the database

SLIGHTLY MORE COMPLEX:

Just like the easy solution, except:
Pre-process the events to be displayed on page and load all the revisions and titles in one query prior to running
        the formatter.

Downsides:
        This will complicate the formatter but is likely a reasonable middle ground.
        Performing a join between echo_event, revision, and title may be a very complex query against some of the largest
                tables in mediawiki.

LOWER OVERHEAD SOLUTION:

Keep echo events mostly as is, but move revid from inside the extra field to its own nullable foreign key on the event table so it can be queried
When oversight happens insert a job into the queue which will find all events related to the revision that was oversighted and
        adjust their content

Downsides:
        the echo_event table will grow to unfathomable sizes.  Running joins against it to find events that match a particular rev_id
                could be time consuming, but as it only happens once instead of every render of the notifications this is likely better than above.
        there will be a time gap (however long the job queue is, likely we could push up the priority on oversight jobs) between the 
                oversight reporting as complete and echo notifications being properly oversighted.
        jobs are not likely to fail, but its a possibility that we have to keep in mind.  If an oversighting job were to fail then 
                users will see things that they shouldn't have.

wow that came out horribly formatted, here is an attempt at a wiki page containing the above content, but hopefully easier to read:

https://office.wikimedia.org/wiki/User:EBernhardson_(WMF)/Echo-Oversight

tchay wrote:

Erik, you should probably move the relevant parts of that off Office and onto Wikitech or MediaWiki. :-)

(In reply to comment #14)

Erik, you should probably move the relevant parts of that off Office and onto
Wikitech or MediaWiki. :-)

As long as you're sure it should be public, of course. :-)

tchay wrote:

Hmm didn't realize the bug was private. :-D

Putting it on office would keep it private. I agree that the long description in this bug is a bit too much for a Bugzilla ticket.

In the meantime, here's a useful link about Oversight and how it works:
http://en.wikipedia.org/wiki/Wikipedia:Oversight

Also, I have oversight rights on en-wiki for testing purposes, so I can test out some of these features for you when you are ready for them.

Just as a note a good portion of that page is out of date now. All 'oversight' now is actually done through suppression (and just suppress it from administrators), Extension:Oversight is no longer in use and hasn't been since about 2010 at least on enWiki. Oversighters don't actually even have the 'hiderevision' right anymore (they have suppressrevision instead) though staff still does (but again does not and should not use).

James is right (and we REALLY gotta get around to making the culture change to stop saying "oversight" when we mean "suppress revisions").

Fabrice the right you were testing included suppression, not oversight... though we still call it oversight. :)

pb

Thanks for your good insights, James and Philippe.

We will start using the word 'suppression' going forward -- it also makes it easier for supporting other languages, like German or French, who also tend to rely on variations of that wording instead the 'oversight' wording.

I'm sorry that some of this page is out of date. Could someone give some guidance to Erik B. on how to revise his proposal to support your needs?

Thanks for your help in sorting this out in a timely manner. We know it's important to the community and legal team, and would like to resolve that issue soon ...

I think the EASY SOLUTION is acceptable for now, but I would definitely favor moving to the LOWER OVERHEAD SOLUTION for the long term.

A related issue to this that I would like to mention is that the documentation on suppression is pretty terrible. Right now we have:

  • in-code documentation and comments - Almost nothing. The main function Revision::isDeleted doesn't even have a function description, and there are no comments for the member variable that stores the state or any of the constants:

const DELETED_TEXT
const DELETED_COMMENT
const DELETED_USER
const DELETED_RESTRICTED
const SUPPRESSED_USER

Is there anyone who would be able to write some documentation on using suppression from a development point-of-view?

lwelling wrote:

Yes, if somebody (James Forester?) really familiar with it could update or at least strike the known wrong parts before Flow gets too far along it would be very helpful.

I suspect Flow will present more obvious oversight/suppression needs and it would be good to get them right from the start.

Created attachment 12301
First step for echo-oversight, updates db to new schema

Provides the first step of adding and populating a new database field for Echo oversight deployment. The new field i populated via a maintenance script and Event::loadFromRow will accept both new and old results. Everything will still run when the 2 now unused fields are later dropped from the db schema.

Attached:

Created attachment 12302
Second patch for oversight-echo, utilizes new db fields and respects oversight

Updates the echo formatters to output properly suppressed revisions.

Attached:

Would anyone object to posting this patch to gerrit (with a non-attention grabbing commit summary)? It's somewhat difficult to review and discuss otherwise. The security implications are relatively minor (compared to what is typically considered a 'security bug'.

tchay wrote:

I do not. A reference was mentioned in passing already in Echo's talk page on enwiki (not very detailed, but enough). So the cat is out of the bag. Plus, it's not that easy to exploit and it's a privacy issue.

Let's get this posted to gerrit and fixed. :-)

Patches are up at gerrit, anyone who can review:

First patch updates database schema to include new column, migrates old data into new column, and handles data stored both old and new way.
https://gerrit.wikimedia.org/r/#/c/63076/

Second patch respects suppressed revisions utilizing the adjusted database
https://gerrit.wikimedia.org/r/63572

Hi folks, we wanted to let you know that we have just deployed this feature on the English Wikipedia.

I was able to test it on test.wikipedia.org, where it worked as intended: it correctly hid the notification for a talk page message that was suppressed by my oversighter account.

But we would prefer if authorized users could test this feature on the English Wikipedia, and have invited Philippe and James to ask oversighters if it is working for them.

The way to test this is to create a new account with no special rights, then post a message on the talk page of that account from another account. The new account should get a talk page notification. Then log in from a third 'oversighter' account and delete/suppress that original talk message. When you go back to your new account, this notification should no longer appear in its flyout or archive page.

Please let us know if this works for you, and report any issues on this Bugzilla ticket. Thanks!

(In reply to comment #28)

The way to test this is to create a new account with no special rights, then
post a message on the talk page of that account from another account. The new
account should get a talk page notification. Then log in from a third
'oversighter' account and delete/suppress that original talk message. When
you go back to your new account, this notification should no longer appear
in its flyout or archive page.

Please let us know if this works for you, and report any issues on this
Bugzilla ticket. Thanks!

This appears to work from some superficial testing; could this ticket be moved out of the security component so that it can be seen by the OSers in question? Have copied Risker so she, at least, can see the question. :-)

This has been live approx 2 months without reported failures.

So why is this still a private bug then?

EBernhardson changed Security from Software security bug to None.Mar 5 2015, 6:09 PM
Jalexander changed the visibility from "Custom Policy" to "Public (No Login Required)".Mar 5 2015, 6:33 PM
Jalexander changed the edit policy from "Custom Policy" to "All Users".

Change 293029 had a related patch set uploaded (by Catrope):
Clean up and fix updateEchoSchemaForSuppression.php

https://gerrit.wikimedia.org/r/293029

Change 293039 had a related patch set uploaded (by Catrope):
Migrate and remove event_page_namespace and event_page_title

https://gerrit.wikimedia.org/r/293039

Change 293029 merged by jenkins-bot:
Clean up and fix updateEchoSchemaForSuppression.php

https://gerrit.wikimedia.org/r/293029

Change 293039 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Migrate and remove event_page_namespace and event_page_title

https://gerrit.wikimedia.org/r/293039