
Unwatched recent changes page via rc_watched
Open, Low, Public

Description

Author: mk45654

Description:
Background: Vandalism is a problem on all pages, but it is a much larger problem on pages that, on average, have fewer or zero people watching them. One solution to this problem is Special:UnwatchedPages, which was requested by Jimbo, is accessible only to admins, is updated infrequently, and is therefore very difficult (impossible) to clean out.

This problem could be handled by adding an rc_watched column to the recentchanges table to store a watcher-count, based on the watchlist table at the time of the revision, and counting only editors that have been active in the past N (7) days. An alternative to schema modification is to use rc_type to store this information.
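A minimal sketch of populating rc_watched when an edit is saved, using Python's sqlite3 as a stand-in for the real database. The schema is heavily simplified, and the user_activity table, the integer day numbers, and the function names are assumptions made for illustration; none of this is MediaWiki code.

```python
import sqlite3

ACTIVE_DAYS = 7  # the "N (7) days" activity window from the proposal


def make_db():
    """Create a small in-memory stand-in for the relevant tables (simplified, hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE watchlist (wl_user INTEGER, wl_title TEXT);
        -- hypothetical helper table: the day number of each user's latest edit
        CREATE TABLE user_activity (user_id INTEGER PRIMARY KEY, last_edit_day INTEGER);
        CREATE TABLE recentchanges (
            rc_id INTEGER PRIMARY KEY,
            rc_title TEXT,
            rc_day INTEGER,
            rc_watched INTEGER  -- the proposed watcher-count column
        );
    """)
    return db


def count_active_watchers(db, title, today):
    """Count watchers of `title` who edited within the last ACTIVE_DAYS days."""
    row = db.execute(
        """SELECT COUNT(*) FROM watchlist
           JOIN user_activity ON user_id = wl_user
           WHERE wl_title = ? AND last_edit_day >= ?""",
        (title, today - ACTIVE_DAYS),
    ).fetchone()
    return row[0]


def record_change(db, title, today):
    """Insert a recentchanges row, storing the watcher count as of save time."""
    watched = count_active_watchers(db, title, today)
    cur = db.execute(
        "INSERT INTO recentchanges (rc_title, rc_day, rc_watched) VALUES (?, ?, ?)",
        (title, today, watched),
    )
    return cur.lastrowid
```

The count is taken once, inside record_change, so the stored value is a snapshot: later changes to the watchlist do not alter rows that have already been written.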

This would allow a "recent changes in unwatched articles" page. Such a page would catch vandalism that would otherwise go unnoticed for long periods, and would encourage editors to add these pages to their watchlists. It would eliminate the need for Special:UnwatchedPages. Vandalism of pages on this list would not be a problem: any such edit would itself be bumped to the top of the list, making it twice as likely to be caught, and substantially more likely than on recentchanges because of the lower volume.
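With a stored watcher count, the listing for such a page could be a single filtered query. The sketch below assumes a simplified recentchanges table with an rc_watched column as proposed; the function names and sample rows are made up for illustration.

```python
import sqlite3


def unwatched_recent_changes(db, limit=50):
    """Return the newest changes whose stored watcher count was zero at edit time."""
    rows = db.execute(
        "SELECT rc_id, rc_title FROM recentchanges "
        "WHERE rc_watched = 0 ORDER BY rc_id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


def demo_db():
    """Build a tiny in-memory table with invented sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recentchanges "
        "(rc_id INTEGER PRIMARY KEY, rc_title TEXT, rc_watched INTEGER)"
    )
    db.executemany(
        "INSERT INTO recentchanges (rc_title, rc_watched) VALUES (?, ?)",
        [("Popular page", 12), ("Obscure stub", 0), ("Another stub", 0)],
    )
    return db
```

Calling unwatched_recent_changes(demo_db()) returns only the two stub rows, newest first.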


Details

Reference
bz18790

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:37 PM
bzimport set Reference to bz18790.
bzimport added a subscriber: Unknown Object (MLST).

happy.melon.wiki wrote:

Per bug11181#c11, no booleans --> bitfields (I only recently found that comment from Brion, using rc_type was originally my idea at enwiki VPR). The performance increase from having an rc_watched column is offset by having to *populate* an rc_watched column... which is harder on the servers, querying once for many pages for the Special:RecentChanges/unwatched, or querying many times for one page every time an edit is saved to the recentchanges table?

Doing the query on page save would likely be less load, and if a schema change were made, it would be the only way; populating the column on demand would be odd and hard to do. If only 1 in 5 edits is to an unwatched page, you'd have to load the data for at least 250 edits to have a reasonable chance of generating a list of 50.
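The arithmetic behind the 250 figure, plus a sketch of what on-demand filtering would have to do. Both function names are hypothetical, and the 1-in-5 ratio is only the example figure from this comment.

```python
def rows_needed(wanted, unwatched, out_of):
    """Expected number of edits to load when `unwatched` of every `out_of` edits qualify."""
    if unwatched <= 0 or out_of < unwatched:
        raise ValueError("need 0 < unwatched <= out_of")
    return -(-wanted * out_of // unwatched)  # integer ceiling division


def collect_unwatched(changes, wanted):
    """Scan (title, watcher_count) pairs in order until `wanted` unwatched ones are found.

    Returns the picked titles and how many rows had to be examined.
    """
    picked = []
    scanned = 0
    for title, watchers in changes:
        scanned += 1
        if watchers == 0:
            picked.append(title)
            if len(picked) == wanted:
                break
    return picked, scanned
```

For a list of 50 at a ratio of 1 in 5, rows_needed(50, 1, 5) gives 250. That is an expected value rather than a guarantee, which is why the comment says "at least".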

Counting only active editors would be possible, but it might add too much to the query to be usable.

mk45654 wrote:

Another option might be to add page_watchers to the page table itself, which would be changed whenever a user watches/unwatches. This might be much less useful, since it would miss articles that have been abandoned.

An optimization would be to populate the field with the active-user count only when the total watcher count is under some threshold, such as 20 or 100. If a page has that many watchers, we can assume that at least a few of them are active. This would cut down on the need to join to users, though it would require either an unusual query or two separate queries.
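One way to read this optimization, as a sketch: run the cheap total count first, and only run the expensive active-user count when the total is below the threshold. The function name, the callable parameter, and the choice to store the raw total for well-watched pages are all assumptions for illustration.

```python
def stored_watcher_count(total_watchers, count_active, threshold=20):
    """Decide what to store in the watcher-count field.

    total_watchers: cheap count of all watchlist rows for the page.
    count_active:   zero-argument callable that runs the expensive
                    active-user query (the join to users); it is only
                    called when the page has fewer than `threshold` watchers.
    """
    if total_watchers >= threshold:
        # Assumption: with this many watchers, at least a few are active,
        # so the raw total is stored and the join is skipped.
        return total_watchers
    return count_active()
```

This is the "double query" case: two separate queries for sparsely watched pages, one for everything else.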

(In reply to comment #1)

> Per bug11181#c11, no booleans --> bitfields (I only recently found that comment
> from Brion, using rc_type was originally my idea at enwiki VPR). The
> performance increase from having an rc_watched column is offset by having to
> *populate* an rc_watched column... which is harder on the servers, querying
> once for many pages for the Special:RecentChanges/unwatched, or querying many
> times for one page every time an edit is saved to the recentchanges table?

The load for populating rc_watched on edit time shouldn't be too bad: determining whether a single page is watched is a very simple and very fast query.

mk45654 wrote:

(In reply to comment #4)

> The load for populating rc_watched on edit time shouldn't be too bad:
> determining whether a single page is watched is a very simple and very
> fast query.

This would be more useful if we had more information than just a boolean 'has watchers'. Can it be made to count the number of watchers without much strain, and in particular, autoconfirmed watchers active in the past 7 (or 60) days?

Keep in mind that an rc_watched column will only be a page's watch status _at_ the time the action occurred. We won't go back and change old entries when they no longer are watched (or become watched).

Just something to keep in mind.

(In reply to comment #6)

> Keep in mind that an rc_watched column will only be a page's watch status _at_
> the time the action occurred. We won't go back and change old entries when they
> no longer are watched (or become watched).
>
> Just something to keep in mind.

I believe that would be the preferred scenario anyway: we want to know about changes that weren't being watched when they were made, not before or after.

mk45654 wrote:

At the enwiki VPR discussion for this feature (which, it should be noted, currently has 22 unanimous supporters), a couple of editors were concerned that making this feature immediately public might expose unwatched articles to vandalism, were the list subsequently shut down due to bugs. I don't think this is a problem - if even 1000 unwatched articles were edited during this time, editors would have little trouble watching all of them. But the concern deserves mention here. The discussion and straw poll:

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)#Recent_unwatched_changes_straw_poll

(In reply to comment #7)

> I believe that would be the preferred scenario anyway: we want to know about
> changes that weren't being watched when they were made, not before or after.

This is my view also.

What steps would need to be taken to have this feature implemented? If someone were to try to create a patch, what files should they look at, what should they be aware of, and where should they begin? (Though this seems to be something for those familiar with the code, perhaps a few steps can be taken care of by others.)