Page MenuHomePhabricator

unread count for watchlist changes is not available via API
Closed, ResolvedPublic

Description


Version: unspecified
Severity: normal

Details

Reference
bz65246

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:23 AM
bzimport set Reference to bz65246.

"unread count for watchlist changes" meaning what exactly? The number of entries in the user's watchlist with wl_notificationtimestamp set?

On that assumption, I'd lean towards meta=userinfo as the best place for it. I'd also want to see Sean Pringle comment on whether "SELECT COUNT(*) FROM watchlist WHERE wl_user = ? AND wl_notificationtimestamp IS NOT NULL" would blow up the database if someone has a watchlist with a million entries and whether it should be aimed at the watchlist group or the general group or some other group.

Let me try to clarify…

When you go to your watchlist which have changed since you've seen them in the last XX days are displayed in bold. Currently there is no obvious way to get a count of these bolded items. We're obviously tracking them somehow, even if it's just on the fly. This bug is to create/discover a logical way to at minimum get a count of these items (unread/unviewed watchlisted items with changes) and ideally we able to generate a list of these pages as well.

Ok, that matches my assumption. Thanks for clarifying.

Once we have queries that we're reasonably sure won't kill the database, adding it to the API should be easy.

Conceptually, getting the count is as simple as "SELECT COUNT(*) FROM watchlist WHERE wl_user = ? AND wl_notificationtimestamp IS NOT NULL" and getting the list of only the unread changes would be a matter of adding "AND wl_notificationtimestamp IS NOT NULL" to the existing queries in list=watchlist.[1] But if someone has a watchlist with 1000000 pages and only 10 of them are unseen, that might overload the database. So it might be that we need to add an index on (wl_user,wl_notificationtimestamp), or do things in a less perfect way (e.g. fetch the 500 lines that would actually be displayed on Special:Watchlist using the existing queries and count up which ones would be bold). Part of my problem is that the watchlist queries are already fairly awful by my rules-of-thumb so I can't tell if they're getting worse or not. And Sean knows much more about the database stuff than I do, which is why I suggest asking him.

[1]: Those queries look something like "SELECT ... FROM recentchanges INNER JOIN watchlist ON (wl_user = ? AND wl_namespace = rc_namespace AND wl_title = rc_title) LEFT JOIN page ON (rc_cur_id = page_id) WHERE rc_timestamp >= ? AND rc_timestamp <= ? AND wl_namespace IN (...) AND (rc_this_oldid=page_latest OR rc_type=3) [and maybe more filtering on rc_minor/rc_bot/rc_user/rc_patrolled/rc_type fields here] AND (rc_type!=3 OR (rc_deleted&9)!=9) ORDER BY rc_timestamp, rc_id LIMIT 5001

How its displayed can be resolved, but might also imply a cap on querying this, e.g. if we detect a slowdown on watchlists with greater than 1000, 10,000 100,000 unread (or total) items. We could also limit the query to changes within the last 30 day (or less)

We already have a couple of COUNT(*) queries around for watchlists, like countItems(), so I think this would be OK to consider as long as it stays a simple select without joins. The extra covering index would be necessary.

Using countItems() as the example; it accepts a db connection arg, so I suggest stick with that approach and use the api group connection for api-generated traffic.

Both watchlist and api groups are on specific slaves now, so if needs be we can do further tuning for them relatively easily, such as partitioning by wl_user.

To be safe we could also:

a) Limit by date as Jared said. Presumably wouldn't help with run-away bots generating large lists, though.

b) Cap the number of rows hit using a subquery, like:

SELECT COUNT(*) FROM (SELECT wl_user FROM watchlist WHERE wl_user = ? AND wl_notificationtimestamp IS NOT NULL LIMIT 1001);

If < 1000, count is accurate.

If > 1000, be vague: "more than..", "thousands", a magic constant, etc.

Looking through some of the complex joins used for watchlist display: adding wl_notificationtimestamp IS NOT NULL doesn't really affect the query plans. I say trial it.

Ok, I'll see about writing patches at some point soon. One to add the index, and then a followup to add the features to the API.

Limiting the count by newer than $wgRCMaxAge makes sense, since the watchlist can't show anything older than that anyway.

@Brad, is this something you're actively working on, wanted to know if its something we can plan on working for the next version of the compact personal bar…

(In reply to Jared Zimmerman (WMF) from comment #9)

@Brad, is this something you're actively working on

It wasn't, but I did it this afternoon. Patches to be uploaded momentarily.

(In reply to Brad Jorsch from comment #7)

Limiting the count by newer than $wgRCMaxAge makes sense, since the
watchlist can't show anything older than that anyway.

Now that I think about it again, it doesn't make sense. If you haven't checked your watchlist in 2 months, wl_notificationtimestamp for [[Barack Obama]] might be 2 months ago but it'll still show up in your watchlist as unread if it was edited today. To really put a date limit on it we'd need to join with recentchanges, which I doubt would be a net gain.

Change 137147 had a related patch set uploaded by Anomie:
API: Add show=unread to ApiQueryWatchlist

https://gerrit.wikimedia.org/r/137147

Change 137149 had a related patch set uploaded by Anomie:
Add wl_user_notificationtimestamp index

https://gerrit.wikimedia.org/r/137149

Change 137150 had a related patch set uploaded by Anomie:
API: Add prop=unreadcount to ApiQueryUserInfo

https://gerrit.wikimedia.org/r/137150

Change 137149 merged by jenkins-bot:
Add wl_user_notificationtimestamp index

https://gerrit.wikimedia.org/r/137149

Change 137150 merged by jenkins-bot:
API: Add prop=unreadcount to ApiQueryUserInfo

https://gerrit.wikimedia.org/r/137150

Change 137147 merged by jenkins-bot:
API: Add show=unread to ApiQueryWatchlist

https://gerrit.wikimedia.org/r/137147

All the changes have been merged now. This should be deployed to WMF wikis with 1.24wmf9, see https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap for the schedule.