
Page counts should not be incremented by pages viewed by bots
Closed, ResolvedPublic

Description

Author: corth

Description:
On our site we have a Google appliance indexing our wiki. It logs in as user "Google", which is a member of the Bot group. It would be nice if there were some way for us to tell MediaWiki not to count page visits by this user. Otherwise, even unused pages show up as having been hit thousands of times.

We'll likely write a patch for our own setup no matter what, but if something has been done already, or if there is a specific way we should implement this so that we can submit the patch officially, please let me know.


Version: unspecified
Severity: enhancement

Details

Reference
bz14044

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:11 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz14044.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

If you have a patch, please attach it here. I'll be willing to look at it and commit it if you test it and it works. For completeness, it might also be a good idea to exclude common known web crawlers by User-Agent.
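
For illustration only, the User-Agent exclusion suggested above could look roughly like the following. This is a minimal, language-neutral sketch in Python, not MediaWiki code: the function names `is_known_crawler` and `should_count_view` and the short token list are assumptions made up for this example, and a real list of crawlers would be longer and would need maintenance.

```python
import re

# Hypothetical, deliberately short list of substrings that commonly appear
# in crawler User-Agent headers. Not taken from MediaWiki.
CRAWLER_PATTERN = re.compile(r"googlebot|bingbot|slurp|crawler|spider", re.IGNORECASE)


def is_known_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    if not user_agent:
        # A missing header tells us nothing, so do not treat it as a crawler.
        return False
    return CRAWLER_PATTERN.search(user_agent) is not None


def should_count_view(user_agent):
    """Count the page view unless the request looks like a known crawler."""
    return not is_known_crawler(user_agent)
```

Matching on User-Agent is only a heuristic: a crawler that sends a browser-like header would still be counted.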

leon wrote:

Fixed in SVN trunk, r34436.

This behavior would be inconsistent with everything else -- other bots (such as general search spiders) will still be hitting things without any such marker, and will be counted.

ayg wrote:

That's what I thought at first, but I reconsidered. It's at least closer to the "real" figure, so it's an improvement. As I indicated in comment #1, it would also be good to exclude known web crawlers. But if a bot is running (for whatever reason) that screen-scrapes every page once or more per day, say, that's obviously going to seriously reduce the usefulness of this count.

I do wonder whether it's a good idea to fold this into the bot permission. It would probably be best not to make that a grab-bag of unrelated functionality; this is why we switched from group-based to permission-based controls to begin with. A separate permission seems better. Maybe we should rename the 'bot' permission to 'rc-hidable' or something.
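
As a rough sketch of the separate-permission idea, the counter could check a dedicated permission rather than group membership. This is a language-neutral illustration in Python, not the change committed in r34436: the permission name `skip-view-count`, the `GROUP_PERMISSIONS` table, and the `PageCounter` class are all hypothetical.

```python
# Hypothetical mapping from group name to the permissions it grants.
# 'skip-view-count' is an invented permission name for this sketch.
GROUP_PERMISSIONS = {
    "bot": {"bot", "skip-view-count"},
    "user": set(),
}


def user_permissions(groups):
    """Collect the union of permissions granted by all of a user's groups."""
    permissions = set()
    for group in groups:
        permissions |= GROUP_PERMISSIONS.get(group, set())
    return permissions


class PageCounter:
    """Keeps per-page view counts, skipping users with 'skip-view-count'."""

    def __init__(self):
        self.counts = {}

    def record_view(self, title, groups):
        """Increment the counter for title; return False if the view was skipped."""
        if "skip-view-count" in user_permissions(groups):
            return False
        self.counts[title] = self.counts.get(title, 0) + 1
        return True
```

Because the check is on a permission rather than on the group itself, a site could grant or withhold it per group without changing what else the 'bot' permission does.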