
List recent User-Agents for a user or IP
Open, Medium, Public, Feature

Description

Similar to the other options in the current CU interface, it would be good to have an option to list all User Agents that a User or IP has used in the past 90 days. This would save time and help checkusers work more effectively.

At times, we end up checking through many shared IPs a user has used to find a different UA that can connect the two IDs in question. In a weak moment, we may miss the connection because we did not see both sharing that unique UA.

Details

Reference
bz24411

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:10 PM
bzimport added a project: CheckUser.
bzimport set Reference to bz24411.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 7813
Patch fixing the bug and cleaning up the code

attachment 24411.patch ignored as obsolete

In the patch I attached to the previous comment, I cleaned up the code (used more XML methods, etc), and fixed this bug.

The patch should be reviewed by someone with more knowledge on the indexes used in CU tables. I think the part that handles the index use is a little messy now.

The HTML stuff should be done alone as a patch against /trunk. Simple things like <td> tags should be left using raw string operations, but things with attributes, like <table>, should use HTML functions.

After that's applied, the UA patch can be put up.

matthew.britton wrote:

More privacy invasion, lovely. Did you ever consider actually mentioning in the Foundation privacy policy that you track everyone's user-agent?

(In reply to comment #4)

More privacy invasion, lovely. Did you ever consider actually mentioning in the
Foundation privacy policy that you track everyone's user-agent?

We already scan them to get access to the site.

matthew.britton wrote:

(In reply to comment #5)

(In reply to comment #4)

More privacy invasion, lovely. Did you ever consider actually mentioning in the
Foundation privacy policy that you track everyone's user-agent?

We already scan them to get access to the site.

Yep. Absolutely zero need to store them in order to do that, though -- storage of UAs is done solely to let checkusers loose on them. According to the privacy policy, you only store IP addresses (and you only use them to deal with abuse, rather than say to find out what editing tool someone is using, but hey at least *admitting* you store UAs would be a start).

Please read the CheckUser extension documentation on mediawiki. I am not sure why you sound surprised by this. All checkusers should be using the tool per the checkuser policy. If you have specific complaints about someone misusing it, raise them on that wiki. This is probably not the best place to argue about it.

(In reply to comment #6)

(In reply to comment #5)

(In reply to comment #4)

More privacy invasion, lovely. Did you ever consider actually mentioning in the
Foundation privacy policy that you track everyone's user-agent?

We already scan them to get access to the site.

Yep. Absolutely zero need to store them in order to do that, though -- storage
of UAs is done solely to let checkusers loose on them. According to the privacy
policy, you only store IP addresses (and you only use them to deal with abuse,
rather than say to find out what editing tool someone is using, but hey at
least *admitting* you store UAs would be a start).

Aaron,

I will open a new bug for the HTML stuff and fix it, then I'll create a new patch against the new revision, which only deals with the current bug specifically.

Thanks for the advice,

Huji

matthew.britton wrote:

(In reply to comment #7)

Please read the CheckUser extension documentation on mediawiki. I am not sure why
you sound surprised by this. All checkusers should be using the tool per the
checkuser policy. If you have specific complaints about someone misusing it,
raise them on that wiki. This is probably not the best place to argue about it.

I'm not surprised about it. I do however disagree with the subversion of the privacy policy that implementation of this bug would contribute to.

Like the privacy policy, the CheckUser policy page on Meta also makes no mention of the fact that user-agent strings are stored.

The extension documentation does not mention it directly, but user-agent data is visible in the example screenshots.

I'm pretty sure having the information buried in a technical documentation page on another website doesn't fulfill the requirement of the privacy policy to state what data is being retained and why.

(In reply to comment #8)

Aaron,

I will open a new bug for the HTML stuff and fix it, then I'll create a new
patch against the new revision, which only deals with the current bug
specifically.

Thanks for the advice,

Huji

Note that I recently changed the UI slightly. Update SVN first :)

Gurch > please discuss the policy issue on meta or on wikimedia-l. This bug report is just about implementing the feature in the MediaWiki software.

Aaron > once reviewed, can you commit the patch so it gets a larger audience? Thanks :)

r76949 uses XML instead of hard-coded HTML in the user interface.

I will send a separate patch as soon as I can, which adds the UA feature.

Created attachment 7829
Updated patch, in accordance with recent changes in the user interface

The new patch is based on the most recent version of the repository (and is therefore compatible with the recent UI changes). It follows the newly introduced UI convention of not having two separate radio buttons for IPs and Users when the action is essentially the same.

Before getting committed, the patch should be reviewed in terms of its use of DB indexes.


Why do we need user-agents for an IP or IP range? "get users" already does that to an extent (and grouped by user).

The whole point of adding this feature is to facilitate finding a UA without having to go through a long list of possibly duplicated UAs. Assume an IP range includes a very large number of users, and many of them have overlapping user-agents. When someone is only looking for specific user-agents, the tedious task of manually going through all these records could change into a rapid check of the list of unique user-agents used by the whole range.
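A minimal sketch of the de-duplication described above, written in Python rather than in the extension's own code. The record layout (`ip`, `user`, `agent`) and the function name `unique_agents_in_range` are assumptions made for illustration; they are not taken from the CheckUser schema or from the attached patch.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """Hypothetical log row; not the real CheckUser table layout."""
    ip: str
    user: str
    agent: str


def unique_agents_in_range(records: Iterable[Record], cidr: str) -> list[str]:
    """Return each distinct user-agent seen inside the range, in first-seen order."""
    network = ipaddress.ip_network(cidr)
    seen: dict[str, None] = {}  # a dict keeps insertion order, so it acts as an ordered set
    for rec in records:
        if ipaddress.ip_address(rec.ip) in network:
            seen.setdefault(rec.agent, None)
    return list(seen)


records = [
    Record("10.0.0.1", "Alice", "Firefox/3.6"),
    Record("10.0.0.2", "Bob", "Firefox/3.6"),
    Record("10.0.0.3", "Carol", "Opera/9.80"),
    Record("192.168.1.1", "Dave", "Chrome/7.0"),
]
print(unique_agents_in_range(records, "10.0.0.0/24"))
```

With many users sharing a handful of agents, the returned list stays short, which is what makes scanning a large range for one specific agent quick.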

Yes, but why are they looking at all agents for an IP range?

To "rule out" the possibility that someone with the user-agent of interest is actually editing from that range.

matthew.britton wrote:

(In reply to comment #16)

Yes, but why are they looking at all agents for an IP range?

because privacy invasion is fun!

Dear Gurch,

You really need to find a better place to express your thoughts. Right here, we are only talking about the technical aspects of a software extension. How this extension is used on specific websites (an example is English Wikipedia) is not relevant here.

Wikimedia admins might decide not to allow using specific features of this extension on their website, as others might want to. Wikimedia owners might also want to explain the usage of user information in their privacy disclaimers in the way they prefer. Same applies to other wikis on the web. If you have a problem with them, talk with them! Don't continue to use this software bug tracker to bug the developers instead; I personally find that very insulting.

Wish you luck,

Huji

Aaron,

Any comments? I'd rather be finished with one bug before proceeding to another one.

I have limited time to look at this. But, for a patch:
(i) All agent results should deal with duplicate agent strings and the time bounds for each (of the 5000 checked).
(ii) I would remove the "get agents for IP" or at least display the results per each account (or IP for non-logged in edits). Doing it on a range gives hard-to-use results.
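One possible reading of point (i), sketched in Python: collapse duplicate agent strings and keep the earliest and latest timestamp for each. The `(timestamp, agent)` row shape and the name `agent_time_bounds` are hypothetical and only serve the illustration.

```python
from typing import Iterable


def agent_time_bounds(rows: Iterable[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Map each agent string to its (first_seen, last_seen) timestamps.

    Each row is a hypothetical (timestamp, agent) pair. Timestamps are
    ISO 8601 strings, so plain string comparison orders them correctly.
    """
    bounds: dict[str, tuple[str, str]] = {}
    for timestamp, agent in rows:
        if agent in bounds:
            first, last = bounds[agent]
            bounds[agent] = (min(first, timestamp), max(last, timestamp))
        else:
            bounds[agent] = (timestamp, timestamp)
    return bounds


rows = [
    ("2010-11-18T09:00:00", "Firefox/3.6"),
    ("2010-11-19T12:30:00", "Opera/9.80"),
    ("2010-11-20T08:15:00", "Firefox/3.6"),
]
for agent, (first, last) in agent_time_bounds(rows).items():
    print(agent, first, last)
```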

(In reply to comment #19)

You really need to find a better place to express your thoughts. Right here, we
are only talking about the technical aspects of a software extension. How this
extension is used on specific websites (an example is English Wikipedia) is not
relevant here.

Wikimedia admins might decide not to allow using specific features of this
extension on their website, as others might want to. Wikimedia owners might also
want to explain the usage of user information in their privacy disclaimers in
the way they prefer. Same applies to other wikis on the web. If you have a
problem with them, talk with them! Don't continue to use this software bug
tracker to bug the developers instead; I personally find that very insulting.

Is this new feature going to be wrapped in a configuration variable? If so, what will the default be? There are legitimate questions to be raised about the Wikimedia privacy policy when development is being done to an extension already installed on Wikimedia's wikis.

(In reply to comment #21)

I have limited time to look at this. But, for a patch:
(i) All agent results should deal with duplicate agent strings and the time
bounds for each (of the 5000 checked).
(ii) I would remove the "get agents for IP" or at least display the results per
each account (or IP for non-logged in edits). Doing it on a range gives
hard-to-use results.

I'm not sure I can understand the first part; I agree with the second part.

(In reply to comment #22)

Is this new feature going to be wrapped in a configuration variable? If so,
what will the default be? There are legitimate questions to be raised about the
Wikimedia privacy policy when development is being done to an extension already
installed on Wikimedia's wikis.

I think it's going to end up as a "turned-on-by-default" feature, but Wikimedia people are going to be informed about this, so they could make sure it complies with the privacy policy (or otherwise turn this feature off).

Any progress here? Has anybody had the time to check the patch?

(In reply to comment #23)

(In reply to comment #21)

I have limited time to look at this. But, for a patch:
(i) All agent results should deal with duplicate agent strings and the time
bounds for each (of the 5000 checked).
(ii) I would remove the "get agents for IP" or at least display the results per
each account (or IP for non-logged in edits). Doing it on a range gives
hard-to-use results.

I'm not sure I can understand the first part; I agree with the second part.

Identical agent strings should be grouped together or consecutive ones collapsed or something.
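The "consecutive ones collapsed" option could look roughly like the following Python sketch, which uses `itertools.groupby` to merge runs of adjacent identical strings. The function name and the sample data are made up for illustration.

```python
from itertools import groupby


def collapse_consecutive(agents: list[str]) -> list[tuple[str, int]]:
    """Collapse runs of identical adjacent agent strings into (agent, run_length)."""
    return [(agent, sum(1 for _ in run)) for agent, run in groupby(agents)]


log = ["Firefox/3.6", "Firefox/3.6", "Opera/9.80", "Firefox/3.6"]
print(collapse_consecutive(log))
```

Unlike full grouping, this keeps the order of the log, so an agent that reappears after a different one shows up as a separate run.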

sumanah wrote:

Huji, thanks for the patch. I'm marking it "reviewed" -- if you have time, please revise it to group together or collapse identical agent strings, as Aaron requested. Thanks!

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM

Tools on enwiki already kind of do this. A userscript I have installed groups the users in the results into a large table with all the user agents that user used. Although it duplicates over each user, this suggests that this task may be okay to be implemented. Perhaps a reverse view where users are grouped by User Agent. The information that is shown is already fully accessible if a full check is run.
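The reverse view suggested above might be sketched in Python as follows. The `(user, agent)` input pairs and the name `users_by_agent` are assumptions for illustration and do not reflect how the userscript or the extension stores results.

```python
from collections import defaultdict
from typing import Iterable


def users_by_agent(rows: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Invert (user, agent) pairs into agent -> sorted list of distinct users."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for user, agent in rows:
        grouped[agent].add(user)
    return {agent: sorted(users) for agent, users in grouped.items()}


rows = [
    ("Alice", "Firefox/3.6"),
    ("Bob", "Firefox/3.6"),
    ("Alice", "Opera/9.80"),
    ("Bob", "Firefox/3.6"),
]
print(users_by_agent(rows))
```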

While not a direct solution to this task, T311378 could address the underlying issues, as it combines the results into UAs used by each account / IP in the results. This may be more useful than just a list of User agent strings.

I'm thinking about marking this as resolved/declined. Reasons why I'm thinking of doing this:

  • A script has been added in T311378 that groups the user-agent strings for each user and IP shown in the 'Get edits' or 'Get users' results. This makes it easier to look at the UAs in 'Get edits', and resolves what seems to be the main concern here. (This is the resolved part)
  • The new checktype would have to have either a similar or the same maximum LIMIT of results (thus you wouldn't see any more UA strings than you would see in 'Get users' or 'Get edits')
  • User-agent strings are becoming less useful now (T242825). As such, I don't see a regular enough need to view user-agent strings on their own, because I'm not sure that in the majority of cases the user-agent string alone would help determine socking. When it is combined with other information (such as IP and XFF data), it makes a stronger case for supporting (or not supporting) socking.
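The first two points can be illustrated with a small Python sketch: agent strings are grouped per account or IP, but only from the rows that fit under the result limit, so a dedicated check type under the same limit would not surface any additional strings. The row shape, the `limit` parameter, and the name `agents_per_actor` are hypothetical and are not taken from the script in T311378.

```python
from collections import defaultdict


def agents_per_actor(rows: list[tuple[str, str]], limit: int) -> dict[str, list[str]]:
    """Group distinct agent strings per actor, reading only the first `limit` rows."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for actor, agent in rows[:limit]:
        if agent not in grouped[actor]:
            grouped[actor].append(agent)
    return dict(grouped)


rows = [
    ("Alice", "Firefox/3.6"),
    ("Alice", "Firefox/3.6"),
    ("10.0.0.2", "Opera/9.80"),
    ("Alice", "Chrome/7.0"),
]
print(agents_per_actor(rows, limit=3))  # the Chrome row falls outside the limit
print(agents_per_actor(rows, limit=4))
```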

However, I don't oppose this task as stated. I'm just not seeing a big enough use for what it would implement to justify the extra maintenance cost. I'll likely post in the checkuser list before closing this.