View contributions / recentchanges for an IP range
Closed, Duplicate (Public)

Description

Author: fennec

Description:
There should be a method of viewing contributions for all anonymous users within
a certain range of IP addresses, or at the very least a list of IPs in a certain
range which have contributed. This would be quite useful at tracking down
malicious contributions from users who have a dynamic IP within a certain range.

Optionally, this feature could be restricted to certain users (on Wikipedia,
perhaps sysops) in the interest of privacy and database performance.


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Special:Contributions/127.0.0.1

Details

Reference
bz1035

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 8:06 PM
bzimport set Reference to bz1035.
bzimport added a subscriber: Unknown Object (MLST).

jeluf wrote:

Would it be sufficient to list the contributions of the last week?
Could reduce the DB load while preparing these reports.

fennec wrote:

(In reply to comment #0)

There should be a method of viewing contributions for all anonymous users within
a certain range of IP addresses, or at the very least a list of IPs in a certain
range which have contributed. This would be quite useful at tracking down
malicious contributions from users who have a dynamic IP within a certain range.

Optionally, this feature could be restricted to certain users (on Wikipedia,
perhaps sysops) in the interest of privacy and database performance.

(In reply to comment #1)

Would it be sufficient to list the contributions of the last week?

I don't know... sometimes you might find some vandalism which is older than a week,
and want a simple way to check whether or not there are any similar contributions
lurking about. It'd be better than nothing, mind you, but...

shakes wrote:

The problem I see is that IP addresses are stored as x.x.x.x strings against
contributions. That doesn't make it easy to write a query that's going to
perform well for a range search, unless you limit it to be able to search:

x.*.*.*
x.y.*.*
x.y.z.*

and nothing finer grained. Would that be sufficient perhaps?
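
As an illustration of the octet-boundary search described above, here is a minimal sketch. It assumes a hypothetical `contribs` table with the IP stored as a string in `user_text`; the table, its columns, and the helper `wildcard_to_like` are made up for the example and are not MediaWiki's schema or API. A pattern such as x.y.*.* becomes a prefix match, which is the kind of LIKE an index on the string column can serve.

```python
import sqlite3


def wildcard_to_like(pattern: str) -> str:
    """Turn 'x.y.*.*' into the LIKE prefix 'x.y.%' (hypothetical helper)."""
    octets = pattern.split(".")
    if len(octets) != 4:
        raise ValueError("expected four dot-separated parts")
    fixed = []
    for i, part in enumerate(octets):
        if part == "*":
            # Once a wildcard appears, every remaining part must be one too.
            if any(p != "*" for p in octets[i:]):
                raise ValueError("wildcards must be trailing")
            break
        if not part.isdigit() or not 0 <= int(part) <= 255:
            raise ValueError(f"bad octet: {part!r}")
        fixed.append(part)
    if not fixed or len(fixed) == 4:
        raise ValueError("need 1 to 3 fixed octets")
    # The trailing dot keeps '10.1.%' from matching '10.10.2.3'.
    return ".".join(fixed) + ".%"


# Assumed toy schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contribs (user_text TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO contribs VALUES (?, ?)",
    [
        ("10.1.2.3", "20050101000000"),
        ("10.1.77.9", "20050102000000"),
        ("10.10.2.3", "20050103000000"),
    ],
)
rows = conn.execute(
    "SELECT user_text FROM contribs WHERE user_text LIKE ? ORDER BY user_text",
    (wildcard_to_like("10.1.*.*"),),
).fetchall()
matches = [r[0] for r in rows]
```

Here `matches` holds the two addresses under 10.1 and leaves out 10.10.2.3. The result is ordered by address, not by date.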

fennec wrote:

(In reply to comment #3)

x.y.*.*
x.y.z.*
and nothing finer grained. Would that be sufficient perhaps?

Actually, I think that would be quite effective with most inquiries, and much
better than "nothing".

shakes wrote:

OK then, I'll implement that. :)

(In reply to comment #6)

OK then, I'll implement that. :)

While you are at it, don't forget the following examples:

192.168.0.0/16

You can get some c code using netmask sources:
http://ftp.scarlet.be/pub/debian/pool/main/n/netmask/netmask_2.3.7.tar.gz
(c) Robert Stone <talby(at)debian(dot)org> GPL
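
For reference, the CIDR form can be checked without custom netmask code. The sketch below uses Python's standard `ipaddress` module in place of the C netmask sources linked above; `in_range` is a hypothetical helper name.

```python
import ipaddress


def in_range(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR range."""
    network = ipaddress.ip_network(cidr, strict=True)
    return ipaddress.ip_address(ip) in network


# 192.168.0.0/16 spans 192.168.0.0 through 192.168.255.255.
net = ipaddress.ip_network("192.168.0.0/16")
first, last = net[0], net[-1]
```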

krubokrubo wrote:

*ping* I'm voting for any form of this you can write.

x.y.*.* for the last week would be very helpful in spotting vandalism.

magnusrk+wiki wrote:

This isn't just an enhancement. When you ban an IP range with the block user page and a range like 12.64.96.0/24, the confirmation page
includes a link to the contributions of said user. This obviously leads to nothing, since there have been no contributions from that
literal user or IP. I suppose this could be changed to not give a link, but I would much rather see a.b.c.d/x giving results for the
appropriate IPs.

zigger wrote:

*** Bug 3404 has been marked as a duplicate of this bug. ***

Created attachment 2273
Patch that allows to view contribs of IP range

Here's what I can think of. Hope it isn't too time-consuming.

It was: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/SpecialContributions.php?r1=20104&r2=20143

by Tim with comment "Revert r20075, causes SQL error."

The relevant revert was r21379 by Brion with comment "remove sssllloowwwwwww
range checks" --> Reopen

Maybe, it should be possible to enable this feature in LocalSettings.php? And
disable it by default on Wikimedia wikis, until someone invents a way to make it
faster?

robchur wrote:

*** Bug 10320 has been marked as a duplicate of this bug. ***

(In reply to comment #17)

Maybe, it should be possible to enable this feature in LocalSettings.php? And
disable it by default on Wikimedia wikis, until someone invents a way to make it
faster?

Such as if an IP hex was stored, which would allow for any CIDR too.
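
A rough sketch of why a stored hex form would allow any CIDR: with fixed-width, zero-padded hex, string order matches numeric order, so a range becomes a simple lower/upper bound comparison. This is an illustration only, limited to IPv4; the function names are hypothetical and nothing here is taken from the MediaWiki code.

```python
import ipaddress
from typing import Tuple


def ip_to_hex(ip: str) -> str:
    """Fixed-width, zero-padded hex form of an IPv4 address."""
    return format(int(ipaddress.IPv4Address(ip)), "08X")


def cidr_to_hex_bounds(cidr: str) -> Tuple[str, str]:
    """First and last address of a CIDR range, both in hex form."""
    net = ipaddress.IPv4Network(cidr, strict=False)
    return (
        format(int(net.network_address), "08X"),
        format(int(net.broadcast_address), "08X"),
    )


# With equal-width hex strings, a range test is a plain string comparison,
# the same shape as "WHERE ip_hex BETWEEN low AND high" on an indexed column.
low, high = cidr_to_hex_bounds("12.64.96.0/24")
inside = low <= ip_to_hex("12.64.96.200") <= high
outside = low <= ip_to_hex("12.64.97.1") <= high
```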

*** Bug 11887 has been marked as a duplicate of this bug. ***

Wiki.Melancholie wrote:

Shouldn't that be easy to fix, as this already works with the API of MediaWiki!?
See

ayg wrote:

Well, it's mainly a question of writing an interface for it, yes. Notice, though, that it can't be sorted sensibly (i.e., by date), by either the API or the usual interface, without some rethinking of how things are stored/indexed.

(In reply to comment #22)

Well, it's mainly a question of writing an interface for it, yes. Notice,
though, that it can't be sorted sensibly (i.e., by date), by either the API or
the usual interface, without some rethinking of how things are stored/indexed.

A simple index ain't gonna solve this: we need the index to have user_text first so we can use the LIKE, but timestamp has to come first if we wanna sort by date. With the current schema, it's simply impossible because we can't create an index with two fields first :D

A proper solution would probably involve putting the binary form of the IP address in a separate field.

ayg wrote:

(In reply to comment #23)

A simple index ain't gonna solve this: we need the index to have user_text
first so we can use the LIKE, but timestamp has to come first if we wanna sort
by date. With the current schema, it's simply impossible because we can't
create an index with two fields first :D

As I think I recall explaining to you, yes.

A proper solution would probably involve putting the binary form of the IP
address in a separate field.

There's no real proper solution, but a reasonably *fast* solution would probably involve having a table of (ip_range, timestamp, rev_id), with primary index (ip_range, timestamp). ip_range might actually represent two columns: some number n, and the IP address of the revision with the last n bits (or last n octets) masked off. Thus an edit by 123.45.67.89 to a page might insert a few rows into this table, like (1, '123.45.67.0', timestamp, rev_id) *and* (2, '123.45.0.0', timestamp, rev_id) *and* (3, '123.0.0.0', timestamp, rev_id). Then a range search that falls along octet boundaries would be simple, and one that doesn't could be either prohibited outright or shoehorned in by doing a bunch of narrow searches and merging, or a broad search and filtering.

PostgreSQL has somewhat better tools to handle this. It at least wouldn't require the creation of another table, the indexes could be added to the revision table. But it's still three more indexes, which is kind of unreasonable for a not-horribly-important feature.
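
The (ip_range, timestamp, rev_id) table proposed above could be sketched roughly as follows. Everything named here is an assumption made for the example: the table `ip_range_contribs`, its columns, the helpers `range_rows` and `contribs_in`, and the choice to append rev_id to the key so rows stay unique. It uses an in-memory SQLite database and covers IPv4 with octet masking only.

```python
import ipaddress
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str, int]


def range_rows(ip: str, timestamp: str, rev_id: int) -> List[Row]:
    """Rows for one edit: the address with its last n octets masked off."""
    value = int(ipaddress.IPv4Address(ip))
    rows = []
    for n in (1, 2, 3):
        mask = (0xFFFFFFFF << (8 * n)) & 0xFFFFFFFF
        masked = str(ipaddress.IPv4Address(value & mask))
        rows.append((n, masked, timestamp, rev_id))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ip_range_contribs ("
    " n INTEGER, ip_range TEXT, ts TEXT, rev_id INTEGER,"
    " PRIMARY KEY (n, ip_range, ts, rev_id))"
)
# Made-up edits: (address, timestamp, revision id).
edits = [
    ("123.45.67.89", "20080101120000", 1),
    ("123.45.9.9", "20080102120000", 2),
    ("123.99.1.1", "20080103120000", 3),
]
for ip, ts, rev in edits:
    conn.executemany(
        "INSERT INTO ip_range_contribs VALUES (?, ?, ?, ?)",
        range_rows(ip, ts, rev),
    )


def contribs_in(n: int, ip_range: str) -> List[int]:
    """Revision ids for an octet-aligned range, newest first."""
    cur = conn.execute(
        "SELECT rev_id FROM ip_range_contribs"
        " WHERE n = ? AND ip_range = ? ORDER BY ts DESC",
        (n, ip_range),
    )
    return [r[0] for r in cur]
```

With this layout, `contribs_in(2, "123.45.0.0")` reads one contiguous slice of the key in date order, at the cost of three extra rows per edit.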

mike.lifeguard+bugs wrote:

Splarka has written some js to get range (and wildcard) contribs from the API - performance doesn't seem to be an issue for that. Is there some reason the same can't be implemented in core PHP?

ayg wrote:

I'm guessing that's not sorted in a sensible fashion. I.e., it's presumably sorted by the IP address, not by date. If it's sorted by date, it's probably querying a heck of a lot of rows. Just because something performs well enough for a JS hack to not be so horrible that the sysadmins track it down and delete it because it's causing things to explode, doesn't mean it performs well enough to be put in the core software.

mike.lifeguard+bugs wrote:

(In reply to comment #26)

I'm guessing that's not sorted in a sensible fashion. I.e., it's presumably
sorted by the IP address, not by date. If it's sorted by date, it's probably
querying a heck of a lot of rows. Just because something performs well enough
for a JS hack to not be so horrible that the sysadmins track it down and delete
it because it's causing things to explode, doesn't mean it performs well enough
to be put in the core software.

IIRC, it's by IP, yes. Not sure how big an issue that is...

ayg wrote:

If that would be acceptable, then sure, that could be added to the human interface. Something similar is already available in the API.

http://en.wikipedia.org/w/api.php?action=query&list=usercontribs&uclimit=50&ucuserprefix=217.123

That doesn't allow CIDR ranges, but you could filter it reasonably cheaply in most cases (I assume that Splarka's tool does that), so it would be acceptable to add.
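
The "prefix search, then filter" approach mentioned above might look like the sketch below. It makes no API call: `covering_prefix` (a hypothetical helper) only computes the string one would pass as `ucuserprefix`, and `filter_to_range` (also hypothetical) narrows a list of returned user names to the CIDR range. IPv4 only.

```python
import ipaddress
from typing import Iterable, List


def covering_prefix(cidr: str) -> str:
    """Longest octet-aligned string prefix that covers the whole range."""
    net = ipaddress.IPv4Network(cidr, strict=False)
    full_octets = net.prefixlen // 8
    if full_octets == 0:
        raise ValueError("range too broad for a prefix search")
    prefix = ".".join(str(net.network_address).split(".")[:full_octets])
    # Keep the trailing dot so '217.123.' does not match '217.1234...'.
    return prefix if full_octets == 4 else prefix + "."


def filter_to_range(users: Iterable[str], cidr: str) -> List[str]:
    """Keep only the user names that are IPv4 addresses inside the range."""
    net = ipaddress.IPv4Network(cidr, strict=False)
    kept = []
    for user in users:
        try:
            addr = ipaddress.IPv4Address(user)
        except ValueError:
            continue  # not an IPv4 address, so skip it
        if addr in net:
            kept.append(user)
    return kept


prefix = covering_prefix("217.123.64.0/18")
candidates = ["217.123.70.1", "217.123.200.9", "217.1234"]
in_range = filter_to_range(candidates, "217.123.64.0/18")
```

The filtering is cheap when the range is close to an octet boundary; a /17 fetched through a two-octet prefix discards about half of the rows it reads.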

Changed component to "RecentChanges"

jasonspiro4 wrote:

FYI, a tool at http://toolserver.org/~soxred93/rangecontribs/ already lets anyone search for contributions made by an arbitrary IP range or list of IPs. It's probably not obvious to all editors how to use it though: you must understand CIDR notation in order to use it.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

john wrote:

+reviewed

Was applied in core and reverted later due to slowness.

Now we need to support IPv6 as well, if this is ever going to be implemented.

We need this to work with the following features:

  1. It must show authorized users all the links that Special:Contributions shows, including links to RevDel-ed edits, for example.
  2. It must be available for Special:DeletedContributions, again only for users authorized to see this page.
  3. It must be in chronological order, not sorted by IP address.
  4. I think we do need this for arbitrary blockable ranges, not just /16 and /24.

The first two features are impossible in a toolserver page (although http://toolserver.org/~soxred93/rangecontribs/ is inactive now anyway), and the latter two are not supported by the old fix.

Does this feature need to be solved by MediaWiki Core, or can this very specific use case be provided by a specific web tool?

(In reply to Quim Gil from comment #36)

Does this feature need to be solved by MediaWiki Core, or can this very
specific use case be provided by a specific web tool?

It should be supported by core.

Diffusion added a commit: Unknown Object (Diffusion Commit). Mar 4 2015, 8:20 AM

Accidental clash. Known issue. Sorry for the noise.