CheckUser log indefinitely retains private information
Closed, Declined (Public)

Description

Currently the log associated with the CheckUser extension stores its information indefinitely. This log contains private information in a number of ways. It may make sense to truncate this log after a certain amount of time.

This is somewhat related to bug 37573.


Version: unspecified
Severity: normal

Details

Reference
bz37626

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:25 AM)
bzimport added a project: CheckUser.
bzimport set Reference to bz37626.
bzimport added a subscriber: Unknown Object (MLST).

That seems reasonable, but that amount of time should be at least one year - preferably two. It's important that it be possible to investigate complaints or impropriety in the use of CheckUser that might not have been suspected or noted previously.

Then again, it's clear that "indefinite" is much too long. I don't think anything beyond five years can be justified.

Err... You seem to be forgetting that the log's primary use isn't for dealing with complaints. One can look up every check performed upon an IP or IP range; this is its main use. It helps a lot with long-term vandals. Two years is far too short in that respect.

The CheckUser wiki serves the need of retaining long-term data for persistent vandals; the logs really only should be used for auditing purposes. (For one thing, it's not always clear whether an IP check following a name check truly is related, and the absence of the result and context make using logs for anything else but auditing fraught with dangers).

Checkuser wiki helps once you've identified a long term vandal. But without logs, you lose any info on what that vandal has done in the past, until it's manually moved to the Checkuser wiki. So losing the logs would lose some of the historical data that's rich for predicting how they operate.

With that said, I think MZ's initial comment here is not off base. Personally, I'd be fine with truncating the data somehow, but I'm not the primary user of that tool: checkusers and stewards are. Particularly for smaller wikis, we should retain the logs for some time, but I don't think there's a need to retain them indefinitely.

I'm going to chat with LCA's lawyers and see where they fall on the question.

I agree with Félix M. and Philippe Beaudette here. The CheckUser log contains valuable information for us to check whether an IP range contained vandal accounts for a longer period of time. I'm against this change.

Also in agreement with Felix, Philippe, and Trijnstel. The ability to search past cases of abuse without truncation is invaluable.

(In reply to comment #4)

> With that said, I think MZ's initial comment here is not off base. Personally,
> I'd be fine with truncating the data somehow, but I'm not the primary user of
> that tool: checkusers and stewards are. Particularly for smaller wikis, we
> should retain the logs for some time, but I don't think there's a need to
> retain them indefinitely.

Truncation is one option. Anonymization of the IP address information is another option. I think that's what places such as Google do. Or just removing the IP checks from the log altogether after a certain period of time, right? And just keeping the checks of usernames? Though... maybe truncation is best. I'm not sure much good comes from keeping this data around indefinitely.

> I'm going to chat with LCA's lawyers and see where they fall on the question.

Any follow up on this?
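
For reference, the three options mentioned above (truncation, anonymizing IP information, and dropping only the IP checks after some period) could be sketched roughly as follows. This is only an illustration: `LogEntry` and its fields are hypothetical and are not the CheckUser extension's actual schema, and the /24 and /48 prefix lengths used for coarsening are arbitrary assumptions.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
import ipaddress


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical record shape, NOT the real CheckUser log schema.
    timestamp: datetime
    checker: str
    target: str          # username, IP address, or IP range
    target_is_ip: bool


def truncate(entries, now, max_age):
    """Option 1: drop every entry older than max_age."""
    return [e for e in entries if now - e.timestamp <= max_age]


def drop_old_ip_checks(entries, now, max_age):
    """Option 3: drop only old IP checks; keep username checks."""
    return [e for e in entries
            if not (e.target_is_ip and now - e.timestamp > max_age)]


def coarsen_ip(target):
    """Widen an IP or range to at most /24 (IPv4) or /48 (IPv6).

    The prefix lengths are assumptions chosen for illustration.
    """
    net = ipaddress.ip_network(target, strict=False)
    limit = 24 if net.version == 4 else 48
    if net.prefixlen <= limit:
        return str(net)  # already at least this coarse
    return str(net.supernet(new_prefix=limit))


def anonymize_old_ip_checks(entries, now, max_age):
    """Option 2: keep old IP checks, but with coarsened addresses."""
    return [replace(e, target=coarsen_ip(e.target))
            if e.target_is_ip and now - e.timestamp > max_age else e
            for e in entries]
```

Each function returns a new list and leaves the input untouched, so the options can be compared side by side on the same data before anything is actually deleted.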

(In reply to comment #7)

> (In reply to comment #4)
>
> > With that said, I think MZ's initial comment here is not off base. Personally,
> > I'd be fine with truncating the data somehow, but I'm not the primary user of
> > that tool: checkusers and stewards are. Particularly for smaller wikis, we
> > should retain the logs for some time, but I don't think there's a need to
> > retain them indefinitely.
>
> Truncation is one option. Anonymization of the IP address information is
> another option. I think that's what places such as Google do. Or just removing
> the IP checks from the log altogether after a certain period of time, right?
> And just keeping the checks of usernames? Though... maybe truncation is best.
> I'm not sure much good comes from keeping this data around indefinitely.
>
> > I'm going to chat with LCA's lawyers and see where they fall on the question.
>
> Any follow up on this?

Again, strongly against this. We need to know the IPs - and there isn't much info left in the logs besides the IPs and accounts. If we lose these too we can't perform our checks well anymore. I really hope this isn't going to happen.

raxwp wrote:

I strongly agree with what Trijnstel and others wrote above: the long-term CU logs are simply instruments to preserve Wikipedia's quality and to protect users from vandals; cutting this instrument will make the work more difficult.

Apart from this - hey - by definition there are only very few users with access to the logs. These users are elected by the community or an arbcom as trusted to deal respectfully and cautiously with the data they have access to - and they do so.

(Please excuse my broken English.)

There is clearly no consensus for this change (even Philippe agreed with that), and with no new comments in a year, I am closing this as "RESOLVED WONTFIX".