mediawiki.user: Anonymous users should not be identifiable cross sessions
Closed, ResolvedPublic

Description

There has already been some good discussion on the mailing list:
http://comments.gmane.org/gmane.science.linguistics.wikipedia.technical/65913

I consider this a serious issue because we are infringing on anonymous users' anonymity.

  1. Anonymous users are given a 1-year cookie which uniquely identifies them. After logging out and clearing all cookies from my browser, I visited en.wikipedia.org and received this cookie. Why would an anonymous user be given an identifying token?

mediaWiki.user.id=oDNtHcMSeGMSZyRehhuC7ypQRuPEGk3a; expires=Wed, 18 Dec 2013 18:25:38 GMT; path=/; domain=en.wikipedia.org

  2. Anonymous users are enrolled in clicktracking. I was surprised because the extension page at http://www.mediawiki.org/wiki/Extension:ClickTracking specifies that it affects "users", and I think it should very explicitly state that it affects "logged-in users and anonymous visitors" if that is really the intention.

clicktracking-session=0orJJTU79otWR6x1m8ykUAyasVpZJBn2x; path=/; domain=en.wikipedia.org

  3. Registered users' cookies are not cleared at logout. This seems like a pretty basic fix.

enwikiUserName=Adamw; expires=Sun, 16 Jun 2013 18:43:51 GMT; path=/; domain=en.wikipedia.org; Secure; HttpOnly
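
The difference between the three cookies above comes down to their attributes: a cookie that carries `expires` (or `max-age`) outlives the browser session, while one without either is dropped when the browser closes. A minimal sketch of that check follows; `parseCookieLine` and `isPersistent` are hypothetical helper names introduced here for illustration, not part of MediaWiki.

```javascript
// Parse a cookie line of the form "name=value; attr=value; Flag".
// Attribute names are lower-cased; valueless flags (Secure, HttpOnly) map to true.
function parseCookieLine(line) {
  const [pair, ...attrParts] = line.split(';').map((s) => s.trim());
  const eq = pair.indexOf('=');
  const cookie = { name: pair.slice(0, eq), value: pair.slice(eq + 1), attrs: {} };
  for (const part of attrParts) {
    if (part === '') continue;
    const i = part.indexOf('=');
    const key = (i === -1 ? part : part.slice(0, i)).toLowerCase();
    cookie.attrs[key] = i === -1 ? true : part.slice(i + 1);
  }
  return cookie;
}

// A cookie with Expires or Max-Age survives the end of the browser session.
function isPersistent(cookie) {
  return 'expires' in cookie.attrs || 'max-age' in cookie.attrs;
}

// The three cookie lines quoted in the report.
const reportedCookies = [
  'mediaWiki.user.id=oDNtHcMSeGMSZyRehhuC7ypQRuPEGk3a; expires=Wed, 18 Dec 2013 18:25:38 GMT; path=/; domain=en.wikipedia.org',
  'clicktracking-session=0orJJTU79otWR6x1m8ykUAyasVpZJBn2x; path=/; domain=en.wikipedia.org',
  'enwikiUserName=Adamw; expires=Sun, 16 Jun 2013 18:43:51 GMT; path=/; domain=en.wikipedia.org; Secure; HttpOnly',
];
for (const line of reportedCookies) {
  const cookie = parseCookieLine(line);
  console.log(cookie.name, isPersistent(cookie) ? 'persistent' : 'session-only');
}
```

By this check, `mediaWiki.user.id` and `enwikiUserName` are persistent, and `clicktracking-session` is session-only.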


Version: 1.22.0
Severity: critical

Details

Reference
bz44327

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:38 AM
bzimport set Reference to bz44327.

I'll throw this at the pertinent researchers. I'd note that, to my knowledge, ClickTracking is completely deprecated.

ClickTracking data is currently routed to /dev/null (literally); you can verify by looking at the value of $wgClickTrackingLog in CommonSettings.php.

Leaving cookie garbage in users' browsers is not cool, obviously, nor is collecting data destined for the waste basket. I'll take a look at what is still depending on ClickTracking. When I last checked, AFTv5 had extensive dependencies on it, but Matthias Mullie has since removed them.

(In reply to comment #2)

I'll take a look at what is still depending on ClickTracking.

IIRC it's UserDailyContribs.

(In reply to comment #4)

IIRC it's UserDailyContribs.

I don't see any dependencies there. patches/UserDailyContribs.sql contains an incorrect comment about ClickTracking, which seems to be a relic from before the code was split out of UsabilityInitiative.

Dependent on ClickTracking:

  • ArticleCreationWorkflow
  • ArticleFeedback
  • ArticleFeedbackv5
  • CustomUserSignup
  • E3Experiments
  • Vector extension
  • WikimediaMaintenance

This was from a grep and quick examination sort of thing. Many of these extensions were designed to have a soft dependency on ClickTracking, but accidentally make it a hard dependency.
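
To illustrate the soft versus hard distinction, here is a minimal sketch. The `tracker` object and its `trackAction` method are hypothetical stand-ins for whatever an optional tracking extension would expose; they are assumptions for illustration, not a confirmed ClickTracking API. A soft dependency checks for the tracker before using it, so the caller keeps working when the extension is absent.

```javascript
// Soft dependency: only call the tracker if it is actually present.
// `tracker` and `trackAction` are hypothetical names used for illustration.
function logClick(tracker, eventName) {
  if (tracker && typeof tracker.trackAction === 'function') {
    tracker.trackAction(eventName);
    return true; // event was handed to the tracker
  }
  return false; // tracker missing: skip silently instead of failing
}

// A hard dependency would call tracker.trackAction(eventName) unconditionally
// and throw a TypeError as soon as the extension is removed.
console.log(logClick(undefined, 'save-button')); // no tracker installed
```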

(In reply to comment #6)

Dependent on ClickTracking:

  • ArticleCreationWorkflow
  • ArticleFeedback
  • ArticleFeedbackv5
  • CustomUserSignup
  • E3Experiments
  • Vector extension
  • WikimediaMaintenance

Related: "Bug 29858 - Remove all unrequired dependences from AFT"

FYI, I've informed the [[m:Ombudsman commission]] with an official complaint.
Not that there's someone to complain about or that people are not working towards a fix, but I think it's formally correct to have them keep an eye on this.

I'm confused. You don't think it's within their remit, you know it's being solved, you don't think they can correct for it, but you're still occupying their time? Don't get me wrong, privacy policy issues should be dealt with promptly - but given that this data isn't being used (indeed, I don't know if ClickTracking is even deployed) and the situation is being fixed, I'm not seeing what the actual issue is.

(In reply to comment #9)

I'm confused. You don't think it's within their remit, you know it's being
solved, you don't think they can correct for it, but you're still occupying
their time? Don't get me wrong, privacy policy issues should be dealt with
promptly - but given that this data isn't being used (indeed, I don't know if
ClickTracking is even deployed) and the situation is being fixed, I'm not
seeing what the actual issue is.

I think Nemo was basically informing ("cc'ing") the Ombudsman commission on the existence of this bug report, as it's relevant/of interest to the commission and its work. If the commission has a Bugzilla account, perhaps it could even be cc'd on this bug report.

That makes slightly more sense than the first reading of his comment, sure. I'm not sure if they do, or if that's something they want to set up.

(In reply to comment #9)

You don't think it's within their remit,

I said the opposite. It's formally their duty.

you know it's being
solved, you don't think they can correct for it,

I didn't say this either. I'm not qualified to say whether there's a problem, how to fix it, or whether it's being fixed. We have a body responsible for doing so, so I want them to do it; it's that simple. This way, I can avoid caring about things that it's not my duty to care about or act on; actually, I'll now happily remove myself from cc and just rely on their answer.

but you're still occupying
their time? Don't get me wrong, privacy policy issues should be dealt with
promptly - but given that this data isn't being used (indeed, I don't know if
ClickTracking is even deployed) and the situation is being fixed, I'm not
seeing what the actual issue is.

No issue, indeed. Just normal administration. No idea why you're complaining.

To begin to answer Oliver's question, ClickTracking is enabled on the en and de Wikipedias, but not on the fr, nl, or it Wikipedias. Where is that damn matrix of Special:Version per project?

Although we are not storing the data, nor doing anything useful with it, we are still collecting it: the client phones home with the user's unique ID in cleartext.

(In reply to comment #9)

.... the situation is being fixed,

Nobody has committed resources, and it's a "nontrivial" problem. Even once we patch the hole, we will need a written policy, and ongoing automatic and manual verification to ensure continued compliance. Not it ;)

Perhaps it's just ignorance, but I'm not understanding why it's a nontrivial problem. Remove the extension; done. Nobody is using clicktracking actively and we advise people /not/ to use it because EventLogging is both nicer and less liable to randomly explode and void your data.

This issue is not specific to Clicktracking: the method setting a persistent token (mw.user.id()) is in core and can be called by any extension (including EventLogging) or gadget/user script. The use of a token for analytics purposes for past and existing projects (like AFT or E3 experiments) has been reviewed with Legal on an ad-hoc basis, but that's not a viable approach for the future.

WMF is now resuming work on the privacy policy. We started a conversation last week to determine whether WMF needs this data at all and under what conditions/restrictions it can or should be collected. I honestly don't know yet what the answer to this question is, at least until we've documented different data needs in the organization.

(In reply to comment #15)

Perhaps it's just ignorance, but I'm not understanding why it's a nontrivial
problem. Remove the extension; done.

My understanding (please correct me if I'm wrong) is that other extensions depend on it now, and that the dependency has not yet been removed.

AFAIK, ClickTracking is required by UserDailyContribs. It is waiting to be replaced by EventLogging (cf. bug 41238 comment 1).
For a list of wikis that ClickTracking is deployed on, see https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=blob;f=wmf-config/InitialiseSettings.php#l10141

There are two more alternative ways to solve the problem, which I had not considered earlier.

  1. Since there is other potentially identifying information transmitted from the client (user-agent, other IDs), perhaps there is no privacy concern at all, and the problem is only that we are being sloppy and rude.
  2. The functional reason to use something like js: mediaWiki.user.id() -- the real culprit behind ClickTracking's bad behavior -- is to correlate data between sessions. Were this only being used for UX feedback, we could back off data collection so that, for example, a user is guaranteed to participate in the study for only one consecutive intersession interval. This means that their cookie would reset after data is sampled, so there is nothing identifying to tie the 1st and 2nd sessions to the 3rd and later ones.

I've implemented Comment #19, in https://gerrit.wikimedia.org/r/#/c/53195/

Feedback or alternative designs would be appreciated!

The idea is that mediaWiki.user.id() will return an ID with session expiry, unless the anonymous user has been randomly selected (with probability $wgStudyAnonymousPopulation) to be part of a study. The selected users will transmit the same ID during their next session only, then the intersession ID will be cleared. No anonymous user is tracked for three sessions consecutively.
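
The behavior described above can be sketched as a small state machine. This is a simplified model under stated assumptions, not the code in the Gerrit change: `persistent` stands in for storage that survives a browser restart (such as a long-lived cookie), `studyProbability` plays the role of $wgStudyAnonymousPopulation, and `startSession`, `carryOverId`, and `randomToken` are hypothetical names. `Math.random` is used only to keep the sketch self-contained; a real implementation would want a stronger random source.

```javascript
// Generate a random alphanumeric token (illustrative only, not cryptographically strong).
function randomToken(length = 32) {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let token = '';
  for (let i = 0; i < length; i++) {
    token += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return token;
}

// Begin a new browser session and return its session-scoped state.
// `persistent` models storage that outlives the session.
function startSession(persistent, studyProbability) {
  const session = {};
  if (persistent.carryOverId) {
    // Second session of a selected user: reuse the ID once, then forget it.
    session.id = persistent.carryOverId;
    delete persistent.carryOverId;
  } else {
    session.id = randomToken();
    if (Math.random() < studyProbability) {
      // Selected for the study: keep the ID for exactly one more session.
      persistent.carryOverId = session.id;
    }
  }
  return session;
}
```

With a probability of 1, sessions one and two share an ID, and session three starts over with a fresh one, so at most two consecutive sessions can be linked. With a probability of 0, nothing is ever written to persistent storage.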

Continuing the discussion from that Gerrit change, it seems appropriate to go back to patch set 1, which made it a thin wrapper around getName() (the username) or a session ID (deleted when the browser is closed).
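
A rough sketch of such a thin wrapper, assuming a `user` object whose getName() returns null for anonymous visitors and a `sessionStore` object that is discarded when the browser closes. Both objects, and the names `userId` and `randomToken`, are hypothetical stand-ins for illustration, not the actual patch.

```javascript
// Generate a random alphanumeric token (illustrative only, not cryptographically strong).
function randomToken(length = 32) {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let token = '';
  for (let i = 0; i < length; i++) {
    token += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return token;
}

// Return the username for logged-in users, otherwise a session-scoped ID.
// Nothing is written to storage that outlives the session.
function userId(user, sessionStore) {
  const name = user.getName();
  if (name !== null) {
    return name;
  }
  if (!sessionStore.id) {
    sessionStore.id = randomToken();
  }
  return sessionStore.id;
}
```

Within one session the anonymous ID is stable; a new session starts with an empty `sessionStore` and therefore an unrelated ID.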

If more elaborate/long-lived functionality is needed, I'm inclined to agree with Adam and Tyler: It should be done (still weighing the privacy issues) in an extension. There should be a reasonable transition period to make sure anyone who needs the old functionality can use the extension (e.g. EventLogging). I see EventLogging currently reads mediaWiki.user.id (the cookie in question) in one place, but I'm not sure what, if anything, writes it (calls mw.user.id()). There is no call in EventLogging.

Relevant to part of the original report, ClickTracking was disabled entirely on all WMF wikis (7f54a47d33cf3ad59fadbf759b617ae9f865a351) a month ago.

Related URL: https://gerrit.wikimedia.org/r/62728 (Gerrit Change If2f096dadb639769d859e1596d84b3ad5775a01d)

Related URL: https://gerrit.wikimedia.org/r/62732 (Gerrit Change If2f096dadb639769d859e1596d84b3ad5775a01d)

Change-Id: If2f096dadb639769d859e1596d84b3ad5775a01d

Landed in master.
Backported to milestone 1.21.
Backported to stable 1.20.