EventLogging's country columns are logging unwanted (sensitive) chunks of cookies
Closed, ResolvedPublic

Description

Columns for country data in EventLogging tables sometimes contain not
only the country code, but also larger chunks of the client's
cookies, sometimes even the sessionId.

The column values look, for example, like this [1]:

GeoIP%3D%3A%3A%3A%3Avx; mediaWiki.user.sessionId=<SESSION_ID_REMOVED>; GeoIP=

or

US%3A<CITY_REMOVED>%3A<LAT_REMOVED>%3A<LON_REMOVED>%3Av4; ve-beta-welcome-dialog=1; centralnotice_bucket=0-4.2; GeoIP=CH

(Potentially sensitive data has been replaced by <..._REMOVED>.)
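
The redacted examples suggest that the GeoIP cookie value is percent-encoded and colon-separated, with the country code in the first field. A minimal sketch of a defensive extraction under that assumption follows. The function name `country_from_geoip` and the "two uppercase ASCII letters" rule are illustrative assumptions, not the actual EventLogging code:

```python
import re
from typing import Optional
from urllib.parse import unquote

# Hypothetical helper, not EventLogging code. Assumes the GeoIP cookie value
# is percent-encoded and colon-separated, with the country code first.
_COUNTRY_RE = re.compile(r"[A-Z]{2}")


def country_from_geoip(raw_value: str) -> Optional[str]:
    """Return the two-letter country code, or None if the value looks wrong."""
    decoded = unquote(raw_value)        # "%3A" becomes ":"
    country = decoded.split(":", 1)[0]  # keep the first field only
    # Reject anything that is not exactly two uppercase ASCII letters,
    # so stray cookie fragments are never passed on as a country.
    return country if _COUNTRY_RE.fullmatch(country) else None
```

With this check, a value shaped like the first example above (starting with `GeoIP%3D`) is rejected rather than logged.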

The initial report is at

https://lists.wikimedia.org/mailman/private/analytics-internal/2014-June/001540.html

At least

NavigationTiming_7494934
NavigationTiming_8365252
MultimediaViewerNetworkPerformance_7917896

tables are affected, and likely more. I'll run tests against all
tables containing 'country' in their column names.

[1] To see unredacted examples, run for example

SELECT event_originCountry FROM log.NavigationTiming_8365252 WHERE LENGTH(event_originCountry) > 2 LIMIT 20;

or

SELECT event_originCountry FROM log.NavigationTiming_8365252 WHERE event_originCountry LIKE '%session%' LIMIT 20;

against dbstore1002.
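
To repeat the length check across several tables, the query text could be generated instead of typed by hand. This is only a sketch: the helper name `length_check_queries` is hypothetical, and the table/column pairs are the two NavigationTiming columns named in this task.

```python
from typing import Iterable, List, Tuple


def length_check_queries(
    columns: Iterable[Tuple[str, str]], limit: int = 20
) -> List[str]:
    """Build one length-check SELECT per (table, column) pair.

    Identifiers are interpolated directly, so the pairs must be trusted
    constants (as here), never user-supplied input.
    """
    return [
        f"SELECT {column} FROM log.{table} "
        f"WHERE LENGTH({column}) > 2 LIMIT {limit};"
        for table, column in columns
    ]


queries = length_check_queries([
    ("NavigationTiming_7494934", "event_originCountry"),
    ("NavigationTiming_8365252", "event_originCountry"),
])
for query in queries:
    print(query)
```

Each printed statement can then be run against the database in the same way as the queries above.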


Version: unspecified
Severity: normal
Whiteboard: u=caistleitner@wikimedia.org c=EventLogging p=8 s=2014-06-12

Details

Reference
bz66478

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 3:18 AM)
bzimport set Reference to bz66478.
bzimport added a subscriber: Unknown Object (MLST).

Change 138748 had a related patch set uploaded by QChris:
Avoid encoding issues by fetching GeoIP cookie through jquery.cookie

https://gerrit.wikimedia.org/r/138748

Change 138748 merged by Mwalker:
Avoid encoding issues by fetching GeoIP cookie through jquery.cookie

https://gerrit.wikimedia.org/r/138748

Change 139353 had a related patch set uploaded by QChris:
Ignore country values that are not two characters long

https://gerrit.wikimedia.org/r/139353

Change 139357 had a related patch set uploaded by QChris:
Reset GeoIP cookie upon encountering invalid country code

https://gerrit.wikimedia.org/r/139357

Change 139357 merged by jenkins-bot:
Reset GeoIP cookie upon encountering invalid country code

https://gerrit.wikimedia.org/r/139357

Change 139353 merged by Nuria:
Ignore country values that are not two characters long

https://gerrit.wikimedia.org/r/139353

Change 140023 had a related patch set uploaded by QChris:
Fixup country column names in post_validation_fixups

https://gerrit.wikimedia.org/r/140023

Change 140023 merged by jenkins-bot:
Fixup country column names in post_validation_fixups

https://gerrit.wikimedia.org/r/140023

Change 140061 had a related patch set uploaded by QChris:
Fix revision check for MultimediaViewerDuration in post validation fixup

https://gerrit.wikimedia.org/r/140061

Affected columns (currently) are

MultimediaViewerDuration_8318615.event_country
MultimediaViewerDuration_8572641.event_country
MultimediaViewerNetworkPerformance_7917896_1.event_country
MultimediaViewerNetworkPerformance_7917896.event_country
NavigationTiming_7494934.event_originCountry
NavigationTiming_8365252.event_originCountry

Of those, only MultimediaViewerDuration_8572641.event_country is still
getting affected rows. Once that is solved, I'll start cleaning up the
tables.

Change 140061 merged by jenkins-bot:
Fix revision check for MultimediaViewerDuration in post validation fixup

https://gerrit.wikimedia.org/r/140061

Since last Wednesday, Ops (RT: 7708) have been running the cleanup scripts.

NavigationTiming_7494934 is cleaned up. Thanks Sean!

For the other 5 tables, Ops have currently paused the scripts due to
some unrelated outages on the databases, but the scripts will resume
soonish.

I'll mark this resolved from our point of view. Once Ops finishes running the scripts, we just have to notify people the fix is complete.

The tables

MultimediaViewerDuration_8318615
MultimediaViewerDuration_8572641
MultimediaViewerNetworkPerformance_7917896
NavigationTiming_7494934
NavigationTiming_8365252

have been cleaned up. Thanks Sean!

MultimediaViewerNetworkPerformance_7917896_1

is still missing cleanup, but due to the analytics thread at

http://lists.wikimedia.org/pipermail/analytics/2014-June/002233.html

we'll drop the table altogether.

Meanwhile

MultimediaViewerNetworkPerformance_7917896_1

has been dropped (thanks Andrew and Sean) for bug 66649, so all affected
database tables have either been scrubbed clean or dropped.