The country column in EventLogging tables sometimes contains not only
the country code, but also larger chunks of the client's
cookies, sometimes even the sessionId.
For example, the column values look like [1]:
GeoIP%3D%3A%3A%3A%3Avx; mediaWiki.user.sessionId=<SESSION_ID_REMOVED>; GeoIP=
or
US%3A<CITY_REMOVED>%3A<LAT_REMOVED>%3A<LON_REMOVED>%3Av4; ve-beta-welcome-dialog=1; centralnotice_bucket=0-4.2; GeoIP=CH
(Potentially sensitive data has been replaced by <..._REMOVED>.)
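
As a rough illustration of how such values could be told apart from clean ones, here is a minimal Python sketch. It assumes that a clean value is either empty or a two-letter uppercase country code; the function name and the list of marker substrings are hypothetical and only derived from the two examples above, not from the EventLogging code.

```python
import re

# Assumption: a clean value is a two-letter uppercase country code.
CLEAN_COUNTRY = re.compile(r"[A-Z]{2}")

# Hypothetical markers, taken from the examples in this report:
# cookie separators, key=value pairs, URL-encoded colons, and the
# session cookie name.
COOKIE_MARKERS = (";", "=", "%3A", "sessionId")


def looks_like_cookie_leak(value):
    """Return True if a country value appears to contain cookie fragments."""
    # Empty values and plain two-letter codes are treated as clean.
    if value is None or value == "" or CLEAN_COUNTRY.fullmatch(value):
        return False
    # Anything else is flagged only if it carries a cookie-like marker.
    return any(marker in value for marker in COOKIE_MARKERS)
```

A value such as `US` would pass as clean, while both example values above would be flagged. Values that are neither clean nor carry a marker are not flagged by this sketch and would need a separate look.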
The initial report is at
https://lists.wikimedia.org/mailman/private/analytics-internal/2014-June/001540.html
At least the tables
NavigationTiming_7494934, NavigationTiming_8365252, MultimediaViewerNetworkPerformance_7917896
are affected, and likely more. I'll run tests against all
tables containing 'country' in their column names.
[1] To see unredacted examples, run for example
SELECT event_originCountry FROM log.NavigationTiming_8365252 WHERE LENGTH(event_originCountry) > 2 LIMIT 20;
or
SELECT event_originCountry FROM log.NavigationTiming_8365252 WHERE event_originCountry LIKE '%session%' LIMIT 20;
against dbstore1002.
Version: unspecified
Severity: normal
Whiteboard: u=caistleitner@wikimedia.org c=EventLogging p=8 s=2014-06-12