Several teams have implemented ad-hoc sampling solutions in EventLogging to measure feature usage in cases where a sample provides enough data to answer a research question.
In some cases, sampling is applied per event (for example, only 1 out of every 1000 events is logged). In other cases, sampling is applied per unique client: a session token is set so that data is collected only from clients included in the sample.
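The two sampling modes behave differently, and a short sketch makes the difference concrete. This is not EventLogging code. The function names, the use of SHA-256 to turn a session token into a sampling bucket, and the "1 in `rate`" convention are all assumptions made for illustration. Per-event sampling makes an independent random decision for every event. Per-client sampling derives the decision deterministically from the client's token, so all events from the same client are either kept together or dropped together.

```python
import hashlib
import random


def sample_event(rate: int, rng: random.Random) -> bool:
    """Hypothetical per-event sampling: keep each event independently
    with probability 1/rate (rate=1000 keeps roughly 1 in 1000)."""
    if rate < 1:
        raise ValueError("rate must be >= 1")
    return rng.randrange(rate) == 0


def sample_client(session_token: str, rate: int) -> bool:
    """Hypothetical per-client sampling: hash the client's session token
    (SHA-256 is an assumption) into one of `rate` buckets and keep only
    bucket 0. The same token always yields the same decision."""
    if rate < 1:
        raise ValueError("rate must be >= 1")
    digest = hashlib.sha256(session_token.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then bucket it.
    bucket = int.from_bytes(digest[:8], "big") % rate
    return bucket == 0


if __name__ == "__main__":
    rng = random.Random(0)
    kept = sum(sample_event(1000, rng) for _ in range(100_000))
    print("events kept out of 100000:", kept)
    token = "example-session-token"  # hypothetical token value
    print("client sampled:", sample_client(token, 1000))
```

Deriving the client decision from a hash of the token means the decision does not have to be stored separately; any component that can see the token can recompute it.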
This pattern is common enough to justify a general-purpose solution (the most recent request for sampled data is [1]). The sampling method and rate could be specified in a dedicated element of the JSON schema. By default, no sampling would be applied.
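One way to read the proposal is sketched below. This is a hypothetical design, not an existing EventLogging feature. It assumes a top-level `sampling` element in the schema with `method` (`"none"`, `"event"` or `"client"`) and `rate` (keep 1 in `rate`). When the element is absent, it falls back to the proposed default of no sampling.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical set of sampling methods a schema could request.
VALID_METHODS = {"none", "event", "client"}


@dataclass(frozen=True)
class SamplingConfig:
    method: str = "none"  # proposed default: no sampling
    rate: int = 1         # keep 1 in `rate`; 1 means keep everything


def sampling_config(schema: Mapping[str, Any]) -> SamplingConfig:
    """Read the hypothetical `sampling` element from a schema dict,
    falling back to no sampling when it is absent."""
    spec = schema.get("sampling")
    if spec is None:
        return SamplingConfig()
    method = spec.get("method", "none")
    rate = spec.get("rate", 1)
    if method not in VALID_METHODS:
        raise ValueError(f"unknown sampling method: {method!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(rate, int) or isinstance(rate, bool) or rate < 1:
        raise ValueError(f"sampling rate must be a positive integer, got {rate!r}")
    if method == "none":
        return SamplingConfig()  # ignore any rate when sampling is off
    return SamplingConfig(method, rate)


if __name__ == "__main__":
    schema = {
        "description": "Example schema (hypothetical)",
        "properties": {"action": {"type": "string"}},
        "sampling": {"method": "client", "rate": 1000},
    }
    print(sampling_config(schema))
    print(sampling_config({"properties": {}}))
```

Validating the element when the schema is loaded lets a malformed sampling specification be rejected up front instead of silently logging everything or nothing.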
[1] http://lists.wikimedia.org/pipermail/analytics/2014-May/002053.html
Version: unspecified
Severity: normal