Add proper bucket field to aft_article_feedback table
Closed, Resolved (Public)

Description

Since the start of AFTv5 Stage 2, we can no longer identify which experimental bucket a user was assigned to when a feedback record was created in the aft_article_feedback table. This information is now available only in the clicktracking logs. The cause is a confusion between "design" and "bucket" in the DB, which I previously reported. The two notions were interchangeable in Stage 1 but differ in Stage 2, since we use only Option 1 as a design but still bucket users into 3 different experimental groups. Because the DB does not distinguish a proper bucket id from the design id, we cannot accurately analyze the volume and quality of feedback submitted by users in each of the 3 experimental conditions.

As a result, after discussing with Fabrice, Aaron and Oliver, I would like to make this request for a change to the aft_article_feedback table:

  1. we rename the current "af_bucket_id" field to "af_design_id". We keep using this field for storing the AFT design identifier (i.e. 1 from now on).
  2. we keep the af_link_id field as it is now. We use it to store the link that was actually clicked by the user (if any) in order to display a widget (i.e. 1, 5 or 0 from now on). All feedback submitted from the bottom-placed widget will have a value of 0 regardless of which bucket the user is assigned to.
  3. we introduce a *new* INT field called af_bucket_id to store the actual experimental bucket condition the user is assigned to. We can still use INT values (1, 2, 3) to represent the three OptionA, OptionE and OptionX buckets.
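
To illustrate why a dedicated bucket field matters, here is a minimal sketch in Python using an in-memory SQLite table. It assumes a simplified table holding only the three fields proposed above; the af_id key, the index name, the count_by helper and the sample rows are all hypothetical and invented for illustration, not taken from the real schema or from real feedback data.

```python
import sqlite3

# Simplified stand-in for aft_article_feedback (assumption: the real table
# has many more columns; only the three fields under discussion are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE aft_article_feedback (
        af_id        INTEGER PRIMARY KEY,  -- hypothetical row key
        af_design_id INTEGER NOT NULL,     -- AFT design shown (1 in Stage 2)
        af_link_id   INTEGER NOT NULL,     -- link clicked: 1, 5, or 0
        af_bucket_id INTEGER NOT NULL      -- experimental bucket: 1, 2, or 3
    )
    """
)
# Hypothetical index name, so that filtering by bucket stays cheap.
conn.execute("CREATE INDEX af_bucket ON aft_article_feedback (af_bucket_id)")

# Invented sample rows: (af_design_id, af_link_id, af_bucket_id).
sample_rows = [
    (1, 1, 1),
    (1, 0, 1),
    (1, 5, 2),
    (1, 0, 2),
    (1, 0, 3),
]
conn.executemany(
    "INSERT INTO aft_article_feedback (af_design_id, af_link_id, af_bucket_id) "
    "VALUES (?, ?, ?)",
    sample_rows,
)

ALLOWED_COLUMNS = ("af_design_id", "af_link_id", "af_bucket_id")


def count_by(connection, column):
    """Return {value: number of feedback rows} for one of the id columns."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) FROM aft_article_feedback "
        f"GROUP BY {column}"
    )
    return dict(connection.execute(query).fetchall())


counts_by_link = count_by(conn, "af_link_id")
counts_by_bucket = count_by(conn, "af_bucket_id")
```

In this sketch, grouping by af_link_id lumps all bottom-widget feedback (link 0) together across buckets, while grouping by af_bucket_id keeps the three experimental conditions apart.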

Version: unspecified
Severity: major

Details

Reference
bz35619

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 12:15 AM).
bzimport set Reference to bz35619.

reha wrote:

  1. We'll need to call this af_form_id. It's already confusing enough that it's called "bucket" in the code and "form" in the url. The least we could do is call it "form" in the database as well, instead of introducing a third term.
  2. Ok.
  3. This needs to be a string, not an int, since its meaning will shift over time. I suggest we call it af_experiment (to avoid more overlap for the meaning of bucket) and use "1X", "1A", or "1B". Going forward this should always match what we're using for clicktracking.

For older rows in the database, I suggest:

  • Anything recorded before Mar 21 gets "1", "2", or "3", depending on the value of af_form_id.
  • Anything recorded after Mar 21 gets "1A", "1E", or "1X", depending on the value of af_link_id.

Alternatively, we could use "1?" instead of "1X", to indicate that we don't know what was displayed, only what they clicked on.

  1. agreed
  1. I don't have strong feelings on the type as long as we have a dedicated field with an index to filter on. I like the idea of matching the bucket IDs used in clicktracking, let's go for a string then.
  • Anything recorded before Mar 21 gets "1", "2", or "3", depending on the value of af_form_id.

agreed

  • Anything recorded after Mar 21 gets "1A", "1E", or "1X", depending on the value of af_link_id.

Alternatively, we could use "1?" instead of "1X", to indicate that we don't know what was displayed, only what they clicked on.

I like the latter (we really need to avoid confusion for the records collected with af_link_id = 0 so far).

af_link_id: 1 => af_bucket_id: 1A
af_link_id: 5 => af_bucket_id: 1E
af_link_id: 0 => af_bucket_id: 1?
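
The backfill rule for older rows, combined with the mapping above, could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name, its parameters and the EXAMPLE_CUTOFF value are hypothetical, the thread gives the cutoff only as "Mar 21" so the year used here is an arbitrary placeholder, and rows recorded exactly at the cutoff are assumed to count as "after".

```python
from datetime import datetime

# Mapping agreed in the thread for rows recorded after the cutoff.
LINK_TO_BUCKET = {
    1: "1A",
    5: "1E",
    0: "1?",  # bottom widget: the displayed option is unknown
}

# Placeholder cutoff: the thread only says "Mar 21"; the year is arbitrary
# and chosen purely so that the example runs.
EXAMPLE_CUTOFF = datetime(2000, 3, 21)


def backfill_bucket(recorded_at, af_form_id, af_link_id, cutoff):
    """Return the string bucket label for an existing feedback row.

    Rows recorded before the cutoff take the form id as their label
    ("1", "2", or "3"). Later rows are labelled from the link id.
    An unexpected link id raises KeyError rather than guessing a label.
    """
    if recorded_at < cutoff:
        return str(af_form_id)
    return LINK_TO_BUCKET[af_link_id]


# Stage 1 style row: label comes from the form id.
early_label = backfill_bucket(datetime(2000, 3, 1), 2, 0, EXAMPLE_CUTOFF)
# Stage 2 style rows: label comes from the clicked link.
late_label = backfill_bucket(datetime(2000, 4, 1), 1, 5, EXAMPLE_CUTOFF)
unknown_label = backfill_bucket(datetime(2000, 4, 1), 1, 0, EXAMPLE_CUTOFF)
```

Passing the cutoff as a parameter keeps the date decision with whoever runs the migration, and failing on an unknown link id avoids silently mislabelling rows.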

reha wrote:

Committed and submitted for review:

https://gerrit.wikimedia.org/r/4030

Reassigning to Yoni for confirmation.