There seems to be an excessive number of 'hide' events on the firstedit tour. It's possible there's somehow double-counting.
Version: master
Severity: normal
Mattflaschen-WMF, Oct 10 2013, 12:28 AM
swalling wrote:
This comes from examining the counts of event actions on the last tour step, for tour "firstedit". Like so:
SELECT COUNT(*),event_action FROM GuidedTour_5222838 WHERE event_tourname = "firstedit" AND wiki = "enwiki" AND timestamp >= 20131009000000 AND event_step = 4 GROUP BY event_action;
This produces the following results:
175 button-click
285 complete
442 hide
286 impression
Some other steps in the tour also produce this discrepancy, for example the step 2 results:
173 button-click
345 hide
242 impression
We do have some users who have more hide events recorded than impressions.
(Note that user IDs have been censored for privacy.)
SELECT event_userId,
MIN(timestamp) AS first_event,
SUM(event_action = "hide") as hides,
SUM(event_action = "impression") as impressions
FROM GuidedTour_5222838
WHERE timestamp > "20131009"
AND wiki = "enwiki"
GROUP BY event_userId
HAVING SUM(event_action = "hide") > SUM(event_action = "impression") LIMIT 10;
+--------------+----------------+-------+-------------+
| event_userId | first_event    | hides | impressions |
+--------------+----------------+-------+-------------+
| <snip>       | 20131013014124 |     6 |           5 |
| <snip>       | 20131009001851 |     1 |           0 |
| <snip>       | 20131011033720 |     4 |           2 |
| <snip>       | 20131011163054 |     3 |           2 |
| <snip>       | 20131012163322 |     2 |           1 |
| <snip>       | 20131013060206 |     4 |           3 |
| <snip>       | 20131013151144 |     2 |           1 |
| <snip>       | 20131015082941 |     1 |           0 |
| <snip>       | 20131015120432 |     5 |           3 |
| <snip>       | 20131023143148 |     2 |           1 |
+--------------+----------------+-------+-------------+
10 rows in set (2.87 sec)
I picked out a user whose first event was well after the "20131009"
cutoff.
SELECT timestamp, event_action, event_tourName
FROM GuidedTour_5222838
WHERE event_userId = <snip>
AND timestamp >= "20131009";
+----------------+--------------+-----------------------------+
| timestamp      | event_action | event_tourName              |
+----------------+--------------+-----------------------------+
| 20131013014124 | impression   | gettingstartedtasktoolbarve |
| 20131013014126 | hide         | gettingstartedtasktoolbarve |
| 20131013014131 | impression   | gettingstartedtasktoolbarve |
| 20131013014133 | hide         | gettingstartedtasktoolbarve |
| 20131013122419 | impression   | gettingstartedtasktoolbarve |
| 20131013122422 | hide         | gettingstartedtasktoolbarve | <--
| 20131013122427 | hide         | gettingstartedtasktoolbarve | <--
| 20131013122431 | impression   | gettingstartedtasktoolbarve |
| 20131013122433 | hide         | gettingstartedtasktoolbarve |
| 20131013122502 | impression   | gettingstartedtasktoolbarve |
| 20131013122507 | hide         | gettingstartedtasktoolbarve |
+----------------+--------------+-----------------------------+
11 rows in set (1.05 sec)
Note the two hide events occurring 5 seconds apart. I see this sort of pattern
when I look through other users too. We'll often have an "impression" followed
by one or more "hide"s that are separated by 5-10 seconds.
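To find these repeated hides without eyeballing each user, something like the following might work. It is only a sketch: it loads the eleven rows from the listing above into an in-memory SQLite table (the table name `events` and the helper `repeated_hides` are made up for this example) and flags any "hide" whose immediately preceding event was also a "hide". Against the real GuidedTour table the subquery would also need to match on event_userId and event_tourName.

```python
import sqlite3

# Rows copied from the single-user listing above: (timestamp, event_action).
EVENTS = [
    ("20131013014124", "impression"),
    ("20131013014126", "hide"),
    ("20131013014131", "impression"),
    ("20131013014133", "hide"),
    ("20131013122419", "impression"),
    ("20131013122422", "hide"),
    ("20131013122427", "hide"),
    ("20131013122431", "impression"),
    ("20131013122433", "hide"),
    ("20131013122502", "impression"),
    ("20131013122507", "hide"),
]

# A correlated subquery looks up the action of the most recent earlier event.
# No window functions are used, so this should also run on older engines.
QUERY = """
SELECT e.timestamp
FROM events AS e
WHERE e.event_action = 'hide'
  AND (SELECT p.event_action
       FROM events AS p
       WHERE p.timestamp < e.timestamp
       ORDER BY p.timestamp DESC
       LIMIT 1) = 'hide'
ORDER BY e.timestamp
"""


def repeated_hides(events):
    """Return timestamps of 'hide' events directly preceded by another 'hide'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (timestamp TEXT, event_action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(repeated_hides(EVENTS))
```

For this user it flags only the second of the two marked rows, which accounts for the one extra hide (6 hides against 5 impressions).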
This doesn't explain the mismatched counts, but I don't think https://git.wikimedia.org/blob/mediawiki%2fextensions%2fGuidedTour.git/HEAD/modules%2fext.guidedTour.lib.js#L230 should use guiders._lastCreatedGuiderID. I don't know why I didn't notice that before. I think it could lead to the wrong ID being used, especially with the preloading.
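To illustrate the concern, here is a toy model, not the GuidedTour or guiders.js code. Everything in it is hypothetical: the `ToyGuiders` class, the `last_created_id` and `visible_id` attributes, and the step IDs. It assumes that preloading creates the next step's guider while the current one is still on screen, so that "last created" and "currently visible" can point at different steps.

```python
class ToyGuiders:
    """Hypothetical stand-in for a guider library; not the real guiders.js API."""

    def __init__(self):
        # Plays the role of guiders._lastCreatedGuiderID in this toy model.
        self.last_created_id = None
        # The guider the user is actually looking at.
        self.visible_id = None

    def create(self, guider_id):
        self.last_created_id = guider_id

    def show(self, guider_id):
        self.visible_id = guider_id


def ids_at_hide(preload_next):
    """Return the two candidate IDs at the moment the user hides step 1."""
    g = ToyGuiders()
    g.create("firstedit-1")
    g.show("firstedit-1")
    if preload_next:
        # Assumption: the next step is created early but not shown yet.
        g.create("firstedit-2")
    return {"from_last_created": g.last_created_id, "from_visible": g.visible_id}


if __name__ == "__main__":
    print(ids_at_hide(preload_next=False))
    print(ids_at_hide(preload_next=True))
```

Without preloading both IDs agree. With preloading, a hide event keyed on the last created guider would be attributed to step 2 even though the user hid step 1.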
swalling wrote:
This is likely not relevant anymore, since we've switched to a new schema for this.