
Double-counting of 'hide' logging
Closed, Declined (Public)

Description

There seems to be an excessive number of 'hide' events on the firstedit tour. It's possible there's somehow double-counting.


Version: master
Severity: normal

Details

Reference
bz55542

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:22 AM)
bzimport set Reference to bz55542.
bzimport added a subscriber: Unknown Object (MLST).

swalling wrote:

This comes from examining the counts of event actions on the last step of the "firstedit" tour, using this query:

SELECT COUNT(*),event_action FROM GuidedTour_5222838 WHERE event_tourname = "firstedit" AND wiki = "enwiki" AND timestamp >= 20131009000000 AND event_step = 4 GROUP BY event_action;

This produces the following results:

175 button-click
285 complete
442 hide
286 impression

Some other steps in the tour show the same discrepancy. For example, the step 2 results:

173 button-click
345 hide
242 impression
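One way to quantify the discrepancy is to compare hides against impressions per step. The sketch below does that for the two result sets quoted above. It assumes a guider can only be hidden after it has been shown, so each impression should produce at most one hide. The `excessHides` helper is hypothetical and not part of GuidedTour.

```javascript
// Sketch: compare hide and impression counts per tour step.
// Assumption: each impression should yield at most one hide, so
// hides > impressions points at over-counting.
// Counts are the enwiki "firstedit" results quoted above.
const stepCounts = {
  2: { "button-click": 173, hide: 345, impression: 242 },
  4: { "button-click": 175, complete: 285, hide: 442, impression: 286 },
};

function excessHides(counts) {
  const report = {};
  for (const [step, actions] of Object.entries(counts)) {
    const hides = actions.hide || 0;
    const impressions = actions.impression || 0;
    report[step] = {
      // Hides that cannot be paired with any impression.
      excess: hides - impressions,
      // Hides per impression; Infinity if the step was never shown.
      ratio: impressions ? hides / impressions : Infinity,
    };
  }
  return report;
}

const report = excessHides(stepCounts);
for (const [step, { excess, ratio }] of Object.entries(report)) {
  console.log(`step ${step}: ${excess} excess hides, ${ratio.toFixed(2)} hides per impression`);
}
```

On these numbers step 2 has 103 unpaired hides and step 4 has 156, roughly 1.4 and 1.5 hides per impression.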

content hidden as private in Bugzilla

We do have some users who have more hide events recorded than impressions.
(Note that user IDs have been censored for privacy.)

SELECT event_userId,
       MIN(timestamp) AS first_event,
       SUM(event_action = "hide") AS hides,
       SUM(event_action = "impression") AS impressions
FROM GuidedTour_5222838
WHERE timestamp > "20131009"
  AND wiki = "enwiki"
GROUP BY event_userId
HAVING SUM(event_action = "hide") > SUM(event_action = "impression")
LIMIT 10;
+--------------+----------------+-------+-------------+
| event_userId | first_event    | hides | impressions |
+--------------+----------------+-------+-------------+
| <snip>       | 20131013014124 |     6 |           5 |
| <snip>       | 20131009001851 |     1 |           0 |
| <snip>       | 20131011033720 |     4 |           2 |
| <snip>       | 20131011163054 |     3 |           2 |
| <snip>       | 20131012163322 |     2 |           1 |
| <snip>       | 20131013060206 |     4 |           3 |
| <snip>       | 20131013151144 |     2 |           1 |
| <snip>       | 20131015082941 |     1 |           0 |
| <snip>       | 20131015120432 |     5 |           3 |
| <snip>       | 20131023143148 |     2 |           1 |
+--------------+----------------+-------+-------------+
10 rows in set (2.87 sec)
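The same per-user check can be reproduced over raw event rows, which makes it easy to test the logic on a small sample outside the database. This is a minimal sketch: the row shape and `usersWithExtraHides` are hypothetical, and the `wiki` filter and `LIMIT` from the query are left out.

```javascript
// Sketch: in-memory equivalent of the per-user query
// (GROUP BY event_userId ... HAVING hides > impressions).
// Hypothetical row shape: { userId, timestamp, action }.
// Timestamps are 14-digit YYYYMMDDHHMMSS strings, so plain string
// comparison orders them chronologically (as in the SQL).
function usersWithExtraHides(rows, since) {
  const byUser = new Map();
  for (const { userId, timestamp, action } of rows) {
    if (timestamp <= since) continue; // mirrors WHERE timestamp > since
    let s = byUser.get(userId);
    if (!s) {
      s = { userId, firstEvent: timestamp, hides: 0, impressions: 0 };
      byUser.set(userId, s);
    }
    if (timestamp < s.firstEvent) s.firstEvent = timestamp; // MIN(timestamp)
    if (action === "hide") s.hides++;
    else if (action === "impression") s.impressions++;
  }
  // HAVING hides > impressions
  return [...byUser.values()].filter((s) => s.hides > s.impressions);
}
```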

I picked out a user whose first event was well after the "20131009"
cutoff.

SELECT timestamp, event_action, event_tourName
FROM GuidedTour_5222838
WHERE event_userId = <snip>
  AND timestamp >= "20131009";
+----------------+--------------+-----------------------------+
| timestamp      | event_action | event_tourName              |
+----------------+--------------+-----------------------------+
| 20131013014124 | impression   | gettingstartedtasktoolbarve |
| 20131013014126 | hide         | gettingstartedtasktoolbarve |
| 20131013014131 | impression   | gettingstartedtasktoolbarve |
| 20131013014133 | hide         | gettingstartedtasktoolbarve |
| 20131013122419 | impression   | gettingstartedtasktoolbarve |
| 20131013122422 | hide         | gettingstartedtasktoolbarve | <--
| 20131013122427 | hide         | gettingstartedtasktoolbarve | <--
| 20131013122431 | impression   | gettingstartedtasktoolbarve |
| 20131013122433 | hide         | gettingstartedtasktoolbarve |
| 20131013122502 | impression   | gettingstartedtasktoolbarve |
| 20131013122507 | hide         | gettingstartedtasktoolbarve |
+----------------+--------------+-----------------------------+
11 rows in set (1.05 sec)

Note the two hide events occurring 5 seconds apart. I see this sort of pattern
when I look through other users too. We'll often have an "impression" followed
by one or more "hide"s that are separated by 5-10 seconds.
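This impression-then-repeated-hide pattern can be detected mechanically. The sketch below scans one user's events for a single tour in time order and reports every hide that follows another hide with no impression in between, together with the gap in seconds. The event shape and function names are assumptions for illustration. Timestamps are parsed as UTC 14-digit MediaWiki timestamps; only differences are used, so the time zone does not affect the gap.

```javascript
// Sketch: find "hide" events that follow another "hide" with no
// "impression" in between (candidate double-counted hides).
// Hypothetical input: [{ timestamp: "YYYYMMDDHHMMSS", action }].

// Convert a 14-digit timestamp to milliseconds since the epoch (UTC).
function parseMwTimestamp(ts) {
  const n = (start, len) => Number(ts.slice(start, start + len));
  return Date.UTC(n(0, 4), n(4, 2) - 1, n(6, 2), n(8, 2), n(10, 2), n(12, 2));
}

function repeatedHides(events) {
  // Equal-length digit strings sort chronologically as strings.
  const sorted = [...events].sort((a, b) =>
    a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0);
  const found = [];
  let lastHide = null; // most recent hide since the last impression
  for (const ev of sorted) {
    if (ev.action === "impression") {
      lastHide = null; // a new showing starts a fresh pairing
    } else if (ev.action === "hide") {
      if (lastHide) {
        const gapSeconds =
          (parseMwTimestamp(ev.timestamp) - parseMwTimestamp(lastHide.timestamp)) / 1000;
        found.push({ first: lastHide.timestamp, second: ev.timestamp, gapSeconds });
      }
      lastHide = ev;
    }
  }
  return found;
}
```

Applied to the 11 events listed above, this flags exactly the pair marked with arrows (20131013122422 and 20131013122427, 5 seconds apart).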

This doesn't explain the mismatched counts, but I don't think https://git.wikimedia.org/blob/mediawiki%2fextensions%2fGuidedTour.git/HEAD/modules%2fext.guidedTour.lib.js#L230 should use guiders._lastCreatedGuiderID. I'm not sure why I didn't notice that before. It could lead to the wrong ID being used, especially with preloading.
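To illustrate the concern about the last-created ID, here is a deliberately simplified, hypothetical model of a guider registry. It is NOT the real guiders.js or GuidedTour API; `GuiderRegistry`, `create`, `show`, `hideUsingLastCreated` and `hideVisible` are invented names. It only shows how an ID that tracks the most recently created guider diverges from the guider actually on screen once later steps are created ahead of time.

```javascript
// Hypothetical model, not the real guiders.js API.
class GuiderRegistry {
  constructor(logEvent) {
    this.logEvent = logEvent;  // callback(action, guiderId)
    this.lastCreatedId = null; // similar in spirit to _lastCreatedGuiderID
    this.visibleId = null;     // guider currently on screen, if any
  }
  create(id) {
    // Updated on every creation, whether or not the guider is shown.
    this.lastCreatedId = id;
  }
  show(id) {
    this.visibleId = id;
    this.logEvent("impression", id);
  }
  // Problematic variant: attributes the hide to the last-created guider.
  hideUsingLastCreated() {
    this.logEvent("hide", this.lastCreatedId);
    this.visibleId = null;
  }
  // Alternative: attributes the hide to the guider actually visible,
  // and logs nothing if no guider is currently shown.
  hideVisible() {
    if (this.visibleId === null) return;
    this.logEvent("hide", this.visibleId);
    this.visibleId = null;
  }
}

// Usage: preload three steps, then show and hide the first one.
const log = [];
const registry = new GuiderRegistry((action, id) => log.push(`${action}:${id}`));
["step1", "step2", "step3"].forEach((id) => registry.create(id));
registry.show("step1");
registry.hideUsingLastCreated(); // hide is recorded against step3
```

In this model the hide lands on a later step than the one that was shown. Whether the real code path behaves this way is not established here. The guard in `hideVisible` also turns a second hide for the same showing into a no-op, which is one possible way to avoid duplicate hide events.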

swalling wrote:

This is likely not relevant anymore, since we've switched to a new schema for this.