Page MenuHomePhabricator

Skin and active editor correlation
Closed, DeclinedPublic

Description

On English Wikipedia, among people who made 5 or more edits to the main namespace (namespace 0), what is the percentage using each skin?

Discussion started at http://lists.wikimedia.org/pipermail/analytics/2013-January/000362.html . Several people have helped out, but we still haven't hit a script that we're sure is correct.

Details

Reference
bz45179

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:40 AM
bzimport set Reference to bz45179.

I left one part out:

Five or more edits to the main namespace *in the last month*

The goal is to answer the question for active editors.

Can you add the latest version of the mysql query as an attachment?

Created attachment 11810
Last SQL script emailed by Oliver

Attached:

I posted the last version by Oliver, but it seems (though I'm not positive) that Dan's suggestion (http://lists.wikimedia.org/pipermail/analytics/2013-January/000380.html) regarding the left join is correct.

Yup; as said, even with a left join it comes up short.

The query that produces the answer to the question posed here:

select coalesce(up_value, 'default') as skin,
       count(*) as skin_users
  from (select user.user_id,
               count(*) as edits_in_last_30_days
          from recentchanges
                   inner join
               user            on user.user_id = recentchanges.rc_user
         where recentchanges.rc_namespace = 0
           and recentchanges.rc_type < 2
           and recentchanges.rc_user > 0
         group by user_id
        having edits_in_last_30_days >= 5
       ) as active_editors
           left join
       user_properties     on user_properties.up_user = active_editors.user_id
                          and user_properties.up_property = 'skin'
 group by skin;

My fumbling process to find this query, if anyone's interested:

https://gist.github.com/milimetric/5262726

And results:

/*
+-------------+------------+

skinskin_users

+-------------+------------+

2525
032
chick21
cologneblue104
default26582
modern329
monobook2810
myskin8
nostalgia15
simple21
standard98
vector74

+-------------+------------+
*/

You said at https://bugzilla.wikimedia.org/show_bug.cgi?id=44448#c27 that default is Monobook. Are you sure that's correct? If so, it means almost no active editors use Vector.

I think this query should map the values so it only shows rows for actual skins (no blank, blank row, 0), so I'm reopening for now.

Sorry, I meant to say "no default", not "no blank".

I think he may have misspoken; default is almost certainly Vector. And, actually when it comes to serious power users a substantial chunk are still on monobook, globally.

So you only want 'places where the user has actively set a preference', then?

I realize more power users use monobook than regular users do. However, if default were monobook, that would mean only 74 used Vector (unless Vector was the blank row too).

No, I don't think we should ignore users who have not set a preference. Maybe they like the default preferences (more likely the more editing they do), maybe they don't know there's a choice. Regardless, I'd just like to see stats on what skins active editors are actually using.

Also, 0 is apparently also default (again, Vector), so that should also be combined in. See https://www.mediawiki.org/wiki/Manual:$wgDefaultUserOptions .

Sure; see 'I'm pretty sure it's vector' :). Is this a query you expect to be consistently rerunning? (or, to put it in other words: is there value in Analysis and others spending time filling an already-completed query with CASE WHENs, as opposed to spending two minutes in LibreOffice Calc merging cells?)

Yeah, I saw where you said it was Vector.

I do expect people will want to re-run this (and run it on other wikis). That doesn't mean the query needs to be improved right now, though.

If you adapt that query into a maintenance script that doesn't try cleaning up values inside the SQL. You can use the skin methods to merge stuff together the way MW will do it.

  • Create an array to store counts in
  • For each of the rows returned by that query:
    • Run the 'skin' value through Skin::normalizeKey to get the real skin name to use as an array key.
    • In the counts array increment the value associated with the key matching the real skin name (starting from 0 if the key is not defined).
  • Now you'll have a counts array with normalized skin names.

It's worth noting that Chad is soon to run a maintenance script that scrubs old up_value possibilities from skins. This should simplify the output and make it more obviously correspond to actual skins rather than pointers to them.

(In reply to comment #14)

It's worth noting that Chad is soon to run a maintenance script that scrubs
old up_value possibilities from skins. This should simplify the output and make
it more obviously correspond to actual skins rather than pointers to them.

That'd be nice. Is this being tracked in Bugzilla somewhere?

[[Wikipedia:Database reports/User preferences#Skin]] has been messy since it was created. It'd be great to see it cleaned up. :-)

Not to my knowledge; it was one of those requests that came about via "this table is disgustingly inconsistent" "I'll fix it" "thanks!" conversations. I may be misremembering on the BZ front, however.

Removing the URL; this query is unrelated to those statistics (which are based on global usage, rather than local usage)

(In reply to comment #16)

Not to my knowledge; it was one of those requests that came about via "this
table is disgustingly inconsistent" "I'll fix it" "thanks!" conversations. I
may be misremembering on the BZ front, however.

Filed as bug 52778.

So should we close this bug and continue with 52778?

Well, this one is about showing readable results with the current database. Bug 52778 is about removing cruft from the database.

There's some overlap, but even after bug 52778 is fixed, at least one piece of canonicalization will be needed here. That is combining 'no preference set' and DefaultSkinName (currently 'Vector') into DefaultSkinName.

It might be useful to have one version showing 'no preference set' broken out separately, and another combining it as stated in my last comment. The latter shows the skins effectively being used.

(In reply to Matthew Flaschen from comment #1)

I left one part out:

Five or more edits to the main namespace *in the last month*

The goal is to answer the question for active editors.

Matt F did you mean to have this block 44448? or was that a mistype?

Yes, see https://bugzilla.wikimedia.org/show_bug.cgi?id=44448#c13 .

However, although this query could still be improved (i.e. just put the effective skin name, don't make readers of the data think about '', 'default', and '0'), the data above is enough to make it no longer a blocker.

Ironholds lowered the priority of this task from Medium to Lowest.Dec 5 2014, 7:47 PM
Ironholds edited projects, added Research; removed Analytics-General-or-Unknown.
Ironholds set Security to None.

Adding FINCH project as this ties to user setup/user fingerprint for segmentation.

That is this task. Surely you meant to link to something else.

@aripstra removing this from R&D but please look into this if it's of any relevance for design standardization.

Aklapper subscribed.

Closing task as declined as FINCH is dead and archived. See T76809#1807024.

Follow-up was accomplished in T147696 with clearer classification of skin users.