Page MenuHomePhabricator

DBQ-142 Query user contributions to gauge strength of inter-project and inter-language connections
Closed, DeclinedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-142.
Summary: Query user contributions to gauge strength of inter-project and inter-language connections
Issue type: Task - A task that needs to be done.
Priority: Trivial
Status: Done
Assignee: Hoo man <hoo@online.de>


From: alec conroy <alecmconroy@gmail.com>

Date: Wed, 15 Jun 2011 18:33:03

The 2011 board elections demonstrated a need to proactively engage the global community in the movement. Language barriers in particular played a role, as this was our first truly "global" election.

In the future, it would be helpful to know which languages and which projects are 'least connected' to each other, as these may warrant special attention to ensure their inclusion in the movement. One measure of this is how many of our users actively contribute to multiple projects of different languages.

That is-- for any given pair of languages, I would like to know how many of our active editors have edited a project in both those languages.

I am not directly familiar with the scheme of MW, but in words, I want:

a list of "Potentially Multilingual" active users--
So, a list of users (not bots),
who have been active (or 'touched') in the last 30 days (or whenever),
who have more than 500 edits each on at least two different projects
who have more than 500 edits each on at least two different projects not counting commons or meta.
who have 500 edits each on at least two project of a different language, not counting commons or meta.

For each such "Potentially Multilingual" user, I need to know their aggregate edit count on each individual project. So the output should be something like:

userid1 , enwp-editcount1 , dewp-editcount1 , zh-editcount1 , etc.
userid2 , enwp-editcount2 , dewp-editcount2 , zh-editcount2 , etc.
(editcounts of zero can be ommitted or included)

Basically, whittle it down for me as best you can on your end-- restricting to as narrow a group as is convenient for you to do. Once we get to the most narrow group-- "contributing to projects of different languages", that might be beyond what SQL can easily do. So, depending on your setup, just give m your best shot and I can whittle it further.

For my investigation, I don't need usernames or any other identifying info-- but I might point out that this same exact query with usernames included might be very useful to the Translation Committee in its mission to recruit translators.

Alec


Version: unspecified
Severity: trivial

Details

Reference
bz59399

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:28 AM
bzimport set Reference to bz59399.

From: Hoo man <hoo@online.de>

Date: Thu, 16 Jun 2011 13:48:48

That's hardly possible, cause the various wiki databases are on different database servers so they can't be joined. The only way to do that would be to export the data, load it onto a single server and work with it then, but that would take some time.
That way I could do the stuff with users with more than 500 edits, but not the thing with active users, that would produce to much (temporary) data I think.

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: hoo@online.de, alecmconroy@gmail.com