Page MenuHomePhabricator

DBQ-3 List of English Wikipedia users by number of articles they started
Closed, ResolvedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-3.
Summary: List of English Wikipedia users by number of articles they started
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Bryan Tong Minh <bryan@tools.wikimedia.de>


From: Bryan Tong Minh <bryan@tools.wikimedia.de>

Date: Wed, 31 Oct 2007 13:10:20

(see http://en.wikipedia.org/w/index.php?title=User_talk:Bryan&redirect=no#Heavy_duty_SQL_query_request)

I'd like to have a list of English Wikipedia users by number of articles they have started. Someone "started" an article if they make the first non-redirect edit to the article.


Version: unspecified
Severity: major

Details

Reference
bz59256

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:18 AM
bzimport set Reference to bz59256.

From: Daniel Kinzler <daniel@brightbyte.de>

Date: Wed, 31 Oct 2007 13:13:35

sadly, finding the first revision for a given page is rather expensive. to get what you want, this would have to be done for every page, and the result then grouped by user. I can think of an efficient way to do that.


From: Bryan Tong Minh <bryan@tools.wikimedia.de>

Date: Thu, 01 Nov 2007 22:16:34

I would come up with something like this:

SELECT lr_user, COUNT(lr_title) FROM (SELECT page_title AS lr_title, MIN(rev_id) AS lr_rev_id, rev_user_text AS lr_user FROM page, revision WHERE page_namespace = 0 AND rev_page = page_id GROUP BY rev_page) AS lowest_revision GROUP BY lr_user;

Unfortunately, that does not look like it will run very efficiently. Even an EXPLAIN takes ages.


From: Bryan Tong Minh <bryan@tools.wikimedia.de>

Date: Mon, 05 Nov 2007 09:20:49

Final query:

mysql -hsql-s1 enwiki_p -e "SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; SELECT lr_user, COUNT(lr_title) AS lr_creation_count FROM (SELECT page_title AS lr_title, MIN(rev_id) AS lr_rev_id, rev_user_text AS lr_user FROM page, revision WHERE page_namespace = 0 AND page_is_redirect = 0 AND rev_page = page_id GROUP BY rev_page) AS lowest_revision GROUP BY lr_user HAVING lr_creation_count > 5 ORDER BY lr_creation_count DESC;" >enwiki_page_creation.txt

Approximate runtime: 30h

Results: http://tools.wikimedia.de/~bryan/stats/dbquery/enwiki_page_creation.txt


From: Dutchresearch <mail@basvaneijk.nl>

Date: Tue, 03 Mar 2009 15:20:30

Hi i am new to the TS and i would like to have similar info.

Could anyone help me get results for

member name | total started articles per user | total edits on articles per user | registration date | Last active date

I will use it for my thesis on UGC which i will publish CC

i runned the SQL above on the sco.wiki took me 10s. ![][1]

but i want to adjust this SQL to get the info above

Bas

SELECT lr_user, COUNT(lr_title) AS lr_creation_count
FROM
(SELECT
page_title AS lr_title,
MIN(rev_id) AS lr_rev_id,
rev_user_text AS lr_user
FROM page, revision
WHERE page_namespace = 0
AND page_is_redirect = 0
AND rev_page = page_id
GROUP BY rev_page) AS lowest_revision
GROUP BY lr_user
HAVING lr_creation_count > 1
ORDER BY lr_creation_count DESC

[1]: https://jira.toolserver.org/images/icons/emoticons/smile.gif

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: Bryan.TongMinh@Gmail.com
CC list: Bryan.TongMinh@Gmail.com, daniel.kinzler@wikimedia.de