Page MenuHomePhabricator

DBQ-189 firsttime contributors to DEWIKI in June
Closed, ResolvedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-189.
Summary: firsttime contributors to DEWIKI in June
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Hoo man <hoo@online.de>


From: Gregor Martynus <gregor@martynus.net>

Date: Mon, 02 Jul 2012 19:03:53

I'd need 3 things for a study of a friend of mine:

  1. users that manually signed up in 2012 and have not been blocked.
  2. All pages
  3. revisions by the users from 1.

As a Sidenode, could I have a SQL dump of a small wikipedia like simple.wikipedia.org so I can build and test the queries myself, so you don't have to? Tables: user (without sensitive columns of course), ipblocks, logging, revision, page.

Could anybody run these queries against DE wiki and send me the results? I think the SQL queries should look like this:

-- Users that manually signed up in 2012, flagged if they have been blockd

SELECT
  user_id,
  user_name,
  log_timestamp AS signed_up_at
  ipb_timestamp AS blocked_at
FROM
  user
LEFT JOIN logging  ON user_id = log_user
LEFT JOIN ipblocks ON user_id = ipb_user
WHERE type = 'new_users' 
  AND action = 'create'
  AND log_timestamp > "2012";

-- all pages
SELECT
  page_id,
  page_name,
  page_namespace,
  page_is_redirect,
FROM
  page;

-- revisions from 2012
SELECT
  rev_id,
  rev_page,
  rev_comment,
  rev_deleted,
  rev_user,
  rev_timestamp
FROM
  revision
WHERE
  rev_timestamp > "2012";




-- THANKS A LOT

Version: unspecified
Severity: major

Details

Reference
bz59472

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:33 AM
bzimport set Reference to bz59472.

From: Hoo man <hoo@online.de>

Date: Mon, 02 Jul 2012 19:25:43

Ok, I got a few questions regarding this ticket:
First of all, is it ok, if I exclude users which are blocked? There might be privacy issues with the names etc. Same goes for deleted revisions in 3.
Then for the third, is it ok, if it only includes the users from 1? As your query doesn't do this atm.

Finally, no I can't dump simplewiki_p (alone the revision database will have several GiB, I guess), I'll just write the queries myself, as they are rather trivial and that should be faster for me than dumping a wiki.


From: Gregor Martynus <gregor@martynus.net>

Date: Mon, 02 Jul 2012 19:56:44

thanks a lot for your quick response!

First of all, is it ok, if I exclude users which are blocked? There might be privacy issues with the names etc. Same goes for deleted revisions in 3.

Yes, sure, you can exclude users that are blocked and revisions that have been deleted.

Then for the third, is it ok, if it only includes the users from 1? As your query doesn't do this atm.

Absolutely, actually I'd prefer that, I just didn't want to make the query to complex.

Thanks a lot for your help!


From: Hoo man <hoo@online.de>

Date: Mon, 02 Jul 2012 23:06:43

As told, this was an easy one, so here we go:
SQL:

--1
SELECT user_id, user_name, user_registration FROM user INNER JOIN logging ON log_user = user_id WHERE LEFT(user_registration, 4) = 2012 AND user_id NOT IN (SELECT ipb_user FROM ipblocks) AND log_type = 'newusers' AND log_action = 'create';
--2
SELECT page_id, page_title, page_namespace, page_is_redirect FROM page;
--3
INSERT INTO u_hoo.dbq189 SELECT user_name FROM user INNER JOIN logging ON log_user = user_id WHERE LEFT(user_registration, 4) = 2012 AND user_id NOT IN (SELECT ipb_user FROM ipblocks) AND log_type = 'newusers' AND log_action = 'create';
SELECT rev_id, rev_page, rev_comment, rev_user_text, rev_user, rev_timestamp FROM revision INNER JOIN u_hoo.dbq189 ON rev_user_text = dbq189.user_name WHERE rev_deleted = 0 AND rev_user != 0;

Result:
http://toolserver.org/~hoo/dbq/dbq-189_0.txt (plain text)
http://toolserver.org/~hoo/dbq/dbq-189_1.txt (plain text)
http://toolserver.org/~hoo/dbq/dbq-189_2.txt (plain text)


From: Gregor Martynus <gregor@martynus.net>

Date: Tue, 03 Jul 2012 23:15:07

thanks Hoo man, you saved me so much work, really appreciate your help!!


From: Gregor Martynus <gregor@martynus.net>

Date: Wed, 04 Jul 2012 16:57:05

sorry to bother you once more,
but there seem data to be missing.

Example:

SELECT * FROM revisions WHERE rev_user = 1426164;

Returns 5 results. But when I open http://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/Johannes_Michels I see way more contributions by this user. Does that make any sense to you?


From: Hoo man <hoo@online.de>

Date: Wed, 04 Jul 2012 19:42:00

hoo@yarrow:~/public_html/dbq$ cat dbq-189_2.txt | grep -P '\t1426164\t' | wc -l

32

So the problem seems to be on your side, maybe you're not importing the data properly?


From: Gregor Martynus <gregor@martynus.net>

Date: Thu, 05 Jul 2012 00:02:40

you are right, sorry. I used a MAC app, I didn't get any errors, but data was indeed missing. Now I used mysqlimport and it works. Thanks again!


From: Gregor Martynus <gregor@martynus.net>

Date: Fri, 06 Jul 2012 00:00:44

Quick question regarding reverted revisions:

I asked initially for the rev_deleted column because I thought it would be the flag for reverted revisions, but I guess that's not the case, right?

Is there any chance to get an additional flag if a revision has been reverted? If yes, could you add it to the dbq-189_2.txt dump? It would really help a lot for my friend's study


From: Hoo man <hoo@online.de>

Date: Fri, 06 Jul 2012 00:04:14

Sadly, that's not possible, rev_deleted indeed stores whether the revision has been deleted and not if it has been reverted.


From: Gregor Martynus <gregor@martynus.net>

Date: Fri, 06 Jul 2012 00:07:10

Thanks for the quick response!

I'm just wondering: how can a revision be highlighted as reverted, when there is now such flag in the database? Is there a trick I could use myself to identify revisions that have been reverted?


From: Hoo man <hoo@online.de>

Date: Fri, 06 Jul 2012 00:11:25

The only way I see at the moment would be to get follow-up revisions and check whether they're reverts, getting those shouldn't be to complicated, but detecting reverts will be more hard. And the data gathered that way wont be fully reliable either, as there are various cases in those you wont be able to detect reverts that way.


From: Gregor Martynus <gregor@martynus.net>

Date: Fri, 06 Jul 2012 00:13:55

okay, thanks a lot for your instant help!


From: Gregor Martynus <gregor@martynus.net>

Date: Thu, 06 Sep 2012 23:22:31

May I ask you to run the same 3 queries from the comment above and provide me the results? Thanks for your help!

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: hoo@online.de