
DBQ-195 Firsttime contributors to DEWIKI in 2012
Closed, ResolvedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-195.
Summary: Firsttime contributors to DEWIKI in 2012
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Hoo man <hoo@online.de>


From: Gregor Martynus <gregor@martynus.net>

Date: Thu, 06 Sep 2012 23:57:40

Can somebody please run the following 3 SQL queries and send me the results?

--1
SELECT user_id, user_name, user_registration FROM user INNER JOIN logging ON log_user = user_id WHERE LEFT(user_registration, 4) = 2012 AND user_id NOT IN (SELECT ipb_user FROM ipblocks) AND log_type = 'newusers' AND log_action = 'create';
--2
SELECT page_id, page_title, page_namespace, page_is_redirect FROM page;
--3
INSERT INTO u_hoo.dbq189 SELECT user_name FROM user INNER JOIN logging ON log_user = user_id WHERE LEFT(user_registration, 4) = 2012 AND user_id NOT IN (SELECT ipb_user FROM ipblocks) AND log_type = 'newusers' AND log_action = 'create';
SELECT rev_id, rev_page, rev_comment, rev_user_text, rev_user, rev_timestamp FROM revision INNER JOIN u_hoo.dbq189 ON rev_user_text = dbq189.user_name WHERE rev_deleted = 0 AND rev_user != 0;

These queries were put together by user hoo in this ticket:
https://jira.toolserver.org/browse/DBQ-189

Thanks for your help


Version: unspecified
Severity: major

Details

Reference
bz59478

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:33 AM
bzimport set Reference to bz59478.

From: Hoo man <hoo@online.de>

Date: Sun, 09 Sep 2012 12:27:01

I took the SQL queries from above and re-ran them, changing only the database name for the temporary table.

Results:
http://toolserver.org/~hoo/dbq/dbq-195_0.txt (plain text)
http://toolserver.org/~hoo/dbq/dbq-195_1.txt (plain text)
http://toolserver.org/~hoo/dbq/dbq-195_2.txt (plain text)


From: Gregor Martynus <gregor@martynus.net>

Date: Sun, 09 Sep 2012 19:24:47

Thanks again hoo for your help!

I have a question about the page query results. I can't find the user discussion page of user "LKD" (http://de.wikipedia.org/wiki/Benutzer_Diskussion:LKD). It's not a big problem; I'd just like to understand why the page is missing, to avoid problems in the future. The user page exists (page_id 1039890), but not the discussion page. Here's the query I use to find the user discussion page:

SELECT * FROM page WHERE page_namespace = 3 AND page_title = 'LKD';

Any idea?
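
The same lookup can be run directly against the page dump as a cross-check. This is only a minimal sketch: it assumes the dump is tab-separated with the columns in the order of query 2 (page_id, page_title, page_namespace, page_is_redirect), and the helper name `find_page` is hypothetical. The sample row reuses the user page ID mentioned above (namespace 2); it is not taken from the actual dump.

```python
def find_page(page_lines, namespace, title):
    """Return (page_id, is_redirect) for rows matching namespace and title."""
    # MediaWiki stores page_title without the namespace prefix, using underscores.
    wanted = title.replace(" ", "_")
    hits = []
    for line in page_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header rows, blank lines, and anything without exactly 4 columns.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        page_id, page_title, page_ns, is_redirect = fields
        if page_ns == str(namespace) and page_title == wanted:
            hits.append((int(page_id), is_redirect == "1"))
    return hits


# Illustrative sample, not real dump content.
sample_pages = [
    "page_id\tpage_title\tpage_namespace\tpage_is_redirect\n",
    "1039890\tLKD\t2\t0\n",
]

user_page = find_page(sample_pages, 2, "LKD")   # user namespace
talk_page = find_page(sample_pages, 3, "LKD")   # user talk namespace
print(user_page, talk_page)
```

If the lookup finds the row in the dump file but the imported table does not contain it, the import step is the more likely culprit than the query.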


From: Gregor Martynus <gregor@martynus.net>

Date: Sun, 09 Sep 2012 20:46:05

Sorry, another question, regarding user "Gertio". When I look at the User Contributions page, I see that he has been contributing since 2011:
http://de.wikipedia.org/w/index.php?limit=50&tagfilter=&title=Spezial%3ABeitr%C3%A4ge&contribs=user&target=Gertio&namespace=&tagfilter=&year=&month=-1

But I don't see any of these revisions in your dumps. I also cannot find the page "Zehn_von_Renesse" (http://de.wikipedia.org/wiki/Zehn_von_Renesse) in the dump.

Is it possible that the user was active in the Dutch Wikipedia and was then migrated to DEWIKI? Is there any way to exclude this kind of account? Thanks again for your help!


From: Gregor Martynus <gregor@martynus.net>

Date: Sun, 09 Sep 2012 23:34:28

I found out that 34990 pages are referenced in the revision dump but do not exist in the page dump. Here's the list of the missing IDs:
https://dl.dropbox.com/u/732913/missing_pages.txt

Is there a way to get the missing pages?
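
A check like the one described here can be reproduced straight from the two dump files, without going through a database import. The following is a rough sketch under stated assumptions: both dumps are tab-separated, the page dump starts with page_id, and the revision dump has rev_page as its second column (the column order of query 3). The function name `missing_page_ids` and the sample revision rows are made up for illustration.

```python
def missing_page_ids(page_lines, revision_lines):
    """Return sorted rev_page values that have no matching page_id."""
    page_ids = set()
    for line in page_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].isdigit():  # skips a header row or blank lines
            page_ids.add(int(fields[0]))

    missing = set()
    for line in revision_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[1].isdigit():
            rev_page = int(fields[1])
            if rev_page not in page_ids:
                missing.add(rev_page)
    return sorted(missing)


# Illustrative rows only: the page row is quoted in this ticket,
# the revision rows are invented.
page_dump = [
    "page_id\tpage_title\tpage_namespace\tpage_is_redirect\n",
    "6659878\tFreie_Syrische_Armee\t0\t0\n",
]
revision_dump = [
    "rev_id\trev_page\trev_comment\trev_user_text\trev_user\trev_timestamp\n",
    "1\t6659878\tfix typo\tExampleUser\t10\t20120101000000\n",
    "2\t999\tnew page\tExampleUser\t10\t20120102000000\n",
]

print(missing_page_ids(page_dump, revision_dump))
```

For the real files, the lists would be replaced by open file handles (for example `open("dbq-195_1.txt", encoding="utf-8")` for pages and `dbq-195_2.txt` for revisions). If this reports no missing IDs while the database does, the rows were probably lost during import.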


From: Hoo man <hoo@online.de>

Date: Mon, 01 Oct 2012 22:00:15

Sorry, for some reason I wasn't assigned to this one, so I didn't get any mails. In case the questions are still worth answering:

About the missing pages: I tried some at random and all of them were in the dump I gave you.

hoo@yarrow:~/public_html/dbq$ cat dbq-195_1.txt | grep '6659878'
6659878 Freie_Syrische_Armee 0 0

Same for the issue with user Gertio:

hoo@yarrow:~/public_html/dbq$ cat dbq-195_2.txt | grep 'Gertio' | wc -l
42

It seems the data wasn't imported properly on your side, as those rows are missing there.
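
To compare the dump with an imported table, a per-user row count is a bit stricter than grep, which matches the name anywhere on the line (including in edit comments). A minimal sketch, assuming a tab-separated revision dump with rev_user_text as the fourth column as in query 3; `revisions_per_user` is a hypothetical helper and the sample rows are invented:

```python
from collections import Counter


def revisions_per_user(revision_lines, user_column=3):
    """Count revision rows per exact rev_user_text value."""
    counts = Counter()
    for line in revision_lines:
        fields = line.rstrip("\n").split("\t")
        # Require a numeric rev_id so header and blank lines are ignored.
        if len(fields) > user_column and fields[0].isdigit():
            counts[fields[user_column]] += 1
    return counts


# Invented sample rows, for illustration only.
sample_revisions = [
    "rev_id\trev_page\trev_comment\trev_user_text\trev_user\trev_timestamp\n",
    "1\t100\tfirst edit\tGertio\t7\t20120101000000\n",
    "2\t100\tthanks Gertio\tSomeoneElse\t8\t20120102000000\n",
    "3\t101\tsecond edit\tGertio\t7\t20120103000000\n",
]

counts = revisions_per_user(sample_revisions)
print(counts["Gertio"])
```

The resulting number can be compared with a `SELECT COUNT(*) ... WHERE rev_user_text = 'Gertio'` on the imported table; a lower count in the database points to rows dropped during import.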

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: hoo@online.de