
Broken revisions with rev_page=0
Open, Medium, Public

Description

244 revisions on sourceswiki have rev_page=0, so they can't be loaded properly.

They date from 2004-2005, and at least some of them are current revisions on their pages (referenced by rev_latest).

Noted from http://wikisource.org/wiki/Wikisource:Scriptorium#database_did_not_find_the_text_of_a_page

http://wikisource.org/wiki/%E0%A4%8B%E0%A4%97%E0%A5%8D%E0%A4%B5%E0%A5%87%E0%A4%A6:_%E0%A4%B8%E0%A5%82%E0%A4%95%E0%A5%8D%E0%A4%A4%E0%A4%82_1.36
http://wikisource.org/wiki/%E0%A4%8B%E0%A4%97%E0%A5%8D%E0%A4%B5%E0%A5%87%E0%A4%A6:_%E0%A4%B8%E0%A5%82%E0%A4%95%E0%A5%8D%E0%A4%A4%E0%A4%82_1.38
http://wikisource.org/wiki/%E0%A4%8B%E0%A4%97%E0%A5%8D%E0%A4%B5%E0%A5%87%E0%A4%A6:_%E0%A4%B8%E0%A5%82%E0%A4%95%E0%A5%8D%E0%A4%A4%E0%A4%82_1.50

The above links appear in Uncategorized Pages, but when I open them, I get this message:

The database did not find the text of a page that it should have found, named "ऋग्वेद: सूक्तं 1.50" .
This is usually caused by following an outdated diff or history link to a page that has been deleted.
If this is not the case, you may have found a bug in the software. Please report this to an administrator, making note of the URL.

Hence I am reporting this to you. Please suggest what should be done. I have otherwise fixed the problems on more than 400 Hindi/Sanskrit pages that were listed in Uncategorized Pages yesterday.


Version: unspecified
Severity: normal
URL: http://wikisource.org/wiki/%E0%A4%8B%E0%A4%97%E0%A5%8D%E0%A4%B5%E0%A5%87%E0%A4%A6:_%E0%A4%B8%E0%A5%82%E0%A4%95%E0%A5%8D%E0%A4%A4%E0%A4%82_1.36

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:29 PM
bzimport set Reference to bz16674.
bzimport added a subscriber: Unknown Object (MLST).

Ouch, only 3 are current. It may be difficult to recover the rest cleanly. :(

mysql> select page_id,page_latest from page,revision where page_latest=rev_id and rev_page=0;
+---------+-------------+
| page_id | page_latest |
+---------+-------------+
|   43896 |      133785 |
|   43898 |      133789 |
|   43910 |      133813 |
+---------+-------------+
3 rows in set (0.13 sec)
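
The query above can be tried out against a toy schema. What follows is a minimal Python sketch using an in-memory SQLite database. The tables are cut down to the columns the query touches, and the IDs and the helper names `pages_with_orphaned_latest` and `demo_connection` are made up for illustration, so it shows only the join, not the production schema.

```python
import sqlite3


def pages_with_orphaned_latest(conn):
    """Return (page_id, page_latest) for pages whose current revision has rev_page = 0."""
    query = """
        SELECT page_id, page_latest
        FROM page
        JOIN revision ON page_latest = rev_id
        WHERE rev_page = 0
        ORDER BY page_id
    """
    return conn.execute(query).fetchall()


def demo_connection():
    # Toy data (assumption): page 2 points at revision 20, which has rev_page = 0.
    # Revision 30 is orphaned too, but no page references it as its latest revision.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_latest INTEGER);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
        INSERT INTO page VALUES (1, 10), (2, 20);
        INSERT INTO revision VALUES (10, 1), (20, 0), (30, 0);
    """)
    return conn
```

Only orphaned revisions that some page still names in page_latest show up here, which is why the result can be much smaller than the total number of rev_page=0 rows.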

On sourceswiki, there are now 245 revisions with rev_page=0; the extra one has a date stamp of 20081627
On enws there are two revisions with rev_page=0, both with datestamp of 20031218
For comparison, on enwp there are 709 revisions with rev_page=0

There are 318 orphaned revisions on mediawikiwiki. This was the cause of bug 43625.

Looks like new revs are still getting assigned to page_id 0 somehow.

Bumping severity per comment 4

Within the past two years, sorted by latest timestamp:

max(rev_timestamp) - count(*) - dbname
20150314151620 - 1300 - knwiki
20150123102603 - 89 - orwiki
20150109170449 - 9 - enwikivoyage
20141109142220 - 36 - fawikivoyage
20141016100102 - 502 - maiwiki
20140923170000 - 5 - idwiktionary
20140609215946 - 403 - metawiki
20140430194021 - 290 - hewikivoyage
20140421205055 - 420 - itwikisource
20131222123349 - 32 - zhwikivoyage
20131124123227 - 99 - tewiki
20131030141235 - 412 - nowiki
20131013172657 - 2104 - dewiki
20130729220133 - 2470 - mlwiki
20130723065815 - 6 - enwiktionary
20130718033244 - 26 - viwikivoyage
20130506140648 - 128 - simplewiki
20130329211512 - 26 - testwiki
20130329012222 - 4 - glwiktionary
Nemo_bis subscribed.

Found while looking into T111605: many of those revisions have rev_user = 0 and a rev_user_text which is not an IP address, hence they were likely imported.

MariaDB [dewiki_p]> select count(rev_id) from revision where rev_page = 0;
+---------------+
| count(rev_id) |
+---------------+
|          2104 |
+---------------+
1 row in set (0.00 sec)

MariaDB [dewiki_p]> select count(rev_id) from revision where rev_page = 0 and rev_user = 0;
+---------------+
| count(rev_id) |
+---------------+
|           908 |
+---------------+
1 row in set (0.01 sec)

MariaDB [dewiki_p]> select count(rev_id) from revision where rev_page = 0 and rev_user = 0 AND rev_user_text RLIKE "[[:alpha:]]{3}";
+---------------+
| count(rev_id) |
+---------------+
|           440 |
+---------------+
1 row in set (0.01 sec)
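
The "rev_user = 0 and rev_user_text is not an IP address" heuristic can also be expressed outside SQL. The sketch below is an assumption-laden illustration in Python: it uses the standard `ipaddress` module instead of the RLIKE pattern, and the function names are hypothetical. One difference worth noting is that `[[:alpha:]]{3}` also matches three consecutive hex letters, so an IPv6 address containing a group such as ABCD would be counted as a user name by the RLIKE query but not by this check.

```python
import ipaddress


def looks_like_ip(user_text):
    """True if user_text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user_text)
        return True
    except ValueError:
        return False


def likely_imported(rev_user, rev_user_text):
    """Heuristic only: no local account (rev_user = 0) and a name that is not an IP."""
    return rev_user == 0 and not looks_like_ip(rev_user_text)
```

For example, `likely_imported(0, "ExampleUser")` is true, while `likely_imported(0, "192.0.2.1")` is false because that row looks like an ordinary anonymous edit.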

Many of the revisions on knwiki with rev_page = 0 occurred when a user ran a simultaneous transwiki and upload import of the same pages. The transwiki import was of all revisions, while the upload import contained only the top revision. Several revisions during this import got imported with rev_page = 0. See the log. Obviously running two simultaneous imports of the same pages is not recommended.

On enwikivoyage, one of the affected pages was imported triply. See the log.

Indeed, I just reproduced this on testwiki (revisions 259876 and 259877) by clicking the submit button on Special:Import 3 times in a row.

A slightly different thing happened on orwiki. See the log. In this case, the import was working fine, until all of a sudden the pages started to be imported twice. I don't think we can blame the user for this.


I should note that the timestamps that @Krenair listed out are misleading. Since these edits are imported, each timestamp reflects when the edit was made on the source wiki, not when it was imported with rev_page = 0.

As for fixing the problem on WMF wikis, I would favour just deleting these rows from the DB. There's no point trying to reconnect them up with the pages they belong to.

Change 262078 had a related patch set uploaded (by TTO):
Import: Try to stop revisions getting created with rev_page = 0

https://gerrit.wikimedia.org/r/262078

Change 262078 merged by jenkins-bot:
Import: Try to stop revisions getting created with rev_page = 0

https://gerrit.wikimedia.org/r/262078

> As for fixing the problem on WMF wikis, I would favour just deleting these rows from the DB. There's no point trying to reconnect them up with the pages they belong to.

How many are they? Do we need a maintenance script and if so is there one already?

Change 263604 had a related patch set uploaded (by TTO):
Prevent revisions with rev_page = 0 from being inserted into the DB

https://gerrit.wikimedia.org/r/263604

> > As for fixing the problem on WMF wikis, I would favour just deleting these rows from the DB. There's no point trying to reconnect them up with the pages they belong to.
>
> How many are they? Do we need a maintenance script and if so is there one already?

legoktm@terbium:~$ cat rev_page_zero.txt | grep "=>" | grep -v "=> 0"
acewiki: [COUNT(*)] => 1
alswiki: [COUNT(*)] => 1
arwiki: [COUNT(*)] => 16
arwikiversity: [COUNT(*)] => 20
arwiktionary: [COUNT(*)] => 67
arzwiki: [COUNT(*)] => 158
azbwiki: [COUNT(*)] => 164
azwiki: [COUNT(*)] => 5
barwiki: [COUNT(*)] => 4
bawiki: [COUNT(*)] => 2
betawikiversity: [COUNT(*)] => 36
bewiki: [COUNT(*)] => 1
bgwiki: [COUNT(*)] => 1
bhwiki: [COUNT(*)] => 5
bjnwiki: [COUNT(*)] => 2
brwikimedia: [COUNT(*)] => 3
bswiki: [COUNT(*)] => 1
cawiki: [COUNT(*)] => 1
chywiki: [COUNT(*)] => 1025
commonswiki: [COUNT(*)] => 39
cowikimedia: [COUNT(*)] => 1
crwiki: [COUNT(*)] => 3
cswiki: [COUNT(*)] => 8
cswikisource: [COUNT(*)] => 75
cywikisource: [COUNT(*)] => 7
dewiki: [COUNT(*)] => 2105
dewikiversity: [COUNT(*)] => 706
dewikivoyage: [COUNT(*)] => 23
dewiktionary: [COUNT(*)] => 1
dsbwiki: [COUNT(*)] => 3
elwiki: [COUNT(*)] => 255
enwiki: [COUNT(*)] => 21
enwikibooks: [COUNT(*)] => 3308
enwikinews: [COUNT(*)] => 5
enwikisource: [COUNT(*)] => 4
enwikiversity: [COUNT(*)] => 64
enwikivoyage: [COUNT(*)] => 9
enwiktionary: [COUNT(*)] => 6
eswiki: [COUNT(*)] => 31
eswikibooks: [COUNT(*)] => 1
eswikinews: [COUNT(*)] => 1
eswikivoyage: [COUNT(*)] => 5
etwiki: [COUNT(*)] => 1
euwiki: [COUNT(*)] => 1
extwiki: [COUNT(*)] => 100
fawiki: [COUNT(*)] => 24
fawikinews: [COUNT(*)] => 1
fawikivoyage: [COUNT(*)] => 36
fiwiki: [COUNT(*)] => 3
foundationwiki: [COUNT(*)] => 24
frrwiki: [COUNT(*)] => 225
frwiki: [COUNT(*)] => 23
frwikibooks: [COUNT(*)] => 45
frwikinews: [COUNT(*)] => 1
frwikiquote: [COUNT(*)] => 3
frwikisource: [COUNT(*)] => 2
frwiktionary: [COUNT(*)] => 184
glwiktionary: [COUNT(*)] => 4
gomwiki: [COUNT(*)] => 29
guwiki: [COUNT(*)] => 1308
hewiki: [COUNT(*)] => 1
hewikiquote: [COUNT(*)] => 165
hewikisource: [COUNT(*)] => 34
hewikivoyage: [COUNT(*)] => 290
hifwiki: [COUNT(*)] => 5
hiwiki: [COUNT(*)] => 314
huwiki: [COUNT(*)] => 16
huwikinews: [COUNT(*)] => 2
idwiki: [COUNT(*)] => 9
idwiktionary: [COUNT(*)] => 5
incubatorwiki: [COUNT(*)] => 787
itwiki: [COUNT(*)] => 64
itwikibooks: [COUNT(*)] => 2
itwikisource: [COUNT(*)] => 420
itwikiversity: [COUNT(*)] => 53
itwikivoyage: [COUNT(*)] => 1673
itwiktionary: [COUNT(*)] => 23
jawiki: [COUNT(*)] => 9
jawikiversity: [COUNT(*)] => 7
kaawiki: [COUNT(*)] => 1
kawiki: [COUNT(*)] => 1
kbdwiki: [COUNT(*)] => 33
kkwiki: [COUNT(*)] => 4
kmwiki: [COUNT(*)] => 13
knwiki: [COUNT(*)] => 1300
knwikisource: [COUNT(*)] => 1
koiwiki: [COUNT(*)] => 8
kowikiversity: [COUNT(*)] => 962
krcwiki: [COUNT(*)] => 1
kshwiki: [COUNT(*)] => 1
ladwiki: [COUNT(*)] => 1
liwiktionary: [COUNT(*)] => 1
ltgwiki: [COUNT(*)] => 29
ltwiki: [COUNT(*)] => 1
ltwiktionary: [COUNT(*)] => 1
maiwiki: [COUNT(*)] => 516
mdfwiki: [COUNT(*)] => 1
mediawikiwiki: [COUNT(*)] => 320
metawiki: [COUNT(*)] => 403
mhrwiki: [COUNT(*)] => 7
mkwiki: [COUNT(*)] => 1
mlwiki: [COUNT(*)] => 2503
mlwiktionary: [COUNT(*)] => 3
mrjwiki: [COUNT(*)] => 1
mrwiki: [COUNT(*)] => 1
mwlwiki: [COUNT(*)] => 5
mznwiki: [COUNT(*)] => 1766
napwiki: [COUNT(*)] => 1
nlwiki: [COUNT(*)] => 16
nlwikisource: [COUNT(*)] => 2
nlwiktionary: [COUNT(*)] => 1
nowiki: [COUNT(*)] => 412
nsowiki: [COUNT(*)] => 4
nycwikimedia: [COUNT(*)] => 372
officewiki: [COUNT(*)] => 4
orwiki: [COUNT(*)] => 89
outreachwiki: [COUNT(*)] => 200
pflwiki: [COUNT(*)] => 2
plwikibooks: [COUNT(*)] => 1
plwikisource: [COUNT(*)] => 1
pnbwiki: [COUNT(*)] => 16
pnbwiktionary: [COUNT(*)] => 5
pntwiki: [COUNT(*)] => 6
ptwiki: [COUNT(*)] => 33
ptwikinews: [COUNT(*)] => 1
ptwikisource: [COUNT(*)] => 1
ptwikiversity: [COUNT(*)] => 12
quwiki: [COUNT(*)] => 1
rowiki: [COUNT(*)] => 4
ruewiki: [COUNT(*)] => 4
ruwiki: [COUNT(*)] => 81
ruwikimedia: [COUNT(*)] => 17
ruwikisource: [COUNT(*)] => 3
ruwikiversity: [COUNT(*)] => 5
sahwiki: [COUNT(*)] => 5
sawikiquote: [COUNT(*)] => 3
sawiktionary: [COUNT(*)] => 17
sewiki: [COUNT(*)] => 299
sewikimedia: [COUNT(*)] => 1
simplewiki: [COUNT(*)] => 128
simplewiktionary: [COUNT(*)] => 10
skwiki: [COUNT(*)] => 1
skwikisource: [COUNT(*)] => 67
slwiki: [COUNT(*)] => 1
sourceswiki: [COUNT(*)] => 245
sqwikinews: [COUNT(*)] => 5
svwiki: [COUNT(*)] => 5
svwikiversity: [COUNT(*)] => 11
szlwiki: [COUNT(*)] => 2
tawiki: [COUNT(*)] => 1
test2wiki: [COUNT(*)] => 1
testwiki: [COUNT(*)] => 28
tetwiki: [COUNT(*)] => 13
tewiki: [COUNT(*)] => 99
thwiki: [COUNT(*)] => 2
tlwiki: [COUNT(*)] => 2
tpiwiki: [COUNT(*)] => 68
trwiki: [COUNT(*)] => 7
ukwiki: [COUNT(*)] => 2
urwiki: [COUNT(*)] => 3
vecwikisource: [COUNT(*)] => 276
vepwiki: [COUNT(*)] => 5
viwiki: [COUNT(*)] => 503
viwikibooks: [COUNT(*)] => 106
viwikivoyage: [COUNT(*)] => 26
wikidatawiki: [COUNT(*)] => 3
wikimania2014wiki: [COUNT(*)] => 129
wuuwiki: [COUNT(*)] => 1
yiwikisource: [COUNT(*)] => 1
zhwikisource: [COUNT(*)] => 94
zhwikivoyage: [COUNT(*)] => 32
zhwiktionary: [COUNT(*)] => 1
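
To rank the wikis by how many rows would need cleaning up, a listing like this can be parsed mechanically. The following is a rough Python sketch, assuming each line has the form `dbname: [COUNT(*)] => N`; the function names are hypothetical and the zero-count line in the usage example is invented to show the filtering.

```python
import re

# One line per wiki, e.g. "dewiki: [COUNT(*)] => 2105" (format assumed from the listing).
LINE_RE = re.compile(r"^(?P<db>\w+): \[COUNT\(\*\)\] => (?P<count>\d+)$")


def parse_counts(lines):
    """Map dbname -> count, skipping malformed lines and wikis with a count of 0."""
    counts = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group("count")) > 0:
            counts[match.group("db")] = int(match.group("count"))
    return counts


def largest_first(counts):
    """Return (dbname, count) pairs sorted by descending count."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Usage, with three lines taken from the listing and one made-up zero line:

```python
sample = [
    "acewiki: [COUNT(*)] => 1",
    "enwikibooks: [COUNT(*)] => 3308",
    "dewiki: [COUNT(*)] => 2105",
    "examplewiki: [COUNT(*)] => 0",  # hypothetical, dropped by the filter
]
print(largest_first(parse_counts(sample)))
```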

We can use runBatchedQuery.php to delete the rows.
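
The idea of deleting in batches, rather than in one large statement, can be sketched as follows. This is not how runBatchedQuery.php is implemented; it is a minimal Python illustration against SQLite, with a toy `revision` table and a hypothetical function name, showing only the loop of "delete up to N matching rows, commit, repeat until nothing is left".

```python
import sqlite3


def delete_orphans_in_batches(conn, batch_size=100):
    """Delete rows with rev_page = 0 in chunks of at most batch_size; return the total deleted."""
    total = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM revision WHERE rev_id IN "
            "(SELECT rev_id FROM revision WHERE rev_page = 0 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # keep each transaction small
        if cursor.rowcount == 0:
            break
        total += cursor.rowcount
    return total
```

On a replicated production database one would presumably also pause between batches to let replicas catch up, and take a backup of the affected rows first; neither is shown here.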

Please make sure I am near before running a script that deletes rows so we have proper backups/reversion process. 0:-)

Change 263604 merged by jenkins-bot:
Prevent revisions with rev_page = 0 from being inserted into the DB

https://gerrit.wikimedia.org/r/263604