Author: wikibugs
Description:
I imported the page, pagelinks and redirect tables from the 20120702 dump in order to analyze the pagelink structure. For this i need to resolve redirects and found some weirdness in the process:
There are some pages in the redirect table which don't have the page.page_is_redirect flag set:
select * from redirect join page on rd_from=page_id where page_is_redirect != 1 limit 100;
Results in 23 rows: http://p.defau.lt/?smGJYbXXlqwLPhreqbe8VA
Another weirdness: there are pages marked as page.page_is_redirect but don't appear in the redirect table, even though that might be explained here: http://www.mediawiki.org/wiki/Manual:Redirect_table (see the NOTE as of Aug 2007)
select * from page left join redirect on page_id=rd_from where page_is_redirect != 0 and rd_from is null limit 100;
Results in 5 rows: http://p.defau.lt/?pwJZMIVtTGGZYb69o8YWxg
Matthew_ from #wikimedia-tech ran these queries on the toolserver and seems the same problems exist in production.
Yet another weirdness can be found in the pagelinks table: there are ~25000 pagelinks for redirect pages which don't just link to the redirected page but to other pages alone in the article namespace:
select count(1) from page inner join redirect on page_id = rd_from inner join pagelinks on rd_from=pl_from and rd_namespace=pl_namespace where pl_namespace = 0 and rd_title != pl_title;
+----------+
count(1) |
+----------+
25168 |
+----------+
1 row in set (1 hour 44 min 35.16 sec)
Would be cool if someone could have a look.
Version: unspecified
Severity: normal