
redirects to other wiki erroneously lead to CircularRedirect
Closed, Resolved, Public

Description

python scripts/redirect.py double -family:wikisource -lang:en

Retrieving special page...

The Army and Navy Hymnal/Catholic/Tantum ergo Sacramentum <<<

ERROR: Page [[en:The Army and Navy Hymnal/Catholic/Tantum ergo Sacramentum]] is a circular redirect.
Skipping [[en:The Army and Navy Hymnal/Catholic/Tantum ergo Sacramentum]].

Page content is:
The Army and Navy Hymnal/Catholic/Tantum ergo Sacramentum
Redirect to:

la:The Army and Navy Hymnal/Catholic/Tantum Ergo

The Army and Navy Hymnal/Catholic/Tantum Ergo
Redirect to:

la:The Army and Navy Hymnal/Catholic/Tantum Ergo

The prefix la: is not considered, so the script assumes the target is on the en: site.


Version: core-(2.0)
Severity: minor
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=39492

Details

Reference
bz73184

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 4:00 AM)
bzimport set Reference to bz73184.
bzimport added a subscriber: Unknown Object (????).

The problem is not that 'la:' is not considered but that the redirect target is not a page on that site. If you compare the request with "Main Page:English" (which is a redirect to "Main Page") you get:

http://en.wikisource.org/w/api.php?action=query&prop=pageprops&titles=Main%20Page:English&redirects=

While your example gets the following:

http://en.wikisource.org/w/api.php?action=query&prop=pageprops&titles=The%20Army%20and%20Navy%20Hymnal/Catholic/Tantum%20Ergo&redirects=

The problem is that if there is no 'pages' entry in the result, the code treats that as a circular redirect. But a legitimate double redirect doesn't contain a 'pages' entry either:

http://test.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Double&redirects=

So I think the only way is to make the comparison more intelligent and only declare a circular redirect if pywikibot goes over each redirect and finds an already seen page. Or use the link parser to detect whether the target is an interwiki link.
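A minimal sketch of that idea, not the pywikibot implementation: walk the chain one hop at a time, remember the titles already visited, and report a circular redirect only when a title repeats. The function name, the `redirects` mapping (standing in for one API lookup per hop) and the prefix set are all assumptions made for illustration.

```python
def classify_redirect(start, redirects, interwiki_prefixes):
    """Walk a redirect chain and report how it ends.

    `redirects` is a hypothetical dict mapping a title to its redirect
    target; it stands in for asking the API about one hop at a time.
    `interwiki_prefixes` is an assumed set of known interwiki prefixes.
    """
    seen = {start}
    current = start
    while current in redirects:
        target = redirects[current]
        # A known interwiki prefix means the target lives on another site,
        # so the chain cannot be followed any further on this one.
        prefix, sep, _ = target.partition(":")
        if sep and prefix in interwiki_prefixes:
            return ("interwiki", target)
        # Only a title we have already visited makes the chain circular.
        if target in seen:
            return ("circular", target)
        seen.add(target)
        current = target
    return ("resolved", current)
```

With this approach a double redirect ends as "resolved", a redirect to la: ends as "interwiki", and only a real loop is reported as "circular". A namespace such as "Help:" is not mistaken for an interwiki link as long as it is not in the prefix set.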

[[s:en:The Army and Navy Hymnal/Catholic/Tantum ergo Sacramentum]] is an interlanguage redirect.

And this is why interwiki links are, by policy but not by software, not allowed in redirects, and should be replaced with {{soft redirect}}. The MediaWiki bug for this is bug 39492.

I've lowered the priority because the script skips this page, and it should skip this page; only the output is wrong.

This problem dates back to the original 'core' function getredirtarget, lines 547-548:

http://git.wikimedia.org/blobdiff/pywikibot%2Fcore.git/852973b62c9d597db5fb0f6efd98b4aeafc0a303/pywikibot%2Fsite.py

if "pages" not in result['query']:
    # no "pages" element indicates a circular redirect
    raise pywikibot.CircularRedirect....

The result doesn't have 'pages' because en.ws can't return content that is on another wiki, la.ws.

{"query":{"redirects":[{"from":"The Army and Navy Hymnal/Catholic/Tantum ergo Sacramentum","to":"la:The Army and Navy Hymnal/Catholic/Tantum Ergo"}],"userinfo":{"id":10823,"name":"JVbot"}}}
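As a rough illustration of how a response like the one above could be told apart from a true circular redirect, the sketch below inspects the 'redirects' list when 'pages' is absent. The function name and the prefix set are hypothetical; a real client would take the prefixes from the site's interwiki map rather than a hard-coded set.

```python
import json

# Assumption: a tiny illustrative subset of interwiki prefixes.
KNOWN_INTERWIKI_PREFIXES = {"la", "fr", "de"}


def interwiki_redirect_target(response_text, prefixes=KNOWN_INTERWIKI_PREFIXES):
    """Return the interwiki target named in an API response, or None.

    Only responses without a 'pages' entry are examined, since that is
    the case the old check reported as a circular redirect.
    """
    query = json.loads(response_text)["query"]
    if "pages" in query:
        return None
    for redirect in query.get("redirects", []):
        prefix, sep, _ = redirect["to"].partition(":")
        if sep and prefix.lower() in prefixes:
            return redirect["to"]
    return None
```

Applied to the response quoted above, this returns the la: target instead of concluding that the redirect is circular.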

There is a chance that other text in a #redirect might also cause the same error, but I doubt it.

Jayvdb, it is not clear to me whether this is a bug or a mistake by whoever created the redirect to another site.

In the first case, I have a fix that returns the page on the other site as the redirect target.
In the second case, we should close the bug.

@Mpaa, well, MediaWiki has limited support for interwiki links in redirects. There may even be a LocalSettings.php config variable to make interwiki links not automatically redirect.

Pywikibot needs to detect them and either access them normally or raise an appropriate exception, e.g. an InterwikiRedirectPage subclass of PageRelatedError.

I'd prefer that we consider it an invalid page and raise an exception, because I expect there is a lot of code which assumes that a redirect target will always be on the same site as the redirect. The API can't 'follow redirects' if the redirect is an interwiki link.
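The exception-based approach could look roughly like the sketch below. The two class names come from the comment above, but the classes here are local stand-ins so the example runs on its own; their constructors, the helper check_redirect_target and the prefix set are assumptions, not the code from the merged patch.

```python
class PageRelatedError(Exception):
    """Local stand-in for the base class named above (assumed shape)."""

    def __init__(self, page):
        self.page = page
        super().__init__(f"Error related to page {page}")


class InterwikiRedirectPage(PageRelatedError):
    """Raised when a redirect points to a page on another site."""

    def __init__(self, page, target):
        self.target = target
        super().__init__(page)

    def __str__(self):
        return f"Page {self.page} is a redirect to another site: {self.target}"


def check_redirect_target(page, target, interwiki_prefixes):
    """Hypothetical helper: return a same-site target, raise otherwise."""
    prefix, sep, _ = target.partition(":")
    if sep and prefix in interwiki_prefixes:
        raise InterwikiRedirectPage(page, target)
    return target
```

Callers that assume a same-site target then fail with a specific error they can catch and skip, instead of receiving a misleading circular redirect report.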

Patch: Change-Id: Ia0d4dadf713fb97572c5d482485858331bda5ea8

Change 172176 had a related patch set uploaded by Mpaa:
Introduce InterwikiRedirectPage

https://gerrit.wikimedia.org/r/172176

Change 172176 merged by jenkins-bot:
Introduce InterwikiRedirectPage

https://gerrit.wikimedia.org/r/172176

jayvdb assigned this task to Mpaa.
jayvdb set Security to None.
jayvdb removed a subscriber: Unknown Object (????).