
IWM: missing language in family causes exception in Page.langlinks
Closed, ResolvedPublic

Description

On en.wowwiki:

ERROR: testLinks (tests.page_tests.TestPageObject)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/page_tests.py", line 469, in testLinks
    for p in mainpage.langlinks():
  File "pywikibot/page.py", line 1189, in langlinks
    self._langlinks = list(self.iterlanglinks(include_obsolete=True))
  File "pywikibot/site.py", line 2987, in pagelanglinks
    source=self)
  File "pywikibot/page.py", line 4386, in langlinkUnsafe
    link._site = pywikibot.Site(lang, source.family.name)
  File "pywikibot/__init__.py", line 573, in Site
    _sites[key] = interface(code=code, fam=fam, user=user, sysop=sysop)
  File "pywikibot/site.py", line 1422, in __init__
    BaseSite.__init__(self, code, fam, user, sysop)
  File "pywikibot/site.py", line 451, in __init__
    % (self.__code, self.__family.name))
UnknownSite: Language nn does not exist in family wowwiki

Version: core-(2.0)
Severity: critical

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:49 AM
bzimport set Reference to bz73534.
bzimport added a subscriber: Unknown Object (????).

On en.vikidia

ERROR: testLinks (tests.page_tests.TestPageObject)
Traceback (most recent call last):
  File "tests/page_tests.py", line 469, in testLinks
    for p in mainpage.langlinks():
  File "pywikibot/page.py", line 1185, in langlinks
    self._langlinks = list(self.iterlanglinks(include_obsolete=True))
  File "pywikibot/site.py", line 2983, in pagelanglinks
    source=self)
  File "pywikibot/page.py", line 4382, in langlinkUnsafe
    link._site = pywikibot.Site(lang, source.family.name)
  File "pywikibot/__init__.py", line 573, in Site
    _sites[key] = interface(code=code, fam=fam, user=user, sysop=sysop)
  File "pywikibot/site.py", line 1422, in __init__
    BaseSite.__init__(self, code, fam, user, sysop)
  File "pywikibot/site.py", line 451, in __init__
    % (self.__code, self.__family.name))
UnknownSite: Language de does not exist in family vikidia

Well in the case of vikidia it assumes that the language code 'de' is a part of the family, but the Special:Interwiki page reveals that it's actually another wiki (http://grundschulwiki.zum.de/wiki/Hauptseite , although I don't know if they are affiliated): http://en.vikidia.org/wiki/Special:Interwiki

And the mainpage of the English Vikidia shows an interwiki link to 'de:': http://en.vikidia.org/w/index.php?title=Main_Page&action=edit

I'm not sure what the solution there is, because it seems like a method which analyses it without any prior checking. The most flexible way would be to use 'APISite.interwiki()' which would tell that there is no family for grundschulwiki.zum.de.
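To make the idea concrete, here is a minimal sketch of resolving a langlink prefix through the interwiki table instead of assuming it is a language code of the source family. It does not call pywikibot at all: INTERWIKI_MAP, FAMILY_HOSTS and resolve_prefix are hypothetical stand-ins for the wiki's interwiki table and the loaded family files, and the fr.vikidia.org entry is only an assumed example.

```python
from urllib.parse import urlparse

# Hypothetical data: a real implementation would read the first mapping
# from the wiki's interwiki table and the second from the family files.
INTERWIKI_MAP = {
    "fr": "https://fr.vikidia.org/wiki/$1",
    "de": "http://grundschulwiki.zum.de/wiki/$1",
}
FAMILY_HOSTS = {
    "vikidia": {"en.vikidia.org": "en", "fr.vikidia.org": "fr"},
}


def resolve_prefix(prefix, family):
    """Return (family, code) if the prefix targets a site of the family.

    Return None if the prefix is a valid interwiki prefix that points
    outside the family. Raise KeyError for an unknown prefix.
    """
    url = INTERWIKI_MAP[prefix]  # KeyError: not an interwiki prefix
    host = urlparse(url).hostname
    code = FAMILY_HOSTS.get(family, {}).get(host)
    if code is None:
        return None
    return (family, code)
```

With this data, 'fr' resolves to a vikidia site, while 'de' is recognised as a link to a wiki outside the family rather than raising an error.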

(In reply to Fabian from comment #2)

Well in the case of vikidia it assumes that the language code 'de' is a part
of the family, but the Special:Interwiki page reveals that it's actually
another wiki (http://grundschulwiki.zum.de/wiki/Hauptseite , although I
don't know if they are affiliated):
http://en.vikidia.org/wiki/Special:Interwiki

They are affiliated.

Wowwiki is the same: the 'family' is split across several domains - see our family file for wowwiki.

Okay, I'm checking the source code for 'from_url' support. It currently supports all sites which use the "<code>.url" scheme, which vikidia does not follow.

But otherwise it should be possible to just add an entry 'de' to the langs dictionary of vikidia.

The wowwiki family also needs to be checked for 'from_url' support.

Okay, never mind: from_url is flexible enough for that. So are there any caveats to simply adding the missing languages?

The problem is that adding missing languages isn't possible after the library is released on PyPI. Options include:

  1. automatically find new subdomains in the Family layer (e.g. https://gerrit.wikimedia.org/r/#/c/171616/), or
  2. load Site objects which are not in Family.langs (https://gerrit.wikimedia.org/r/#/c/170931/), or
  3. don't package family files with the library on PyPI. They could be a separate package (this sounds like it should be done anyway).

Well, we (or pywikibot for that matter) can't know if a site is part of a family. How could we know that the site linked with 'de' on en.vikidia is in the same family as vikidia itself?

We could assume that ISO language codes are part of the family, as those are shown in the sidebar by default. But apart from that we can't tell: e.g. test.wikidata is in the wikidata family although 'test' is not an ISO language code.

Assuming ISO language codes are part of the family would be quite a sophisticated strategy. That type of logic would be easily applied when creating the Link object; e.g. creating a Link object even if no Site object can be created.
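A rough sketch of the "create a Link object even if no Site object can be created" idea follows. It is self-contained and does not use pywikibot: UnknownSite here is a local stand-in for the exception in the traceback, and KNOWN_CODES, make_site, LangLink and langlinks are hypothetical names with made-up data.

```python
class UnknownSite(Exception):
    """Local stand-in for the UnknownSite exception in the traceback."""


# Hypothetical set of codes a family file declares.
KNOWN_CODES = {"wowwiki": {"en", "de", "fr"}}


def make_site(code, family):
    """Mimic Site creation: fail for codes the family does not know."""
    if code not in KNOWN_CODES.get(family, set()):
        raise UnknownSite(
            "Language %s does not exist in family %s" % (code, family))
    return (family, code)


class LangLink:
    """A language link that may or may not have a resolvable site."""

    def __init__(self, code, title, family):
        self.code = code
        self.title = title
        try:
            self.site = make_site(code, family)
        except UnknownSite:
            # Keep the link; the caller decides what to do without a site.
            self.site = None


def langlinks(raw_links, family):
    """Build links for every (code, title) pair without raising."""
    return [LangLink(code, title, family) for code, title in raw_links]
```

The point of the design is that iterating over a page's language links no longer aborts on the first unknown code; callers that need a Site can skip or report the links where site is None.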

A dumber version of that is for the family to register multiple domains / regexes in a class variable; the family class then assumes any matching domain name is a member of the family, and creates Site objects accordingly.
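As an illustration of the regex variant, a family could carry its domain patterns in a class variable and derive the site code from any matching host name. This is only a sketch under assumptions: RegexFamily, domain_patterns and code_for_host are hypothetical names, not part of pywikibot's Family class, and the single pattern simply assumes the <code>.vikidia.org layout seen in this task.

```python
import re


class RegexFamily:
    """Hypothetical family that recognises members by domain pattern."""

    name = "vikidia"
    # Assumed layout: every <code>.vikidia.org host belongs to the family.
    domain_patterns = [
        re.compile(r"^(?P<code>[a-z-]+)\.vikidia\.org$"),
    ]

    @classmethod
    def code_for_host(cls, host):
        """Return the site code for a matching host, else None."""
        for pattern in cls.domain_patterns:
            match = pattern.match(host)
            if match:
                return match.group("code")
        return None
```

Any host matching a pattern is treated as a family member without being listed in advance, and a host such as grundschulwiki.zum.de simply does not match.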

I expect we want to add a few classes to help us group types of families, and the functionality they contain. The most distinct type of family is the one with ISO codes for different languages of the project. MultilangFamily / ISOLangFamily?
Those families usually include a non-ISO-code project, e.g. meta.anarchopedia.org, beta.wikiversity.org, and www.wikisource.org. However, the last two could be/should be given the 'mul' ISO language code, and treated differently of course.
See mul.wikiversity.org (doesn't work) and mul.wikisource.org (redirect only): bug 41807 / bug 62717 / etc.

Xqt triaged this task as High priority. May 28 2017, 11:56 AM

Well in the case of vikidia it assumes that the language code 'de' is a part of the family, but the Special:Interwiki page reveals that it's actually another wiki (http://grundschulwiki.zum.de/wiki/Hauptseite , although I don't know if they are affiliated): http://en.vikidia.org/wiki/Special:Interwiki

And the mainpage of the English Vikidia shows an interwiki link to 'de:': http://en.vikidia.org/w/index.php?title=Main_Page&action=edit

I'm not sure what the solution there is, because it seems like a method which analyses it without any prior checking. The most flexible way would be to use 'APISite.interwiki()' which would tell that there is no family for grundschulwiki.zum.de.

As one of the Vikidia sysadmins I can clarify the point about the de: interwiki on Vikidia. Years ago, there used to be a de.vikidia.org, but grundschulwiki.zum.de, which was more active than de.vikidia.org, contacted us in order to be "partners". That's why de.vikidia.org isn't active anymore and all Vikidia "de" interwiki links now point to grundschulwiki.zum.de. It is in fact an interwiki link to a wiki outside of the Vikidia family. It is the same case with the nl: interwiki, which points to wikikids.nl.

Change 366543 had a related patch set uploaded (by Linedwell; owner: Linedwell):
[pywikibot/core@master] update families per T75534

https://gerrit.wikimedia.org/r/366543

I am wondering:
https://de.vikidia.org exists and is different from http://grundschulwiki.zum.de, which means removing 'de' from codes doesn't solve the problem, because 'de' is a real/valid site code.

I think the vikidia problem was already solved with https://gerrit.wikimedia.org/r/#/c/286274/:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. Alle Rechte vorbehalten.

C:\Users\Administrator>cd ..

C:\Users>cd ..

C:\>cd pwb/git/core

C:\pwb\GIT\core>pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> import pwb, pywikibot as py
>>> s = py.Site('de', 'vikidia')
>>> p = py.Page(s, 'Vikidia:Hauptseite')
>>> p.exists()
WARNING: C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\
urllib3\connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request
is being made. Adding certificate verification is strongly advised. See: https:/
/urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
True
>>>

Change 366543 abandoned by Xqt:
update families per T75534

Reason:
See T75534

https://gerrit.wikimedia.org/r/366543

Dvorapa renamed this task from missing language in family causes exception in Page.langlinks to IWM: missing language in family causes exception in Page.langlinks. Jun 4 2018, 7:14 PM
Dvorapa removed a project: Pywikibot-Interwiki-Map.
Dvorapa subscribed.

Archiving unused project

Change 487511 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [L10N] Update wowwiki family file

https://gerrit.wikimedia.org/r/487511

Change 487511 merged by jenkins-bot:
[pywikibot/core@master] [L10N] Update wowwiki family file

https://gerrit.wikimedia.org/r/487511