
site.preloadpages does not preload all links and templates
Closed, Resolved (Public)

Description

When def preloadpages(self, pagelist, groupsize=50, templates=False, langlinks=False) is called with templates=True and langlinks=True, not all langlinks and templates are returned.

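import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Main Page')

# Preload the page together with its templates and langlinks.
for p in site.preloadpages([page], templates=True, langlinks=True):
    pass

# _templates and _langlinks are private attributes populated by the preload.
print('p._templates %d' % len(page._templates))
print('p._langlinks %d' % len(page._langlinks))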

There are actually more; see https://en.wikipedia.org/w/api.php?maxlag=5&format=jsonfm&rvprop=ids|flags|timestamp|user|comment|content&prop=revisions|info|categoryinfo|templates|langlinks&titles=Main+Page&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo|hasmsg


Version: core-(2.0)
Severity: normal

Details

Reference
bz60206

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:04 AM)
bzimport set Reference to bz60206.

Retrieving 1 pages from wikipedia:en.
p._templates 10
p._langlinks 10

The actual query used is https://en.wikipedia.org/w/api.php?maxlag=5&format=json&rvprop=ids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Ccontent&prop=revisions%7Cinfo%7Ccategoryinfo%7Ctemplates%7Clanglinks&titles=Main+Page&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo%7Chasmsg

i.e.
maxlag: 5
format: json
rvprop: ids|flags|timestamp|user|comment|content
prop: revisions|info|categoryinfo|templates|langlinks
titles: Main Page
meta: userinfo
indexpageids:
action: query
uiprop: blockinfo|hasmsg
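
For reference, the parameter list above can be turned back into the request URL with the Python standard library. This is only a minimal sketch and not how pywikibot builds its requests; the helper name build_query_url is made up for illustration.

```python
from urllib.parse import urlencode

API_ENDPOINT = 'https://en.wikipedia.org/w/api.php'

# Parameters copied from the list above, in the same order.
PARAMS = {
    'maxlag': 5,
    'format': 'json',
    'rvprop': 'ids|flags|timestamp|user|comment|content',
    'prop': 'revisions|info|categoryinfo|templates|langlinks',
    'titles': 'Main Page',
    'meta': 'userinfo',
    'indexpageids': '',
    'action': 'query',
    'uiprop': 'blockinfo|hasmsg',
}


def build_query_url(params, endpoint=API_ENDPOINT):
    """Return the GET URL for the given API parameters (hypothetical helper)."""
    # urlencode percent-encodes '|' as %7C and turns spaces into '+'.
    return endpoint + '?' + urlencode(params)


print(build_query_url(PARAMS))
```

Neither templates nor langlinks has an explicit limit here (no tllimit or lllimit), which is consistent with each list stopping at 10 entries in the output above.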

It's clear that not all results are returned (see the continue header), but according to Yuri, the continue header used here is broken (this is https://www.mediawiki.org/wiki/API:Legacy_Query_Continue instead of https://www.mediawiki.org/wiki/API:Query#Continuing_queries).

Is it an option to migrate to https://www.mediawiki.org/wiki/API:Query#Continuing_queries? This is supported only in MediaWiki ≥ 1.21.
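
To make the question concrete, here is a rough sketch of the newer continuation scheme: every response that is not the last one carries a 'continue' object, and its contents are merged into the next request. The function name query_all and the fetch callable are assumptions for illustration, not pywikibot code, and the responses in the demo are fabricated. As far as I understand, the empty continue parameter is what opts in to the new format on the early MediaWiki versions that support it.

```python
def query_all(fetch, request):
    """Yield the 'query' part of every batch, following 'continue'.

    fetch is any callable that takes a dict of API parameters and
    returns the decoded JSON response (hypothetical interface).
    """
    params = dict(request, action='query', format='json')
    params['continue'] = ''  # assumption: opts in to new-style continuation
    last_continue = {}
    while True:
        batch_params = dict(params)
        batch_params.update(last_continue)  # resume where the last batch ended
        result = fetch(batch_params)
        if 'error' in result:
            raise RuntimeError(result['error'])
        if 'query' in result:
            yield result['query']
        if 'continue' not in result:
            break  # no continuation data: this was the last batch
        last_continue = result['continue']


def fake_fetch(params):
    """Stand-in for the HTTP call; returns two made-up batches."""
    if 'tlcontinue' not in params:
        return {
            'query': {'pages': {'1': {'templates': [{'title': 'Template:A'}]}}},
            'continue': {'tlcontinue': '1|10|B', 'continue': '||'},
        }
    return {'query': {'pages': {'1': {'templates': [{'title': 'Template:B'}]}}}}


batches = list(query_all(fake_fetch, {'prop': 'templates', 'titles': 'Main Page'}))
print(len(batches))
```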

After re-reading the Legacy Query Continue page, I think supporting that in this case is not a huge hassle - we don't use a generator, so there is no need to separate the different query-continue parameters...
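
A sketch of what that could look like: in the legacy format the continuation values are grouped per module under 'query-continue', and without a generator they can simply be flattened into one dict and added to the next request. The helper name legacy_continue_params and the sample response are made up for illustration; this reflects my reading of the Legacy Query Continue page, not existing pywikibot code.

```python
def legacy_continue_params(response):
    """Flatten a legacy 'query-continue' block into request parameters.

    Only valid when no generator is involved; with a generator the
    per-module values would have to be handled separately.
    """
    merged = {}
    for module_params in response.get('query-continue', {}).values():
        merged.update(module_params)
    return merged


# Fabricated response in the legacy shape, one entry per prop module.
sample = {
    'query-continue': {
        'templates': {'tlcontinue': '1|10|Example'},
        'langlinks': {'llcontinue': '1|bg'},
    },
    'query': {'pages': {}},
}

next_params = legacy_continue_params(sample)
print(sorted(next_params))
```

An empty result from this helper means the response had no 'query-continue' block, i.e. nothing is left to fetch.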

There are 2 issues.

  1. The query is not continued, because self.continuekey is not recognized (see https://bugzilla.wikimedia.org/show_bug.cgi?id=55193).
  2. Even if it were, multiple chunks would be yielded for each page, and api.update_page() records only the last one returned.
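
The second issue could be addressed by accumulating the chunks per page before the page object is updated. The following is only a sketch of that idea, not the patch that was merged; merge_page_chunks and the sample batches are hypothetical.

```python
def merge_page_chunks(batches, list_props=('templates', 'langlinks')):
    """Combine the per-page data of several continued batches.

    List-valued props named in list_props are concatenated across
    batches; every other key keeps the most recently seen value.
    """
    merged = {}
    for batch in batches:
        for pageid, data in batch.get('pages', {}).items():
            page = merged.setdefault(pageid, {})
            for key, value in data.items():
                if key in list_props:
                    page.setdefault(key, []).extend(value)
                else:
                    page[key] = value
    return merged


# Two fabricated chunks for the same page, as continued queries would return.
chunks = [
    {'pages': {'1': {'title': 'Main Page',
                     'templates': [{'title': 'Template:A'}],
                     'langlinks': [{'lang': 'ar'}]}}},
    {'pages': {'1': {'title': 'Main Page',
                     'templates': [{'title': 'Template:B'}]}}},
]

pages = merge_page_chunks(chunks)
print(len(pages['1']['templates']), len(pages['1']['langlinks']))
```

Keeping only the last chunk, as described in point 2, would have left this page with a single template and no langlinks.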

Change 110067 had a related patch set uploaded by Mpaa:
Bug 60206 - site.preloadpages does not preload all links and templates

https://gerrit.wikimedia.org/r/110067

Change 110067 merged by jenkins-bot:
Bug 60206 - site.preloadpages does not preload all links and templates

https://gerrit.wikimedia.org/r/110067