
Page._getVersionHistory returns only a part of a history
Closed, Resolved, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1546/
Reported by: dixond
Created on: 2012-11-28 13:00:50
Subject: Page._getVersionHistory returns only a part of a history
Assigned to: xqt
Original description:
There is a bug in Page._getVersionHistory. It doesn't load the whole history if it is large. The problem is here (wikipedia.py):
if len(result['query']['pages'].values()[0]['revisions']) < revCount:
    thisHistoryDone = True

I believe it should be as follows:
if not getAll and len(result['query']['pages'].values()[0]['revisions']) >= revCount:
    thisHistoryDone = True

Version.py:
Pywikipedia trunk/pywikipedia/ (r10745, 2012/11/20, 13:03:05)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1546

Details

Reference
bz55160

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:23 AM
bzimport set Reference to bz55160.

Are you sure that you have set getAll=True while invoking that method?

  • assigned_to: nobody --> xqt

Yes, of course. It is quite obvious that the following code prevents the rest of the revisions from loading, because it sets thisHistoryDone to True:
if len(result['query']['pages'].values()[0]['revisions']) < revCount:
    thisHistoryDone = True

Am I missing anything?

First of all, _getVersionHistory() is an internal method and you shouldn't use it directly. Use getVersionHistory() instead. The condition is quite right. Try the following statements:

import pywikibot as pwb
p = pwb.Page('de', 'user talk:xqt')
h = p.getVersionHistory(getAll=True)
len(h)

which gives 4250 entries (so far).

Changing the condition will return 500 entries only.

Changing the condition still returns 4250 entries for me (did you miss the "not getAll and " part in my code?)

But if I use fullVersionHistory instead of getVersionHistory, it returns only 192 entries for me. I.e. try the following code:

import wikipedia as pywikibot
p = pywikibot.Page('de', 'user talk:xqt')
h = p.fullVersionHistory(getAll=True)
print len(h)

Any updates? Are you able to reproduce this issue?

Change 105619 had a related patch set uploaded by Mpaa:
(bug 55160) Page._getVersionHistory returns only a part of a history

https://gerrit.wikimedia.org/r/105619

(In reply to comment #9)

h = p.getVersionHistory(getAll=True) returns the full history.

h = p.fullVersionHistory(getAll=True) returns 192 entries (now more ...).
The reason is that a result can contain fewer than revCount revisions even when 'query-continue' is returned, because the API may truncate it:

{u'result':{u'*': u'This result was truncated because it would otherwise be larger than the limit of 12582912 bytes'}}

So it is not enough to check only that len() < revCount to declare that thisHistoryDone = True.
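
To illustrate the point, here is a minimal sketch that compares the two stopping rules on simulated responses. It is not the code from the merged patch: fake_get_data, FAKE_RESPONSES, REV_COUNT, the 'rvstartid' continuation key and both history_* helpers are hypothetical names made up for this example, and the responses only mimic the shape of result['query']['pages'] used in the snippets above.

```python
REV_COUNT = 5  # hypothetical batch size requested per API call

# Simulated API responses keyed by the continuation value (assumption).
# The second batch is truncated (3 < REV_COUNT) but still carries
# 'query-continue', which is the situation described in this comment.
FAKE_RESPONSES = {
    None: {'query': {'pages': {'1': {'revisions': [1, 2, 3, 4, 5]}}},
           'query-continue': {'revisions': {'rvstartid': 6}}},
    6: {'query': {'pages': {'1': {'revisions': [6, 7, 8]}}},
        'query-continue': {'revisions': {'rvstartid': 9}}},
    9: {'query': {'pages': {'1': {'revisions': [9, 10]}}}},
}


def fake_get_data(params):
    """Hypothetical stand-in for the real API request."""
    return FAKE_RESPONSES[params.get('rvstartid')]


def history_by_length(get_data, rev_count):
    """Stop as soon as a batch is shorter than rev_count (length check)."""
    params, revisions = {}, []
    while True:
        result = get_data(params)
        batch = list(result['query']['pages'].values())[0]['revisions']
        revisions.extend(batch)
        if len(batch) < rev_count or 'query-continue' not in result:
            return revisions
        params.update(result['query-continue']['revisions'])


def history_by_continue(get_data):
    """Stop only when the response carries no 'query-continue' element."""
    params, revisions = {}, []
    while True:
        result = get_data(params)
        batch = list(result['query']['pages'].values())[0]['revisions']
        revisions.extend(batch)
        if 'query-continue' not in result:
            return revisions
        params.update(result['query-continue']['revisions'])


short_history = history_by_length(fake_get_data, REV_COUNT)
full_history = history_by_continue(fake_get_data)
```

With this simulated data, the length check gives up after the truncated second batch, while the loop keyed on 'query-continue' keeps going until the API stops offering a continuation.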

Change 105619 merged by jenkins-bot:
(bug 55160) Page._getVersionHistory returns only a part of a history

https://gerrit.wikimedia.org/r/105619