
RepeatingGenerator intermittent failure on test.wikidata
Closed, Resolved (Public)

Description

We have seen a few intermittent failures of RepeatingGenerator. The most recent is on test.wikidata:

https://travis-ci.org/wikimedia/pywikibot-core/jobs/35913559

IIRC, the previous ones have also been on test.wikidata.

My guess is that there is insufficient recent change data on this wiki, and that this causes the generator to loop indefinitely.


Version: core-(2.0)
Severity: normal

Details

Reference
bz71121

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:45 AM
bzimport set Reference to bz71121.
bzimport added a subscriber: Unknown Object (????).

The easiest workaround is to remove the test, and when we come up with a better test that guarantees that it won't cause a failure in any case, we can add it later.

Another workaround is to simulate a stream of recentchanges / newpages somehow to prevent insufficient recent change data.
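
One way to simulate that stream, sketched under the assumption that the test can inject its own poll function (everything below is illustrative and not part of pywikibot):

```
import itertools


def make_fake_recentchanges():
    """Return a poll function that reveals one newer fake edit per call."""
    revids = itertools.count(1000)
    batch = []

    def fetch_recent():
        # Each call prepends one new namespace-0 edit, mimicking a wiki
        # that keeps being edited while the test runs.
        rid = next(revids)
        batch.insert(0, {'revid': rid, 'ns': 0, 'title': 'Page %d' % rid})
        return list(batch)

    return fetch_recent
```

The test would then point the generator at this fake source (or mock the site method it polls) instead of relying on the live wiki having fresh edits.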

If it's very urgent, you can remove the test right now. I can't code in the next few days.

It isn't urgent. It only happens occasionally, and only on one site. Another one today:
https://travis-ci.org/wikimedia/pywikibot-core/jobs/37035200

(In reply to Sorawee Porncharoenwase from comment #1)

The easiest workaround is to remove the test, and when we come up with a
better test that guarantees that it won't cause a failure in any case, we
can add it later.

Another workaround is to simulate a stream of recentchanges / newpages
somehow to prevent insufficient recent change data.

The best 'quick' way to do this is to move the test into a new class, and then in setUpClass skip the tests if there are not sufficient suitable recentchanges / newpages for the test to run against the live wiki.
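
A minimal sketch of that setUpClass guard, assuming a plain unittest TestCase rather than pywikibot's own test framework; the class name and the threshold of four edits are illustrative:

```
import itertools
import unittest

import pywikibot


class TestRepeatingGeneratorLive(unittest.TestCase):

    """RepeatingGenerator tests that need a live stream of recent changes."""

    @classmethod
    def setUpClass(cls):
        cls.site = pywikibot.Site('en', 'wikipedia')
        # Skip the whole class if the wiki has fewer than four recent
        # namespace 0 edits for the generator to work through.
        recent = list(itertools.islice(
            cls.site.recentchanges(namespaces=[0]), 4))
        if len(recent) < 4:
            raise unittest.SkipTest(
                'not enough recent changes on %s' % cls.site)
```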

@John Mark Vandenberg: class TestPageGenerators runs with family = 'wikipedia', code = 'en', doesn't it? Why should it run on test.wikidata?

Anyway, suppose that this bug really needs to be fixed:

(In reply to John Mark Vandenberg from comment #3)

The best 'quick' way to do this is to move the test into a new class, and then
in setUpClass skip the tests if there are not sufficient suitable
recentchanges / newpages for the test to run against the live wiki.

This is impossible because we don't know the future!

One way I can think of is that we might use multiprocessing to run the test while the main process waits for, say, fifteen seconds. If the process has not finished by that time, we terminate it and assume that it works correctly.
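
A rough sketch of that approach; `run_generator_test` is a placeholder for whatever actually consumes the generator:

```
import multiprocessing


def run_generator_test():
    """Placeholder: consume the expected number of items from the generator."""


def run_with_timeout(timeout=15):
    """Run the test body in a child process; give up after `timeout` seconds."""
    proc = multiprocessing.Process(target=run_generator_test)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Still waiting for new edits: terminate the child and treat the run
        # as "no failure detected" rather than letting the CI job hang.
        proc.terminate()
        proc.join()
```

The trade-off is that a genuine hang would be reported as a pass.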

(In reply to Sorawee Porncharoenwase from comment #4)

@John Mark Vandenberg: class TestPageGenerators runs with family =
'wikipedia', code = 'en', doesn't it? Why should it run on test.wikidata?

You're right; the test is supposed to be running against the site en.wikipedia.org, but a bug somewhere in pywikibot could mean that doesn't happen.

I find it hard to believe that en.wp doesn't have four namespace 0 edits on recentchanges for 10 minutes. In fact, 10 mins shouldn't even be required. If I understand correctly, this test is essentially asking the RC feed for four namespace 0 edits, any time in the past. This should always be an instant result.

(In reply to John Mark Vandenberg from comment #5)

recentchanges for 10 minutes. In fact, 10 mins shouldn't even be required.
If I understand correctly, this test is essentially asking the RC feed for
four namespace 0 edits, any time in the past. This should always be an
instant result.

This is wrong. RepeatingGenerator will ask the RC feed for the latest edit in the past (to be an indicator of "present") and three more edits in the future. Thus, it might not return an instant result for some sites. For English Wikipedia, however, it should return an instant result.
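
For readers following along, here is a toy model of the behaviour just described; it is not pywikibot's implementation, and the names are made up, but it shows why the generator blocks until newer edits appear:

```
import time


def repeating_generator(fetch_recent, key_func, total, sleep_duration=60):
    """Toy model: yield the newest existing edit, then wait for edits that
    are strictly newer, until `total` items have been yielded."""
    latest = next(iter(fetch_recent()))       # the one edit "in the past"
    yield latest
    last_key = key_func(latest)
    count = 1
    while count < total:
        # Keys (e.g. revision ids) are assumed to grow over time, so a key
        # larger than `last_key` marks an edit "in the future".
        newer = sorted((item for item in fetch_recent()
                        if key_func(item) > last_key), key=key_func)
        for item in newer:
            yield item
            last_key = key_func(item)
            count += 1
            if count >= total:
                return
        time.sleep(sleep_duration)
```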

OK, thanks for clarifying; it makes more sense now, but it still doesn't explain how it might take 10 mins to fetch 3 namespace 0 edits on enwp.

One way to avoid the problem is to add a timeout to RepeatingGenerator, so the caller can prevent it from locking up forever if new data doesn't arrive.

This weekend I will add a "timeout" parameter. It's not an elegant solution, though, because it would be just a workaround, hiding the real problem without fixing it. Some people might even disagree with this workaround.
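
A hedged sketch of what a caller-side timeout could look like (this is not the eventual patch, just one way to bound the wait):

```
import time


def limit_duration(generator, timeout):
    """Stop drawing items from `generator` once `timeout` seconds have passed."""
    deadline = time.time() + timeout
    for item in generator:
        yield item
        if time.time() >= deadline:
            return
```

Note that a wrapper like this only checks the clock between yields; if the inner generator is blocked waiting for new edits, only a timeout built into its own polling loop (or the multiprocessing approach above) can actually interrupt it.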

Couldn't we print additional information instead? I'd prefer that first, to determine whose fault it is (if there really are so few edits). For example, the start time would be interesting, and whether it had fetched any pages.

Change 171830 had a related patch set uploaded by John Vandenberg:
Disable cache for RepeatingGenerator tests

https://gerrit.wikimedia.org/r/171830

Change 171830 merged by jenkins-bot:
Disable cache for RepeatingGenerator tests

https://gerrit.wikimedia.org/r/171830

I'm pretty sure this is fixed now. Sorry I didn't notice this earlier.

So what's the problem? Cache?

Yes. TestRequest was forcing all subsequent queries to return the same result, consisting of the same pages, so it would never find new pages to yield.
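
Reusing the toy `repeating_generator` model from earlier in this thread, the failure mode is easy to reproduce: a cached fetch function always returns the same batch, so nothing ever carries a newer key and the generator waits forever (names are illustrative):

```
CACHED_BATCH = [{'revid': 103}, {'revid': 102}, {'revid': 101}]


def cached_fetch():
    # Stand-in for a TestRequest-style cache: every call returns the
    # exact same recentchanges result.
    return CACHED_BATCH


gen = repeating_generator(cached_fetch, key_func=lambda rc: rc['revid'],
                          total=4, sleep_duration=0)
first = next(gen)   # revid 103 is yielded immediately
# Asking for a second item would loop forever, because no revid in the
# cached batch can ever exceed 103.
```

Disabling the cache for these tests lets each poll reach the live API and see genuinely new edits, which is what the merged change does.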