Page MenuHomePhabricator

Querying for rvdiffto=prev fails for many revids: "notcached"
Closed, DeclinedPublic

Description

Sending queries to Wikipedia APIs (in many languages) asking for diff fails when requesting several revids at once. Only the first revid will contain diff information. All the rest will say "notcached".

Example1:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&revids=1564795|2544909|213252&rvprop=ids&format=jsonfm&rvdiffto=prev

Example2:
http://tr.wikipedia.org/w/api.php?action=query&prop=revisions&revids=1564795|2544909|234634&rvprop=ids&format=jsonfm&rvdiffto=prev

Expected behavior:
A diff for every requested revId.

Actual behavior:
A diff for only the first revId and "notcached" for everything else.


Version: unspecified
Severity: normal

Details

Reference
bz29223
TitleReferenceAuthorSource BranchDest Branch
Data checks for data pipeline outputsrepos/structured-data/image-suggestions!2cparleT312235main
Customize query in GitLab

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 11:35 PM
bzimport set Reference to bz29223.
bzimport added a subscriber: Unknown Object (MLST).

Works for me

On the first one, I get the diffs for all 3

On the second one, I get diffs for 2, and notcached for the last

Both for logged in and anonymous

And the most it will give you is one uncached diff, per the site config

reedy@fenari:/home/wikipedia/common/php-1.17/maintenance$ php eval.php

echo $wgAPIMaxUncachedDiffs;

1

(In reply to comment #1)

Works for me

On the first one, I get the diffs for all 3

On the second one, I get diffs for 2, and notcached for the last

Both for logged in and anonymous

And the most it will give you is one uncached diff, per the site config

Sure. But if you keep refreshing, it should eventually give you all of them, because the uncached ones it generates should be saved to cache, right? Except that's not what happens, and that's what this bug is about.

(In reply to comment #2)

(In reply to comment #1)

Works for me

On the first one, I get the diffs for all 3

On the second one, I get diffs for 2, and notcached for the last

Both for logged in and anonymous

And the most it will give you is one uncached diff, per the site config

Sure. But if you keep refreshing, it should eventually give you all of them,
because the uncached ones it generates should be saved to cache, right? Except
that's not what happens, and that's what this bug is about.

Nice when people explain things ;)

Cause is simple, the diffs are not even attempted to be cached afterwards by the API

(In reply to comment #3)

(In reply to comment #2)

(In reply to comment #1)

Works for me

On the first one, I get the diffs for all 3

On the second one, I get diffs for 2, and notcached for the last

Both for logged in and anonymous

And the most it will give you is one uncached diff, per the site config

Sure. But if you keep refreshing, it should eventually give you all of them,
because the uncached ones it generates should be saved to cache, right? Except
that's not what happens, and that's what this bug is about.

Nice when people explain things ;)

Cause is simple, the diffs are not even attempted to be cached afterwards by
the API

Bleh, the DifferenceEngine doesn't believe it should be cached for whatever reason

WHIIIIICCCCHHHH probably means it's not technically an API bug

(In reply to comment #5)

WHIIIIICCCCHHHH probably means it's not technically an API bug

Well presumably diffs viewed through the UI get cached...

if ( !is_null( $this->difftotext ) ) {

					$engine = new DifferenceEngine( $title );
					$engine->setText( $text, $this->difftotext );
				} else {
					$engine = new DifferenceEngine( $title, $revision->getID(), $this->diffto );
					$vals['diff']['from'] = $engine->getOldid();
					$vals['diff']['to'] = $engine->getNewid();
				}
				$difftext = $engine->getDiffBody();

function __construct( $titleObj = null, $old = 0, $new = 0, $rcid = 0,

		$refreshCache = false, $unhide = false )

And then it'll only do it if...

		if ( $this->mOldid && $this->mNewid ) {

So in some cases.

I'll look at looking into it properly later

(In reply to comment #7)

I'll look at looking into it properly later

So, it is now "later"... Is this resolved?

(In reply to comment #8)

(In reply to comment #7)
o

I'll look at looking into it properly later

So, it is now "later"... Is this resolved

One would presume as the bug is still open, it isn't resolved...

Are there any chances to get this issue solved?

This is low priority, so providing a patch would help speed up things. See https://www.mediawiki.org/wiki/Developer_access if you are interested.

I suspect the solution here is for clients to request one diff at a time if they want uncached diffs.

@Anomie, why is it desirable to have a client request each diff individually? Wouldn't it save bandwidth and time to return a set of diffs?

Requesting one diff at a time sounds more like a work-around and than a solution.

If we didn't care about the number of diffs generated per request, we wouldn't have $wgAPIMaxUncachedDiffs in the first place. Comments on T15209 may provide some history there.

@Anomie, I appreciate the $wgAPIMaxUncachedDiffs setting. It seems that the purpose of a low $wgAPIMaxUncachedDiffs setting is to minimize resource usage. However, 10 individual requests for a diff is more resource intensive than 1 request for 10 diffs. So, instead, it seems that the limitation is encouraging more resource usage. And based on this thread, it seems like this hacky, resource intensive work-around is recommended.

Could it be that $wgAPIMaxUncachedDiffs is not serving its purpose?