
Fetching revision history fails without error message
Closed, Resolved (Public)

Description

Details

Reference
bz11430

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:58 PM)
bzimport set Reference to bz11430.

Weird. Maybe this is a PHP error or a DB error, but Commons doesn't have the right options enabled to print those errors (should it?). Try a similar query on another wiki (Wikipedia or a private one) and see if you get a sensible error. Alternatively, someone with access to LocalSettings.php should enable printing of DB and PHP errors or run the request from the command line.

jeluf wrote:

There is no page "User:Bryan/CategoryWatch" on commons => no history to return.

Bryan.TongMinh wrote:

Oops, I deleted the page. I'll undelete asap.

Bryan

Page undeleted, bug still present. Like I said in comment #1, this is probably a PHP error. I'll import your CategoryWatch page into my test setup and see what happens.

Hmm, I'm not getting any errors. Due to the page's hugeness (280K per rev), this may simply be an out-of-memory or max-exec-time-exceeded error.

cannon.danielc wrote:

Well, I'm positively puzzled by this one. I imported the top 100 revs of the page into my testwiki, and running the query there (with approx. the same version of the API as commons) produces no problems at all:

http://amidaniel.com/testwiki/api.php?&rvprop=content%7Ctimestamp&prop=revisions&titles=User%3ABryan%2FCategoryWatch&rvlimit=50&action=query

Nonetheless, the error is still present at Commons. I'm especially confused because the query does quite comprehensive error handling; if it were a problem with out-of-memory or another error generated by the db server (possibly due to a configuration there that I don't have on my end), I would expect a message to that effect. As such, it seems this would have to be a syntax error or a similar problem that causes an uncatchable exception. Is it possible there's a syntax error in some Commons config file that for some reason is only executed on this specific query? Again, that seems unlikely.

If someone with power on commons could provide a backtrace for the exception, that would really help :) Thanks.

AmiDaniel, you didn't import Bryan's page correctly: the revisions on your wiki are only around 10K each, while the original revisions at Commons are 280K each. That means your wiki only has to deliver 500K of stuff, while Commons is trying to give you a shocking 14M worth of revisions. If you want to run this test really well, you have to import all those 50 revs by hand (sucks, I know), 'cause there's no way your memory limit will allow you to import 14M in one go.

Changed summary -- obviously there's an *error*. :)

Yes, this is simply an out-of-memory condition.

Probably it would be wise for the API to bomb out gracefully with some limits of its own.

(In reply to comment #9)

Probably it would be wise for the API to bomb out gracefully with some limits of its own.

How in the world are we gonna do that? How do you even detect an approaching OOM? PHP errors just happen when dealing with such massive quantities. I'm sure that if you asked the regular MW code to parse a 14M page, it would also choke on an OOM or a max-exec-time.

That doesn't mean we shouldn't do better. :)

You can detect an approaching OOM by checking the used memory against the memory limit, of course. Though it may be wiser to simply set some graceful limits on how much you work with at once. An API should never just allow arbitrary-size output, particularly since that output isn't being streamed, but assembled in memory.
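
For illustration, a minimal sketch of that check, assuming nothing beyond stock PHP (the helper names here are made up): compare memory_get_usage() against the parsed memory_limit and back off once you cross a safety margin.

<?php
// Hypothetical helpers for the check described above; not MediaWiki code.

function parseMemoryLimit( $limit ) {
    // memory_limit may be "-1" (unlimited), "80M", "128K", or a plain byte count.
    if ( (int)$limit === -1 ) {
        return PHP_INT_MAX;
    }
    $value = (int)$limit;
    switch ( strtoupper( substr( trim( $limit ), -1 ) ) ) {
        case 'G': return $value * 1024 * 1024 * 1024;
        case 'M': return $value * 1024 * 1024;
        case 'K': return $value * 1024;
        default:  return $value;
    }
}

function nearingOOM( $margin = 0.8 ) {
    // True once current usage passes e.g. 80% of the configured limit.
    $limit = parseMemoryLimit( ini_get( 'memory_limit' ) );
    return memory_get_usage() >= $limit * $margin;
}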

OOM/max-exec checking should happen in the core rather than in the API. And it's not just the API that allows arbitrary-size output; $wgOut does that as well.

$wgOut is an internal service class, not a public API.

MediaWiki's public HTTP/HTML-level APIs have size limits for pages, rendered images, lists, etc. Though there likely are still going to be times when reasonable memory limits are exceeded because code can be sloppy, a "give me as many revisions as I like" interface is inherently going to be messier on that front because the total size can vary over many orders of magnitude.

Instead of returning exactly the requested N revisions, it could make sense to use the sort of interface that some other protocols like OAI-PMH use; the server returns as many items together as it feels are an acceptable size (e.g. fitting within its memory requirements and sensible time or transfer limits), and gives the client the opportunity to request the following items in another batch.

This would be much more graceful behavior than just dying with an error after plumping up memory. :)
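
In rough pseudo-PHP (purely illustrative; fetchRevisionsAfter(), $startId and the size budget are invented for the example), that kind of interface boils down to:

<?php
// Illustration only: return as many items as fit in a server-chosen budget,
// plus a resumption point for the rest. Helper names are hypothetical.

$maxBatchBytes = 1024 * 1024; // whatever the server considers acceptable
$batch = array();
$size = 0;
$resumeFrom = null;

foreach ( fetchRevisionsAfter( $startId ) as $rev ) {
    $len = strlen( $rev['content'] );
    if ( $batch && $size + $len > $maxBatchBytes ) {
        $resumeFrom = $rev['id']; // the client sends this back to get the next batch
        break;
    }
    $batch[] = $rev;
    $size += $len;
}
// Output $batch; include $resumeFrom (if set) so the client knows how to continue.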

Alternatively, the API code could be refactored to allow for streaming communications (as Special:Export does) instead of bulking up all requested data in memory as an array, transforming the array into an output buffer, compressing the output buffer, and then shipping it out.
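
To make the contrast concrete, a toy example (formatRevision() and the sample data are made up; this is not API code):

<?php
// Illustration only; formatRevision() stands in for the API's real formatting.
function formatRevision( $rev ) {
    return '<rev>' . htmlspecialchars( $rev ) . '</rev>';
}
$revisions = array( 'first revision text', 'second revision text' );

// Buffered (roughly the current behavior): everything accumulates before output.
$out = array();
foreach ( $revisions as $rev ) {
    $out[] = formatRevision( $rev );
}
echo implode( '', $out ); // peak memory is on the order of the whole result

// Streamed (Special:Export-style): each item is written out as it is produced.
foreach ( $revisions as $rev ) {
    echo formatRevision( $rev );
    flush(); // peak memory stays on the order of one item
}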

I agree that we should handle OOM errors more gracefully, and the streaming/limiting-by-size thing also sounds like a good idea. However, we do need to know *where* the OOM error occurs (could someone with shell access to Commons PLEASE enable printing PHP errors?), because I think it may occur when requesting the revisions from the database. In that case, streaming output ain't gonna help, 'cause we'll be slaughtered brutally if we even suggest obtaining each revision through an individual query. If it occurs somewhere after that, the best (and easiest) solution would probably be to keep an eye on the size of the result we're returning, and stop if that size exceeds a certain limit (say 1 MB).

Bryan.TongMinh wrote:

Proposed patch

I have created a patch that will not fetch more content than $wgMaxAPIContentSize. The downside is that it requires querying the database twice.

attachment MaxAPIContentSize.diff ignored as obsolete

(In reply to comment #15)

Created an attachment (id=4176) [details]
Proposed patch

I have created a patch that will not fetch more content than $wgMaxAPIContentSize. The downside is that it requires querying the database twice.

Thanks for the quick patch, but I don't think it's ideal. It'd be more practical to just check the result size in the ApiResult class and add results in a while(!$result->isFull()) loop. I'm willing to write this code, but I won't have time until after the weekend.
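
Roughly, the loop would look like this (a sketch only: isFull() is the proposed method, and the surrounding names are what a query module like ApiQueryRevisions might use inside execute(), not actual patch code):

<?php
// Sketch of the proposed pattern inside a query module's execute().
$result = $this->getResult();
$db = $this->getDB();
$res = $this->select( __METHOD__ ); // query built up by the module as usual

while ( $row = $db->fetchObject( $res ) ) {
    if ( $result->isFull() ) {
        // The result buffer has hit its size limit: tell the client where to resume.
        $this->setContinueEnumParameter( 'startid', $row->rev_id );
        break;
    }
    $result->addValue( array( 'query', 'pages', $row->rev_page, 'revisions' ),
        null, $this->extractRowInfo( $row ) );
}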

Bryan.TongMinh wrote:

(In reply to comment #16)

(In reply to comment #15)

Created an attachment (id=4176) [details] [details]
Proposed patch

I have created a patch that will not fetch more content than $wgMaxAPIContentSize. The downside is that it requires querying the database twice.

Thanks for the quick patch, but I don't think it's ideal. It'd be more practical to just check the result size in the ApiResult class and add results in a while(!$result->isFull()) loop. I'm willing to write this code, but I won't have time until after the weekend.

Ok, but how would the query-continue function then work? Also, we need to know where exactly the timeout condition occurs; it could just as well be at the point where the database is queried. Is there anybody who can enable debugging information on Commons?

Knowing where you ran out of memory usually doesn't tell you the culprit. :)

Here's where your above req was dying:

Sep 26 18:44:09 srv188 apache2[18228]: PHP Fatal error: Allowed memory size of 83886080 bytes exhausted (tried to allocate 23668231 bytes) in /usr/local/apache/common-local/php-1.5/includes/OutputHandler.php on line 89

By that time it'll have bulked up huge giant string buffer after huge giant string buffer; run that through some processing and eventually one of the steps is going to hit the limit.

(In reply to comment #17)

Ok, but how would the query-continue function then work?

When your ApiResult is full, you remove the last added value and add a query-continue field instead. I'll dive into this starting tomorrow.

And Brion, is there some way in which you can provide us with a backtrace of that OOM?

Fatal errors don't provide a backtrace.

Created attachment 4202
Proposed patch to ApiResult and ApiQueryAllLinks

I've drawn up a system in which every module checks whether the next result to be added will fit (through ApiResult::fits()), and adds a query-continue if it won't. This means that $wgAPIMaxContentSize does NOT limit the number of bytes in the output (XML overhead isn't counted), only the total number of bytes of all values added up. Also, checking is voluntary (it can't be enforced, since some modules rely on manipulating ApiResult's data array directly) and is ignored when adding the query-continue value. However, if all query modules are changed to implement this check (I've only changed one so far), this system will work.

Attached is a patch of ApiResult and ApiQueryAllLinks (of course all other ApiQuery* modules have to be patched too).
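
In rough terms (an illustration of the idea, not the literal patch code), the check amounts to comparing the accumulated size of the values added so far against $wgAPIMaxContentSize:

<?php
// Approximate shape of the check; $resultSize is the running total of bytes
// already added to the result, and the function name is just for illustration.
function resultFits( $resultSize, $value, $maxContentSize ) {
    $len = 0;
    foreach ( (array)$value as $v ) {
        if ( !is_array( $v ) ) {
            $len += strlen( $v ); // only top-level scalar values are counted here
        }
    }
    return $resultSize + $len <= $maxContentSize;
}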

attachment result-limit.patch ignored as obsolete

Created attachment 4203
Revised patch

Correcting mistake in previous patch.

attachment result-limit.patch ignored as obsolete

Created attachment 4207
Revised and expanded patch

The attached patch implements ApiResult::fits() and changes ApiQueryAllLinks, ApiQueryAllPages, ApiQueryAllUsers, ApiQueryBacklinks, ApiQueryCategories, ApiQueryExtLinksUsage and ApiQueryWatchlist to use this interface.

Any thoughts?

attachment result-limit.patch ignored as obsolete

The biggest problem I see is with uneven results: the user requests properties A & B for items 1, 2, 3, 4. The API appends property A to all items, but stops at 3 for property B. The result:

1: A,B; 2: A,B; 3: A; 4: A.
continue: A-none, B-item 3.

This would be very cumbersome for the clients to handle. They would much rather deal with:

1: A,B; 2: A,B;
continue: item 3.

Patch comments:

  • The size() function should be called recursively for each element (IIRC there is an array function that can do that, but we might have to do it manually), not just for the top-level elements.
  • The size of the current result data should be cached, not recalculated each time fits() is called (huge overhead).
  • I think it would make more sense for ApiResult to check the size of the new value when it is being added, and return true/false depending on whether the append operation was successful. If it returns false, the API adds a "continue" value. If true, the size of the data array is updated. (A rough sketch of these suggestions follows below.)
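
Here is that rough sketch (an illustration only, not the real ApiResult class): a recursive size calculation, a cached running total, and addValue() reporting whether the append succeeded.

<?php
// Illustration of the suggestions above; not the real ApiResult class.
class ResultBufferSketch {
    private $data = array();
    private $size = 0; // cached total, updated on every successful add
    private $maxSize;

    public function __construct( $maxSize ) {
        $this->maxSize = $maxSize;
    }

    public static function valueSize( $value ) {
        if ( is_array( $value ) ) {
            $s = 0;
            foreach ( $value as $v ) {
                $s += self::valueSize( $v ); // recurse instead of counting only the top level
            }
            return $s;
        }
        return is_bool( $value ) ? 0 : strlen( $value );
    }

    public function addValue( $name, $value ) {
        $newSize = $this->size + self::valueSize( $value );
        if ( $newSize > $this->maxSize ) {
            return false; // caller should add a "continue" value instead
        }
        $this->data[$name] = $value;
        $this->size = $newSize;
        return true;
    }
}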

(In reply to comment #24)

The biggest problem I see is with uneven results: the user requests properties A & B for items 1, 2, 3, 4. The API appends property A to all items, but stops at 3 for property B. The result:

1: A,B; 2: A,B; 3: A; 4: A.
continue: A-none, B-item 3.

This would be very cumbersome for the clients to handle. They would much rather deal with:

1: A,B; 2: A,B;
continue: item 3.

That's up to the API module in question to handle properly. However, all API modules that I know of add 1: A,B in one go, then 2: A,B in one go, etc., so you get exactly what you want.

Patch comments:

  • The size() function should be called recursively for each element (IIRC there is an array function that can do that, but we might have to do it manually), not just for the top-level elements.

Good point, hadn't thought about that.

  • The size of the current result data should be cached, not recalculated each time fits() is called (huge overhead).
  • I think it would make more sense for ApiResult to check the size of the new value when it is being added, and return true/false depending on whether the append operation was successful. If it returns false, the API adds a "continue" value. If true, the size of the data array is updated.

For both cases, that means we have to close the back door, i.e. direct manipulation of the data array. I'll track down which modules use this back door and rewrite them to use addValue() instead. Also, setContinueEnumParameter() currently calls addValue(), which would block adding a continue value. I think that this one will have to use direct $data access, which means I'll have to move setContinueEnumParameter() from ApiQueryBase to ApiResult.

I'll be away for the weekend, so I'll start working on the second draft on Monday.

(In reply to comment #25)

(In reply to comment #24)

The biggest problem I see is with uneven results: the user requests properties A & B for items 1, 2, 3, 4. The API appends property A to all items, but stops at 3 for property B. The result:

1: A,B; 2: A,B; 3: A; 4: A.
continue: A-none, B-item 3.

This would be very cumbersome for the clients to handle. They would much rather deal with:

1: A,B; 2: A,B;
continue: item 3.

That's up to the API module in question to handle properly. However, all API modules that I know of add 1: A,B in one go, then 2: A,B in one go, etc., so you get exactly what you want.

Not exactly -- the query module will have an issue with this. If you ask for page info, page links, and page content, query will execute pageinfo and pagelinks submodules completely (all pages will have both properties), but then somewhere in the middle of adding page content, the result will become too large. "continue" will be added to the result, informing how to get more.

So now the client has to understand that all pages have the first 2 properties, but only some pages have content. Also, if there was another request, like "backlinks", which is not even part of the page properties, it will be executed but will immediately have "continue" without any data.

(In reply to comment #26)

Not exactly -- the query module will have an issue with this. If you ask for page info, page links, and page content, query will execute pageinfo and pagelinks submodules completely (all pages will have both properties), but then somewhere in the middle of adding page content, the result will become too large. "continue" will be added to the result, informing how to get more.

So now the client has to understand that all pages have the first 2 properties, but only some pages have content. Also, if there was another request, like "backlinks", which is not even part of the page properties, it will be executed but will immediately have "continue" without any data.

That is tricky, and I think the best option is just to accept that. Look at what you'd actually get:

api.php?action=query&list=backlinks|embeddedin&titles=Template:Separate%20continuity

<?xml version="1.0" encoding="utf-8"?>
<api>

<query>
  <pages>
    <page pageid="9149" ns="10" title="Template:Separate continuity" />
  </pages>
  <backlinks>
    <bl pageid="2335" ns="4" title="Battlestar Wiki:FAQ" />
    <bl pageid="9150" ns="11" title="Template talk:Separate continuity" />
    <bl pageid="9276" ns="1" title="Talk:Battlestar Galactica (2005 Novel)" />
    <bl pageid="9305" ns="101" title="Portal talk:Merchandise" />
    <bl pageid="11803" ns="1" title="Talk:Sagittarius Is Bleeding" />
  </backlinks>
  <embeddedin>
    <ei pageid="1499" ns="0" title="Erebus" />
    <ei pageid="1501" ns="0" title="Battlestar Galactica (2003 game)" />
    <ei pageid="1502" ns="0" title="Battlestar Galactica (SDS)" />
  </embeddedin>
</query>
<query-continue>
  <backlinks blcontinue="10|Separate_continuity|11809" />
  <embeddedin eicontinue="10|Separate_continuity|1716" />
</query-continue>

</api>

In this case, you'll just continue with

api.php?action=query&list=backlinks|embeddedin&titles=Template:Separate%20continuity&blcontinue=10|Separate_continuity|11809&eicontinue=10|Separate_continuity|1716

and so on, until you have the whole list. Since most modules (all list modules and most prop modules) provide lists, the user will have to query-continue at some point anyway. Also, since most 'users' are really bots or other programs, they'll just query-continue till they have the whole list and won't even notice anything's weird. Lastly, this kind of weirdness only occurs with extremely long lists (size > 8M), sizes that aren't normally handled by humans anyway. If we decide to keep this behavior, we should of course document it.
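
From a client's point of view, the whole thing reduces to a loop like the following (a sketch only; the URL, parameters and error handling are simplified):

<?php
// Client-side sketch: keep following query-continue until it is gone.
$base = 'http://example.org/w/api.php';
$params = array(
    'action' => 'query',
    'list'   => 'backlinks|embeddedin',
    'titles' => 'Template:Separate continuity',
    'format' => 'php', // serialized PHP output, easy to unserialize()
);

do {
    $data = unserialize( file_get_contents( $base . '?' . http_build_query( $params ) ) );

    // ... consume $data['query']['backlinks'] and $data['query']['embeddedin'] here ...

    $cont = isset( $data['query-continue'] ) ? $data['query-continue'] : array();
    foreach ( $cont as $module => $contParams ) {
        $params = $contParams + $params; // e.g. adds blcontinue / eicontinue
    }
} while ( $cont );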

Created attachment 5064
Proposed patch v3

Alright, after a few days' work I came up with another patch. It's so huge that I've written up a list of what it changes, which I'll attach right after this one.

What it does basically is change ApiResult to keep track of its 'size' (the sum of the strlen()s of all elements added to it), and only allow stuff to be added through addValue() (the getData() backdoor no longer works), which will refuse to add stuff if it'd cause the result size to exceed $wgAPIMaxResultSize (16 MB by default). In that case, addValue() returns false and the calling module has to set a query-continue. Of course the limit doesn't apply to query-continue elements, or it'd be impossible to add them when they're needed. I'm thinking about making another such exception for warning messages, but I haven't done that in this patch (yet, maybe I will).

Most of the work (and most of the patch) involves refitting the modules around this model, especially changing them to addValue() each item separately (rather than putting them all in an array and addValue()ing that at the end) and to check addValue()'s return value. Some modules didn't have continue parameters, so I either added them or abused an existing parameter (siprop for meta=siteinfo and ususers for list=users).
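
The per-module pattern looks roughly like this (a simplified sketch, not copied from the patch; the continue parameter name and row fields are just examples):

<?php
// Simplified sketch of the per-item pattern inside a query module.
while ( $row = $db->fetchObject( $res ) ) {
    $fit = $result->addValue( array( 'query', $this->getModuleName() ),
        null, $this->extractRowInfo( $row ) );
    if ( !$fit ) {
        // addValue() refused because $wgAPIMaxResultSize would be exceeded:
        // record where to pick up again and stop.
        $this->setContinueEnumParameter( 'continue', $row->page_title );
        break;
    }
}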

I'd like to hear people's thoughts on this patch; I'll understand if you don't wanna read the whole thing (63 KB and 1992 lines), but the most important parts are the changes to ApiResult.php, ApiQueryBase.php and ApiBase.php. Also please read the changes list which I'll be attaching shortly, to get an idea of what's happening in this patch.

attachment 11430.patch ignored as obsolete

Created attachment 5065
List of changes in "Proposed patch v3"

Attached list of changes.

attachment 11430-changes ignored as obsolete

Bryan.TongMinh wrote:

I haven't had time to read through this in detail, but what happens if one content item is larger than $wgAPIMaxResultSize?

(for the people creating 16M pages)

(In reply to comment #30)

I haven't had time to read through this in detail, but what happens if one content item is larger than $wgAPIMaxResultSize?

(for the people creating 16M pages)

Well the comment in DefaultSettings.php says $wgAPIMaxResultSize should be larger than the maximum revision size, so that shouldn't be possible. Now I don't know what the maximum revision size *is* (does anyone know?), and the 16M value was a guesstimate on my part (can be changed of course, that's why I'm putting this up for review).

Created attachment 5491
Proposed patch v4

They just keep gettin' bigger...

attachment 11430.patch ignored as obsolete

Created attachment 5492
List of changes in "Proposed patch v4"

Attached:

Created attachment 5779
Update last patch

Updated patch against current trunk. Didn't apply cleanly, so I had to do it manually. Couldn't patch ApiQueryImages, as the file has changed far too much since the patch was written (Roan said he'd redo that patch).

The only other problem I had was in ApiQueryRevisions:

- $limit = $startid = $endid = $start = $end = $dir = $prop = $user = $excludeuser = $expandtemplates = $section = $token = null;
+ $limit = $startid = $endid = $start = $end = $dir = $prop = $user = $excludeuser = $expandtemplates = $section = $token = $continue = null;

The former line doesn't exist anymore either, so I dunno how to fix it really. Other than those minor issues, this should be pretty much ready to go.

attachment api2.patch ignored as obsolete

Created attachment 5784
Patch v5

(In reply to comment #34)

Created an attachment (id=5779) [details]
Update last patch

Updated patch against current trunk. Didn't apply cleanly, so I had to do it manually. Couldn't patch ApiQueryImages,

You probably mean ApiQueryImageInfo

as the file has changed far too much since the patch was written

Actually it wasn't that bad, but hunk #1 was quite large (about a hundred lines) and failed over some minor stuff. Applying it by hand was quite easy.

(Roan said he'd redo that patch).

Done; the attached patch is up to date with HEAD (r46843), and I hope to apply it today or tomorrow.

The only other problem I had was in ApiQueryRevisions:

- $limit = $startid = $endid = $start = $end = $dir = $prop = $user = $excludeuser = $expandtemplates = $section = $token = null;
+ $limit = $startid = $endid = $start = $end = $dir = $prop = $user = $excludeuser = $expandtemplates = $section = $token = $continue = null;

The former line doesn't exist anymore either, so I dunno how to fix it really. Other than those minor issues, this should be pretty much ready to go.

This didn't need fixing; I removed that line in my crusade against extract() in r44719 because it wasn't needed any more.

Attached:

(In reply to comment #35)

Done; the attached patch is up to date with HEAD (r46843), and I hope to apply it today or tomorrow.

Modified version of patch applied in r46845