
Error generating thumbnail: As an anti-spam measure, you are limited from performing this action too many times
Closed, Resolved · Public

Details

Reference
bz64622

Event Timeline

bzimport raised the priority of this task to Unbreak Now!. · Nov 22 2014, 3:17 AM
bzimport set Reference to bz64622.
bzimport added a subscriber: Unknown Object (MLST).

Comments:

  • Purging files at Commons has no effect
  • Clicking any of the "Other resolutions:" links gives the error

Error generating thumbnail
Error creating thumbnail: File missing

  • Full image appears to display okay

https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Revue_des_Deux_Mondes_-_1843_-_tome_3.djvu/page970-2840px-Revue_des_Deux_Mondes_-_1843_-_tome_3.djvu.jpg

Change 130563 had a related patch set uploaded by Aaron Schulz:
Removed "GetLocalFileCopy" pool counter entry

https://gerrit.wikimedia.org/r/130563

Change 130563 merged by jenkins-bot:
Removed "GetLocalFileCopy" pool counter entry

https://gerrit.wikimedia.org/r/130563

Aaron: You are fast. Thank you!

Similar issue again now: I get a message "Error generating thumbnail

As an anti-spam measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit. Please try again in a few minutes."

on at least one in every three pages.

  • Bug 64801 has been marked as a duplicate of this bug.

Changing summary; the error is widespread across all sorts of users of Commons.

555: Resetting blocker and immediate; see [[mw:Bugzilla/Fields#Priority]]

mail wrote:

I as well get currently frequent error 500's after requesting a thumbnail image:
"Error generating thumbnail - As an anti-spam measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit. Please try again in a few minutes."
This happens quite quickly (I requested perhaps around 100 thumbnails in the last few hours), but it also resolves quite quickly: retrying shortly after usually results in a 200 OK.

It seems this is getting more and more frequent. When is a fix expected? Thanks.

This seems to be hitting $wgRateLimits['renderfile']. See [1]. Those rate limits are disabled by default, so maybe WMF has set them up recently.


[1] https://www.mediawiki.org/wiki/Manual:$wgRateLimits

(In reply to Jesús Martínez Novo (Ciencia Al Poder) from comment #13)

This seems to be hitting $wgRateLimits['renderfile']. See [1]. Those rate
limits are disabled by default, so maybe WMF has set them up recently.

$ git blame InitialiseSettings.php | grep -A 4 renderfile
c78a54c9 (Aaron Schulz 2013-10-16 16:14:35 -0700 6390) 'renderfile' => array(
02f3863a (Aaron Schulz 2014-01-21 12:40:42 -0800 6391) // 1400 new thumbnails per minute
02f3863a (Aaron Schulz 2014-01-21 12:40:42 -0800 6392) 'ip' => array( 700, 30 ),
02f3863a (Aaron Schulz 2014-01-21 12:40:42 -0800 6393) 'user' => array( 700, 30 ),
c78a54c9 (Aaron Schulz 2013-10-16 16:14:35 -0700 6394) ),
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6395) 'renderfile-nonstandard' => array(
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6396) // 140 new thumbnails per minute
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6397) 'ip' => array( 70, 30 ),
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6398) 'user' => array( 70, 30 ),
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6399) ),
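
To read those blame lines: each limit is array( <max hits>, <window in seconds> ), so the entries amount to roughly the following plain PHP (a sketch reconstructed from the blame output above, not a verbatim copy of wmf-config):

$wgRateLimits['renderfile'] = array(
	// 1400 new thumbnails per minute, i.e. 700 per 30-second window
	'ip'   => array( 700, 30 ),
	'user' => array( 700, 30 ),
);
$wgRateLimits['renderfile-nonstandard'] = array(
	// 140 new thumbnails per minute, i.e. 70 per 30-second window
	'ip'   => array( 70, 30 ),
	'user' => array( 70, 30 ),
);

A requester that exceeds either window gets the "anti-spam measure" error quoted at the top of this task instead of a thumbnail.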

Why would an experienced developer set such a very low limit in an environment the size of Wikimedia, where each category view lists tons of media files, and at the exact time that an international upload contest (Wiki Loves Earth) is running?

wieralee wrote:

It makes our work on wikisource.pl twice as slow. It crashes our work :-(
% of proofread pages load without the scans. Very, very tiring.

wieralee wrote:

(In reply to wieralee from comment #16)
40 %

50 hours since the initial report and not a single action directly related to fixing it.

Why isn't the change that is *breaking all Wikisource wikis* (we *really* rely on ProofreadPage, and ProofreadPage relies on image resizing!) simply reverted until a sysadmin finds the desired setup? Is a config intended only to optimize server usage (I'm unable to find any report showing that this change is really needed at this moment) really necessary if it breaks features that have been working for years?

From a mail sent to MediaWiki core list:

Looking at the udp2log limiter.log file, the renderfile-nonstandard
limit is being tripped by:

$ fgrep renderfile limiter.log |cut -d\: -f4|sort|uniq -c|sort -n

378  10.64.0.168 tripped! mediawiki
405  10.64.0.167 tripped! mediawiki
476  10.64.32.92 tripped! mediawiki
498  10.64.16.150 tripped! mediawiki

$

Those are the media server frontends ms-fe1001 to ms-fe1004. We probably
want to rate-limit the end-user IP instead.

I suspect the media servers are not properly passing the X-Forwarded-For
header down to the thumbnail renderer. The logic seems to be in the
operations/puppet.git file ./files/swift/SwiftMedia/wmf/rewrite.py

This would need someone with more information about Swift/thumbnail
handling than me :-(
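
In other words, the limiter counts against whichever IP MediaWiki attributes the request to. A rough sketch of where that bites during a thumbnail request (a simplification, not the actual thumb.php code; User::pingLimiter() and the rate-limit action keys are real, the scaffolding around them is illustrative):

// Inside a MediaWiki web entry point, so RequestContext is available.
$user = RequestContext::getMain()->getUser();

// pingLimiter() increments and checks the counter for the given action.
// For anonymous requests the 'ip' bucket is keyed on the request IP -- which,
// without trusted XFF, is the Swift frontend's address rather than the reader's,
// so all readers behind one frontend share a single 70-per-30s bucket.
if ( $user->pingLimiter( 'renderfile-nonstandard' ) ) {
	// Over the limit: the "anti-spam measure" error is returned
	// instead of the rendered thumbnail.
}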


I have poked Faidon about it; the X-Forwarded-For headers do seem to be passed by the Swift proxies, but their IPs need to be trusted by MediaWiki.
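
For reference, MediaWiki only honors X-Forwarded-For from addresses it considers trusted proxies. A minimal sketch of what that trust looks like in a 1.23-era configuration (assuming $wgSquidServersNoPurge; in production the list lives in wmf-config's squid.php, which is what the changes below touch):

// Addresses taken from the limiter.log excerpt above (the ms-fe Swift frontends).
// In the real wmf-config these would be appended to the existing proxy list,
// not assigned from scratch as done here.
$wgSquidServersNoPurge = array(
	'10.64.0.167',
	'10.64.0.168',
	'10.64.32.92',
	'10.64.16.150',
);

With the frontends listed as trusted proxies, MediaWiki takes the client address from the X-Forwarded-For header they send, and the rate limiter counts per reader again instead of per frontend.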

Change 131669 had a related patch set uploaded by Hashar:
Trust Swift proxies XFF headers

https://gerrit.wikimedia.org/r/131669

Change 131670 had a related patch set uploaded by Faidon Liambotis:
Add Swift frontends to squid.php

https://gerrit.wikimedia.org/r/131670

Change 131669 abandoned by Hashar:
Trust Swift proxies XFF headers

Reason:
Abandoned in favor of Faidon change https://gerrit.wikimedia.org/r/#/c/131670/

https://gerrit.wikimedia.org/r/131669

Change 131671 had a related patch set uploaded by Hashar:
Mention ms-fe servers need to be XFF trusted by MW

https://gerrit.wikimedia.org/r/131671

Change 131670 merged by jenkins-bot:
Add Swift frontends to squid.php

https://gerrit.wikimedia.org/r/131670

Change 131671 merged by Faidon Liambotis:
Mention ms-fe servers need to be XFF trusted by MW

https://gerrit.wikimedia.org/r/131671

Hashar was correct in identifying the root cause. This was a long-standing (~2 years) configuration error that, in combination with the recent per-IP thumbnail limits, broke thumbnail generation for many users.

The above changes have been merged and deployed, so this should be working for everyone now. The logs suggest so, but let's give it some time...

Can we do anything to make the cause of such incidents more easily visible and debuggable in the future?

Perhaps by including the IP being limited in the error message?

Hashar / Faidon: Thanks for your work and investigation!

Works for me now.
However, as 555 said, I wish that such an issue, which breaks all Wikisource work, would be handled better in the future. Thanks for fixing this.

Derk-Jan Hartman: to customize the error message, I guess you want to file another bug :-) It will be easier to handle.

Yann Forget: the bug did get escalated to the mw-core weekly meeting (Monday 10pm UTC). It got fixed once we managed to wake up. If the issue is critical, your best bet is to raise it on wikitech-l, which most people with cluster access read even during weekends.

If there are no more suspicious entries in limiter.log, I guess we can finally mark this bug as fixed.

I posted a rather long postmortem describing:

  • the timeline for the resolution
  • the root cause analysis and how we caused the issue
  • suggested improvements

http://lists.wikimedia.org/pipermail/mediawiki-core/2014-May/000068.html

The media servers are no longer being limited, according to limiter.log. Whitelisting them as trusted XFF proxies solved the issue.

Restricted Application added subscribers: Steinsplitter, Matanya.