
Purge foreign pages using an image/media file where this data is available
Closed, ResolvedPublic

Description

When GlobalUsage is available, and a new version of a file is uploaded, in theory we could do some magic to purge the pages using that image on foreign wikis (in Squid at the very least).


Version: unspecified
Severity: enhancement

Details

Reference
bz22390

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:51 PM
bzimport set Reference to bz22390.

Bryan.TongMinh wrote:

*** Bug 22073 has been marked as a duplicate of this bug. ***

*** Bug 42582 has been marked as a duplicate of this bug. ***

So just to clarify this bug - the bad that happens:
*Person moves a file at commons. The URL of the media file now changes. Pages on client wikis that use that file will be broken until such a time as the pages get re-rendered. (An alternative or complementary solution would be to give an HTTP redirect for those old URLs. This would make life nicer for hotlinkers.)
*Person uploads a new version of an image file with different dimensions. Because thumbnail URLs only have the width in them, the URL stays the same while the height of the corresponding thumb changes. However, the height attribute on the <img> tag won't change until the next time that page gets re-rendered.
*Slightly separate but related issue (this was bug 22073): User edits the description page on commons. We would want to purge the memcache entries for this description page. (Slightly complicated because we don't know which languages this page has been cached in [it varies by user language], but we could probably take a good guess based on image usage.)
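To make the second point concrete, here is a minimal sketch of why a stale page breaks. It assumes the thumbnail height is simply scaled proportionally from the requested width; the helper names, the rounding rule, and the example URL are hypothetical and are not MediaWiki's actual code.

```python
def thumb_height(orig_width: int, orig_height: int, thumb_width: int) -> int:
    """Scale the height in proportion to the requested thumbnail width.

    Hypothetical helper: the real scaling and rounding rules may differ.
    """
    return max(1, round(orig_height * thumb_width / orig_width))


def img_tag(url: str, width: int, height: int) -> str:
    """Render an <img> tag the way a cached page might have stored it."""
    return f'<img src="{url}" width="{width}" height="{height}">'


# The thumbnail URL encodes only the width, so it is identical for both versions.
THUMB_URL = "https://upload.example.org/thumb/a/ab/Example.jpg/220px-Example.jpg"

# HTML cached on the client wiki while the file was 800x600.
cached_html = img_tag(THUMB_URL, 220, thumb_height(800, 600, 220))

# HTML the client wiki would produce after an 800x1200 version is uploaded.
fresh_html = img_tag(THUMB_URL, 220, thumb_height(800, 1200, 220))
```

Until the client page is re-rendered it keeps serving `cached_html`, so the browser squeezes the new thumbnail into the old height.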

For the first two points, we need to somehow do the equivalent of a cross-wiki HTMLCacheUpdate of the globalusage tables. This is the more serious issue imo. I'm increasing the priority to normal since this actively causes broken images in articles (albeit temporarily). On a third party wiki host, using foreign repos but not 404-image-render-handlers, this would probably cause even more serious breakage.

For the third point (which is kind of a separate issue), we need to clear some memcache entries (Somehow figuring out which ones are appropriate. Clearing content language of where image is used is probably good enough for now), and possibly squid/varnish cache.
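As a rough illustration of the "good guess based on image usage" idea, the sketch below derives the set of cache keys to clear from the content languages of the wikis that use the file. The key format, the function name, and the wiki-to-language mapping are all assumptions made up for this example; they do not reflect the real memcached key layout.

```python
def description_cache_keys(file_name, usage_wikis, content_language):
    """Return the description-page cache keys worth clearing.

    usage_wikis: wiki IDs that use the file (e.g. taken from GlobalUsage).
    content_language: mapping of wiki ID to its content language code.
    The "file-description:<lang>:<name>" key format is hypothetical.
    """
    languages = {
        content_language[wiki] for wiki in usage_wikis if wiki in content_language
    }
    return sorted(f"file-description:{lang}:{file_name}" for lang in languages)


keys = description_cache_keys(
    "Example.jpg",
    ["enwiki", "dewiki", "enwikibooks"],
    {"enwiki": "en", "dewiki": "de", "enwikibooks": "en"},
)
```

This misses users who view a page in a language other than the wiki's content language, which matches the "good enough for now" caveat above.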

A further clarification point, to be explicit: this bug does *not* involve old images being shown to the user, only broken images.

(In reply to comment #3)

So just to clarify this bug - the bad that happens:
*Person moves a file at commons. The url of the media file now changes. Pages
on client wiki that uses that file will be broken until such a time as the
pages get re-rendered. (An alternative or complementary solution would be to
give an HTTP redirect for those old urls. This would make life nicer for
hotlinkers)

I'm sure there's a more specific bug, but I can't find it, so I should mention I submitted a patch for the http redirect thing https://gerrit.wikimedia.org/r/80135

That does not solve this bug (only makes it a little less severe). We still need to solve this bug for the below case:

*Person uploads a new version of an image file with different dimensions. Because file urls only have the width in them, the height of the corresponding thumb changes. However the height attribute on the <img> tag won't change until next time that page gets re-rendered.

(In reply to comment #4)

I'm sure there's a more specific bug, but I can't find it, so I should mention I submitted a patch for the http redirect thing https://gerrit.wikimedia.org/r/80135

I forgot that that won't make the full-sized image URL redirect. Oh well, still a step in the right direction. Most of the issues are with thumbnails anyway.

Change 97659 had a related patch set uploaded by Aaron Schulz:
Added support for purging backlinks in the wiki farm

https://gerrit.wikimedia.org/r/97659

Change 97659 merged by jenkins-bot:
Added support for purging backlinks in the wiki farm

https://gerrit.wikimedia.org/r/97659

(In reply to comment #7)

Change 97659 merged by jenkins-bot:
Added support for purging backlinks in the wiki farm

https://gerrit.wikimedia.org/r/97659

Woo! Thanks Aaron. I guess we should not mark this bug as fixed until the added setting is enabled on commons.

Change 101106 had a related patch set uploaded by Aaron Schulz:
Cross-wiki backlink purging for commons file changes

https://gerrit.wikimedia.org/r/101106

Change 101106 merged by jenkins-bot:
Cross-wiki backlink purging for commons file changes

https://gerrit.wikimedia.org/r/101106

To split a file, sysops need to delete it, undelete part of the history, rename it, and undelete the rest. I always supposed this never caused a big fuss because the articles using the file were not regenerated right away. Is this use case still safe?

(In reply to comment #11)

To split files, sysops need to delete it, undelete part of the history, rename, and undelete the rest. I always supposed this never made a big fuss because the articles using the file were not regenerated right away. Is this use case still safe?

Should be fine. If the file is used by more than 200,000 pages on a single wiki, we don't purge the pages using it on that wiki. In any case, I imagine most cases where you do this sort of thing are for files used on fewer than 500 pages, which would be an inconsequential number of pages to purge.

As a technical point, the pages in question aren't actually regenerated immediately - what actually happens is they're marked as needing to be regenerated next time someone visits them.
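The behaviour described in the two paragraphs above can be sketched as follows. This is a simplified model, not the merged patch: the `Page` class, its fields, and `invalidate_usages` are hypothetical names, and only the 200,000 figure comes from the comment itself.

```python
from dataclasses import dataclass

MAX_BACKLINKS_PER_WIKI = 200_000  # per-wiki limit quoted in the comment above


@dataclass
class Page:
    title: str
    touched: int = 0      # when the page was last marked stale
    rendered_at: int = 0  # when the cached HTML was produced

    def needs_rerender(self) -> bool:
        # The cached HTML is stale if the page was touched after rendering.
        return self.touched > self.rendered_at


def invalidate_usages(usages, now, limit=MAX_BACKLINKS_PER_WIKI):
    """Mark every page using the file as stale, one wiki at a time.

    usages maps a wiki ID to the list of Page objects using the file there.
    Wikis with more than `limit` usages are skipped entirely.
    Returns the list of skipped wiki IDs.
    """
    skipped = []
    for wiki, pages in usages.items():
        if len(pages) > limit:
            skipped.append(wiki)
            continue
        for page in pages:
            # Nothing is re-rendered here; the page is only marked stale
            # and will be regenerated on its next visit.
            page.touched = now
    return skipped
```

For example, calling `invalidate_usages({"enwiki": [Page("A")]}, now=10)` leaves the page's cached HTML in place but makes `needs_rerender()` return `True`.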

(In reply to comment #12)

In any case, I imagine most cases where you do this sort of thing
are for files used on less than 500 pages, which would be an
inconsequential amount of pages to purge.

But these pages (e.g. high-profile Wikipedia articles) containing a temporarily deleted file will be regenerated with a red link, right? Splitting is definitely not a long process, but it may take a few minutes; I just hope this will not cause editing communities to understandably come with pitchforks to the sysop who defaced their article for a few minutes :-)

(In reply to comment #13)

(In reply to comment #12)

In any case, I imagine most cases where you do this sort of thing
are for files used on less than 500 pages, which would be an
inconsequential amount of pages to purge.

But these pages (eg high-profile Wikipedia articles) containing a temporarily deleted file will be regenerated with a red link, right? Splitting is definitely not a long process but it may take a few minutes ; I just hope this will not cause editing communities to understandably come with pitches and forks to the sysop who defaced their article for a few minutes :-)

Well, the job queue isn't instant and may take a couple of minutes to get to the page, but ignoring that: this is just about regenerating those pages' HTML. The image itself disappears the moment you delete it (and always has), since its URL at upload.wikimedia.org goes away. The difference now would be that instead of an <img> tag that doesn't render, the page might have a red link for the image (assuming the job queue is fast enough) for a couple of minutes.

If people didn't notice and complain previously when you did this sort of thing, I don't think they'll start noticing now.

As an aside, the main thing that pops to mind here is we need a better mechanism for splitting histories :)

(In reply to comment #14)

The image itself disappears the moment you delete it (and always has) since
its url at upload.wikimedia.org goes away.

Oh, does it? I had always assumed the thumb was cached for a while but I never actually checked that. Thanks for the information :)

The difference now would be instead of
an <img> tag that doesn't render, the page might have a redlink for the image
(Assuming the job queue is fast enough) for a couple minutes.

If people didn't notice and complain previously when you did this sort of
thing, I don't think they'll start noticing now.

Perfect then; thanks Brian for reassuring me :-)

As an aside, the main thing that pops to mind here is we need a better
mechanism for splitting histories :)

We sure do :-)

Gilles raised the priority of this task from Medium to Unbreak Now!. Dec 4 2014, 10:12 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium. Dec 4 2014, 11:21 AM