
Deprecate and remove the purge action from MediaWiki
Open, Low, Public

Description

There should be no need for a purge action exposed to users. Every time a user has to manually trigger a purge (or a null edit) to get a useful representation of some page, MediaWiki has failed that user. Eventually, it would be nice if purge could be removed completely; but for now, MediaWiki fails users quite often, so we keep track of the issues causing that here.

Details

Reference
bz54902

Related Objects

Event Timeline


(In reply to comment #2)

In some ways, I feel like the purge action is the red-headed step child of
MediaWiki actions. Is it exposed anywhere in MediaWiki core currently?

(In reply to comment #1)

Some additions like SMW, a Purge extension, and iirc user scripts do expose
it in the UI. It can also be exposed by in-content links like those on most
Forum:Index pages made by DPLForum users and Wikipedia template pages.

Why is the action exposed? When/why does it become necessary?

It's exposed by DPLForum and SMW users because those extensions output pattern-based lists of pages, which are extremely difficult to subscribe to updates for. We could create an event/subscribe system that would make it easier for extensions to listen for complex changes. But even if we do that, it won't eliminate the job queue delay, and users will still have a reason to use purge.

The purge action has valid use in cases where the 'underlying [reason]' for
the purge use has an explicit reason it won't be 'fixed'. Purge resets the
parser cache for a single page. This has results like viewing the page with
any template modifications. In this kind of situation the parser cache is not
instantly purged for an explicit reason.

Human effort is more costly, I promise, especially when you first have to
figure out that purging is even an option (how would any typical user know
that?) and then purge the page (giving you a page that looks like the current
page is supposed to look).

Readers do not need to figure out how to purge. Serving slightly out-of-date templates to readers until the job queue catches up is fine. On normal articles, &action=purge is typically used by template authors: people doing complex things with a lot of inside knowledge about MW, who purge the cache on a single page to check that their modifications look right once they've finished.

There's an overall design defect here, I think.

It *is* an ugly hack, and it's not meant to be exposed to end-users.

Any time people feel the need to use action=purge, it's because MediaWiki failed to properly handle updates to page contents (usually because of some feature that's not compatible with the parser cache, sometimes because of software updates that don't clear the parser cache).

MZMcBride: When would this bug be "evaluated", so action could take place?
So far it feels like this discussion should happen on a mailing list instead of a bugtracker, because this ticket is far from being actionable. If there's consensus a bug report could be filed that defines problems that were agreed on.

A null edit is not the same as a simple purge. A null edit also refreshes the database tables (like categorylinks and so on). With the API purge and the forcelinkupdate parameter you can achieve that as well.
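For reference, the API request in question can be sketched roughly as follows. This is only a minimal sketch: the endpoint URL and page title are placeholders, the helper name build_purge_request is made up for illustration, and the request is built but never sent. It assumes the API's action=purge module with its titles and forcelinkupdate parameters, submitted as a POST.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_purge_request(api_url, titles, force_link_update=True):
    """Build (but do not send) a POST request for api.php?action=purge."""
    params = {
        "action": "purge",
        "titles": "|".join(titles),  # multiple titles are separated with "|"
        "format": "json",
    }
    if force_link_update:
        # Boolean flag: its presence also asks for the links tables to be refreshed
        params["forcelinkupdate"] = "1"
    return Request(api_url, data=urlencode(params).encode("utf-8"), method="POST")


# Hypothetical endpoint and title, for illustration only
req = build_purge_request("https://wiki.example.org/w/api.php", ["Main Page"])
```

Without the forcelinkupdate flag the same request would only clear the cached rendering, which is the difference from a null edit described above.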

Along with the job queue and new features (core software update or a new installed extension), bug 18478 is also a reason to do a purge.

(In reply to comment #5)

MZMcBride: When would this bug be "evaluated", so action could take place?

Looks like one of Wikimedia's (or MediaWiki's) architects confirmed that the purge action is a hack in comment 4. :-)

Any hack should eventually be killed (I believe that's the nature of a hack). If we're not going to expose the purge action in the user interface, I think we should work toward deprecating it.

It would be nice if this purging wasn't needed at all. There's a gadget specifically for this at the English Wikipedia [1] and probably several other Wikipedias as well.

IIRC, whenever a highly used template is changed at the English Wikipedia, bots are used to purge all pages where the template was used.

[1]: https://en.wikipedia.org/wiki/MediaWiki:Gadget-purgetab.js

I guess it needs some clarification that "purge action" here refers to index.php?action=purge and/or api.php?action=purge ?

(In reply to Liangent from comment #9)

I guess it needs some clarification that "purge action" here refers to
index.php?action=purge and/or api.php?action=purge ?

Probably both. Is there a good reason to have either?

(In reply to MZMcBride from comment #10)

Probably both. Is there a good reason to have either?

api.php?action=purge can do more than index.php?action=purge - the "force[recursive]linkupdate" parameter.

Assume we have a page containing [[Category:{{CURRENTTIMESTAMP}}]]. Maybe we can deprecate the "parser cache clearing action" by postprocessing parser output or simply disabling parser cache, but I don't think we'll find a way to update the categorylinks row every second. In this case a "link updating action" is still needed. The equivalent thing on index.php is a null edit.

(In reply to Liangent from comment #11)

(In reply to MZMcBride from comment #10)

Probably both. Is there a good reason to have either?

api.php?action=purge can do more than index.php?action=purge - the
"force[recursive]linkupdate" parameter.

Why are these options necessary? Which use-cases is the API purge action and its additional parameters solving?

Assume we have a page containing [[Category:{{CURRENTTIMESTAMP}}]]. Maybe we
can deprecate the "parser cache clearing action" by postprocessing parser
output or simply disabling parser cache, but I don't think we'll find a way
to update the categorylinks row every second. In this case a "link updating
action" is still needed. The equivalent thing on index.php is a null edit.

Null edits should not be necessary.

There are many ideas to explore here. For example, we could make purging more probabilistic by purging pages (including links updates) every thousandth or millionth view.
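To make the probabilistic idea concrete, here is a rough simulation. It is only a sketch under stated assumptions: the function names and the one-in-a-thousand rate are made up for illustration, and nothing here reflects how MediaWiki actually schedules refreshes.

```python
import random


def should_refresh(rng, one_in=1000):
    """Return True on roughly one view out of `one_in`."""
    return rng.random() < 1.0 / one_in


def count_refreshes(views, one_in=1000, seed=0):
    """Simulate `views` page views and count how many would trigger a refresh."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return sum(should_refresh(rng, one_in) for _ in range(views))


# About one refresh per thousand views is expected on average
refreshes = count_refreshes(views=100_000, one_in=1000)
```

The trade-off is that the refresh rate follows traffic: a heavily viewed page is refreshed often, while a rarely viewed page could stay stale for a long time.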

(In reply to Liangent from comment #11)

api.php?action=purge can do more than index.php?action=purge - the
"force[recursive]linkupdate" parameter.

This reminds me of the "don't leave a redirect" functionality when moving a page. The functionality was originally added only to the API's move action and eventually it was declared that API functionality and UI (index.php) functionality at Special:MovePage should not intentionally diverge like this.

That is, index.php?action=purge should probably include the force[recursive]linkupdate parameter as well.

Or that parameter should be removed from both, depending on its justification and utility.

Null edits shouldn't be needed but there are several bugs that have been open for a long time, in fact one that I ran into on my private wiki has been open and unresolved for over two years. Until such a time as the purge action is no longer needed it should remain.

(In reply to Betacommand from comment #14)

Null edits shouldn't be needed but there are several bugs that have been
open for a long time, in fact one that I ran into on my private wiki has
been open and unresolved for over two years. Until such a time as the purge
action is no longer needed it should remain.

Yeah I can remember some of them. Is there a full list (a tracking bug?) somewhere?

(In reply to Betacommand from comment #14)

Until such a time as the purge action is no longer needed it should remain.

Indeed. I think the focus should be slow deprecation and eventual removal.

(In reply to Liangent from comment #15)

Yeah I can remember some of them. Is there a full list (a tracking bug?)
somewhere?

This bug report may become a tracking bug.

In my experience, one of the most common reasons to use purge is the aftermath of infrastructure failures (the purge feed between the main and caching centers has been down), which cause inconsistencies.

In other cases, it's almost always cached 'time dependent' content: the category of speedy deletes not updating quickly enough for the wikignomes' liking (job queue), or calculated ages of article subjects, for instance, that are no longer up to date (because calculations are cached, which for time-based content is thus inherently broken).

If we don't want to sacrifice flexibility and want to keep caching at the same time, I would say that calculations based on time could have cache invalidation timers associated with them (more ugliness, but doable/manageable in Lua, I think).
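A rough sketch of what such a timer could look like, written in Python rather than Lua for brevity. The function names, the one-day default TTL and the uses_current_date flag are assumptions made for illustration; this is not an existing MediaWiki or Scribunto interface.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_day(now):
    """Seconds from `now` until the next midnight (in the timezone of `now`)."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


def cache_ttl(now, default_ttl=86400, uses_current_date=False):
    """Cap the default TTL so a date-dependent render expires at midnight."""
    if uses_current_date:
        return min(default_ttl, seconds_until_next_day(now))
    return default_ttl


# One hour before midnight UTC, a date-dependent render is kept for one hour only
now = datetime(2014, 12, 30, 23, 0, 0, tzinfo=timezone.utc)
ttl = cache_ttl(now, uses_current_date=True)
```

An age calculation would need a different boundary (the subject's next birthday, say), but the principle is the same: the render declares when it stops being valid instead of waiting for someone to purge it.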

Tgr renamed this task from Deprecate MediaWiki's purge action? (It's a hack.) to Bugs forcing users to do manual purge (tracking). Dec 30 2014, 12:27 AM
Tgr updated the task description.
Tgr added a project: Tracking-Neverending.
Tgr set Security to None.
Tgr subscribed.

This bug report may become a tracking bug.

Done.

MZMcBride renamed this task from Bugs forcing users to do manual purge (tracking) to Deprecate and remove the purge action from MediaWiki (tracking). Jun 30 2016, 3:52 AM

There was a recent situation with All-and-every-Wikisource that needed an almighty purge of many, many pages, and purge.py needed updating. @Mpaa & @Billinghurst did it, iirc..?

There was a recent situation with All-and-every-Wikisource that needed an almighty purge of many, many pages, and purge.py needed updating. @Mpaa & @Billinghurst did it, iirc..?

Wikisource does indeed use the purge function

  • When Proofread Page hasn't replicated its page status properly to indexes, which resulted in the purging of both Index: ns and Page: ns, and it was done across all WSes. [There is a phabricator ticket]

Also note that when a file: is updated at Commons (overwritten), we will often need to purge the index to refresh the text layer for the Page: ns pages that are pending (redlinked). We used to have issues with text layer availability, though I cannot say that I have had to do that for a while.

Personally, I have had to use purge on occasions when user scripts have failed to load, either where added to the sidebar as an option, or where a page has failed to load properly. Today for example, I have had issues with the Commons script MediawikiExCommons.js failing to find used files and offering to transfer them.

So while I have no issue with never having to purge a page, I definitely know that we are not ready to remove it.

Factual note:
Any page (most typically main pages and portal pages) which transcludes time-changing templates (e.g. "word of the day", "quote of the week", "featured article" etc.) needs the purge feature because it (obviously) never updates on its own.

I'm not saying that isn't true, but it shouldn't be true. Using magic words like {{CURRENTDAY}} is supposed to reduce the cache TTL (time-to-live) of the rendering. For the built-in magic words, there's a handy list in MagicWord.php. For ParserFunctions' {{#time:}} it's kind of difficult to follow, but I think it uses the calculations in Language.php. Of course it's not impossible that this functionality is broken, but it exists and it's supposed to work.

There is a use case for /api.php?action=purge on plwiktionary, see the task description in T109638: Page categorization logs expose user's IP.

I'm not saying that isn't true, but it shouldn't be true. Using magic words like {{CURRENTDAY}} is supposed to reduce the cache TTL (time-to-live) of the rendering. For the built-in magic words, there's a handy list in MagicWord.php. For ParserFunctions' {{#time:}} it's kind of difficult to follow, but I think it uses the calculations in Language.php. Of course it's not impossible that this functionality is broken, but it exists and it's supposed to work.

I have honestly never seen an "auto-purging" page transcluding a time-dependent template. We have always had to do it manually or have a bot purge at midnight.

Also, parser functions and magic words are not the only way to create this kind of transclusion. Mind Lua modules as well, which do not have to use any of those constructions simply because of built-in os.date/os.time.

Mind Lua modules as well, which do not have to use any of those constructions simply because of built-in os.date/os.time.

Lua is not my strong point, but this also seems to be implemented for Lua modules, and indeed for os.date/os.time.

https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/master/engines/LuaCommon/lualib/mw.lua#L113
https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/master/engines/LuaCommon/lualib/mw.lua#L477

Phabricator_maintenance renamed this task from Deprecate and remove the purge action from MediaWiki (tracking) to Deprecate and remove the purge action from MediaWiki. Aug 13 2016, 10:11 PM

In addition to what Billinghurst reported about Wikisource, page transclusion is also a major cause for the need of purging: very often, when a transcluded page is modified, the transcluding page remains outdated even for several minutes, and we have to purge it to refresh it.

On ptwiki, we often see people complaining that "today"'s date on the main page is incorrect. I assume this is caused by the cache, so I suggest purging the page when they want it to be updated. See e.g.:
https://pt.wikipedia.org/wiki/WP:Esplanada/geral/Datas_autom%C3%A1ticas,_ou_n%C3%A3o..._(6nov2012)

Also, on many wikis we have documentation templates transcluding /doc subpages to the main template pages and these provide a purge link so that changes to the docs can be propagated to the main template page. See:
https://en.wikipedia.org/wiki/Template:Documentation

Consider also the problem reported at pt:WP:CP#Categoria:!Redirecionamentos de categorias não vazios não está esvaziando. It seems to be like this:

Suppose there is a single page "P" in the category "C", and in the description of "C" there is a template "T" containing the test

{{#ifeq: {{PAGESINCATEGORY: {{PAGENAME}} }}|0|| [[Categoria:N]] }}

In this situation, if a user goes to "P" and removes category "C", the test in the template now evaluates to true and the category "N" is removed. However, the category "C" is still listed at category "N" (i.e. it is not purged automatically, and the user has to do a null edit on the description of "C" so that the list at "N" gets updated).
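The sequence can be modelled with a toy simulation. This is only a sketch of the reasoning above, not MediaWiki's actual data structures: render_category_links is a hypothetical stand-in for parsing the description of "C", and the sets stand in for the stored category link rows.

```python
def render_category_links(pages_in_category):
    """Mimic the template: emit category "N" only when the count is non-zero."""
    return set() if pages_in_category == 0 else {"N"}


# Links stored for the description page of "C" when it was last parsed
members_of_c = {"P"}
stored_links_for_c = render_category_links(len(members_of_c))

# A user edits "P" and removes it from "C"; the description of "C" is not re-parsed
members_of_c.discard("P")
fresh_links_for_c = render_category_links(len(members_of_c))

# The stored rows still say "C" is in "N", although a fresh parse would not
stale = stored_links_for_c != fresh_links_for_c

# A null edit on "C" re-parses it and stores the new result
stored_links_for_c = render_category_links(len(members_of_c))
```

The stale state lasts until something re-parses "C", because nothing records that the output of "C" depends on the member count of its own category.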

Look at https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Weblinkwartung/Toter_Link

:Improvements since last year

categorytree works much faster. Good job, thank you.

:The same as last year

Transclusion of {{Spezial:Linkliste/Vorlage:Toter_Link/!...nourl}} (special link list)
still needs a purge, because it should be up to date. There is still some work to do here.

Hmm... the purge action may be needed for now, until MediaWiki can reliably keep content up to date. Sure, the purge buttons are grating, but anonymous users using IP addresses, if they can discover it, are left with the option to type ?action=purge or &action=purge in the address bar, which is more effective than simply pressing the refresh/reload button.
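Whether to type ?action=purge or &action=purge depends on whether the address already has a query string. A small sketch of that rule, using only the Python standard library; the helper name with_purge and the example URLs are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_purge(url):
    """Return `url` with action=purge set, replacing any existing action."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "purge"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# No query string yet, so the parameter is introduced with "?"
short_form = with_purge("https://wiki.example.org/wiki/Main_Page")
# A query string already exists, so the parameter is appended with "&"
long_form = with_purge("https://wiki.example.org/w/index.php?title=Main_Page")
```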

As an extension writer, I find the purge action massively, massively useful. It is not uncommon whilst working on an extension to mess up rendering, and without an easy way to purge the page, you're a bit stuck.

It also helps users who encounter extension bugs that may cause temporary rendering issues. Non-core extension issues are not something that could ever be resolved by changes to the core MW code.

Therefore, I vote for this to be a WONTFIX as there will always be some situations for which action=purge is the only feasible solution.

On Wikidata purge is a standard again. I'd guess for good reasons. I think that this increased the chances of retaining this useful feature. Touching wood.

It seems a strange idea to me. Changes in the MediaWiki or extension codebase (per @gh87), Lua modules, templates and pages with Semantic MediaWiki annotations do not propagate along dependencies in real time, and neither does parser cache invalidation.

Since it got bumped again, this ticket is more of an idealistic ticket, expressing that MediaWiki should do cache invalidation perfectly so that a manual purge is never needed by users. If users have to click "purge" to get things in sync, that's a bug, which is why this ticket exists. We may never get to the eventual goal because cache invalidation is a hard problem, but it seems worthwhile to aim for, track issues and hopefully fix them.

Since it got bumped again, this ticket is more of an idealistic ticket, expressing that MediaWiki should do cache invalidation perfectly...

I get that, but my point was that this is inherently an impossibility - even if this could be guaranteed for MW core, it could not be guaranteed for extensions (which are outside of your control) nor for any dev environment, where things are inherently unstable.

I would understand a permanent tracking ticket for issues relating to cache invalidation, but I don't think it is realistic to expect that the MW ecosystem would ever get to a point where page-specific purging is not useful and necessary functionality.

Since it got bumped again, this ticket is more of an idealistic ticket, expressing that MediaWiki should do cache invalidation perfectly...

I get that, but my point was that this is inherently an impossibility - even if this could be guaranteed for MW core, it could not be guaranteed for extensions (which are outside of your control) nor for any dev environment, where things are inherently unstable.

Why can't extensions be fixed? Why can't extension developers vary their caches on page_touched or adjust cache TTLs/disable caches when that isn't possible? Relying on users to click purge is a bug, just like any other broken extension functionality.
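As a rough illustration of varying a cache on page_touched: the sketch below keys each cached render on the page's touched timestamp, so any change to that timestamp makes the old entry unreachable. The class, the key layout and the extension name are hypothetical; only page_touched itself is a real MediaWiki field.

```python
def cache_key(extension, page_id, page_touched):
    """Build a key that changes whenever the page's touched timestamp changes."""
    return f"{extension}:{page_id}:{page_touched}"


class RenderCache:
    """Tiny in-memory cache keyed on (extension, page, page_touched)."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_render(self, extension, page_id, page_touched, render):
        key = cache_key(extension, page_id, page_touched)
        if key not in self._store:
            # First request for this page state: render and remember the result
            self.misses += 1
            self._store[key] = render()
        return self._store[key]


cache = RenderCache()
first = cache.get_or_render("ExampleExt", 42, "20160813221100", lambda: "render A")
again = cache.get_or_render("ExampleExt", 42, "20160813221100", lambda: "render B")
after_touch = cache.get_or_render("ExampleExt", 42, "20160813221500", lambda: "render B")
```

Here the second call is served from the cache and the third is re-rendered because the timestamp changed. Old entries are never evicted in this sketch; a real cache would need a TTL or size limit for that.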

Developers should be able to manipulate caches on their own. Maybe we need better tooling to make this straightforward?

I would understand a permanent tracking ticket for issues relating to cache invalidation, but I don't think it is realistic to expect that the MW ecosystem would ever get to a point where page-specific purging is not useful and necessary functionality.

We'll have to settle for dreaming then.

Since it got bumped again, this ticket is more of an idealistic ticket, expressing that MediaWiki should do cache invalidation perfectly...

I get that, but my point was that this is inherently an impossibility - even if this could be guaranteed for MW core, it could not be guaranteed for extensions (which are outside of your control) nor for any dev environment, where things are inherently unstable.

Why can't extensions be fixed? Why can't extension developers vary their caches on page_touched or adjust cache TTLs/disable caches when that isn't possible? Relying on users to click purge is a bug, just like any other broken extension functionality.

I'm not suggesting we rely on users to click purge as part of any design, but it is a very useful tool when asking a user to help you debug an issue remotely, or for users to work around bugs that have not yet been fixed (I have definitely encountered a few of these in my time, both as a user and a developer). The alternative is for the extension to disable caching altogether on pages that use it, which I don't think would be a great idea.

Developers should be able to manipulate caches on their own. Maybe we need better tooling to make this straightforward?

If you mean a tool to purge a specific page whilst you are debugging, then yes, that would be great, but... well... isn't that what we already have?