
prevent DynamicPageList (DPL) extension from being cached
Closed, Resolved, Public

Description

Author: kbaas

Description:
Regarding DynamicPageList (DPL) for WikiNews
I've got a number of requests, but most importantly:
Prevent these from being cached.

We can't use them effectively on the main page or in the article workspace
because they don't update on a refresh. I'm assuming that's because they get
cached. Isn't it as quick to generate a DPL as it is to find a page? In both
cases you're querying a database. If it's an issue, perhaps you could add a
"nocache" or "cache: [yes or no]" parameter to the DPL tag.


Version: unspecified
Severity: enhancement

Details

Reference
bz2282

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 8:32 PM)
bzimport set Reference to bz2282.
bzimport added a subscriber: Unknown Object (MLST).

rowan.collins wrote:

Every page in the entire wiki is cached at all sorts of levels (at the database,
at the "squid" servers, at the browser, etc); and no, generating dynamic content
will generally take a *lot* longer than finding a single page, because the
database queries involved will be much more complex. More to the point, the
caching is to avoid having to parse and render all the wikitext markup of the
page; since there is currently no way of caching some parts of a page and
calling in others dynamically, any "nocache" flag would have to be set
(internally) for the whole page, not just the DPL part - basically saying "never
cache this page; at all". Obviously, that's not something to do lightly, as it
places a *lot* more load on the server.
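To make the whole-page point concrete, here is a minimal Python sketch under simplifying assumptions: a page is modelled as a list of fragments, each flagged cacheable or not, and the cache can only store complete rendered pages. The names (`Fragment`, `PageCache`, `renders`) are hypothetical and are not MediaWiki's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Fragment:
    """One piece of a page, e.g. ordinary wikitext or a DPL (hypothetical model)."""
    render: Callable[[], str]
    cacheable: bool = True


@dataclass
class PageCache:
    """Whole-page cache: it can only store or skip a complete rendering."""
    store: Dict[str, str] = field(default_factory=dict)
    renders: int = 0  # number of full parses performed, i.e. the server cost

    def view(self, title: str, fragments: List[Fragment]) -> str:
        if title in self.store:
            return self.store[title]
        self.renders += 1
        html = "".join(f.render() for f in fragments)
        # There is no way to cache only part of a page, so a single
        # uncacheable fragment (a "nocache" DPL) disables caching for all of it.
        if all(f.cacheable for f in fragments):
            self.store[title] = html
        return html


if __name__ == "__main__":
    cache = PageCache()
    static = [Fragment(lambda: "<p>intro</p>")]
    dynamic = static + [Fragment(lambda: "<ul>latest news</ul>", cacheable=False)]
    for _ in range(3):
        cache.view("Help", static)
        cache.view("Main Page", dynamic)
    print(cache.renders)  # 4: one parse for "Help", three for "Main Page"
```

Every view of the page carrying the uncacheable DPL pays for a full parse, which is the extra server load described above.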

rowan.collins wrote:

[please try to make the summary of bugs describe the issue succinctly but precisely]

kbaas wrote:

Well then, maybe I can be more specific about the problem and, instead of
offering a solution, leave it open for consideration:

These issues are probably specific to Wikinews, because news demands a fast
turnover time.

There are two pages where this issue manifests, each with a different
functional problem:

  1. The main page: In the latest news section, new news items currently have to
be added manually. It has been suggested that dynamic page lists of the
following format be used:

<DynamicPageList>
category=June 1, 2005
category=published
count=20
</DynamicPageList>

But as a user wrote in an edit summary, "dpls don't work for this" - because of
caching, I presume.

  2. The Article Workspace:

There are sections on the article workspace for developing stories and prepared
stories. As with the main page, these have to be edited manually, which impedes
the workflow and makes the task more complex for users, possibly leading to the
loss of current and potential users.
Specifically, I've proposed a streamlined workspace, a working model of which
exists here: http://en.wikinews.org/wiki/Wikinews:Proposed_Workspace

But it generally doesn't update unless the page is edited.
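For comparison with fetching a single page, this is roughly what a DPL like the main-page example above has to compute on every uncached render. It is a Python sketch, not the extension's code: it assumes, as the example implies, that several `category=` lines mean a page must belong to all of them, and it sorts titles alphabetically where the real extension may order results differently. `parse_dpl`, `run_dpl` and the `members` mapping are hypothetical.

```python
from typing import Dict, List, Set


def parse_dpl(body: str) -> Dict[str, List[str]]:
    """Collect the key=value lines of a <DynamicPageList> body; keys may repeat."""
    params: Dict[str, List[str]] = {}
    for line in body.strip().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            params.setdefault(key.strip(), []).append(value.strip())
    return params


def run_dpl(body: str, members: Dict[str, Set[str]]) -> List[str]:
    """Return pages in every listed category, capped at `count` if given."""
    params = parse_dpl(body)
    categories = params.get("category", [])
    if not categories:
        return []
    # Intersect the membership sets: one lookup per category, then a join,
    # rather than a single page fetch.
    pages = set.intersection(*(members.get(c, set()) for c in categories))
    titles = sorted(pages)
    if "count" in params:
        titles = titles[: int(params["count"][0])]
    return titles
```

With `members = {"June 1, 2005": {"Story A", "Story B"}, "published": {"Story B"}}`, the main-page DPL body returns `["Story B"]`: only stories that are both from that day and published.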

The main goal, generally, is to streamline the news production process - a
streamlined production is a fundamental component of the site's distinct
functionality.

rowan.collins wrote:

Well, as long as the extension is only being used in a limited number of
places, it could possibly cause the page to become "uncacheable" without too
much of a problem - it could even be coded to only have this effect on those
pages, so that it couldn't be abused/misused and cause server headaches. The
front page of Wikinews certainly seems like a candidate for "making an
exception" to me.
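A minimal Python sketch of that restriction, assuming a hypothetical admin-maintained allowlist: an extension may ask for a page not to be cached, but the request is honoured only for listed pages. Titles are normalised the way MediaWiki treats them by default (underscores as spaces, first letter case-insensitive); `ALLOWED_UNCACHEABLE`, `normalize_title` and `should_cache` are illustrative names.

```python
from typing import Iterable, Set

# Hypothetical allowlist that only administrators can edit.
ALLOWED_UNCACHEABLE: Set[str] = {"Main Page"}


def normalize_title(title: str) -> str:
    """Treat underscores as spaces and capitalise the first letter."""
    t = title.replace("_", " ").strip()
    return t[:1].upper() + t[1:]


def should_cache(title: str, nocache_requested: bool,
                 allowed: Iterable[str] = ALLOWED_UNCACHEABLE) -> bool:
    """Cache unless an extension asked not to AND the page is allowlisted."""
    if not nocache_requested:
        return True
    allowed_norm = {normalize_title(a) for a in allowed}
    # A "nocache" request on any other page is ignored, so it can't be
    # abused to make arbitrary pages expensive to serve.
    return normalize_title(title) not in allowed_norm
```

Putting the allowlist outside the wiki text is the design choice here: editors can use DPLs anywhere, but only admins decide which pages may bypass the cache.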

Various extension authors have wanted to "defeat" the cache in this way, but
no-one has yet coded a "clean" mechanism for it, to my knowledge. The best so
far seems to be Sebastien Barre's hack sent to the mailing list:
http://mail.wikimedia.org/pipermail/wikitech-l/2005-February/027763.html

kbaas wrote:

Could it work the other way (supply-side instead of demand-side)? Could the
addition of a specific category to any page trigger a cache invalidation of a
specific page? For instance, when an article gets added to "published", it
invalidates the cache on the main page. Such cache invalidation "triggers"
would cause far fewer cache invalidations than not caching the page at all.

But might this be restrictive? Must the set-up be hardcoded, or can it be set up
by an editor via the wiki, and wouldn't that be complex? Possibly there could be
a tag that, when a page is saved with it, invalidates the cache on a given page
(the cache of a page being edited is already invalidated anyway; this would just
pass the invalidation on to dependent pages). Such a tag could be
very simple:

<invalidatecache>
page=Main page
page=Article Workspace
</invalidatecache>

(Alternatively the tag might be called "update", but such a name would probably
encourage people to use the tag more liberally.)
That could be put in a template, so that adding a template like the following
to a page:

[[category:published]]
<invalidatecache>
page=Main page
page=Article Workspace
</invalidatecache>

would automatically update the appropriate DPLs.

You could still restrict the pages that would respond to the triggers.
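The proposed `<invalidatecache>` tag could be handled on save roughly like this. It is a Python sketch under assumptions: the cache is a plain dictionary of rendered pages, and `allowed` stands for the restricted set of pages that respond to triggers. `invalidation_targets` and `on_save` are hypothetical names; no such hook is claimed to exist in MediaWiki.

```python
import re
from typing import Dict, List, Set

# Matches each <invalidatecache>...</invalidatecache> block, across lines.
TAG_RE = re.compile(r"<invalidatecache>(.*?)</invalidatecache>",
                    re.DOTALL | re.IGNORECASE)


def invalidation_targets(wikitext: str) -> List[str]:
    """Collect the page=... lines from every <invalidatecache> tag, in order."""
    targets: List[str] = []
    for body in TAG_RE.findall(wikitext):
        for line in body.splitlines():
            key, sep, value = line.partition("=")
            page = value.strip()
            if sep and key.strip() == "page" and page and page not in targets:
                targets.append(page)
    return targets


def on_save(wikitext: str, cache: Dict[str, str], allowed: Set[str]) -> List[str]:
    """Drop cached copies of the listed pages; ignore pages not allowed to respond."""
    invalidated: List[str] = []
    for title in invalidation_targets(wikitext):
        if title in allowed:
            cache.pop(title, None)  # next view re-renders, refreshing its DPLs
            invalidated.append(title)
    return invalidated
```

Because the tag sits in a template, saving any article that transcludes it would run this once per save; pages outside `allowed` are silently skipped.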

kbaas wrote:

It might be better if this were on the category level (but harder to implement?),
so that target caches would only be invalidated on the addition or removal of
articles from the category, and possibly fewer target caches would need to be
specified. Thus, the tags would only go in categories, and they would have the
same general format.

rowan.collins wrote:

That's an interesting idea - or 2 different ideas, really:

  • if a manual "invalidate" tag were invented, it would make sense to have it
inserted on individual pages, not category descriptions, since adding a page to
a category actually has no effect on the category's description page (in the
database, which pages belong to a category is stored completely separately from
what introduction is displayed if you browse to that category)

  • if, on the other hand, you wanted certain *categories* to invalidate certain
caches (which better matches what we actually want to achieve), it would make
more sense to have some invisible hook in the code that watched for changes in
membership of those categories, and marked the appropriate caches invalid. This
could be controlled by some kind of admin-only interface that linked
category-page pairs such as "Category:Published -> Main page",
"Category:Published -> Article workspace", etc.

Although not necessarily *easy*, I think such an approach would be possible.
For the main page, though, it might still prove hard to invalidate things like
the "squid" cache that serves logged-out users; to be honest, I'm mainly saying
that because I don't know how it works ;)
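The category-to-page idea could be sketched in Python as a hook that compares a page's categories before and after a save. The mapping stands in for the admin-only configuration described above; `CATEGORY_TARGETS` and `pages_to_invalidate` are hypothetical. Only membership *changes* in watched categories trigger anything, which is what keeps the number of invalidations low.

```python
from typing import Dict, Set

# Hypothetical admin-configured category -> page pairs.
CATEGORY_TARGETS: Dict[str, Set[str]] = {
    "Published": {"Main page", "Article workspace"},
}


def pages_to_invalidate(old_categories: Set[str], new_categories: Set[str],
                        mapping: Dict[str, Set[str]] = CATEGORY_TARGETS) -> Set[str]:
    """Pages whose caches should be dropped after a save changed category membership."""
    changed = old_categories ^ new_categories  # categories joined or left
    targets: Set[str] = set()
    for category in changed:
        targets |= mapping.get(category, set())
    return targets
```

An article moving from "Developing" to "Published" would invalidate both target pages, while a copy-edit that leaves it in "Published" invalidates nothing.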

kbaas wrote:

Some related DPL features have been implemented. Now, the only thing holding
back a long-awaited improved workflow process for Wikinews is this
cache-invalidation trigger.

The second idea, "...certain *categories* to invalidate certain
caches"..."category->page pairs", is what is desired, and that implementation
strategy sounds ideal. Regarding how to invalidate the caches: editing a page
already invalidates its cache just fine. I'm sure a look through the edit
page/submit code will reveal the procedure that does that. Invalidating a cache
on an add-to-category event might amount to simply calling a function that's
already written for the page-editing process. It seems to me that this would
be fairly easy.
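One way to read "calling a function that's already written": MediaWiki keeps a per-page "touched" timestamp (`page_touched`), and a cached rendering older than it is treated as stale. The Python sketch below is a simplified model of that idea, not MediaWiki code; it uses a counter instead of real timestamps, and `TouchedCache`, `invalidate` and `get` are hypothetical names. The point is that an edit and a category-change trigger could both call the same `invalidate`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TouchedCache:
    _tick: int = 0  # monotonic counter standing in for a clock
    touched: Dict[str, int] = field(default_factory=dict)
    entries: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def _now(self) -> int:
        self._tick += 1
        return self._tick

    def invalidate(self, title: str) -> None:
        """Mark the page as changed; called on edit or on a category trigger."""
        self.touched[title] = self._now()

    def get(self, title: str, render: Callable[[], str]) -> str:
        """Serve the cached copy only if it is newer than the page's touched time."""
        entry = self.entries.get(title)
        if entry is not None and entry[0] > self.touched.get(title, 0):
            return entry[1]
        html = render()
        self.entries[title] = (self._now(), html)
        return html
```

Invalidation never deletes anything here; it just bumps one timestamp, so it stays cheap even if a popular category fires it often.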

fabian.zeindl wrote:

I reworked DynamicPageList to be much more dynamic ;-) (logical AND/OR joins of
categories, etc.).
If MediaWiki is going to support something like this in a future release, a good
"DoNotCache" switch would be useful...