Calculated Age of persons can become outdated until next cache purge
Open, Low, Public, Bug Report

Description

Author: alexei.rudenko

Description:
Pages about persons where the age is calculated automatically are kept in the cache until the next cache purge or page edit. The calculated age is cached along with the rest of the page.

After the person's birthday passes, the cached age does not change.

Example: Paul McCartney was born on 18.06.1942. His wiki page was edited before 18.06.2013, so his age of 70 was saved in the cache (which was correct at the time). After 18.06.2013 no one edited his page or invalidated the cache, so the page still showed his age as 70 when it should have been 71 (which is not correct).

Notes: This can happen on all pages with a calculated age. Possibly, other calculated fields are also affected.

Possible solution: invalidate the cache when certain conditions are met. For example, invalidate the cache of Paul McCartney's page on 18.06 of every year by setting the Expires header.
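
To illustrate the report and the proposed fix, here is a minimal Python sketch (not MediaWiki code). The function names `age_on` and `seconds_until_next_birthday` are hypothetical, and it assumes birthdays roll over at UTC midnight. It shows how the age depends on the render date and how long a cached render could stay valid.

```python
from datetime import date, datetime, timezone


def age_on(birth: date, today: date) -> int:
    """Full years elapsed between `birth` and `today`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def seconds_until_next_birthday(birth: date, now: datetime) -> int:
    """Seconds from `now` (UTC) until the next UTC midnight that starts a birthday."""
    year = now.year
    while True:
        try:
            candidate = datetime(year, birth.month, birth.day, tzinfo=timezone.utc)
        except ValueError:
            # 29 February in a non-leap year: treat 1 March as the birthday.
            candidate = datetime(year, 3, 1, tzinfo=timezone.utc)
        if candidate > now:
            return int((candidate - now).total_seconds())
        year += 1


birth = date(1942, 6, 18)
print(age_on(birth, date(2013, 6, 17)))  # 70
print(age_on(birth, date(2013, 6, 18)))  # 71
# Rendered at noon UTC the day before the birthday: stale after 12 hours.
print(seconds_until_next_birthday(
    birth, datetime(2013, 6, 17, 12, 0, tzinfo=timezone.utc)))  # 43200
```

A render cached on 17.06.2013 is correct for only another 12 hours, while one cached on 19.06.2013 would stay correct for almost a year.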


Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:05 AM
bzimport set Reference to bz49803.
bzimport added a subscriber: Unknown Object (MLST).

Age calculations are done in wikitext, and generally the software does not do anything different based on such things. It does not see individual pieces of information displayed in an infobox, but just plain text.

alexei.rudenko wrote:

Is it possible to specify caching settings for the page during its generation, based on the expressions in the wikitext?

  • Bug 51128 has been marked as a duplicate of this bug.

Core time-related magic variables already set parser cache TTL to one hour or one day (depending on the variable). So either this is a problem in an extension (ParserFunctions? Scribunto?), or Wikimedia wikis do something wrong.

Change 73188 had a related patch set uploaded by Brian Wolff:
Make {{#time:...}} expire parser cache after 12 hours.

https://gerrit.wikimedia.org/r/73188

I would note here for reference that this expiry won't affect the squid/varnish cache (what the anons see), but usually that's not that big a deal.

(In reply to comment #0)

Hmm, I just looked at the wiki. {{Age}} uses {{CURRENTDAY}} to calculate the person's age, which means the page should have been rechecked once an hour...

The docs suggest that CURRENTDAY returns UTC, so the timeout can be between 0 and 24 hours.

Change 73188 abandoned by Brian Wolff:
Make {{#time:...}} expire parser cache after 12 hours.

https://gerrit.wikimedia.org/r/73188

I saw another report of this problem recently on VP/T. A user was complaining that he often had to manually purge these kinds of pages, because users on OTRS or the helpdesk were reporting incorrect ages in Google etc.

(That is because there is no varnish purge.) So perhaps we do need a bit of improvement here...

Jackmcbarn created a very related patch at https://gerrit.wikimedia.org/r/#/c/135887/ . It adds the ability to set a TTL per transclusion, which is then exposed through the API.

Change 135887 had a related patch set uploaded by Jackmcbarn:
Add PPFrame::getTTL() and setTTL()

https://gerrit.wikimedia.org/r/135887

Change 135887 merged by jenkins-bot:
Add PPFrame::getTTL() and setTTL()

https://gerrit.wikimedia.org/r/135887

(In reply to Gabriel Wicke from comment #14)

Also related: https://gerrit.wikimedia.org/r/#/c/136617/,
https://gerrit.wikimedia.org/r/#/c/136618/ and
https://gerrit.wikimedia.org/r/#/c/136619/.

All patches mentioned in this report were merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?

As far as I can tell, no patch to fix the problem as specified was merged. There's some disagreement as to whether it is a problem, and if yes, whether it is a problem that can be solved (without disabling caching or something drastic like that, which we obviously can't do). Setting UNCONFIRMED.

Aklapper changed the subtype of this task from "Task" to "Bug Report". Feb 5 2022, 2:33 PM
Aklapper removed a subscriber: GWicke.

Still can reproduce. There is a solution, but I think it's suboptimal.

To my knowledge, there are three ways of getting aspects of the current time when calculating the age of people (time since an event, release, etc.):

  1. In wikitext, use magic words like {{CURRENTDAY}}, {{CURRENTMONTH}}, etc.
  2. In wikitext, use {{#time: ... }}/{{#timel: ... }} parser function.
  3. In Lua, call os.date. Using Lua is mandatory when you pull the birthdate from Wikidata.

In fact, you can also do #1 or #2 when using Lua. But this is usually discouraged.

Observation 1: Using #1 significantly decreases the lifetime of cached article renders.
Evidence: this ParserOutput::updateCacheExpiry call and this associative array.

Observation 2: Using #2 or #3 does not decrease the lifetime of cached article renders.
Evidence: T119366#3021907, T270378.
Counter-evidence: both Language::sprintfDate and mw.lua have dedicated procedures for predicting how much time is left until their respective output will change.

This is paradoxical. Somebody spent time (twice) creating an algorithm that is supposed to solve precisely the problem for which this task was created. But it isn't effective because of... I don't know, just see the discussion on T119366.
On the other hand, there is a way to fix it (use #1 above), but:

  • Unlike, e.g., Language::sprintfDate, it is very imprecise, and it unnecessarily underestimates.
  • It is not obvious why {{CURRENTDAY}} should be preferred over {{#time:j}}. Probably nobody is aware of this either.
  • When you use Lua, you want to avoid going back-and-forth between Lua and wikitext.

For number 2, I was under the impression that that was intentional; however, I am in favour of changing the behaviour.

I don't understand the goal of the earlier patches. We added the ability to set a TTL per-template transclusion, and then totally ignore it except in the api output of Special:ExpandTemplates. Was there a plan to do something with that value?

The current scheme for setting the frame TTL tries to set it to exactly the number of seconds until the time would change (e.g. if you ask for the current hour and it is currently 3:10, the TTL would be 50 minutes). There are 2 things that are slightly concerning:

  • If it's precisely on the second, there is some risk that, due to clock drift, it might be recalculated the second before the time changes (e.g. if it's now 3:10, we are asking for only the current hour, and we regenerate in 50 minutes, there is probably some risk that we regenerate the thing at 3:59:59 and then cache that for a while), which would be unfortunate.
  • More importantly, this means everything that uses the time function will sync up. Instead of regenerations being spread out, they might happen all at once. This could be a significant problem if there are lots of pages.
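
Both concerns can be made concrete with a small Python sketch (not MediaWiki code). `exact_ttl_to_next_hour` mirrors the scheme described above. `padded_ttl`, with its `margin` and `max_jitter` parameters, is a hypothetical mitigation that nobody in this thread has proposed; it is only meant to show the trade-off.

```python
import math
import random
from datetime import datetime, timedelta, timezone


def exact_ttl_to_next_hour(now: datetime) -> int:
    """Seconds until the displayed hour changes (the exact scheme)."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return math.ceil((next_hour - now).total_seconds())


def padded_ttl(exact_ttl: int, margin: int = 5, max_jitter: int = 300,
               rng: random.Random = random.Random()) -> int:
    """Hypothetical variant: expire a little AFTER the boundary, at a random offset.

    `margin` guards against regenerating one second too early because of clock
    drift; the jitter spreads regenerations out instead of syncing them up.
    The cost is that the page may be stale for up to margin + max_jitter seconds.
    """
    return exact_ttl + margin + rng.randint(0, max_jitter)


first = datetime(2022, 11, 8, 3, 10, tzinfo=timezone.utc)
second = datetime(2022, 11, 8, 3, 40, tzinfo=timezone.utc)
print(exact_ttl_to_next_hour(first))   # 3000 (50 minutes)
print(exact_ttl_to_next_hour(second))  # 1200 (20 minutes)
# With exact TTLs, both renders expire at the same instant (4:00:00).
print(first + timedelta(seconds=3000) == second + timedelta(seconds=1200))  # True
```

With exact TTLs, every page rendered during the hour expires at 4:00:00. The padded variant trades a few minutes of possible staleness for spread-out regenerations.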

Change 854490 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):

[mediawiki/extensions/Scribunto@master] Make lua time functions shorten cache like variables do

https://gerrit.wikimedia.org/r/854490

More importantly, this means everything that uses the time function will sync up. Instead of regenerations being spread out, they might happen all at once. This could be a significant problem if there are lots of pages.

Currently, there are ~1,050,000 (~16%) articles about living people on enwiki. ~1,140,000 (>17%) articles use Module:Age. Too many, I would say.

On the other hand, from the point of view of this specific task:

  1. The age changes only once per year. We don't really have to give cached articles just one day (we actually sometimes give them an hour), we can definitely do better.
  2. You do not always need to consider the day when computing the age. (You only need to consider it when the dates have the same month.) This could somewhat improve the situation around regenerations being synchronous.
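
Point 2 can be sketched in Python as follows (an illustration only, with hypothetical names `age_with_dependency` and `seconds_until_boundary`). The age function reports the finest time unit it actually had to look at, and the cache lifetime is derived from that unit. The sketch assumes UTC boundaries.

```python
import math
from datetime import date, datetime, timedelta, timezone


def age_with_dependency(birth: date, today: date) -> tuple[int, str]:
    """Return (age, unit), where unit is the finest unit that was consulted."""
    years = today.year - birth.year
    if today.month != birth.month:
        # The month alone decides; the current day is never read.
        return years - (1 if today.month < birth.month else 0), "month"
    return years - (1 if today.day < birth.day else 0), "day"


def seconds_until_boundary(unit: str, now: datetime) -> int:
    """Seconds until the next UTC day or month boundary after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        boundary = midnight + timedelta(days=1)
    else:
        # Day 1 plus 32 days always lands in the next month; snap back to day 1.
        boundary = (midnight.replace(day=1) + timedelta(days=32)).replace(day=1)
    return math.ceil((boundary - now).total_seconds())


birth = date(1942, 6, 18)
in_may = datetime(2013, 5, 20, 12, 0, tzinfo=timezone.utc)
in_june = datetime(2013, 6, 10, 12, 0, tzinfo=timezone.utc)

age, unit = age_with_dependency(birth, in_may.date())
print(age, unit, seconds_until_boundary(unit, in_may))    # 70 month 993600
age, unit = age_with_dependency(birth, in_june.date())
print(age, unit, seconds_until_boundary(unit, in_june))   # 70 day 43200
```

Outside the birth month, the render stays valid until the end of the current month rather than the end of the day. Only pages whose subject has a birthday in the current month would need daily regeneration.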

Telling people to fix their code according to #2 is an option, but it cannot be a solution.
Developing a tool (API) for computing age that considers #1 could be.

#time calls PPFrame::setTTL() but the frame TTL is not used by anything so it has no effect. CURRENTDAY etc. call ParserOutput::updateCacheExpiry() which does actually work.
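
The difference can be modelled with a toy Python sketch. This is not MediaWiki's PHP code: `ToyParserOutput`, `ToyFrame` and the 30-day `DEFAULT_LIFETIME` are hypothetical stand-ins, and the one-hour value for CURRENTDAY is taken from the earlier comment in this thread. The only behaviour modelled is that the parser cache lifetime is read from the output object, while the frame TTL is recorded but never consulted.

```python
from typing import Optional

DEFAULT_LIFETIME = 30 * 86400  # assumed default lifetime, purely illustrative


class ToyParserOutput:
    """Stand-in for the object whose expiry the parser cache actually reads."""

    def __init__(self) -> None:
        self.cache_expiry: Optional[int] = None

    def update_cache_expiry(self, seconds: int) -> None:
        # Only ever shortens the lifetime; a larger value has no effect.
        seconds = max(int(seconds), 0)
        if self.cache_expiry is None or seconds < self.cache_expiry:
            self.cache_expiry = seconds

    def effective_lifetime(self) -> int:
        if self.cache_expiry is None:
            return DEFAULT_LIFETIME
        return min(self.cache_expiry, DEFAULT_LIFETIME)


class ToyFrame:
    """Stand-in for a template frame that records a TTL nobody reads."""

    def __init__(self) -> None:
        self.ttl: Optional[int] = None

    def set_ttl(self, ttl: int) -> None:
        if self.ttl is None or ttl < self.ttl:
            self.ttl = ttl


# Page A uses a magic variable: the output's expiry is shortened.
out_a, frame_a = ToyParserOutput(), ToyFrame()
out_a.update_cache_expiry(3600)

# Page B uses a #time-like function: only the frame TTL is set.
out_b, frame_b = ToyParserOutput(), ToyFrame()
frame_b.set_ttl(3000)

print(out_a.effective_lifetime())  # 3600
print(out_b.effective_lifetime())  # 2592000 (frame TTL of 3000 is ignored)
```

In this model, page B keeps the default lifetime even though a precise TTL was computed for it.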

In a comment on gerrit 135887, Anomie wrote:

Code itself seems ok. But it would be nice if this TTL were actually used (either here or in a followup) to actually do something about bug 49803.

So I guess we're still waiting for that. Although I think it would be fine to remove getTTL/setTTL (an incomplete project unused for 9 years) and follow the example set by CoreMagicVariables instead.

Although I think it would be fine to remove getTTL/setTTL (an incomplete project unused for 9 years)

Apart from #time and Scribunto (T270378) there is only one usage: ApiExpandTemplates (action=expandtemplates) where TTL is an optional part of the response.

follow the example set by CoreMagicVariables instead.

I like that approach. Language::sprintfDate() has done something similar, but it communicates via &$ttl with second-level precision. Again, the only two consumers are ParserFunctions and Scribunto (i.e., the two that should be fixed).

If we abandoned the frame TTL concept, we could re-purpose that parameter to be "the TTL for parser cache", share logic with CoreMagicVariables and make the callers responsible for calling ParserOutput::updateCacheExpiry(). This would fix #time.

Is this a plan?