
changes to links in a template do not update 'pagelinks', 'categorylinks', etc for pages including it
Closed, Resolved (Public)

Description

Author: bugzillas+padREMOVETHISdu

Description:
http://en.wikibooks.org/wiki/Programming:C_plus_plus/archive1 says "{{delete}}",
http://en.wikibooks.org/wiki/Template:Delete says "[ [ Category:Candidates for
speedy deletion ] ]", but
http://en.wikibooks.org/wiki/Category:Candidates_for_speedy_deletion says
nothing about "Programming:C_plus_plus/archive1".

Similarly, http://en.wikibooks.org/wiki/User:Paddu/C_plus_plus_talk is also
not listed.


Version: unspecified
Severity: major
URL: http://en.wikibooks.org/wiki/Category:Candidates_for_speedy_deletion

Details

Reference
bz939

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 7:05 PM)
bzimport set Reference to bz939.
bzimport added a subscriber: Unknown Object (MLST).

bugzillas+padREMOVETHISdu wrote:

Oh! The category page started listing them after I edited both pages. What's
going on?

Changing summary to:

"Category doesn't list all articles transcluding a template linking to the
category unless the articles are edited once more after the transclusion."

rowan.collins wrote:

This is a known issue, as described at
http://meta.wikimedia.org/wiki/Help:Template#A_category_tag_in_a_template.3B_caching_problem

Basically, the category tag was added to the template *after* the template was
added to the page; currently, this doesn't trigger an update of the
'categorylinks' table in the database for pages that include that template.
Viewing the page gets the category tag parsed, so you see the "Categories: " box
at the bottom; but only editing each page gets the actual database updated.

I can't find another bug open for this, so I'll try and summarise the summary a bit.

rowan.collins wrote:

Actually, this appears to be true for the 'links' table as well, as demonstrated
by a recent case at
http://en.wikipedia.org/wiki/Wikipedia:Help_desk#Whatlinkshere_not_updating
Basically, links in the template show up in Special:Whatlinkshere, but editing a
link in the template leaves a load of erroneous entries in the whatlinkshere for
the old target page.

And the same goes for imagelinks: compare the pages listed on
http://en.wikipedia.org/wiki/Image:Science-symbol-2.png (as using the image)
with those on
http://en.wikipedia.org/w/wiki.phtml?title=Special:Whatlinkshere&target=Template%3ASci-stub
(those using the template that uses the image)

This is awkward, because recalculating the links, categorylinks, etc tables for
every inclusion every time a page is edited could take a significant amount of
time to process (since it requires re-parsing the whole page). AIUI, this
recalculation is normally only done when each individual article is saved.

Perhaps a flag could be added to the 'cur' table to say "recalculate links on
next display", so that the page would only need *viewing*, not *editing* to
eliminate these effects (still not ideal, but it eliminates the need to create a
massive queue of pages to parse whenever a widely-used template is edited).
Since the cache should be invalidated anyway, the extra flag would just cause
the parser to update the links, categorylinks, etc once it was done.
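
A minimal sketch of the "recalculate links on next display" flag, assuming a toy in-memory wiki. The class, the regexes and the `stale` set are hypothetical stand-ins for the 'cur' table flag and are not taken from MediaWiki; template expansion is only one level deep.

```python
import re

# Hypothetical, simplified wikitext patterns (not MediaWiki's parser).
LINK_RE = re.compile(r"\[\[([^\]|]+)")
TEMPLATE_RE = re.compile(r"\{\{([^}|]+)\}\}")


class Wiki:
    def __init__(self):
        self.text = {}      # title -> wikitext
        self.links = {}     # title -> set of targets (stands in for the links tables)
        self.stale = set()  # titles flagged "recalculate links on next display"

    def expand(self, title):
        """Expand {{Name}} inclusions one level deep via Template:Name."""
        return TEMPLATE_RE.sub(
            lambda m: self.text.get("Template:" + m.group(1).strip(), ""),
            self.text[title],
        )

    def refresh_links(self, title):
        expanded = self.expand(title)
        self.links[title] = {t.strip() for t in LINK_RE.findall(expanded)}
        self.stale.discard(title)

    def save(self, title, wikitext):
        self.text[title] = wikitext
        self.refresh_links(title)
        # Saving a template only flags its includers; nothing is re-parsed yet.
        if title.startswith("Template:"):
            name = title[len("Template:"):]
            for other, body in self.text.items():
                if name in (n.strip() for n in TEMPLATE_RE.findall(body)):
                    self.stale.add(other)

    def view(self, title):
        html = self.expand(title)  # the page is parsed on view anyway
        if title in self.stale:
            self.refresh_links(title)  # the extra flag triggers the table update
        return html
```

In this sketch, editing the template costs one scan for includers, and each flagged page pays for its own link update the first time it is viewed.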

The alternative is to have some logic when a template is edited that determines
what changes need to be made [e.g. remove 'links' entry for "foo", add
'categorylinks' entry for "Category:Bar"], and repeats them for each target
page. But this is dodgy, because *removing* a link from the template doesn't
guarantee that there isn't a link somewhere *else* in the article, so we could
be removing still-valid entries from the tables.
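
The pitfall can be shown with a small sketch, assuming the link tables are modelled as plain Python sets. The function name and the data layout are hypothetical, chosen only for illustration.

```python
def naive_propagate(links_table, template_before, template_after, includers):
    """Apply the template's link diff blindly to every page that includes it."""
    removed = template_before - template_after
    added = template_after - template_before
    for page in includers:
        links_table[page] = (links_table[page] - removed) | added
    return links_table


# Page A includes the template AND links to "foo" in its own text.
# Page B only includes the template.
table = {"A": {"foo"}, "B": {"foo"}}
result = naive_propagate(
    table,
    template_before={"foo"},
    template_after={"Category:Bar"},
    includers=["A", "B"],
)
# A still links to "foo" in its own source, but the entry is gone from the table.
a_lost_its_own_link = "foo" not in result["A"]
```

Page B ends up correct, while page A loses an entry that is still valid, which is exactly the case described above.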

jnc wrote:

(In reply to comment #3)

this is dodgy, because *removing* a link from the template
doesn't guarantee that there isn't a link somewhere *else* in
the article, so we could be removing still-valid entries from
the tables.

Ooooh, good catch! I had mentally designed the same sort of
optimization code as you had (at the start of your
post, above the clipped quote: create a list of
deleted links, etc.) and I hadn't seen this bug. There's
simply no way to fix this without mindblowing amounts of
hair/work. (See the end of this post, if anyone really cares.)

The right fix is probably the simple one you proposed, which is
to mark all articles which include that template (and we have
to make sure template recursion works too) so that they will
be re-parsed next time they are asked for. It's probably not
much more work at run-time, and is more than an order of
magnitude simpler implementation-wise.
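
As a rough sketch of the marking step with template recursion, assuming we already have a map from each page to the pages that directly transclude it. The `included_by` map and the function name are hypothetical.

```python
from collections import deque


def pages_to_reparse(edited, included_by):
    """Return every page that transcludes `edited`, directly or via other templates.

    included_by maps a title to the set of titles that directly transclude it.
    """
    marked = set()
    queue = deque([edited])
    while queue:
        current = queue.popleft()
        for page in included_by.get(current, ()):
            # The `marked` check also stops loops if templates include each other.
            if page not in marked and page != edited:
                marked.add(page)
                queue.append(page)
    return marked
```

A breadth-first walk like this visits each includer once, so the cost grows with the number of affected pages rather than with the size of the wiki.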

There is yet *another* catch here though, which is that *any*
article could be a template. (I have templates in User: space
I transclude.) So really each article would need an "I am a
template" flag bit, which is set any time another article
transcludes it, and would never be cleared thereafter.
(Reference counts, which could be cleared, get really hairy;
skipping discussion for now, ask me if you care.) Still, fast
at run-time, easy to implement. And you could put off this
fix until later, so that non-Template: templates would still
have the bug until the "used as template" bit is added.

(Here's how I would do it the other way if I *had* to. First
I would parse the "template" [i.e. any page that's
transcluded, see above] in "before" and "after" states, and
from each one create a sorted list of categories and links
[i.e. removing duplicates]. Then look for ones that were
removed from the "before" list - those are the ones where you
can go to *their* links lists and *potentially* remove all
the articles which include that template. I say "potentially"
because for each article, you'd either have to re-parse the
source looking for duplicates [your bug above, this is the
expensive, brute-force solution], or keep a list of
links/categories which are in the article source [i.e.
without expanding templates], and check that list for a match
before deleting that article page. Far more complexity [extra
storage, run-time cost, especially on pages which don't do
any transclusions - although I suppose you could optimize
them out] - than it's worth.)
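
For comparison, a sketch of that "other way", assuming each article keeps a separate set of links found in its own source without template expansion. The names `own_links` and `propagate_with_own_links` are hypothetical, and the sketch ignores links contributed by other templates on the same page, which a real version would also have to check.

```python
def propagate_with_own_links(links_table, own_links, before, after, includers):
    """Apply a template's link diff, keeping links the article makes itself."""
    removed = set(before) - set(after)
    added = set(after) - set(before)
    for page in includers:
        # Links present in the article's own source must survive the removal.
        keep = own_links.get(page, set())
        links_table[page] = (links_table[page] - (removed - keep)) | added
    return links_table
```

The extra `own_links` storage is the cost mentioned above: it has to be maintained for every article, including those that never transclude anything.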

rowan.collins wrote:

(In reply to comment #4)

There is yet *another* catch here though, which is that *any*
article could be a template. (I have templates in User: space
I transclude.) So really each article would need an "I am a
template" flag bit, which is set any time another article
transcludes it, and would never be cleared thereafter.
(Reference counts, which could be cleared, get really hairy;
skipping discussion for now, ask me if you care.)

Well, if there was (as is being proposed) a separate 'inclusionlinks' table (or
an 'inclusion' linktype field in a merged 'links' table) it ought to be possible
to create a query which for any page would return all pages transcluding that
page. For most pages outside the Template: namespace, it would simply return
nothing, so nothing would be flagged for re-parsing; but for anything that *was*
used as a template, it would list all the pages that needed flagging. Having
this separation is a must anyway, because without it we'd be recalculating links
on articles that just happen to link to the template, and it's ugly enough that
we purge their cache entries (cf bug 734).
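
A minimal sketch of such a query, using an in-memory SQLite database. Only the table name 'inclusionlinks' comes from the proposal; the column names (`il_from`, `il_to`, `l_from`, `l_to`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: one table for transclusions, one for ordinary links.
conn.execute("CREATE TABLE inclusionlinks (il_from TEXT, il_to TEXT)")
conn.execute("CREATE TABLE links (l_from TEXT, l_to TEXT)")
conn.executemany(
    "INSERT INTO inclusionlinks VALUES (?, ?)",
    [
        ("Programming:C_plus_plus/archive1", "Template:Delete"),
        ("User:Paddu/C_plus_plus_talk", "Template:Delete"),
    ],
)
# A page that merely links to the template, without transcluding it.
conn.execute("INSERT INTO links VALUES (?, ?)", ("Help:Template", "Template:Delete"))


def pages_transcluding(conn, title):
    """Return the pages that transclude `title`, ignoring plain links to it."""
    rows = conn.execute(
        "SELECT il_from FROM inclusionlinks WHERE il_to = ? ORDER BY il_from",
        (title,),
    )
    return [row[0] for row in rows]
```

Because plain links live in a different table, a page that only links to the template never shows up in the result and so would never be flagged for re-parsing.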

I'm going to create a bug for making this distinction in the DB, for ease of
reference, and set it to blocking this.

zigger wrote:

*** Bug 1824 has been marked as a duplicate of this bug. ***

silsor wrote:

This also affects images included in templates - the "following pages link to this file" list
that is automatically generated on image pages does not include pages that included the
template before the image was changed to something else.

Setting to "major" because this breaks the broken link table even further.

bugzilla.20.phyzome wrote:

If anybody wants an example to play with: On commons there is a template
(inbound links:
[http://commons.wikimedia.org/w/index.php?title=Special:Whatlinkshere&target=Template%3AGray%27s_Anatomy_plate])
that had [http://commons.wikimedia.org/wiki/Category:Gray%27s_Anatomy_plates a
category] added later.

*** Bug 1993 has been marked as a duplicate of this bug. ***

Astronouth7303 wrote:

As an alternative solution, should the "purge" action cause the page to be
reparsed? Meaning that using purge could replace null edits.

(In reply to comment #10)

As an alternative solution, should the "purge" action cause the page to be
reparsed? Meaning that using purge could replace null edits.

Purge _does_ cause the page to be reparsed. It does not, however, alter the links tables.

en.ABCD wrote:

Here's a question - should &action=purge update the links table?

*** Bug 2235 has been marked as a duplicate of this bug. ***

magnusrk+wiki wrote:

I use a small parser modification for this. action=submit&nulledit=true or action=nulledit in the url will do the same as a regular null
edit without visiting the edit page. Could even make a page link for it, but it's still just a band-aid. A fix so that updating the
template cascades *-links table updates would be much better.

puzzlet wrote:

*** Bug 1392 has been marked as a duplicate of this bug. ***

puzzlet wrote:

I don't know whether this could be a compromise, but how about showing backlinks via
templates as subnodes, just like how we have backlinks via redirects? For
example, if articles A and B contain Template:T, which has a link to X, X's
Special:Whatlinkshere should look like:

< X
The following pages link to here (up to 500 are shown):

  • Template:T
    • A
    • B
  • Eks (redirect page)
    • pages that link to Eks
  • other pages that link to X directly

If A and B also link to X directly, list them on the first level as well. And
once the link in Template:T is deleted, the special page would look like:

< X
The following pages link to here (up to 500 are shown):

  • Template:T (obsolete)
    • A
    • B
  • Eks (redirect page)
    • pages that link to Eks
  • other pages that link to X directly

I'm not a technical person, but I think implementing this would be as easy as making
Whatlinkshere include the inclusion lists of the templates that link to the
page which is the subject of the special page. But to make this work, we need
to separate the lists of "inclusions" and "actual links" (i.e. [[:Template:T]]) of
templates, a difference that Whatlinkshere pages don't seem to distinguish.
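
A rough sketch of how such a nested listing could be assembled, assuming inclusions and actual links are already stored separately. The three dictionaries and the function name are hypothetical, and the "(obsolete)" state is not modelled.

```python
def whatlinkshere(target, links, inclusions, redirects):
    """Build the nested backlink listing for `target` as a list of text lines.

    links:      page -> set of targets linked from the page's own source
    inclusions: template -> set of pages that include it
    redirects:  redirect page -> page it redirects to
    """
    lines = []
    for page in sorted(p for p, targets in links.items() if target in targets):
        lines.append("  • " + page)
        # Pages including a linking template appear as subnodes.
        for includer in sorted(inclusions.get(page, ())):
            lines.append("    • " + includer)
    for redirect in sorted(r for r, t in redirects.items() if t == target):
        lines.append("  • " + redirect + " (redirect page)")
        for page in sorted(p for p, targets in links.items() if redirect in targets):
            lines.append("    • " + page)
    return lines
```

With `links = {"Template:T": {"X"}}` and `inclusions = {"Template:T": {"A", "B"}}`, A and B are listed under Template:T rather than as direct backlinks.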

beland wrote:

Making the system update properly in the first place can be done with or without
user-visible changes. Certainly that should happen regardless. I also like the
interface change you suggest; it would help a lot of category-tidying work go
more smoothly. I see bug 1392 has been reopened; I wonder if these two
questions are now being considered separately or not.

Astronouth7303 wrote:

(In reply to comment #17)

...
I see bug 1392 has been reopened; I wonder if these two
questions are now being considered separately or not.

I believe that both this and bug 1392 would require the same DB changes. But to
totally fix bug 1392 may also require editing SpecialWhatlinkshere.php.

kellen wrote:

This is a major issue for the wikibooks Cookbook -- we have most of our recipes
under category "Recipes" via a template, but would like to move them to category
"Recipe" -- if we just update the template, we'll have to apply several hundred
null edits to refresh each recipe page. A timely fix for this would save us a
lot of grunt work.

*** Bug 3349 has been marked as a duplicate of this bug. ***

quietust wrote:

A command to flush and completely regenerate all link tables (by effectively
performing a null edit on every single page in the wiki) would be useful (though
slow) as a temporary solution to this problem; however, I don't recall if such a
command currently exists.

rowan.collins wrote:

(In reply to comment #21)

A command to flush and completely regenerate all link tables

That'll be the 'maintenance/refreshLinks.php' command-line script. :)

gangleri wrote:

(In reply to comment #21)

A command to flush and completely regenerate all link tables (by effectively
performing a null edit on every single page in the wiki) would be useful (though
slow) as a temporary solution to this problem; however, I don't recall if such a
command currently exists.

related topic to this:
bug 2098: action=purge and categories - implement a more effective action=update
or action=rebuild

avarab wrote:

*** Bug 3854 has been marked as a duplicate of this bug. ***

wiki.bugzilla wrote:

*** Bug 4111 has been marked as a duplicate of this bug. ***

robchur wrote:

Bump: Tim, how does the templatelinks stuff affect this bug now?

*** Bug 4929 has been marked as a duplicate of this bug. ***

This is fixed in CVS and on Wikimedia sites. Will be released in MediaWiki 1.6.

A "job queue" system has been implemented. There may be some delay between when the template is
edited and when the relevant changes are reflected in the links tables. The length of the delay
might need some fine tuning; delays of up to a day are to be expected until we sort out the system
administration and performance aspects of this.
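
To illustrate the general idea of deferring link updates through a queue, here is a minimal sketch. It is not MediaWiki's job queue implementation; the class, its methods and the `refresh` callback are hypothetical.

```python
from collections import deque


class LinkUpdateQueue:
    """Defer per-page link-table refreshes instead of doing them on template save."""

    def __init__(self, refresh):
        self.jobs = deque()
        self.queued = set()     # avoids queueing the same page twice
        self.refresh = refresh  # callback that re-parses one page and updates its links

    def template_edited(self, includers):
        """Queue one job per page that includes the edited template."""
        for page in includers:
            if page not in self.queued:
                self.queued.add(page)
                self.jobs.append(page)

    def run(self, limit):
        """Process at most `limit` jobs; return how many were run."""
        done = 0
        while self.jobs and done < limit:
            page = self.jobs.popleft()
            self.queued.discard(page)
            self.refresh(page)
            done += 1
        return done
```

The delay between the template edit and the updated tables is simply the time a job waits before some call to `run` picks it up, which is why the batch size and run frequency would need tuning.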