
externallinks have links to self (not that external)
Closed, Resolved, Public

Description

The externallinks tables have lots of self-referencing links, mostly caused by templates and interface language variants.

There should be no project self-referencing links inside 'externallinks', as they grow to as much as 50GB per db per instance of crap that will never be used.


Version: 1.16.x
Severity: major

Details

Reference
bz19637

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:38 PM
bzimport set Reference to bz19637.
bzimport added a subscriber: Unknown Object (MLST).

mike.lifeguard+bugs wrote:

This relies on having an internal link syntax for diffs and oldids (etc?) Dunno where the bug is for that, but there is one.

herd wrote:

(In reply to comment #1)

This relies on having an internal link syntax for diffs and oldids (etc?) Dunno
where the bug is for that, but there is one.

I believe he just wants to ditch any links, where the prefix matches wgServer, from being registered as 'externallinks' (and not actually change the links).

But, on Commons, for example, how would it handle links to the secure server?

mike.lifeguard+bugs wrote:

(In reply to comment #2)

I believe he just wants to ditch any links, where the prefix matches wgServer,
from being registered as 'externallinks' (and not actually change the links).

My misunderstanding then.

Do we actually use the fact that these are registered as external links for anything? I know the internal link table is (ab)used on meta to track some spam stuff... anything similar for these external links?

(In reply to comment #2)

But, on Commons, for example, how would it handle links to the secure server?

There could be a global variable of hostnames to ignore (defaulting to $wgServer).
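A rough sketch of the ignore-list idea, in Python rather than MediaWiki's PHP, purely for illustration. The names `SERVER`, `default_ignored_hosts` and `links_to_register` are hypothetical, and the example.org hostnames are placeholders standing in for the wiki's own host and a secure-server host; none of this is taken from the actual code.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for $wgServer (placeholder hostname).
SERVER = "https://commons.example.org"


def default_ignored_hosts(server=SERVER):
    """By default, the ignore list holds only the wiki's own hostname."""
    return {urlsplit(server).hostname}


def links_to_register(urls, ignored_hosts=None):
    """Return only the URLs whose hostname is not on the ignore list."""
    if ignored_hosts is None:
        ignored_hosts = default_ignored_hosts()
    return [u for u in urls if urlsplit(u).hostname not in ignored_hosts]


if __name__ == "__main__":
    urls = [
        "https://commons.example.org/wiki/Foo",
        "https://secure.example.org/commons/wiki/Foo",
        "http://example.com/",
    ]
    # Default: only the wiki's own host is skipped.
    print(links_to_register(urls))
    # A site with a secure server could add that host to the list.
    print(links_to_register(urls, {"commons.example.org", "secure.example.org"}))
```

With only the default, links to the secure host would still be registered, which is why a configurable list (rather than a single value) would be needed for the Commons case.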

(In reply to comment #3)

Do we actually use the fact that these are registered as external links for
anything? I know the internal link table is (ab)used on meta to track some spam
stuff... anything similar for these external links?

I believe externallinks is used for ConfirmEdit, Special:LinkSearch, and prop=extlinks in the API (among other things?). WMF projects are already whitelisted for ConfirmEdit, but it seems like people do use Special:LinkSearch to check for "local" external links. If people are attached to that functionality, maybe we could remove links from the table only if they also include &action=?
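To make the &action= suggestion concrete, here is a minimal Python sketch (again not MediaWiki code). The function name `should_register` and the `own_host` parameter are hypothetical; the rule it assumes is that a link is dropped only when it points at the wiki's own host and carries an `action` query parameter.

```python
from urllib.parse import parse_qs, urlsplit


def should_register(url, own_host):
    """Decide whether a link should be stored in the external links table."""
    parts = urlsplit(url)
    if parts.hostname != own_host:
        # Genuinely external link: always register it.
        return True
    # Local link: keep it unless the query string has an action= parameter.
    query = parse_qs(parts.query, keep_blank_values=True)
    return "action" not in query
```

Under this rule, plain local URLs would stay searchable through Special:LinkSearch, while edit and history links generated by templates would be skipped.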

Fixed in r53104: links starting with $wgServer are no longer saved unless overridden.

Live merge note: this fix is live as of 2009 July 11th
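The behaviour described in the fix note (skip links that start with $wgServer unless overridden) could look roughly like the Python sketch below. This is an illustration of the idea, not the code from r53104; `filter_external_links` and the `register_internal` flag are hypothetical names.

```python
def filter_external_links(urls, server, register_internal=False):
    """Drop links that start with the wiki's own server URL.

    `register_internal` is a hypothetical override: when True, nothing
    is filtered and self-referencing links are kept as before.
    """
    if register_internal:
        return list(urls)
    return [u for u in urls if not u.startswith(server)]
```

Note that a plain prefix test also matches any longer hostname that begins with the same string, so a sketch like this would need a stricter comparison if that matters.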