
Red interwiki links -- check for page existence across wikis
Open, Lowest, Public, Feature

Description

Author: xmlizer

Description:
It is important to have existence information for links pointing outside the current
Wikimedia instance, just as we have for internal links.

As far as I know, the wikis are on the same database server, so it is *not* technically infeasible.

It is especially necessary for Wiktionary.


Version: unspecified
Severity: enhancement
See Also:
T39902: RFC: Implement rendering of redlinks in Parsoid HTML as post-processor

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 6:43 PM
bzimport set Reference to bz11.
bzimport added a subscriber: Unknown Object (MLST).

We currently have no way to know whether an article exists on another wiki. The
easiest choice is not to show any link.

Moving as an enhancement request.
Doesn't block #17

  • Bug 222 has been marked as a duplicate of this bug.

Changing summary from "show interwiki and link to wikibooks and wiktionary in
different color if they do not exist" to highlight the general nature of the
problem. I think the most common use of such an interwiki check would be to
help correct broken links to other language versions of a project, and broken
links to/from Meta.

For the specific case of the wiktionary links shown on a "this article does not
exist" page, we could keep a list of {title, project} pairs for all extant
wikiprojects in a given language, and only show a "you might want to check
related articles on other projects:"
message when titles *do* exist on other projects.

brian wrote:

Also see bug 2463 - if the same article exists on
another wiki, but is in a different language, perhaps
we should automatically translate and display it.

robchur wrote:

How do you propose to automatically translate the stuff? And how do you know
which article is the same one? Article titles aren't English on non-English
wikis, after all - so how do we set about determining which wikis have our
article? And if we could, what would happen if we had two or more wikis with
the same one? How could the software tell which to translate?

wikimedia-bugzilla wrote:

I think the proposal is just to show if there happens to be an article with the
same name in another wiki.

webmaster wrote:

The proposal as I understand it, pertains to the following code:

[[w:Example_Article]]

Which would be red if Example_Article does NOT exist on Wikipedia
OR
would be blue if Example_Article DOES exist on Wikipedia.

Simply for interwiki linking, no translation at all.

This would cause HUGE cross-site SQL querying and suck up untold amounts of
bandwidth from the sender and receiver. Although this would be FANTASTIC for my
dual-database setup, I think this may be a pipe-dream.

I could possibly see it paired with some sort of caching system which queries
once a week and stores the link-state (red or blue) locally until next updated.
A lot of work, but could encourage a flurry of edits across several sites.

What do you guys think?

wikimedia-bugzilla wrote:

Once a week is a lot better than nothing. But as you said, a lot of work.

webmaster wrote:

Upon further thought, even a caching system with weekly/monthly queries would be
a heavy load and would raise security questions, like whether one database's
rights allow querying another (if not all on the same server). An alternate thought
I have is a negative-option link table in the database, which assumes all links
are non-existent red-linked pages until someone clicks one, at which time it
queries the other database and, if necessary, updates the local cache of 'existing
pages'.

Just tossing out ideas here... Again, my ultimate thought here is 'pipe-dream'.
One other scenario could see some sort of function-call passing a boolean result
back in the url of the resulting page on the 2nd database. (Which is uglier than
Steve Buscemi...)

Anyone have any other thoughts, or should we kill this?

ssd.wiki wrote:

Another option would be to create an inter-wiki protocol (perhaps http based to
make it simple) that allows one wiki to query another to ask if the page exists,
and then cache it. This poll could be done (as you suggested) only when a user
follows the link. Doing this at the http level would remove the need to break
security and query the other database, at the expense of a small penalty (double
page load -- one for the interwiki query, one for the user's web browser).

webmaster wrote:

How about a ?action=pageexists function which outputs a raw 'true' or 'false' which could be retrieved via an HTTP GET.
Similar in fashion to the way that Special:Statistics does it:

http://en.wikipedia.org/w/index.php?title=Special:Statistics&action=raw

Should be light on both websites, could be cached for a period and invalidated thereafter.
Even lighter if instead of true/false it were 1/0. ;)
Every byte counts!

Of course, this would be 'off' by default on both sides.
The 'client' wiki would have to turn on $wgDoCrossWikiChecks=TRUE (to do the checking)
The 'server' wiki would have to enable $wgAllowCrossWikiChecks=TRUE (to provide the raw ?action=pageexists output)

If both aren't enabled, it won't work. (Handled gracefully, of course...)
This allows the greatest flexibility and control.
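
A minimal sketch of what the client side could look like, assuming the hypothetical ?action=pageexists endpoint and the two configuration flags proposed above (none of these names exist in MediaWiki):

```php
<?php
// Sketch only: $wgDoCrossWikiChecks, $wgAllowCrossWikiChecks and
// action=pageexists are hypothetical names from this proposal,
// not existing MediaWiki features.
$wgDoCrossWikiChecks = true;

function crossWikiPageExists( string $remoteIndexUrl, string $title ): ?bool {
	global $wgDoCrossWikiChecks;
	if ( !$wgDoCrossWikiChecks ) {
		return null; // checking is switched off on our side
	}
	// The remote wiki answers '1' or '0' only if its own
	// $wgAllowCrossWikiChecks is enabled.
	$body = @file_get_contents(
		$remoteIndexUrl . '?title=' . rawurlencode( $title ) . '&action=pageexists'
	);
	if ( $body !== '1' && $body !== '0' ) {
		return null; // remote side declined or errored: handle gracefully
	}
	return $body === '1';
}
```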

NB: I am changing the summary to reflect the fact that this isn't just for across Wikimedia projects, but rather across any two MediaWiki installations that support this function.

robchur wrote:

Previous discussions have favoured some sort of API-based check, although in cases where the foreign database is directly readable (such as in the Wikimedia, Wikia, etc. cases), fetching the information straight out of that is preferable.

It would be quite useful to have an API on the calling wiki's side that says "please update this link with data about the target", if that is possible - that could even allow checking the existence or status of a page on an arbitrary site (say, linking to a Bugzilla bug, and getting a different display of the link text based on the bug's status... based on proper use of a similar API on the target site).

robert wrote:

There appears to be a two-way solution to this problem, since some interwiki links will be local and others remote. To adequately accommodate both types, a mixture of HTTP access and SQL access would have to be involved. To decide which one will be used, the easiest solution would be to add two extra columns to the interwiki table: one for the database the wiki is on, and one for that wiki's database table prefix (if applicable) - they would be optional.

When an interwiki link to a wiki with a database listed is made, an SQL query is made to the target's database to decide whether the link is red or blue; this could be cached (see explanation below).

If it is a remote wiki, then a call to the API could be made (another database column would be required for the path to the API), e.g. http://en.wikipedia.org/w/api.php?action=query&titles=pagename - if it returns <page missing="" /> then the page is missing, otherwise it is not. Currently MediaWiki returns 200 status codes for non-existent pages, so just fetching the page would not be reliable. Caching would also be essential with this method.

Caching would involve an extra column in the pagelinks table indicating whether a page is a red link or not - this could be periodically updated by a maintenance script or when the page is purged (but not when edited, as this could generate too much traffic).
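
To make the two paths concrete, here is a rough sketch of the dispatch described above. The extra interwiki columns (iw_db, iw_api) and the database credentials are assumptions, and the JSON check shown is the modern formatversion=2 equivalent of the XML <page missing=""/> test:

```php
<?php
// Sketch of the two-way check: direct SQL for wikis on a reachable database,
// api.php for remote ones. Column names iw_db/iw_api are hypothetical.
function interwikiTargetExists( object $iwRow, string $title ): bool {
	$dbKey = str_replace( ' ', '_', $title );
	if ( $iwRow->iw_db !== null ) {
		// Local farm: query the target wiki's page table directly
		// (mainspace only in this sketch; credentials are placeholders).
		$pdo = new PDO( "mysql:host=localhost;dbname={$iwRow->iw_db}", 'wikiuser', 'secret' );
		$stmt = $pdo->prepare(
			'SELECT 1 FROM page WHERE page_namespace = 0 AND page_title = ?'
		);
		$stmt->execute( [ $dbKey ] );
		return (bool)$stmt->fetchColumn();
	}
	// Remote wiki: ask its API whether the page is missing.
	$data = json_decode( file_get_contents(
		$iwRow->iw_api . '?action=query&format=json&formatversion=2&titles=' .
			rawurlencode( $title )
	), true );
	return empty( $data['query']['pages'][0]['missing'] );
}
```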

Comments etc are appreciated and I will consider working on this in the new year if a flaw in my solution is not found.

mattj wrote:

I'm working on this bug at the moment, using a (hopefully) extensible system for both remote (API) and local (DB) sites. Not sure on an ETA, although I'll be merging in changes once sections get done.

sumanah wrote:

Matt, is your code available for us to look at? Or perhaps you've put this project aside? Per bug 20646 this may depend on the interwiki table.

mattj wrote:

I have an old implementation of most of this at http://svn.wikimedia.org/svnroot/mediawiki/branches/remotesite/ - I'm happy to bring it up to HEAD and fix in the missing bits if there's still interest, and people think this is the right approach.
(cross-posted to bug 20646 as this would address both bugs)

Raising importance/priority because of votes, according to the following scheme:
15+ votes - highest
5-15 votes - high
Community must have a voice within development.

Regards, Kozuch
http://en.wikipedia.org/wiki/User:Kozuch

In case anyone else runs into this, ...

While {{#ifexist:File:...}} doesn't work for files hosted on Wikimedia Commons,

it is possible to use {{#ifexist:Media:...}} on WMF projects, and it does accurately determine whether the media exists on Wikimedia Commons. There is at least one bug: bug 32031, about combining #ifexist Media: with file redirects.

I haven't tested this with InstantCommons.

Yes, it does work on third-party wikis. On the OpenStreetMap Wiki, {{#ifexist: Media:Wikivoyage-logo.svg | yes | no }} returns “yes”.

So, first this is set to "highest" per some reasonable scheme, then some bot reduces it to "low" without any explanation, and now it's again arbitrarily set to "lowest". What about taking input seriously?

This bug represents an important barrier for collaboration and coordination between wikimedia projects. Thank you.

Hi Al-Scandar, sorry for not having clarified the action. Note to self: comment always when changing the prioritization of a report.

Bug status, priority, and target milestone fields summarize and reflect reality and do not cause it. This report has been open since 2004, and currently nobody seems to be working or planning to work on it. The "Lowest" just reflects that.

Bug 20646 - Store more target site metadata in interwiki table (which is blocking this report) seems to be in a similar situation, inactive.

On the other hand, it looks like the VisualEditor team is working on the related Bug 37902 - Implement rendering of redlinks and stubs (in a post-processor?)

If the Platform team or someone else wants to include this request in their plan, then they can set priority accordingly.

I'm going to have to suggest another configuration setting (even though MediaWiki is already bloated enough with those already). We need a way for wikis to opt out if they do not need nor want this change.

I'm not sure if this has been suggested, but should we just keep this feature confined to the same wiki farm? Or ask the target wiki if the source wiki wants to check the existence of the target wiki's article? I'd imagine possible abuse like {{#ifexist:google:foo}} when we would not feasibly check if a google page exists or not.

I suspect this feature request will be solved/solvable when Wikidata integrates all of the projects, esp. Wiktionary (bug number?)

Then instead of [[w:Blah]] magically being red based on weekly updates, wikis would use a template like {{ifexistson|enwiki}} to call entity:getSitelink( 'enwiki' ) in order to determine whether there is an enwiki page for the local page.

More interesting possibilities open up once we also have Lua access to any item (bug 47930).
Then the page [[wikt:ru:Foo]] can do magic relating to [[w:en:Blah]] using calls like {{ifexistson|enwiki|Q527633}}.

(In reply to TeleComNasSprVen from comment #24)

> I'm going to have to suggest another configuration setting (even though
> MediaWiki is already bloated enough with those already). We need a way for
> wikis to opt out if they do not need nor want this change.
>
> I'm not sure if this has been suggested, but should we just keep this
> feature confined to the same wiki farm? Or ask the target wiki if the source
> wiki wants to check the existence of the target wiki's article? I'd imagine
> possible abuse like {{#ifexist:google:foo}} when we would not feasibly check
> if a google page exists or not.

I think just doing it for the current wiki farm is a sane approach; later we might do a wider system, but it would need us to significantly change the information held in the interwiki map.

(In reply to John Mark Vandenberg from comment #25)

> I suspect this feature request will be solved/solvable when Wikidata
> integrates all of the projects, esp. Wiktionary (bug number?)

No.

This is about skins and appearance, not structural data relating items. (Also, Wikidata isn't remotely the right way to go about this.)

The system we will build as part of the switch from the PHP parser to Parsoid for generating the read HTML will be able to achieve this as an extension of the existing system built for VisualEditor in fixing bug 37901, a precursor to 37902.

VisualEditor requests the existence status of each of the links on the page and sets them to be red or otherwise based on this status; the same styling can be calculated server-side and returned as an API call (without client-side JavaScript), which means that this can work for all users, and extending the status checking to other MediaWiki instances in the same farm (or even further afield) is a relatively simple extension of this principle.
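
For illustration, such a server-side pass could batch the existence checks into one API round trip per target wiki. The action=query API is real; the helper below is an assumption, not VisualEditor code:

```php
<?php
// Batch existence check against another wiki's api.php. One request can
// cover up to 50 titles; callers would chunk larger link sets.
function batchPageExists( string $apiBase, array $titles ): array {
	$url = $apiBase . '?action=query&format=json&formatversion=2&titles=' .
		rawurlencode( implode( '|', $titles ) );
	$data = json_decode( file_get_contents( $url ), true );
	$exists = [];
	foreach ( $data['query']['pages'] ?? [] as $page ) {
		// Note: the API may normalize titles, so keys can differ from input.
		$exists[ $page['title'] ] = empty( $page['missing'] );
	}
	return $exists;
}

// e.g. batchPageExists( 'https://en.wikipedia.org/w/api.php', [ 'Dog', 'No such page' ] );
```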

(In reply to James Forrester from comment #27)

> (In reply to John Mark Vandenberg from comment #25)
>
> VisualEditor requests the existence status of each of the links on the page
> and sets them to be red or otherwise based on this status; the same styling
> can be calculated server-side and returned as an API call (without
> client-side JavaScript), which means that this can work for all users, and
> extending the status checking to other MediaWiki instances in the same farm
> (or even further afield) is a relatively simple extension of this principle.

Can the checks be feasibly done without placing too much load and performance worry on the servers? As someone noted above, even if some of the work was offloaded to cache, such querying would already put a strain on the servers.

(In reply to TeleComNasSprVen from comment #28)

> (In reply to James Forrester from comment #27)
> > (In reply to John Mark Vandenberg from comment #25)
> >
> > VisualEditor requests the existence status of each of the links on the page
> > and sets them to be red or otherwise based on this status; the same styling
> > can be calculated server-side and returned as an API call (without
> > client-side JavaScript), which means that this can work for all users, and
> > extending the status checking to other MediaWiki instances in the same farm
> > (or even further afield) is a relatively simple extension of this principle.
>
> Can the checks be feasibly done without placing too much load and
> performance worry on the servers? As someone noted above, even if some of
> the work was offloaded to cache, such querying would already put a strain on
> the servers.

Sure; caching the state of the pages is already inside the API cluster's bailiwick, and this would just be a (large) client load on that. It's almost certainly feasible, albeit we may need to bump up the API cluster a little.

Change 492453 had a related patch set uploaded (by Setian; owner: Setian):
[mediawiki/core@master] Interwiki links will be blue if the page exists in the other wiki, and red if they do not.

https://gerrit.wikimedia.org/r/492453

Do you want to have an isset() for $wgInterwikiNamespaces, or do you want it defined in DefaultSettings.php as an empty array, or is there some better way to find out the namespace scheme of another wiki?

Adding Legoktm since he wrote the LinkRenderer.

Thanks for trying to take this on.

Doing raw database queries in LinkRenderer is going to be pretty bad from both a performance and code quality standpoint. I think the first step is to integrate interwiki titles into LinkCache (my now outdated attempt https://gerrit.wikimedia.org/r/c/mediawiki/core/+/177960). Then have LinkBatch and other systems support those interwiki titles in their database queries, and then finally have $title->isKnown() start working for interwiki links.

Does that seem like something you could take on? It's definitely not a trivial task - let me know if anything I suggested is unclear.
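
As a toy illustration of the caching step (not the real LinkCache class, whose integration is exactly the non-trivial part), the idea is roughly a map from prefixed titles to existence, filled in batches and consulted by an isKnown()-style lookup:

```php
<?php
// Illustrative stand-in only; real work would extend MediaWiki's
// LinkCache/LinkBatch rather than add a parallel cache like this.
class InterwikiExistenceCache {
	/** @var array<string,bool> map of "prefix:Title" => exists */
	private array $known = [];

	/** Record a batch of results, e.g. from one API round trip. */
	public function addBatch( string $prefix, array $results ): void {
		foreach ( $results as $title => $exists ) {
			$this->known["$prefix:$title"] = $exists;
		}
	}

	/** Null means "not cached yet" (a cache miss). */
	public function isKnown( string $prefix, string $title ): ?bool {
		return $this->known["$prefix:$title"] ?? null;
	}
}
```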

> Does that seem like something you could take on? It's definitely not a trivial task - let me know if anything I suggested is unclear.

Yeah, I kinda figured it wasn't going to be an immediate +2. It's okay, I've got time to work on this stuff, but I might need your help figuring out what I'm doing. Is there some documentation over on MediaWiki.org you'd recommend, or is there some background info you can give me, that could be the start of a Link caching article? I was taking a look at the LinkCache class over at https://doc.wikimedia.org/mediawiki-core/master/php/classLinkCache.html but it's something I've never worked with before.

So what ended up being the snag on change 177960, by the way? I'm going to take a closer look at that patch and start playing around with the code and seeing what I can accomplish.

Change 492453 abandoned by WMFOffice:
Interwiki links will be blue if the page exists in the other wiki, and red if they do not.

Reason:
abandoning patch sets of globally banned user

https://gerrit.wikimedia.org/r/492453

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM
Aklapper removed subscribers: wikibugs-l-list, GWicke.