
Allow limited indirection by redirects, up to $wgMaxRedirects levels.
Closed, Resolved (Public)

Description

Introduce a new variable in DefaultSettings.php:

$wgMaxRedirects = 1;

During page retrieval, it is to be counted down towards zero while redirects are honored. Page rendering proceeds when either a non-redirect is found or the count reaches zero.
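A minimal sketch of that countdown (getRedirectTarget() is a hypothetical helper returning the target Title of a redirect page, or null for a non-redirect):

// Sketch only: count $wgMaxRedirects down while following redirects.
function resolveRedirects( Title $title ) {
	global $wgMaxRedirects;
	for ( $count = $wgMaxRedirects; $count > 0; $count-- ) {
		$target = getRedirectTarget( $title ); // hypothetical helper
		if ( $target === null ) {
			break; // non-redirect found: render this page
		}
		$title = $target;
	}
	return $title; // render whatever we ended up on
}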

Rationale:

  • Some wikis want multiple redirects to be honored by the wiki software, so as to allow a logical and structurally clean use of page names, with some indirection.
  • Other wikis must, for one reason or another, keep redirect 'chains' to a length of 1. This is the current state of affairs and should remain the default.
  • Theoretically at least, there may be reasons not to use redirects at all, so $wgMaxRedirects = 0; might be an option.

Performance issues:
When the default is kept, there should be no difference from what we have now, so the majority of wikis will likely not experience any change in performance.
With a reasonably small permissible number of redirects, performance should not be affected much, since adding a level of indirection is considerably "cheaper" than embedding another image in a page.
With huge redirect chains, however, performance might suffer a little, but that is the price to pay if one really wants them.

We MUST NOT allow an 'unlimited' $wgMaxRedirects value, because of the obvious risk of an endless loop of page retrievals caused by redirect loops.
There is no point in adding loop detection logic to page retrieval: it would be rather complicated, hardly ever triggered, and unnecessary anyway, since the $wgMaxRedirects limit is hit after a few steps.

See also: bug #5503
See also: bug #4578


Version: 1.11.x
Severity: enhancement

Details

Reference
bz11644

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:58 PM
bzimport set Reference to bz11644.

This would require the redirect chain to be memorized so we don't allow circular redirects. Currently, if A redirects to B, B knows it came from A and therefore won't redirect back.

Assume we have a chain A -> B -> C -> D -> A... it would require memorizing the entire chain so as not to create a redirect loop. WONTFIXing per bug 5503.

^demon's reasoning is spurious, but there are some better reasons on bug 5503 so I won't reopen it.

catlow wrote:

Of course his reasoning is spurious (since there will always be a finite limit to the redirect chain, so repetitions are picked up anyway), but I don't see any good reasons given under 5503 either. All I find there is lots of well-explained arguments in favour, and a one-sentence dismissal from Brion. Surely this is something that can at least be enabled and tried out; if it causes problems (of the type no-one has yet made any effort to describe) then individual wikis can opt to reset their chain limit to 1 (or some other appropriate value).

Reopening as there is clearly a lot of support for this and presumably it wouldn't be that hard to code.

(In reply to comment #3)

Of course his reasoning is spurious (since there will always be a finite limit
to the redirect chain, so repetitions are picked up anyway), but I don't see
any good reasons given under 5503 either. All I find there is lots of
well-explained arguments in favour, and a one-sentence dismissal from Brion.

There is a good argument given by Brion: the longer you allow redirect chains to be, the more of a nightmare handling and resolving overly long chains becomes. Also, the automatic redirect fixer was introduced in 1.13.0, making this request somewhat moot.

catlow wrote:

The redirect fixer, as far as I'm familiar with it, simply replaces double redirects with single ones (so if Elvys redirects to Elvis and Elvis redirects to Elvis Presley, it comes along and turns Elvys into a redirect to Elvis Presley). That is usually fine, but isn't always the desired behaviour (as this example perhaps shows). We would like Elvys (assuming this were a common misspelling) to remain a redirect only to Elvis, so that if (for example) Elvis is made into a disambiguation page, Elvys will still point there as it should.

Can you explain the "nightmare" of handling and resolving chains? Do you mean that it's difficult to program (I don't believe it's that difficult, considering all the genuinely complex things the software is able to do)? Or do you mean it would be difficult for wiki projects to deal with such chains? If the latter, then surely it's up to them to try it in practice and decide what limit works best. The main benefits will come just from raising the limit from one redirect to two, to handle chains of the type exemplified above.

ayg wrote:

I've always been of the opinion that allowing multiple redirects is a perfectly good idea. This might require changes to the redirect table, however, if we want that to remain properly useful. A variety of little things will have to be rethought, too, since they may need to deal with an unbounded number of pages instead of a single page and possibly a redirect to it.

I don't even see any reason for a hard upper limit on chain length. Loop detection is trivial: just maintain a list of all titles you've hit so far.
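That idea, sketched under the same assumption of a hypothetical getRedirectTarget() helper:

// Sketch only: stop as soon as a title repeats, no hard length limit.
function resolveWithLoopDetection( Title $title ) {
	$seen = array();
	while ( !isset( $seen[$title->getPrefixedDBkey()] ) ) {
		$seen[$title->getPrefixedDBkey()] = true;
		$target = getRedirectTarget( $title ); // hypothetical helper
		if ( $target === null ) {
			break; // reached a non-redirect
		}
		$title = $target;
	}
	return $title;
}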

(In reply to comment #6)

I don't even see any reason for a hard upper limit on chain length. Loop
detection is trivial: just maintain a list of all titles you've hit so far.

DOS vulnerability, anyone? If a vandal (or a group of them, to evade the rate limits) manages to create an extremely long redirect chain and then sends thousands or millions of redirect resolution requests to the API for the page at the beginning of the chain (or for the first 50 in the chain, even better), I daresay the servers wouldn't know what hit them.

This can be avoided by limiting chain length, or by adding an rd_final field pointing to the ultimate target of the redirect. That would still be exploitable, though, by creating two long chains and constantly editing a page to point at their starts in turn. Doing both is probably best (rd_final would speed up redirect resolution significantly for long chains).
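For illustration, a sketch of what resolution with such a field might look like (rd_final_namespace and rd_final_title are proposed columns, not part of the actual redirect table):

// Sketch only: one query resolves the whole chain via proposed rd_final_* columns.
function resolveViaRdFinal( Title $title ) {
	$dbr = wfGetDB( DB_SLAVE );
	$row = $dbr->selectRow( 'redirect',
		array( 'rd_final_namespace', 'rd_final_title' ),
		array( 'rd_from' => $title->getArticleID() ),
		__METHOD__ );
	return $row
		? Title::makeTitle( $row->rd_final_namespace, $row->rd_final_title )
		: $title; // not a redirect
}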

There are literally tens of thousands of similar ideas that could be used for DoS attacks. A huge quantity of requests is easily dealt with using proper caching. Putting aside the fact that I was not proposing a large $wgMaxRedirects anyway: when a redirect chain is cached, bypassing all but the start and final page for at least a short period of time, all those horror scenarios vaporize and boil down to "many page requests". Better still, of course, is to keep the "short-circuit" chains in the database indefinitely, until a member of them is edited, with the edit's consequences processed via the job queue.

(In reply to comment #8)

Better still, of course, is to keep the "short-circuit" chains in the
database indefinitely, until a member of them is edited, with the edit's
consequences processed via the job queue.

Right, I forgot my second doom scenario would only fill up the job queue and not have any DOS-like effect.

Right, I forgot my second doom scenario would only fill up the job queue and
not have any DOS-like effect.

A polluted job queue is also a DoS, since it prevents other jobs from being completed.

(In reply to comment #10)

Right, I forgot my second doom scenario would only fill up the job queue and
not have any DOS-like effect.

A polluted job queue is also a DoS, since it prevents other jobs from being
completed.

It doesn't prevent other jobs from being completed; it just delays them. It's not a nice scenario, but it won't bring the site down.

ayg wrote:

(In reply to comment #7)

DOS vulnerability, anyone? If a vandal (or a group of them, to evade the rate
limits) manages to create an extremely long redirect chain and then sends
thousands or millions of redirect resolution requests to the API for the page at
the beginning of the chain (or for the first 50 in the chain, even better), I
daresay the servers wouldn't know what hit them.

That's ludicrously contrived. And it wouldn't hurt anything anyway. At most you're talking about dozens/hundreds/whatever of very fast queries per request, which isn't going to kill anything. There are vastly better DoS vectors, like sending loads of expensive stuff to the parser. Provide some real objections, please.

This can be avoided by limiting chain length, or by adding an rd_final field
pointing to the ultimate target of the redirect.

Depends. If the redirected-from message says something like "redirected [[A]] -> [[B]] -> [[C]] -> [[D]]", you have to look up the full chain anyway. If it says "redirected from [[A]]" and lets you work it out, yeah, this would be O(1) in the chain length with an rd_final field, AFAICT.

That would still be
exploitable, though, by creating two long chains and constantly editing a page
to point at their starts in turn.

No it wouldn't. If I redirect [[A]] to [[B]] and [[B]] is already a redirect, I just need to set A's rd_to to [[B]] and A's rd_final to whatever rd_final is for B. It's still a constant-time operation. But the DoS angle isn't an issue we need to pay much attention to anyway; resolving redirects is a lot faster than a lot of other things in the software.
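A sketch of that constant-time update (again assuming the proposed rd_final_* columns; getFinalTarget() is a hypothetical helper returning B's rd_final, or B itself if B is not a redirect):

// Sketch only: saving A as a redirect to B copies B's final target.
function updateRedirectEntry( Title $from, Title $to ) {
	$final = getFinalTarget( $to ); // hypothetical helper, one lookup
	$dbw = wfGetDB( DB_MASTER );
	$dbw->replace( 'redirect', array( 'rd_from' ),
		array(
			'rd_from'            => $from->getArticleID(),
			'rd_namespace'       => $to->getNamespace(),
			'rd_title'           => $to->getDBkey(),
			'rd_final_namespace' => $final->getNamespace(),
			'rd_final_title'     => $final->getDBkey(),
		), __METHOD__ );
}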

(In reply to comment #12)

There are vastly better DoS vectors, like
sending loads of expensive stuff to the parser.

True.

That would still be
exploitable, though, by creating two long chains and constantly editing a page
to point at their starts in turn.

No it wouldn't. If I redirect [[A]] to [[B]] and [[B]] is already a redirect,
I just need to set A's rd_to to [[B]] and A's rd_final to whatever rd_final is
for B. It's still a constant-time operation.

Oh, yeah, forgot about that.

As was correctly pointed out, the DoS stuff is moot. rd_final would be useful, though.

skizzerz wrote:

Simple patch that adds $wgMaxRedirects

Attached a (very) simple patch that adds $wgMaxRedirects. The default value of 1 makes it work exactly like it does now. Values less than 1 disable redirects, and values greater than 1 parse out redirect chains until that limit is reached.

However, there is probably a better way of retrieving the redirect target than getting the article's content each time to recursively call Title::newFromRedirect. Also, the "redirected from" text does not show that a chain was followed: it shows the original source, so if [[A]] -> [[B]] -> [[C]] and you visit [[A]] and get redirected to [[C]], the text reads "redirected from [[A]]", which makes it difficult to tell that a redirect chain was followed (although it still shows up properly on Special:DoubleRedirects).

Adding the second parameter to Title::newFromRedirect, and only setting it from Article::followRedirectText, prevents normal links to redirects in the wikitext itself from being followed.

attachment MaxRedirects.patch ignored as obsolete
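The behaviour the patch describes, sketched (an illustration of the described approach, not the patch's actual code):

// Illustration only: follow a #REDIRECT chain up to $wgMaxRedirects deep,
// fetching each page's text to find the next target.
function followRedirectChain( Title $title ) {
	global $wgMaxRedirects;
	for ( $i = 0; $i < $wgMaxRedirects; $i++ ) {
		$article = new Article( $title );
		$target = Title::newFromRedirect( $article->getContent() );
		if ( $target === null ) {
			break; // reached a non-redirect
		}
		$title = $target;
	}
	return $title; // values < 1 never enter the loop, disabling redirects
}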

I would tend to think that calling Title::newFromRedirect() without a recurse-level parameter would default to $wgMaxRedirects instead of to 1. (E.g., use a null default and set it to $wgMaxRedirects if we got null.)
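That pattern, sketched (assuming the patched two-parameter Title::newFromRedirect):

// Sketch: an omitted recurse level defaults to $wgMaxRedirects, not 1.
function newFromRedirectDefaulted( $text, $maxLevels = null ) {
	global $wgMaxRedirects;
	if ( $maxLevels === null ) {
		$maxLevels = $wgMaxRedirects;
	}
	return Title::newFromRedirect( $text, $maxLevels );
}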

skizzerz wrote:

Well, if it always recurses, then *links* to redirects will resolve to the target page, so if page A redirects to page B, putting [[A]] on the page would show [[B]] to the user after the Parser runs its course. I added the second parameter to stop that so it only recurses when resolving page names.

(In reply to comment #14)

However, there is probably a better way of retrieving the redirect target than
getting the article's content each time to recursively call
Title::newFromRedirect,

Article::getRedirectTarget()
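I.e., the chain-following loop could use the redirect table instead of re-parsing wikitext (sketch):

// Sketch: Article::getRedirectTarget() reads the redirect table,
// avoiding a content fetch and parse per chain link.
function followRedirectChainViaTable( Title $title ) {
	global $wgMaxRedirects;
	for ( $i = 0; $i < $wgMaxRedirects; $i++ ) {
		$article = new Article( $title );
		$target = $article->getRedirectTarget();
		if ( $target === null ) {
			break;
		}
		$title = $target;
	}
	return $title;
}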

(In reply to comment #16)

Well, if it always recurses, then *links* to redirects will resolve to the
target page, so if page A redirects to page B, putting [[A]] on the page would
show [[B]] to the user after the Parser runs its course.

Since links don't use Title::newFromRedirect(), I don't think that would be the case.

I added the second
parameter to stop that so it only recurses when resolving page names.

Under what circumstances is Title::newFromRedirect() used that you would *not* want it to recurse by default?

skizzerz wrote:

Screenshot illustrating why Title::newFromRedirect should not recurse by default

@Roan: Thanks, I'll look into that

@brion: I know it shouldn't be happening, but for some reason it does. I didn't just say that the links resolved as well because I felt like it; I said it because that's what appeared in my testing. Well... perhaps I need to clarify: it's the links after the #REDIRECT on the redirect page itself that resolve, not just any random wikilink. I attached a screenshot to illustrate what I mean (I didn't change anything in the edit box after I clicked show preview, and Test3 -> Test2 -> Test1 -> Test).

attachment Redirectlink.png ignored as obsolete

Ryan, that sounds like a very specific case where:

  1. The redirect-display-specific code is resolving the redirect to get the target
  2. The redirect-display-specific code displays that link

That's unrelated to general link handling in wiki page parsing.

You'd want to think about what the ideal display here is: do you want to show the immediate one-level-down target, or do you want to dive down all the way and show a tree or something? This is redirect-specific stuff, so it should be something sensible for redirects.

skizzerz wrote:

new redirect page display

OK, I've got a working patch (still needs some more testing) that changes the redirect page's display when recursive redirects are detected; see the attached screenshot for what it looks like. I want to get some feedback (positioning, etc.) before committing, so please leave some :)

Attached:

Redirect_test.png (142×377 px, 7 KB)

skizzerz wrote:

implemented in r45973

I really don't like having two boolean parameters whose meanings are totally unclear when called, one of which changes the return type. I'd much rather have a separate named method for fetching the complete redirect target list.

skizzerz wrote:

code cleaned up in r46502