
Make Wikimedia ShortUrl short URLs even shorter using an Apache rewrite rule
Closed, Resolved, Public

Description

This bug is related to bug 1450.

With https://www.mediawiki.org/wiki/Extension:ShortUrl enabled, "short" URLs look something like this:

https://ta.wikipedia.org/wiki/Special:ShortUrl/abc123

Some people would like to make this URL even shorter with an Apache rewrite rule, so that it might look something like this:

https://ta.wikipedia.org/x/abc123


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=1450

Details

Reference
bz36164

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 12:18 AM
bzimport set Reference to bz36164.
bzimport added a subscriber: Unknown Object (MLST).

From bug 1450 comment 24:


Someone actually needs to write the rewrite rules.. I had some comments from
Roan somewhere...

Locally I've got

RewriteEngine On
RewriteRule ^/r/(.*)$ /w/index.php?title=Special:ShortUrl/$1

#RewriteRule ^/r/(.*)$ /wiki/Special:ShortUrl/$1

Not sure why I've got the 2nd one commented out, but it'd make sense to use
that format.

After that's set up, the simple part is just enabling it on the target wikis.

It's been reviewed, and isn't actually ready for deployment, hence -shell,
-need-review, and added bug 33551 as a blocking bug.
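For readers less familiar with mod_rewrite, here is a minimal Python sketch of what the two rules in comment 24 do to a request path. It uses Python's `re` module in place of Apache's regex engine. It assumes server or virtual-host context, where the pattern sees the path with its leading slash; in an .htaccess file the leading slash would be stripped. Without an [R] flag, both rules are internal rewrites, not redirects.

```python
import re
from typing import Optional

# Same pattern as both RewriteRule lines: capture everything after "/r/".
SHORT_PATH = re.compile(r"^/r/(.*)$")

def rewrite_to_index(path: str) -> Optional[str]:
    """First rule: /r/CODE -> /w/index.php?title=Special:ShortUrl/CODE."""
    m = SHORT_PATH.match(path)
    # m.group(1) plays the role of Apache's $1 backreference.
    return f"/w/index.php?title=Special:ShortUrl/{m.group(1)}" if m else None

def rewrite_to_wiki(path: str) -> Optional[str]:
    """Second (commented-out) rule: /r/CODE -> /wiki/Special:ShortUrl/CODE."""
    m = SHORT_PATH.match(path)
    return f"/wiki/Special:ShortUrl/{m.group(1)}" if m else None

# Paths that don't start with /r/ are left alone (None = rule does not apply).
for p in ["/r/abc123", "/wiki/Main_Page"]:
    print(p, "->", rewrite_to_index(p), "|", rewrite_to_wiki(p))
```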

From bug 1450 comment 25:


(In reply to comment #24)

Locally I've got

RewriteEngine On
RewriteRule ^/r/(.*)$ /w/index.php?title=Special:ShortUrl/$1
#RewriteRule ^/r/(.*)$ /wiki/Special:ShortUrl/$1

This seems to overlap a bit with bug 16659 and bug 17981. Basically, ops
should be exceedingly cautious about adding new URL structures/schemes that will
have to be supported indefinitely. I'd like Brion to weigh in, if possible.

From bug 1450 comment 26:


As for the proposed prefix of "/r/", I'm not sure I like it; I'd prefer
"/page/" or something myself, meanwhile using something like "/rev/" or
"/revision/" would make sense for shortened version permalinks (with oldid).

From bug 1450 comment 29:


(In reply to comment #26)

As for the proposed prefix of "/r/", I'm not sure I like it; I'd prefer
"/page/" or something myself, meanwhile using something like "/rev/" or
"/revision/" would make sense for shortened version permalinks (with oldid).

As mentioned in some of the other bugs, a lot of people are pushing for URLs to
be more localized, and this would be a step away from that. That is, "r" or
"rev" only work in English, really.

From bug 1450 comment 30:


Could make it something like /h/ so that everyone is confused equally (although
personally I like /page/). I also agree that /r/ might not be the best choice due to
the association with revisions (especially given how things like svn use r1234
for revision references).

From bug 1450 comment 43:


ShortURL rewrite rules are in Gerrit change #5433. If those are in production, then
Sam should be able to deploy this.

From bug 1450 comment 44:


Regarding the Apache rewrite rule, should it apply to all WMF projects? If
not, which ones should include or exclude it?

From bug 1450 comment 45:


(In reply to comment #43)

ShortURL rewrite rules are in Gerrit change #5433. If those are in production, then
Sam should be able to deploy this.

Did you take into consideration the comments above regarding link syntax and
translations? There were a number of comments explaining why "/r/" was
less than ideal, and this commit seems to have ignored all of them.

Extension:ShortURL should be adding a rule to our PathRouter. The rewrite rule should be:

RewriteEngine On
RewriteRule ^/r/.*$ /w/index.php

While the path router handles the title:

$router->add( '/r/$1', array( 'title' => 'Special:ShortUrl/$1' ) );
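The division of labour proposed here can be sketched as follows. This is written in Python rather than PHP. Apache only decides that a path belongs to MediaWiki and hands it to index.php with the original path intact. A router inside the application then turns a pattern such as '/r/$1' into a title. The `PathRouterSketch` class is hypothetical and far simpler than MediaWiki's actual PathRouter; it only illustrates the '$1' placeholder idea.

```python
import re
from typing import Dict, List, Optional, Tuple

# Mirrors: RewriteRule ^/r/.*$ /w/index.php  (no title is extracted here)
CATCH_ALL = re.compile(r"^/r/.*$")

def apache_stage(path: str) -> Optional[str]:
    """Return the script that should handle the request; the path itself is untouched."""
    return "/w/index.php" if CATCH_ALL.match(path) else None

class PathRouterSketch:
    """Hypothetical, simplified router: '$1' in a pattern captures the rest of the path."""

    def __init__(self) -> None:
        self.routes: List[Tuple[re.Pattern, Dict[str, str]]] = []

    def add(self, pattern: str, params: Dict[str, str]) -> None:
        # Escape the literal parts, then turn the '$1' placeholder into a capture group.
        regex = re.escape(pattern).replace(re.escape("$1"), "(.+)")
        self.routes.append((re.compile("^" + regex + "$"), params))

    def parse(self, path: str) -> Optional[Dict[str, str]]:
        for regex, params in self.routes:
            m = regex.match(path)
            if m:
                # Substitute the captured text into every parameter value.
                return {k: v.replace("$1", m.group(1)) for k, v in params.items()}
        return None

router = PathRouterSketch()
router.add("/r/$1", {"title": "Special:ShortUrl/$1"})

path = "/r/abc123"
if apache_stage(path) == "/w/index.php":
    print(router.parse(path))
```

With this split, changing the prefix or the target special page only touches the router registration, while the Apache rule stays a generic "send this to index.php".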

afeldman wrote:

This should presumably be limited by a condition rule to just the sites mentioned here for now: https://bugzilla.wikimedia.org/show_bug.cgi?id=1450#c39
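The restriction amounts to a guard that runs before the rewrite, much like a condition on the host name would in Apache. In this minimal Python sketch the allow-list is hypothetical. It holds only ta.wikipedia.org because that is the example host in this task; the real list of sites is the one in the linked comment.

```python
import re
from typing import FrozenSet, Optional

# Hypothetical allow-list; the real set of wikis is defined in the linked comment.
ENABLED_HOSTS: FrozenSet[str] = frozenset({"ta.wikipedia.org"})
SHORT_PATH = re.compile(r"^/r/.*$")

def route(host: str, path: str) -> Optional[str]:
    """Send short-URL paths to index.php, but only on enabled hosts."""
    host = host.lower().split(":", 1)[0]  # ignore case and any port suffix
    if host in ENABLED_HOSTS and SHORT_PATH.match(path):
        return "/w/index.php"
    return None  # fall through to the normal handling for this host

print(route("ta.wikipedia.org", "/r/abc123"))
print(route("en.wikipedia.org", "/r/abc123"))
```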

(In reply to comment #3)

Extension:ShortURL should be adding a rule to our PathRouter. The rewrite rule
should be:

RewriteEngine On
RewriteRule ^/r/.*$ /w/index.php

While the path router handles the title:

$router->add( '/r/$1', array( 'title' => 'Special:ShortUrl/$1' ) );

Where does this code go? Where does the instance of $router come from?

The code would go in the ShortURL extension. You'd likely use a configuration variable for it, and you'd use the WebRequestPathInfoRouter hook.

Something like:

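// Path template for short URLs; '$1' stands for the short code. Set to false to disable.
$wgShortUrlPath = '/s/$1';
$wgHooks['WebRequestPathInfoRouter'][] = 'egShortURLRouter';

function egShortURLRouter( $router ) {
  global $wgShortUrlPath;
  if ( $wgShortUrlPath ) {
    // Map e.g. /s/<code> to the title Special:ShortUrl/<code>
    $router->add( $wgShortUrlPath,
      array( 'title' => SpecialPage::getTitleFor( 'ShortUrl', '$1' )->getPrefixedText() ) );
  }
  return true;
}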

I think my concern here is that this is eventually going to be used as a Wikimedia URL shortener (for Twitter, etc.). You know that's going to happen.

So the planning and development should bear this in mind. The suggestion that you just pick an arbitrary letter is rather bad. It's fine for a short-sighted project (make URLs a bit shorter on a few Tamil wiki projects), but it leaves an awkward path for the future.

I've used a non-letter in my project http://example.org/!f921 - which may not be suitable here.

Quick question from the Analytics Team: will this lead to two separate hits in the server log for one page view? So will the short url show up just as a redirect to the full url or will the content be served under the short url?

@Diederik: It will show up as a redirect. So yes, two hits.
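To make the answer concrete, here is a toy Python model of a redirect-based short URL. It pairs a client that follows redirects with a list standing in for the access log. The code-to-title table, the target URL layout, and the 301 status are all hypothetical; the thread does not say which redirect status the extension sends. Apache's internal rewrite of /s/... to index.php happens within a single request, so it adds no log line of its own. The second hit comes from the browser following the redirect.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from short code to page title.
SHORT_CODES: Dict[str, str] = {"abc123": "Main_Page"}
access_log: List[str] = []

def server(path: str) -> Tuple[int, Dict[str, str]]:
    """Toy server: log every request, redirect short URLs, serve everything else."""
    access_log.append(path)
    prefix = "/s/"
    if path.startswith(prefix):
        title = SHORT_CODES.get(path[len(prefix):])
        if title is None:
            return 404, {}
        return 301, {"Location": "/wiki/" + title}
    return 200, {}

def fetch(path: str, max_redirects: int = 5) -> int:
    """Toy client: follow Location headers like a browser would."""
    for _ in range(max_redirects + 1):
        status, headers = server(path)
        if status not in (301, 302) or "Location" not in headers:
            return status
        path = headers["Location"]
    raise RuntimeError("too many redirects")

fetch("/s/abc123")
print(access_log)  # one page view, two logged requests
```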

I implemented Daniel's suggestion at https://gerrit.wikimedia.org/r/#/c/6728/

However, only after implementing it did I realize that I don't actually understand what extra flexibility it gives me. The Apache rewrite rule still needs the prefix. Unless there's an advantage I gain by using PathRouter, I'd rather abandon that change and keep the code simpler.

  • For textual input, PathRouter doesn't break on & and + the way Apache rewrites do. While this doesn't matter for most of the input used by ShortURL, we should still be consistent in implementation.
  • This moves logic that should be handled by MediaWiki into MediaWiki, where it belongs. Special pages, params, etc. should not be hardcoded into .htaccess files.
  • Eventually MediaWiki is going to start handling 404s on its own. At that point many MediaWiki installations will start handling short URLs by simply pointing their 404 ErrorDocument to MediaWiki. ShortURL needs to implement PathRouter in order to work in that environment.
  • PathRouter will work on servers using FastCGI that don't have a good rewrite engine of their own.
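The first point can be reproduced with Python's standard query-string parser. Like PHP, it splits a form-style query on & and decodes + as a space. mod_rewrite matches against the already percent-decoded path, so an escaped %26 reaches $1 as a literal &. The sketch pastes that decoded segment verbatim into `title=Special:ShortUrl/...`, as the index.php rule earlier in this thread does when no escaping flag is used. It then compares the result with reading the segment straight from the path. The example inputs are made up; as noted, real ShortUrl codes rarely contain these characters.

```python
from urllib.parse import parse_qs, unquote

def via_query_string(segment: str) -> str:
    """Rewrite-style: paste the decoded segment into ?title=... and let the app parse it."""
    query = "title=Special:ShortUrl/" + segment
    return parse_qs(query)["title"][0]

def via_path(raw_path: str, prefix: str = "/r/") -> str:
    """Router-style: take the segment straight from the percent-decoded path."""
    return "Special:ShortUrl/" + unquote(raw_path[len(prefix):])

for raw in ["/r/abc123", "/r/a%26b", "/r/a+b"]:
    segment = unquote(raw[len("/r/"):])  # what the rewrite engine sees as $1
    print(raw, repr(via_query_string(segment)), repr(via_path(raw)))
```

Plain alphanumeric codes come through identically either way. The & case loses everything after the ampersand, and the + case turns into a space, but only on the query-string route.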

Thanks for the clarification :) Now if only I could find someone to review and merge that patch...

Patch has been reviewed and merged.

sumanah wrote:

Yuvi, can we resolve and close this bug?

Closing. The required Apache rewrite rules have been filed at https://rt.wikimedia.org/Ticket/Display.html?id=2121 and are copied here:

RewriteRule ^/s/.*$ /w/index.php

(In reply to comment #10)

Quick question from the Analytics Team: will this lead to two separate hits in
the server log for one page view? So will the short url show up just as a
redirect to the full url or will the content be served under the short url?

That would also depend on which logs you're talking about. The publicly available stats files filter out anything that doesn't start with /wiki/, AFAIK, so they wouldn't be affected. (I don't really know what I'm talking about, so this is probably wrong.)