Enable Extension:ShortUrl on or.wikipedia, ta.wikipedia...
Closed, ResolvedPublic

Description

Author: dig.mediazilla

Description:
(It could be that something like this was already proposed; I just haven't found
it.)

It is nice to see what the URL is pointing to in English (or any ASCII-7 based
alphabet). Unfortunately, that is not the case for most other languages. For
most of them, especially for the languages which are not based on the Latin
alphabet, URL-escaping makes the URL unreadable and very long. So, it would be
nice to have some kind of short URLs for Wikipedia pages. This would make it
easier for the user to copy and paste the short URL into an e-mail or a web
page. (I saw a reference to a Wikipedia article in Russian in an e-mail -- it
is horrible: 3 lines of 80 chars each, absolutely unreadable).

The short URL itself could be something like this:

http://www.wiki???????.org/u/xyzuv

where

????? is a project name ({m,p}edia, etc)
u     is a special prefix (may be empty);
xyzuv is an "encoded" form of longer URL
      (very much like tinyurl.com's one).

It also would be nice to have this short URL on the printed page (as text, as
well as a link).

As a side-effect, the bots on #XXrc-channel will be less verbose. We, at
#ru.wikipedia, are using such short URLs; they seem more practical, and the
overall impression is better than when using wprc-bots directly.

Shorter URLs, happier users.

Best regards,

DIG (Dmitri I GOULIAEV)
1024D/63A6C649: 26A0 E4D5 AB3F C2D4 0112 66CD 4343 C0AF 63A6 C649


Version: unspecified
Severity: enhancement

Details

Reference
bz1450

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:09 PM
bzimport set Reference to bz1450.

tietew-mediazilla wrote:

index.php?title=ANYTHING&curid=12345 shows an article curid=12345.

You can use mod_rewrite:
RewriteEngine on
RewriteRule ^/u/([0-9]+) /w/index.php?title=-&curid=$1 [QSA]
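
To illustrate what the rule above does, here is a minimal Python sketch of the same path mapping. It only mimics the regular-expression substitution; the function name `rewrite_short_path` is hypothetical, and real mod_rewrite behaviour (the QSA flag, server versus per-directory context) is not modelled.

```python
import re

# Same pattern as the RewriteRule above: /u/<digits> at the start of the path.
SHORT_PATH = re.compile(r"^/u/([0-9]+)")


def rewrite_short_path(path):
    """Return the index.php target for a /u/<curid> path, or None if it does not match."""
    match = SHORT_PATH.match(path)
    if match is None:
        return None
    # The title is a dummy value ("-"); curid is what selects the page.
    return "/w/index.php?title=-&curid=" + match.group(1)


print(rewrite_short_path("/u/12345"))
```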

Redirecting to WikiMedia.
We probably want to add an entry about that in the manual.

What about using the page's oldid to create something like http://xx.wikixedia.org/o/### ?

(In reply to comment #3)

What about using the page's oldid to create something like
http://xx.wikixedia.org/o/### ?

Revisions have oldids, pages have curids. You probably want to be able to link to the curid too.

arjunaraoc wrote:

An additional problem appears when you make a wiki book and opt for the printer-friendly version of an article. At the end of the article, a really long URL in US ASCII appears. I think it need not be in US ASCII, as international domain names have been approved. Please change the print version to use a Unicode URL, so that the page can be accessed easily and also understood in the human-readable language.

We would appreciate a higher priority for this bug, as we are ready to publish e-books in Telugu.
Example: the Ubuntu user guide page URL as it appears in the printer-friendly version:
http://te.wikibooks.org/wiki/%E0%B0%89%E0%B0%AC%E0%B1%81%E0%B0%82%E0%B0%9F%E0%B1%81_%E0%B0%B5%E0%B0%BE%E0%B0%A1%E0%B1%81%E0%B0%95%E0%B0%B0%E0%B0%BF_%E0%B0%AE%E0%B0%BE%E0%B0%B0%E0%B1%8D%E0%B0%97%E0%B0%A6%E0%B0%B0%E0%B1%8D%E0%B0%B6%E0%B0%A8%E0%B0%BF
and a short URL, if one were to be used:
http://te.wikibooks.org/wiki/ఉబుంటు వాడుకరి మార్గదర్శని
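
The length difference between the two forms above comes from percent-encoding: each Telugu code point takes three UTF-8 bytes, and each byte becomes three ASCII characters (`%XX`). A minimal Python sketch, using only the first word of the encoded title quoted above; the helper name `describe` is hypothetical.

```python
from urllib.parse import quote, unquote

# First word of the percent-encoded Telugu title quoted above.
ENCODED = "%E0%B0%89%E0%B0%AC%E0%B1%81%E0%B0%82%E0%B0%9F%E0%B1%81"


def describe(encoded):
    """Decode a percent-encoded path segment and compare the two lengths."""
    decoded = unquote(encoded)
    return {
        "decoded": decoded,
        "decoded_chars": len(decoded),
        "encoded_chars": len(encoded),
        # Re-encoding the decoded text should give back the original string.
        "round_trip": quote(decoded) == encoded,
    }


info = describe(ENCODED)
print(info["decoded_chars"], "characters become", info["encoded_chars"], "ASCII characters")
```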

(In reply to comment #5)

I think it need not be in US ASCII, as
international domain names have been approved.

Domain names are not the same as paths, they're unrelated.

(In reply to comment #3)

What about using the page's oldid to create something like
http://xx.wikixedia.org/o/### ?

http://www.mediawiki.org/wiki/Extension:ShortUrl might help.

arjunaraoc wrote:

(In reply to comment #6)

(In reply to comment #5)

I think it need not be in US ASCII, as
international domain names have been approved.

Domain names are not the same as paths, they're unrelated.

I think they are related. If not, I would like to understand why the international text part of the URL is represented in US ASCII.

(In reply to comment #7)

(In reply to comment #3)

What about using the page's oldid to create something like
http://xx.wikixedia.org/o/### ?

http://www.mediawiki.org/wiki/Extension:ShortUrl might help.

I'm pretty sure that has been deployed on some projects.

(In reply to comment #8)

I think they are related. If not, I would like to understand why the
international text part of the URL is represented in US ASCII.

Brion's comment on this blog post explains it, I suppose:
http://ultimategerardm.blogspot.com/2011/04/sharing-tamil-wikipedia-url.html

(In reply to comment #9)

(In reply to comment #7)

http://www.mediawiki.org/wiki/Extension:ShortUrl might help.

I'm pretty sure that has been deployed on some projects.

Nope, it's fresh out of the oven; we are planning to deploy it on Tamil Wikipedia. A review request needs to be raised.

Please enable this extension on the Tamil wiki projects (Wikipedia, Wiktionary, Wikinews). We would like short URLs of the form ta.wiki*.org/r/abcdef on each page. The community consensus is linked here: http://ta.wikipedia.org/wiki/விக்கிப்பீடியா:குறுந்தொடுப்பு#Support_to_install_mw:Extension:ShortUrl_on_tamil_wiki_projects

Changing the priority to normal. Currently Tamil Wikipedia uses the external domain tawp.in for short links, which are already in use: at least 500 are spread across the web. Delay in having an in-house shortener exposes us to the risk of a large number of dead links to Wikipedia if the tawp.in server goes down. We have already had Amazon outages, and a few corporate firewalls block tawp.in since it is hosted on EC2.

I know it's been a while - but the issues pointed to have been fixed in r103665 and r103035. Can this be reviewed again?

Roan pointed out more issues, have been fixed in r104219.

Is there anything else preventing progress on this? A lot of non-Latin wikis have been looking forward to this for ages; can some priority be given to closing this and deploying?

Sam/others, Can you please check the status of the above RT. Thanks!

We've decided that this can be deployed without any mod_rewrite changes right now. As Reedy notes there, /wiki/Special:ShortUrl/12345 is better than some of the alternatives. Roan has said he will look at the code again tomorrow.

sumanah wrote:

<Reedy> ShortUrl has 1 outstanding protocol relative issue
<Reedy> And getting the apache rule correct and setup
<Reedy> that's the only blockers to its deployment

from IRC today

sumanah wrote:

<hexmode> Reedy: so what do we need to do re: shorturl? the actual code?
<hexmode> just have better doc for the rewriterule?
<Reedy> Works fine locally
<Reedy> well, the protocol relativity issue needs fixing first
<hexmode> that and then the rewriterule?
<Reedy> Getting someone from ops (though, I'm supposed to be able to do it) to actually implement the rules
<Reedy> Yup, few database tables to create on target wikis, but it'll be fine
<Reedy> http://wiki/r/v takes me to http://wiki/wiki/Wikia_code/includes/api locally :)

So, bug 33551 needs fixing so we get correct https:// or http:// on each wiki, then we need a list of all wikis that will be wanting to enable ShortUrl

From there, we can use that list to create all the relevant database tables, and only add the rewrite rules to those wikis

Someone actually needs to write the rewrite rules.. I had some comments from Roan somewhere...

Locally I've got

RewriteEngine On
RewriteRule ^/r/(.*)$ /w/index.php?title=Special:ShortUrl/$1

# RewriteRule ^/r/(.*)$ /wiki/Special:ShortUrl/$1

Not sure why I've got the 2nd one commented out, but it'd make sense to use that format.

After that's set up, the simple part is just enabling it on the target wikis

It's been reviewed, and isn't actually ready for deployment, hence -shell, -need-review and added bug 33551 as a blocking bug

(In reply to comment #24)

Locally I've got

RewriteEngine On
RewriteRule ^/r/(.*)$ /w/index.php?title=Special:ShortUrl/$1
# RewriteRule ^/r/(.*)$ /wiki/Special:ShortUrl/$1

This seems to overlap with bug 16659 a bit and bug 17981 a bit. Basically, ops should be exceedingly cautious in adding new URL structures/schemes that will have to be supported indefinitely. I'd like Brion to weigh in, if possible.

I do have some concerns about this extension:

  • it creates a new numeric identifier to describe pages, which is very similar to existing page_id
  • but it isn't page_id -- it refers to a page title, and thus if a page is renamed ends up referring to the old redirect

Since it's just a number with only internal meaning, I'd be more inclined to recommend using the page_id rather than creating a new id just for short links. This also saves the trouble of installing a new database table on every wiki, and would make it trivial to enable it for *all* wikis instead of just some.
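
The difference between the two identifier schemes can be sketched with a toy model. Everything here is hypothetical (the `Wiki` class, its dictionaries and method names are illustrative and are not the extension's schema): one short link remembers the title it was created for, the other remembers the page_id.

```python
class Wiki:
    """Toy wiki: pages are keyed by page_id; a rename leaves a redirect at the old title."""

    def __init__(self):
        self.title_of = {}   # page_id -> current title
        self.redirects = {}  # old title -> new title
        self.next_id = 1

    def create(self, title):
        page_id = self.next_id
        self.next_id += 1
        self.title_of[page_id] = title
        return page_id

    def rename(self, page_id, new_title):
        old_title = self.title_of[page_id]
        self.title_of[page_id] = new_title
        self.redirects[old_title] = new_title


wiki = Wiki()
pid = wiki.create("Old name")
title_link = wiki.title_of[pid]  # title-based short link stores the title
id_link = pid                    # id-based short link stores the page_id

wiki.rename(pid, "New name")

print(title_link)                # still the old title, which is now a redirect
print(wiki.title_of[id_link])    # follows the page to its new title
```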

As for the proposed prefix of "/r/", I'm not sure I like it; I'd prefer "/page/" or something myself, meanwhile using something like "/rev/" or "/revision/" would make sense for shortened version permalinks (with oldid).

(In reply to comment #26)

I do have some concerns about this extension:

  • it creates a new numeric identifier to describe pages, which is very similar to existing page_id
  • but it isn't page_id -- it refers to a page title, and thus if a page is renamed ends up referring to the old redirect

Since it's just a number with only internal meaning, I'd be more inclined to
recommend using the page_id rather than creating a new id just for short links.
This also saves the trouble of installing a new database table on every wiki,
and would make it trivial to enable it for *all* wikis instead of just some.

I think some people didn't want the short urls changing where they pointed to on page move, and wanted them to still work after a page delete/undelete cycle. (If I remember correctly)

(In reply to comment #27)

I think some people didn't want the short urls changing where they pointed to
on page move, and wanted them to still work after a page delete/undelete cycle.
(If I remember correctly)

That's a reasonable concern... I kinda want to be able to have hex or alphanumeric short codes here just to make sure they don't get confused with page ids though. *hmmmmm*

(In reply to comment #26)

As for the proposed prefix of "/r/", I'm not sure I like it; I'd prefer
"/page/" or something myself, meanwhile using something like "/rev/" or
"/revision/" would make sense for shortened version permalinks (with oldid).

As mentioned in some of the other bugs, a lot of people are pushing for URLs to be more localized, and this would be a step away from that. That is, "r" or "rev" only work in English, really.

I don't think this is a deal-breaker, but I do think it's something to keep in mind.

Could make it something like /h/ so that everyone is confused equally (although personally i like /page/). I also agree that /r/ might not be the best due to the association with revision (Especially given how things like svn use r1234 for revision references).

(In reply to comment #26)

I do have some concerns about this extension:

  • it creates a new numeric identifier to describe pages, which is very similar to existing page_id
  • but it isn't page_id -- it refers to a page title, and thus if a page is renamed ends up referring to the old redirect

It's not numeric, it's alpha numeric

(In reply to comment #30)

Could make it something like /h/ so that everyone is confused equally (although
personally i like /page/). I also agree that /r/ might not be the best due to
the association with revision (Especially given how things like svn use r1234
for revision references).

People will think we're turning into Reddit

(In reply to comment #31)

It's not numeric, it's alpha numeric

base36 to be exact.
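
For readers unfamiliar with base36, a minimal Python sketch of the encoding follows. It assumes the short code is simply a non-negative integer written with the digits 0-9 and lowercase a-z; how the extension allocates those integers is not shown in this thread, and the function names are hypothetical.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 0-9 then a-z, 36 symbols


def to_base36(number):
    """Encode a non-negative integer as a lowercase base36 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(text):
    """Decode a base36 string; the built-in int() accepts bases up to 36."""
    return int(text, 36)


print(to_base36(12345), from_base36(to_base36(12345)))
```

Because the alphabet has 36 symbols, a five-character code already covers more than 60 million values, which is why such codes stay short.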

Is there anything else blocking this extension from deployment?

(In reply to comment #32)

Is there anything else blocking this extension from deployment?

Right now we're mostly working on 1.19 deployment so I think this'll be another week *at least* before it can be deployed. I'll try to get Sam's opinion on this, though.

sumanah wrote:

I don't know whether there are any other TODOs blocking this extension from deployment. But just to comment on the deployment schedule: as Mark mentioned, this is competing with the MediaWiki 1.19 deployment schedule:

https://www.mediawiki.org/wiki/MediaWiki_1.19/Roadmap

which will probably start this week and finish up on March 1st.

(In reply to comment #27)

I think some people didn't want the short urls changing where they pointed to
on page move, and wanted them to still work after a page delete/undelete cycle.
(If I remember correctly)

Seems easier to keep the page_id on undelete (didn't we fix it?), on the simple cases at least (no conflicting ids).

(In reply to comment #35)

(In reply to comment #27)

I think some people didn't want the short urls changing where they pointed to
on page move, and wanted them to still work after a page delete/undelete cycle.
(If I remember correctly)

Seems easier to keep the page_id on undelete (didn't we fix it?), on the simple
cases at least (no conflicting ids).

That would be bug 26123.

sumanah wrote:

(In reply to comment #26)

I do have some concerns about this extension:

Brion, are your concerns dealbreakers, or are you ok with deploying it anyway?

(In reply to comment #24)

So, bug 33551 needs fixing so we get correct https:// or http:// on each wiki,

This is now done.

then we need a list of all wikis that will be wanting to enable ShortUrl

From there, we can use that list to create all the relevant database tables,
and only add the rewrite rules to those wikis

Yuvi, can you collate that list?

I think my concerns are all dealt with or minor enough I don't mind at this point.

orwiki, hiwiki and tawiki have got community consensus.

'orwiki' => true,
'hiwiki' => true,
'tawiki' => true,
'tawikibooks' => true,
'tawikinews' => true,
'tawikiquote' => true,
'tawikisource' => true,
'tawiktionary' => true,

This can be treated as the first list; others will be requested in separate bugs.

I think the only real outstanding question was whether just the extension would be enabled or if there would be accompanying Apache changes (changing link structure). Simply enabling the extension isn't a big deal; adding a link structure (such as /r/) that has to be supported indefinitely is a big deal.

(In reply to comment #40)

I think the only real outstanding question was whether just the extension would
be enabled or if there would be accompanying Apache changes (changing link
structure). Simply enabling the extension isn't a big deal; adding a link
structure (such as /r/) that has to be supported indefinitely is a big deal.

Not trying to sidetrack the issue here, but I really need at least just the extension deployed sooner for 'tawiki', even if the redirect rules take time. Tamil Wikipedia has already had tawp.in for a year now, using similar code on a private server which redirects tawp.in/r/*. The whole point of making an extension in place of @Mountain's standalone code was stability, so that I don't need to check my server every time someone pokes me to say it is not redirecting.

(In reply to comment #41)

(In reply to comment #40)

I think the only real outstanding question was whether just the extension would
be enabled or if there would be accompanying Apache changes (changing link
structure). Simply enabling the extension isn't a big deal; adding a link
structure (such as /r/) that has to be supported indefinitely is a big deal.

Not trying to sidetrack the issue here, but I really need "atleast just the
extension deployed" sooner for 'tawiki' even if redirect rules take time. Tamil
Wikipedia already has tawp.in for a year now and has been using a similar code
on a private server which redirects tawp.in/r/* . The whole point of making an
extension in favor of @Mountain's standalone code was point of stability, so
that I dont need to check my server every time someone pokes my server is not
redirecting.

Without the redirect rules being set up, there is little point having ShortUrl enabled (imho). You're going to end up with https://ta.wikipedia.org/wiki/Special:ShortUrl/abcd123 which is better than what would be there already, but I'm not sure

Ignoring that, setup on the various wikis is a simple task and would only take a few minutes to do

ShortURL rewrite rules are in Gerrit #5433. If those are in production, then Sam should be able to deploy this.

afeldman wrote:

Regarding the apache rewrite rule, should it apply to all wmf projects? If not, which ones should include or exclude it?

(In reply to comment #43)

ShortURL rewrite rules are in Gerrit change #5433. If those are in production, then
Sam should be able to deploy this.

Did you take into consideration the comments above regarding link syntax and translations? There were a number of comments explaining why "/r/" was less-than-ideal and this commit seems to have ignored all of them.

(In reply to comment #29)

As mentioned in some of the other bugs, a lot of people are pushing for URLs to
be more localized, and this would be a step away from that. That is, "r" or
"rev" only work in English, really.

I don't think this is a deal-breaker, but I do think it's something to keep in
mind.

Well, the extension itself is requested since URLs of non latin wikis get converted into percentage encoding like this :- http://ta.wikipedia.org/wiki/%E0%AE%B5%E0%AE%BF%E0%AE%95%E0%AF%8D%E0%AE%95%E0%AE%BF%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AF%80%E0%AE%9F%E0%AE%BF%E0%AE%AF%E0%AE%BE:%E0%AE%86%E0%AE%B2%E0%AE%AE%E0%AE%B0%E0%AE%A4%E0%AF%8D%E0%AE%A4%E0%AE%9F%E0%AE%BF

So I don't really see a case for localizing this, since the URLs would get percent-encoded again.

(In reply to comment #30)

Could make it something like /h/ so that everyone is confused equally (although
personally i like /page/). I also agree that /r/ might not be the best due to
the association with revision (Especially given how things like svn use r1234
for revision references).

I don't mind /h/ or /m/ or /l/ or whatever. Even /page/ is fine, but /<singlechar>/ would mean the URL is also short (and hence faster to type on, say, mobiles).

@MZMcBride, I'm sorry, I couldn't find the other comments on why /r/ is less than ideal enough to stop this bug from moving. Can you please point them out?

Can you also please point out what would be more ideal, since a few hundred communities are waiting for a way in which 'sane' URLs of their language wikis, which are not percent-encoded, can be shared freely without annoying other people?

(In reply to comment #46)

Can you also please point out what would be more ideal since few hundred
communities are waiting for a way in which 'sane' URLs of their language wikis
which are not percent-encoded, can be shared freely without annoying other
people.

Sorry, this bug got a bit hijacked by tangential issues. This bug is about enabling the ShortUrl extension on specified wikis. I've filed bug 36164 to track the rewrite rule issue.

As far as I know, there's nothing blocking the immediate deployment of this extension to specified Wikimedia wikis. The rewrite rules can and should be discussed separately at bug 36164.

afeldman wrote:

I agree with Srikanth re: the undesirability of internationalization in this case. Using [a-zA-Z0-9] is also in line with most other link shortening services, which have become ubiquitous. I think we can deploy as-is and revisit the issue later if a specific project forms consensus around wanting an internationalized redirector.

Brion far above voiced concern that shortlinks could be confused for revision-ids. I don't think it's a current issue, but it makes me wonder if we should use /l/ instead of /r/.

I also suppose we'll want to use a rewrite condition so the redirect only applies to the wikis mentioned in comment 39?
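
A host-conditioned rewrite of that kind could behave roughly like the Python sketch below. It is an illustration only: the hostnames are the usual domains assumed for the database names listed in comment 39, the `/r/` prefix and the Special:ShortUrl target are taken from the rules quoted earlier, and the `route` function is hypothetical rather than the production Apache configuration.

```python
import re

# Hostnames assumed to correspond to the wikis listed in comment 39.
ENABLED_HOSTS = {
    "or.wikipedia.org",
    "hi.wikipedia.org",
    "ta.wikipedia.org",
    "ta.wikibooks.org",
    "ta.wikinews.org",
    "ta.wikiquote.org",
    "ta.wikisource.org",
    "ta.wiktionary.org",
}

SHORT_PATH = re.compile(r"^/r/(.*)$")


def route(host, path):
    """Rewrite /r/<code> only on enabled hosts; return None when no rewrite applies."""
    if host not in ENABLED_HOSTS:
        return None  # plays the role of a failing rewrite condition
    match = SHORT_PATH.match(path)
    if match is None:
        return None
    return "/wiki/Special:ShortUrl/" + match.group(1)


print(route("ta.wikipedia.org", "/r/rtd"))
print(route("en.wikipedia.org", "/r/rtd"))
```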

(In reply to comment #48)

I agree with Srikanth re: the undesirability of internationalization in this
case. Using [a-zA-Z0-9] is also in line with most other link shortening
services, which have become ubiquitous. I think we can deploy as-is and revisit
the issue later if a specific project forms consensus around wanting an
internationalized redirector.

Brion far above voiced concern that shortlinks could be confused for
revision-ids. I don't think it's current issue but makes me wonder if we should
use /l/ instead of /r/.

I also suppose we'll want to use a rewrite condition so the redirect only
applies to the wikis mentioned in comment 39?

How about /s/?

afeldman wrote:

+1 to /s/, I think we just need to get the router code added as you outlined in https://bugzilla.wikimedia.org/show_bug.cgi?id=36164#c6 and we should be able to plan deployment.

The router code has been merged in. What next for deployment?

+1 to /r/, +0 to /s/

(In reply to comment #51)

The router code has been merged in. What next for deployment?

+1 to /r/, +0 to /s/

Can someone tell me why /r/ is even considered a valid path for this purpose? What word is it that makes it so that '/r/' makes sense? Cause to me /r/ speaks 'Revision', ie: &oldid, and short url does NOT use that.

Frankly I'd reject the use of that path on the grounds that it would preclude the ability to use /r/ to point to &oldid in the future if we found a reason to do that.

I think /r/ was considered simply because that's what is being currently used by the wikis. See http://tawp.in/r/rtd for an example.

So that's a 'legacy' reason, and it looks like there is now consensus on /s/ (at least among the comments so far on this bug)

(In reply to comment #53)

I think /r/ was considered simply because that's what is being currently used
by the wikis. See http://tawp.in/r/rtd for an example.

Assuming the registry of this extension will not match all entries on the private server at http://tawp.in/r, there is no reason to use same character. It could even be confusing.

When ta.wikipedia.org/s/<shorturl id> is set up, the maintainer of http://tawp.in/ could set up http://tawp.in/s/* to redirect to http://ta.wikipedia.org/s/*

@Krinkle: Was merely providing historical context :)

(Also: I run tawp.in)

sumanah wrote:

Just talked to CT Woo. Ben Hartshorne will be working on this, but is currently refining the Apache rewrite rules as part of his Swift work. The Apache rewrite rule work will also help with ShortURL.

Redirects are in place.

Enabled on tawiki, orwiki, hiwiki (another bug) and the other tawikis.

Open new bugs for other wikis etc.