
Non-canonical HTTPS URLs quietly redirect to HTTP
Closed, ResolvedPublic

Description

https://mediawiki.org redirects to http://www.mediawiki.org/ currently. This is wrong and should be fixed.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=68553

Details

Reference
bz31369

Event Timeline

bzimport raised the priority of this task from to High. Nov 21 2014, 11:56 PM
bzimport set Reference to bz31369.

There are lots of these (I think https://wikipedia.org is broken too) that are stuck in Squid cache and need to be purged.

(In reply to comment #1)

There are lots of these (I think https://wikipedia.org is broken too) that are
stuck in Squid cache and need to be purged.

Are you sure this has to do with Squid cache? Looking at http://noc.wikimedia.org/conf/redirects.conf, it says:

RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ http://www.mediawiki.org/$1 [R=301,L]

I assumed that was the root cause of the issue here, if redirects.conf is up-to-date.

Like this?

RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ http://www.mediawiki.org/$1 [R=301,L]

RewriteCond %{HTTPS} on
RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ https://www.mediawiki.org/$1 [R=301,L]

(In reply to comment #3)

Like this?

RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ http://www.mediawiki.org/$1 [R=301,L]

RewriteCond %{HTTPS} on
RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ https://www.mediawiki.org/$1 [R=301,L]

That looks about right. It's unclear whether this bug should be expanded to cover the other (several dozen) similar cases in redirects.conf. It's also unclear whether there's a saner system to do this. %{HTTPS} might be best... but it's going to result in an awful lot of code duplication for one letter.

Turned this into an RT ticket as well...

This is one of the more evil HTTPS bugs I've seen... it could lead people to believe they're using HTTPS when they're not. Elevating importance and severity.

Yeah. I'm going to fix this after the hack-a-ton.

For the interested, Ryan Lane says on RT that he hasn't had time to do anything with HTTPS right now, but he is aware of it.

Whoops, thinking of the wrong thing.

Issue seems to be fixed now. Daniel "mutante" Zahn did some tweaks last week:

With curl (-I shows headers, -L follows Location: hints):

Without trailing slash:
$ curl -IL https://www.mediawiki.org 2>/dev/null|grep Location
Location: https://www.mediawiki.org/wiki/MediaWiki
$

With trailing slash:
$ curl -IL https://www.mediawiki.org/ 2>/dev/null|grep Location
Location: https://www.mediawiki.org/wiki/MediaWiki
$

Seems fixed to me so :)

(In reply to comment #12)

Issue seems to be fixed now. Daniel "mutante" Zahn did some tweaks last week:

With curl (-I shows headers, -L follows Location: hints):

Without trailing slash:
$ curl -IL https://www.mediawiki.org 2>/dev/null|grep Location
Location: https://www.mediawiki.org/wiki/MediaWiki
$

With trailing slash:
$ curl -IL https://www.mediawiki.org/ 2>/dev/null|grep Location
Location: https://www.mediawiki.org/wiki/MediaWiki
$

Seems fixed to me so :)

Err, your test case is flawed. This bug is about "https://mediawiki.org", not "https://www.mediawiki.org" (comment 0 and bug summary).

$ curl -Is "https://mediawiki.org" | grep Location
Location: http://www.mediawiki.org/

This behavior is still wrong.

(In reply to comment #14)

Oh my god.
Well http://wikimedia.org/ has the same issue :-)

You mean https://wikimedia.org, but yes, it appears to have the same issue:

$ curl -Is "https://wikimedia.org/" | grep Location
Location: http://www.wikimedia.org/

(In reply to comment #4)

It's unclear whether this bug should be expanded to
cover the other (several dozen) similar cases in redirects.conf. It's also
unclear whether there's a saner system to do this. %{HTTPS} might be best...
but it's going to result in an awful lot of code duplication for one letter.

Well, it's clearer now that there's a mess of bugs about the exact same issue. I'm going to make some noise and dupe them all down to this bug, as it came first and it contains reasonably useful back-and-forth between me and Daniel. I'll aggregate a few test cases and RT links in a single comment as well, I guess.

This means bug 33751 (redirects.conf), bug 35740 (Wikisource), bug 36951 (wikimediafoundation.org), bug 36952 (tracking bug) will be marked as duplicates.

I'm changing this bug's summary from "https://mediawiki.org redirects to http://www.mediawiki.org/" to "Certain Wikimedia redirects improperly go from HTTPS to HTTP; Apache's redirects.conf needs adjustments".

  • Bug 33751 has been marked as a duplicate of this bug.
  • Bug 35740 has been marked as a duplicate of this bug.
  • Bug 36951 has been marked as a duplicate of this bug.
  • Bug 36952 has been marked as a duplicate of this bug.

(In reply to comment #23)

Actually there are dozens and dozens more:

https://gerrit.wikimedia.org/r/gitweb?p=operations/apache-config.git;a=blob;f=redirects.conf;h=a777adf9dfb5f8652289c7855b06e5be43767a9b;hb=HEAD#l176

Right. The question is whether these can be solved in a simpler/cleaner/saner way than specifying "RewriteCond %{HTTPS} off" and "RewriteCond %{HTTPS} on" a million times.

According to my reading of http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30, the Location header must be an absolute URI. Someone suggested that switching these rules to use protocol-relative syntax might resolve this bug, but in addition to the HTTP spec, Apache's rewrite/redirect logic apparently interprets any string beginning with a "/" to be a relative URI path, so that option is a no-go.

As I said earlier, %{HTTPS} might be best... but it's going to result in an awful lot of code duplication for one letter.

(In reply to comment #24)

(In reply to comment #23)

Actually there are dozens and dozens more:

https://gerrit.wikimedia.org/r/gitweb?p=operations/apache-config.git;a=blob;f=redirects.conf;h=a777adf9dfb5f8652289c7855b06e5be43767a9b;hb=HEAD#l176

Right. The question is whether these can be solved in a simpler/cleaner/saner
way than specifying "RewriteCond %{HTTPS} off" and "RewriteCond %{HTTPS} on" a
million times.

According to my reading of
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30, the Location
header must be an absolute URI. Someone suggested that switching these rules to
use protocol-relative syntax might resolve this bug, but in addition to the
HTTP spec, Apache's rewrite/redirect logic apparently interprets any string
beginning with a "/" to be a relative URI path, so that option is a no-go.

Indeed. The spec says it should be absolute and Apache doesn't support relative. But browsers (the ones I've tested) do, and always have, supported relative Location headers. Anyway, as you say, a no-go.

As I said earlier, %{HTTPS} might be best... but it's going to result in an
awful lot of code duplication for one letter.

We could go for a build process, where the file is auto-generated, adding an HTTPS variant of each rule.
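As a rough sketch of what such a build step could look like (hypothetical: the domain list, template, and output format here are illustrative only, not any actual deployed script):

```python
# Sketch: generate paired HTTP/HTTPS rewrite blocks from a domain list,
# so redirects.conf never has to hand-duplicate the %{HTTPS} on/off cases.
DOMAINS = ["mediawiki.org", "wikipedia.org", "wiktionary.org"]

TEMPLATE = """\
RewriteCond %%{HTTPS} %(https)s
RewriteCond %%{HTTP_HOST} %(domain)s
RewriteRule ^/(.*)$ %(scheme)s://www.%(domain)s/$1 [R=301,L]
"""

def generate(domains):
    """Render one rewrite block per (domain, scheme) pair."""
    blocks = []
    for domain in domains:
        for https, scheme in (("off", "http"), ("on", "https")):
            blocks.append(TEMPLATE % {"https": https,
                                      "domain": domain,
                                      "scheme": scheme})
    return "\n".join(blocks)

if __name__ == "__main__":
    print(generate(DOMAINS))
```

The duplication still exists in the generated output, but the source that humans edit stays one line per domain.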

(In reply to comment #26)

See Gerrit change #13293.

I don't have a Gerrit account, so I'm commenting here. You incorrectly switch "^/(.*)$" to "^/(.*$)" several times in that changeset.

(In reply to comment #27)

You incorrectly switch "^/(.*)$" to "^/(.*$)" several times in that changeset.

As you just said on IRC, that wasn't me.

Anyway, as I just said: there's essentially no difference between those two expressions in practice.
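A quick standalone check (Python here, purely for demonstration) confirms that claim for typical request paths:

```python
import re

# The two patterns from the changeset under discussion. Anchoring "$"
# inside or outside the capture group makes no practical difference:
# "$" is zero-width, so the captured text is identical either way.
outside = re.compile(r"^/(.*)$")
inside = re.compile(r"^/(.*$)")

for path in ["/", "/wiki/MediaWiki", "/w/index.php?title=Foo&action=history"]:
    assert outside.match(path).group(1) == inside.match(path).group(1)
```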

What's the status of https://gerrit.wikimedia.org/r/13293? I see Ryan commented on June 29 to wait for the weekend. It's been a few weekends. Has this been merged/deployed? Can it be?

  • Bug 41070 has been marked as a duplicate of this bug.

  • Bug 42409 has been marked as a duplicate of this bug.

(In reply to comment #29)

What's the status of https://gerrit.wikimedia.org/r/13293? I see Ryan
commented on June 29 to wait for the weekend. It's been a few weekends. Has
this been merged/deployed? Can it be?

Bump. The dupes are piling up here. Copying Sumana (Andre is already on the CC list). It'd be awfully nice to get this bug moving forward. :-)

https://gerrit.wikimedia.org/r/13293 had some further activity in September, but it's mostly remained quite quiet. I'm not sure what else is needed here to resolve this bug, though I'm happy to help if I can.

You will want to contact ops, most probably Daniel Zahn (mutante on IRC).

Gerrit change #13293 needs an update - that's one way to speed this up.

dzahn: Are you actually working on this, or should the assignee be reset?

Andre, the change is pending review and there are a few issues in the patch :-D I don't think anyone is specifically assigned to review it.

(In reply to comment #35)

I don't think anyone is specially assigned to review it.

I think Faidon volunteered.

(In reply to comment #36 by MZMcBride)

I think Faidon volunteered.

Faidon: Is that the case? If so, feel free to assign this ticket to yourself and steal it from Daniel.

The patch mentioned above has been abandoned. See the reason in the Gerrit patch set comments.

I think this bug is quite important. It's been open for almost two years - is there any chance we can get this working with a less beautiful solution (e.g. duplicating the logic for every domain), and then have a follow-up bug to make it more efficient?

Also - the WMF supposedly prioritizes HTTPS...

https://blog.wikimedia.org/2013/08/01/future-https-wikimedia-projects/

Asher, Ryan L., Faidon, or anyone else in ops who might know:

Gerrit changeset 13293 indicates that these redirects should be moved to Varnish. Is this possible now? If so, do you have any pointers about which file(s) should be changed in order to move forward toward resolving this bug?

As Erik noted in comment 6, this is a pretty evil bug.

(In reply to comment #40)

Gerrit change #13293 indicates that these redirects should be moved to
Varnish. Is this possible now? If so, do you have any pointers about which
file(s) should be changed in order to move forward toward resolving this bug?

I suppose it may be possible to move some domains to varnish sooner than others but I'm not sure if it's worth it. (We are making recent progress)

"redirects should be moved to varnish" isn't quite right. Redirects will still be served from the same apaches they are now. Just they'll need to go through varnish instead of squid in order to fix this bug. More status below.

We recently got an OTRS ticket about this:
On 2013-08-09 someone wrote:

Subject: FYI: https://wikipedia.org redirects to http://www.wikipedia.org
would make life a lot easier if i could just type https://wikipedia.org to
get started.

Our reply:

Thanks for pointing this out! I'm also looking forward to that being fixed.

That's bug 31369 (https://bugzilla.wikimedia.org/31369), which is blocked on
https://wikitech.wikimedia.org/wiki/Projects#Switch_.22text.22_to_Varnish
(see also
http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071086.html )

Also, https://blog.wikimedia.org/2013/08/01/future-https-wikimedia-projects/
(which mentions https://www.eff.org/https-everywhere which is probably the
best short-term solution for you)

MZMcBride, no, text hasn't moved to Varnish yet. It's being worked on but we're not there yet. When that happens, operations/puppet, templates/varnish/text* would be the place to look for.

jeremyb, I think the idea was that we'd move this into VCL and do the redirects before even reaching Apache. I'm not sure if I like the idea. I'll leave it to whoever decides to implement that :) I agree that it's nasty, though.

(In reply to comment #42)

jeremyb, I think the idea was that we'd move this into VCL and do the
redirects
before even reaching Apache. I'm not sure if I like the idea. I'll leave it
to
whoever decides to implement that :) I agree that it's nasty, though.

I'm not completely sure which I prefer but I thought the general trend was to move stuff out to apache/php where we could (e.g. zero/carriers).

We could always use a switch statement in asm in VCL to decide which redirect code path to take for a given request.

Anyway, no matter which approach we end up going with, having the domains listed in redirects.conf move from Squid to Varnish is a blocker for fixing this bug.
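For illustration, a redirect done at the Varnish layer might look roughly like this (a sketch only, in Varnish 3-era VCL; the synthetic status code, host name, and reliance on an X-Forwarded-Proto header set by the SSL terminators are all assumptions, not the deployed config):

```vcl
sub vcl_recv {
    # Hypothetical: answer the redirect in VCL, before the request
    # ever reaches the Apache backends.
    if (req.http.Host == "mediawiki.org") {
        error 751 "Moved Permanently";
    }
}

sub vcl_error {
    if (obj.status == 751) {
        # Preserve the scheme the client used; assumes the SSL
        # terminators set X-Forwarded-Proto for HTTPS requests.
        if (req.http.X-Forwarded-Proto == "https") {
            set obj.http.Location = "https://www.mediawiki.org" + req.url;
        } else {
            set obj.http.Location = "http://www.mediawiki.org" + req.url;
        }
        set obj.status = 301;
        return (deliver);
    }
}
```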

As this is blocked by the varnish work, can the project page be updated, specifically dates and RT tickets so those interested can follow along?
https://wikitech.wikimedia.org/wiki/Projects#Switch_.22text.22_to_Varnish

Also, upping importance given current WMF priorities and community desires.

  • Bug 52698 has been marked as a duplicate of this bug.

Bump. Any news? If we consider HTTPS important, then bugs like this shouldn't be left open for almost two years.

Quick info: Investigating this problem is on the to-do list for this week.

Daniel/Asher/Ryan: Is there an RT ticket for this? *seems* mostly in Ops/apache domain vs Platform, right?

Oops, I forgot about/missed MZ's comment from way up there ^^

(In reply to comment #22)

Test cases

RT tickets (I believe):

  • RT #1668
  • RT #2847
  • RT #2783

Apache doesn't handle this well, and we don't really want to do it in varnish. It's likely best for MediaWiki to handle this instead...

In addition to those listed above (comment #22 and comment #23), please note that as of today, almost 2 weeks after HTTPS has been "forced" for all logged-in users:

also all get redirected to the non-secure "www.*" versions.

(In reply to comment #50)

Apache doesn't handle this well, and we don't really want to do it in
varnish. It's likely best for MediaWiki to handle this instead...

Are you saying this is a non-ops issue, then? In my view, either Rob L. or Ken S. needs to figure out who to assign this ticket to in order to get it resolved. Or we need to adjust the stated priority, but I'd prefer not to do that. If it's best for MediaWiki to handle this, it moves from ops to platform/core, I think.

Clarification about what you mean (or a rough sketch of how you think this should be implemented in Wikimedia's environment) would be wonderful. Firing up MediaWiki to do this type of high-level redirect seems kind of silly to me, but I don't really care about the implementation details. I'll trust your judgment.

ksnider wrote:

I agree with Ryan here. Fixing this in MediaWiki should be the focus; otherwise we have to mangle the redirect in Apache/Varnish. Better if the wiki could return the HTTPS redirect directly instead.

If for some reason this isn't feasible, then it'll be on us to find a workaround.

Thanks.

We had a brief in-office discussion, and the workaround we would like to propose for this is to change the 301 redirect to a 302 redirect in Gerrit change #13293, and remove the 'Header set Vary "X-Forwarded-Proto"'. That should disable caching for the redirect itself without generally disabling or splitting the cache. The downside is that the redirect won't be cached at all, but we're hoping that we don't get too much traffic on this particular set of redirects.

Longer term, we can potentially move this into MediaWiki (perhaps as part of HipHop), but we'd like to avoid doing this too hastily.
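Concretely, the proposed workaround would amount to blocks along these lines (a sketch only; the real change lives in Gerrit change #13293, and the HTTPS-detection conditions here are assumed from earlier in this thread):

```apache
# Sketch: a 302 is treated as uncacheable by the proxies, so a stale
# HTTP redirect can't be served to an HTTPS client. No Vary header needed.
RewriteCond %{HTTPS} on
RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ https://www.mediawiki.org/$1 [R=302,L]

RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} mediawiki.org
RewriteRule ^/(.*)$ http://www.mediawiki.org/$1 [R=302,L]
```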

(In reply to comment #54 by Rob Lanphier)

We had a brief in-office discussion, and the workaround we would like to
propose for this is to change the 301 redirect to a 302 redirect in Gerrit
change #13293, and remove the 'Header set Vary "X-Forwarded-Proto"'.

Does any specific MW core developer plan to work on this?

(In reply to comment #55)

Does any specific MW core developer plan to work on this?

It looks like Rob is proposing to fix it on the Apache/ops side. That Gerrit is in operations/apache-config.

(In reply to comment #56)

It looks like Rob is proposing to fix it on the Apache/ops side. That Gerrit
is in operations/apache-config.

In the short-term, yes. But Ryan L, Ken S., and Rob L. suggest on this bug that the long-term solution is for MediaWiki to handle these types of redirects. I think Andre may have been asking who might eventually handle that portion.

I'll repeat what I wrote earlier this year:

Clarification about what you mean (or a rough sketch of how you think this
should be implemented in Wikimedia's environment) would be wonderful. Firing
up MediaWiki to do this type of high-level redirect seems kind of silly to me,
but I don't really care about the implementation details.

Greg set this ticket to highest priority three months ago, so I wonder if that still reflects priorities.

(In reply to comment #58)

Greg set this ticket to highest priority three months ago, so I wonder if
that still reflects priorities.

It obviously hasn't been Platform's Highest priority for the past 3 months, wrongly or rightly. :/

I'll reassess current situation and priorities amongst Platform soon. Sorry.

Bug 51700 was an issue we had with the HTTP or HTTPS redirect being cached, i.e. not varying on X-Forwarded-Proto. Squid did not cache the redirect, so that wasn't an issue.

The related fix is https://gerrit.wikimedia.org/r/#/c/75583/ "Fix up Vary headers on 30x redirects from Apache".

Change 96438 had a related patch set uploaded by Tim Starling:
Generate redirects.conf

https://gerrit.wikimedia.org/r/96438

Change 96438 merged by Tim Starling:
Generate redirects.conf

https://gerrit.wikimedia.org/r/96438

(In reply to comment #22, comment #23, and comment #51)

Test cases

This issue isn't fully fixed.

CORRECT (HTTPS --> HTTPS):

$ curl -Is "https://mediawiki.org" | grep Location
Location: https://www.mediawiki.org/

$ curl -Is "https://www.wikimediafoundation.org" | grep Location
Location: https://wikimediafoundation.org/

$ curl -Is "https://dk.wikipedia.org" | grep Location
Location: https://da.wikipedia.org/

(Though I'm not sure why dk redirects to da... assuming that's deliberate.)

$ curl -Is "https://quote.wikipedia.org" | grep Location
Location: https://en.wikiquote.org/

$ curl -Is "https://textbook.wikipedia.org" | grep Location
Location: https://www.wikibooks.org/

$ curl -Is "https://wikimania.wikimedia.org" | grep Location
Location: https://wikimania2014.wikimedia.org/

INCORRECT (HTTPS --> HTTP):

$ curl -Is "https://wikimedia.org" | grep Location
Location: http://www.wikimedia.org/

$ curl -Is "https://wikipedia.org" | grep Location
Location: http://www.wikipedia.org/

$ curl -Is "https://www.wikisource.org" | grep Location
Location: http://wikisource.org/

$ curl -Is "https://wiktionary.org" | grep Location
Location: http://www.wiktionary.org/

$ curl -Is "https://wikiquote.org" | grep Location
Location: http://www.wikiquote.org/

$ curl -Is "https://wikinews.org" | grep Location
Location: http://www.wikinews.org/

$ curl -Is "https://wikivoyage.org" | grep Location
Location: http://www.wikivoyage.org/

$ curl -Is "https://wikibooks.org" | grep Location
Location: http://www.wikibooks.org/

PROBABLY IRRELEVANT:
$ curl -I "https://aa.wikiversity.com"
curl: (51) SSL peer certificate or SSH remote key was not OK

Change 106109 had a related patch set uploaded by Jeremyb:
final (I hope!) fix for protorel redirects

https://gerrit.wikimedia.org/r/106109

(In reply to comment #63)
Please use curl -vs | grep rather than curl -Is

(In reply to comment #65)

(In reply to comment #63)
Please use curl -vs | grep rather than curl -Is

(make that curl -vs 2>&1 >/dev/null | grep)

(In reply to comment #66)

(In reply to comment #65)

(In reply to comment #63)
Please use curl -vs | grep rather than curl -Is

(make that curl -vs 2>&1 >/dev/null | grep)

Why?

(In reply to comment #67)

Why?

Most UAs encountering these redirects will be sending GETs. Your tests used HEAD. Why not use what most UAs will be using?

(In reply to comment #68)

Most UAs encountering these redirects will be sending GETs. Your tests used
HEAD. Why not use what most UAs will be using?

Shouldn't the response header for a HEAD request be identical to that for a GET request?

(In reply to comment #69)

Shouldn't the response header for a HEAD request be identical to that for a
GET
request?

Probably the same most of the time. I'm not saying HEAD testing is worthless, but the first thing to test should be something that more closely approximates actual usage in the wild.

As I suggested on IRC, I think this is an area where defined test cases would be very helpful, arguably essential.

For the Toolserver redirects to come, I wrote Gerrit change #108467 if that's useful for this bug.

(In reply to Gerrit Notification Bot from comment #64)

https://gerrit.wikimedia.org/r/106109

jeremyb: Patch should be split into several patches - anybody willing to do this?

Greg: You set this to highest priority half a year ago. Is the importance still correct?

Lowering prio (thanks for the ping, Andre). Letting Jeremy/Ops take it from here. Chris Steipp can review if needed.

Change 106109 merged by Tim Starling:
final (I hope!) fix for protorel redirects

https://gerrit.wikimedia.org/r/106109

Now fixed, based on testing the prior examples from this thread, plus a couple extras.

aa.wikipedia.com still doesn't work, but that's really bug 40998.

Reopen if you find one that still has the issue.

(In reply to Matthew Flaschen from comment #76)

aa.wikipedia.com still doesn't work, but that's really bug 40998.

I meant to say aa.wikiversity.com (the one mentioned earlier), but actually neither works.

sumanah wrote:

Thanks to everyone who raised or worked on this issue - you have helped keep our users safer.