
Wikidata should use wikidata.org as a domain (not www.wikidata.org)
Closed, Declined, Public

Description

There's a bit of funkiness with the wikidata.org --> www.wikidata.org redirect. For me, http://wikidata.org/wiki/Hello and http://www.wikidata.org/wiki/Hello both resolve ("HTTP/1.0 200 OK") without redirecting. This is the wrong behavior. There should only be one canonical form of the URL, similar to how mediawiki.org behaves. Everything should either redirect to www or everything should redirect to the un-prefixed form. A mixture is bad.
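The rule asked for here can be written down as a small function: under either policy exactly one host is canonical, and a request to the other host always redirects with its path preserved. This is a hypothetical sketch for illustration only; the function name `canonical_redirect` and its `www`/`bare` policy argument are assumptions, not part of any Wikimedia configuration.

```shell
#!/bin/sh
# Hypothetical helper: given a Wikidata URL and a policy ("www" or "bare"),
# print the URL a client should be redirected to, or nothing if the URL is
# already in canonical form. Only the host part is inspected; the path is kept.
canonical_redirect() {
    url=$1
    policy=$2
    scheme=${url%%://*}          # e.g. "http"
    rest=${url#*://}             # e.g. "wikidata.org/wiki/Hello"
    host=${rest%%/*}             # e.g. "wikidata.org"
    urlpath=${rest#"$host"}      # e.g. "/wiki/Hello" (may be empty)
    case "$policy:$host" in
        www:wikidata.org)      printf '%s://www.wikidata.org%s\n' "$scheme" "$urlpath" ;;
        bare:www.wikidata.org) printf '%s://wikidata.org%s\n' "$scheme" "$urlpath" ;;
        *) : ;;  # already canonical (or not a Wikidata host): no redirect
    esac
}
```

The reported behavior, where both `http://wikidata.org/wiki/Hello` and `http://www.wikidata.org/wiki/Hello` return 200, matches neither policy: under each one, exactly one of those two URLs has a redirect target.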


Version: wmf-deployment
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=44094
https://bugzilla.wikimedia.org/show_bug.cgi?id=44108
https://bugzilla.wikimedia.org/show_bug.cgi?id=44612
https://bugzilla.wikimedia.org/show_bug.cgi?id=45005

Details

Reference
bz41847

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 12:46 AM
bzimport set Reference to bz41847.
bzimport added a subscriber: Unknown Object (MLST).

We really want wikidata.org without a prefix, but apparently several bits of ops infrastructure assume that there is always a subdomain. Hence the confusion. But I agree that it needs to be sorted out.

I think it used to work right when it first went live...

Yeah, the ops folks seemed to follow www.mediawiki.org as an example when they probably should have been following wikimediafoundation.org as an example. I'm not sure it was originally clear that the Wikidata folks wanted the non-www form.

There really ought to be only one canonical form of the URL/domain.

What's needed to either use www.wikidata.org or wikidata.org as the canonical form? Are there specific sub-tasks required? Do RT tickets need to be filed? How can we move this bug forward?

You've got gerrit access, make the changes in the operations/apache-config.git repo ;)

I merged the MW and the Apache changes and deployed them. The www is now gone:

https://gerrit.wikimedia.org/r/#/c/44406/
https://gerrit.wikimedia.org/r/#/c/44407/

21:41 logmsgbot: dzahn gracefulled all apaches
21:41 mutante: dropping www from wikidata in mw and apache configs as requested
21:39 logmsgbot: dzahn synchronized ./wmf-config/CommonSettings.php
21:39 logmsgbot: dzahn synchronized ./wmf-config/InitialiseSettings.php

Sorry, this appears to still not be fixed.


$ curl -Is "http://www.wikidata.org/" | grep Location

Location: http://wikidata.org/wiki/Wikidata:Main_Page

^ This is correct.


$ curl -Is "http://www.wikidata.org/wiki/Hello" | head -1

HTTP/1.0 200 OK

^ This is incorrect.

Does this mean you want an additional redirect from www to the form without www? Looking...

Sorry, this whole attempt to drop "www" had to be reverted and caused some issues. We can't _not_ have www because of bits.

Ifba23284	revert the whole www dropping for wikidata, cant use it due to bits (MERGED)	Dzahn	operations/mediawiki-config	master	4:39 PM
Ie68957b9	revert the whole www dropping for wikidata, cant have it due to the way bits wor (MERGED)	Dzahn	operations/apache-config	master	4:38 PM
I4035c527	fix wikidata redirect, for real, sry (MERGED)	Dzahn	operations/apache-config	master	4:19 PM
I16e20aa9	fix wikidata redirect (MERGED)	Dzahn	operations/apache-config	master	4:13 PM

(In reply to comment #9)

We can't _not_ have www because of bits.

Can you explain further? Plenty of wikis don't use "www" (commons.wikimedia.org, wikimediafoundation.org, etc.). I don't understand the issue.

(In reply to comment #11)

(In reply to comment #9)

We can't _not_ have www because of bits.

Can you explain further? Plenty of wikis don't use "www" (commons.wikimedia.org, wikimediafoundation.org, etc.). I don't understand the issue.

<mutante> we cant NOT have www due to the way bits works
<Reedy_> why? :/
<mutante> due to the way bits works and geolocation
<mutante> it needs a CNAME .. and NOT an A record
<mutante> but wikidata.org is an A record
<mutante> wikimediafoundation.org only works because it is not balanced between data centers at all

Something funky is definitely going on here, but I don't think bits is to blame.

Right now, I'm focused on a tangential issue: when you request a non-existent page, there's a 200 response code from the Wikidata domains, not a 404 response code as there should be.

Compare:

$ curl -Is 'http://en.wikipedia.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 404 Not Found

$ curl -Is 'http://zh.wikisource.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 404 Not Found

$ curl -Is 'http://www.wikidata.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 200 OK

$ curl -Is 'http://wikidata.org/wiki/This_should_return_a_404' | head -1
HTTP/1.0 200 OK

We can see that en.wikipedia.org and zh.wikisource.org properly return a 404, but www.wikidata.org and wikidata.org return a 200. This is wrong.
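Comparisons like these are easier to repeat if the status code is extracted by a script rather than read off by eye. A minimal sketch, assuming a POSIX shell with curl and awk available; `status_code` and `expect_status` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read an HTTP header dump (as printed by `curl -Is`)
# on stdin and print only the numeric status code from the status line.
status_code() {
    head -n 1 | tr -d '\r' | awk '{ print $2 }'
}

# Hypothetical check: compare a URL's status code with the expected one.
# Requires network access.
expect_status() {
    got=$(curl -Is "$1" | status_code)
    if [ "$got" = "$2" ]; then
        echo "ok   $got $1"
    else
        echo "FAIL $got (expected $2) $1"
    fi
}
```

For example, `expect_status 'http://www.wikidata.org/wiki/This_should_return_a_404' 404` would print a FAIL line while the behavior above persists. Only `status_code` works offline; `expect_status` needs network access.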

I'd generally say that these response codes are the subject of a separate bug, but I'm betting that solving this issue (this symptom) will lead to the resolution of this bug. If someone can prove otherwise, feel free to split the bug. :-)

(In reply to comment #9)

Sorry, this whole attempt to drop "www" had to be reverted and caused some issues. We can't _not_ have www because of bits.

For the record, the revert left broken stuff stuck in the European caches; see bug 44094.

<mutante> due to the way bits works and geolocation
<mutante> it needs a CNAME .. and NOT an A record
<mutante> but wikidata.org is an A record

Filed this as bug 44097.

Daniel K.: Can you please investigate the issue described in comment 13? Between cache pollution and incorrect response codes, it's become nearly impossible to test/debug this bug. I think it's important to get consistent response codes before this bug can move forward.

(In reply to comment #16)

Daniel K.: Can you please investigate the issue described in comment 13?

Filed as bug 44108.

I remember reporting a PyWikipedia bug about page.exists() giving one single result (don't remember offhand if it's True or False all the time), so they might be somehow linked. That hasn't been resolved either.

(In reply to comment #18)

I remember reporting a PyWikipedia bug about page.exists() giving one single result (don't remember offhand if it's True or False all the time), so they might be somehow linked. That hasn't been resolved either.

Was this comment posted to the appropriate bug? I'm having difficulty understanding what you're saying in context. What pywikipedia bug?

The Special:UserLogin page is now returning the user to a URL without subdomain and without namespace. This will more often than not fail to be a valid page.

(In reply to comment #22)

http://wikidata.org/wiki is a redirect loop.

This is filed as bug 44612.

Okay, it now appears that the canonical form is www.wikidata.org:


HTTP, NO PREFIX

$ curl -Is "http://wikidata.org/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://wikidata.org/wiki" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://wikidata.org/wiki/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page

HTTPS, NO PREFIX

$ curl -Is "https://wikidata.org/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page

HTTP, PREFIX

$ curl -Is "http://www.wikidata.org/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://www.wikidata.org/wiki" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://www.wikidata.org/wiki/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page

HTTPS, PREFIX

$ curl -Is "https://www.wikidata.org/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://www.wikidata.org/wiki" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://www.wikidata.org/wiki/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page

Can this bug be marked resolved/fixed, then?
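The twelve checks above follow a fixed pattern (two schemes, two hosts, three paths), so they can be generated rather than typed. A sketch assuming a POSIX shell and curl; the function names are hypothetical, and `audit_redirects` needs network access.

```shell
#!/bin/sh
# Sketch: print every scheme/host/path combination tested above, so the
# whole matrix can be re-run in one go instead of twelve separate commands.
wikidata_urls() {
    for scheme in http https; do
        for host in wikidata.org www.wikidata.org; do
            for p in / /wiki /wiki/; do
                printf '%s://%s%s\n' "$scheme" "$host" "$p"
            done
        done
    done
}

# Requires network access: show the Location header (if any) for each URL.
audit_redirects() {
    wikidata_urls | while read -r url; do
        loc=$(curl -Is "$url" | tr -d '\r' | sed -n 's/^[Ll]ocation: //p')
        printf '%-40s -> %s\n' "$url" "${loc:-none}"
    done
}
```

Note that the path list only covers `/`, `/wiki` and `/wiki/`; an article path has to be added explicitly for the audit to cover it.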

(In reply to comment #24)


HTTP, NO PREFIX

$ curl -Is "http://wikidata.org/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://wikidata.org/wiki" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "http://wikidata.org/wiki/" | grep Location
Location: http://www.wikidata.org/wiki/Wikidata:Main_Page

HTTPS, NO PREFIX

$ curl -Is "https://wikidata.org/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page

[...]

Hmmm, no, missing one test:


HTTPS, PREFIX, MAIN PAGE

$ curl -Is "https://www.wikidata.org/wiki/Wikidata:Main_Page" | grep Location

^ This is the correct behavior. No Location header, as the canonical form is www currently.


HTTPS, NO PREFIX, MAIN PAGE

$ curl -Is "https://wikidata.org/wiki/Wikidata:Main_Page" | grep Location

^ This is wrong. This page currently loads directly, without a Location header redirecting it to www. Compare:


HTTPS, NO PREFIX

$ curl -Is "https://wikidata.org/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page
$ curl -Is "https://wikidata.org/wiki/" | grep Location
Location: https://www.wikidata.org/wiki/Wikidata:Main_Page


Three forms go from no-www to www. But one form (/wiki/Wikidata:Main_Page) stays at no-www. Hmm. :-/
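The missing case suggests checking the rule itself rather than particular URLs: with www as the canonical form, any request to the bare host must redirect to www.wikidata.org regardless of path. A hypothetical sketch of such a check, taking the requested URL and the Location header it returned (empty when there is none); the name `redirect_verdict` and the exact rule are assumptions.

```shell
#!/bin/sh
# Hypothetical check for the "www is canonical" policy: given the requested
# URL and the Location header it returned (empty if none), print OK or BAD.
# Any bare-host URL, whatever its path, must redirect to www.wikidata.org.
redirect_verdict() {
    url=$1
    location=$2
    rest=${url#*://}
    host=${rest%%/*}
    case "$host" in
        wikidata.org)
            case "$location" in
                http://www.wikidata.org/*|https://www.wikidata.org/*) echo OK ;;
                *) echo "BAD: $url served without redirect to www" ;;
            esac ;;
        www.wikidata.org)
            # The canonical host may serve directly or redirect within www.
            case "$location" in
                ''|http://www.wikidata.org/*|https://www.wikidata.org/*) echo OK ;;
                *) echo "BAD: $url redirects away from www" ;;
            esac ;;
        *) echo "SKIP: $host is not a Wikidata host" ;;
    esac
}
```

Given the result above for `https://wikidata.org/wiki/Wikidata:Main_Page` (no Location header), it reports BAD, while the three bare-host root forms report OK.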

Can you please check again? It seems to be working fine here now.

Ok I take that back. There's still one problem: When using :d in the wikitext you get links like en.wikidata.org or fi.wikidata.org and these do not redirect to www.wikidata.org.

(In reply to comment #27)

Ok I take that back. There's still one problem: When using :d in the wikitext you get links like en.wikidata.org or fi.wikidata.org and these do not redirect to www.wikidata.org.

Hmmm, yeah, I see what you mean.

Both of these should redirect to https://www.wikidata.org/wiki/Wikidata:Main_Page, as I understand it. Currently neither do.

(In reply to comment #28)

Both of these should redirect to https://www.wikidata.org/wiki/Wikidata:Main_Page, as I understand it. Currently neither do.

For now, redirecting to the main page will do, but we really want something more elaborate:

http://en.wikidata.org/wiki/Foo should redirect to http://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Foo.

That is:

(\w+)\.wikidata\.org/wiki/(.*) should redirect to %PROTOCOL://www.wikidata.org/wiki/Special:ItemByTitle/$1wiki/$2

But perhaps that should be a separate request. I think just redirecting to the main page is fine for now. Just wanted to mention it, in case it has any impact on how this gets implemented.
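The proposed mapping can be prototyped as a plain string rewrite before it is expressed in the web server configuration. This sketch uses sed, not Apache mod_rewrite; the function name is hypothetical, the request's own scheme stands in for %PROTOCOL, and www.wikidata.org is excluded so it is never rewritten to itself.

```shell
#!/bin/sh
# Sketch of the proposed rewrite, done with sed for illustration only:
#   <lang>.wikidata.org/wiki/<Title>
#     -> www.wikidata.org/wiki/Special:ItemByTitle/<lang>wiki/<Title>
# The original scheme (http or https) is kept in place of %PROTOCOL.
# Prints nothing when the URL does not match.
itembytitle_redirect() {
    case "$1" in
        *://www.wikidata.org/*) return 0 ;;  # canonical host: no rewrite
    esac
    # [[:alnum:]_]+ mirrors \w+ in the proposed pattern.
    printf '%s\n' "$1" |
        sed -n -E 's#^(https?)://([[:alnum:]_]+)\.wikidata\.org/wiki/(.*)$#\1://www.wikidata.org/wiki/Special:ItemByTitle/\2wiki/\3#p'
}
```

Language codes are matched like `\w+` in the proposed pattern, so hyphenated subdomains are not handled; mapping those to site IDs would need extra rules.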

Attempting to remove a link on http://wikidata.org/wiki/Q169964 triggers an XHR to http://www.wikidata.org/w/api.php, which fails in Chromium 27 with:

XMLHttpRequest cannot load http://www.wikidata.org/w/api.php. Origin http://wikidata.org is not allowed by Access-Control-Allow-Origin.
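The error follows from the browser's same-origin rule: an origin is the combination of scheme, host and port, so http://wikidata.org and http://www.wikidata.org are different origins, and the API response must carry a matching Access-Control-Allow-Origin header before the page's script may read it. A small sketch of the comparison, assuming a POSIX shell; `origin_of` and `same_origin` are hypothetical names, and only http/https URLs without IPv6 literals are handled.

```shell
#!/bin/sh
# Sketch: reduce a URL to its origin (scheme://host:port), filling in the
# default port for http (80) and https (443) when none is given.
origin_of() {
    scheme=${1%%://*}
    rest=${1#*://}
    hostport=${rest%%/*}
    case "$hostport" in
        *:*) printf '%s://%s\n' "$scheme" "$hostport" ;;
        *)
            case "$scheme" in
                https) port=443 ;;
                *) port=80 ;;
            esac
            printf '%s://%s:%s\n' "$scheme" "$hostport" "$port" ;;
    esac
}

# Succeeds (exit status 0) only if both URLs share scheme, host and port.
same_origin() {
    [ "$(origin_of "$1")" = "$(origin_of "$2")" ]
}
```

Serving every page from a single host, as the next comment argues, avoids the cross-origin request altogether.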

Attempting to remove a link on http://wikidata.org/wiki/Q169964 triggers

Actually, it should not be possible to even load http://wikidata.org/wiki/Q169964.

Any request to wikidata.org should immediately be redirected to www.wikidata.org.

(In reply to comment #30)

We've decided to do it the other way around and always redirect to www.wikidata.org

see https://bugzilla.wikimedia.org/45005

We still need to solve the issues in the previous two comments though.

This bug needs to be marked as RESOLVED WONTFIX.

This bug completely contradicts https://bugzilla.wikimedia.org/45005 which is what we decided to do.

Closing after discussion with Denny.