Remove worse-than-useless <meta> keywords feature
Closed, Resolved (Public)

Description

The implementation of the feature that adds meta keywords to the header of each page is insufficient, and the very concept of selected keywords is ultimately flawed. It should be removed.

The idea of meta keywords was to provide search engines with terms they could use to index a page. From HTML 4.01 (http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4.2):
"A common use for META is to specify keywords that a search engine may use to improve the quality of search results."
And from the notes in the same specification (http://www.w3.org/TR/REC-html40/appendix/notes.html#recs):
"Some indexing engines look for META elements that define a comma-separated list of keywords/phrases ... Search engines may present these keywords as the result of a search."

So what's wrong? To start with, MediaWiki's implementation of this feature is worse than useless. If we are to believe it, the most relevant terms for Wikipedia's main page are:
"Main Page,1769,1945,1994,1999,2.5D,2004,2009,2D computer graphics,622,Aiea, Hawaii"

For the current featured article, they are:
"Domitian,Articles with unsourced statements from July 2009,Flavian Dynasty,The Triumph of Titus Alma Tadema.jpg,Special:Search/Domitian,69,79,81,96,Abdication,Abortion"

The theory is that these topics are important because they are either within the title, or linked in the article. This misses the point, as search engines _already look_ for text that is in the title or linked in the article.
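
A rough sketch of that selection rule, to show why it cannot rank relevance. This is an illustration in Python, not MediaWiki's actual PHP code: the function name `naive_keywords`, the alphabetical ordering of links, and the limit of ten links are all assumptions loosely inferred from the sample output quoted above.

```python
def naive_keywords(title, link_targets, limit=10):
    """Build a keyword string from the page title plus the first few link targets.

    Hypothetical reconstruction: the sort order and the limit are assumptions.
    """
    # Deduplicate and sort the link targets. Nothing here measures relevance,
    # so year articles such as "69" sort ahead of anything meaningful.
    picked = sorted(set(link_targets))[:limit]
    return ",".join([title] + picked)


links = ["Flavian Dynasty", "Abdication", "69", "79", "81", "96", "Abortion"]
print(naive_keywords("Domitian", links))
# prints: Domitian,69,79,81,96,Abdication,Abortion,Flavian Dynasty
```

Every term it emits is already present in the page as the title or as link text, which is the redundancy described above.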

As shown above, the algorithm fails spectacularly to select sensible keywords. These keywords would do more harm than good if they did anything. Fortunately, they do not. Modern search engines do not use these keywords because of their propensity for spam/SEO (see http://en.wikipedia.org/wiki/Keyword_stuffing). They simply cannot be relied on.

Google went out of its way to talk about every other meta tag, then denied using keywords when asked:
http://googlewebmastercentral.blogspot.com/2007/12/answering-more-popular-picks-meta-tags.html
(John Mueller: 'You're right in that we generally ignore the contents of the "keywords" meta tag.')

Unfortunately, the 80-100 gzipped bytes (250-350 uncompressed bytes) these keywords occupy per response still negatively impact users. Coming as they do before other headers, the keywords delay the load of linked CSS and JS files, as well as the content itself, occupy cache space, and take CPU to generate.
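
One way to reproduce this kind of measurement is to compare a page with and without the tag, both raw and gzipped. The sketch below uses Python's standard `gzip` module; the stand-in page body and the exact tag markup are assumptions, so the numbers it prints will not match the figures above, which came from real Wikipedia responses.

```python
import gzip

# Keyword list quoted in this report for the featured article.
keywords = ("Domitian,Articles with unsourced statements from July 2009,"
            "Flavian Dynasty,The Triumph of Titus Alma Tadema.jpg,"
            "Special:Search/Domitian,69,79,81,96,Abdication,Abortion")
# Assumed markup for the tag; the real output may differ slightly.
meta_tag = '<meta name="keywords" content="%s" />' % keywords

# Stand-in page; a real measurement would use an actual rendered article.
page_without = ("<html><head><title>Domitian</title></head><body>"
                + "Lorem ipsum dolor sit amet. " * 200
                + "</body></html>")
page_with = page_without.replace("<head>", "<head>" + meta_tag, 1)


def gzipped_size(text):
    """Return the size in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


raw_cost = len(page_with.encode("utf-8")) - len(page_without.encode("utf-8"))
gzip_cost = gzipped_size(page_with) - gzipped_size(page_without)
print("uncompressed bytes added:", raw_cost)
print("gzipped bytes added:", gzip_cost)
```

The keyword list compresses poorly compared with ordinary markup because it is a run of mostly unrelated terms.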

Providing a way to specify appropriate keywords is not a solution, as it would distract attention from actually useful editing. The standard itself has been abandoned by those intended to use it. This feature should be removed altogether, or at the very least made optional and turned off by default.

(Ironically, the meta tag that *is* sometimes used - "description" - is not supported by MediaWiki, probably because search engines do a good enough job at figuring out appropriate text without it.)


Version: unspecified
Severity: minor

Details

Reference
bz19761

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:39 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz19761.

Whoops - please read the first paragraph to "The implementation of the feature that adds meta keywords to the header of each page is insufficient, and the very concept of selected keywords is ultimately flawed. It should be removed." :-)

Cf. bug 570 and bug 7614, which can probably be marked INVALID/WONTFIX once this bug is done.

Could have sworn I removed this misfeature a year or two ago. :) Gettin' out the scissors...

Done in r53482.

Note the functionality could be replicated in an extension easily enough should someone actually want such things. :) But modern search engines don't use the field, and the values were poorly selected, so I'm happy to kill it.

(In reply to comment #0)

> Unfortunately, the 80-100 gzipped bytes (250-350 uncompressed bytes) these
> keywords occupy per response still negatively impacts users. Coming as they do
> before other headers, the keywords delay the load of linked CSS and JS files,
> as well as the content itself, occupy cache space, and take CPU to generate.

How about accessibility bug 19453: it's as if your keyword bad dream
jumped out of the <head> and into a rendered nightmare at the top of the <body>
for several screenfuls, doubling total page bytes too.

pzimmerm wrote:

Aargh! Our MediaWiki implementation requires a few specific meta tags to work with our company's search system, and with 1.16 it's all broken... it worked great previously. The metakeywordstag extension no longer works, and the metatag extension requires protected pages, which doesn't suit our collaborative environment. Any suggestions for a workaround?

As Brion notes, turning the code he removed into a hook should be trivial. In fact, here it is: [[mw:User:Ilmari_Karonen/MetaKeywords]]. Stick that in a .php file somewhere, require_once() it from LocalSettings.php and you should be fine.

Note: I haven't actually tested the code, besides running php -l on it. If you find a bug, let me know.
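
Since the hook is untested, one quick check after installing it is to fetch a rendered page and confirm the tag is back in the output. A minimal sketch in Python using only the standard library; `MetaKeywordsFinder`, `find_meta_keywords`, and the sample HTML are hypothetical, and a real check would run against HTML fetched from the wiki.

```python
from html.parser import HTMLParser


class MetaKeywordsFinder(HTMLParser):
    """Collect the content of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            self.keywords.append(attributes.get("content") or "")


def find_meta_keywords(html_text):
    """Return the keyword strings found in the given HTML source."""
    finder = MetaKeywordsFinder()
    finder.feed(html_text)
    return finder.keywords


sample = ('<html><head><meta name="keywords" '
          'content="Domitian,Flavian Dynasty" /></head><body></body></html>')
print(find_meta_keywords(sample))
# prints: ['Domitian,Flavian Dynasty']
```

An empty list means the page has no keywords tag, so the hook is either not loaded or not firing.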

pzimmerm wrote:

Sweet... thanks a ton!!

I noticed around 2009 or so that the keywords were less useful for Wikiquote than for Wikipedia. 5 years later I went to check if that was still the case and I discovered this bug. :)

Anyway, it seems "keywords" is basically replaced by "description" now (bug 12196) and I think there is a remote chance of this happening in an extension instead: bug 66325.