
googlebot shouldn't be given spoilers
Closed, Invalid
Public

Description

Author: logixoul

Description:
IMPORTANT!!! DO *NOT* READ THIS BUGREPORT IF YOU HAVEN'T READ THE BOOK "Harry
Potter and the Half-Blood Prince" AND YOU PLAN TO READ IT!
-see the bugreport below-
...
...
...
...
...
...
...
...
How to reproduce:

  1. Query Google for "half blood prince"
  2. See the fourth result - it reads:

Harry Potter and the Half-Blood Prince - Wikipedia, the free ...
For information on the character, see Half-Blood Prince (character). ...
Harry pursues Snape, who identifies himself as the Half-Blood Prince before fleeing ...
en.wikipedia.org/wiki/Harry_Potter_and_the_Half-Blood_Prince - 47k - Cached - Similar pages

As you can see, anybody who searches for information about the book in Google will be spoiled
instantly. Internally, Wikipedia has the solution - the {{spoiler}} template. However, in
Google searches the warning does not display. Therefore another approach is needed.

Expected behavior:

  1. On article access, check the user-agent string sent.
  2. If it's Googlebot's, return the page with all spoilers replaced by "---SPOILER---" or
     something similar. Spoilers would be written in articles like
     <spoiler>dumbledore dies</spoiler> (or other markup).
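
A minimal sketch of the two steps above, in Python for illustration only. It assumes the hypothetical <spoiler>...</spoiler> markup proposed here (MediaWiki has no such tag) and a naive substring test on the User-Agent header; the names is_googlebot and render_for are made up for this example and are not part of any existing software.

```python
import re

# Hypothetical markup: <spoiler>...</spoiler> is the tag proposed in this report,
# not something MediaWiki supports.
SPOILER_RE = re.compile(r"<spoiler>.*?</spoiler>", re.IGNORECASE | re.DOTALL)
PLACEHOLDER = "---SPOILER---"


def is_googlebot(user_agent):
    """Naive check: any User-Agent containing 'googlebot' counts as the crawler."""
    return "googlebot" in (user_agent or "").lower()


def render_for(user_agent, page_text):
    """Hide spoilers from Googlebot; unwrap them for everyone else."""
    if is_googlebot(user_agent):
        return SPOILER_RE.sub(PLACEHOLDER, page_text)
    # Regular readers get the spoiler text with the tags stripped.
    return re.sub(r"</?spoiler>", "", page_text, flags=re.IGNORECASE)


text = "Plot: <spoiler>the ending</spoiler> is revealed."
print(render_for("Mozilla/5.0 (compatible; Googlebot/2.1)", text))
# prints: Plot: ---SPOILER--- is revealed.
print(render_for("Mozilla/5.0 Firefox/1.5", text))
# prints: Plot: the ending is revealed.
```

The sketch relies on humans marking the spoilers; the software only swaps the marked spans for a placeholder when the crawler asks for the page.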


Version: unspecified
Severity: enhancement
URL: http://wikipedia.org

Details

Reference
bz3848

Event Timeline

bzimport raised the priority of this task from to High. Nov 21 2014, 8:53 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz3848.
bzimport added a subscriber: Unknown Object (MLST).

avarab wrote:

INVALID, google has a "Dissatisfied? Help us improve" link on every search
result, I suggest you use it.

neubau wrote:

Snape kills dumbledore.

logixoul wrote:

> INVALID, google has a "Dissatisfied? Help us improve" link on every search result, I
> suggest you use it.

How's a computer algorithm supposed to recognize something as abstract as plot spoilers?
No, users should cope with it. Please, before replying, think whether it's really
possible a satiable antispoil algorithm in Google to work.

> Snape kills dumbledore.

No comment...

avarab wrote:

(In reply to comment #3)
> > INVALID, google has a "Dissatisfied? Help us improve" link on every search result, I
> > suggest you use it.
>
> How's a computer algorithm supposed to recognize something as abstract as plot spoilers?
> No, users should cope with it. Please, before replying, think whether it's really
> possible a satiable antispoil algorithm in Google to work.

Well that's something for the google people to work out, not us.

logixoul wrote:

I don't agree, but I have nothing else to say. I used the method you recommended.

robchur wrote:

It is not technically possible for us to prevent Google from indexing spoilers -
how is the software supposed to know what is a spoiler and what isn't?

logixoul wrote:

> It is not technically possible for us to prevent Google from indexing spoilers -
> how is the software supposed to know what is a spoiler and what isn't?

I wrote that above. For example, we surround spoiling parts with <spoiler></spoiler> manually. As I
said, IMO it's humans' job to tell spoilers apart from regular text.

robchur wrote:

(In reply to comment #7)
> > It is not technically possible for us to prevent Google from indexing spoilers -
> > how is the software supposed to know what is a spoiler and what isn't?
>
> I wrote that above. For example, we surround spoiling parts with <spoiler></spoiler> manually. As I
> said, IMO it's humans' job to tell spoilers apart from regular text.

So? Surround something in those tags either at the wikitext or XHTML level, and
you'll find it's ignored at the former and rejected as invalid XHTML at the
latter. And how does that stop GoogleBot seeing it?

logixoul wrote:

(In reply to comment #9)
> (In reply to comment #7)
> > > It is not technically possible for us to prevent Google from indexing spoilers -
> > > how is the software supposed to know what is a spoiler and what isn't?
> >
> > I wrote that above. For example, we surround spoiling parts with <spoiler></spoiler> manually. As I
> > said, IMO it's humans' job to tell spoilers apart from regular text.
>
> So? Surround something in those tags either at the wikitext or XHTML level, and
> you'll find it's ignored at the former and rejected as invalid XHTML at the
> latter. And how does that stop GoogleBot seeing it?

I'm sorry, apparently I didn't make myself clear. What I meant was that some markup
analogous to <spoiler></spoiler> has to be added to the wikicode specs, meaning that
MediaWiki itself should be changed.

(In reply to comment #8)
> See also: http://lists.w3.org/Archives/Public/www-html/2005Dec/0009.html

Now that I read this, I realized that indeed such a thing would be much better off in
general XHTML. However, as seen at
http://lists.w3.org/Archives/Public/www-html/2005Dec/0021.html
the proposal seems to have been declined. I'm probably going to argue with them,
because I don't agree with some of their points. For now I am convinced that the
INVALID resolution fits this bug.