
Info action's "search engine status" language could use tweaking
Closed, ResolvedPublic

Description

https://en.wikipedia.org/w/index.php?title=User_talk:MZMcBride&action=info

Looking at this page, it currently reads:

Search engine status: Not indexable

I think "Not indexable" is a bit misleading here. The page can still be indexed by search engines (including internal search engines); it just happens to be marked in such a way that external search engines that voluntarily follow the "noindex" directive will not publicly index it.

Other software packages such as WordPress now use language such as "Search engines discouraged," which I think is more accurate.


Version: 1.22.0
Severity: enhancement
URL: https://translatewiki.net/wiki/Thread:Support/Pageinfo-robot-index

Details

Reference
bz43935

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:41 AM
bzimport set Reference to bz43935.
bzimport added a subscriber: Unknown Object (MLST).

Restoring the "easy" keyword following discussion with Krinkle. Bugs marked "easy" do not need to spell out explicitly how to move forward (e.g., "go to path/to/file.php and change this line"). They just need a straightforward goal.

We can add a separate "thoughtless" or "trivial" category if people would like. I think Nemo is beginning to question the value of the keyword altogether. He may be right, though I like the idea of having an index of easy bugs for people to get involved with. Maybe this would be better served by [[mw:annoying little bugs]].

"Easy" in keywords currently means "this would be a good bug for a new developer to take a stab at." In this case, a new developer might come along, do some thoughtful consideration (as is sometimes required with bugs), and say "what about X or Y?" in a Bugzilla comment or perhaps in a Gerrit changeset. Then we can move forward.

See URL: I recently had to improve /qqq given how unclear the message was, but I'm not sure I did it very well.

richardg_uk wrote:

Assuming that the output relates to robots.txt status (not to internal search), I suggest changing the text to:

"External search: Allowed/Disallowed"

Reasons:

  • if the status is based on robots.txt and is not always identical to the internal search indexing, then "External" would be more accurate;
  • "status" usually refers to the outcome of a process, which is misleading here because the info is about a setting;
  • "Disallowed:" is the standard wording used in robots.txt, so using "Allowed" and "Disallowed" is more consistent and makes the technical meaning more obvious.

Though "Discouraged" would be slightly more accurate for non-technical users, the word is still ambiguous and fails to reflect the robots.txt wording. If you think that "Allowed/Disallowed" are still too confusing, then "External search:" could be linked to http://en.wikipedia.org/robots.txt or to the article at [[Robots exclusion standard]].

(In reply to comment #3)

Assuming that the output relates to robots.txt status (not to internal search), [...]

The output has no relation to robots.txt, as far as I'm aware. It's related to the noindex value of the meta element's content attribute. For example:

<meta name="robots" content="noindex,follow" />

This HTML element can be controlled per-page in non-content namespaces using the __INDEX__/__NOINDEX__ magic words, or per-namespace using global configuration variables.
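To make the point concrete, here is a minimal Python sketch of how a crawler that chooses to honour the directive might read it. It is not MediaWiki's code; the class name `RobotsMetaParser`, the function `external_search_label`, and the "discouraged"/"allowed" labels are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def external_search_label(html):
    """Return a hypothetical status label based on the robots meta element."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "noindex" is only a request; crawlers may ignore it.
    return "discouraged" if "noindex" in parser.directives else "allowed"


print(external_search_label('<meta name="robots" content="noindex,follow" />'))
```

Nothing in this check stops a crawler from indexing the page; it only reports what the page asks for.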

richardg_uk wrote:

Well at least I've demonstrated that the current wording is confusing. :/

There are at least four confusable tests:

  • robots:noindex meta tag (reported)
  • NOINDEX behavior switch (sets meta tag in non-article space)
  • robots.txt status (not reported; set independently)
  • internal search status (not reported; always true?).
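The first and third tests in the list above are independent checks, which the following rough Python sketch tries to show using the standard library's `urllib.robotparser`. The robots.txt rules, the `example.org` URLs, and both function names are made up for illustration and do not describe any real wiki's configuration.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_allows(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def meta_allows_indexing(meta_content):
    """Check the content attribute of a robots meta element."""
    directives = {token.strip().lower() for token in meta_content.split(",")}
    return "noindex" not in directives


# Hypothetical rules and page, for illustration only.
rules = "User-agent: *\nDisallow: /private/\n"
page_url = "https://example.org/wiki/User_talk:Example"

# A page can be crawlable per robots.txt and still carry a noindex meta tag.
print(robots_txt_allows(rules, page_url))
print(meta_allows_indexing("noindex,follow"))
```

The two answers can disagree for the same page, which is why a single "search engine status" line can mislead.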

So, to reflect the robots meta value more closely, how about:

"External search: Index / No index"

Stilted grammar, but makes it easier to see that it corresponds to the noindex meta tag.

And, given the potential for confusion, there's a strong case for linking "External search" to an explanatory page such as [[WP:NOINDEX]] or the [[noindex]] article.

(See also bug 42867 for an inconsistency in the reported search status depending on how a page is specified in the URL.)

Change 78413 had a related patch set uploaded by Nemo bis:
Clarify info action's "search engine status"

https://gerrit.wikimedia.org/r/78413

Change 78413 merged by jenkins-bot:
Clarify info action's "search engine status"

https://gerrit.wikimedia.org/r/78413

I'm not sure switching to allowed/disallowed resolves this bug.

Indexing directives are optional. I think using "discouraged"/"encouraged" is a bit clearer here and gives less of an illusion of control. However, nobody else seems to agree, so I'll leave this bug alone for now.

(In reply to comment #8)

I'm not sure switching to allowed/disallowed resolves this bug.

Indexing directives are optional. I think using "discouraged"/"encouraged" is a bit clearer here and gives less of an illusion of control. However, nobody else seems to agree, so I'll leave this bug alone for now.

Everything is optional except law, and almost no web standards are made into law, so I hope people understand. If you manage to find some other language that is used by anyone but us, it could be borrowed; I agree with you but couldn't find anything better.

"Encouraged" is also not really correct, as merely not disallowing something doesn't mean encouraging it.

External search engines: [discouraged | allowed]

Blergh.

(In reply to comment #10)

External search engines: [discouraged | allowed]

Sorry, I've been thinking about this bug for nearly a month and I don't agree that it's resolved. I'd really like to see us not use "disallowed" here. I think "discouraged" would be much more accurate, even if it doesn't have perfect symmetry with "allowed". This imperfect symmetry (discouraged/allowed) best matches reality, I think.

Jdlrobson claimed this task.
Jdlrobson subscribed.

In 2021, there is no "Search engine status: Not indexable" text on the page.

Perhaps this is: "Indexing by robot", but that's using common robots.txt language.