
Magic word to remove page from internal MediaWiki search results
Open, Low, Public, Feature

Description

One of the perennial debates (on the Hungarian Wikipedia, at least) is whether to delete redirects from old names in the project namespace. On one hand, even when links are updated, deletion destroys some navigation pathways (bookmarks, external links etc.), makes old revisions of pages (where the links are not updated) less readable, and annoys some people. On the other hand, the old (and often erroneous or misleading) names clutter up the search suggestions dropdown, which is currently by far the most user-friendly navigation method. There is a similar problem with redirects from misspelled names of articles.

It would be very useful to have a __NOSEARCH__ magic word which would suppress indexing of such pages by the internal search engine (or maybe just add a flag so that the page is only returned when such pages are explicitly requested).
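
A minimal sketch of the second variant (a flag rather than removal from the index), written in Python rather than MediaWiki's PHP. The names `Page`, `build_index`, `search` and `include_nosearch` are hypothetical and do not correspond to any existing MediaWiki or search backend API; only the `__NOSEARCH__` token comes from this request.

```python
from dataclasses import dataclass

NOSEARCH = "__NOSEARCH__"  # proposed magic word; not an existing MediaWiki feature


@dataclass
class Page:
    title: str
    wikitext: str


def build_index(pages):
    """Return index entries, recording a flag instead of dropping flagged pages."""
    return [
        {"title": p.title, "nosearch": NOSEARCH in p.wikitext}
        for p in pages
    ]


def search(index, query, include_nosearch=False):
    """Case-insensitive prefix search that skips flagged pages unless asked."""
    q = query.lower()
    return [
        e["title"]
        for e in index
        if e["title"].lower().startswith(q)
        and (include_nosearch or not e["nosearch"])
    ]


# Hypothetical example data: one live page and one misspelled redirect.
pages = [
    Page("Wikipedia:Village pump", "Current discussion page."),
    Page("Wikipedia:Village pmup", "#REDIRECT [[Wikipedia:Village pump]] __NOSEARCH__"),
]
index = build_index(pages)
```

With this data, `search(index, "wikipedia:vill")` returns only the live page, while passing `include_nosearch=True` also returns the flagged redirect, so nothing is lost from the index itself.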

Details

Reference
bz22251

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:54 PM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz22251.
bzimport added a subscriber: Unknown Object (MLST).

rd232 wrote:

*** Bug 24169 has been marked as a duplicate of this bug. ***

This comment was removed by Bachsau.

rd232 wrote:

(In reply to comment #2)

I wouldn't want a new magic word. Maybe a config option for whether NOINDEX
should also apply to internal search would do.

I doubt that if MediaWiki had that option, Wikimedia projects would use it: NOINDEX hiding content from internal search is usually not the desired behaviour for existing uses of NOINDEX. Equally, it's not certain that NOSEARCH uses should be hidden from search engines. Just provide both, for flexibility.

One concern here is the potential for abuse if people can mask content from internal search.

Another concern here is that some people use __NOINDEX__ with the specific intention of excluding content from external search engines, but continuing to include that content in internal search engines. A separate magic word such as __NOSEARCH__ might address this.

debt subscribed.

We'll need more information on the use cases for doing this, and there might already be a different way to go about this.

This comment was removed by Bachsau.

@Bachsau, there are quite a few reasons why we'd want to have redirects appear in the search box. Two examples off the top of my head: common misspellings of article titles are often created as redirects. The article for "Mississippi" has 9 redirects for misspellings alone. :) Colloquial names for things as well. "Show Me State" redirects to the article on the state of Missouri, USA as it's the state's motto. I hope that helps explain a few reasons why we just can't exclude them all.

This comment was removed by Bachsau.

Another use case is abusive user names. The user pages of these are usually templated for transparency, but that means writing the abuse target's name in the search box gives abusive results in the dropdown.

One concern here is the potential for abuse if people can mask content from internal search.

I think abuse is not typically found by searching for it, but marked pages could be added to a service category if reviewing them is a concern.

In T24251#4021783, @Tgr wrote:

Another use case is abusive user names. The user pages of these are usually templated for transparency, but that means writing the abuse target's name in the search box gives abusive results in the dropdown.

Abusive user names should be hideuser-blocked so they don't appear in search results / for normal users.

Abusive user names should be hideuser-blocked so they don't appear in search results / for normal users.

AFAIK that doesn't prevent the user page from appearing in search results, since it is a page that exists (it has a template saying why the user got blocked). Also hideuser is an oversight right so not available on most wikis. (Maybe that should be fixed; IMO it would make sense to allow admins to hide users as they can mostly do it already, but in a very cumbersome way.)

Krinkle renamed this task from Magic word to remove page from (quick)search results to Magic word to remove page from internal MediaWiki search results. Mar 22 2018, 12:06 AM
Krinkle updated the task description. (Show Details)
Krinkle removed a subscriber: wikibugs-l-list.

On the English Wikipedia there is a frequently expressed desire for redirects from misspellings (and similar) to be excluded from the search suggestions drop-down.
A magic word NOSEARCH (or similar) that:

  • excluded such pages from the search drop down (except for exact matches?)
  • (for redirects) placed their target higher in the drop down list and full search results page (to the top if an exact match)
  • lowered non-exact matches in the full search hits but did not remove them entirely

would satisfy this desire and, as I understand it, also satisfy the original request while limiting (to almost nothing) the potential for abuse noted by @MZMcBride in comment 4.
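
The three behaviours listed above could be sketched roughly as follows. This is only one possible reading of the proposal, in Python rather than MediaWiki's PHP; `Entry`, `suggest` and `full_search` are hypothetical names, not existing MediaWiki or CirrusSearch functions, and prefix/substring matching stands in for whatever matching the real search backend does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    title: str
    nosearch: bool = False                 # page carries the proposed magic word
    redirect_target: Optional[str] = None  # set when the page is a redirect


def suggest(entries, query):
    """Drop-down: hide flagged pages unless exact, and promote their targets."""
    q = query.lower()
    top, promoted, normal = [], [], []
    for e in entries:
        title = e.title.lower()
        if not title.startswith(q):
            continue
        if not e.nosearch:
            normal.append(e.title)
            continue
        exact = title == q
        if e.redirect_target:
            # Target goes to the very top on an exact match, otherwise
            # ahead of the ordinary prefix matches.
            (top if exact else promoted).append(e.redirect_target)
        if exact:
            top.append(e.title)  # exact matches stay visible
    seen, result = set(), []
    for t in top + promoted + normal:  # de-duplicate, keeping first position
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def full_search(entries, query):
    """Full results: demote flagged non-exact matches, but never remove them."""
    q = query.lower()
    hits = [e for e in entries if q in e.title.lower()]
    # False sorts before True; list.sort() is stable, so the remaining
    # order is preserved.
    hits.sort(key=lambda e: e.nosearch and e.title.lower() != q)
    return [e.title for e in hits]


# Hypothetical example data: a misspelled redirect and two ordinary pages.
entries = [
    Entry("Missisippi", nosearch=True, redirect_target="Mississippi"),
    Entry("Mission"),
    Entry("Mississippi"),
]
```

Here `suggest(entries, "miss")` hides the misspelling but moves "Mississippi" ahead of "Mission", while typing the misspelling in full still shows it, directly below its target.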

On the English Wikipedia we would likely include the magic word in some of the redirect categorisation templates that are transcluded onto the redirect page, so it would need to function in that environment the same as if it was placed on the redirect page directly.

You could probably create an empty {{nosearch}} template, put it inside the affected templates and unboost it.

Well, maybe. I'm not sure boosting is used for search suggestions. But it might be worth a try.

I am experimenting with subpages to separate some chapters from each other, which makes it easier to keep a global order and to change them in a more automated manner. The main page contains only the references that pull them in. This works well, but search lists all of the subpages. Hiding them is what I am looking for.

And:

srredirects: Include redirect pages in the search. From 1.23 onwards, redirects are always included. (Default: false) (removed in 1.23)

I also use several redirects. Since redirects can no longer be hidden as of 1.23, a first step would be to have an option to hide them again, as has already been suggested several times.

Similarly, when we create a MediaWiki instance for a new domain, we separate each page into a main page (editable) and a transcluded autogenerated page (not editable, since it is automatically regenerated on a regular basis).

We'd like to exclude the autogenerated page from both search results and random page functionality.

Bump. I know I shouldn't do this, but I think that this is actually a good idea. I'm not sure whether something like this is available or being considered for wikis that aren't using CirrusSearch.

I also found this task because of a question from a user of the MediaWiki software, which I'm quoting here:

We have a lot of quotes on the site from users of the wiki who are reporting their experience with something, and I've been adding the names of the authors in markup (<!-- name -->) to obscure their names from the public-facing text, but make it easy for me, when visiting a page, to check who we already have quotes from.

The problem is that I've discovered that the site's search tool will bring up these names if one searches for the names themselves.

Do you know of any markup that we could use to completely obscure these names, even from searches specifically for the names?
Aklapper changed the subtype of this task from "Task" to "Feature Request". Jul 4 2022, 9:55 AM

@Kizule I don't know of any way to exclude phrases from search results, but I can certainly see use cases for it. I'm not sure it's quite the same thing as this feature request though, so maybe open a new ticket?