Special:Linksearch should default to all protocols (protocol-less column for externallinks)
Closed, Resolved | Public | Feature

Description

Special:Linksearch should default to all protocols instead of just http. This quirk is documented in the default MediaWiki:Linksearch-text. As it stands, this allows linkspammers to evade scrutiny by placing https links.

Example query: Special:Linksearch/*.spam.example.com should pick up the link to https://spam.example.com/blah I placed in my personal sandbox, but I have to search Special:Linksearch/https://*.spam.example.com to pick up this link.


Version: unspecified
Severity: enhancement

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:06 PM
bzimport set Reference to bz12810.
bzimport added a subscriber: Unknown Object (MLST).

Currently you'd have to do separate queries for every possible protocol, and paging the list wouldn't be cleanly possible.

To work cleanly, another index field would have to be added to the externallinks table which doesn't include the protocol.
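A minimal sketch of what such a protocol-less index value could look like, assuming the host labels are reversed so that a wildcard search becomes a prefix match. The function names `split_for_index` and `wildcard_to_prefix` are hypothetical, and the key format is illustrative only; it is not the schema MediaWiki actually uses.

```python
from urllib.parse import urlsplit


def split_for_index(url: str) -> tuple[str, str]:
    """Return (protocol, protocol-less index key) for a URL.

    The key reverses the host labels, so a wildcard search such as
    *.example.com becomes a plain prefix match on "com.example.".
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split("."))) + "."
    return parts.scheme, reversed_host + parts.path


def wildcard_to_prefix(pattern: str) -> str:
    """Turn '*.spam.example.com' into the prefix 'com.example.spam.'."""
    host = pattern.removeprefix("*.")
    return ".".join(reversed(host.split("."))) + "."


# Both protocols yield the same key, so one indexed prefix scan finds both.
prefix = wildcard_to_prefix("*.spam.example.com")
for url in ("http://spam.example.com/blah", "https://spam.example.com/blah"):
    protocol, key = split_for_index(url)
    print(protocol, key, key.startswith(prefix))
```

Because the protocol is stored separately from the key, a single query can cover every protocol and page through the results in one consistent order.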

The extension is now part of MediaWiki core (1.14alpha) -> changing product and component

In fact, when pagination occurs, even if you place a domain without protocol, pagination links convert the domain to http:// automatically. For example, try following the next page at https://www.mediawiki.org/wiki/Special:LinkSearch/commons.wikimedia.org and see how http:// is added to the domain in the search input field.

It would be really nice if this got a bit of love. It is 13 years old, and with so many links being https by default it is becoming a pussy bedsore of a problem

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM

Please fix. If you can really only handle one protocol by default, it should probably be https://

@Certes: Please feel free to provide a patch if you'd like to get things closer to getting fixed. Thanks.

As a note, it is trivial to work around the protocols issue in link search using Special:Search with insource: and regex e.g. a query with insource:theguardian insource:/theguardian\.com/ returns about 154k pages, divided into 147k secure HTTP links and 17k insecure HTTP links.

Instead of defaulting to all protocols, it's possible to make it default to http and https. That is feasible and would address the bulk of the problem. Would that be good enough to call this done?

This was my primary frustration with the page :)

I have some good news for you! Once T312666: Remove duplication in externallinks table is done (in June), it'd be quite easy to implement http and https as the default.

I will do this but only for http/https. Limiting scope to the actual problem. If people need all protocols, they can query it in quarry.wmcloud.org instead.
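The scoping decision above can be sketched roughly as follows: a query without a protocol is expanded to the two default protocols, while a query that names a protocol is left alone. `expand_query` and `DEFAULT_PROTOCOLS` are hypothetical names for illustration and do not reflect the actual MediaWiki patch.

```python
DEFAULT_PROTOCOLS = ("http://", "https://")


def expand_query(query: str, defaults=DEFAULT_PROTOCOLS) -> list[str]:
    """Return the fully qualified patterns a search should cover.

    A query that already names a protocol is returned unchanged;
    a bare domain pattern is expanded to every default protocol.
    """
    if "://" in query:
        return [query]
    return [protocol + query for protocol in defaults]


print(expand_query("*.spam.example.com"))
print(expand_query("https://*.spam.example.com"))
```

Other protocols would still need to be requested explicitly, which matches the limited scope described in the comment.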

Change 917412 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] LinkSearch: Change default protocol to http:// and https:// in READ_NEW

https://gerrit.wikimedia.org/r/917412

Jdforrester-WMF reopened this task as In Progress.
Jdforrester-WMF subscribed.

Actually, I'll merge this in a few weeks once it's ready to run in Wikimedia production.

Just a reminder that this functionality needs to be exposed in the Action API too.

Change 917412 merged by jenkins-bot:

[mediawiki/core@master] LinkSearch: Change default protocol to http:// and https:// in READ_NEW

https://gerrit.wikimedia.org/r/917412

Change 655401 had a related patch set uploaded (by Krinkle; author: DennisRoczek):

[mediawiki/core@master] switch to https as default for Special:LinkSearch instead of http

https://gerrit.wikimedia.org/r/655401

Change 655401 abandoned by Krinkle:

[mediawiki/core@master] switch to https as default for Special:LinkSearch instead of http

Reason:

Closing in favour of https://gerrit.wikimedia.org/r/917412, which addresses the underlying issue of HTTP being a poor default, by defaulting to both HTTP and HTTPS (instead of switching the default from one to the other).

https://gerrit.wikimedia.org/r/655401

Change 932000 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] api: Make linksearch APIs also default to http and https in READ_NEW

https://gerrit.wikimedia.org/r/932000

Fine, after a decade and a half.

May I suggest adding // to the list of protocols?

As far as I can tell at first glance, the so-called protocol-relative links are not covered now. They were quite popular before WMF wikis moved to always-secure connections, since they inherited the security level from the client's preference. In those days this improved performance or bandwidth.

// is internally indexed as https so that's automatically covered.

Also, we made some changes regarding handling proto-relative URLs: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/USZRHK5NJRHSFBWD6Q6RZQGS2R4E4BZ7/

On top of that, using // is generally discouraged and should be cleaned up :P

Well, in order to clean them up it is necessary to find them via LinkSearch.

Anyway, many of them are within (even archived) discussions and user pages where it is not welcome to modify a personal contribution.

On productive community pages and in article space we are expanding to https:// when editing anyway.

They are stored as-is in el_to field in externallinks, so querying quarry.wmflabs.org with el_to like '//%' should give you a lot of results to work with (but I'll be dropping that field in a month or so). Anyway, offtopic to this ticket and already addressed as it's internally stored as https so it'll show up in link search regardless of being fixed or not.
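A rough illustration of the `el_to like '//%'` filter mentioned above, run against an in-memory SQLite table rather than the real replica on Quarry. The table here is a two-column stand-in with made-up rows. The real externallinks table has more columns, and per the comment above the `el_to` field is slated for removal.

```python
import sqlite3

# In-memory stand-in for the externallinks table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE externallinks (el_from INTEGER, el_to TEXT)")
conn.executemany(
    "INSERT INTO externallinks VALUES (?, ?)",
    [
        (1, "//commons.wikimedia.org/wiki/Main_Page"),
        (2, "https://www.mediawiki.org/"),
        (3, "//spam.example.com/blah"),
    ],
)

# Select only protocol-relative links, i.e. values starting with "//".
rows = conn.execute(
    "SELECT el_from, el_to FROM externallinks "
    "WHERE el_to LIKE '//%' ORDER BY el_from"
).fetchall()

for page_id, url in rows:
    print(page_id, url)
```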

Change 932000 merged by jenkins-bot:

[mediawiki/core@master] api: Make linksearch APIs also default to http and https in READ_NEW

https://gerrit.wikimedia.org/r/932000
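For the API side, a minimal sketch of building a `list=exturlusage` request URL. It assumes the standard `api.php` endpoint path on mediawiki.org, and `exturlusage_url` is a hypothetical helper. No request is sent. Leaving out `euprotocol` relies on the new default described in this change, so the result depends on the wiki running a version that includes it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def exturlusage_url(api_base, query, protocol=None, limit=10):
    """Build an Action API URL that lists pages linking to `query`.

    When `protocol` is None, the euprotocol parameter is omitted so the
    server-side default applies.
    """
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": query,
        "eulimit": limit,
        "format": "json",
    }
    if protocol is not None:
        params["euprotocol"] = protocol
    return api_base + "?" + urlencode(params)


# Assumed endpoint; adjust for the wiki being queried.
url = exturlusage_url("https://www.mediawiki.org/w/api.php", "*.spam.example.com")
print(url)
print(parse_qs(urlsplit(url).query))
```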

We probably need to tell users we fixed this fifteen-year-old ticket.

To be clear: This ticket is partially resolved, partially declined. There is no way we would support all protocols by default, but the most common use case, which is http and https, is doable and done now.

Re: Tech News - What wording would you suggest as the content?
I'd guess at something like this, but perhaps it's more complicated?

Changes later this week [section]

  • Special:LinkSearch and the Action API will now search both http and https protocols at once.

And does anything related need to be updated within https://www.mediawiki.org/wiki/Help:Linksearch ?
Thanks!

I'd say:

The default protocol of Special:LinkSearch and its API counterparts has changed from http to both http and https.

It's more about the default part. Sorry for the confusion

@Ladsgroup Do we need a separate ticket to fix the json for "linksearch-text" ?

https://github.com/wikimedia/mediawiki/blob/master/languages/i18n/en.json

Currently:
"linksearch-text": "Wildcards such as \"*.wikipedia.org\" may be used.<br />\nSupported {{PLURAL:$2|protocol|protocols}}: $1 (defaults to http:// if no protocol is specified).",

I'm guessing it should be trimmed to
"linksearch-text": "Wildcards such as \"*.wikipedia.org\" may be used.<br />\nSupported {{PLURAL:$2|protocol|protocols}}: $1",

Or are all the language versions at translatewiki?

Change 936119 had a related patch set uploaded (by JJMC89; author: JJMC89):

[mediawiki/core@master] update linksearch-text

https://gerrit.wikimedia.org/r/936119

Change 936119 merged by jenkins-bot:

[mediawiki/core@master] update linksearch-text

https://gerrit.wikimedia.org/r/936119