Make namespace filter in Special:Linksearch efficient enough for wgMiserMode
Open, Medium, Public

Description

As suggested by Simetrical on T9804 ("Namespace filter in Special:Linksearch"), I'm opening a new bug to request that the namespace filter be enabled for Special:Linksearch on Wikimedia wikis. As far as I understand, the feature needs to be made more efficient before it can be enabled (see T9804#125213). Thanks.

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 9:52 PM).
bzimport set Reference to bz10593.
bzimport added a subscriber: Unknown Object (MLST).

The only way to make this efficient is to duplicate page namespace data into the link tables, which would be difficult at best to maintain.

I don't think that's something we want to do at the moment...

  • Bug 11754 has been marked as a duplicate of this bug.
  • Bug 14096 has been marked as a duplicate of this bug.
  • Bug 16948 has been marked as a duplicate of this bug.
  • Bug 19649 has been marked as a duplicate of this bug.
  • Bug 19789 has been marked as a duplicate of this bug.

Bah. So many dupes due to Bugzilla's crap default search. This isn't a LATER bug any longer. As I noted in bug 19789, the API can do this easily. The UI should match. Re-opening!

ayg wrote:

Is this a case where the web UI needs to be fixed, or where the API needs to be fixed?

(In reply to comment #8)

Is this a case where the web UI needs to be fixed, or where the API needs to be fixed?

The API has a namespace filter. The user interface does not. The user interface should be updated to include a namespace selector. Clarified bug summary accordingly.
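For reference, a request to the API's `list=exturlusage` module restricted to the main namespace might be built as below. The parameter names (`euquery`, `eunamespace`, `eulimit`) are the ones documented for that module; the endpoint and search pattern are placeholders, and the sketch only constructs the URL without sending it.

```python
from typing import Sequence
from urllib.parse import urlencode

# Placeholder endpoint; any MediaWiki api.php URL would work the same way.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def exturlusage_url(pattern: str, namespaces: Sequence[int], limit: int = 500) -> str:
    """Build a list=exturlusage query restricted to the given namespaces."""
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": pattern,  # URL pattern, e.g. "*.example.org"
        # Multi-value MediaWiki API parameters are pipe-separated.
        "eunamespace": "|".join(str(ns) for ns in namespaces),
        "eulimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(exturlusage_url("*.example.org", [0]))
```

Passing several namespaces, e.g. `[0, 4]`, produces `eunamespace=0%7C4` (the pipe is percent-encoded).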

The worst-case behavior of the database query is very bad as there's not a clean index to work on, hence it's disabled in miser mode.

ayg wrote:

Since Delta was asking about this in MediaWiki-General, a summary:

  1. The current query without namespace filtering will never scan more rows than the number of results it returns, e.g., 500.
  2. The current query with namespace filtering will scan millions of rows in the worst case.
  3. There is no likely way to change the schema to make the queries efficient. The only way I can think of is to denormalize unacceptably, as Brion says in comment 1.
  4. For some reason, the query is enabled anyway for the API. I guess we could disable that too, but it's not a reason to enable it in the web UI. We don't like queries that scan millions of rows worst-case.
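A toy model of the first two points above: rows are visited in index order (by URL) and the query stops once it has `limit` matches. Without a namespace condition every visited row matches, so at most `limit` rows are read; with one, every non-matching row is read and discarded. The row layout and numbers are invented for illustration and do not reflect MediaWiki's actual schema or query plan.

```python
from typing import Iterable, Optional, Tuple

# Each row is (el_index, namespace); rows are assumed to already be in
# index order, as a range scan over an el_index-style index would see them.
Row = Tuple[str, int]


def rows_scanned(rows: Iterable[Row], limit: int, namespace: Optional[int] = None) -> int:
    """Count how many index rows are read before `limit` results are found."""
    scanned = 0
    found = 0
    for _url, ns in rows:
        scanned += 1
        if namespace is None or ns == namespace:
            found += 1
            if found == limit:
                break
    return scanned


# 100,000 links from the main namespace (0), followed by a handful from
# namespace 101 (Portal talk on English Wikipedia), all matching one pattern.
rows = [(f"org.example.{i:07d}", 0) for i in range(100_000)]
rows += [(f"org.example.z{i}", 101) for i in range(5)]

print(rows_scanned(rows, limit=500))                 # 500
print(rows_scanned(rows, limit=500, namespace=101))  # 100005
```

The filtered query reads every row because it never collects 500 matches; scaled to a large wiki, that is the "millions of rows" case.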

Delta tells me, though, that the thing people really want this for is to limit to the main namespace, or maybe to content namespaces. This is much less of a problem in practice than arbitrary namespace limitations. The real issue would be someone searching for all *.wikipedia.org links in Portal_talk or something. For real-world queries, limiting to content namespaces should only increase rows scanned by a fairly small fixed factor, probably less than ten. So that can be considered, IMO.
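The "small fixed factor" estimate can be made concrete under a simplifying assumption, one that is disputed later in this thread: each row matching the URL pattern lies in a content namespace independently with probability p. The number of rows read to collect k results then follows a negative binomial distribution with mean:

```latex
\mathbb{E}[\text{rows scanned}] = \frac{k}{p},
\qquad \text{e.g. } k = 500,\; p = 0.1 \;\Rightarrow\; \mathbb{E}[\text{rows scanned}] = 5000 .
```

So "less than ten" corresponds to p > 0.1. If matching rows are clustered rather than independent (for example, all links to one site sitting on talk pages), this estimate no longer holds.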

I filed bug 27717 ("API's exturlusage module does not respect $wgMiserMode").

  • Bug 48948 has been marked as a duplicate of this bug. ***

Has anything changed in the last few years? I was about to file an enhancement request for a namespace filter on Special:Linksearch but found this.

themfromspace wrote:

(In reply to Scott from comment #14)

Has anything changed in the last few years? I was about to file an enhancement request for a namespace filter on Special:Linksearch but found this.

I requested this change a year ago but it looks like nobody's working to add this.

Maybe this can be done the same way as Gerrit change 117373, by adding an el_from_namespace column.
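A minimal sketch of the denormalized-column idea in SQLite, assuming a heavily simplified `externallinks` table (the real MediaWiki schema has more columns and different types, and the index name here is hypothetical). Because `el_from_namespace` leads the index, a query filtered on namespace and on a URL prefix becomes a range scan over matching index entries only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical version of the externallinks table.
    CREATE TABLE externallinks (
        el_id INTEGER PRIMARY KEY,
        el_from INTEGER NOT NULL,            -- page_id of the linking page
        el_from_namespace INTEGER NOT NULL,  -- denormalized copy of page_namespace
        el_index TEXT NOT NULL               -- reversed-domain URL, e.g. http://org.example./a
    );
    CREATE INDEX el_ns_index ON externallinks (el_from_namespace, el_index, el_id);
    """
)
conn.executemany(
    "INSERT INTO externallinks (el_from, el_from_namespace, el_index) VALUES (?, ?, ?)",
    [
        (1, 0, "http://org.example./a"),
        (2, 1, "http://org.example./b"),
        (3, 0, "http://org.example.www./c"),
        (4, 0, "http://com.other./d"),
    ],
)

# Main-namespace links whose el_index starts with the prefix (covers subdomains).
prefix = "http://org.example."
# Exclusive upper bound for a prefix range: bump the prefix's last character.
upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
query = """
    SELECT el_from, el_index FROM externallinks
    WHERE el_from_namespace = ? AND el_index >= ? AND el_index < ?
    ORDER BY el_index, el_id
    LIMIT 500
"""
args = (0, prefix, upper)
results = conn.execute(query, args).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query, args).fetchall()
print(results)  # [(1, 'http://org.example./a'), (3, 'http://org.example.www./c')]
print(plan)     # detail text varies by SQLite version but names el_ns_index
```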

Change 163470 had a related patch set uploaded by Umherirrender:
Add el_from_namespace column to support namespace filter

https://gerrit.wikimedia.org/r/163470

Patch-For-Review

Change 163470 abandoned by Umherirrender:
[schema] Add el_from_namespace column to support namespace filter

Reason:
No longer supporting this year-old patch set.

As mentioned a few weeks ago, this patch set will not be carried over to release 1.27.
Feel free to copy, change and submit under your own name.

https://gerrit.wikimedia.org/r/163470

The only way to make this efficient is to duplicate page namespace data into the link tables, which would be difficult at best to maintain.

I don't think that's something we want to do at the moment...

Well, we do this for pagelinks and friends, so I don't see why we can't do this here.
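Most of the maintenance cost debated here comes from page moves: when a page changes namespace, every denormalized copy of its namespace must be rewritten as well. A sketch of that update on a hypothetical, simplified SQLite schema; the function name is invented and this is not MediaWiki's actual page-move code.

```python
import sqlite3


def move_page_namespace(conn: sqlite3.Connection, page_id: int, new_ns: int) -> int:
    """Update the page and every externallinks row that copies its namespace.

    Returns the number of externallinks rows rewritten. Both updates run in
    one transaction so the two tables are changed atomically.
    """
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE page SET page_namespace = ? WHERE page_id = ?", (new_ns, page_id)
        )
        cur = conn.execute(
            "UPDATE externallinks SET el_from_namespace = ? WHERE el_from = ?",
            (new_ns, page_id),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER NOT NULL);
    CREATE TABLE externallinks (
        el_id INTEGER PRIMARY KEY,
        el_from INTEGER NOT NULL,
        el_from_namespace INTEGER NOT NULL,
        el_index TEXT NOT NULL
    );
    INSERT INTO page VALUES (10, 2);  -- a user page (namespace 2)
    INSERT INTO externallinks (el_from, el_from_namespace, el_index) VALUES
        (10, 2, 'http://org.example./a'),
        (10, 2, 'http://org.example./b');
    """
)
# Moving the user page into the main namespace (0) rewrites both link rows.
print(move_page_namespace(conn, 10, 0))  # 2
```

The cost grows with the number of external links on the moved page, which is the same trade-off already accepted for the `*_from_namespace` columns on other link tables.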

Delta tells me, though, that the thing people really want this for is to limit to the main namespace, or maybe to content namespaces. This is much less of a problem in practice than arbitrary namespace limitations. The real issue would be someone searching for all *.wikipedia.org links in Portal_talk or something. For real-world queries, limiting to content namespaces should only increase rows scanned by a fairly small fixed factor, probably less than ten. So that can be considered, IMO.

That depends on the distribution of the data, and may or may not be true, depending on what types of pages the URL is on. I don't think this is something that can be relied on.

Requested again recently at English Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Noticeboard#Link_searches

Looking around at past discussions and tickets, it seems that the API already allows this, the DB table (index) can be expanded to include namespaces, and we're already providing the filtering in other similar Special pages. If someone could assist with this, you would make many committed editors very happy.

The API only sort of allows it in miser mode: if you request 500 links, it'll fetch the next 500 regardless of namespace and then return to you only the ones that are in the selected namespace. In some cases, this means you'll get an empty list because none of the 500 were in the requested namespace.
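A sketch of the miser-mode behaviour just described: a fixed-size batch is read in index order first and the namespace filter is applied afterwards, so a page of results can come back short or empty even though more matches exist further on. The function and data are hypothetical; this models the behaviour, not the code of `ApiQueryExtLinksUsage`.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int]  # (el_index, namespace), assumed sorted by el_index


def fetch_page(
    rows: List[Row], offset: int, limit: int, namespace: int
) -> Tuple[List[Row], Optional[int]]:
    """Read `limit` rows starting at `offset`, then drop those outside `namespace`.

    Returns the surviving rows and the offset to continue from (None once the
    underlying rows are exhausted).
    """
    batch = rows[offset : offset + limit]
    kept = [row for row in batch if row[1] == namespace]
    next_offset = offset + limit if offset + limit < len(rows) else None
    return kept, next_offset


# 1200 matching links: the first 1000 are in namespace 1, the rest in 0.
rows = [(f"org.example./{i:04d}", 1 if i < 1000 else 0) for i in range(1200)]

offset: Optional[int] = 0
while offset is not None:
    kept, offset = fetch_page(rows, offset, limit=500, namespace=0)
    print(len(kept), offset)
# 0 500
# 0 1000
# 200 None
```

The client has to keep following the continuation through two empty pages before any main-namespace result appears.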

A rough plan might be:

  1. Add an el_from_namespace column to the externallinks table, and a new index on (el_from_namespace, el_index_60, el_id).
  2. Get the schema change deployed. This could be combined into T153182: Perform schema change to add externallinks.el_index_60 to all wikis.
  3. Write/deploy a patch to populate the field. Probably I could just add it to Gerrit change 322728.
  4. Run whatever maintenance script is provided by the previous step on all wikis.
  5. Write/deploy code to actually use the new field, both in LinkSearchPage and ApiQueryExtLinksUsage. Probably I could just add it to Gerrit change 322729.
  6. (cleanup) The patch in step 1 probably set some not-really-useful default value on the new column or left it nullable. Do another schema change to clean that up.
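Steps 3 and 4 amount to a backfill. One common approach is to walk the table in primary-key batches, copying `page_namespace` into the new column and committing per batch. The sketch below shows the shape of such a loop in SQLite; the table layout, batch size and function name are hypothetical, and a real maintenance script would use MediaWiki's own database layer and would also need to throttle for replication.

```python
import sqlite3


def backfill_from_namespace(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy page.page_namespace into externallinks.el_from_namespace in el_id batches.

    Returns the number of rows updated. Each batch commits separately and only
    NULL rows are touched, so an interrupted run can simply be restarted.
    """
    updated = 0
    last_id = 0
    while True:
        ids = [
            r[0]
            for r in conn.execute(
                "SELECT el_id FROM externallinks WHERE el_id > ? ORDER BY el_id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            break
        with conn:  # one transaction per batch
            cur = conn.execute(
                """
                UPDATE externallinks
                SET el_from_namespace = (
                    SELECT page_namespace FROM page WHERE page.page_id = externallinks.el_from
                )
                WHERE el_id BETWEEN ? AND ? AND el_from_namespace IS NULL
                """,
                (ids[0], ids[-1]),
            )
        updated += cur.rowcount
        last_id = ids[-1]
    return updated


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER NOT NULL);
    -- New column left nullable by the schema change (cleaned up in step 6).
    CREATE TABLE externallinks (
        el_id INTEGER PRIMARY KEY,
        el_from INTEGER NOT NULL,
        el_from_namespace INTEGER,
        el_index TEXT NOT NULL
    );
    """
)
conn.executemany("INSERT INTO page VALUES (?, ?)", [(1, 0), (2, 4), (3, 1)])
conn.executemany(
    "INSERT INTO externallinks (el_from, el_index) VALUES (?, ?)",
    [(1 + i % 3, f"http://org.example./{i}") for i in range(10)],
)
conn.commit()
print(backfill_from_namespace(conn, batch_size=4))  # 10
```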