
List most popular queries on search page
Closed, Declined (Public)

Description

Author: Sebastian.Dietrich

Description:
To improve the content of a wiki, it would be useful to provide search
statistics, i.e. statistics on what was searched for.

e.g. "most searched items":

searched item    # searches
wiki             15
mediawiki        13
wikipedia        10


Version: unspecified
Severity: enhancement

Details

Reference
bz7886

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 9:27 PM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz7886.
bzimport added a subscriber: Unknown Object (MLST).

Moving this request to CirrusSearch, where we are currently focusing our search-related development.

This feature request is really cool! Unfortunately, it's not going to happen.

I am very, very uncomfortable with the idea of us logging what people are typing into the search box. There are a lot of privacy implications. Right now, if we're subject to a subpoena on this information, we can just say "Oh, well, we don't have that information, so we can't give it to you". We'd lose that legal defence, and be forced to give the information out. Also, it's a bit creepy.

I'm WONTFIXing this. Sorry. :-(

I don't think it's an awful idea if we do it in aggregate. We don't want to store per-user search data, sure.

But having a "what are people looking for" list could be cool. Or very boring, if it's the same top-10 things all the time.

Worth looking at :)

wmf.amgine3691 wrote:

Also, like RC, it can be limited to a certain pool or buffer, like the last 24 hours, or the most recent 10,000 searches. Maybe "top five trending searches, rising and falling", which at least will have a very high churn. Or something slightly less expensive. Accuracy is less important than 'nifty'.
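A minimal sketch of the "most recent 10,000 searches" buffer idea, in Python rather than anything MediaWiki or CirrusSearch actually ships. The class name `RecentSearchCounter`, its methods, and the lowercase/trim normalisation are all assumptions made for illustration. It keeps only query strings in a bounded buffer, with no user or session identifiers, and reports aggregate counts.

```python
from collections import Counter, deque


class RecentSearchCounter:
    """Aggregate counts over the most recent `capacity` queries.

    Hypothetical helper for illustration; not part of MediaWiki.
    """

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.buffer = deque()    # query strings only, oldest first
        self.counts = Counter()  # term -> occurrences inside the buffer

    def record(self, query):
        # Assumed normalisation: "Wiki" and " wiki " count as one item.
        term = query.strip().lower()
        if not term:
            return
        self.buffer.append(term)
        self.counts[term] += 1
        # Once the buffer is over capacity, forget the oldest query.
        if len(self.buffer) > self.capacity:
            oldest = self.buffer.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def top(self, n=5):
        # The n most frequent terms currently in the buffer.
        return self.counts.most_common(n)
```

Calling `record()` for each search and `top(5)` when rendering the page would give the "top five" list. A 24-hour window would need timestamps stored alongside each term, and "rising and falling" would need a second, older window to compare against; neither is shown here.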

Just testing Bug 64373 -- sorry for the noise.

Restricted Application added a subscriber: Aklapper.

In theory we could do this, but there are two problems:

  1. Having looked at the data, we can say with absolute confidence that it is not very interesting at all. It tends to be dominated by bots and other random queries.
  2. When displaying arbitrary user input there is always the risk of releasing personally identifiable information.

The risk of releasing personally identifiable information is not in and of itself necessarily a reason to not go ahead with this, but the nonexistent reward due to this data being useless definitely does not outweigh the risk.