
StructuredDiscussions posts are not indexed in builtin search
Open, High, Public

Description

This refers to the site search (Special:Search)

Splitting from T59512.

(In reply to bug 57512 comment #12)

unless we can get lucky in being able to find them with the search engine without false positives

You'd have to get really lucky, since from what I understand, the search engine does not index flow posts.

The non-updating of link tables also makes it impossible to search for external
links (e.g. to find past discussions of the reliability of a source), or to
find discussions that link to some page, or to find discussions that use an
image, and so on.

Or to find everywhere the spambots added linkspam pointing to a specific website.

(In reply to bug 57512 comment #13)

(In reply to comment #12)

unless we can get lucky in being able to find them with the search engine without false positives

You'd have to get really lucky, since from what I understand, the search engine does not index flow posts.

Sounds like that should be another blocker for bug 60178. Is there a bug for
that yet?


Merge from Trello:

This card is for an engineer to pick up Matthias's work, answer more questions from the checklist below, then meet with Danny/S/Nick and Nik and Chad (and other interested developers) to figure out how to get Flow Topic search results into site search.


Questions to answer
  • How could CirrusSearch index Topic text?
  • How would edits and replies to a Topic trigger CirrusSearch reindexing?
  • Could intitle:math work for a topic's title rather than only the Topic: *S0n4m7q632z8ycpv* page title?
  • Will CirrusSearch index the HTML of pages or the wikitext? Note that wikitext will match content text expanded using the latest templates, whereas posts don't use the latest templates until re-edited.
  • Will Topic search work for wikis using Flow but not the CirrusSearch extension?
  • Can relevance ranking of search results prefer topic title, then reply text?
  • What about the other full-text search features like prefix and Topic:math?
  • How can we get Nik and Chad to do all the work?
  • Should https://trello.com/c/dTRhYBdY be a separate card?
  • Can site search show the topic's h2 title instead of Topic: *S0n4m7q632z8ycpv*?
  • Can we support the existing prefix:Talk:Some_Flow_board to search within topics that appeared on a particular Flow board?
  • Can we support prefix:Talk:Beta_Features/ to search across all the Beta Features Flow boards that are subpages of Talk:Beta_Features?
  • Will edits made beyond Topic: be included (e.g. board descriptions)?
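
To make the ranking question concrete, here is a minimal sketch of what "prefer topic title, then reply text" could mean. It is plain Python and not CirrusSearch or Flow code: the `TopicDoc` class, the field names and the weights are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical weights: a hit in the topic title counts more than a hit in a reply.
FIELD_WEIGHTS = {"topic_title": 3.0, "reply_text": 1.0}


@dataclass
class TopicDoc:
    """Illustrative stand-in for one indexed topic (not a real Flow class)."""
    topic_id: str
    topic_title: str
    replies: List[str] = field(default_factory=list)


def score(doc: TopicDoc, term: str) -> float:
    """Weighted count of whole-word occurrences of `term` in title and replies."""
    term = term.lower()
    title_hits = doc.topic_title.lower().split().count(term)
    reply_hits = sum(r.lower().split().count(term) for r in doc.replies)
    return (FIELD_WEIGHTS["topic_title"] * title_hits
            + FIELD_WEIGHTS["reply_text"] * reply_hits)


def rank(docs: List[TopicDoc], term: str) -> List[TopicDoc]:
    """Return matching topics, highest score first; non-matching topics are dropped."""
    scored = [(score(d, term), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]
```

With these assumed weights, a topic titled "Math help" would rank above a topic that only mentions "math" in a reply. A real search backend would use its own scoring, so this only illustrates the desired ordering.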

See also:

Details

Reference
bz60493

Related Objects

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:06 AM
bzimport set Reference to bz60493.
bzimport added a subscriber: Unknown Object (MLST).

bingle-admin wrote:

The WMF core features team tracks this bug on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/flow/cards/737, but people from the community are welcome to contribute here and in Gerrit.

Quiddity set Security to None.

High priority, assigned to none? One of the arguments is probably wrong. :)

Tgr updated the task description.
Tgr updated the task description.
Niridya renamed this task from Flow posts are not indexed in builtin search to StructuredDiscussions posts are not indexed in builtin search. Jun 23 2018, 12:39 PM

The way Flow splits a page into a multitude of independent objects is a notable issue. It's common to try to find something based on fragments that come from different comments or different parts of the page. Hopefully this will help answer some of the questions asked in this task. Fragments of a search may include any or all of the following:

  • "noticeboard" matches in page title
  • "dispute resolution" matches in board description
  • "foo" matches in topic title
  • "bar" matches in a comment
  • "johndoe" matches author of a different comment
  • "january" matches in the timestamp of a different comment
  • "baz" matches in topic summary
  • "insource:spammer.com" match hidden within a comment, in the board description, in the topic summary, in the topic title, in a username. An insource search could credibly match in the timestamp, although I find it hard to picture a use case requiring insource while wanting flow_timestamp hits.
  • Plus anything I missed.
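
The list above can be sketched as a flattening step: gather every searchable part of a topic into named fields, then require each search term to match somewhere, regardless of which field it lands in. This is a rough illustration in plain Python, not Flow or CirrusSearch code; the class names, field names and substring matching are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Comment:
    author: str
    timestamp: str  # kept as display text, e.g. "12 January 2015"
    text: str


@dataclass
class Topic:
    """Illustrative model of one topic plus its board context (hypothetical)."""
    page_title: str
    board_description: str
    topic_title: str
    summary: str
    comments: List[Comment] = field(default_factory=list)


def flatten(topic: Topic) -> Dict[str, str]:
    """Collapse the independent objects into one document with named fields."""
    return {
        "page_title": topic.page_title,
        "board_description": topic.board_description,
        "topic_title": topic.topic_title,
        "summary": topic.summary,
        "comment_text": " ".join(c.text for c in topic.comments),
        "author": " ".join(c.author for c in topic.comments),
        "timestamp": " ".join(c.timestamp for c in topic.comments),
    }


def matches_all(topic: Topic, terms: List[str]) -> bool:
    """True if every term occurs (case-insensitive substring) in some field."""
    texts = [text.lower() for text in flatten(topic).values()]
    return all(any(t.lower() in text for text in texts) for t in terms)


def matched_fields(topic: Topic, terms: List[str]) -> Dict[str, List[str]]:
    """For each term, list the fields it was found in (useful for highlighting)."""
    fields = flatten(topic)
    return {
        t: sorted(name for name, text in fields.items() if t.lower() in text.lower())
        for t in terms
    }
```

Indexing the flattened document rather than each post separately is what would let a query such as "noticeboard foo johndoe january" succeed even though every term comes from a different object.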