
Article pages made from index pages not listed in wikisource search results.
Closed, Resolved, Public

Description

Author: c.balasankar

Description:
A page which is made from an index page using <page index> tag, is not listed in the search result when a text from that page is searched for. Only the index page containing that text is listed in the search results. Because the <page index=> tag only creates pointers to the specific index pages, the page itself holds no real text, so it is not listed in the search results.
Suggestion: redefine the <page index> tag, or define a new tag, so that those pages are also listed in the search results. If an index page is listed in the search results, all pages that use that index page as a source should be listed as well.


Version: wmf-deployment
Severity: major

Details

Reference
bz43681

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:24 AM
bzimport set Reference to bz43681.
bzimport added a subscriber: Unknown Object (MLST).

I think the internal search system of MediaWiki needs a lot of improvement.

For example, searching Google for the phrase മൈസൂർ രാജ്യത്തിന്റെ ഉത്തരപരിധിയിൽനിന്ന് (http://goo.gl/RrTyj) returns results from ml.wikisource. Pages from both the Main and Page namespaces are listed in the results.

But searching inside the wiki yields no results.

(In reply to comment #0)

A page which is made from an index page using <page index> tag, is not listed
in the search result when a text from that page is searched for.

Could you provide / link to an explicit existing example, to make it easier to reproduce?

sunil: Are your comment and example related to comment 0 (asking because you don't mention index pages), or are they more of a general complaint about the search?

c.balasankar wrote:

(In reply to comment #2)

(In reply to comment #0)

A page which is made from an index page using <page index> tag, is not listed
in the search result when a text from that page is searched for.

Could you provide / link to an explicit existing example, to make it easier to reproduce?

An example:

A search for "ഇനിയുമുണ്ടൊരു ജന്മ" gave the following result - https://ml.wikisource.org/w/index.php?title=%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:%E0%B4%85%E0%B4%A8%E0%B5%8D%E0%B4%B5%E0%B5%87%E0%B4%B7%E0%B4%A3%E0%B4%82&search=%E0%B4%87%E0%B4%A8%E0%B4%BF%E0%B4%AF%E0%B5%81%E0%B4%AE%E0%B5%81%E0%B4%A3%E0%B5%8D%E0%B4%9F%E0%B5%8A%E0%B4%B0%E0%B5%81%20%E0%B4%9C%E0%B4%A8%E0%B5%8D%E0%B4%AE&fulltext=%E0%B4%A4%E0%B4%BF%E0%B4%B0%E0%B4%AF%E0%B5%82&profile=all&redirs=1

Here, this index page is listed in the search result - https://ml.wikisource.org/wiki/%E0%B4%A4%E0%B4%BE%E0%B5%BE:%E0%B4%87%E0%B4%9F%E0%B4%AA%E0%B5%8D%E0%B4%AA%E0%B4%B3%E0%B5%8D%E0%B4%B3%E0%B4%BF_%E0%B4%B8%E0%B4%AE%E0%B5%8D%E0%B4%AA%E0%B5%82%E0%B5%BC%E0%B4%A3%E0%B5%8D%E0%B4%A3_%E0%B4%95%E0%B5%83%E0%B4%A4%E0%B4%BF%E0%B4%95%E0%B5%BE.pdf/175

However, this page, which was created by including the index page specified above, contains the same text but is not listed in the search result - https://ml.wikisource.org/wiki/%E0%B4%AE%E0%B4%A3%E0%B4%BF%E0%B4%A8%E0%B4%BE%E0%B4%A6%E0%B4%82

This is the issue with all the pages created from index pages.

(In reply to comment #3)
താൾ on ml.wikisource is the Page namespace, not the Index namespace.

The issue reported in the initial comment 0 seems to be fixed for en.wikisource using Cirrus. I'm less sure about mlwiki.

This should all be resolved with CirrusSearch. I believe this bug is dead.

Aklapper changed the task status from Open to Stalled. Jun 1 2016, 11:35 AM
Aklapper added a project: TestMe.
Aklapper edited subscribers, added: Aklapper; removed: demon.

@c.balasankar, @Vssun:

Searching for ഇനിയുമുണ്ടൊരു ജന്മ on ml.wikisource.org, I get:
താങ്കൾ തിരഞ്ഞ പദത്തിനു യോജിച്ച ഫലങ്ങളൊന്നും ലഭിച്ചില്ല. (There were no results matching the query.)

If this problem still happens with CirrusSearch (we replaced the search engine in the meantime), could someone provide an updated testcase (link), and separately list the expected result and the actual result? Thank you a lot!
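A reproducible test case of the kind requested here could be built against the standard MediaWiki search API (`action=query` with `list=search`). The following is only a minimal sketch: the helper names `build_search_url` and `titles_from_response` are hypothetical, the `/w/api.php` path is assumed from the `/w/index.php` links earlier in this task, and only the main namespace (0) is queried by default because the numeric IDs of other namespaces on ml.wikisource would have to be looked up.

```python
from urllib.parse import urlencode

# Assumed endpoint: the wiki's index.php lives under /w/, so api.php should too.
API_URL = "https://ml.wikisource.org/w/api.php"


def build_search_url(phrase, namespaces=(0,), limit=10):
    """Return a full-text search URL for an exact phrase (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'"{phrase}"',  # quotes request a phrase match
        "srnamespace": "|".join(str(n) for n in namespaces),
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(data):
    """Extract page titles from a decoded JSON search response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

Fetching the URL for the phrase from comment 3 and listing the returned titles would give the actual result; the expected result is that the main-namespace page transcluding the text appears among them.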

To elaborate: the issue was that search did not find the content of transcluded pages, whether they were transcluded by <pages> or as templates. CirrusSearch indexes pages with their transclusions expanded, not the raw page text.
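The difference described in this comment can be illustrated with a toy model. This is not how MediaWiki or CirrusSearch is implemented; the page titles, the contents, and the simplified `<pages ... />` pattern below are all made up for illustration. It only shows why a search over raw page text misses a transcluding page, while a search over expanded text finds it.

```python
import re

# Hypothetical miniature wiki: titles and contents are invented for this sketch.
PAGES = {
    "Page:Book.pdf/175": "transcribed text of the scanned page",
    "Article": '<pages index="Book.pdf" from=175 to=175 />',
}

# Simplified pattern for the transclusion tag; the real parser is far more lenient.
PAGES_TAG = re.compile(r'<pages index="([^"]+)" from=(\d+) to=(\d+) />')


def expand(wikitext, pages):
    """Replace each <pages ... /> tag with the text of the pages it points to."""
    def repl(match):
        index, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        return " ".join(
            pages.get(f"Page:{index}/{n}", "") for n in range(start, end + 1)
        )
    return PAGES_TAG.sub(repl, wikitext)


def search(phrase, pages, expand_first):
    """Return sorted titles whose (optionally expanded) text contains the phrase."""
    hits = []
    for title, text in pages.items():
        body = expand(text, pages) if expand_first else text
        if phrase in body:
            hits.append(title)
    return sorted(hits)
```

In this model, `search("transcribed text", PAGES, expand_first=False)` returns only the Page-namespace entry, which mirrors the behaviour reported in comment 0, while `expand_first=True` also returns `"Article"`.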

Aklapper claimed this task.
Aklapper removed a project: TestMe.

If I understand the last comment correctly, this is now fixed by using CirrusSearch. Hence closing as resolved. Please correct me if I misunderstood. Thank you!