
Deleted items contaminating search results
Closed, Resolved, Public

Description

On opening the URL listed below (you might have to reload the page a few times to get the text to appear; this might be another bug), you'll find that the last five results, namely:

[[Image:3 accelerators 17-59. PILCURE-MBT,MBTS,ZMBT,F,CBS,NS,MOR,DCBS,TMT,ZDMC,ZDC,ZDBC,SDBC,ZDBzC-.pdf]]

[[Image:Minnesota Educational Computing Consortium Quick Reference Guide for BASIC Language Version 3.1 MECC TIMESHARE SYSTEM Rev. 2 slash 78.pdf]]

[[Image:Partial unilateral ureteropelvic obstruction in neonatal pigs - Effect of acute inhibition of angiotensin II AT1-receptors on GFR and sodium handling.pdf]]

[[Image:Strategy for improving genetic aspects of fertility and hatchability in breeding lines of White Leghorns, and choosing hens for second cycle of production.pdf]]

[[Image:Buddha's teachings in a NUTSHELL(Explains why Buddha did not answer Questions pertaining to eternal God, NON-SOUL theory (anatta) and his basic teachings.pdf]]

... as well as various others, have been deleted, and have been for quite a long time. Deleted items should not appear in search results.

The user interface search (URL: http://en.wikipedia.org/w/index.php?title=Special:Search&limit=500&offset=7000&ns6=1&redirs=1&search=.pdf ; once again, you might have to reload multiple times) is somewhat better behaved, in that it doesn't show these deleted items. However, when you view the page source you find comments such as:

<!-- missing page Image:3 accelerators 17-59. PILCURE-MBT,MBTS,ZMBT,F,CBS,NS,MOR,DCBS,TMT,ZDMC,ZDC,ZDBC,SDBC,ZDBzC-.pdf-->
<!-- missing page Image:Minnesota Educational Computing Consortium Quick Reference Guide for BASIC Language Version 3.1 MECC TIMESHARE SYSTEM Rev. 2 slash 78.pdf-->
<!-- missing page Image:Partial unilateral ureteropelvic obstruction in neonatal pigs - Effect of acute inhibition of angiotensin II AT1-receptors on GFR and sodium handling.pdf-->
<!-- missing page Image:Strategy for improving genetic aspects of fertility and hatchability in breeding lines of White Leghorns, and choosing hens for second cycle of production.pdf-->
<!-- missing page Image:Buddha's teachings in a NUTSHELL(Explains why Buddha did not answer Questions pertaining to eternal God, NON-SOUL theory (anatta) and his basic teachings.pdf-->
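The user interface behaviour described above amounts to checking each search hit against the pages that still exist and leaving an HTML comment in place of any missing one. A minimal Python sketch of that idea follows; `render_hits` and the plain `set` of existing titles are hypothetical stand-ins for illustration, not MediaWiki's actual Special:Search code.

```python
import html
from typing import Iterable, List, Set


def render_hits(hits: Iterable[str], existing: Set[str]) -> List[str]:
    """Render search hits, replacing titles that no longer exist with an HTML comment.

    Hypothetical sketch: `existing` stands in for a real page-existence lookup.
    """
    lines: List[str] = []
    for title in hits:
        if title in existing:
            # Live page: render it as a normal (escaped) result line.
            lines.append(f"<li>{html.escape(title)}</li>")
        else:
            # Deleted page: hidden from the reader, but a trace stays in the source.
            lines.append(f"<!-- missing page {title}-->")
    return lines


if __name__ == "__main__":
    hits = ["Image:Live.pdf", "Image:Deleted.pdf"]
    for line in render_hits(hits, existing={"Image:Live.pdf"}):
        print(line)
```

The point of the sketch is that the stale hit is still fetched from the index and only filtered at display time, which is why it surfaces in the page source.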


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=.pdf&srnamespace=6&sroffset=7000&srlimit=500

Details

Reference
bz13792

Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 10:08 PM)
bzimport set Reference to bz13792.
bzimport added a subscriber: Unknown Object (MLST).

Stupid long PDF names. The files are:

http://en.wikipedia.org/wiki/Image:3 accelerators 17-59.PILCURE-MBT,MBTS,ZMBT,F,CBS,NS,MOR,DCBS,TMT,ZDMC,ZDC,ZDBC,SDBC,ZDBzC-.pdf
http://en.wikipedia.org/wiki/Image:Minnesota Educational Computing Consortium Quick Reference Guide for BASIC Language Version 3.1 MECC TIMESHARE SYSTEM Rev. 2 slash 78.pdf
http://en.wikipedia.org/wiki/Image:Partial unilateral ureteropelvic obstruction in neonatal pigs - Effect of acute inhibition of angiotensin II AT1-receptors on GFR and sodium handling.pdf
http://en.wikipedia.org/wiki/Image:Strategy for improving genetic aspects of fertility and hatchability in breeding lines of White Leghorns, and choosing hens for second cycle of production.pdf
http://en.wikipedia.org/wiki/Image:Buddha's teachings in a NUTSHELL(Explains why Buddha did not answer Questions pertaining to eternal God, NON-SOUL theory (anatta) and his basic teachings.pdf

Disregard the above comment; Bugzilla is being annoying. Someone had better give the Bugzilla devs a kick: https://bugzilla.mozilla.org/show_bug.cgi?id=40896 .

rainman wrote:

This has been fixed in r32742, so newly deleted files won't show up in search results. However, since this is an old bug, the search index is full of stale entries and needs a rebuild. We will shortly update the whole search backend, which should fix this fully.
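The distinction drawn here, between files deleted after r32742 and older stale entries, can be illustrated with a toy inverted index: a delete hook removes a page from the index from then on, but entries for pages deleted before the hook existed stay until the index is rebuilt from the live pages. This is a simplified Python sketch; `ToyIndex`, `on_delete` and `rebuild` are hypothetical names and do not reflect how the Lucene search backend is actually structured.

```python
from collections import defaultdict
from typing import Dict, Set


class ToyIndex:
    """Hypothetical in-memory inverted index mapping terms to page titles."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, title: str, text: str) -> None:
        # Index every lowercased, whitespace-separated token of the page text.
        for term in text.lower().split():
            self.postings[term].add(title)

    def on_delete(self, title: str) -> None:
        # Deletion hook (the kind of behaviour a fix like r32742 adds):
        # strip the title from every posting list.
        for titles in self.postings.values():
            titles.discard(title)

    def search(self, term: str) -> Set[str]:
        # .get() does not create an empty entry in the defaultdict.
        return set(self.postings.get(term.lower(), set()))

    def rebuild(self, live_pages: Dict[str, str]) -> None:
        # Full rebuild from live pages only, dropping any stale entries.
        self.postings = defaultdict(set)
        for title, text in live_pages.items():
            self.add(title, text)


if __name__ == "__main__":
    idx = ToyIndex()
    for name in ("Image:Old.pdf", "Image:New.pdf", "Image:Live.pdf"):
        idx.add(name, "pdf scan")
    # Image:Old.pdf was deleted before the hook existed, so it was never removed.
    idx.on_delete("Image:New.pdf")  # deleted after the fix
    print(sorted(idx.search("pdf")))  # ['Image:Live.pdf', 'Image:Old.pdf']
    idx.rebuild({"Image:Live.pdf": "pdf scan"})
    print(sorted(idx.search("pdf")))  # ['Image:Live.pdf']
```

In this model, the hook alone keeps the index correct going forward, while only the rebuild clears the backlog of old entries.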

Bryan.TongMinh wrote:

Sorry for the previous mail; I forgot to click the assign option.

Assigning to self.

Bryan.TongMinh wrote:

Fixed in r33608. Broken titles are now silently skipped in API search results.
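Silently skipping broken titles means the API result list simply omits any hit whose title no longer resolves, with no placeholder, unlike the HTML comments left by the user interface. A minimal Python sketch, assuming hits arrive as dicts with a `title` key and that an existence check is available as a callable; `filter_api_hits` and `page_exists` are hypothetical and not MediaWiki's actual API module code.

```python
from typing import Callable, Dict, Iterable, List


def filter_api_hits(
    hits: Iterable[Dict[str, str]],
    page_exists: Callable[[str], bool],
) -> List[Dict[str, str]]:
    """Drop hits whose title is empty or no longer exists, leaving no placeholder."""
    kept: List[Dict[str, str]] = []
    for hit in hits:
        title = hit.get("title", "")
        # Hypothetical "broken title" test: empty, or not an existing page.
        if title and page_exists(title):
            kept.append(hit)
        # Otherwise skip silently: no error and no marker in the output.
    return kept


if __name__ == "__main__":
    live = {"Image:Live.pdf"}
    hits = [{"title": "Image:Live.pdf"}, {"title": "Image:Deleted.pdf"}]
    print(filter_api_hits(hits, live.__contains__))
```

One consequence of filtering after the index lookup, at least in this sketch, is that a page of results can contain fewer than `srlimit` entries.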

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]