
deleted page shows up in search results on Wikidata
Closed, Resolved (Public)

Description

A user on Wikidata reported that the item Q9028640 is still showing up in search results despite being deleted. https://www.wikidata.org/w/index.php?title=Special%3ASearch&profile=default&search=Q9028640&fulltext=Search

I don't know if it is just this one item or a larger issue.


Version: master
Severity: major

Details

Reference
bz61464

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:00 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz61464.

I can't seem to replicate outside of the given example.

See for example Q15831339. https://www.wikidata.org/w/index.php?title=Special%3ASearch&profile=default&search=Q15831339&fulltext=Search shows no results, as expected.

Created attachment 14765
When the deleted page is restored, two results appear in the search page.

Attached:

Screen_Shot_2014-03-06_at_13.32.26.png (1×1 px, 255 KB)

Created attachment 14766
All modifications to the restored page are ignored by the old result, except modifications to the description.

Attached:

Screen_Shot_2014-03-06_at_13.40.18.png (1×1 px, 272 KB)

I too am unable to replicate this bug outside the given example.

I attempted to restore and then redelete the page in question to see if this fixed the problem. It did not, but in fact exposed more strangeness.

If the deleted page is restored, two results appear when you search for the page instead of one. The first gives the current revision of the page, and the second buggy one is tied to a specific revision on 21 November 2013, which was the most recent revision before the page was deleted. See attachment 1 for screenshot.

If the page is edited, the information for the first entry updates, but the second one does not, as it's tied to the out-of-date revision. Interestingly, this applies to everything except the description: changes to the description affect *both* entries. See attachment 2 for screenshot.

I'm still unsure what caused the page to remain indexed after it was deleted. Perhaps we should add some code somewhere to ensure that each page can have only one entry in the index?
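
To make the suggestion above concrete, here is a minimal sketch of one way to guarantee a single entry per page: key the index on the page ID rather than on the revision. This is a toy in-memory index, not CirrusSearch code; the class `PageIndex` and its methods are hypothetical names chosen for illustration.

```python
class PageIndex:
    """Toy in-memory search index keyed by page ID (hypothetical, not CirrusSearch)."""

    def __init__(self):
        # Maps page_id -> document; one slot per page by construction.
        self._docs = {}

    def index_revision(self, page_id, revision_id, text):
        # Keying on page_id means a newer revision overwrites the older
        # one instead of creating a second entry for the same page.
        self._docs[page_id] = {"revision_id": revision_id, "text": text}

    def delete_page(self, page_id):
        # Returns True if an entry was removed, False if none existed.
        return self._docs.pop(page_id, None) is not None

    def search(self, term):
        # Returns the IDs of pages whose indexed text contains the term.
        return sorted(
            page_id for page_id, doc in self._docs.items() if term in doc["text"]
        )
```

With this layout, restoring and re-indexing a page could not produce a second result tied to an old revision, because the stale document would simply be overwritten.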

Oh dear, I didn't think Bugzilla would link like that. Obviously I meant attachment 14765 and attachment 14766, respectively.

Change 121888 had a related patch set uploaded by Chad:
Protect against missing pages better

https://gerrit.wikimedia.org/r/121888

Change 121888 merged by jenkins-bot:
Protect against missing pages better

https://gerrit.wikimedia.org/r/121888

So we protect against this sort of thing better than before and we're more aggressive about reattempting failed deletions. I've never been able to run down a root cause of this bug (or really replicate it outside of the one instance).
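
For readers unfamiliar with the idea, reattempting a failed deletion can be sketched roughly as below. This is an illustration under assumptions, not the merged patch: `delete_with_retry`, `delete_fn`, and the `attempts` and `delay` parameters are hypothetical names.

```python
import time


def delete_with_retry(delete_fn, page_id, attempts=3, delay=0.0):
    """Call delete_fn(page_id) until it succeeds or attempts run out.

    Returns the attempt number that succeeded. Hypothetical helper,
    not part of CirrusSearch.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            delete_fn(page_id)
            return attempt
        except Exception as exc:
            # Remember the failure and wait before the next attempt.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: surface the last underlying error.
    raise RuntimeError(f"could not delete page {page_id} from index") from last_error
```

A caller would pass whatever function actually removes the document from the search backend; if all attempts fail, the error is raised rather than silently leaving the deleted page in the index.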

If someone from Wikidata could take a look at this again and get an idea of the current status of the bug, that would be great. Otherwise I'm inclined to close it as FIXED (it was a semi-freak occurrence that we've since protected against better).

Feel free to close then. Thanks for investigating.

Marking FIXED per above. Anyone please feel free to reopen if we find evidence of this again.