
Lucene results order varies when lucene scores are equivalent
Closed, Declined (Public)

Details

Reference
bz32026

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:59 PM
bzimport set Reference to bz32026.

rainman wrote:

The indexes are currently up to date. The problem here is that multiple results have exactly the same score:

689
1.1514114 0 Flamingo_tongue_snail
1.14319 0 Oxygyrus_keraudrenii
1.1418508 0 Atlanta_lesueurii
1.1418508 0 Tenagodus_barbadensis
1.1418508 0 Atlanta_brunnea
1.1418508 0 Littoraria_irrorata
1.1410456 0 Atlanta_pulchella
1.1410456 0 Caecum_clava
1.1410456 0 Hypselodoris_acriba
1.1410456 0 Calliostoma_adelae
1.1410456 0 Alvania_dejongi
1.1410456 0 Benthonellania_xanthias
1.1410456 0 Cardiapoda_placenta
1.1410456 0 Bursa_rhodostoma
1.1410456 0 Capulus_subcompressus
1.1410456 0 Cerithiopsis_lata
1.1410456 0 Cerithiopsis_flava
1.1410456 0 Cerithium_guinaicum
1.1410456 0 Cerithiopsis_merida
1.1410456 0 Cerithiopsis_georgiana

For some reason, the scores are rounded to 5 decimals in some cases and to 7 decimals in others. I'm not sure why this is happening, since the scores are calculated on the same machine. It could be some Lucene query caching weirdness.
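To illustrate why equal scores leave the order undefined, here is a minimal Python sketch of a deterministic tie-break. It is not lsearchd's actual ranking code; the function name `stable_rank` and the choice of the title as a secondary key are assumptions made for illustration only. The scores and titles are copied from the output above.

```python
from typing import List, Tuple

Result = Tuple[float, str]  # (score, title)


def stable_rank(results: List[Result]) -> List[Result]:
    """Sort by descending score; break ties by title (hypothetical secondary key)."""
    return sorted(results, key=lambda r: (-r[0], r[1]))


# Sample hits taken from the output above, deliberately shuffled.
hits = [
    (1.1418508, "Tenagodus_barbadensis"),
    (1.1514114, "Flamingo_tongue_snail"),
    (1.1418508, "Atlanta_brunnea"),
    (1.1418508, "Littoraria_irrorata"),
    (1.1418508, "Atlanta_lesueurii"),
]

for score, title in stable_rank(hits):
    print(score, title)
```

With a secondary key, any permutation of the same hits produces the same order. This only helps when the tied scores are bit-for-bit identical within one response, which appears to be the case in the listings here.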

rainman wrote:

So it seems that on the same machine the order is always the same, although the scores are not always rounded the same way. The previous comment showed results from search1; this is the same search run on search4 a couple of times:

curl http://search4:8123/search/enwiki/10.1371%2Fjournal.pone.0008776?version=2
689
1.1514115 0 Flamingo_tongue_snail
1.1431901 0 Oxygyrus_keraudrenii
1.141851 0 Atlanta_brunnea
1.141851 0 Littoraria_irrorata
1.141851 0 Atlanta_lesueurii
1.141851 0 Tenagodus_barbadensis
1.1410457 0 Bulbus_carcellesi
1.1410457 0 Caecum_circumvolutum
1.1410457 0 Caecum_multicostatum
1.1410457 0 Calliostoma_javanicum
1.1410457 0 Alvania_verrilli
1.1410457 0 Bursa_natalensis
1.1410457 0 Bursa_corrugata
1.1410457 0 Caecum_insularum
1.1410457 0 Cerithiopsis_academicorum
1.1410457 0 Cerithioclava_garciai
1.1410457 0 Cerithiopsis_fuscoflava
1.1410457 0 Cerithiopsis_guitarti
1.1410457 0 Copulabyssia_riosi
1.1410457 0 Crucibulum_auricula

curl http://search4:8123/search/enwiki/10.1371%2Fjournal.pone.0008776?version=2
689
1.1514114 0 Flamingo_tongue_snail
1.14319 0 Oxygyrus_keraudrenii
1.1418508 0 Atlanta_brunnea
1.1418508 0 Littoraria_irrorata
1.1418508 0 Atlanta_lesueurii
1.1418508 0 Tenagodus_barbadensis
1.1410456 0 Bulbus_carcellesi
1.1410456 0 Caecum_circumvolutum
1.1410456 0 Caecum_multicostatum
1.1410456 0 Calliostoma_javanicum
1.1410456 0 Alvania_verrilli
1.1410456 0 Bursa_natalensis
1.1410456 0 Bursa_corrugata
1.1410456 0 Caecum_insularum
1.1410456 0 Cerithiopsis_academicorum
1.1410456 0 Cerithioclava_garciai
1.1410456 0 Cerithiopsis_fuscoflava
1.1410456 0 Cerithiopsis_guitarti
1.1410456 0 Copulabyssia_riosi
1.1410456 0 Crucibulum_auricula
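The listings above can be checked for ties mechanically. The following is a rough Python sketch that assumes the response format is a hit count followed by `score namespace title` lines, as seen in the curl output; `parse_results` and `tie_groups` are hypothetical helper names, not part of lsearchd. Scores are kept as strings so that grouping reflects exactly what the server printed.

```python
from typing import Dict, List, Tuple


def parse_results(text: str) -> List[Tuple[str, str]]:
    """Return (score, title) pairs; lines without three fields (e.g. the hit count) are skipped."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((parts[0], parts[2]))
    return rows


def tie_groups(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group titles by printed score and keep only scores shared by more than one title."""
    groups: Dict[str, List[str]] = {}
    for score, title in rows:
        groups.setdefault(score, []).append(title)
    return {score: titles for score, titles in groups.items() if len(titles) > 1}


# Excerpt of the search4 output shown above.
sample = """689
1.1514114 0 Flamingo_tongue_snail
1.14319 0 Oxygyrus_keraudrenii
1.1418508 0 Atlanta_brunnea
1.1418508 0 Littoraria_irrorata
1.1410456 0 Bulbus_carcellesi
"""

for score, titles in tie_groups(parse_results(sample)).items():
    print(score, titles)
```

Any title inside a tie group is a candidate for appearing in a different position, or dropping off the first page entirely, when another backend answers the query.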

josowski wrote:

Is there a way for me (On the other side of the load balancer) to keep using the same server?

Are the machine architectures and program versions the same? That reminds me of the issue with PHP that depended on whether it was 32-bit or 64-bit.

josowski wrote:

Has this been resolved? I'm no longer replicating it.

josowski wrote:

Never mind, it's still happening.

This is still an issue, and the thread at http://lists.wikimedia.org/pipermail/mediawiki-api/2011-October/002420.html and its follow-up imply that some indexes could be out of sync. Or not.

Is there a way to reproduce this from the regular Special:Search interface, the way a regular user would encounter it?

I'm just asking because I volunteered to write a test automation scenario to keep observing this problem until it gets fixed, if it ever does. This is in the context of https://www.mediawiki.org/wiki/QA/Browser_testing/Search_features

Just rerun the search from comment 2 against multiple backends and check whether the results vary?
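A check like that could be sketched in Python roughly as follows. The host names search1 and search4, port 8123, and the URL shape come from the comments above; that every backend listens on the same port and path is an assumption, and `fetch_titles` and `first_divergence` are hypothetical helper names.

```python
import urllib.request
from typing import List, Optional


def fetch_titles(host: str, query_path: str, port: int = 8123) -> List[str]:
    """Fetch one result page from a backend and return the titles in order.

    Assumes every backend serves the same URL layout as the curl call above.
    """
    url = f"http://{host}:{port}/search/enwiki/{query_path}?version=2"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    titles = []
    for line in body.splitlines():
        parts = line.split(None, 2)  # score, namespace, title
        if len(parts) == 3:
            titles.append(parts[2])
    return titles


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Return the first index at which two title lists differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Usage against live backends (only reachable from inside the cluster):
#   query = "10.1371%2Fjournal.pone.0008776"
#   print(first_divergence(fetch_titles("search1", query),
#                          fetch_titles("search4", query)))
```

A result of `None` means both backends returned the same order; any integer is the rank at which they first disagree.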

(There is nothing here for ops to fix; the fix belongs in the Lucene code instead. Removing "ops" keyword.)

Won't be fixing this; lsearchd has reached its end of life.