CirrusSearch: Can't reindex commons....
Closed, Resolved (Public)

Description

When I reindex commons, it crashes:
mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki $wiki --reindexAndRemoveOk --indexIdentifier now --reindexProcesses 4 | tee ~/cirrus_log/$wiki.reindex.log
file index...

Setting index identifier...commonswiki_file_1391550673
Creating index...ok
Validating analyzers...ok
Validating mappings...
        Validating mapping for page type...different...corrected
Validating aliases...
        Validating file alias...is taken...
        Reindexing...
                [0] Starting child process reindex
                [1] Starting child process reindex
                [2] Starting child process reindex
                [3] Starting child process reindex
                [0] About to reindex 5044794 documents
                [1] About to reindex 5045574 documents
                [3] About to reindex 5045006 documents
                [2] About to reindex 5043744 documents

....

[0] Reindexed 130000/5044794 documents at 695/second
[2] Reindexed 130000/5043744 documents at 689/second
[1] Reindexed 140000/5045574 documents at 719/second
[0] Reindexed 140000/5044794 documents at 695/second

Warning: Search backend error during reindex. Error message is: Error in one or more bulk request actions:

create: /commonswiki_file_1391550673/page/391846 caused DocumentAlreadyExistsException[[commonswiki_file_1391550673][8] [page][391846]: document already exists]
create: /commonswiki_file_1391550673/page/392553 caused DocumentAlreadyExistsException[[commonswiki_file_1391550673][8] [page][392553]: document already exists]
create: /commonswiki_file_1391550673/page/392692 caused DocumentAlreadyExistsException[[commonswiki_file_1391550673][8] [page][392692]: document already exists]
create: /commonswiki_file_1391550673/page/391163 caused DocumentAlreadyExistsException[[commonswiki_file_1391550673][8] [page][391163]: document already exists]
create: /commonswiki_file_1391550673/page/391288 caused DocumentAlreadyExistsException[[commonswiki_file_1391550673][8] [page][391288]: document already exists]
create: /commonswiki_file_1391550673/page/392135 caused DocumentAlreadyExistsException[[commonswiki_file_1391550673][8 in /usr/local/apache/common-local/php-1.23wmf12/includes/debug/Debug.php on line 301

From that point on, process [3] is dead and logs nothing further.

We should find the root cause rather than just catching the exception, logging it, and moving on. If we know what causes this, we can prevent it in the future.


Version: unspecified
Severity: normal

Details

Reference
bz60854

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 2:55 AM)
bzimport added a project: CirrusSearch.
bzimport set Reference to bz60854.

Assigning to myself to look at in the morning.

I think this happens when a document is updated while we're scrolling, so we hit it twice. I'm not sure about that yet, but I'm working up a commit that simply writes the second copy over the first one.
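
To illustrate the hypothesis above, here is a minimal sketch in Python of why a "create"-style write fails when the same id is seen twice, while an overwriting write does not. It is a toy in-memory model, not the Elasticsearch or CirrusSearch API: `ToyIndex`, `reindex`, the `overwrite` flag, and the `rev` field are all hypothetical names made up for this example, and the assumption that the second copy is the newer one is unconfirmed.

```python
class DocumentAlreadyExists(Exception):
    """Raised by the toy index when a create hits an existing id."""


class ToyIndex:
    """In-memory stand-in for a search index (hypothetical, not a real client)."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, source):
        # "create" semantics: refuse to write if the id is already present.
        if doc_id in self.docs:
            raise DocumentAlreadyExists(doc_id)
        self.docs[doc_id] = source

    def index(self, doc_id, source):
        # "index" semantics: insert, or overwrite whatever is already there.
        self.docs[doc_id] = source


def reindex(scrolled, target, overwrite):
    """Copy (doc_id, source) pairs into target and return the ids that failed."""
    failed = []
    write = target.index if overwrite else target.create
    for doc_id, source in scrolled:
        try:
            write(doc_id, source)
        except DocumentAlreadyExists:
            failed.append(doc_id)
    return failed


# Assumed scroll result: page 391846 was updated mid-scroll and appears twice.
scrolled = [(391846, {"rev": 1}), (392553, {"rev": 1}), (391846, {"rev": 2})]

strict = ToyIndex()
strict_failures = reindex(scrolled, strict, overwrite=False)

lenient = ToyIndex()
lenient_failures = reindex(scrolled, lenient, overwrite=True)

print("create-only failures:", strict_failures)
print("overwrite failures:", lenient_failures)
```

In this model the create-only pass reports the duplicated id as a failure and keeps the first copy, whereas the overwriting pass finishes cleanly and ends up holding the second copy.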

Change 111450 had a related patch set uploaded by Manybubbles:
Reindex is ok seeing same id twice

https://gerrit.wikimedia.org/r/111450

Change 111450 merged by jenkins-bot:
Reindex is ok seeing same id twice

https://gerrit.wikimedia.org/r/111450