
[OPS] lucene-search-2 uses too much memory on labs
Closed, Declined, Public

Description

Search in the UI returns nothing.

According to the tracking bug, querying search via curl works, but search in the UI does not; see the screenshot.


Version: unspecified
Severity: normal

Attached:

Screen_shot_2013-03-21_at_11.16.10_AM.png (240×1 px, 64 KB)

Details

Reference
bz46459

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:40 AM
bzimport set Reference to bz46459.

For the last couple of days the PHP entry points were returning Error 500 because the Thanks extension was missing from mediawiki/extensions.git (that is fixed now). Lucene search polls all the wikis via the OAI extension, which was definitely serving Error 500 pages; that might have broken the search system.
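This report does not say how the lucene-search-2 OAI updater reacts to an error page. The following is a minimal sketch of the defensive behavior, assuming a hypothetical `fetch(since)` callable that returns an HTTP status and a body: the poller only advances its checkpoint when the wiki answers 200, so an Error 500 page is retried on the next poll instead of being treated as an empty batch. `IncrementalPoller` and its fields are illustrative names, not lucene-search-2 code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: takes the "from" timestamp, returns (HTTP status, body).
Fetch = Callable[[str], Tuple[int, str]]


@dataclass
class IncrementalPoller:
    """Sketch of an incremental updater that only advances its checkpoint on success."""
    fetch: Fetch
    checkpoint: str  # timestamp of the last successfully processed batch

    def poll_once(self, now: str) -> Optional[str]:
        # `now` should be captured by the caller before the request is sent,
        # so that edits made during the fetch are picked up next time.
        status, body = self.fetch(self.checkpoint)
        if status != 200:
            # An error page is not a valid (empty) batch: keep the old
            # checkpoint so the same time window is fetched again later.
            return None
        self.checkpoint = now
        return body
```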

The two search instances use puppetmaster::self, so their puppet configuration has to be applied manually. I updated them a few hours ago.

Doing a search does not work right now:

http://en.wikipedia.beta.wmflabs.org/w/api.php?format=json&action=opensearch&search=F&namespace=0&suggest=

Gives out:

["F",[]]
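To tell "search is broken" apart from "no matching titles" in a script, it helps to pull out the suggestion list explicitly. A small sketch, assuming only what the output above shows: with format=json, action=opensearch returns a JSON array whose first element echoes the query and whose second element is the list of suggested titles. The function name is illustrative.

```python
import json
from typing import List


def opensearch_suggestions(body: str) -> List[str]:
    """Return the suggested titles from an action=opensearch JSON response."""
    data = json.loads(body)
    # Expected shape: [query, [title, title, ...], ...]
    if not isinstance(data, list) or len(data) < 2 or not isinstance(data[1], list):
        raise ValueError("unexpected opensearch response: %r" % (body,))
    return [str(title) for title in data[1]]
```

For a one-letter prefix such as "F" on enwiki, an empty list is a strong hint that the search backend is down rather than that nothing matches.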

Need to check the PHP error logs and the logs on the search instances.

deployment-search01:~$ curl -x localhost:8123 http://localhost/search/enwiki/Main
curl: (7) couldn't connect to host
deployment-search01:~$
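The curl error above means nothing accepted a TCP connection on localhost:8123, the port the search frontend is queried through. A minimal sketch of the same test in Python; it only shows that the port is open, not that searches return results. The function name and timeout are illustrative.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    A refused or timed-out connection corresponds to curl's
    "(7) couldn't connect to host".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket timeouts
        return False
```

Usage: `is_listening("localhost", 8123)` on deployment-search01.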

I have restarted lucene-search2 there.

Search is working again. What is troublesome is that puppet should restart lucene-search2 automatically whenever it dies. I am leaving this bug open to monitor it a bit longer.
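Until puppet handles this, the restart-on-death behavior can be sketched as a small watchdog. This is an illustration under stated assumptions, not the puppet mechanism: `is_alive` and `restart` are hypothetical callables supplied by the caller, and the watchdog gives up after a few restarts so that a crash loop (for example repeated OOM kills) is surfaced instead of hidden.

```python
import time
from typing import Callable


def watchdog(is_alive: Callable[[], bool],
             restart: Callable[[], None],
             interval: float = 60.0,
             max_restarts: int = 3,
             checks: int = -1,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll is_alive() every `interval` seconds and call restart() when it fails.

    `checks` limits the number of polls (-1 means forever).
    Returns the number of restarts performed.
    """
    restarts = 0
    done = 0
    while checks < 0 or done < checks:
        if not is_alive():
            if restarts >= max_restarts:
                # Keep restarting a service that keeps dying would only mask the cause.
                raise RuntimeError("service keeps dying after %d restarts" % restarts)
            restart()
            restarts += 1
        done += 1
        sleep(interval)
    return restarts
```

`restart` could, for example, wrap `subprocess.run(["service", "lucene-search-2", "restart"], check=True)`, assuming the init script is installed under that name.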

The lucene process is probably being killed by the kernel OOM killer. We need to tweak the java -Xmx parameter to limit the amount of memory it uses.
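To confirm the OOM theory, the kernel log can be searched for kills of the java process. A sketch assuming the typical Linux message formats, "Out of memory: Kill process PID (COMM) ..." on older kernels and "Out of memory: Killed process PID (COMM) ..." on newer ones; the exact wording varies by kernel version, so treat the pattern as an assumption.

```python
import re
from typing import Iterable, List, Tuple

# Assumed OOM-killer message formats (older "Kill process", newer "Killed process").
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def oom_kills(log_lines: Iterable[str], command: str = "java") -> List[Tuple[int, str]]:
    """Return (pid, command) for each OOM kill of `command` found in kernel log lines."""
    kills = []
    for line in log_lines:
        match = OOM_RE.search(line)
        if match and match.group(2) == command:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Feed it the lines of `dmesg` output or /var/log/kern.log; an empty result alongside a dead process points to a cause other than the OOM killer.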

Both en.m.wikipedia.beta.wmflabs.org and en.m.wikipedia.org have a San Francisco article:

http://en.m.wikipedia.beta.wmflabs.org/wiki/San_Francisco
http://en.m.wikipedia.org/wiki/San_Francisco

At en.m.wikipedia.org, entering "San" in the search box brings up several search suggestions (wikipedia.png attachment). No suggestions appear when doing the same at en.m.wikipedia.beta.wmflabs.org (wmflabs.png attachment).

Created attachment 12054
wikipedia screenshot

Attached:

wikipedia.png (712×645 px, 91 KB)

Created attachment 12055
wmflabs screenshot

Attached:

wmflabs.png (712×644 px, 116 KB)

Rewording the summary. The root cause is the java process requesting 20 GB of memory on an instance that has 4 GB.

I have hacked the script locally to limit memory to 2 GB. I will see how that goes, then hack the puppet class and init.d script so we can easily tweak the memory settings for lucene.
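One way to make the memory setting easy to tweak is to derive the heap from the instance size instead of hard-coding it. A sketch under assumptions: `fraction=0.5` mirrors the roughly 2 GB heap chosen for a 4 GB instance, and the optional cap is for larger hosts; neither value comes from the puppet class.

```python
from typing import Optional


def xmx_flag(total_ram_mb: int, fraction: float = 0.5, cap_mb: Optional[int] = None) -> str:
    """Return a -Xmx flag sized as a fraction of the instance RAM (in MB)."""
    if total_ram_mb <= 0 or not 0 < fraction <= 1:
        raise ValueError("need positive RAM and 0 < fraction <= 1")
    heap_mb = int(total_ram_mb * fraction)
    if cap_mb is not None:
        heap_mb = min(heap_mb, cap_mb)
    return "-Xmx%dm" % heap_mb
```

On Linux, total RAM in MB can be read as `os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // 2**20`. Leaving headroom below total RAM matters because the JVM uses memory beyond the heap (thread stacks, code cache), and the instance also runs other processes.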

The command currently running is:

/usr/bin/java -Xmx2000m -Dsun.rmi.transport.tcp.handshakeTimeout=10000 -Djava.rmi.server.codebase=file:///a/search/lucene-search/LuceneSearch.jar -Djava.rmi.server.hostname=deployment-search01 -classpath :/usr/share/java/udp2log-log4j.jar:/a/search/lucene-search/LuceneSearch.jar org.wikimedia.lsearch.config.StartupManager
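For an init.d or wrapper script, the same command line can be built with the heap size as a parameter. A sketch that reproduces the command above exactly when given 2000 MB and the deployment-search01 hostname; the function and constant names are illustrative, and the paths are copied from the running command.

```python
from typing import List

LUCENE_JAR = "/a/search/lucene-search/LuceneSearch.jar"
LOG4J_JAR = "/usr/share/java/udp2log-log4j.jar"


def lucene_command(heap_mb: int, hostname: str,
                   jar: str = LUCENE_JAR, log4j_jar: str = LOG4J_JAR) -> List[str]:
    """Build the StartupManager command line with a configurable maximum heap."""
    return [
        "/usr/bin/java",
        "-Xmx%dm" % heap_mb,  # the only memory-related setting in the original command
        "-Dsun.rmi.transport.tcp.handshakeTimeout=10000",
        "-Djava.rmi.server.codebase=file://" + jar,
        "-Djava.rmi.server.hostname=" + hostname,
        # leading ':' kept verbatim from the running command
        "-classpath", ":" + log4j_jar + ":" + jar,
        "org.wikimedia.lsearch.config.StartupManager",
    ]
```

The list can be passed directly to `subprocess.Popen`, which avoids shell quoting issues.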

:)

Taking the bug and raising its priority. I need to fix this this week.

The deployment-search01 Icinga report is http://icinga.wmflabs.org/cgi-bin/icinga/extinfo.cgi?type=2&host=deployment-search01.pmtpa.wmflabs&service=Lucene+frontend
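The Icinga "Lucene frontend" service could be backed by a check that queries search the same way as the curl command earlier in this thread. A sketch following the standard Nagios/Icinga plugin convention (exit 0 for OK, 2 for CRITICAL, one line of output); it is not the check actually deployed. Sending the absolute URL to localhost:8123 is what an HTTP client does when using that address as a proxy, which is what curl's -x option does.

```python
import http.client
from typing import Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios/Icinga plugin exit codes


def check_lucene_frontend(host: str = "localhost", port: int = 8123,
                          url: str = "http://localhost/search/enwiki/Main",
                          timeout: float = 5.0) -> Tuple[int, str]:
    """Probe the search frontend like `curl -x host:port url`; return (status, message)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", url)  # absolute URL in the request line, as sent to a proxy
        resp = conn.getresponse()
        resp.read()
    except (OSError, http.client.HTTPException) as exc:
        return CRITICAL, "CRITICAL - cannot query %s:%d (%s)" % (host, port, exc)
    finally:
        conn.close()
    if resp.status == 200:
        return OK, "OK - HTTP 200 from %s" % url
    return CRITICAL, "CRITICAL - HTTP %d from %s" % (resp.status, url)
```

A plugin wrapper would print the message and call `sys.exit(status)`.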

I have restarted the lucene-search-2 service, which was apparently no longer listening even though there had been no OOM message :-] So we have some progress!

Pending ops review; updating the summary to reflect that.

Peter has merged the changes and deployed them in production. I have to make sure they work fine in labs and will most probably recreate the existing instances.

Most of the work has been completed, thus lowering priority.

Chad and Nik have migrated beta to the CirrusSearch extension, which uses an ElasticSearch backend. Hence this Lucene search bug is no longer valid :-)