
Solr stresses the NFS server
Closed, Declined, Public

Description

This morning we found that both the beta and tools projects were slow. The root cause is jobs running in the solr project that exhaust the I/O capacity of the NFS server (labstore3).

The workaround is to reboot the instance solr-mw2.pmtpa.wmflabs, which stops the job causing the load, as instructed by Nikolas Everett: http://lists.wikimedia.org/pipermail/labs-l/2013-July/001381.html

The Solr experiment should run on a different NFS system than the shared one, probably a dedicated one.


Version: unspecified
Severity: normal

Details

Reference
bz51350

Event Timeline

bzimport raised the priority of this task to Needs Triage (Nov 22 2014, 1:56 AM).
bzimport set Reference to bz51350.

Created attachment 12846
network usage of solr project between 7/13/2013 8:00 and 7/13/2013 20:00

Attached:

labs-solr.png (211×397 px, 14 KB)

Created attachment 12847
Ganglia CPU report of labstore3 between 7/13/2013 8:00 and 7/13/2013 20:00

Attached:

labstore3_cpu_report.png (259×397 px, 15 KB)

In accordance with the above, I rebooted solr-mw2, and the load on labstore3 looks much better.

I should let everyone know what I'm doing:

I'm loading a copy of enwiki with all the current text (as of some backup) so I can index it. No historical revisions as we won't be indexing them. I've been told that I can't use a prod replica because it won't contain any text.

What is actually killing the NFS server is mysqld, not solr, elasticsearch, or any other new system. I make no claims that those systems wouldn't put a similar load on NFS at some point in the future, though.

I'd run this on my local system, but then I wouldn't be able to properly interact with the other systems in labs that I need for the experiment.

Closing this since Elasticsearch has been deployed in production, so I guess there is less need nowadays to load huge amounts of data into a labs instance.