
CirrusSearch: Move Elasticsearch "search groups" monitoring from cluster level to node level
Closed, ResolvedPublic

Description

Move Elasticsearch "search groups" monitoring from cluster level to node level. The advantage is that we'd be able to see which server is actually doing the work. Ganglia would still sum the per-node numbers into a cluster total. Right now Ganglia sums the cluster-level numbers reported by every node, which doesn't show anything useful because it just multiplies the real total by the number of nodes.
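To illustrate the arithmetic, here's a minimal Python sketch of the cluster-level polling we do today (the host and the "full_text" group name are placeholders, not necessarily what our Ganglia plugin uses):

import requests

ES = "http://localhost:9200"  # hypothetical node address
GROUP = "full_text"           # hypothetical search-group name

# Cluster-level search stats for one group. Every node that runs this
# poller gets the same cluster-wide totals back, so when Ganglia sums the
# values across N nodes it reports N times the real number.
resp = requests.get(ES + "/_stats/search", params={"groups": GROUP}).json()
group = resp["_all"]["total"]["search"]["groups"][GROUP]
print("cluster-wide:", group["query_total"], "queries,",
      group["query_time_in_millis"], "ms")

Since those numbers are already cluster-wide, summing them again in Ganglia is what inflates the graph; per-node stats would make the sum meaningful and show which server did the work.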


Version: unspecified
Severity: normal
Whiteboard: Elasticsearch_1.0
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57210

Details

Reference
bz60979

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:03 AM
bzimport set Reference to bz60979.
bzimport added a subscriber: Unknown Object (MLST).

This is fixed in Elasticsearch 1.0.

The monitoring we're doing now is actually showing up in the hot threads output from time to time:

30.9% (154.2ms out of 500ms) cpu usage by thread 'elasticsearch[elastic1012][management][T#3]'
  7/10 snapshots sharing following 10 elements
    org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.getIndices(IndicesStatsResponse.java:87)
    org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.toXContent(IndicesStatsResponse.java:156)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:311)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:303)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim(TransportBroadcastOperationAction.java:321)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation(TransportBroadcastOperationAction.java:273)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:724)
  2/10 snapshots sharing following 18 elements
    org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator._writeFieldName(UTF8JsonGenerator.java:270)
    org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator.writeFieldName(UTF8JsonGenerator.java:249)
    org.elasticsearch.common.xcontent.json.JsonXContentGenerator.writeFieldName(JsonXContentGenerator.java:86)
    org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:242)
    org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:409)
    org.elasticsearch.common.xcontent.XContentBuilder.timeValueField(XContentBuilder.java:857)
    org.elasticsearch.index.search.stats.SearchStats$Stats.toXContent(SearchStats.java:140)
    org.elasticsearch.index.search.stats.SearchStats.toXContent(SearchStats.java:205)
    org.elasticsearch.action.admin.indices.stats.CommonStats.toXContent(CommonStats.java:555)
    org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.toXContent(IndicesStatsResponse.java:164)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:311)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:303)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim(TransportBroadcastOperationAction.java:321)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation(TransportBroadcastOperationAction.java:273)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:724)

I suspect this'll go away when we switch to node level. Thus, setting the priority to High because it will be easy to fix once we go to 1.0.
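Once we're on 1.0, the node stats API accepts a groups parameter, so the poller could do something like this (host and group name are placeholders, and the response field names are a sketch of the nodes-stats layout, not taken from our actual plugin):

import requests

ES = "http://localhost:9200"  # hypothetical node address
GROUP = "full_text"           # hypothetical search-group name

# Node-level search stats: each node reports only the work its own shards
# did, so the numbers can be attributed to a server, and Ganglia's sum
# across nodes is the true cluster total instead of an N-fold multiple.
resp = requests.get(ES + "/_nodes/stats/indices/search",
                    params={"groups": GROUP}).json()
for node_id, node in resp["nodes"].items():
    stats = node["indices"]["search"]["groups"].get(GROUP, {})
    print(node["name"], stats.get("query_total", 0))

A per-node Ganglia plugin would probably query _nodes/_local/stats/... instead, so each box reports only itself.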

Change 117037 had a related patch set uploaded by Manybubbles:
Update Elasticsearch monitoring for 1.0

https://gerrit.wikimedia.org/r/117037

Change 117037 merged by Ottomata:
Update Elasticsearch monitoring for 1.0

https://gerrit.wikimedia.org/r/117037