
CirrusSearch: Move Elasticsearch "search groups" monitoring from cluster level to node level
Closed, ResolvedPublic

Description

Move Elasticsearch "search groups" monitoring from cluster level to node level. The advantage is that we'd be able to see which server is actually doing the work. Ganglia would still sum the per-node numbers into a cluster total. Right now Ganglia sums the cluster-level numbers reported by every node, which doesn't show anything useful because it just multiplies the real total by the number of nodes.
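To illustrate the arithmetic, here's a minimal Python sketch of the cluster-level polling we do today (the host and the "full_text" group name are placeholders, not necessarily what our Ganglia plugin uses):

import requests

ES = "http://localhost:9200"  # hypothetical node address
GROUP = "full_text"           # hypothetical search-group name

# Cluster-level search stats for one group. Every node that runs this
# poller gets the same cluster-wide totals back, so when Ganglia sums the
# values across N nodes it reports N times the real number.
resp = requests.get(ES + "/_stats/search", params={"groups": GROUP}).json()
group = resp["_all"]["total"]["search"]["groups"][GROUP]
print("cluster-wide:", group["query_total"], "queries,",
      group["query_time_in_millis"], "ms")

Since those numbers are already cluster-wide, summing them again in Ganglia is what inflates the graph; per-node stats would make the sum meaningful and show which server did the work.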


Version: unspecified
Severity: normal
Whiteboard: Elasticsearch_1.0
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57210

Details

Reference
bz60979

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:03 AM
bzimport set Reference to bz60979.
bzimport added a subscriber: Unknown Object (MLST).

This is fixed in Elasticsearch 1.0.

The monitoring we're doing now is actually showing up in the hot threads output from time to time:

30.9% (154.2ms out of 500ms) cpu usage by thread 'elasticsearch[elastic1012][management][T#3]'
  7/10 snapshots sharing following 10 elements
    org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.getIndices(IndicesStatsResponse.java:87)
    org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.toXContent(IndicesStatsResponse.java:156)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:311)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:303)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim(TransportBroadcastOperationAction.java:321)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation(TransportBroadcastOperationAction.java:273)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:724)
  2/10 snapshots sharing following 18 elements
    org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator._writeFieldName(UTF8JsonGenerator.java:270)
    org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator.writeFieldName(UTF8JsonGenerator.java:249)
    org.elasticsearch.common.xcontent.json.JsonXContentGenerator.writeFieldName(JsonXContentGenerator.java:86)
    org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:242)
    org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:409)
    org.elasticsearch.common.xcontent.XContentBuilder.timeValueField(XContentBuilder.java:857)
    org.elasticsearch.index.search.stats.SearchStats$Stats.toXContent(SearchStats.java:140)
    org.elasticsearch.index.search.stats.SearchStats.toXContent(SearchStats.java:205)
    org.elasticsearch.action.admin.indices.stats.CommonStats.toXContent(CommonStats.java:555)
    org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.toXContent(IndicesStatsResponse.java:164)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:311)
    org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:303)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim(TransportBroadcastOperationAction.java:321)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation(TransportBroadcastOperationAction.java:273)
    org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:724)

I suspect this'll go away when we switch to node level. Thus, setting the priority to High because it will be easy to fix once we go to 1.0.
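Once we're on 1.0, the node stats API accepts a groups parameter, so the poller could do something like this (host and group name are placeholders, and the response field names are a sketch of the nodes-stats layout, not taken from our actual plugin):

import requests

ES = "http://localhost:9200"  # hypothetical node address
GROUP = "full_text"           # hypothetical search-group name

# Node-level search stats: each node reports only the work its own shards
# did, so the numbers can be attributed to a server, and Ganglia's sum
# across nodes is the true cluster total instead of an N-fold multiple.
resp = requests.get(ES + "/_nodes/stats/indices/search",
                    params={"groups": GROUP}).json()
for node_id, node in resp["nodes"].items():
    stats = node["indices"]["search"]["groups"].get(GROUP, {})
    print(node["name"], stats.get("query_total", 0))

A per-node Ganglia plugin would probably query _nodes/_local/stats/... instead, so each box reports only itself.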

Change 117037 had a related patch set uploaded by Manybubbles:
Update Elasticsearch monitoring for 1.0

https://gerrit.wikimedia.org/r/117037

Change 117037 merged by Ottomata:
Update Elasticsearch monitoring for 1.0

https://gerrit.wikimedia.org/r/117037