Hive queries can bring load on cluster slaves > #CPUs
Closed, DeclinedPublic

Assigned To
None
Authored By
QChris
Mar 28 2014, 11:09 AM
Referenced Files
F12929: ua_device3.sql
Nov 22 2014, 2:54 AM
F12930: ua_os.sql
Nov 22 2014, 2:54 AM
F12928: ua_browser.sql
Nov 22 2014, 2:54 AM
F12927: ganglia-analytics1019.wmf
Nov 22 2014, 2:54 AM

Description

Load graph for analytics 1018 while running the queries (2014-03-28)

When running three Hive queries on a week's worth of mobile request data,
the load on the Hadoop cluster nodes rises above the number of CPUs.

Should we limit the resources that Hive/Hadoop can take on those machines?
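
For anyone wanting to reproduce the observation on a single node without the Ganglia graph, a minimal sketch that compares the 1-minute load average with the CPU count might look like this. The helper names `load_per_cpu` and `is_oversubscribed` are hypothetical and not part of any cluster tooling; the script assumes a Unix-like host, since `os.getloadavg()` is not available elsewhere.

```python
import os


def load_per_cpu(load_1min: float, cpus: int) -> float:
    """Return the 1-minute load average divided by the CPU count."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load_1min / cpus


def is_oversubscribed(load_1min: float, cpus: int) -> bool:
    """True when the load average exceeds the number of CPUs."""
    return load_per_cpu(load_1min, cpus) > 1.0


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load={load_1min:.2f} cpus={cpus} "
          f"oversubscribed={is_oversubscribed(load_1min, cpus)}")
```

A load above the CPU count means runnable tasks are queueing for a core, but it does not by itself show that jobs are being harmed.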


Version: unspecified
Severity: normal

Attached:

ganglia-analytics1018.wmf (225×397 px, 21 KB)

Details

Reference
bz63222

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:54 AM
bzimport set Reference to bz63222.
bzimport added a subscriber: Unknown Object (MLST).

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1503

Created attachment 14955
Load graph for analytics 1019 while running the queries (2014-03-28)

Attached:

ganglia-analytics1019.wmf (225×397 px, 21 KB)

Created attachment 14956
SQL for first query

Created attachment 14957
SQL for second query

Created attachment 14958
SQL for third query

Agree -- perhaps we need to tune the number of slots for mappers/reducers?
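
As a rough illustration of what tuning the slots could mean, the sketch below splits a node's cores into map and reduce slot caps so that their sum never exceeds the CPU count. Several things here are assumptions rather than facts from this task: the 24-core node in the usage line, the two cores reserved for the OS and Hadoop daemons, and the 25% reduce share are all made up for illustration, and the function name `slot_caps` is hypothetical. The property names are the MRv1 TaskTracker settings; the task does not say whether the cluster runs MRv1 or YARN, and under YARN the per-node limit would instead be set through settings such as `yarn.nodemanager.resource.cpu-vcores`.

```python
def slot_caps(cpus: int, reserved: int = 2, reduce_share: float = 0.25) -> dict:
    """Split a node's cores into map and reduce slot caps.

    The caps always sum to at most `cpus`, so fully occupied slots
    cannot by themselves push the runnable task count above the CPU count.
    """
    if cpus < 2:
        raise ValueError("need at least 2 CPUs for one map and one reduce slot")
    if reserved < 0:
        raise ValueError("reserved must not be negative")
    if not 0.0 < reduce_share < 1.0:
        raise ValueError("reduce_share must be strictly between 0 and 1")

    # Keep some cores free for the OS and daemons (assumed value),
    # but never drop below one map slot plus one reduce slot.
    usable = max(2, cpus - reserved)
    reduce_slots = max(1, int(usable * reduce_share))
    map_slots = usable - reduce_slots

    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


if __name__ == "__main__":
    # Hypothetical 24-core worker node.
    for name, value in slot_caps(24).items():
        print(f"{name} = {value}")
```

Capping slots at or below the core count trades some throughput on I/O-bound tasks for a bounded load, so the right split depends on how CPU-heavy the queries actually are.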