Hive stats not working
Closed, Declined (Public)

Description

While debugging a query for Ironholds, I encountered the following error when running:

ADD JAR /usr/lib/hcatalog/share/hcatalog/hcatalog-core-0.5.0-cdh4.3.1.jar;

INSERT OVERWRITE TABLE diederik.distinct_ip
SELECT distip
FROM (
    SELECT ip AS distip, COUNT(*) AS count
    FROM wmf.webrequest_mobile
    WHERE year = 2014 AND month = 1 AND day = 20
      AND content_type IN ('text/html\; charset=utf-8', 'text/html\; charset=iso-8859-1', 'text/html\; charset=UTF-8', 'text/html')
    GROUP BY ip
    HAVING COUNT(*) >= 2
) sub1
LIMIT 10000;

This is not a fatal error; the query finishes successfully, but it's probably good to have stats on Hive usage :)
2014-02-12 18:37:51.085 GMT Thread[Thread-28,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
2014-02-12 18:37:51.161 GMT Thread[Thread-28,5,main] Cleanup action starting
java.sql.SQLException: Failed to create database 'TempStatsStore', see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.createDatabase(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(Unknown Source)
at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.init(JDBCStatsPublisher.java:265)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:436)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
Caused by: java.sql.SQLException: Failed to create database 'TempStatsStore', see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 18 more
Caused by: java.sql.SQLException: Directory /var/log/hadoop-mapreduce/TempStatsStore cannot be created.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
... 15 more
Caused by: ERROR XBM0H: Directory /var/log/hadoop-mapreduce/TempStatsStore cannot be created.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService$9.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.services.monitor.StorageFactoryService.createServiceRoot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.createPersistentService(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.createPersistentService(Unknown Source)
... 15 more


Version: unspecified
Severity: normal

Details

Reference
bz61279

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:05 AM)
bzimport set Reference to bz61279.
bzimport added a subscriber: Unknown Object (MLST).

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1447

kevinator claimed this task.
kevinator added a subscriber: Ottomata.

We have a new cluster since this bug was logged, and I don't think it's an issue anymore or even reproducible.
@QChris @Ottomata please confirm.

kevinator set Security to None.
QChris changed the task status from Declined to Resolved. (Jan 4 2015, 7:06 PM)

Querying the tables works.
Automatic stats collection is still expected to fail, but it has been turned off since commit d3c09569b0ca902fd840cae6376425ea93c9d381.
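For anyone hitting the same trace, here is a rough per-session sketch of what "turned off" could look like. It assumes the commit above disables stats gathering through Hive's standard hive.stats.autogather property; the commit contents are not shown in this task, so treat that as an assumption rather than a description of the actual change.

```sql
-- Assumption: the referenced commit sets something equivalent to this
-- cluster-wide. Here it is applied to the current Hive session only.
SET hive.stats.autogather=false;

-- Print the stats backend settings currently in effect. With the default
-- embedded Derby backend, a relative databaseName such as TempStatsStore is
-- created relative to the task's working directory, which would explain the
-- /var/log/hadoop-mapreduce/TempStatsStore path in the trace above.
SET hive.stats.dbclass;
SET hive.stats.dbconnectionstring;
```

With stats gathering disabled, the INSERT OVERWRITE from the description should no longer initialise JDBCStatsPublisher, so Derby should not try to create TempStatsStore at all.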

It's not really "resolved", but not "invalid" either. Going with "resolved", as it is more positive.

QChris changed the task status from Resolved to Declined. (Jan 4 2015, 7:08 PM)

(Did not see that the task already was "Declined". Declined is a good status.)