
Gerrit: HTTP 500 "Guice provision errors: Cannot open ReviewDb"
Closed, Invalid · Public

Description

Screenshot of error page

Location: https://gerrit.wikimedia.org/r/#/c/62207/


HTTP ERROR: 500

Problem accessing /r/. Reason:

Guice provision errors:
1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:71)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error

Powered by Jetty://


Version: wmf-deployment
Severity: major

Attached:

Screen_Shot_2013-05-04_at_1.33.49_AM.png (578×1 px, 97 KB)

Details

Reference
bz48061

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 1:37 AM
bzimport added a project: Gerrit.
bzimport set Reference to bz48061.
bzimport added a subscriber: Unknown Object (MLST).

Looks like this has escalated. Most users don't even get an error; the request just times out. Gerrit has become completely unresponsive.

Reopening since restarting isn't a solution. Will take a look at the logs.

In case this becomes relevant at some point of investigating the root cause:
About 2 hours before this happened, MatmaRex told me that Gerrit was really slow
to respond. Not having access to the logs, I checked Ganglia, and the graphs
showed the usual increase for that time of day.

The error message now being reported surfaces when Gerrit was able to connect
to the database at startup but can no longer open new connections to the
database.
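A minimal sketch of the failure mode described above, assuming (hypothetically; the thread never confirms the cause) that database connections come from a bounded pool: opening connections works at startup, and once every slot is held, further attempts time out and fail with the same message. `BoundedPool`, `PoolExhausted`, `max_size` and `timeout` are illustrative names, not Gerrit's API.

```python
import threading


class PoolExhausted(Exception):
    """Hypothetical error raised when no connection slot frees up in time."""


class BoundedPool:
    """Hypothetical bounded pool: at most `max_size` connections open at once."""

    def __init__(self, max_size, timeout):
        self._slots = threading.BoundedSemaphore(max_size)
        self._timeout = timeout

    def open(self):
        # Wait up to `timeout` seconds for a free slot instead of blocking forever.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("Cannot open ReviewDb")
        return object()  # stand-in for a real database connection

    def close(self, conn):
        # Return the slot; `conn` is unused because the stand-in holds no resources.
        self._slots.release()


pool = BoundedPool(max_size=2, timeout=0.05)
first = pool.open()   # works, as it did at startup
second = pool.open()  # pool is now full
try:
    pool.open()       # no slot frees up, so this fails
except PoolExhausted as err:
    print(err)        # prints "Cannot open ReviewDb"
pool.close(first)
third = pool.open()   # succeeds again once a slot is returned
```

The timeout turns an indefinite hang into an explicit error; without it, requests would simply stall, which is one way the timeouts reported earlier in the thread could arise.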

Do the logs show a root cause?

Chad: Do the logs show a root cause?

(In reply to comment #3)

Gerrit was restarted; it seems to be working now.

Lowering priority.

Created attachment 12371
Proxy error; it usually accompanies the previously captured Guice error.

The problem is back.

Attached:

Screen_Shot_2013-05-22_at_8.50.46_PM.png (596×1 px, 74 KB)

Restart of Gerrit restored functionality for now.

The following error appeared today (a few minutes ago):

Guice provision errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:71)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error

[Lowering priority; no recent incidents.]

And it's back, coming to eat more innocent souls:

Guice provision errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error

There's no single bug here, so I'm not sure what this task is supposed to accomplish.

"Don't break the database connection with bad config, networking hiccups or otherwise crash the ability of Gerrit to talk to its DB?"

Happened again just now:

Guice provision errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error

Paladox claimed this task.
Paladox subscribed.

The problem does not seem to have happened in over a year and a half.

We are upgrading to Gerrit 2.12 soon, so maybe that will finally fix it.

Aklapper changed the task status from Resolved to Declined. Jul 13 2016, 1:15 PM

@Paladox: No one is aware of any fix, hence this is not resolved. Declining for the time being.

Please reopen if still happening with 2.12 (to be deployed soon).

Yeah, but we are talking about upgrading gerrit.wikimedia.org, not some other subdomain. Plus, that's not really relevant to this task.

This was not fixed on the existing install, and it should not be declined either, because it's a real bug. It can be resolved after the upgrade.

So I'm closing this as invalid. A couple of things:

  • The "Guice provision error", while opaque, is what Gerrit throws when it can't connect to the database. There's not much we can do here except make sure it can connect. Upgrading, while great and possibly a fix for *some* connection bug somewhere, doesn't actually address the complaint here.
  • Showing a generic 503 page when Gerrit is having trouble (i.e., the Apache proxy can't talk to it) is confusing and not useful to users. I fixed this so we now show a nicer downtime page.

I'm not sure what else is actionable here.