
Replicate gerrit database somewhere to allow free querying
Closed, Declined (Public)

Description

Chad says that until the Lucene work is done, we won't have things like full text search (https://code.google.com/p/gerrit/issues/detail?id=866 ) or search by commenter (https://code.google.com/p/gerrit/issues/detail?id=1567 ).
In the meantime, it would be useful for everyone in the community to be able to query such information, and more, from a safe environment.

The best solution seems to be replicating the DB so that it's accessible to all users on Labs or (IMHO better) on Toolserver, which would also allow all sorts of cool web tools.
Chad: «I'm pretty sure we could replicate the DB. Only table I'd want to exclude is probably account_ssh_keys. Other than that, there's nothing really private».
If Toolserver is considered the best option from the WMF end, I'll open a TS ticket too.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=52329

Details

Reference
bz40331

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:58 AM
bzimport added a project: Gerrit.
bzimport set Reference to bz40331.
bzimport added a subscriber: Unknown Object (MLST).

Toolserver is going away. It's always best to do new things on Labs.

The more I've thought about this, the less I think it's a good idea. It's not just having to sanitize a good amount of data (ssh keys, external id table, draft patches/comments), but with each release, Gerrit is deleting more and more tables. Given that, I'd rather not encourage tools relying on data that's being deprecated from day 1.

Instead, I'd recommend writing tools that use stable APIs. The REST API[0] and the query SSH command[1] are both stable, machine-readable data sources that can provide nearly all (if not all) of the data that the database can. And given that the API is being actively expanded, it's not difficult to add new endpoints if there's other data we're after.

So this is a WONTFIX, but with an open invitation for people to use other ways to consume Gerrit's data :)

[0] https://gerrit.wikimedia.org/r/Documentation/rest-api.html
[1] https://gerrit.wikimedia.org/r/Documentation/cmd-query.html
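
For anyone wanting to try the two interfaces mentioned above, here is a minimal sketch in Python. It assumes the standard Gerrit behaviour of prefixing REST responses with a `)]}'` line and of ending `gerrit query --format=JSON` output with a `"type": "stats"` row. The function names are made up for illustration and are not part of any Gerrit client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GERRIT_BASE = "https://gerrit.wikimedia.org/r"
XSSI_PREFIX = ")]}'"  # Gerrit puts this line in front of every JSON response


def build_changes_url(query, limit=25, base=GERRIT_BASE):
    """Build a /changes/ URL; `q` is the search query, `n` the result limit."""
    return f"{base}/changes/?{urlencode({'q': query, 'n': limit})}"


def parse_gerrit_json(body):
    """Strip the anti-XSSI prefix (if present) and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def query_changes(query, limit=25, base=GERRIT_BASE):
    """Fetch changes over the REST API (performs a network request)."""
    with urlopen(build_changes_url(query, limit, base)) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


def parse_ssh_query_output(text):
    """Parse `gerrit query --format=JSON` output: one JSON object per line.

    The trailing statistics row ("type": "stats") is dropped.
    """
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [row for row in rows if row.get("type") != "stats"]
```

A call such as `query_changes("status:open project:mediawiki/core", limit=5)` would return a list of change records as dictionaries. The SSH variant expects the text captured from something like `ssh -p 29418 <host> gerrit query --format=JSON status:open`.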

(In reply to comment #2)
> And given that the API is being actively expanded and added to, it's not difficult to add new endpoints if there's other data we're after.

Thank you, Chad. So what's the way forward to get

(from comment #0)
> things like full text search (https://code.google.com/p/gerrit/issues/detail?id=866 ) or search by commenter (https://code.google.com/p/gerrit/issues/detail?id=1567 )

?
Should those bugs be somehow prioritized upstream (and how), or should they be repurposed to affect only the API/queries, or what else?

As for reviewers, I have now used the Toolserver clone of the mediawiki/* repos to access refs/notes/review for one of the use cases I had in mind, working around this and bug 46452: https://www.mediawiki.org/?diff=726041&oldid=726025

The API https://gerrit.wikimedia.org/r/Documentation/rest-api-changes.html#list-reviewers would return the same data, as far as I understand (just much more slowly), so the core review that doesn't get "merged" is still out. https://code.google.com/p/gerrit/issues/detail?id=1861
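
As a rough illustration of what the list-reviewers endpoint offers, the sketch below builds the endpoint URL and summarises Code-Review votes from an already-decoded response. It assumes the response is a list of reviewer records carrying `_account_id`, an optional `name`, and an `approvals` map of label to vote string, as described in the REST documentation. The helper names and the sample data are invented for the example.

```python
from urllib.parse import quote

GERRIT_BASE = "https://gerrit.wikimedia.org/r"


def reviewers_url(change_id, base=GERRIT_BASE):
    """URL of the list-reviewers endpoint; the change id must be URL-encoded."""
    return f"{base}/changes/{quote(change_id, safe='')}/reviewers/"


def code_review_votes(reviewers):
    """Map each reviewer (name, else account id) to an integer Code-Review vote."""
    votes = {}
    for reviewer in reviewers:
        # Votes arrive as strings such as "+2", " 0" or "-1".
        raw = reviewer.get("approvals", {}).get("Code-Review", "0")
        key = reviewer.get("name", str(reviewer.get("_account_id")))
        votes[key] = int(raw.strip())
    return votes


# Made-up sample shaped like a decoded list-reviewers response.
sample = [
    {"_account_id": 1000096, "name": "Example Reviewer",
     "approvals": {"Code-Review": "+2", "Verified": " 0"}},
    {"_account_id": 1000097, "approvals": {"Code-Review": "-1"}},
]
votes = code_review_votes(sample)
```

Because each change needs its own request, collecting this for a whole project means one call per change, which is the slowness mentioned above.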

I hope it's not unkind to quote this from #wikimedia-dev:

02.47 < ^d> I've changed my mind, I'm willing to replicate the gerrit db provided 2 things.
02.47 < ^d> A) Anything security that's not public should either be made public or removed, and gerrit stop being used for security patches.
02.48 < ^d> B) I can sanitize one column. There's one column I'm not ok with exposing.
02.50 < ^d> account_external_ids.password
02.51 < ^d> That's used if you generate a password to access gerrit over https :)

Related to bug 52329 being harder than expected.
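
To make the "sanitize one column" idea concrete, here is a small sketch that copies a table into a replica while replacing a sensitive column with NULL. It uses in-memory SQLite purely as a stand-in and does not describe how the actual replication would be set up. The column list for `account_external_ids` is an assumption about the schema, and `copy_sanitized` is a hypothetical helper.

```python
import sqlite3

SENSITIVE_COLUMNS = {"password"}  # the one column named in the IRC log above


def copy_sanitized(src, dst, table, columns, sensitive=SENSITIVE_COLUMNS):
    """Copy `table` from src to dst, writing NULL in place of sensitive columns.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    dst.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    select_list = ", ".join("NULL" if c in sensitive else c for c in columns)
    rows = src.execute(f"SELECT {select_list} FROM {table}").fetchall()
    placeholders = ", ".join("?" for _ in columns)
    dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    dst.commit()


# Demo with made-up data; the column names are assumed, not verified.
columns = ["account_id", "email_address", "password", "external_id"]
src = sqlite3.connect(":memory:")
src.execute(f"CREATE TABLE account_external_ids ({', '.join(columns)})")
src.execute(
    "INSERT INTO account_external_ids VALUES (?, ?, ?, ?)",
    (1, "dev@example.org", "secret", "username:dev"),
)
dst = sqlite3.connect(":memory:")
copy_sanitized(src, dst, "account_external_ids", columns)
replica_rows = dst.execute("SELECT * FROM account_external_ids").fetchall()
```

The replica keeps the same table shape, so tools written against it would not need to know which columns were blanked.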

Re-opening this bug for further consideration.

There's substantial and substantive evidence that providing replicated copies of databases is an enormous benefit to developers and end-users (cf. the Wikimedia Toolserver and Wikimedia Labs). We already have the infrastructure in place to replicate database tables, with filtering of specific columns as necessary.

Unless it is an absolute impossibility, we should open Gerrit's data to the masses and allow others to build exciting and wonderful tools on top of it. :-)

hashar claimed this task.
hashar subscribed.

We are not going to provide any Gerrit database replication. This bug was filed back in 2013, and we have since upgraded Gerrit, which now has a rather complete REST API: https://gerrit.wikimedia.org/r/Documentation/rest-api.html