Page MenuHomePhabricator

Provide reviewer counts per Gerrit changeset in batch form
Closed, ResolvedPublic

Description

The Gerrit API is somewhat limited in that retrieving the number of reviewers (a reviewers count) for each changeset requires individual API requests. At somewhere around 76,000 changesets currently (and growing), this isn't a practical system for batch-downloading reviewer counts for all Gerrit changesets.

In discussions with Christian and Chad, a cron job via puppet that executes a simple SQL query and outputs the results of such a query to a text file (or .tsv file, most likely) is probably the best approach to take here.


Version: wmf-deployment
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=40331

Details

Reference
bz52329

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:07 AM
bzimport added projects: Gerrit, acl*sre-team.
bzimport set Reference to bz52329.
bzimport added a subscriber: Unknown Object (MLST).

Change 76945 had a related patch set uploaded by Demon:
Provide reviewer counts per patch

https://gerrit.wikimedia.org/r/76945

Change 76945 merged by Ryan Lane:
Provide reviewer counts per patch

https://gerrit.wikimedia.org/r/76945

It's 02:56 UTC right now:

$ curl https://gerrit.wikimedia.org/reviewer-counts.json
<!DOCTYPE HTML PUBLIC "-IETFDTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /reviewer-counts.json was not found on this server.</p>
</body></html>

:-(

Tangentially: I don't love the idea of writing into the root directory here. I realize that other jobs already do (e.g., https://gerrit.wikimedia.org/mediawiki-extensions.txt), but it may make sense to create a separate directory for this.

(In reply to comment #2)

Change 76945 merged by Ryan Lane:
Provide reviewer counts per patch

https://gerrit.wikimedia.org/r/76945

Copying Daniel Z.'s comment here:


this doesn't appear to work and the reason:

fatal: gerrit2 does not have "accessDatabase" capability.

(In reply to comment #4)

Tangentially: I don't love the idea of writing into the root directory here.
I
realize that other jobs already do (e.g.,
https://gerrit.wikimedia.org/mediawiki-extensions.txt), but it may make
sense
to create a separate directory for this.

Maybe something like gerrit.wm.o/reports/foo?

(In reply to comment #5)

(In reply to comment #2)

Change 76945 merged by Ryan Lane:
Provide reviewer counts per patch

https://gerrit.wikimedia.org/r/76945

Copying Daniel Z.'s comment here:


this doesn't appear to work and the reason:

fatal: gerrit2 does not have "accessDatabase" capability.

Whoops yeah that's true. Easily fixed and I'll do it when I'm in front of my computer again.

(In reply to comment #6)

Maybe something like gerrit.wm.o/reports/foo?

Yup.

Whoops yeah that's true. Easily fixed and I'll do it when I'm in front of my
computer again.

Yay. :-)

(In reply to comment #6)

(In reply to comment #5)

fatal: gerrit2 does not have "accessDatabase" capability.

[...] I'll do it when I'm in front of my
computer again.

I granted the capability in the mean time.

$ curl https://gerrit.wikimedia.org/reviewer-counts.json
<!DOCTYPE HTML PUBLIC "-IETFDTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /reviewer-counts.json was not found on this server.</p>
</body></html>


What are next steps here?

Chad or Christian: can you please take a look at this bug when you get a chance? I imagine it'll require either running (or finding someone to run) this command from the command line to see what error is now output.

gerrit2 couldn't write to the output file. I fixed this, reviewer counts are now up: https://gerrit.wikimedia.org/reviewer-counts.js

Should now update every weekend.

(In reply to comment #11)

gerrit2 couldn't write to the output file. I fixed this, reviewer counts are
now up: https://gerrit.wikimedia.org/reviewer-counts.js

As discussed, this file was moved to https://gerrit.wikimedia.org/reviewer-counts.json. :-) Thanks for working on this.

Should now update every weekend.

I believe you mean daily.

Christian: Chad and I looked over https://gerrit.wikimedia.org/reviewer-counts.json and it seems to be missing about 6,100 changesets.

From http://p.defau.lt/?XeY_5g8lrYwjWblQzZVGrQ:

gerrit> select count(*) from changes;
count(*)


78406

gerrit> select count(*) from change_id;
count(*)


72305

Do you know why the change_id table is missing thousands of changesets? Can the query be adjusted to include all changesets?

More importantly, whatever is generating this .json file isn't outputting valid JSON. It doesn't load into Python's json.loads() and http://jsonlint.com/ says it's invalid.

The Python error:


Traceback (most recent call last):

File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
  return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
  raise ValueError(errmsg("Extra data", s, end, len(s)))

ValueError: Extra data: line 2 column 1 - line 72311 column 1 (char 67 - 4914008)

The JSONLint error:


Parse error on line 7:
..._count": "2" }}{ "type": "row",
---------------------^

Expecting 'EOF', '}', ',', ']'

No time to read the bug now, but a note in case: the count should not include the patch owner and jenkins-bot.

Change 78944 had a related patch set uploaded by QChris:
Switch to changes table for gerrit's reviewer count file

https://gerrit.wikimedia.org/r/78944

Change 79584 had a related patch set uploaded by QChris:
Add gsql format that returns result set as single Json object

https://gerrit.wikimedia.org/r/79584

Change 79584 merged by Demon:
Add gsql format that returns result set as single Json object

https://gerrit.wikimedia.org/r/79584

+ops because that's what the patch is waiting for.

Change 78944 merged by Ottomata:
Switch to changes table for gerrit's reviewer count file

https://gerrit.wikimedia.org/r/78944

(In reply to comment #19)

Change 78944 merged by Ottomata:
Switch to changes table for gerrit's reviewer count file

https://gerrit.wikimedia.org/r/78944

Thanks, Otto!

Christian: we still need the JSON --> JSON_SINGLE change, I think? After that gets merged, I think this bug will be resolved/fixed. :-)

Change 84743 had a related patch set uploaded by QChris:
Switch to single Json object for gerrit's reviewer count query

https://gerrit.wikimedia.org/r/84743

(In reply to comment #20)

Christian: we still need the JSON --> JSON_SINGLE change, I think?

See comment 21 :-)

After that
gets merged, I think this bug will be resolved/fixed. :-)

Before we can merge that, we need to upgrade to a gerrit that actually understands
JSON_SINGLE. The war file is prepared, and only needs to be put in place.
I pinged ^demon about that.

[removing ops keyword again as patch in comment 21 is -1'ed currently.]

Change 84743 merged by Ori.livneh:
Change output format of Gerrit review count gsql to JSON_SINGLE

https://gerrit.wikimedia.org/r/84743

Actually, there's a straightforward way to improve on Ifbb8176fc that I'd like to see. Instead of redirecting the output to a file in /var/www, make a few assertions about the gsql output first -- that consists of a valid, single JSON object (and no more) and that it has the keys you expect it to have. You might want to write a simple script to do that rather than try to fit it in a single command line in the manifest. It should be straightforward to do. Christian, do you think you could do that sometime soon?

So, isn't https://gerrit.wikimedia.org/reviewer-counts.json supposed to contain something now? It's still empty.

(In reply to comment #26)

So, isn't https://gerrit.wikimedia.org/reviewer-counts.json supposed to
contain something now? It's still empty.

Looks like it broke. Probably a decent time to consider changing the path, as Ori suggested. Maybe /j/?

Chad or Christian: could one of you poke at this? Or do you know who might be able to?

Sorry MZMcBride, I cannot help here.
I had a look back when you said that it broke.
But the cron-job definition looked good to me.
Running the query by hand worked as well for me.
gerrit2 user has Access Database capability as well.

As I lack shell access to the box, I cannot look there directly and
I'll have to defer to Chad.

(I heard random ramblings about not all repos being garbage collected
properly. So some random speculation might be that the cronjob's ssh
connection fails? Maybe username mismatch between gerrit and gerrit2?
I doubt that it's the case, but it would explain both things. Meh.
Someone with shell+sudo access to the machine has to check.)

Daniel: when you have a minute, can you please take a look? I suspect some error is being reported to cron.

took a look at the cron on ytterbium

crontab -u gerrit2 -l

  • 1 * * * ssh -p 29418 localhost gerrit gsql --format JSON_SINGLE -c "'SELECT changes.change_id AS change_id, COUNT(DISTINCT patch_set_approvals.account_id) AS reviewer_count FROM changes LEFT JOIN patch_set_approvals ON (changes.change_id = patch_set_approvals.change_id) GROUP BY changes.change_id'" > /var/www/reviewer-counts.json

looked normal, ran it with sudo -u gerrit2, it created the file just fine. i chowned it to gerrit2.

try it now

http://gerrit.wikimedia.org/reviewer-counts.json

$ curl https://gerrit.wikimedia.org/reviewer-counts.json | json_pp -json_opt pretty | grep -c '"reviewer_count" : "0"'
973

But most are abandoned patches. :(

Mentioned in SAL (#wikimedia-operations) [2017-03-09T20:02:57Z] <mutante> cobalt: remove crontab entry of user gerrit2 that created reviewer counts, gzip /var/www/reviewer-counts.json and moved to /root/ for backup (re: gerrit:341592) T54329