
Gerrit REST API's changes module doesn't support offset
Closed, DeclinedPublic

Description

I'm trying to gather metadata about every Gerrit changeset.

I read https://gerrit.wikimedia.org/r/Documentation/rest-api-changes.html, which states "If the n query parameter is supplied and additional changes exist that match the query beyond the end, the last change object has a _more_changes: true JSON field set. Callers can resume a query with the n query parameter, supplying the last change’s _sortkey field as the value."

Here's what I tried:


$ curl -s "https://gerrit.wikimedia.org/r/changes/?q=age:1second&n=500" | tail -20

  "project": "mediawiki/extensions/Polyglot",
  "branch": "master",
  "topic": "tidyup",
  "change_id": "I35ebc242fcf04e5b527631d6be67d1a8c78ef251",
  "subject": "Add method parameter documentation",
  "status": "NEW",
  "created": "2013-01-24 18:41:09.000000000",
  "updated": "2013-01-24 21:44:15.000000000",
  "_sortkey": "0022a6180000b1ff",
  "_number": 45567,
  "owner": {
    "name": "Reedy"
  },
  "labels": {
    "Verified": {},
    "Code-Review": {}
  },
  "_more_changes": true
}

]

$ curl -s "https://gerrit.wikimedia.org/r/changes/?q=age:1second&n=0022a6180000b1ff"

"0022a6180000b1ff" is not a valid value for "-n"

I tried other URL parameters such as &sortkey= and &_sortkey= and &sortkey_after and &resume_sortkey, but nothing seems to work.

After discussing this issue with qchris in Gerrit on freenode, it seems that Gerrit's search functionality is broken (or perhaps restricted). qchris pointed to this (non-working) search example:


$ curl -s "https://gerrit.wikimedia.org/r/changes/?q=status:merged+project:mediawiki/core+sortkey_after:m"
)]}'

  • ---

It's unclear whether this issue has a corresponding bug in Gerrit's bug tracker. As it stands, it appears to be impossible to pull metadata of more than 500 changesets from the Gerrit REST API.

Without the ability to specify an offset (and consequently retrieve information about more than 500 changesets), I'm unable to generate Gerrit reports ([[mw:Gerrit/Reports]]). :-(


Version: wmf-deployment
Severity: normal
URL: https://gerrit-review.googlesource.com/#/c/42421/

Details

Reference
bz45090

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 1:34 AM
bzimport added projects: Gerrit, Upstream.
bzimport set Reference to bz45090.
bzimport added a subscriber: Unknown Object (MLST).

Some more lines from Gerrit, showing how to fetch the required
changes through the search query string.

09:54 <qchris> Susan: I screwed up before. It seems thinking is easier after breakfast. sortkey is not the bug title but the change's sort key. Stupid me.
09:54 <qchris> So what it comes down to is, that you could fetch all changes like this:
09:54 <qchris> Fetch https://gerrit.wikimedia.org/r/changes/?q=status:merged+project:mediawiki/core+limit:3
09:55 <qchris> (For whatever value of status, project you are interested in. Limit is just to get nice small file to look at by hand. You can drop that)
09:55 <qchris> Look for the _sortkey field of the last object in the result list
09:55 <qchris> And the fetch https://gerrit.wikimedia.org/r/changes/?q=status:merged+project:mediawiki/core+limit:3+sortkey_before:LAST_SORTKEY_OF_PREVIOUS_REQUEST
09:55 <qchris> Where LAST_SORTKEY_OF_PREVIOUS_REQUEST is the last sort key of the previous request
09:55 <qchris> so something like 002327db0000c0f1
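
The steps from the log above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not a tested client: the names parse_gerrit_json, http_get and fetch_all are made up for this example, it assumes responses start with the )]}' guard line seen earlier in this report, and the duplicate check is a precaution in case sortkey_before ever returns the boundary change again.

```python
import json
from urllib.parse import quote_plus
from urllib.request import urlopen

CHANGES_URL = "https://gerrit.wikimedia.org/r/changes/"


def parse_gerrit_json(body):
    """Drop the leading )]}' guard, if present, then parse the JSON list."""
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fetch_all(query, get=http_get):
    """Repeat the query, passing the last _sortkey as sortkey_before each time."""
    changes, seen, sortkey = [], set(), None
    while True:
        q = query if sortkey is None else f"{query} sortkey_before:{sortkey}"
        # Spaces become "+", while ":" and "/" stay readable as in the log.
        page = parse_gerrit_json(get(CHANGES_URL + "?q=" + quote_plus(q, safe=":/")))
        fresh = [c for c in page if c["_number"] not in seen]
        if not fresh:
            return changes  # empty page, or nothing new: we are done
        changes.extend(fresh)
        seen.update(c["_number"] for c in fresh)
        sortkey = page[-1]["_sortkey"]
```

Something like fetch_all("status:merged project:mediawiki/core") would then walk the whole result list, one request per page.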

To also get the _more_changes field set, pass the limit as the URL
parameter (&n=[integer]) instead of as limit:[integer] inside the query.

I pushed a change to correct the REST API documentation upstream
https://gerrit-review.googlesource.com/#/c/42421/

Thank you very much for your help, Christian.

I failed to realize that &n= is distinct from &N= in Gerrit's REST API.

I also failed to realize that "sortkey_after" and "sortkey_before" existed (they're documented at the bottom of https://gerrit.wikimedia.org/r/Documentation/user-search.html). It might be nice to mention these (or cross-reference them) at https://gerrit.wikimedia.org/r/Documentation/rest-api-changes.html.

My current understanding is that these are equivalent:

  • ?q=limit:[integer] and &n=[integer]
  • ?q=sortkey_after:[sortkey] and &N=[sortkey]
  • ?q=sortkey_before:[sortkey] and &P=[sortkey]

Marking this bug resolved/worksforme.

(In reply to comment #3)

> My current understanding is that these are equivalent:
>
>   • ?q=limit:[integer] and &n=[integer]

Yes, they are more or less equivalent. However, &n=[integer] provides
you with a "_more_changes" field, while ?q=limit:[integer] does not.

You can, however, work around this difference by keeping the limiting
integer below your queryLimit while still asking for 1 more result
than needed. If the extra result comes back, more changes exist.

In some edge cases, Gerrit will even give you one more result than
your queryLimit allows for.
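
A minimal sketch of that "ask for one more" trick, in Python. The helper names limit_operator and split_page are invented for this illustration, and it assumes that wanted + 1 stays within your queryLimit.

```python
def limit_operator(wanted):
    """Search operator that asks for one result more than actually wanted."""
    return f"limit:{wanted + 1}"


def split_page(results, wanted):
    """Return (page, has_more) for a response to a limit:(wanted + 1) query.

    If more than `wanted` results came back, the surplus proves that further
    changes exist, which stands in for the missing "_more_changes" field.
    """
    return results[:wanted], len(results) > wanted
```

Because the surplus is simply cut off, this also copes with the edge case where Gerrit returns one result too many.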

>   • ?q=sortkey_after:[sortkey] and &N=[sortkey]
>   • ?q=sortkey_before:[sortkey] and &P=[sortkey]

It's actually the other way round.

?q=sortkey_after corresponds to &P=
?q=sortkey_before corresponds to &N=
  • sortkey is increasing for new changes.
  • &N= is for the /N/ext page of search results (i.e.: older changes, lower sortkeys, hence sortkey_before)
  • &P= is for the /P/revious page of search results (i.e.: newer changes, higher sortkeys, hence sortkey_after)
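
To keep the two directions straight in a script, the correspondence above can be written down as a small lookup. This is just a Python sketch. The names URL_PARAM_TO_OPERATOR and as_query_operator are hypothetical, and the mapping is a rough correspondence, not an exact equivalence.

```python
# URL parameter -> search operator, as corrected above:
# N = next page (older changes, lower sortkeys),
# P = previous page (newer changes, higher sortkeys).
URL_PARAM_TO_OPERATOR = {
    "N": "sortkey_before",
    "P": "sortkey_after",
}


def as_query_operator(param, sortkey):
    """Translate e.g. ("N", "002327db0000c0f1") into a ?q= search term."""
    return f"{URL_PARAM_TO_OPERATOR[param]}:{sortkey}"
```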

As confusing as this is already, there are further
differences. ?q=sortkey_after skips the first search result. So when
comparing
https://gerrit.wikimedia.org/r/changes/?P=00232fbd0000bc17&n=3
https://gerrit.wikimedia.org/r/changes/?q=sortkey_after:00232fbd0000bc17&n=3
you'll get something like

sortkey           In q=sortkey... ?  In P=... ?
...               ...                ...
002330130000c1d5  no                 no
0023300f0000c1d8  no                 no
00232fff0000bc7b  yes                no
00232ff10000c1d6  yes                yes
00232fec0000c1d3  yes                yes
00232fd50000bf53  no                 yes
00232fbd0000bc17  <---- used sortkey

Additionally, if you supply a &n=[integer] parameter to limit the
number of results, the result set for a ?P= query has the
"_more_changes" key set on the first object, while a ?q=sortkey_after:
query has it set on the last object.

(This result skipping, and shuffling around "_more_changes" does not
occur for ?q=sortkey_before or ?N= queries)

Bottom line: When trying to process the data automatically, I'd go for
using &n=[integer] to obtain the "_more_changes" marker, but I would not
rely on getting at most [integer] results. Be prepared for one
additional result in the result set. Furthermore, I'd go for the
&N= and &P= variants, keeping in mind that the "_more_changes" key need
not be on the last object.
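
A rough Python sketch of that recommendation, walking towards older changes with &n= and &N=. Treat it as an illustration under assumptions: iter_changes and http_get_json are names invented here, the page size of 100 is arbitrary, and the code assumes results arrive newest first so that the last object carries the lowest sortkey.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CHANGES_URL = "https://gerrit.wikimedia.org/r/changes/"


def http_get_json(params):
    """GET the changes list for the given URL parameters and parse it."""
    with urlopen(CHANGES_URL + "?" + urlencode(params)) as resp:
        body = resp.read().decode("utf-8")
    if body.startswith(")]}'"):
        body = body[4:]  # drop the guard line in front of the JSON
    return json.loads(body)


def iter_changes(query, page_size=100, get_json=http_get_json):
    """Yield every change matching `query`, one &n=/&N= page at a time."""
    sortkey = None
    while True:
        params = {"q": query, "n": page_size}
        if sortkey is not None:
            params["N"] = sortkey  # next page: older changes
        page = get_json(params)
        # Yield whatever came back; it may be one more than page_size.
        yield from page
        # The marker need not sit on the last object, so check all of them.
        if not page or not any(c.get("_more_changes") for c in page):
            return
        sortkey = page[-1]["_sortkey"]
```

Usage would be along the lines of: for change in iter_changes("status:merged project:mediawiki/core"): ...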