
"Volume of open changesets" graph should show reviews pending every month
Closed, ResolvedPublic

Description

I just realized that "Volume of open changesets" at http://korma.wmflabs.org/browser/gerrit_review_queue.html is not showing "How long is the Gerrit review queue over time?" -- that is, the number of "Pending" and "Waiting for review" changesets every month.

What I deduce is being shown is some kind of accumulation over time of the changesets *currently open*.

We should be able to see whether the queue is increasing or decreasing over time, very much like http://korma.wmflabs.org/browser/bugzilla_response_time.html lets us see whether our response time decreases every month.


Version: unspecified
Severity: major
URL: http://korma.wmflabs.org/browser/gerrit_review_queue.html
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=66283

Details

Reference
bz70278

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:43 AM
bzimport set Reference to bz70278.

"Volume of open changesets" is showing the number of changesets that were "Pending" and "Waiting for review" in a specific month.

For example:

  • Jun 2014: 862 pending, 326 waiting for review. At the end of Jun 2014 there were 826 reviews in open state, pending review, and 326 of them were waiting for a reviewer.

This metric is implemented by aggregating the number of reviews that remain pending at the end of each month plus the pending reviews inherited from previous months. In Gerrit, pending reviews is a metric that can be aggregated.

So for example:

  • Jul 2013: 97 pending
  • Aug 2013: 122 pending (25 new pending issues in Aug, plus 97 pending from Jul 2013)

At the end of Aug 2014 there are 1594 pending reviews and, as expected, the current number of pending reviews is pretty similar: it is 1630, and you can see it in:

http://korma.wmflabs.org/browser/scr.html

In order to compute this number we just get the total open reviews now.

So, in conclusion, this metric shows the evolution over time of the total pending reviews each month.
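
As a rough illustration of this aggregation (the first two values match the Jul/Aug 2013 example above; the rest are made up):

  # Net new pending reviews per month: reviews opened minus reviews closed.
  net_new_pending = [97, 25, 40, -10]   # Jul 2013, Aug 2013, plus two made-up months

  total_pending = []   # accumulated backlog at the end of each month
  running = 0
  for net in net_new_pending:
      running += net   # inherit the pending reviews from previous months
      total_pending.append(running)

  print(total_pending)   # [97, 122, 162, 152] -> this is the series the graph plots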

(In reply to Alvaro from comment #1)

This metric is implemented by aggregating the number of reviews that remain pending at the end of each month plus the pending reviews inherited from previous months.

If I understand this correctly...

Why add the inherited amount from the previous month? All we need to see is the total amount of changesets open at the end of the month, and how many of these are waiting for review.

Then repeat the exercise at the end of the next month: how many are open in total, from which how many are waiting for review. If you can count how many are open now, it doesn't matter how many were open last month; there is no need to add these amounts, just reflect each amount in each month in the graph.

This way we can have a graph that goes up and down depending on how long the queue is every month. If we are always adding the previous months then we will have an ever-increasing graph, no matter how much we improve our rate of closing reviews.
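
A rough sketch of the metric I have in mind, with made-up data (the field names are hypothetical, just to illustrate the idea):

  from datetime import datetime

  # Hypothetical changesets: open/close dates plus whether they currently await review.
  changesets = [
      {"opened": datetime(2014, 5, 3),  "closed": None,                 "waiting_for_review": True},
      {"opened": datetime(2014, 6, 20), "closed": datetime(2014, 8, 2), "waiting_for_review": False},
      {"opened": datetime(2014, 7, 11), "closed": None,                 "waiting_for_review": True},
  ]

  def snapshot(changesets, at):
      """Changesets open at 'at', and how many of them are waiting for review."""
      open_then = [c for c in changesets
                   if c["opened"] <= at and (c["closed"] is None or c["closed"] > at)]
      waiting = [c for c in open_then if c["waiting_for_review"]]
      return len(open_then), len(waiting)

  # One independent snapshot per month end; nothing is carried over between months.
  print(snapshot(changesets, datetime(2014, 6, 30)))   # (2, 1)
  print(snapshot(changesets, datetime(2014, 8, 31)))   # (2, 2)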

Ok Quim, you are right.

Right now the metric is showing the backlog of the current open issues.

Let us think more about the metric you are proposing, in order to be sure we are in sync on the metric definition, and we will create a new metric showing that: "at the end of the month: how many are open in total, from which how many are waiting for review".

(In reply to Alvaro from comment #3)

Ok Quim, you are right.

Yes, you are looking for a related but different metric.

Taking a look at the current metric, you can infer the relative speed at which pending reviews are added, but it is not easy to get the total speed for each month.

So we need to create this new metric.

The original name for this metric was "Number of pending submissions" and for us, the definition was the backlog of pending submissions. But it is clear we need a new metric to show what you are looking for.

We will implement it, and as soon as it is available we will share it with you on a testing page in the dashboard so you can check it.

Right now the metric is showing the backlog of the current open issues.

Let us think more about the metric you are proposing in order to be sure we
are in sync with the metric definition and we will create a new metric
showing that, "at the end of the month: how many are open in total, from
which how many are waiting for review".

I don't understand the confusion, and I don't want to make you work before making sure that we are on the same page. "Number of pending submissions" is always the total number of open/waiting-for-review changesets NOW. We just check once a month because it would be very expensive to count on a daily basis.

I really see no point in adding what was open last month, because of those, some will still be open and some will now be closed.

Our ideal goal is to have a queue of zero changesets waiting for review, and the purpose of that graph is to see whether we are getting closer or further from that goal.

After a careful analysis of the metric we are showing in "Volume of open changesets", the graph is showing the aggregation of the number of new pending issues that exist each month in Wikimedia Gerrit.

Why the aggregation? To understand the total work pending.

If you think it is better not to show the aggregation, but just how many new pending reviews are added each month, it is just a matter of removing the aggregation conversion parameter from the HTML.

We can talk about all the details or doubts in the next meeting. I am sorry for not being clear in my first comments on this ticket.

(In reply to Quim Gil from comment #2)

(In reply to Alvaro from comment #1)

This metric is implemented by aggregating the number of reviews that remain pending at the end of each month plus the pending reviews inherited from previous months.

If I understand this correctly...

Why add the inherited amount from the previous month?

To see how the total number of pending reviews is evolving.

All we need to see
is the total amount of changesets open at the end of the month, and how many
of these are waiting for review.

In order to get this number, we get the total amount of open and close at the end of the month, and the difference is the pending.

Then repeat the exercise at the end of the next month: how many are open in
total, from which how many are waiting for review. If you can count how many
are open now, it doesn't matter how many were open last month, there is no
need to add these amounts, just reflect each amount in each month in the
graph.

In order to show that, we can just remove the aggregation and you will get exactly that.

This way we can have a graph that goes up and down depending on how long is
the queue every month. If we are always adding the previous months then we
will have an ever-increasing graph, no matter how much we improve our rate of
closing reviews.

At the end of a month, if you have closed more reviews than you have opened, the number of new pending reviews for that month will be negative, the total amount of pending issues will be lower, and the current graph will decrease.
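
For example, with made-up numbers: if in a given month 100 new reviews are opened and 130 are closed, the net new pending for that month is 100 - 130 = -30, so an accumulated backlog of 1200 at the end of the previous month drops to 1170.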

I still find your answer slightly confusing:

In order to get this number, we get the total amount of open and close at
the end of the month, and the difference is the pending.

Ok, if this is the case then the number is correct. But then, what does this have to do with adding the inherited amount of the past month?

The ultimate test:

  1. Forget all history in our Gerrit instance and just look at all the changesets we have now as if they were created during September 2014.
  2. Count how many changesets we have in total, from which how many have status:open, from which how many are waiting for review (no -1, no WIP).

If the numbers still are around the 1563 open - 933 waiting for review currently declared for August 2014, then the calculation is correct.

Sorry for being a skeptical pain in the ass with this metric, but we need to be absolutely sure about it. If it is true we will set an alarm because the trend is really alarming. The worst that could happen to the credibility of our metrics is that the data is found to be wrong after setting the alarm. Thank you for your understanding.

(In reply to Quim Gil from comment #8)

I still find your answer slightly confusing:

In order to get this number, we get the total amount of open and close at
the end of the month, and the difference is the pending.

Ok, if this is the case then the number is correct. But then, what does this
have to do with adding the inherited amount of the past month?

This is bad wording from my side.

In order to measure the total pending issues in a month, you need to add the new pending issues for this month to all the previous pending issues.

The ultimate test:

  1. Forget all history in our Gerrit instance and just look at all the changesets we have now as if they were created during September 2014.
  2. Count how many changesets we have in total, from which how many have status:open, from which how many are waiting for review (no -1, no WIP).

If the numbers still are around the 1563 open - 933 waiting for review
currently declared for August 2014, then the calculation is correct.

Sorry for being a skeptical pain in the ass with this metric, but we need to
be absolutely sure about it. If it is true we will set an alarm because the
trend is really alarming. The worst that could happen to the credibility of
our metrics is that the data is found to be wrong after setting the alarm.
Thank you for your understanding.

I understand perfectly what you say. We have reviewed the metric in depth and the final number is right (1646 total pending reviews now), but the evolution in time was not right because we were using, as the close date, the date on which the review was submitted. After fixing this issue, I have updated the graph and now the trend is better. The queue is growing, but now it is growing less. In some months, like March 2014, you closed more reviews than you opened.

I need to recheck whether the WIP and -1 filters are being used here, because they were added to the MediaWiki new time measures, but this is a generic metric.

But the "core" of the problem is fixed. I will continue reporting on this ticket the rest of open issues (WIP and -1) and cross checking data.

(In reply to Quim Gil from comment #8)

I still find your answer slightly confusing:

In order to get this number, we get the total amount of open and close at
the end of the month, and the difference is the pending.

Ok, if this is the case then the number is correct. But then, what does this
have to do with adding the inherited amount of the past month?

Right now let's check this ultimate test:

The ultimate test:

  1. Forget all history in our Gerrit instance and just look at all the changesets we have now as if they were created during September 2014.
  2. Count how many changesets we have in total, from which how many have status:open, from which how many are waiting for review (no -1, no WIP).

With the fix in the close date and more up-to-date data, we now have 1647 open reviews in Aug 2014 as the final point of the evolution of open reviews.

Taking a look at the current open reviews in:

http://korma.wmflabs.org/browser/scr.html

we have 1607, so during September the net change should be -40 open reviews.

Looking at the data in the JSON:

http://korma.wmflabs.org/browser/data/json/scr-evolutionary.json

if you look at the pending time series:

"pending": [431, 111, 127, 141, 29, 136, 139, 72, 51, 74, -72, 69, 158, 80, 88, 13, -33]

we have -33.

So the difference is 7 reviews and the numbers are pretty similar. From my point of view this validates the metrics. We can look at these 7 reviews to understand the difference if really needed.
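
For reference, this check can be reproduced from the pending series quoted above:

  # The "pending" series from scr-evolutionary.json, May 2013 .. Sep 2014.
  pending = [431, 111, 127, 141, 29, 136, 139, 72, 51, 74, -72, 69, 158, 80, 88, 13, -33]

  backlog_end_aug_2014 = sum(pending[:-1])     # accumulated pending up to Aug 2014
  print(backlog_end_aug_2014)                  # 1647, final point of the evolution graph

  current_open = 1607                          # value currently shown in scr.html
  print(current_open - backlog_end_aug_2014)   # -40 expected net change during September
  print(pending[-1])                           # -33 reported so far; a difference of 7 reviews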

About reviews waiting for reviewer, we are checking it.

Ok, thank you for the double checking. The resolution of this bug is a mix of Invalid (the problem reported was not a problem) and Fixed (you found something wrong on the way). Let's call it Fixed. :)

About reviews waiting for a reviewer, we are checking it ... we cannot mark this ticket as fixed yet. As soon as this analysis is completed, I will post the results here.

Ok, after a review of the metric "pending reviews waiting for reviewer": we are doing the analysis for the backlog, so it is normal that the graph grows more in the last months.

We will change this graph to an "evolution over time" graph (as in other places) and then we can recheck its trend.

The logic for detecting reviews pending for a reviewer seems to be right, after checking the results against some repos.

"pending reviews waiting for reviewer" (Waiting for Reviewer) is now implemented as a "evolution time metric" (not backlog anymore) and the results are now in:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

To be consistent between both metrics, I have renamed "pending" to "Waiting".

Hmmm, I am not sure about the last change from "pending" to "Waiting".

Changed "pending" to "New changesets" using the same label than in the second graph.

Alright, the shape of the green line makes more sense now. At least now we see a steady increase instead of a sudden change of trend in the last three months.

Still, the total numbers look inconsistent with the values provided by other graphs in that page.

The blue line in the first graph marks 1657 changesets, but the total sum of blue columns in the second graph is 1324. Why the difference of 333 changesets?

For the green we have 994 in the first graph, 718 in the second: a difference of 276 changesets. I went through the 211 repositories listed below, adding the value of "Waiting for review", and the total for August is 755.

Therefore, we have two metrics that point to a value in the 700s, while the first graph says close to 1000. Which one is right?

(In reply to Quim Gil from comment #16)

Alright, the shape of the green line makes more sense now. At least now we
see a steady increase instead of a sudden change of trend in the last three
months.

Still, the total numbers look inconsistent with the values provided by other
graphs in that page.

The blue line in the first graphs marks 1657 changesets, but the total sum
of blue columns in the second graph is 1324. Why the difference of 333
changesets?

You are comparing time evolution with backlog time distribution. These two time series are the same at the final value, but for past months the time evolution distribution will have more "new changesets" than the backlog distribution.

time evolution new changesets in A = backlog time distribution new changesets in A + changesets closed between A and now.

In "closed changesets between A and now" if A = NOW, this value is zero and both numbers in NOW should be the same.

And doing this analysis:

"date": ["May 2013", "Jun 2013", "Jul 2013", "Aug 2013", "Sep 2013", "Oct 2013", "Nov 2013", "Dec 2013", "Jan 2014", "Feb 2014", "Mar 2014", "Apr 2014", "May 2014", "Jun 2014", "Jul 2014", "Aug 2014", "Sep 2014"]
"ReviewsWaiting_ts": [418, 546, 669, 798, 827, 998, 1113, 1191, 1233, 1286, 1258, 1316, 1455, 1565, 1670, 1657, 1571]
"new": [49, 21, 25, 24, 14, 46, 39, 33, 62, 76, 82, 64, 126, 161, 284, 267, 197]

where "new" is the backlog metric and "ReviewsWaiting_ts" is the time evolution time series: for Sep 2014 you get 1570 for "new" (adding up all its values) and 1571 for "ReviewsWaiting_ts" (taking the last point).
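
The same cross-check in code, using the two series quoted above:

  reviews_waiting_ts = [418, 546, 669, 798, 827, 998, 1113, 1191, 1233, 1286,
                        1258, 1316, 1455, 1565, 1670, 1657, 1571]
  new = [49, 21, 25, 24, 14, 46, 39, 33, 62, 76, 82, 64, 126, 161, 284, 267, 197]

  print(sum(new))                 # 1570: backlog time distribution added up over all months
  print(reviews_waiting_ts[-1])   # 1571: last point of the time evolution series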

It is also the same case for the waiting for reviewer metric.

We are not showing the Sep 2014 data because it is incomplete, as we decided in the past.

For the green we have 994 in the first graph, 718 in the second: a difference of 276 changesets. I went through the 211 repositories listed below, adding the value of "Waiting for review", and the total for August is 755.

Therefore, we have two metrics that point a value in the 700s, while the
first graph says close to 1000. Which one is right?

(In reply to Alvaro from comment #17)

You are comparing time evolution with backlog time distribution. This two
time series are the same at the final value

Exactly, I'm just saying that in the graphs they are not.

In the first graph, the green line for August 2014 shows "Waiting for review: 994".

The second graph should show how these 994 are spread in time based on their submission date, from August 2014 backward, right?

08-14 230
07-14 208
06-14 57
05-14 60
04-14 33
03-14 20
02-14 39
01-14 19
12-13 13
11-13 15
10-13 7
09-13 3
08-13 8
07-13 5
06-13 1
05-13 12 (this date has no column but does have changesets!)

TOTAL 730

The data of September 2014 doesn't play any role here, because both data points are based on August 2014. I assume both metrics are counting the end of August 2014?

(In reply to Quim Gil from comment #18)

(In reply to Alvaro from comment #17)

You are comparing time evolution with backlog time distribution. This two
time series are the same at the final value

Exactly, I'm just saying that in the graphs they are not.

In the first graph, the green line for August 2014 shows "Waiting for
review: 994".

The second graph should show how these 994 are spread in time based on their
submission date, from August 2014 backward, right?

08-14 230
07-14 208
06-14 57
05-14 60
04-14 33
03-14 20
02-14 39
01-14 19
12-13 13
11-13 15
10-13 7
09-13 3
08-13 8
07-13 5
06-13 1
05-13 12 (this date has no column but does have changesets!)

TOTAL 730

The data of September 2014 doesn't play any role here, because both data
points are based in August 2014. I assume both metrics are counting the end
of August 2014?

Yes, both are counting the end of August 2014, but one of them is showing the number of reviews from August 2014 among the currently open reviews, and the other is showing the number of reviews open in August 2014 (some of which could already be closed now).

Quim, Jesus has sent you a proposal for the definition of the metrics we are working on. Focusing on "Waiting for review", the way to detect it is to find the open reviews that do not have a "-1" with no patches uploaded after that "-1". In order to validate the current queries we need to close the definition and be sure the code implements it.

We are working with this proposed definition right now.

With the proposed metrics definition we have updated the "Waiting for review" metric in korma. You can check the new graph at:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

There are now more reviews waiting for a reviewer than before with the simplified condition we have used: if the last change for a review is Code-Review -1, then the review is not waiting for a reviewer; in any other case, it is.

We are working on approaches to improve this method; probably we are counting some reviews as waiting for a reviewer that should not be counted. Consider the current line as a maximum of reviews waiting for a reviewer.
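
A rough sketch of this simplified condition (the data structure is made up; the real implementation lives in the GrimoireLib SQL queries):

  def waiting_for_reviewer(review):
      """An open review is waiting for a reviewer unless its last change is Code-Review -1."""
      if review["status"] != "open":
          return False
      changes = review["changes"]
      last = changes[-1] if changes else None
      return not (last is not None
                  and last["field"] == "Code-Review"
                  and last["value"] == -1)

  # A review whose most recent event is a Code-Review -1 is not counted as waiting.
  review = {"status": "open",
            "changes": [{"field": "Upload", "value": 1},
                        {"field": "Code-Review", "value": -1}]}
  print(waiting_for_reviewer(review))   # False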

I will reply later today about the new definition, but let me comment here asap:

(In reply to Quim Gil from comment #22)

I will reply later today about the new definition, but let me comment here
asap:

  • We had agreed that "WIP" patches shouldn't be counted, i.e. https://gerrit.wikimedia.org/r/#/c/134782/ "WIP: Have GettingStarted depend on CentralAuth"

Ok, we will remove all current "WIP" reviews from waiting for reviewer. Reviews that are in "WIP" now but were not in the past will be counted as "WIP" for all of their history. Reviews that are not in "WIP" now but were in "WIP" in the past will be counted as not "WIP" for all of their history. We can improve this later if needed.

  • -2 in open changesets shouldn't be counted either, i.e. https://gerrit.wikimedia.org/r/#/c/158632/ "Mattflaschen 4:14 AM Patch Set 8: Code-Review-2"

Ok, we will include this condition also.

I have fine-tuned the description proposed at https://www.mediawiki.org/wiki/Community_metrics#Backlog_of_open_changesets_.28monthly_snapshots.29 -- see the history of the page for the specific changes.

The strings proposed for the title of the chart and the blue/green lines are good and can be implemented. The descriptions can be used as the help shown when clicking "?".

I think the safest parameters to count "Waiting for review" (the green line) are:

Status == "Review in Progress" AND
NOT "WIP" AND
Code-Review AND Verified == (blank OR +1)

The "blank OR +1" is a better filter than the opposite, because Gerrit can have -1, -2, and even +2 but "Can Merge No" for some reason external to the Gerrit change.

Ok, right now what is implemented is

NO "WIP" AND NOT Code-Review-2 NOT AND Code-Review-1

and this is what you have now in production.

All the data is checked, so the first graph's green line and the second graph's green bars have coherent data.

We will think about your last filtering proposal and compare it with the current one. To implement the filtering we should review the fields, because some fields are easy to use in SQL but others require more work.

(In reply to Quim Gil from comment #24)

I have fine-tuned the description proposed at https://www.mediawiki.org/wiki/Community_metrics#Backlog_of_open_changesets_.28monthly_snapshots.29 -- see the history of the page for the specific changes.

The string proposed for the title of the chart and the blue/green lines are
good and can be implemented. The descriptions can be used as help clicking
"?".

I think the safest parameters to count the "Waiting for review" (green line)
is:

Status == "Review in Progress" AND
NOT "WIP" AND
Code-Review AND Verified == (blank OR +1)

Is this Code-Review condition missing something?

The "blank OR +1" is a better filter than the opposite, because Gerrit can
have -1, -2, and even +2 but "Can Merge No" for some reason external to the
Gerrit change.

(In reply to Alvaro from comment #25)

Ok, right now what is implemented is

NOT "WIP" AND NOT Code-Review-2 AND NOT Code-Review-1

and this is what you have now in production.

Note that apart from Code-Review (human evaluation) there is Verified (Jenkins). -1 (if it exists here, not sure) and -2 in Verified should be discarded in "Waiting for review".

See for instance https://gerrit.wikimedia.org/r/#/c/22167/ (no Code-Review, but -2 from Jenkins in "Verified").

All the rest in this chart looks good!

Question: where do you store these MySQL queries? I can only see the commits from the Owl Bot at https://github.com/Bitergia/mediawiki-dashboard/commits/master but I don't know where the bot gets the data from.

(In reply to Quim Gil from comment #27)

(In reply to Alvaro from comment #25)

Ok, right now what is implemented is

NOT "WIP" AND NOT Code-Review-2 AND NOT Code-Review-1

and this is what you have now in production.

Note that apart from Code-Review (human evaluation) there is Verified
(Jenkins). -1 (if it exists here, not sure) and -2 in Verified should be
discarded in "Waiting for review".

See for instance https://gerrit.wikimedia.org/r/#/c/22167/ (no Code-Review,
but -2 from Jenkins in "Verified").

Ok, we can add these new events and try to include them in the SQL queries.

All the rest in this chart looks good!

Great!

Question: where do you store these MySQL queries? I can only see the commits
from the Owl Bot at
https://github.com/Bitergia/mediawiki-dashboard/commits/master but I don't know where the bot gets the data from.

What you are seeing in this repository is the activity directly in the dashboard (HTML+CSS+JS+JSON metrics), not the code used to generate the metrics (in JSON format).

This code is in GrimoireLib and we have a branch for Wikimedia that is synced with master (main product) frequently.

In order to see this SQL, the latest commits are:

https://github.com/VizGrimoire/GrimoireLib/commit/8fe792fcb560a597df0d5f239e581eeeb89d3f5c

and we are modifying the SCR (Gerrit) metrics that are included from scr_metrics.py. In these commits you can see how the filters for "-2" and not WIP have been added.

Quim, we plan to add the Verified -1 and -2 filtering today, but the generation of the new metrics won't be available until tonight.

As soon as it is available, I will inform you.

Qgil lowered the priority of this task from High to Low. Jan 8 2015, 11:25 AM
Qgil removed Acs as the assignee of this task. Feb 5 2015, 10:50 AM
Qgil added a subscriber: Dicortazar.
Qgil raised the priority of this task from Low to Medium. Mar 13 2015, 1:27 PM
In T72278#745488, @Acs wrote:

Quim, we plan to add the Verified -1 and -2 filtering today, but the generation of the new metrics won't be available until tonight.

As soon as it is available, I will inform you.

@Dicortazar, what we need to know here is whether the Verified -1 and -2 filter has been implemented or not.

After checking the code, I'd say that the case of Verified = -1 and Verified = -2 has not been implemented at least in the metrics depicted in the first four charts of that page.

I may have missed something, so probably @Acs can confirm this part.

I've checked code under the mediawiki branch, specifically in:

I guess this should be done during the ECT-April.

After checking the code, I'd say that the case of Verified = -1 and Verified = -2 has not been implemented at least in the metrics depicted in the first four charts of that page.

I may have missed something, so probably @Acs can confirm this part.

I've checked code under the mediawiki branch, specifically in:

It seems this filter is not available anymore in the current code; I am not sure how it was implemented in the past. The best approach is to implement it in the current code base (the master branch already).

Aklapper subscribed.

According to Dicortazar in our meeting this needs some more discussion with @Qgil.
Also wondering whether the Patch-To-Review project is correct here or whether it should be removed?

According to Dicortazar in our meeting this needs some more discussion with @Qgil.

I'm all ears / eyes. :)

Quim, taking a look to current korma KPI:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

the first graph title is: "Backlog of open changesets (monthly snapshots)". I think this is not correct. In this graph we have the time evolution of open changesets. Is there any reason for this title?

About the goal of this ticket, the pending task is to add the filter: ... AND Verified == (blank OR +1). Is that correct? (A better filter than the original one: "Volume of open changesets" not filtered by Verified -1 and Verified -2.)

In T72278#1378848, @Acs wrote:

Quim, taking a look to current korma KPI:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

the first graph title is: "Backlog of open changesets (monthly snapshots)". I think this is not correct. In this graph we have the time evolution of open changesets. Is there any reason for this title?

This is the title and the concept we agreed when we were polishing this page. At the end of every month, we capture the total amount of open changesets, and from those, how many are waiting for review. We called them "monthly snapshots" to make sure that the numbers of every past month would stay unchanged.

Before, the numbers of previous months would change because the graph captured some kind of historical projection of the changesets open in the last month.

About the goal of this ticket, the pending task is to add the filter: ... AND Verified == (blank OR +1). Is that correct? (A better filter than the original one: "Volume of open changesets" not filtered by Verified -1 and Verified -2.)

Mmm ok, yes, this is better.

In T72278#1378848, @Acs wrote:

Quim, taking a look to current korma KPI:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

the first graph title is: "Backlog of open changesets (monthly snapshots)". I think this is not correct. In this graph we have the time evolution of open changesets. Is there any reason for this title?

This is the title and the concept we agreed when we were polishing this page. At the end of every month, we capture the total amount of open changesets, and from those, how many are waiting for review. We called them "monthly snapshots" to make sure that the numbers of every past month would stay unchanged.

Yes, the metric you are describing is what is implemented, but in my mind it is the "evolution in time of open changesets". In my mind the backlog refers to the analysis in time of the changesets that are open now. But if we decided in the past to use this name, and you are comfortable with it, it's ok.

Before, the numbers of previous months would change because the graph captured some kind of historical projection of the changesets open in the last month.

About the goal of this ticket, the pending task is to add the filter: ... AND Verified == (blank OR +1). Is that correct? (A better filter than the original one: "Volume of open changesets" not filtered by Verified -1 and Verified -2.)

Mmm ok, yes, this is better.

Great, so we will implement that then!

After reviewing the code, I have seen that we have already implemented this filter, but using the "Volume of open changesets" not filtered by Verified -1 and Verified -2 approach.

It is in:

https://github.com/VizGrimoire/GrimoireLib/blob/master/vizgrimoire/metrics/query_builder.py#L2235

It is simpler to implement it using this approach. Quim, is it enough?

@Acs, is it already in production? Can we close this ticket?

@Acs, is it already in production? Can we close this ticket?

Yes, in production for several months I think.

Let's close this ticket :).

Thanks @Acs and @Qgil!

Dicortazar moved this task from Backlog to Doing on the ECT-June-2015 board.