
Key performance indicator: analyze who contributes code
Closed, Resolved · Public

Description

We need metrics to answer these questions.

Who is contributing merged code each quarter? How is the weight of the WMF evolving? What regions have a higher density of contributors? The evolution of the total amount of merged commits should be visible too.

The work is being done at
https://www.mediawiki.org/wiki/Community_metrics#Who_contributes_code


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=37463

Details

Reference
bz53485

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:58 AM
bzimport set Reference to bz53485.

Bringing back the Highest priority since this is indeed a top task in our backlog.

There are a couple of highest-priority issues that we want to fix before tackling this one.

Is it ok to use Gerrit (vs Git) data to answer this question?

commits = merged submissions

In

http://korma.wmflabs.org/browser/scr.html

we are close to covering this "who contributes code" KPI.

We need to add the metrics computed in

https://www.mediawiki.org/wiki/Community_metrics#Who_contributes_code

We need the mapping between people and companies and countries, something covered in

https://bugzilla.wikimedia.org/show_bug.cgi?id=53489

and reports will be created for both as it exists now for repositories:

http://korma.wmflabs.org/browser/scr-repos.html

(In reply to comment #5)

Is it ok to use Gerrit (vs Git) data to answer this question?

Yes, Bug 53489 is a blocker of this one. We are talking about the same thing.

fyi I have included the delivery of this KPI in my monthly goals for October:

https://www.mediawiki.org/wiki/Engineering_Community_Team/Meetings#Monthly_goals

We are organizing an Engineering Community Team Showcase on Oct 29 via videoconference and it would be great to demo this KPI. Álvaro, all the better if you want to do it since you are the main developer working on this.

Let's clarify the vocabulary to be used in metrics reports:

Authors: anybody contributing a patch, regardless of it being merged or not.

Committers: the subset of authors extracted from the code that has been committed to the repository. (Alvaro, you seem to call these "mergers", which I find confusing.)

Reviewer: anybody commenting or evaluating a changeset in Gerrit (-2 to +2)

Merger: anybody able to exercise +2 permissions, merging proposed changesets to the repository.

Do you agree with this, or is there a better way to define these roles?

"Mergers" right now are not committers. We are using committers in the Git analysis but not in the Gerrit one. In Gerrit we call "mergers" the contributors whose patches have been merged.

We are using authors for the people who create the contributions committed in Git, and committers for the people who commit the contributions from authors. Sometimes an author and a committer can be the same person, if the author has commit rights.

So thinking in a map between git and gerrit:

  • Author -> Merger
  • Committer -> Reviewer?
  • -> Contributor (git does not have knowledge about contributions not committed)

Author, committer and reviewer are clear terms. "Merger" is confusing.

As far as I know it's not a term used to describe a person, much less contributors whose patches have been merged. If anything, it sounds like contributors with +2 permissions in Gerrit who merge code from others.

But we really need to clean this terminology:

http://korma.wmflabs.org/browser/scr.html mentions "top openers" (what is this?) and "top mergers".

Clicking a contributor, e.g.

http://korma.wmflabs.org/browser/people.html?id=278&name=Leslie%20Carr

you see "commits" and "closed reviews", and none of the values match.

I would start by renaming "mergers" to "authors of merged code". We don't need to force a single word if that word doesn't exist. It is more important to have self-explanatory labels in metrics.

Alvaro, how do you think you are doing with this KPI? As of today none of the questions in comment #0 is answered yet. Do you think this will be ready by the ECT Showcase meeting next week?

Taking a look at the original questions:

  • Who is contributing merged code each quarter?

We know who is contributing code and we can create SQL queries for this specific case. Right now we are showing it aggregated in:

http://korma.wmflabs.org/browser/scr.html

We have also the top contributors.

If we want the data grouped by quarter, the best thing right now is to create SQL queries and get the data in text lists. Is that ok, Quim?

Later, we can find the best way to visualize it. Probably just with tables.

  • How is the weight of the WMF evolving?

This is something answered in the organizations report:

http://korma.wmflabs.org/browser/scr-companies.html

But we cannot see how it is evolving. Maybe it is just a matter of doing the SQL analysis for:

2000, 2005, 2010, 2013.

  • What regions have a higher density of contributors?

We don't have region information yet, so it is not possible to answer this one.

  • The evolution of the total amount of merged commits should be visible too.

This is in:

http://korma.wmflabs.org/browser/scr.html

So from the 4 questions, the final one is answered and I can create queries to answer the other two. I will include them in a report.

You are right, Quim, we have been putting a lot of effort into the web browser, but these questions were not answered yet!

Ok, then

  • Who is contributing merged code each quarter?

What about this:

  • List of all-time top 100 individual "mergers" (as they are called now). Currently only 10 are shown.
  • List of top 25 individual "mergers" of each quarter.
  • List of all-time organizations, all of them.
  • List of organizations of each quarter.
  • How is the weight of the WMF evolving?

analysis for:

2000, 2005, 2010, 2013.

Why not add the % of commits of each organization in each quarter, in the same "List of all-time organizations" and "List of organizations of each quarter"?

Ok, we can do it that way. I am working now on SQL queries to get all this data, and we can see later how to present it. Maybe new HTML pages for Top Contributors and Top Orgs including this data.

Quim, I already have the queries to get this information. Now we need to format it into a report. The latest SQL database is always available at:

http://korma.wmflabs.org/browser/data/db/

In this case, we need the reviews database so:

acs@lenovix:/tmp$ wget http://korma.wmflabs.org/browser/data/db/reviews.mysql.7z
acs@lenovix:/tmp$ 7zr x reviews.mysql.7z
acs@lenovix:/tmp$ mysqladmin -u root create wikimedia_gerrit
acs@lenovix:/tmp$ mysql -u root wikimedia_gerrit < reviews.mysql

And now it is time to play with SQL:

  • Who is contributing merged code each quarter?

acs@lenovix:/tmp$ mysql -u root wikimedia_gerrit

SELECT total, name, email, quarter, year
FROM (
    SELECT COUNT(i.id) AS total, upeople_id, p.id, name, email, user_id,
           QUARTER(submitted_on) AS quarter, YEAR(submitted_on) AS year
    FROM issues i, people p, people_upeople pup
    WHERE i.submitted_by = p.id AND pup.people_id = p.id AND status = 'merged'
    GROUP BY upeople_id, year, quarter
    ORDER BY year, quarter, total DESC
) t
WHERE total > 50;

With this query you get a list, ordered in time across all quarters, of the contributors with more than 50 merged contributions in a quarter.

We could create one query per quarter in order to get the top 25 "mergers", but with just one query we can get the full picture.
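For the record, the per-quarter top lists could also be derived from the single query's output in a post-processing step. A minimal Python sketch, assuming rows shaped like the query's (total, name, quarter, year) columns; the function name and sample rows are made up for illustration, not Korma data:

```python
from collections import defaultdict

def top_per_quarter(rows, n=25):
    """Bucket (total, name, quarter, year) rows by quarter and keep the top n."""
    buckets = defaultdict(list)
    for total, name, quarter, year in rows:
        buckets[(year, quarter)].append((total, name))
    # Sort each bucket by merged-contribution count, descending.
    return {q: sorted(v, reverse=True)[:n] for q, v in sorted(buckets.items())}

# Hypothetical rows, not real Korma data.
rows = [
    (120, "alice", 1, 2013),
    (80, "bob", 1, 2013),
    (200, "carol", 2, 2013),
]
print(top_per_quarter(rows, n=2))
```

This keeps the single-query approach while still letting a report show a fixed-size list per quarter.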

And for organizations it is pretty similar. In this case the query returns just 60 rows, so it is usable without writing specific queries for each quarter.

SELECT COUNT(i.id) AS total, c.name,
       QUARTER(submitted_on) AS quarter, YEAR(submitted_on) AS year
FROM issues i, people p, people_upeople pup,
     acs_cvsanaly_mediawiki_2029_1.upeople_companies upc,
     acs_cvsanaly_mediawiki_2029_1.companies c
WHERE i.submitted_by = p.id AND pup.people_id = p.id
  AND pup.upeople_id = upc.upeople_id AND upc.company_id = c.id
  AND status = 'merged'
GROUP BY year, quarter, c.id
ORDER BY year, quarter, total DESC;

Quim, I have attached to this issue people-quarters.txt and orgs-quarters.csv with the results of these queries.

With these results we can generate HTML tables.

(In reply to comment #14)

Quim, I have already the queries to get this information. Now we need to format it a report. You have always the last SQL database in:

http://korma.wmflabs.org/browser/data/db/

In this case, we need the reviews database so:

acs@lenovix:/tmp$ wget http://korma.wmflabs.org/browser/data/db/reviews.mysql.7z
acs@lenovix:/tmp$ 7zr x reviews.mysql.7z
acs@lenovix:/tmp$ mysqladmin -u root create wikimedia_gerrit
acs@lenovix:/tmp$ mysql -u root wikimedia_gerrit < reviews.mysql

You will also need the unique identities db:

acs@lenovix:/tmp$ wget http://korma.wmflabs.org/browser/data/db/source_code.mysql.7z
acs@lenovix:/tmp$ 7zr x source_code.mysql.7z
acs@lenovix:/tmp$ mysqladmin -u root create acs_cvsanaly_mediawiki_2029_1
acs@lenovix:/tmp$ mysql -u root acs_cvsanaly_mediawiki_2029_1 < source_code.mysql

Report for Wikimedia contributors per quarter

Attached:

Report for Wikimedia orgs contributors per quarter

Attached:

Ok, this is a start. Thank you!

Why does the data start in 2011 only? Is it because we are only looking at code merged via Gerrit?

If we start with tables then the format we probably want is:

Columns: All-time | 4Q 2013 | 3Q 2013 | 2Q 2013 | ...

Starting with the most recent quarter and going backward, to avoid requiring horizontal scrolling to check the most recent data. If we have a graph then we can find better solutions, e.g. showing the tables for all-time and the last 4 quarters only, and then providing the rest of the data only through mouseover on the graph.

Rows: #position. Contributor - org #commits (%total).

For instance:

  1. Leslie Carr (WMF) 1234 (12.7%)
  1. Wikimedia Foundation 12345 (72.1%)

(In reply to comment #13)

we can see later how to present it. Maybe new HTML pages for Top
Contributors and Top Orgs including this data.

Why not having everything at http://korma.wmflabs.org/browser/scr.html ?

Ok, we can try it. Include here the contributors (people and orgs) using the above criteria. First step is to include the Top 100 lists with the search field. Then we can think about a GUI for filtering by quarters: your proposed solution, or maybe a list selector with all the quarters, so that once you select a quarter, all the page contents adjust to it.

This is an idea we have been playing with: all the contents in the browser should have the possibility of being filtered by dates. But you cannot easily compare quarters on the same page. You can open two tabs, but it is not ideal.

Ok, time to work on it. As soon as I get results, I will share with you.

(In reply to comment #18)

Ok, this is a start. Thank you!

Why the data starts in 2011 only? Is it because we are only looking at code merged via Gerrit?

Yes, as you can see in

http://korma.wmflabs.org/browser/scr.html

SCR activity starts on Sep 2011.

But since the question is "Who contributes code", and in the SVN/Git history we also have the email addresses of the contributors, could we answer the question properly since development started?

Not for next week, but at some point?

Yes, at some point we should mix SCR and SCM information. But for now we decided to focus on SCR as the data source for the contributions analysis. We also have the global view for SCM, which helps with understanding this, but the details are for SCR for now.

I will focus on getting everything right for SCR, and then move effort to SCM and understand how to join the information from the two data sources.

SCM (Source Code Management): git
SCR (Source Code Review): gerrit

Is that ok? Maybe we can open a new issue for it so we can move on closing tickets.

(In reply to comment #23)

SCR (Source Code Review): gerrit

Is that ok? Maybe we can open a new issue for it so we can move on closing tickets.

Ok, let's nail down first this KPI in the context of Gerrit data. Once we resolve this task as FIXED we will figure out what is more important, adding the SVN history or moving on to the next KPI.

SVN history ... git history now :) SVN analysis is more ... elaborate. It is pretty good to have Git instead. Good weekend, Quim!

Quim, after some development, first results:

http://korma.wmflabs.org/browser/contributors.html

You can see the total contributors and companies lists, and the quarterly reports for companies.

Including a quarterly report in the HTML page is pretty easy:

<div class="Contribs" data-type="companies" data-search="false"
data-quarter="2012 1"></div>

It is the same for companies and people (and other items like countries in the future).

It is just a work in progress, but the basics (the data and the JavaScript logic to process it) are done.

Next week we can continue advancing the quarterly reports to make them more useful.

(In reply to comment #26)

http://korma.wmflabs.org/browser/contributors.html

Good!

Please follow the format proposed at Comment #18, or propose a better format. :)

An "All-time" header for the corresponding lists is needed.

About the orgs by quarter: as suggested, it is better to start with the most recent complete quarter (2Q 2013) and then list the rest backward.

It is not clear where the lists of individuals per quarter will fit. At least there is space for the last complete quarter next to the all-time list.

Álvaro, we will demo http://korma.wmflabs.org/browser/contributors.html in exactly 24 hours.

That page still needs some work on low-hanging fruit. Did you see my pull requests in GitHub? If I did something wrong please comment on them; otherwise, why not take them. :)

This is my next task: taking a look at the pull requests and pushing them. I hope I can do it in the next two hours!

Oops, contributors.html is in an early beta state, but ok, let's try to dress it up a bit.

Quim, you have all your changes now in:

http://korma.wmflabs.org/browser/contributors.html

I have tried to give you credit for the changes, but I had also changed contributors.html, so no automatic merge was possible, and during the manual process your commits do not appear. Sorry about that.

Status of this report:

(In reply to comment #0)

Who is contributing merged code each quarter?

Answered at http://korma.wmflabs.org/browser/contributors.html

How is the weight of the WMF evolving?

Pending. This will be answered by a graph like http://activity.openstack.org/dash/newbrowser/browser/scm-companies-summary.html

What regions have a higher density of contributors?

This is being addressed at Bug 55626

The evolution of the total amount of merged commits should be visible too.

Mmm... We have total amount of commits merged at http://korma.wmflabs.org/browser/scm.html , and we have graphs showing commits per month. Do we have a graph showing the total amount of commits? Do we need it, actually?

(In reply to comment #31)

Status of this report:

(In reply to comment #0)

Who is contributing merged code each quarter?

Answered at http://korma.wmflabs.org/browser/contributors.html

This graph is obviously not excluding self-merges (there's even L10n-bot), it's not particularly useful.

L10n-bot channels the contributions from the translatewiki.net community. Are these contributions we want to count? If not, why not?

Are there other self-merges? Should they be excluded?

(In reply to comment #33)

L10n-bot channels the contributions from the translatewiki.net community. Are
these contributions we want to count? If not, why not?

Certainly, but they are a different type of contributor and form a (mostly) separate community, so are probably best analyzed separately. Additionally from what I understand l10n-bot runs at regular intervals copying over all translations made in that interval. A commit by l10n-bot might represent a single contribution by a single translatewiki user, or it might represent the contribution of a hundred translatewiki users. Thus graphing l10n-bot commits tells us nothing about the translatewiki community.

(In reply to comment #33)

Are there other self-merges?

Several of those I see there look like self-merges, I've not run my own stats on it.

Should be excluded?

Self-merges are not code review: they are either a routine practice on the repos where this is accepted (so they only measure commit activity there) or something discouraged where self-merges should generally not exist (like on core, where you don't want to "credit" them).

Ref L10n-bot: ok, let's not count it. Requested at Bug 53489 comment 24. If you find more users that should be removed, please report them in that bug report. Thank you!

(In reply to comment #36)

Ref L10n-bot: ok, let's not counted. Requested at Bug 53489 comment 24. If you find more users that should be removed please report them in that bug report. Thank you!

How about self-merges by human users?

I confess I don't know anything about self-merges. Are they still code contributions? Then they should be counted in our metrics about code contributions.

About the "time to respond" metrics: ok, if there are instant merges they are deforming the picture and we need to address this. Questions:

Won't they be discarded (or minimized) by calculating the median instead of the plain average?

If they still distort the data when calculating the median, I guess we can just remove all self-merges automatically. Are they marked in the database as such? Otherwise I guess we rely on other facts, like them only being reviewed by Jenkins, or being resolved in less than x seconds (?).
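As an aside, a toy illustration of the median point (hypothetical review times in hours, not Korma data): a handful of near-instant merges drags the plain average well below the typical value, while the median barely moves:

```python
from statistics import mean, median

# Hypothetical review times in hours; the near-zero values stand in
# for "instant" (e.g. self-) merges that distort the plain average.
review_times = [0.01, 0.01, 0.01, 10, 12, 14, 16]

print(round(mean(review_times), 2))  # dragged down by the instant merges
print(median(review_times))          # the typical review time survives: 10
```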

(In reply to comment #38)

I confess not knowing anything about self-merges.

"+2 is for code review, not merging your own stuff" ([[mw:Gerrit/+2]]). Self-merges i.e. +2 on own commits are not "real" +2.

Guys, the evolution over time, and the global totals, of merged reviews by company:

http://korma.wmflabs.org/browser/scr-companies-summary.html

It answers "How is the weight of the WMF evolving?".

(In reply to comment #36)

Ref L10n-bot: ok, let's not counted. Requested at Bug 53489 comment 24. If you find more users that should be removed please report them in that bug report. Thank you!

I will remove L10n-bot in next iteration.

How is the weight of the WMF evolving?

Pending. This will be answered by a graph like http://activity.openstack.org/dash/newbrowser/browser/scm-companies-summary.html

http://korma.wmflabs.org/browser/scr-companies-summary.html

Sorry about not using the thread before!

(In reply to comment #38)

I confess not knowing anything about self-merges. Are they still code contributions? Then they should be counted in our metrics about code contributions.

About the "time to respond" metrics ok, if they are instant merges they are deforming the picture and we need to address this. Questions:

Which are the "instant merges"? Self-merges? We should check it.

Won't be they be discarded (or minimized) by the calculation of the median, instead of plain average?

The median is better in any case for time to review. We should change to it.

If they still distort data when calculating the median, I guess be can just remove all self-merges automatically. Are they marked in the database as such?

I think we can just check whether the submitter and the +2 reviewer are the same person.
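That check amounts to a simple filter; a minimal sketch with hypothetical records (the record layout and names are made up, not the real reviews schema):

```python
# Hypothetical review records: (change_id, submitter, plus2_reviewer).
# A self-merge is a change whose submitter also cast the +2.
reviews = [
    (1, "alice", "bob"),
    (2, "carol", "carol"),  # self-merge
    (3, "dave", "erin"),
]

def exclude_self_merges(records):
    """Keep only changes where the +2 came from someone other than the submitter."""
    return [r for r in records if r[1] != r[2]]

print([r[0] for r in exclude_self_merges(reviews)])  # → [1, 3]
```

In SQL this would amount to joining the change's submitter against the account that cast the +2 and keeping only rows where they differ.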

Otherwise I guess we rely on other facts, like they are only reviewed by Jenkins, or they are resolved in less than x seconds (?).

Nothing is reviewed by Jenkins. Things get verified by Jenkins, and things get submitted by Jenkins (sometimes the word merged is used for this act of submitting, which is technically correct but different from how we use the word casually), but nothing is reviewed by Jenkins.

(In reply to comment #40)

Guys, the evolution in time and global of merged reviews by company:

http://korma.wmflabs.org/browser/scr-companies-summary.html

It answers "How is the weight of the WMF evolving?".

Interesting! Just a couple of formal details:

  • The graph starts in January 2002 but there is no data before August 2011. Most of the graph is empty, which is not very useful. Please sync graph and data. By the way, the same happens in other Code Review graphs.
  • Mouseover brings data on top of the graph, and currently on top of your mouse. Not very usable.
  • The legend overlaps the graph.

These types of problems are quite common in our dashboard. We should check for them at least whenever we publish new graphs. Ideally Grimoire would prevent these problems by design. Should I file separate bugs?

(In reply to comment #43)

About the "time to respond" metrics ok, if they are instant merges they are
deforming the picture and we need to address this. Questions:

Which are "instant merges"? self-merges? We should check it.

"Instant merges" are not defined, but the fastest merges are usually either self-merges or a consequence of team work (typically staff). After excluding self-merges (where the owner and the +2'er are the same person) the graph will become meaningful; the median is ok, and you could also add e.g. the 95th percentile to see how many super-fast merges we have, in case we're missing something.

(In reply to comment #45)

(In reply to comment #40)

Guys, the evolution in time and global of merged reviews by company:

http://korma.wmflabs.org/browser/scr-companies-summary.html

It answers "How is the weight of the WMF evolving?".

Interesting! Just a couple of formal details:

  • The graph starts in January 2002 but there is no data before August 2011. Most of the graph is empty, which is not very useful. Please sync graph and data. By the way, the same happens in other Code Review graphs.

Ok, we have a param in the widgets that cuts the empty series.

  • Mouseover brings data on top of the graph, and currently on top of your mouse. Not very usable.

I will play a bit with it.

  • The legend overlaps the graph.

I will play a bit with it.

The types of problems are quite common in our dashboard. We should check them at least whenever we publish new graphs. Ideally Grimoire would prevent this problems by design. Should I file separate bugs?

We can solve these issues in this ticket. We are migrating the dashboard to the new product version in the coming weeks, so it is better to improve these kinds of things globally there.

(In reply to comment #45)

(In reply to comment #40)

Guys, the evolution in time and global of merged reviews by company:

http://korma.wmflabs.org/browser/scr-companies-summary.html

It answers "How is the weight of the WMF evolving?".

Interesting! Just a couple of formal details:

http://korma.wmflabs.org/browser/scr-companies-summary.html

The new viz includes your suggestions.

Also, we have updated to the latest VizJS-lib version.

[mildly off topic suggestions] It might also be cool to further split up WMF (since it's such a big contributor) into departments (Features, Platform, Ops, Other). It would also be interesting to see which groups are being reviewed by which other groups (e.g. do WMF staff mostly review other WMFers' code, or does everyone's code get reviewed evenly?)

(In reply to comment #49)

It would also be interesting to see which groups are being reviewed by which other groups (e.g. Do WMF mostly review other WMFers code or does everyone's code get reviewed evenly)

This would also be indirectly addressed by the "Time to review" metric (which AFAIK is among the defined KPIs?), if that was also split by org. Though of course it's possible that volunteers review volunteers and staffers review staffers without this affecting time; in practice, the unreviewed mediawiki.* commits are always in the range of 70-80 % non-WMF ownership, although volunteers have way less than 70-80 % of all commits.

(In reply to comment #49)

[mildly off topic suggestions] It might also be cool to further split up WMF (since its such a big contributor) into departments (Features, Platform, Ops, Other).

Interesting idea, but out of scope in this bug / round. It might be interesting, although I'm not sure how much. In any case this would put more pressure on reliable user data. Therefore, I see it as an improvement blocked by

Bug 58585 - Allow contributors to update their own details in tech metrics directly

It would also be interesting to see which groups are being reviewed
by which other groups (e.g. Do WMF mostly review other WMFers code or does
everyone's code get reviewed evenly)

See Bug 37463 - Key performance indicator: Gerrit review queue + dependent reports.

(In reply to comment #32)

This graph is obviously not excluding self-merges (there's even L10n-bot),
it's not particularly useful.

Let's see. When it comes to code MERGED we have two options in relation to bots:

  1. Remove all data from all identified bots altogether. Meaning that, in practical terms, their commits don't exist in our tech metrics.
  2. Remove bots from rankings in order to "promote" the actual humans and coding tasks. However, their data still counts in the totals.

What should we do with L10n-bot? Do these i18n string commits count as code contributions or not?

We are in the final sprint:

http://korma.wmflabs.org/browser/who_contributes_code.html

Points for Alvaro:

Blockers

  • The search boxes don't seem to work. What result is expected if I search "Quim"? Do we need two search boxes?
  • Lists of people: let's have all-time and last quarter, instead of the two previous quarters.
  • Just checking: "Siebrand 419" means that Siebrand is the author of 419 patches that have been merged to key Wikimedia projects in the last quarter, right?
  • Based on comment 34, let's not compute translatewiki.net data in this KPI.
  • In "What regions have a higher density of contributors?", "Unknown" should be added, as requested in Bug 55626

Non-blockers

  • "How is the weight of the WMF evolving?" starts on March 2013, while "The evolution of the total amount of merged commits" starts on September 2011. Is there a reason to have different starting points? If you ask me, the longer the history we can display, the better...
  • In fact, "How is the weight of the WMF evolving?" already shows implicitly the amount of commits merged every month. Could we simply add the total amounts and be done with one graph?
  • I still think that adding another graph like "How is the weight of the WMF evolving?" but based on % would be useful to see clearly the trends of each organization (if any).

There is more work to be done in the descriptions and the organization of the page, but we can do this directly through pull requests at https://github.com/Bitergia/mediawiki-dashboard/blob/master/browser/who_contributes_code.html

(In reply to comment #53)

We are in the final sprint:

http://korma.wmflabs.org/browser/who_contributes_code.html

Points for Alvaro:

Blockers

  • The search boxes don't seem to work. What result is expected if I search "Quim"? Do we need two search boxes?

Fixed search box, and activated only for the long all contributors list.

  • Lists of people: let's have all-time and last quarter, instead of the two previous quarters.

Done

  • Just checking: "Siebrand 419" means that Siebrand is the author of 419 patches that have been merged to key Wikimedia projects in the last quarter, right?

Right!

  • Based on comment 34, let's not compute translatewiki.net data in this KPI.

We have filtered it out for contributors, but should we also filter it out for orgs?

  • In "What regions have a higher density of contributors?", "Unknown" should be added, as requested in Bug 55626

Done!

On a second look I have realized that the graphs at http://korma.wmflabs.org/browser/scr-countries.html count reviews. I think it would make more sense for them to count authors.

I mean, when it comes to organizations it does make sense to see which organization is funding how much work, and it is good to count that work in reviews. However, our interest in the location of contributors is based on the people, less on the amount of reviews.

In the case of our community it is clear that most reviews come from the USA and Germany (when the devs fill in their data), because this is where most WMF and WMDE (professional, full-time) developers are located. Still, if there are a dozen developers with just a bunch of commits in some other country, we definitely want to know. In this case, 10 developers with 5 merged commits each have more relevance than a single developer with 50 commits.

Conclusion: it would be good to have the data based on authors. If you want to keep the current graphs that is fine too.

When it comes to http://korma.wmflabs.org/browser/who_contributes_code.html , we will swap "Submitted per country (aggregated)" for the graph by people as soon as it is available. But this is not a blocker for the KPI anymore, as agreed.

(In reply to comment #55)

Sorry, that comment belongs to bug 55626

Can we get some KPIs on the smaller stuff?
How about:

What repositories get the fewest contribs/reviews
Who contributes/reviews most of the code that no one else contributes/reviews (who takes care of the orphan projects)
Who makes the most changes to the fields of a ticket (who does triage?)
Which repos have the longest time between first and last patch
Which repos have the longest time between first/last patch and merge?

Usually I find these kinds of things more interesting than 'top performance' indicators.

(In reply to comment #57)

Can we get some KPIs on the smaller stuff ?

Just a bit of background: we agreed to focus our work on five KPIs described at https://www.mediawiki.org/wiki/Community_metrics#Key_performance_indicators . Here we are working on the first one, related to code contributions. In addition to this http://korma.wmflabs.org offers more data.

The answers below refer to the data provided by the metrics dashboard in Korma alone.

How about:

What repositories get the least contribs/reviews

Contributions: http://korma.wmflabs.org/browser/scm-repos.html?page=24 and up.

Reviews: http://korma.wmflabs.org/browser/scr-repos.html?page=24 and up.

Who contributes/reviews most of the code that no one else contributes/reviews (who takes care of the orphan projects)

Currently we are not displaying authors per repo, and we don't have a way to distinguish the repositories an author or a reviewer contributes to. Is this what you mean? It is a good point, and it would be great if you could open a new report to cover it.

Who makes the most changes to the fields of a ticket (who does triage?)

Do you mean triage in Bugzilla? There is not much now http://korma.wmflabs.org/browser/its.html but we want to work more on this when we address the KPI Bugzilla response time: https://www.mediawiki.org/wiki/Community_metrics#Bugzilla_response_time

Which repo's have the longest time between first and last patch
Which repo's have the longest time between first/last patch and merge ?

Do you mean the first and last patch in the review queue? We are measuring the time to review patches by repository at http://korma.wmflabs.org/browser/scr-repos.html . If you need something different not covered in the currently planned KPIs, then please open a report.

Usually I find these kinds of things more interesting than 'top performance'
indicators.

Yes, in these KPIs we attempt to look not only at top performers but also at bottlenecks and neglected areas. For instance, when looking at times to review in the Gerrit queue, we sort the lists starting with the repos with the longest time to review. See Bug 37463 - Key performance indicator: Gerrit review queue

(In reply to comment #55)

In a second look I have realized that the graphs at http://korma.wmflabs.org/browser/scr-countries.html count reviews. I think it would make more sense that they would count authors.

I mean, when it comes to organizations it does make sense to see which organization is funding how much work, and it is good to count that work in reviews. However, our interest in the location of contributors is based on the people, less on the amount of reviews.

In the case of our community it is clear that most reviews come from USA and Germany (when the devs fills their data) because this is where most WMF and WMDE (professional, full time) developers are located. Still, if there are a dozen of developers with just a bunch of commits in some other country we definitely want to know. In this case, 10 developers with 5 merged commits each has more relevance than a single developer with 50 commits.

Conclusion: it would be good to have the data based on authors. If you want to keep the current graphs that is fine too.

When it comes to http://korma.wmflabs.org/browser/who_contributes_code.html , we will swap "Submitted per country (aggregated)" for the graph by people as soon as it is available. But this is not a blocker for the KPI anymore, as agreed.

Ok Quim, I will take a look and try to use authors also in this report.

As agreed with Alvaro, his only remaining task in this report is to edit the graph "The evolution of the total amount of merged commits" at

http://korma.wmflabs.org/browser/who_contributes_code.html

so that the starting date is the same as "Reviews merged".

After this is done the report will be assigned to me, since I still want to improve the titles and descriptions via HTML edits.

Alvaro, I intend to finish my part this week because I really really want to show a first complete KPI at the Engineering Community Team showcase next week. :)

Quim, you will have this change tomorrow for the review meeting! :)

And I hope some advances also in KPI gerrit review queue.

Alvaro has made the last change to http://korma.wmflabs.org/browser/who_contributes_code.html required from him in this report. I'm taking it now.

http://korma.wmflabs.org/browser/scr.html is quite mysterious for me, I'm unable to extract any meaningful information from it.

  • "pending" graph has no legend, the ? icon does nothing. Worth noting if you're filtering any repo, or -1/-2 commits, or not.
  • "Review time in days": absolutely no idea what this is. The legend is just a tautology so it doesn't explain anything: "Review time in days: Median review time in days".
  • "submitted vs. Merged changes vs. Abandoned": this is the only clear part of the page. :)
  • "code reviews waiting for reviewer" doesn't make any sense: a code review (which is composed of comments and a label like +1, +2) always has an author. Perhaps this means "commits waiting for reviews", but from the examples below I can't tell.
  • "code reviews waiting for submitter" presumably means "commits waiting for merge" ("submit" is ambiguous, better not use it). Note that merge depends on +2 which is one code review label. If that's the meaning,
  • "Top Successful submitters" per above, confusing: do you mean commit authors, or commit mergers/approvers/+2'er?

Note that self-merges have not been excluded yet, but they're in the process of being filtered at last according to bug 37463 comment 30.

Nemo, we will try to clarify all these points.

But our current effort is on closing KPIs.

We will clarify and fix all of this once the KPIs are closed. Thank you very much for your valuable collaboration.

Qgil lowered the priority of this task from High to Low.Nov 23 2014, 11:05 PM
Qgil removed Qgil as the assignee of this task.Feb 2 2015, 11:56 AM

My March 2014 comment still applies. I've not been able to use http://korma.wmflabs.org/browser/scr.html so far.

@Nemo_bis, sorry, I have created a specific task for this at T97118, since this one focuses on another page, http://korma.wmflabs.org/browser/who_contributes_code.html

I'm taking this task to define what still needs to be done in order to close it.

Qgil raised the priority of this task from Low to Medium.Jun 1 2015, 6:51 AM

I'm sorry it took me so long to check http://korma.wmflabs.org/browser/who_contributes_code.html

In fact, it looks good to me. Resolving.