
Relating tech contributors with organizations
Closed, ResolvedPublic

Description

Currently we have no way to map tech contributors to their organizations. For instance, many WMF employees commit code with their personal email addresses. As long as this happens, we can't properly assess who contributes code:

https://www.mediawiki.org/wiki/Community_metrics#Who_contributes_code

So far we have published a form that allows contributors to enter their data:

https://docs.google.com/forms/d/1RFUa2zBAOolw78W-ozJPoYlR2lYbrAOYvOZYgjaAYQg/viewform

What needs to be done is to integrate that data at http://korma.wmflabs.org and document the process to update the data.

Not in the scope of this report: come up with a way to allow contributors to manage this data directly through their user profiles.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=55626
https://bugzilla.wikimedia.org/show_bug.cgi?id=58585

Details

Reference
bz53489

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 1:59 AM)
bzimport set Reference to bz53489.

Note: what matters more in the short term are the organizations. Also, that information is public, at least for WMF / WMDE employees and other professionals. This is a never-ending task, but in order to consider this report resolved we should have the top 100 code contributors identified with their orgs.

Contributors' location is completely opt-in. Whatever data we get will be good.

Bringing back the Highest priority since this is indeed a top task in our backlog.

Top 100 code contributors using git commits or gerrit merged reviews?

Right now we have a total of 223095 commits with 62631 described as merged reviews. So if we use gerrit data we have 28% of total code contribution activity covered.

If we analyze data from 2012 we have 61050 merged reviews out of 104399 total commits.
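As a quick check of the figures above, here is a minimal sketch that computes the share of commits covered by merged reviews. The function name `coverage` is only illustrative; the numbers are the ones quoted in this comment.

```python
def coverage(merged_reviews: int, total_commits: int) -> float:
    """Return merged reviews as a percentage of total commits."""
    return 100.0 * merged_reviews / total_commits


# Figures quoted in the comment above.
all_data = coverage(62631, 223095)
data_from_2012 = coverage(61050, 104399)

print(f"All data:       {all_data:.1f}% of commits are merged reviews")
print(f"Data from 2012: {data_from_2012:.1f}% of commits are merged reviews")
```

With the 2012 figures the gerrit data covers more than half of the commit activity, which supports starting from merged reviews.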

Good point. Let's start with the authors who get their code merged.

Great! The email data is more accurate so it is easier to do the mapping.

Maybe the best way is to share a spreadsheet with the top 100 and a company field that is initially filled in automatically from the email domain and that we can then check by hand.

Then the companies to people mapping will be added to the Grimoire identities db and the companies report can be created.

I will post the progress on this in this ticket!
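The automatic pre-fill described above could be sketched roughly as follows. This is only an illustration: the function name, the column names and the sample contributors are hypothetical, and the real data would come from gerrit rather than a hard-coded list.

```python
def guess_org_from_email(email: str) -> str:
    """Return the lower-cased domain of an email address as a first guess.

    Returns an empty string when the address has no "@" so that the row
    is clearly left for manual review.
    """
    email = email.strip().lower()
    if "@" not in email:
        return ""
    return email.rpartition("@")[2]


# Hypothetical sample rows (name, email); not real contributors.
contributors = [
    ("Contributor A", "a@wikimedia.org"),
    ("Contributor B", "b@wikimedia.de"),
    ("Contributor C", "c@example.com"),
]

# One spreadsheet row per contributor: the guess is automatic,
# the checked value is filled in by hand afterwards.
rows = [
    {
        "name": name,
        "email": email,
        "org_guess": guess_org_from_email(email),
        "org_checked": "",
    }
    for name, email in contributors
]

for row in rows:
    print(row)
```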

(In reply to comment #5)

Maybe the best way is to share a spreadsheet

Remember that we already have one spreadsheet associated with the form linked in the first post. Let's expand on it rather than create a new one, ok?

This is one of those situations where we are dealing with information which is 100% public but scattered. We need to be careful before having a public spreadsheet where anybody can copy the email addresses of the top 100 Wikimedia tech contributors in two clicks.

Sure Quim, I have been working with the spreadsheet from the beginning. But it seems it will serve more as a data source to confirm and complete the results of a first automatic analysis of the gerrit data.

But yes, we should try to reuse it and have only one spreadsheet.

I have added a new sheet with the 45 people from the Top100 whose emails are not from wikimedia (for the other 55 the affiliation is WM) and researched their affiliation. Quim, could you take a look at the results?

Is it a good idea to join WMF and WMF DE?

(In reply to comment #8)

I have added a new sheet with the 45 people from the Top100 whose emails are
not from wikimedia (the other 55 the affiliation is WM) and research the
affiliation. Quim, could you take a look to results?

Will do, hopefully tomorrow.

Is it a good idea to join WMF and WMF DE?

Nope. The Wikimedia Foundation and Wikimedia Deutschland (without F of Foundation) are separate organizations.

Great! I made a few corrections for the most evident cases (WMF former employee, WMDE instead of WMF) but other cases are less clear to interpret, even knowing the data. For instance, Nishayn22 has worked as a WMF contractor but honestly I have no idea which of his work is done as a volunteer and which as a contractor, and I don't know his professional situation today - or tomorrow.

Proposal: add a spreadsheet with the people among the top 100 committers who didn't fill in the form (including their email addresses) and I will send them a request to fill it in. There they can define their affiliation themselves. Those not answering will default to "independent".

That said, I think all this already looks good enough for a first version of Bug 53485 - Key performance indicator: analyze who contributes code. Once these metrics are public we can respond to feedback from users with wrong/missing data.

Also, the "Name" column contains many entries that are not real names. Should I edit them?

I won't spend time updating the "Name" column for now. I feel the current id could be enough for people to identify themselves in the report. Later we can provide a way for the users to change that.

Quim, using the data from our spreadsheet we have now a SCR companies report:

http://korma.wmflabs.org/browser/scr-companies.html

About the proposal of detecting those in the Top100 who did not complete the form: if they use "wikimedia.org/de" I think that is enough for now to relate them to a company. Do you feel uncomfortable with that? If so, I will complete this list and send it to you.

Good to see some data!

Let's call them "Organizations" instead of "Companies". This is a good idea for free software projects in general, and in our case it's even more needed: WMF and WMDE are non-profits. The story is a bit more complicated than that but calling these orgs companies will raise the eyebrows of some people.

What is "NA"? If it's Not Available then we should consider those "Unknown".

gmail and live should be "Unknown" as well.

Also, can we add instructions for contributors to get their data corrected? Or a link to a page with the instructions. It is everybody's interest to keep that "Unknown" as small as possible.

There are more cases that should be aggregated to "Unknown":

users, adres, email, hotmail, yahoo, gmx, googlemail

About the rest, most of them could be aggregated to "Independent"

The end result could then be:

  1. Wikimedia Foundation
  2. Unknown
  3. Wikimedia Deutschland
  4. Independent
  5. Wikia
  6. WikiWorks
  7. OmniTI

This would identify a lot better what we know about contributing organizations in the MediaWiki community.

I just learned that "NA" are bots (or the only bot so far). If this refers to the i18n bot then I believe it is fair to say that these contributions come from the TranslateWiki community, and it would be good to identify them as an organization.

Ok Quim, so:

  1. Wikimedia Foundation
  2. Unknown
  3. Wikimedia Deutschland
  4. Independent
  5. Wikia
  6. WikiWorks
  7. OmniTI
  8. TranslateWiki

I will use this mapping and update the data following it!

users, adres, email, hotmail, yahoo, gmx, googlemail -> Unknown
rest -> Independent

As soon as the data is updated, I will comment it here!
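For illustration, the aggregation rules above could be sketched like this. It is not the actual Grimoire code: the function name is hypothetical, the raw labels are assumed to be the strings shown in the report, and mapping "NA" to TranslateWiki is an assumption based on the earlier comment that "NA" entries are bot contributions.

```python
# Organization names agreed on in the list above.
KNOWN_ORGS = {
    "Wikimedia Foundation",
    "Wikimedia Deutschland",
    "Wikia",
    "WikiWorks",
    "OmniTI",
    "TranslateWiki",
}

# Raw labels that only reflect a generic email provider or a placeholder.
# "gmail" and "live" come from an earlier comment in this thread.
UNKNOWN_LABELS = {
    "users", "adres", "email", "hotmail", "yahoo",
    "gmx", "googlemail", "gmail", "live",
}


def aggregate(label: str) -> str:
    """Map a raw label from the report to one of the agreed organizations."""
    cleaned = label.strip()
    if cleaned in KNOWN_ORGS:
        return cleaned
    if cleaned == "NA":
        # Assumption: "NA" stands for the i18n bot contributions.
        return "TranslateWiki"
    if cleaned.lower() in UNKNOWN_LABELS:
        return "Unknown"
    # Everything else is aggregated as "Independent".
    return "Independent"


for raw in ["Wikia", "gmail", "NA", "some-small-domain"]:
    print(raw, "->", aggregate(raw))
```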

Very good! This shows enough committers connected to an organization. From this point we "just" need to update the data based on the feedback received.

What are your thoughts about the other ideas in Comment 12? Namely changing "companies" to "organizations" and adding instructions to correct your own data.

By the way, I'm removing the reference to countries in the summary since all the discussion here has been about organizations. I will file a new bug specific to countries.

In order to close this issue:

  • "companies" will be renamed to organizations
  • To update contributors' data, could we create a Google form to store the changes? We will review it periodically and update the mapping. In the future we will need a web interface that works directly on the database, but for now I will use this approach.

(In reply to comment #18)

  • In order to update contributors data, could we create a google form to
  store the changes? We will review it periodically and update mapping.

We have already a Google Form. Let's use it.

Great! So a link to the webform will be added to the people.html page so that users can send feedback about the information the browser is showing.

Let's change "TranslateWiki" for "translatewiki.net". My fault. Thank you!

From nemobis at https://github.com/Bitergia/mediawiki-dashboard/issues/26

http://korma.wmflabs.org/browser/contributors.html mentions "TranslateWiki". This name is ambiguous; you probably meant translatewiki.net.
https://www.mediawiki.org/wiki/Translatewiki.net

I have changed this name. In upcoming updates the org name will change from "TranslateWiki" to "translatewiki.net".

(In reply to comment #20)

Great! So a link to the webform will be added to people.html page in order a
user can feedback info about the info the browser is showing.

I believe the only bit missing to resolve this report as FIXED is to provide instructions for contributing your data through the form. We can open a new report about a better way for users to maintain their data. In the meantime, we will enter the data manually.

We have decided in Bug 53485 comment 32 that translatewiki.net contributions aka L10n-bot self-merges shouldn't be counted. Please remove them. Sorry for the hassle.

Ok, I will remove it in the current iteration!

(In reply to comment #23)

(In reply to comment #20)

Great! So a link to the webform will be added to people.html page in order a
user can feedback info about the info the browser is showing.

I believe the only bit missing to resolve this report as FIXED is to provide
instructions to contribute your data to the form. We can open a new report
about a better way for users to maintain their data. In the meantime, we will
introduce the data manually.

Is it enough?

http://korma.wmflabs.org/browser/people.html?id=341&name=raymond

The people.html page has also been updated to the new design.

Yes, we can close this bug now. There is only a small detail on the wording and the position of the text:

https://github.com/Bitergia/mediawiki-dashboard/pull/38

There is one thing I don't understand: is it possible for a person to be in multiple organisations? For instance, many work at the WMF for a period but are/were independent before and after that, or they change affiliation. If it's possible to have multiple affiliations, are they just double counted, or can one specify periods, and do those periods have to be non-overlapping?

Speaking of which, "affiliation" makes more sense than "organization", let alone "company" (!).

(In reply to comment #28)

I don't understand a thing, is it possible for a person to be in multiple
organisations? For instance, many work at the WMF for a period but are/were
independent before and after that, or they change affiliation. If it's possible
to have multiple affiliations, are they just double counted or can one specify
periods and do those periods have to be non-overlapping?

According to https://www.mediawiki.org/wiki/Community_metrics#Contributors

"The classification supports periods of time to cover that a unique people has worked for several companies." I also remember Alvaro mentioning it, but I don't know whether this has been applied already to our metrics.

Ref "affiliations" yes, you are right, see

Bug 60091 - Tech metrics should talk about "Affiliation" instead of organizations or companies

Thank you!
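On the question about periods of affiliation, a minimal sketch of how time-bounded affiliations could be modelled so that each contribution is counted once. This is not how the metrics tools are known to implement it: the class, the function, the example dates, and the fallback to "Independent" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Affiliation:
    org: str
    start: date                  # first day of the period (inclusive)
    end: Optional[date] = None   # first day after the period; None = ongoing


def affiliation_on(periods: List[Affiliation], day: date,
                   default: str = "Independent") -> str:
    """Return the organization whose period contains `day`.

    Periods are assumed not to overlap; if none matches, the
    contributor is treated as `default` for that day.
    """
    for period in periods:
        if period.start <= day and (period.end is None or day < period.end):
            return period.org
    return default


# Hypothetical contributor: independent, then WMF for a while, then independent again.
periods = [
    Affiliation("Wikimedia Foundation", date(2012, 1, 1), date(2013, 6, 1)),
]

print(affiliation_on(periods, date(2011, 5, 20)))
print(affiliation_on(periods, date(2012, 9, 3)))
print(affiliation_on(periods, date(2013, 6, 1)))
```

Looking up the affiliation by the date of each commit or review avoids double counting, as long as the periods for one person do not overlap.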

Why is there no "unaffiliated"? All these graphs give a very partial picture.
http://korma.wmflabs.org/browser/scr-companies.html

Hm, this is a regression. See T86152#1233361 and let's continue the discussion there.