Tech metrics missing IRC channels
Closed, ResolvedPublic

Description

Currently there are only 4 IRC channels being scanned at

http://korma.wmflabs.org/browser/irc-repos.html

https://meta.wikimedia.org/wiki/IRC/Channels#MediaWiki_and_technical contains the list of tech channels

#pywikipediabot should be added as well in order to keep consistency with https://wikitech.wikimedia.org/wiki/Key_Wikimedia_software_projects


Version: unspecified
Severity: normal

Details

Reference
bz54230

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:54 AM
bzimport set Reference to bz54230.

There are 261 Wikimedia IRC channels. Do we want to gather all of those channels?

I don't think it would be a big issue to configure and implement, but it could add significant delay and resource consumption to the update process, because the logs are kept in one big file per channel and incremental updates cannot be implemented.

http://bots.wmflabs.org/~wm-bot/logs/

Is there any other place with logs for IRC channels?

Check the URL above. I'm proposing only the "MediaWiki and technical" section plus #pywikipediabot

IRC channels are not that critical. Normal priority.

*** Bug 60079 has been marked as a duplicate of this bug. ***

Qgil lowered the priority of this task from Medium to Low. Jan 8 2015, 11:25 AM
Qgil removed Acs as the assignee of this task. Feb 5 2015, 10:50 AM
Qgil added a subscriber: Dicortazar.

Once T96371: Community Metrics for IRC channels not updated since 09/2013 is closed, we may want to work on this task again.

We can discuss this during this Friday conf call and leave it as June task.

Aklapper lowered the priority of this task from Medium to Low. Jun 25 2015, 10:04 AM
Aklapper subscribed.

How do you add channels to korma?

Editing /browser/data/json/irc-repos.json in https://github.com/Bitergia/mediawiki-dashboard
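
As a rough sketch of what such an edit amounts to, assuming irc-repos.json is a flat JSON array of channel names (an assumption based on how the list is quoted in this task; the real file layout may differ), new channels could be merged in as below. The helper name `merge_channels` and the sample values are hypothetical and are not part of the dashboard code.

```python
import json


def merge_channels(existing_json, new_channels):
    """Return JSON text with new_channels merged into the existing list.

    Assumes the file content is a flat JSON array of channel names
    (an assumption, not confirmed against the dashboard repository).
    """
    channels = json.loads(existing_json)
    # A set union drops duplicates; sorting keeps later diffs small.
    merged = sorted(set(channels) | set(new_channels))
    return json.dumps(merged, indent=2)


# Sample content for illustration only.
current = '["wikimedia-operations", "mediawiki", "wikimedia-dev"]'
print(merge_channels(current, ["pywikibot", "mediawiki"]))
```

Sorting the merged list is a design choice rather than a requirement: it simply makes pull requests against the file easier to review.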

So that updated list would be https://github.com/Bitergia/mediawiki-dashboard/pull/59/files (and obviously my GitHub workflow skills could still use improvement, sigh)

Thanks for this update!

I'll check that list and update those channels in Korma.

@Dzahn and @Aklapper, I'm still working on the feature to automate the list of repositories to be tracked by Korma at T104845: Automated generation of (Gerrit) repositories for Korma. In summary this will provide a list of repositories per data source and a black list for each of them.

Thus, anyone interested in tracking new repositories, or in adding some of them to the black list, should modify those lists and submit a pull request with the change.

A first version of the files is available on GitHub at https://github.com/Bitergia/mediawiki-repositories. Those are old versions, so we shouldn't change them much.

So, if we keep that workflow, @Aklapper, my suggestion is to add a new config file to the GitHub repo (irc_channels.conf) with the correct list of repositories (sorted). I can do this in any case with the new list you provided. This will help to centralize those lists of repos.

Comments are welcome in any case :). Automating and tracking repositories is a somewhat complex task, but this could be an interesting feature for all of us :).

(let's keep the discussion of this task focused on IRC channels, not mixing it with code repositories)

Aklapper moved this task from Need discussion to Doing on the wikimedia.biterg.io board.
Aklapper added a project: Patch-For-Review.
Aklapper set Security to None.

I'll check that list and update those channels in Korma.

@Dicortazar: Clean pull request now in https://github.com/Bitergia/mediawiki-dashboard/pull/62

I've commented on the GitHub repo.

In summary: I'll close that pull request, since repositories are added and removed at https://github.com/Bitergia/mediawiki-repositories

On the other hand, is there a way to automatically retrieve the list of available channels?

Finally, the updated list of repos analyzed by Korma can be found at http://korma.wmflabs.org/browser/data/json/irc-repos.json, but there are more than the initial four: "wikimedia-operations", "mediawiki", "wikimedia-dev", "wikimedia-labs", "wikimedia-mobile", "wikimedia-toolserver", "wikimedia-tech", "wikimedia-analytics", "wikimedia-fundraising", "semantic-mediawiki", "huggle"

And regarding the rest of the IRC channels: do we have logs for all of them?

The following ones were the ones added in the GitHub change:

  • "mediawiki": TRACKED
  • "mediawiki-i18n":
  • "semantic-mediawiki": TRACKED
  • "mediawiki-parsoid":
  • "mediawiki-visualeditor":
  • "mediawiki-scripts":
  • "wikimedia-tech": TRACKED
  • "wikimedia-dev": TRACKED
  • "wikimedia-toolserver": TRACKED
  • "wikimedia-labs": TRACKED
  • "wikimedia-labs-nagios":
  • "wikimedia-ai":
  • "wikimedia-analytics": TRACKED
  • "wikimedia-cep":
  • "mediawiki-core":
  • "wikimedia-collaboration":
  • "wikimedia-databases":
  • "wikimedia-design":
  • "wikimedia-ect":
  • "wikimedia-fundraising": TRACKED
  • "wikimedia-mobile": TRACKED
  • "wikimedia-multimedia":
  • "wikimedia-operations": TRACKED
  • "wikimedia-perf":
  • "wikimedia-qa":
  • "wikimedia-releng":
  • "wikimedia-research":
  • "wikimedia-search":
  • "wikimedia-teampractices":
  • "huggle": TRACKED
  • "pywikibot":
  • "wikidata":

For the rest of the non TRACKED IRC channels, there's missing information in some cases at https://meta.wikimedia.org/wiki/IRC/Channels about where the logs are stored.

I'll close that pull request, since repositories are added and removed at https://github.com/Bitergia/mediawiki-repositories

Thanks! Good to know.

On the other hand, is there a way to automatically retrieve the list of available channels?

I'm afraid that technically speaking the answer is no. However for "automatically retrieve the list of channels which are logged by wm-bot" I have asked its maintainer and will update this task once I've received an answer.

Finally, the updated list of repos analyzed by Korma can be found at http://korma.wmflabs.org/browser/data/json/irc-repos.json

"wikimedia-toolserver" is irrelevant nowadays, I've created https://github.com/Bitergia/mediawiki-repositories/pull/8 to remove it.

For the rest of the non TRACKED IRC channels, there's missing information in some cases at https://meta.wikimedia.org/wiki/IRC/Channels about where the logs are stored.

Taking the list of channels which are not listed in our configuration yet, and which I'd be interested in, and which I could find public logs for in folders which saw recent updates:

Removing myself as assignee here as I don't know how the "grab those logs" infrastructure works exactly.

Based on info provided by @Aklapper, I've added all of them except mediawiki, which was already in the list.

So, the list is as follows:

  • "mediawiki": TRACKED
  • "mediawiki-i18n":
  • "semantic-mediawiki": TRACKED
  • "mediawiki-parsoid":
  • "mediawiki-visualeditor": TRACKED
  • "mediawiki-scripts":
  • "wikimedia-tech": TRACKED
  • "wikimedia-dev": TRACKED
  • "wikimedia-toolserver": TRACKED
  • "wikimedia-labs": TRACKED
  • "wikimedia-labs-nagios":
  • "wikimedia-ai": TRACKED
  • "wikimedia-analytics": TRACKED
  • "wikimedia-cep":
  • "mediawiki-core": TRACKED
  • "wikimedia-collaboration": TRACKED
  • "wikimedia-databases": TRACKED
  • "wikimedia-design":
  • "wikimedia-ect":
  • "wikimedia-fundraising": TRACKED
  • "wikimedia-mobile": TRACKED
  • "wikimedia-multimedia": TRACKED
  • "wikimedia-operations": TRACKED
  • "wikimedia-perf":
  • "wikimedia-qa":
  • "wikimedia-releng": TRACKED
  • "wikimedia-research": TRACKED
  • "wikimedia-search":
  • "wikimedia-teampractices": TRACKED
  • "huggle": TRACKED
  • "pywikibot":
  • "wikidata": TRACKED
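
The split between tracked and untracked channels in a status list like the one above can be derived mechanically. This is a minimal sketch, assuming the list is kept as plain `"name": STATUS` lines as in this comment. The helper `split_by_status` is hypothetical and not part of Korma, and the sample is a short excerpt rather than the full list.

```python
import re

# Matches a quoted channel name, an optional colon, and an optional
# TRACKED marker, so lines with a missing colon are still understood.
STATUS_LINE = re.compile(r'"([^"]+)"\s*:?\s*(TRACKED)?')


def split_by_status(text):
    """Return (tracked, untracked) channel names found in text."""
    tracked, untracked = [], []
    for line in text.splitlines():
        match = STATUS_LINE.search(line)
        if not match:
            continue  # skip blank lines and anything without a quoted name
        name, status = match.groups()
        (tracked if status else untracked).append(name)
    return tracked, untracked


# Short excerpt of the list, for illustration only.
sample = '''
  • "mediawiki": TRACKED
  • "mediawiki-i18n":
  • "wikimedia-perf":
  • "wikidata" TRACKED
'''
tracked, untracked = split_by_status(sample)
print("tracked:", tracked)
print("untracked:", untracked)
```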

Based on info provided by @Aklapper, I've added all of them

Thank you.
I think we're cool with this list for now; adding any further potential channels would first require getting them logged on the Wikimedia side of things.

@Dicortazar: For my understanding of the infrastructure: where did you "add" them? Any URL to some public diff? I guess the next and last step for one of us is to provide a pull request to mediawiki-repositories/irc_channels.conf adding those TRACKED channels and after merging that pull request we are done here?

@Aklapper, what I've done so far is to manually add them to the infrastructure (this is in fact a script that deletes, removes, and adds URLs, tar.gz files, etc.), and I wanted to add that list of TRACKED channels once the process is finished and correct (it is still running).

And yes, as you say, we just need to add the listed channels to that irc_channels.conf file and we would be done here :).

Do we need to remove deprecated channels? Isn't their information useful to count contributions of related users in those channels?

This depends on your needs. I mean, as we're not automatically retrieving IRC channels information, we can keep as many channels as you want.

The irc_channels.conf file is more a place to centralize requirements from Korma than an automated source of IRC info. On the other hand, since we successfully automated Gerrit projects, the Gerrit .conf file is a place where info is automatically updated, added, and removed.

Do we need to remove deprecated channels?

No, so I've canceled https://github.com/Bitergia/mediawiki-repositories/pull/8

we just need to add the listed channels in that irc_channels.conf file and we would be done here :).

Awesome! Feel free to close this longstanding task after merging https://github.com/Bitergia/mediawiki-repositories/pull/9

We (I?) can create separate tasks to discuss whether to get certain channels logged by wm-bot ("mediawiki-i18n", "mediawiki-parsoid", "mediawiki-scripts", "wikimedia-cep", "wikimedia-design", "wikimedia-ect", "wikimedia-perf", "wikimedia-qa", "wikimedia-search", "pywikibot") and after deciding, getting them added to korma later.

is there a way to automatically retrieve the list of available channels?

for "automatically retrieve the list of channels which are logged by wm-bot" I have asked its maintainer and will update this task once I've received an answer.

<addshore> the list is not on github! You manage the setting per channel through the bot!
<addshore> http://bots.wmflabs.org/~wm-bot/dump/systemdata.htm
<addshore> that has all of the info on it, updated every 20 mins
<addshore> there isn't a structured form currently though

Also note my last comment at https://github.com/benapetr/wikimedia-bot/pull/49#issuecomment-149596883
It would be trivial for this to all be in json format too but I would leave that for @Petrb to implement!
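
Until a structured form exists, one stopgap would be to pull channel names out of the HTML dump and emit them as JSON. The sketch below is only an illustration: the real layout of systemdata.htm is not described in this task, so it assumes that channel names appear in the page text as tokens starting with `#`. The sample HTML is made up, and `channels_as_json` is a hypothetical helper, not part of wm-bot.

```python
import json
import re
from html.parser import HTMLParser

# Assumption: channel names show up in the page text as "#name" tokens.
CHANNEL = re.compile(r"#[A-Za-z0-9][A-Za-z0-9_.\-]*")


class TextCollector(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def channels_as_json(html):
    """Return a JSON document listing the channel names found in html."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks)
    names = sorted(set(CHANNEL.findall(text)))
    return json.dumps({"channels": names}, indent=2)


# Made-up sample markup; the real dump may look quite different.
sample_html = (
    "<table><tr><td>#mediawiki</td><td>logged</td></tr>"
    "<tr><td>#wikimedia-dev</td><td>logged</td></tr></table>"
)
print(channels_as_json(sample_html))
```

Note that this naive text scan would also pick up other `#` tokens, such as color codes inside an inline stylesheet, so a JSON export produced by the bot itself would be far more reliable.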

Ok, channels are updated.

Having a JSON file with the list of channels and their log locations would be awesome :).

@Aklapper, from my side feel free to close the task if you prefer to have further discussion about automation in another ticket.