
Implement a way to index Phabricator tasks by Wikimedia wiki family
Closed, Duplicate, Public

Description

I'd like to be able to sort bugs by wiki family. For example, I'd like to be able to find all (open) bugs that relate to Wikibooks.

Currently there's no mechanism for doing this, so I propose adding Bugzilla keywords for each Wikimedia wiki family.

It looks like Wikidata already has its own keyword, so this would mean adding the following keywords:

  • wikipedia
  • wiktionary
  • wikibooks
  • wikiquote
  • wikisource
  • wikinews
  • wikiversity
  • wikimediacommons (or perhaps just "commons"?)

Version: unspecified
Severity: enhancement
URL: https://bugzilla.wikimedia.org/describekeywords.cgi
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=36539
https://bugzilla.wikimedia.org/show_bug.cgi?id=56295

Details

Reference
bz38994

Event Timeline

bzimport raised the priority of this task from to Lowest. (Nov 22 2014, 12:54 AM)
bzimport set Reference to bz38994.
bzimport added a subscriber: Unknown Object (MLST).

Note, wikisource uses a tracking bug for this sort of thing: bug 35925

(In reply to comment #1)

Note, wikisource uses a tracking bug for this sort of thing: bug 35925

Yeah, I'm not tied to one idea or another. I guess the bug summary is a little misleading (I'll change it now). I tried to present the problem and then I proposed a possible solution (keywords). There may be better solutions to making it possible to index bugs by wiki family.

The reason for me requesting this change is to be able to get a (rough) metric for resource allocation. Combined with a few other metrics (such as other keywords and bug status/resolution), it'll provide a better picture of which projects really need love, which are getting too much love, etc. Bug reports aren't the best data point, but I think if this data is populated, it will be useful. If others disagree, please say so. :-)

Another idea I had was to do this outside of Bugzilla, but without a replica of Bugzilla's database, I don't really have time to learn how to deal with Bugzilla's API. This means that unless I can find an easy way to implement this feature (indexing by wiki family), it won't get done. I think there's a lot of benefit in keeping the data in Bugzilla, particularly as Bugzilla has features (such as tracking bugs and keywords) that make indexing by wiki family fairly painless to implement. The only painful part will be doing the actual tagging/adding dependencies/whatever.

Yeah, I'm not tied to one idea or another

Honestly when I think about it, keywords may be less painful.

There are all sorts of political issues too, though. What's a "Wikinews" bug? What is a "wikipedia" bug, since most bugs affect all projects, although some affect all projects to different extents? Very few bugs are clear-cut bugs about one project.

And more to the point, every proofread page bug, for example, is a wikisource bug, but it seems like wasted effort to tag each bug as such.

The reason for me requesting this change is to be able to get a (rough) metric
for resource allocation. Combined with a few other metrics (such as other
keywords and bug status/resolution), it'll provide a better picture of which
projects really need love, which are getting too much love, etc.

I'll give you a hint. It starts with "Wiki" and ends in "pedia" :P

(In reply to comment #3)

And more to the point, every proofread page bug, for example, is a wikisource
bug, but it seems like wasted effort to tag each bug as such.

I don't think it's wasted effort if it provides useful data. I want to know exactly things like this: how many bugs are open in the ProofreadPage extension and in the other tools and scripts that are used on Wikisource? What about Wiktionary? What about Wikipedia? And then you can compute more interesting metrics, such as how long the Wikipedia bugs stayed open, etc. As I said, you could dump the data and do this fairly easily in your own database, but I'd like to be able to do it inside Bugzilla.

You could also piece together the data, but given that keywords/tagging are available, it'd be nice to be able to find these bugs explicitly.

I can commit to helping out with the initial tagging. This will only be for open bugs; we'll leave the closed bugs alone.

The reason for me requesting this change is to be able to get a (rough) metric
for resource allocation. Combined with a few other metrics (such as other
keywords and bug status/resolution), it'll provide a better picture of which
projects really need love, which are getting too much love, etc.

I'll give you a hint. It starts with "Wiki" and ends in "pedia" :P

Yeah, but I want to see the pain points more clearly on other wiki families. Bug stats aren't a completely accurate indication of this, but I think they're a decent starting point.

If you have better ideas for what I'm trying to accomplish (with or without the use of Bugzilla), I'm happy to hear them. :-)

(In reply to comment #0)

It looks like Wikidata already has its own keyword

Yes, which was stupid and useless. The Wikidata extensions have their own components. And whatever goes into MediaWiki core has mediawiki-core bugs. Which may or may not be linked as dependency for a bug in a Wikidata extension.

Also, some extensions are used by multiple wiki families and some extensions are only used on some language editions of a project.

Before discussing further I think it would be useful to put some use cases on the table. That will probably lead us to an appropriate solution. Because having "being able to index bugs by wiki project" as a sole goal without use cases is completely useless in my opinion.

(In reply to comment #5)

(In reply to comment #0)

It looks like Wikidata already has its own keyword

Yes, which was stupid and useless. The Wikidata extensions have their own
components. And whatever goes into MediaWiki core has mediawiki-core bugs.
Which may or may not be linked as dependency for a bug in a Wikidata extension.

Also, some extensions are used by multiple wiki families and some extensions
are only used on some language editions of a project.

There seems to be a fascination with edge cases. Keywords are never completely encompassing. There will be corner cases or edge cases with any classification scheme ever used anywhere. Who cares?

Before discussing further I think it would be useful to put some use cases on
the table. That will probably lead us to an appropriate solution. Because having
"being able to index bugs by wiki project" as a sole goal without use cases is
completely useless in my opinion.

Sure. I tried to explain some use-cases above, but I'll try again.

When a young and wide-eyed developer comes along and says "I want to help Wikisource! What are the software bugs facing Wikisource?", I want to be able to point that person to a list of relevant, unresolved bugs.

When Wikimedia and others are doing strategic planning and looking at resource allocation, I want them to be able to look at Bugzilla and say "We've sure got a lot of unresolved bugs for this particular wiki family. Maybe it needs more attention from Wikimedia staff."

Or perhaps classifying the bugs will be a means of finding out which wiki families aren't using Bugzilla effectively. We know that Wikipedias generally understand and use Bugzilla, but perhaps other wiki families are reporting issues elsewhere (on the wiki, on mailing lists, etc.). Those wikis might need more help or attention from the bug team to ensure that their bugs and feature requests are properly filed and eventually get looked at and acted upon.

Again, bug count is not a perfect metric by any means, but it's a starting point. It's a means of roughly tracking which issues are affecting which types of wikis.

A few other methods have been attempted for tracking bugs. One method is using tracking bugs (I'm not a huge fan of tracking bugs generally). Another method appears to be using keywords (such as "wikidata"). To me, this seems like what the keywords system was designed for.

There are less-than-ideal solutions such as having manual indices (I believe some projects have "Feature wishlists" or similar), but I don't think separating Bugzilla from the feature/bug lists is a good idea. We need dynamic lists so that it's possible to list all unresolved (open) bugs. We want to keep the data centralized in this case to give people a single point of reference for issue tracking.

I'm still a bit unclear why you don't like the "wikidata" keyword. From your example, it seems like the bugs affecting core that Wikidata relies on would be tagged rather unambiguously with the "wikidata" keyword. Why do you think such a keyword is a bad idea?

And, broadly, do you think tracking by wiki family is useful? If so, do you have a better system in mind for implementing this ability? (Or more to the point: do you believe we should eliminate the "wikidata" keyword and/or the Wikisource tracking bug? And if so, what would you replace them with?)

content hidden as private in Bugzilla

How about a query that lists all bugs for extensions used on a particular wiki? No need for classification of each individual bug.

Except for shell requests, that should cover everything that is related. On the software level there really can't/shouldn't be any "wiki project"-specific bugs.

Keywords (and tracker bugs) only provide a good overview if they are properly and constantly used. Seeing the 9 keywords in comment 0, I wonder who would add them and have the deep knowledge of which projects are affected. (Sure, you could add only the one keyword for the project that you know by heart, but would others really add the missing keywords for the other projects?)
Indeed it boils down to knowing which functionality/extensions are used in each project, and as I personally already fail to find a list of extensions shipped *by default* in MediaWiki I'm not very optimistic that these keywords would actually help much (the idea is interesting, I'm just afraid of the reality out there, and I cannot provide a better idea either so take this as rather destructive criticism).

This sounds like a perfect job for a quick tool using the bugzilla API (based on the configuration / extension setup of all wikis inside a family, a specific wiki, or all wikis, even). Manually maintaining keywords is not only a maintenance nightmare, it also blurs the keyword landscape with keywords relatively meaningless/unrelated to software development, and lots of notifications for keyword changes that we don't want.

One way or another, a wiki is a consumer of the software, and though Bugzilla already leans fairly heavily towards Wikimedia wikis, attaching per-wiki keywords to software bugs is no doubt crossing the line imho. Next up we'll be having keywords for individual wiki users.

(In reply to comment #10)

This sounds like a perfect job for a quick tool using the bugzilla API (based
on the configuration / extension setup of all wikis inside a family, a specific
wiki, or all wikis, even).

I can't understand how this is related.

Manually maintaining keywords is not only a
maintenance nightmare, it also blurs the keyword landscape with keywords
relatively meaningless/unrelated to software development,

Tracking bugs don't have this problem; maybe they're enough? The problem is, they're definitely less usable (e.g. in search).
I'm not sure which tracking bugs deserve/need to be "promoted" to keywords: the distinction is quite confused currently, I think.

and lots of
notifications for keyword changes that we don't want.

This applies to tracking bugs as well: do you think they are annoying as well?

(In reply to comment #11)

This applies to tracking bugs as well: do you think they are annoying as well?

Yes, any unrelated stuff cluttering bug management is annoying.

(In reply to comment #11)

(In reply to comment #10)

This sounds like a perfect job for a quick tool using the bugzilla API (based
on the configuration / extension setup of all wikis inside a family, a specific
wiki, or all wikis, even).

I can't understand how this is related.

That explains the rest of your comment.

Let me elaborate:

  • Create a tool that knows which extensions are installed on a wiki and which mediawiki/core components are especially relevant
  • It can aggregate this for all bugs and for all wikis inside a family
  • Output: List of bugs and their status / dependencies and what not. Lots of possibilities. Heck, if one really feels like it build an HTML5 app that shows changes in real-time and order by recent activity / relevance / priority.

Point is, for the same reason we don't have a watch-Krinkle tag, I don't think we should have a watch-Wiktionary tag. Instead this is done on the user level / project level with outside aggregations. Plus, those tools have the potential to be much more useful to the average user than some internal process inside Bugzilla.
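A minimal sketch of the aggregation step described in the bullets above, in Python. Everything in it is hypothetical: the wiki-to-component table and the bug list are made-up sample data standing in for the real deployment configuration and for results fetched from the Bugzilla API, and a bug is matched to a family purely by its component name.

```python
from collections import defaultdict

# Hypothetical sample data: wiki -> (family, extensions/components it uses).
# A real tool would derive this from the deployment configuration.
SAMPLE_WIKIS = {
    "en.wikisource": ("wikisource", {"ProofreadPage", "ParserFunctions"}),
    "en.wiktionary": ("wiktionary", {"ParserFunctions"}),
    "en.wikinews": ("wikinews", {"DynamicPageList", "ParserFunctions"}),
}

# Hypothetical sample bugs as (bug_id, component) pairs; a real tool would
# fetch these from Bugzilla instead.
SAMPLE_BUGS = [
    (101, "ProofreadPage"),
    (102, "ParserFunctions"),
    (103, "DynamicPageList"),
]


def bugs_by_family(bugs, wikis):
    """Group bug IDs by wiki family, based only on each bug's component."""
    index = defaultdict(set)
    for bug_id, component in bugs:
        for family, components in wikis.values():
            if component in components:
                index[family].add(bug_id)
    # Sorted lists are easier to read and compare than sets.
    return {family: sorted(ids) for family, ids in index.items()}


if __name__ == "__main__":
    for family, ids in bugs_by_family(SAMPLE_BUGS, SAMPLE_WIKIS).items():
        print(family, ids)
```

Because the matching is by component only, a bug that touches none of the listed components (for example, a request for functionality that no extension provides yet) is simply not counted for any family.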

(In reply to comment #12)

(In reply to comment #11)

This applies to tracking bugs as well: do you think they are annoying as well?

Yes, any unrelated stuff cluttering bug management is annoying.

We have hundreds of tracking bugs, though.

(In reply to comment #11)

(In reply to comment #10)

This sounds like a perfect job for a quick tool using the bugzilla API (based
on the configuration / extension setup of all wikis inside a family, a specific
wiki, or all wikis, even).

I can't understand how this is related.

That explains the rest of your comment.

Let me elaborate:

  • Create a tool that knows which extensions are installed on a wiki and which mediawiki/core components are especially relevant

This doesn't help.

  • It can aggregate this for all bugs and for all wikis inside a family

Useless.

  • Output: List of bugs and their status / dependencies and what not. Lots of possibilities. Heck, if one really feels like it build an HTML5 app that shows changes in real-time and order by recent activity / relevance / priority.

Point is, for the same reason we don't have a watch-Krinkle tag, I don't think
we should have a watch-Wiktionary tag. Instead this is done on the user level /
project level with outside aggregations. Plus, those tools have the potential
to be much more useful to the average user than some internal process inside
Bugzilla.

Sure, lots of possibilities, but none seems extremely helpful for what's asked here.

(In reply to comment #13)

(In reply to comment #11)

Let me elaborate:

  • Create a tool that knows which extensions are installed on a wiki and which mediawiki/core components are especially relevant

  • It can aggregate this for all bugs and for all wikis inside a family

This doesn't help.

Useless.

Erm.. that is exactly what this bug is asking for. A way to index all bugs related to a certain wiki or a wiki family (e.g. bugs that affect "nl.wiktionary" or "wikisource").

Since on a software level a wiki is just a combination of extensions and settings, it makes sense to not maintain everything by hand but make use of that fact. That avoids a lot of duplication.

(In reply to comment #12)

Point is, for the same reason we don't have a watch-Krinkle tag

Just for better understanding, as Krinkle is a user you can watch him though by adding his account ID (email address) to "User Watching" under https://bugzilla.wikimedia.org/userprefs.cgi?tab=email .

In general agreeing with Krinkle, achieving this functionality manually sounds extremely error-prone.

(In reply to comment #15)

In general agreeing with Krinkle, achieving this functionality manually sounds
extremely error-prone.

So I guess we'll need some kind of tool, then.

I'm not sure how best to implement this idea (indexing Bugzilla bugs by Wikimedia wiki family). The HTML5 app idea sounds kind of cool, but maybe doing something in a Bugzilla extension or even inside a MediaWiki extension makes more sense? Not sure.

Talked to Daniel about the WikiVoyage migration, and he also asked to be able to mark specific tickets as affecting WikiVoyage.
Hence this would require a manual way to label specific tickets, instead of going per component etc.

Currently we use either tracking tickets (bug 35925: wikisource; bug 37883: commons; bug 28486: incubator) or keywords ("wikidata").
Another option is using a custom field (Multiple-Selection Box, as dropdown fields only allow one value to be selected) for this.

One potential advantage of keywords is that they are included in the bugmail as "X-Bugzilla-Keywords" headers (for parsing), while "blocks/depends" and custom fields are not included, it seems.
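For the parsing use case, a bugmail filter could read that header with Python's standard `email` module. This is a sketch that assumes the header carries a comma-separated keyword list; the sample message and the routing action are made up for illustration.

```python
from email import message_from_string


def bugmail_keywords(raw_message: str) -> list[str]:
    """Return the keywords listed in a bugmail's X-Bugzilla-Keywords header."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bugzilla-Keywords", "")
    # Split on commas and drop surrounding whitespace and empty entries.
    return [kw.strip() for kw in header.split(",") if kw.strip()]


# Hypothetical bugmail, reduced to the headers that matter here.
SAMPLE_MAIL = (
    "X-Bugzilla-Keywords: wikidata, easy\n"
    "Subject: [Bug 1] Example summary\n"
    "\n"
    "Body text\n"
)

if __name__ == "__main__":
    keywords = bugmail_keywords(SAMPLE_MAIL)
    if "wikidata" in keywords:
        print("route to the Wikidata folder")
```

A mail without the header yields an empty list, so such a filter simply ignores untagged bugs.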

(In reply to comment #17)

One potential advantage of keywords is that they are included in the bugmail as
"X-Bugzilla-Keywords" headers (for parsing) while "blocks/depends" and custom
fields are not included it seems.

I'm still not quite clear why keywords are seemingly a non-starter.

Krinkle seemed to be concerned with the ambiguity and edge cases of using keywords, but those ambiguities and edge cases seem to infect every form of categorization ever. I say we stop making perfect the enemy of the done and just use the built-in keywords feature.

If a bug falls within multiple wiki families, tag it multiple times. If it primarily affects a particular wiki family, tag it accordingly. If it doesn't affect any wiki family, don't tag it at all (or maybe we could have a "mediawiki" tag...).

I don't think you can tell Bugzilla to add keyword X by default when filing a bug report against product/component Y.

Plus I still challenge if there is a real need to have this for *every* family.
Any idea how many people would likely use such a categorization? I see that *some* families might consider it helpful (bug 35925: wikisource; bug 37883: commons; bug 28486: incubator) though I'm not sure if the creation of such tracker bugs had some justification or if somebody "just did it by creating the tracker bug" because s/he thought it might be helpful at some point, and now we all spend time marking items as blocking though nobody really cares in the end.

Plus I'm afraid that half of my bugmail will be notifications about keyword changes (though I could switch off bugmail when "The keywords field changes").

(In reply to comment #19)

Plus I still challenge if there is a real need to have this for *every* family.

Wikipedia surely doesn't need it, for one. ;-)

Any idea how many people would likely use such a categorization? I see that
*some* families might consider it helpful (bug 35925: wikisource; bug 37883:
commons; bug 28486: incubator) though I'm not sure if the creations of such
tracker bugs had some justification or if somebody "just did it by creating the
tracker bug" because s/he thought it might be helpful at some point,

In the past, it has sometimes been specifically requested (maybe by Erik M. in person) to compile "wishlists" for some of the sister projects, because everybody knows that the WMF doesn't support them well but few know what would actually be needed to improve them.

and now we
all spend time marking items as blocking though nobody really cares in the end.

Indeed only people who care about the categorization should perform it, IMHO.
And perhaps keyword changes bugmail should be disabled by default (regardless of this bug)?

(In reply to comment #19)

I don't think you can tell Bugzilla to add keyword X by default when filing a
bug report against product/component Y.

So? There seems to be this great wall of resistance against adding keywords (tags, if you will...) to bugs. In any sane system, adding tags like this would be a non-issue. I think the crux of the issue is that people don't want the bugspam. Why don't we just switch the default to off for the keywords field and then those interested in tagging bugs (with whatever keywords they want...) can.

I guess that's a secondary issue with this flawed Bugzilla keywords design: you need an administrator to make a new keyword. This has hampered keyword usage _for years_ without any good reason. The ability to add arbitrary, searchable metadata to bugs is invaluable and I don't really feel it's necessary to defend that practice here. It just is. If Bugzilla's keywords implementation won't work, let's find something that will.

Plus I still challenge if there is a real need to have this for *every* family.
Any idea how many people would likely use such a categorization? I see that
*some* families might consider it helpful (bug 35925: wikisource; bug 37883:
commons; bug 28486: incubator) though I'm not sure if the creations of such
tracker bugs had some justification or if somebody "just did it by creating the
tracker bug" because s/he thought it might be helpful at some point, and now we
all spend time marking items as blocking though nobody really cares in the end.

Only Wikipedia gets resources from the Wikimedia Foundation currently. You could fairly rename the Wikimedia Foundation the Wikipedia Foundation. And it's been this way for years and anyone with discernible levels of clue understands and acknowledges this. It's not a secret. It's just the way it is right now.

However, the sister projects (or Global South projects, as I've taken to calling them) still exist. Wiktionary, Wikinews, Wikiversity, etc. And they still need resources, some of them desperately. So, as a means of high-level triage, I'd like to be able to tag these bugs (using any system really that doesn't require me to build my own separate database of these bugs) so that I can see how many bugs we have in various areas (for example, Wiktionary). And point developers at bugs in these areas. And sort bugs in these areas by priority and votes and importance (and find bugs with high keyword overlap to conserve/maximize finite resources...) to see where that project is and where it needs to be.

I'm not asking for anyone to, God forbid, work on any of these bugs or focus on the Global South projects. I'm just asking that we be able to sort through the pile of bugs related to these projects at a very high level.

Plus I think I'm afraid that half of my bugmail will be notifications on
keyword changes (though I could switch off bugmail when "The keywords field
changes").

Right. Bugspam. We all hate it. The default value for this user preference can be switched, right? Or in the worst case, someone can futz with the database? I don't see an issue here.

(In reply to comment #21)

The ability to add arbitrary, searchable metadata to bugs is invaluable

That's what the Status Whiteboard is for - free random tagging for everybody! :)

(In reply to comment #22)

(In reply to comment #21)

The ability to add arbitrary, searchable metadata to bugs is invaluable

That's what the Status Whiteboard is for - free random tagging for everybody!
:)

Through extensive discussion, I believe we've come to the agreement that the whiteboard is for blobbing, not tagging. Blobs can't be effectively (or sanely) used as tags.

The path forward for this bug remains unclear.

Comment 14 has a proposal to identify which components are used for which wiki. It likely requires Bugzilla hacking, as you cannot define default keywords for newly entered reports in a specific component (and I couldn't find an upstream bug report either, so it seems not to be a popular request).
With regard to blobbing vs tagging in the whiteboard, this might be covered by https://bugzilla.mozilla.org/show_bug.cgi?id=725438 .

(In reply to comment #24)

Comment 14 has a proposal to identify which components are used for which wiki.

Even if you could attach components to wiki project families magically, this doesn't solve the problem. Krinkle makes the assertion that if you can figure out which extensions and components apply to a wiki family, you can index by that family. However, this doesn't seem to take into account bugs about JavaScript hacks being used in lieu of a proper MediaWiki extension, which is common on these projects (these non-Wikipedias). Or what about requests to create a new extension or add some functionality that doesn't exist yet (because Wikipedia doesn't need it)? How do you index bugs based on this? Using Bugzilla components alone won't work.

It likely requires Bugzilla hacking as you cannot define default keywords for
newly entered reports in a specific component (and I couldn't find an upstream
bug report either, so seems not to be a popular construction).

I'm not sure why you keep mentioning default keywords. This seems to presume a component <--> wiki family correlation that I'll go ahead and argue does not and cannot exist (cf. the preceding reply directly above). We don't need default keywords; a small (finite) number of keywords (listed below) need to be added to Bugzilla. In addition, we possibly need to adjust how much noise is generated by the modification of a bug's keywords.

This seems like a fairly straightforward job for Bugzilla keywords. This would be a finite set of trackable keywords:

  • wikipedia
  • wiktionary
  • wikibooks
  • wikinews
  • wikiquote
  • wikisource
  • wikiversity
  • wikivoyage
  • wikidata (already exists)

And then we can examine whether it's necessary to simply disable bugspam for keyword changes. Fair?
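With keywords like these in place, the original request in the description ("all (open) bugs that relate to Wikibooks") becomes an ordinary Bugzilla search. A sketch that builds such a search URL, assuming the standard `buglist.cgi` parameters (`keywords`, `keywords_type`, and `resolution=---` to select bugs that have no resolution yet) on the instance from the task's URL field:

```python
from urllib.parse import urlencode

# Base search URL; the host is the Bugzilla instance this task was filed on.
BUGLIST = "https://bugzilla.wikimedia.org/buglist.cgi"


def open_bugs_with_keyword(keyword: str, base: str = BUGLIST) -> str:
    """Build a search URL for unresolved bugs tagged with one keyword."""
    params = [
        ("keywords", keyword),
        ("keywords_type", "allwords"),  # the bug must carry this keyword
        ("resolution", "---"),          # "---" means no resolution yet, i.e. open
    ]
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(open_bugs_with_keyword("wikibooks"))
```

The same pattern would work for any keyword in the list above by changing the argument.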

(In reply to comment #25)

[..] examine whether it's necessary to simply disable bugspam for keyword changes [..]

No. Changes can contain different kinds of modifications (comment, summary, metadata such as keywords). And keyword changes can happen for different reasons and involve different types of keywords as well.

Disabling notifications for that in any way would likely be undesirable, unless limited to changes that only touch those keywords.

(In reply to comment #23)

Through extensive discussion, I believe we've come to the agreement that the
whiteboard is for [...]

Discussion where?

It would be nice if it were clear to a novice user filing a bug that they can fill out the "which wiki" parameter. Can we add a field like the "web platform" field, but for "which wiki"? Ideally the choices would be all the major wiki families we host + other + n/a. The only thing we would need to be careful about is ensuring that third parties don't think they need to fill it out.

I think an important question we're forgetting is the intended consumers of the data and their use case.

Is this relevant to developers? (that is, beyond having a url where the bug exhibits, which we already have)
No.

Should the bug reporters care?
No.

There is one more question that should be listed here that answers Yes, but I'd like someone else to write it.

Krinkle, it's clear that opinions differ here, I see little use in repeating it over and over. I'd answer "Yes" to both your questions.

(In reply to comment #29)

I think an important question we're forgetting is the intended consumers of the
data and their use case.

Is this relevant to developers? (that is, beyond having a url where the bug
exhibits, which we already have)
No.

Should the bug reporters care?
No.

There is one more question that should be listed here that answers Yes, but I'd
like someone else to write it.

As Nemo said, the answers to those two questions are debatable, especially the first one. I give (or used to, anyway, especially at first) much higher priority to bugs from Wikinews [as that is where I come from], and would love a way to be CC'd by default on all Wikinews-related bugs [yes, I know, none of the proposed solutions allow that]. Arguably "should" and "do" have different meanings, and perhaps I should treat all bugs equally, but as a volunteer, I do what I want ;)

People who are neither developers nor bug reporters are also consumers of this data. We've got people interested in stats, people like MzMcBride who are a category unto themselves, random community members, etc.

Do random members of editing communities care about the status of bugs/feature requests that are specific to their communities?
Yes. [As evidenced by bug 35925. People don't do that much busy work if they don't care]

I now have a very rough draft here: https://meta.wikimedia.org/wiki/Wish_list.

I'm not sure what to do with this bug at this point. I'm still annoyed that these tags haven't been added to Bugzilla, but I don't care enough to continue fighting this fight. Time to mark this bug resolved?

(In reply to comment #32)

Time to mark this bug resolved?

Could we keep it open please with low prio? I'm not against fixing the problem (hence no WONTFIX), it's just that I don't see a good implementation yet.

(In reply to comment #33)

(In reply to comment #32)

Time to mark this bug resolved?

Could we keep it open please with low prio? I'm not against fixing the
problem (hence no WONTFIX), it's just that I don't see a good implementation
yet.

+1

Aklapper claimed this task.

Wikimedia has migrated from Bugzilla to Phabricator. Hence closing as "declined".

In Phabricator terms, this is covered in T802.

MZMcBride renamed this task from Implement a way to index Bugzilla bugs by Wikimedia wiki family to Implement a way to index Phabricator tasks by Wikimedia wiki family. (Nov 26 2014, 10:08 PM)
MZMcBride set Security to None.

Wikimedia has migrated from Bugzilla to Phabricator. Hence closing as "declined".

In Phabricator terms, this is covered in T802.

Bugzilla bugs are directly analogous to Phabricator tasks, so it seems wrong to me to "throw away" all of the discussion here and mark this task as resolved/declined when the actual issue clearly remains unresolved. A simple tweak to the task summary is all that's needed here, in my opinion (now done).

A simple tweak to the task summary is all that's needed here, in my opinion (now done).

Plus points if you also summarise all the use cases from the discussion into the task description. :)

When it comes to "Implement a way to index Phabricator tasks by Wikimedia wiki family", the enablers here could be either projects (which is being discussed in T802) or a custom field in the task creation/editing form (which I consider overkill and prone to all the misbehavior described by others above). Can we consider T802 a blocker of this one?
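If Phabricator projects end up being the mechanism, a list of open tasks per family could come from the Conduit API. A sketch that only builds the form-encoded body for a `maniphest.search` call, using the built-in `open` query and a `projects` constraint; the token and the project name `wikisource` are placeholders, and it assumes the project can be given by name rather than by PHID. Sending the body (as a POST to the instance's `/api/maniphest.search` endpoint) is left out.

```python
from urllib.parse import urlencode


def maniphest_search_body(api_token: str, project: str) -> str:
    """Form-encode a maniphest.search request for open tasks in one project."""
    params = {
        "api.token": api_token,                 # placeholder Conduit token
        "queryKey": "open",                     # built-in "open tasks" query
        "constraints[projects][0]": project,    # PHP-style nested parameter
    }
    return urlencode(params)


if __name__ == "__main__":
    print(maniphest_search_body("api-XXXX", "wikisource"))
```

Counting the returned tasks per project would then give the rough per-family numbers discussed above, with the caveat raised in the next comment that the counts are only as good as the tagging.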

Creating projects (tags ;) ) for Wikimedia families is easy. Ensuring that those tags are used by maintainers of those projects so they become useful is a more complex task. Getting the usage so reliable that you could extract meaningful metrics out of it looks almost like mission impossible to me, but this is just my personal opinion (and I really mean it) after more than a year trying to extract meaningful metrics from relatively better framed contexts.

I don't see a difference to T802. Can we merge this (or can someone explain what the difference is)?