
MediaWiki's info action should include number of languages that a Wiktionary entry includes
Open, Low, Public, Feature

Description

The output of MediaWiki's info action should include an entry that lists "number of languages" for Wiktionary entries, possibly listing the languages. For example, https://en.wiktionary.org/wiki/taxi?action=info would list:

Languages (10): English, Dutch, French, Italian, Latin, Norwegian, Romanian, Spanish, Swedish, Walloon
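As a trivial, purely illustrative sketch of the proposed output line, given a list of entry languages (format_languages_line is a hypothetical name, not part of MediaWiki's info action code):

```python
def format_languages_line(languages: list[str]) -> str:
    """Render entry languages in the 'Languages (N): a, b, ...' form proposed above."""
    return f"Languages ({len(languages)}): {', '.join(languages)}"


taxi = ["English", "Dutch", "French", "Italian", "Latin",
        "Norwegian", "Romanian", "Spanish", "Swedish", "Walloon"]
print(format_languages_line(taxi))
```

The hard part is producing the list itself; the formatting is the easy end of the feature.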


Version: unspecified
Severity: enhancement

Details

Reference
bz41441

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 12:47 AM
bzimport set Reference to bz41441.
bzimport added a subscriber: Unknown Object (MLST).

This request is not restricted to Wiktionary and applies to any MediaWiki with language links.

The request makes sense, although I'm not sure about its importance. How would it be useful? Setting as Low unless someone has a better argument.

Also, is it a MediaWiki core feature? Not many third-party MediaWiki installations are set up to use those language links, and in the Wikimedia context they are now handled by Wikidata. CCing Siebrand in case he can provide some advice.

(In reply to comment #1)

This request is not restricted to Wiktionary and applies to any MediaWiki with language links.

Sorry, you're confused. This isn't about MediaWiki langlinks, this is about Wiktionary entry languages.

Sorry, I was confused indeed. What is the need for this request?

Since this is so specific to Wiktionary, I would be surprised if this feature were included in MediaWiki core. Maybe it is possible to create an extension for this purpose?

(In reply to comment #3)

Sorry, I was confused indeed. What is the need for this request?

It's interesting information to know that [[wikt:taxi]] is a word in ten languages on the English Wiktionary, while [[wikt:boner]] is a word in two languages.

Since this is so specific to Wiktionary, I would be surprised if this feature were included in MediaWiki core. Maybe it is possible to create an extension for this purpose?

Sure. It's only filed in MediaWiki core because it's about the info action and there's no Wiktionary extension or similar (yet).

From 40,000 feet, yes, Wiktionary and many other [[m:Global South projects]] will likely need more of their own specific extensions at some point.

From ground level, this is basically a similar request to bug 38534 ("Add LiquidThreads status to MediaWiki's info action"), if an extension approach is taken. You could just count the number of <h2> headers on page save and store that info in the page_props table via a MediaWiki extension. That would resolve this bug.
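A minimal Python sketch of the counting step, assuming a language section is a line of the form ==Name== on its own. The regex, the function names, and the property names entry_language_count and entry_languages are all hypothetical; a real extension would be written in PHP against MediaWiki's hooks and would write to page_props, which this sketch only imitates with a plain dict. Headings with trailing comments or markup are not handled.

```python
import re

# A level-2 heading line such as "==English==" or "== Latin ==", but not
# "===Noun===": the lookahead/lookbehind reject a third "=" on either side.
LEVEL2_HEADING = re.compile(r"^==(?!=)[ \t]*(.*?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE)


def level2_headings(wikitext: str) -> list[str]:
    """Return the text of every level-2 heading, in page order."""
    return LEVEL2_HEADING.findall(wikitext)


def language_props(wikitext: str) -> dict[str, str]:
    """Build hypothetical page_props-style name/value pairs for one page."""
    langs = level2_headings(wikitext)
    return {
        # Illustrative property names only; no existing extension defines them.
        "entry_language_count": str(len(langs)),
        "entry_languages": ",".join(langs),
    }


sample = """==English==
===Noun===
{{en-noun}}

==Dutch==
===Noun===
{{nl-noun}}
"""
print(language_props(sample))
```

Doing this once at save time, rather than on every info request, is the point of the page_props suggestion: the info action would only read a stored value.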

Looking at https://en.wiktionary.org/w/index.php?title=taxi&action=edit the only way to identify a language seems to be the fact that they are sections with "==" level headings. In any other wiki that would be just the title of a section.

To me this means that either a hackish solution is developed for Wiktionary only, or we wait until Wikidata arrives on Wiktionary and is able to identify those sections as entries in other languages. But even in that case the solution would be quite Wiktionary-centric.

I would say this is a WONTFIX for MediaWiki core.

Just counting the number of <h2>'s seems wrong, as I'm almost certain there would be exceptions, and lots of them.

However, that said (and if we're going ultra-Wiktionary-specific, this is in essence an extension request), could we maybe tie into the language categories ("Norwegian nouns", "English nouns", etc.)? Given how template-happy Wiktionary is, maybe a custom parser function such as {{#has_lang:en}} could be added to the relevant template and then looked at.
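A rough sketch of the category idea, assuming category names follow a "<Language> <part of speech>" pattern such as "Norwegian nouns". The suffix set below is a hypothetical, deliberately tiny sample; the real category tree is much larger. Language names can contain spaces ("Old English"), so the sketch splits on the last space only. The {{#has_lang:en}} parser function remains a proposal and is not modeled here.

```python
# Hypothetical, deliberately tiny set of part-of-speech category suffixes.
POS_SUFFIXES = {"nouns", "verbs", "adjectives", "adverbs", "lemmas"}


def languages_from_categories(categories: list[str]) -> list[str]:
    """Return distinct language names, in first-seen order, from '<Language> <pos>' categories."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for cat in categories:
        # rpartition splits on the LAST space, so "Old English nouns" -> "Old English", "nouns".
        lang, sep, pos = cat.rpartition(" ")
        if sep and pos in POS_SUFFIXES:
            seen.setdefault(lang, None)
    return list(seen)


cats = ["English nouns", "English verbs", "Norwegian nouns",
        "Old English nouns", "Pages with entries"]
print(languages_from_categories(cats))
```

Categories are set by templates rather than typed as headings, so they may be more regular than section titles, but the mapping from category name to language still depends on per-wiki naming conventions.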

(In reply to comment #5)

I would say this is a WONTFIX for MediaWiki core.

You seem to currently have a fundamental misunderstanding of the meaning behind resolved/wontfix. Consequently, I'd strongly encourage you not to use this resolution for the foreseeable future.

So yeah, this is definitely an extension request. I don't think we usually close bugs like this though (even if they have a "Review queue" entry like Score, which I think started off as a bug).

[Mid-air collision]
In fairness, we are not going to put something so domain-specific in core. If this happens, it is going to happen as an extension. That said, the "how" of fixing the bug is independent of the bug's validity. At worst, this bug is just misfiled.

Okay, moved to "MediaWiki extensions" --> "Extensions requests". I honestly couldn't care less where this bug is filed; I just don't like to see valid bugs marked resolved (and in particular, resolved/wontfix) because they're difficult or the path to resolution isn't yet completely clear. We can always set up an RFC as necessary. :-)

(In reply to comment #6)

Just counting the number of <h2>'s seems wrong, as I'm almost certain there would be exceptions, and lots of them.

Is this provable across Wiktionaries? I think this is an important point. As I understand at least the English Wiktionary, they've gone a little buck-wild and they do enforce rules like this, usually with bots.

However, that said (and if we're going ultra-Wiktionary-specific, this is in essence an extension request), could we maybe tie into the language categories ("Norwegian nouns", "English nouns", etc.)? Given how template-happy Wiktionary is, maybe a custom parser function such as {{#has_lang:en}} could be added to the relevant template and then looked at.

Don't we already have a {{#language:}} parser function? I wonder how often that's used already in Wiktionary entries. Hmm.

(In reply to comment #11)

(In reply to comment #6)

Just counting the number of <h2>'s seems wrong, as I'm almost certain there would be exceptions, and lots of them.

Is this provable across Wiktionaries? I think this is an important point. As I understand at least the English Wiktionary, they've gone a little buck-wild and they do enforce rules like this, usually with bots.

That's insane ;) but if they actually do that, we might as well use it. I guess we could further refine it by checking the section name against a list of valid languages (although that could be problematic for non-English Wiktionaries).
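The allow-list refinement could look roughly like this, taking the level-2 heading texts as input. KNOWN_LANGUAGES is a hypothetical, hand-picked subset; a usable list would have to come from the wiki's own language data and be localized per Wiktionary, which is exactly the non-English problem raised above. "Anagrams" stands in for a hypothetical stray non-language heading.

```python
# Hypothetical subset; a real list would come from the wiki's language data
# and would differ for each language edition of Wiktionary.
KNOWN_LANGUAGES = {"English", "Dutch", "French", "Latin", "Walloon"}


def languages_from_headings(headings: list[str], known: set[str]) -> list[str]:
    """Keep only headings that name a known language, trimmed and without duplicates."""
    result: list[str] = []
    for heading in headings:
        name = heading.strip()
        if name in known and name not in result:
            result.append(name)
    return result


print(languages_from_headings(["English", " Dutch ", "Anagrams", "English"], KNOWN_LANGUAGES))
```

An allow-list trades false positives (non-language headings counted) for false negatives (valid but unlisted languages dropped), so it only helps if the list is kept in sync with the wiki.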

The other downside is that I'm not sure how easy it is to get such info out of the parser. Maybe it's recorded somewhere (presumably to make the ToC), but if not, we would have to resort to regex, which is icky.
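For what it's worth, the parser does keep a per-page section list, and the MediaWiki action API exposes it through action=parse with prop=sections. The sketch below assumes a response of that shape and keeps only level-2 sections. The sample payload is hand-written, not a captured response, and only the level and line fields are relied on; line may contain HTML when a heading includes markup.

```python
import json

# Hand-written sample shaped like an action=parse&prop=sections response.
# Not a real capture; only "level" and "line" are used below.
SAMPLE_RESPONSE = """
{"parse": {"title": "taxi", "sections": [
  {"toclevel": 1, "level": "2", "line": "English", "number": "1"},
  {"toclevel": 2, "level": "3", "line": "Noun", "number": "1.1"},
  {"toclevel": 1, "level": "2", "line": "Dutch", "number": "2"}
]}}
"""


def level2_from_parse_sections(payload: str) -> list[str]:
    """Return the heading text of every level-2 section in a parse/sections payload."""
    sections = json.loads(payload)["parse"]["sections"]
    # "level" is the number of "=" signs; compare as a string so the check
    # works whether the value arrives as "2" or 2.
    return [s["line"] for s in sections if str(s["level"]) == "2"]


print(level2_from_parse_sections(SAMPLE_RESPONSE))
```

Reusing the parser's own section data avoids a second regex pass over the wikitext, though it inherits the same assumption that every level-2 section is a language.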

CCing Lydia to keep the Wikidata team in sync. Is this an extension request or a Wikidata request about Wiktionary?

(In reply to comment #14)

Only you seem to have some sort of strange fixation with Wikidata. This bug is about the core "info" action + Wiktionary entries.

Andre sent me a private e-mail admonishing what he viewed as a personal comment in comment 15. Quim (and only Quim) has repeatedly tried to inject Wikidata into this bug (cf. comment 1, comment 5, comment 13, and comment 14) when this bug has almost nothing to do with Wikidata. The repeated Wikidata comments risk derailing efforts toward fixing this bug, which is about the core "info" action + Wiktionary entries.

This request is in fact about this use case:

(In reply to comment #4)

It's interesting information to know that [[wikt:taxi]] is a word in ten languages on the English Wiktionary, while [[wikt:boner]] is a word in two languages.

You propose to obtain this information through the info action, which is one possible implementation approach. I'm suggesting that Wikidata could play a role in solving this problem, and I wonder whether that is another implementation approach.

That's all.

This report should probably be renamed to define the feature requested, not a specific way to implement the feature.

Wikidata does not (at least in its current form, which I believe is by design) work well for metadata that is functionally determined by the main data (the actual page in question). As it stands, ?action=info is the only unified place we currently have for displaying functionally dependent metadata, so it would make sense to display that information there, should the feature request be implemented.

Wikidata could come into this if Wiktionary abandoned the whole raw-page approach and used Wikidata exclusively for recording its content (kind of like OmegaWiki). In that case this type of information would be a query on Wikidata-style data, which would put it in Wikidata's scope. I feel like that's probably not going to happen.

The other scenario where I could see Wikidata coming into this is if they expanded beyond their Wikidata -> client wikis model and made the data flow go both ways (including automatically extracting properties from client wikis). That would add a lot of complications to everything and expand Wikidata's scope significantly. I have not heard anyone say they plan to do anything like that, but I'm not quite up on the latest goings-on in the Wikidata dev community (I'm sure someone can correct me if I'm wrong).

(Intentionally avoiding commenting on the core content of comment 16 as I don't want to be drawn into any dispute that may be happening)

Whether Wikidata can and should be involved in this in any way really depends on what you actually want to achieve. Yes, the number can be displayed there, but what for? What will it then be used for? What is the actual use case we're trying to solve?

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:14 AM
Aklapper removed a subscriber: wikibugs-l-list.
Restricted Application added a subscriber: Strainu. Feb 4 2022, 11:14 AM