[Story] Mono-lingual text datatype should support "no linguistic content" and "undetermined language"
Closed, Resolved, Public

Description

It should be possible to add language = "no linguistic content" (ISO 639-2: "zxx") and language = "undetermined" (ISO 639-2: "und").

Use case: property P1684, for inscriptions on artworks or ancient artefacts.

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:38 AM
bzimport set Reference to bz70205.
bzimport added a subscriber: Unknown Object (MLST).

Can you please link to a specific example? Thanks!

Zoloinwiki wrote:

Yes, for "unknown language", not that many, but for example:
https://www.wikidata.org/wiki/Q13534364 inscription: 'ΒΟΥΗΛΑ·ΖΟΑΠΑΝ·ΤΕΣΗ·ΔΥΓΕΤΟΙΓΗ·ΒΟΥΤΑΟΥΛ·ΖΩΑΠΑΝ·ΤΑΓΡΟΓΗ·ΗΤΖΙΓΗ·ΤΑΙΣΗ', language: unknown. (qualifier:writing system: Greek alphabet)

For "no linguistic content", that would be short texts such as signatures or inventory numbers, which may be significant for art historians but are not really in any language. For example, https://commons.wikimedia.org/wiki/File:Egon_Schiele_060.jpg has the inscription 'S10' (writing system: Latin alphabet).

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).

For "no linguistic content", the "string" data type seems like a better match. But I suppose the intention here is to always use the same property to represent the inscription text, whether or not it is natural language. (What about symbols, though? QR codes? Pictograms?)

In general, it should be possible to configure any additional languages the admin of a Wikibase install likes. We have to think about how the names for such languages get localized, and how we handle things like directionality marks. Also, handling the language list itself is not a trivial challenge if it gets big.
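
To make the configuration idea concrete, here is a minimal sketch in Python. It is not how Wikibase implements its language list; `LanguageInfo`, `build_language_list`, and the default text direction are all hypothetical names and assumptions chosen for illustration. It only shows the three concerns raised above: extra codes added by an admin, localized names with a fallback, and a directionality hint.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageInfo:
    """Hypothetical record for one selectable language code."""
    code: str
    # Maps a UI language code to the localized display name.
    names: dict = field(default_factory=dict)
    # Assumption: default to left-to-right when nothing else is known.
    direction: str = "ltr"

    def name_in(self, ui_language, fallback="en"):
        """Return the localized name, falling back to English, then to the code."""
        return self.names.get(ui_language) or self.names.get(fallback, self.code)


def build_language_list(base, extra):
    """Merge the built-in languages with admin-configured extras, keyed by code."""
    merged = {info.code: info for info in base}
    for info in extra:
        merged[info.code] = info  # an extra entry overrides a built-in one
    return merged


base = [
    LanguageInfo("en", {"en": "English", "de": "Englisch"}),
    LanguageInfo("ar", {"en": "Arabic"}, direction="rtl"),
]
extra = [
    LanguageInfo("und", {"en": "undetermined"}),
    LanguageInfo("zxx", {"en": "no linguistic content"}),
]
languages = build_language_list(base, extra)
```

Keying the merged list by code keeps lookups cheap even if the list gets big, though it does not address how a large list should be presented in a selector.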

While the need for "no linguistic content" (ISO 639-2: "zxx") is debatable, since such text can be described with the string datatype, we do need "undetermined" (ISO 639-2: "und").

Could you add the two other special ISO 639-2 codes: "mul" (when a string contains multiple languages) and maybe "mis" too?

Jonas renamed this task from Mono-lingual text datatype should support "no linguistic content" and "undetermined language" to [Story] Mono-lingual text datatype should support "no linguistic content" and "undetermined language". Sep 10 2015, 7:39 PM
Jonas set Security to None.
adrianheine updated the task description.
adrianheine removed a project: patch-welcome.

Change 257316 had a related patch set uploaded (by Adrian Lang):
Allow special language codes in monolingual text values

https://gerrit.wikimedia.org/r/257316

Change 257316 merged by jenkins-bot:
Allow special language codes in monolingual text values

https://gerrit.wikimedia.org/r/257316

und, mis, mul and zxx are now available at Wikidata.org. They are not present in the suggester, but if you enter them in the input field and save, they are recognized.
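
For illustration, a small Python sketch of what a monolingual text value with one of these codes could look like. The `type`/`value`/`text`/`language` layout follows the Wikibase JSON datavalue format for monolingual text as I understand it; the helper `monolingual_text_value` and the `SPECIAL_LANGUAGE_CODES` table are hypothetical and are not part of the merged change.

```python
import json

# The four special ISO 639-2 codes discussed in this task.
SPECIAL_LANGUAGE_CODES = {
    "und": "undetermined",
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "zxx": "no linguistic content",
}


def monolingual_text_value(text, language):
    """Build a monolingual text datavalue as a plain dict (hypothetical helper)."""
    if not text:
        raise ValueError("text must not be empty")
    return {
        "type": "monolingualtext",
        "value": {"text": text, "language": language},
    }


# The inventory-number inscription from the example above.
inscription = monolingual_text_value("S10", "zxx")
print(json.dumps(inscription, ensure_ascii=False))
```

This only builds the value locally; whether a given wiki accepts the code still depends on its configured language list.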

Apparently, a fix around this did not get deployed, so the suggester might not even allow these four language codes yet. Should get better next week.