
Wrong parsing of centuries and millennia
Open, HighPublic

Description

Currently, entering "20. century" will result in the timestamp +00000002000-01-01T00:00:00Z with the precision set to "century"[1].
Similarly, entering "3. millennium" will result in the timestamp +00000003000-01-01T00:00:00Z with the precision set to "millennium"[2].

This is clearly wrong: the 20th century is the one from the start of 1901 to the end of 2000,
the 3rd millennium is the one from the start of 2001 to the end of 3000.

So, "20. century" should result in the timestamp +00000001901-01-01T00:00:00Z, with precision "century" (and before=0 and after=1 [3]), to accurately represent the century between the start of 2001 and the end of 2100.
Similarly, "3. millennium" should result in the timestamp +00000002001-01-01T00:00:00Z.
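The interval arithmetic behind this can be sketched as follows. This is only an illustration of the convention that periods are counted from year 1, not the parser's actual code; the function name `ordinal_period_bounds` is made up for this example, and only positive (common era) ordinals are handled.

```python
from __future__ import annotations


def ordinal_period_bounds(ordinal: int, span: int) -> tuple[int, int]:
    """Return (first_year, last_year) of the ordinal-th period of `span` years.

    Periods are counted from year 1, so the 1st century is 1..100.
    """
    if ordinal < 1:
        raise ValueError("only periods of the common era are handled here")
    first_year = (ordinal - 1) * span + 1
    last_year = ordinal * span
    return first_year, last_year


print(ordinal_period_bounds(20, 100))   # (1901, 2000)
print(ordinal_period_bounds(3, 1000))   # (2001, 3000)
```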

An alternative fix would be to set before=1 and after=0, but then the timestamp would still be off by a year (the 20th century ended 2000-12-31T23:59:59, not 2000-01-01T00:00:00).

When fixing this, we should also investigate how many century/millennium dates we already have in the database. Many of these are likely to have the wrong timestamp.

[1] https://www.wikidata.org/w/index.php?title=Q4115189&diff=160572560&oldid=160505889
[2] https://www.wikidata.org/w/index.php?title=Q4115189&diff=160674691&oldid=160586967
[3] we actually set both before and after to 0 at the moment. That's bug 65253. Per default, after should be 1, otherwise the precision would be meaningless (as before and after are factors to be applied to the precision).


Version: unspecified
Severity: critical
Whiteboard: u=dev c=backend p=0

Details

Reference
bz71459

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:50 AM
bzimport set Reference to bz71459.
bzimport added a subscriber: Unknown Object (MLST).

oops, should be "to accurately represent the century between the start of 1901 and the end of 2000."

setting to "critical" since this behavior causes silent loss of information (the bad timestamp is only visible in the diff). The incorrect interval will only become apparent once we start to run queries against dates.

This is an issue of the back-end parser. Submitting "20. century" to the parser results in returning {"time":"+0000000000002000-00-00T00:00:00Z","timezone":0,"before":0,"after":0,"precision":7,"calendarmodel":"http://www.wikidata.org/entity/Q1985727"} as data value.

This will require substantial changes to the way such dates are calculated.

The code is in:

Right now, the precision of dates is calculated by extracting only the significant digits. The insignificant digits are always 0. Turning "20. century" into 1901 instead of 2000 (and back again) will require additional special logic.
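A minimal sketch of the difference, assuming positive years only. The function names are hypothetical and this is not the Wikibase implementation; the 11-digit year padding simply mirrors the timestamps quoted in the description.

```python
def century_to_year_truncating(century: int) -> int:
    # Behaviour described above: keep the significant digits, zero the rest.
    return century * 100


def century_to_first_year(century: int) -> int:
    # Proposed behaviour: first year of the century, counting from year 1.
    return (century - 1) * 100 + 1


def year_to_century(year: int) -> int:
    # The "back again" direction: works for any year inside the century.
    return (year - 1) // 100 + 1


def to_timestamp(year: int) -> str:
    # Padding width copied from the examples in the task description.
    return f"+{year:011d}-01-01T00:00:00Z"


print(to_timestamp(century_to_year_truncating(20)))  # +00000002000-01-01T00:00:00Z
print(to_timestamp(century_to_first_year(20)))       # +00000001901-01-01T00:00:00Z
print(year_to_century(1901), year_to_century(2000))  # 20 20
```

Note that `year_to_century` maps both 1901 and 2000 to 20, so under this convention the timestamps already stored with the year 2000 would still round-trip to "20. century".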

Lydia_Pintscher removed a subscriber: Unknown Object (MLST). Dec 1 2014, 2:33 PM

This was discussed at https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Years/Archive_11#When_do_centuries_and_millennia_begin.3F

It seems that most modern official and academic sources in English-speaking countries prefer the view that the 3rd millennium began January 1, 2001 (with the striking exception of the Oxford English Dictionary). But the popular position favors January 1, 2000. The English language is a consensus language, not defined by any government or official body. In a variety of documents, speeches, and celebrations around the change from the 2nd to 3rd millennium, both President Clinton and Pope John Paul II avoided taking any position about which year the new millennium began. Furthermore, Wikidata is a multi-lingual environment. So I suggest banning the words "century" and "millennium" and finding some other words or phrases that will convey the concept of the interval that begins in years ending in two or three zeros.

Screenshot from 2017-10-18 14-21-57.png (183×325 px, 11 KB)

As an unsuspecting editor, in order to state "20. century" I typed "1900" and selected "century" without thinking further, and at first I was surprised to see "19. century" appear.

From the user's point of view, it is not a complex problem; only the UI makes it complex.
If the UI were (1) select "century", then (2) select the century from a dropdown list, there would be no chance of a mistake.
The dropdown list would need "More..." buttons to go arbitrarily far into the past and future, so the implementation sounds a bit complex, though.
Cheers!

Larske unsubscribed.
Larske subscribed.

An example, reported today, of a consequence of this issue:
In Wikidata (https://www.wikidata.org/wiki/Q336070), "6. century" is displayed, but in French Wikipedia (https://fr.wikipedia.org/wiki/M%C3%A9nandre_le_Protecteur), the infobox linked to Wikidata displayed "7. century".

"600" with precision "century" was stored in Wikidata, but the common convention in French (and English) is to consider such a date part of the 7th century. The issue concerns not only display in Wikidata but also reuse of the data in other environments. Whether a century begins in 600 or 601 is a convention, but perhaps the choice could be guided by data-processing needs.

Another issue is the partial display of such dates in Wikidata: finding out what is really stored is only possible for power users. The person who reported the wrong century in Wikipedia, which was based on Wikidata, could not understand the problem, let alone fix it.

My bad: 600, as stored in Wikidata, should be considered part of the 6th century if centuries begin in x01.
But the question is why the last year of the century is stored. When a date is specified with only a year, it is January 1st that is stored, not December 31st. Furthermore, the choice of this year can create issues on reuse, as happened with infoboxes on French Wikipedia.
For "6. century", storing the year 501 instead of 600 should resolve the issue, whichever convention the reuser chooses for the start of centuries (500 or 501).
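To illustrate why 501 is the more robust choice, here is a small sketch comparing the two conventions a reuser might apply to the stored year. The function names are invented for this example and it assumes positive years.

```python
def century_if_starting_at_x00(year: int) -> int:
    # Convention where the 6th century is 500..599.
    return year // 100 + 1


def century_if_starting_at_x01(year: int) -> int:
    # Convention where the 6th century is 501..600.
    return (year - 1) // 100 + 1


for stored_year in (600, 501):
    print(
        stored_year,
        century_if_starting_at_x00(stored_year),
        century_if_starting_at_x01(stored_year),
    )
# 600 7 6
# 501 6 6
```

With 600 stored, the two conventions disagree (7 versus 6), which matches the discrepancy seen in the French infobox; with 501 stored, both give 6.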

Would it be difficult to display this as a range, like "some point in time between 1 January 2001 and 31 December 2100"? Or is there something in the way these are displayed which would make this technically impossible? (Doing this for dates with year precision, although not necessary, would be helpful in encouraging editors to increase the precision of the data they input.)
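As a rough illustration of such a display, assuming the stored year is the first year of the period and that periods are counted from year 1. The names `SPAN_IN_YEARS` and `describe_as_range` are hypothetical and do not refer to any existing Wikibase formatter.

```python
# Length in years of each precision considered in this sketch.
SPAN_IN_YEARS = {"year": 1, "century": 100, "millennium": 1000}


def describe_as_range(first_year: int, precision: str) -> str:
    """Render a year/century/millennium value as an explicit date range."""
    last_year = first_year + SPAN_IN_YEARS[precision] - 1
    return (
        f"some point in time between 1 January {first_year} "
        f"and 31 December {last_year}"
    )


print(describe_as_range(2001, "century"))
# some point in time between 1 January 2001 and 31 December 2100
```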