Page MenuHomePhabricator

Datatype quantity does not recognize comma
Closed, ResolvedPublic

Description

Author: info

Description:
For a German wiki, numbers should be working like this: 1000,12 (onetousand decimal seperator twelfe)

On http://de.wiki-products.org/Audi_A1_1.6_TDI the value " Mix-Verbrauch (l/100km): 3,8" is entered. SWM seems only to recognize 3, omitting the places after the comma.

When browsing this through the semantic browser at http://de.wiki-products.org/Spezial:Browse/Audi-20A1-201.6-20TDI
I'm getting only "Mix-Verbrauch 3 l/100km", not the expected "3,8 l/100 km".

http://de.wiki-products.org/MediaWiki:Smw_kiloseparator is empty and http://de.wiki-products.org/MediaWiki:Smw_decseparator is set to ","

The property is set to http://de.wiki-products.org/Attribut:Mix-Verbrauch
[[Datentyp::Menge]]
[[Entspricht::1 l/100km]]

The English equivalent would be:
[[Has type::Quantity]]
[[Corresponds to::1 l/100km]]

http://de.wiki-products.org/Spezial:Version
SMW 1.6.1 SF 2.2.1 MW 1.17.0

There seems to be a problem in using the comma, that could either refer to SMW


Version: unspecified
Severity: critical

Details

Reference
bz32661

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 12:01 AM
bzimport set Reference to bz32661.
bzimport added a subscriber: Unknown Object (MLST).

info wrote:

or to the mediawiki-parser.

For the moment I have not assigned the importance to 'high' simply for the reason, that this is my first bug report and I didn't want to exaggerate. Anyway for wiki-products it means that the site is not working properly and that some of its major comparison charts do not work properly.

info wrote:

updated to SMW 1.7.0.2 SF 2.3.2 MW 1.18.1, problem persists.

Turns out, the "Mixverbrauch" attribute's value is shown well with a comma on the Factbox, see here http://de.wiki-products.org/Audi_A1_1.6_TDI So we can assume, it is stored as decimal value in the database.

But it is still not shown with a comma on the attributes page. http://de.wiki-products.org/Attribut:Mix-Verbrauch

And it is still not shown with a comma on the Special Browse page. http://de.wiki-products.org/Spezial:Browsen/Audi-20A1-201.6-20TDI

If we are using #ask as in http://de.wiki-products.org/index.php?title=Antitranspirante_im_Vergleich
the "UVP" Property should definetly have a comma, like "26,95 Euro" or so. But it doesn't. it only shows "26 Euro". And this is a major problem for the wiki, because it needs these exact comparisons.

  • Factbox (this one works)

Generating in SMW_factbox.php starts at line 77 and the value gets called through
$dataValue = SMWDataValueFactory::newDataItemValue( $dataItem, $propertyDi );

*Attributes Page (not working)
Generating in SMW_PropertyPage starts around line 169 and the value gets called through
$dv = SMWDataValueFactory::newDataItemValue( $di, $this->mProperty );

  • Special Browse (not working)

Generating in SMW_SpecialBrowse.php starts around line 144
$dv = SMWDataValueFactory::newDataItemValue( $di, $diProperty );

  • #ask (not working)

I wasn't able to track this down. At least for our purposes, this is the most important.

Thanks in advance

info wrote:

Hi,

following up the problem I noticed when looking into the mysql table smw_atts2, which stores the actual values for the attributes, that values for e.g. the attribute UVP (p_id:2351) get stored in two fields:

value_xsd in varbinary format (this contains the comma)
value_num in double format (here the comma is ommitted)

For me it would go to far to find a fix for this, but I'm guessing, that part of the problem, that some values get processed from value_xsd and some from value_num. This seems the explanation, why some parts of SWM include a comma in processing the results (Factbox) and others not (results of #ask query).

e.akerboom+bugzillamediawiki wrote:

Hi guys,

At the request of Francis I've looked into this problem, and found an actual bug and an actual solution.

In extensions/SemanticMediaWiki/includes/datavalues/SMW_DV_Number.php, line 67, change

static protected function parseNumberValue( $value, &$number, &$unit ) {

to

static public function parseNumberValue( $value, &$number, &$unit ) {

And then (here's the bug), in extensions/SemanticMediaWiki/includes/dataitems/SMW_DI_Number.php, from line 55 onwards, change

/**
 * Create a data item from the provided serialization string and type
 * ID.
 * @note PHP can convert any string to some number, so we do not do
 * validation here (because this would require less efficient parsing).
 * @return SMWDINumber
 */
public static function doUnserialize( $serialization ) {
    return new SMWDINumber( floatval( $serialization ) );
}

to

/**
 * Create a data item from the provided serialization string and type
 * ID.
 * @return SMWDINumber
 */
public static function doUnserialize( $serialization ) {
    if (SMWNumberValue::parseNumberValue( $serialization, $value, $unit ) == 0) {
        return new SMWDINumber( $value );
    }
    return new SMWDINumber( floatVal( $value ) );
}

As you can see, the problem here was the indiscriminatory 'floatVal' that was blindly applied regardless of internationalization settings. I've added 3 lines of code that prevents that, and... tadaa! The 'Attributen anzeigen' page now shows the correct number - no longer "3 l/100km" but "3.8 l/100km" (for the Audi A1 1.6 TDI example). Also note that I removed the snarky & incorrect comment from the function's heading ("we don't do any validation here because it's inefficient").

Can you please confirm this problem and this solution? Thanks.

All the best,
Ed

Edward, where you able to reproduce this issue on a wiki different then the one it was originally reported for? If so, how, and what software versions did it have? I still have not been able to reproduce this.

Although this change might fix the issue, it's not the correct place to do this. The DataItem classes just hold the data and have methods for (un)serialization. They do not contain any parsing or formatting code, and thus ought to be language independent. If your change fixed the problem, this suggests that something is passing incorrectly parsed values to the DataItems. That should be fixed.

e.akerboom+bugzillamediawiki wrote:

Hello Jeroen,

Nope, I was hired to fix the issue on the wiki-products site and (therefore) only looked at that. This code was called from the SMWCompatibilityHelpers class, if I'm not mistaken.

The issue here was that the localized value was stored in the database, in the value_xsd column. I don't think that the localized data should have ever ended up in the database, but that's another point entirely. In any case, the value_xsd column contained the value "3,8" and the value_num column contained the value "3". The value "3,8" was read from the db, and (incorrectly) unserialized to "3", and that value was passed on and ended up in all overview and comparison pages. Note that the value in "value_num" is also incorrect - the numerical value of "3,8" is not "3" but "3.8".

Anyway, if you want to reproduce, I think that's easy: take an existing SMW install, find a key with a numerical value, and put a localized string ("3,8" for example) in the 'value_xsd' column. Then go to the corresponding 'attributes' page, and you will (most likely, didn't test) see that the listed value is incorrect.

I'm not sure if that helps. I understand that the above fix might not be the 'correct' way to fix it, but I'm only superficially familiar with your code architecture. If there is a better way to fix it, please do. My guess would be that the 'real' bug is that the value "3,8" ended up in the database (instead of 3.8). I think this could be caused because the wiki-products site runs in two different locales (English and German) that have different and incompatible rules for how to format a number. Just my two cents. If you need more info, shoot.

Best,
Ed

This sounds to me like an upgrade issue. In current SMW, the database should always use "." in the database, whatever the surface language. The value is generated using PHP's strval() and recovered with floatval(); no localisation is involved. This is intentional and ensures that values are exchanged between PHP and SQL as simple and fast as possible, while more complex language-specific handling may come later (and only if needed). This also means that, unless PHP would for some reason localise the output of strval(), the smw_atts2 table should never contain a value in pretty printing (neither localised nor with thousands separators). Therefore, the "fix" suggested in Comment 4 is not required or desired in SMW, and the current behaviour is not a bug.

My suggestion is to refresh all stored data using Special:SMWAdmin to make sure that outdated values are fixed. Besides stale data from earlier SMW versions, another possible reason for "number-like" values with localised decimal separators to appear in the database is that the datatype of the respective properties has been changed at some point (e.g., to String). The value will then be stored verbatim, but using the same table (however, without using the numerical value_num cell as reported above). When changing the type back to Number, such stored values cannot be interpreted properly. This is normal and to be expected (it is more obvious for other types, such as Date). SMW automatically triggers update jobs that should fix all affected database records after a while, but how fast this is depends on the wiki's job queue.

SMW can only have one localisation setting for wiki text (just like MediaWiki), i.e., all pages should be in the same language. I am sorry that something got mixed up on your wiki, but as long as this cannot be reproduced anywhere else and analysis of code does not suggest any problem, I can only close this as worksforme.

e.akerboom+bugzillamediawiki wrote:

Unfortunately, I would have to disagree with you here. This was an actual problem for my customer and these four lines of code resolved it. I am not going to argue one side or the other here, I just hope you or someone else might find this small adjustment useful.

All the best,
Ed

(In reply to comment #8)

Unfortunately, I would have to disagree with you here. This was an actual
problem for my customer and these four lines of code resolved it. I am not
going to argue one side or the other here, I just hope you or someone else
might find this small adjustment useful.

Sure, I am not saying that your changes did not fix the symptoms of the problem, which no doubt has been useful for keeping the site running. I am saying that it is not a good idea to introduce code for working around corrupted database content. Instead, you should use the method I described to fix the database content, or, if this method fails, find out why your database has wrong entries in the first place.

e.akerboom+bugzillamediawiki wrote:

Ah, yes - ok. Well, the database is corrupted because it's a bilingual install and (therefore) requires more than one localisation setting, and -as you say- that's not supported by SMW. My guess is that this is the actual source of the problem. However, the only way to properly address that is to do structural architectural changes (to allow for more than one localisation setting) and I'm not sure if I'm up for that. Sorry?

(In reply to comment #10)

Ah, yes - ok. Well, the database is corrupted because it's a bilingual install
and (therefore) requires more than one localisation setting, and -as you say-
that's not supported by SMW. My guess is that this is the actual source of the
problem. However, the only way to properly address that is to do structural
architectural changes (to allow for more than one localisation setting) and I'm
not sure if I'm up for that. Sorry?

What I was trying to explain is that this is not related to localisation. The way in which SMW stores data does not depend on your localisation setting. That's why the presence of "," in database entries that represent numbers is always an error, which should correct itself when updating the data. If not, then there is some other issue that is the real source of your problem.

Also, I don't know what a "bilingual install" of MediaWiki is. As far as I know, MediaWiki supports only one content language at a time. If there is something like an official "bilingual" mode, then SMW should of course support it; if not, then there is nothing to support.

info wrote:

First of all thanks to everybody for replying. Thanks a lot to Edward for the provisional fix.

Markus, I think, with "bilingual" Edward in comment #9 meant simply, that we have an english install of MW at en.wiki-products.org and a German install of MW at de.wiki-products.org. But both run SMW. The latter (the German) has the problem with the comma values. Edward is however mistaken, when he assumes that these both version use the same database. They are two completely separated MW installs – one running with an English localisation and one with a German localisation.

Regarding Markus' comment #7
I have again run SMW database update and repair via the Mediawiki SMW Admin Panel and in table "swm_atts2", row "value_xsd" there are still values containing a comma. So it unfortunately seems, that outdated values didn't get fixed (as suggested by Markus).

In addition I have entered a new value containing a comma, which got written directly into the database. So if users enter a value containing a a comma as decimal seperator, the decimal seperator is not changed to whatever the database expects. In my database install, the "value_xsd" row is of a varbinary (255) type. For the record: the server runs with PHP 5.2.17 and
MySQL 5.0.77

Also I have created a new value of the datatype "Quantity" and created a new wikipage and entered "23,23" for that value. Again the comma appears in the database in the smw_atts2 table, value_xsd row. So we can assume, that the problem doesn't lie in a changed datatype at some point in the past, as also suggested in comment #7.
Of course I could replace all appearances of comma with a point directly in mysql, but mediawiki users are able to enter decimal comma values at any time.

So we are still at the state of Ewards Comment #6.

I hope that you understand, that under these circumstances I'm reopening this as a bug in the hope that a general solution and much better solution than "works for me" can be found.

info wrote:

As suggested by Markus in comment #4 I have replaced all occurences of commata with points (if headed and followed by a digit) directly using mysql in smw_attributes value_unit and smw_atts2 'value_xsd'. Let me know, if I should look for occurences in other tables/rows.

Also I have deleted setlocale( LC_ALL, "de_DE.UTF-8" ); from localsettings.php, so commata do not get written directly into the database any more. (I found that after testing again against a local install)

http://de.wiki-products.org/Attribut:Mix-Verbrauch and http://de.wiki-products.org/Spezial:Browse/Audi-20A1-201.6-20TDI
now shows the values with commata, which is good. (compare comment #2)

So I'm omitting Eds "worksforme" fix, because it leads into the wrong direction. (Anyway, thanks to Ed's commitment and Markus' comments, we were able to track this down)

I have further updated to the latest SMW version and run a MW Database update and a SMW database update.

What remains is a problem with sorting, but I'll ask the mailinglist first, before opening another ticket. Thanks a lot everybody for helping.