
Validate data structure
Closed, Resolved, Public

Description

In the present version of Wikidata it is possible to add arbitrary content to an item by creating a valid JSON structure and saving it. This could create serious problems with data integrity and consistency.

There are several ways to clean up the structure. One of the simplest is perhaps to traverse it and flag those branches and leaves that cannot be positively validated as legal. Note that the focus should be on letting a legal structure pass after successful tests, not on detecting and failing an illegal one.
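The traversal described above could be sketched roughly as follows. This is only an illustration in Python, not Wikibase code: the keys in ITEM_SCHEMA (label, description, aliases), the MapOf and ListOf helpers, and the function name flag_unvalidated are all assumptions made for the example. Anything the schema does not explicitly recognise is flagged, so a structure passes only when every branch and leaf has been positively validated.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class MapOf:
    """Hypothetical schema node: a dict with arbitrary string keys."""
    value: Any


@dataclass(frozen=True)
class ListOf:
    """Hypothetical schema node: a list whose entries share one schema."""
    item: Any


# Assumed, simplified item layout. The real structure may differ.
ITEM_SCHEMA = {
    "label": MapOf(str),
    "description": MapOf(str),
    "aliases": MapOf(ListOf(str)),
}


def flag_unvalidated(data: Any, schema: Any, path: str = "$") -> List[str]:
    """Return the path of every branch or leaf that could not be validated."""
    if isinstance(schema, dict):
        # Fixed set of known keys: unknown keys are flagged as whole branches.
        if not isinstance(data, dict):
            return [path]
        flagged: List[str] = []
        for key, value in data.items():
            child = f"{path}.{key}"
            if key in schema:
                flagged += flag_unvalidated(value, schema[key], child)
            else:
                flagged.append(child)
        return flagged
    if isinstance(schema, MapOf):
        if not isinstance(data, dict):
            return [path]
        flagged = []
        for key, value in data.items():
            child = f"{path}.{key}"
            if isinstance(key, str):
                flagged += flag_unvalidated(value, schema.value, child)
            else:
                flagged.append(child)
        return flagged
    if isinstance(schema, ListOf):
        if not isinstance(data, list):
            return [path]
        flagged = []
        for index, value in enumerate(data):
            flagged += flag_unvalidated(value, schema.item, f"{path}[{index}]")
        return flagged
    # Leaf: the schema is a plain Python type and must match exactly.
    return [] if type(data) is schema else [path]
```

An empty result means the whole structure was recognised. A non-empty result lists what would have to be cleaned up or rejected.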


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431
https://bugzilla.wikimedia.org/show_bug.cgi?id=38234

Details

Reference
bz36519

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:23 AM
bzimport set Reference to bz36519.
bzimport added a subscriber: Unknown Object (MLST).

This is only true for the wbsetitem module, right? All other modules are safe?

This can only happen if a module (or anything else) can create an object that is later unserialized. The wbsetitem module does this regularly, but that will be removed at some point in the future. If it's not removed, and/or some other module (or special page) allows unserialization (typically because it allows a bulk upload to be unserialized), a cleanup and/or validation routine will be necessary.

Do you mind if I rename the bug to "Validate data structure", then? As far as I understand it, performing such a validation before actually committing a save would be sufficient?

This ties in with the isValid method I proposed in the secondary storage consistency bug. It needs to be added to the Content class.

That is right. I am renaming this bug and at the same time making it a dependency of Bug #36431, the one about the secondary storage.

That seems more appropriate, as this is really more about data integrity than data security, unless unserialize is allowed to do some strange things, or interpreting the data structure does some strange things.

The wbsetitem module is changed in https://gerrit.wikimedia.org/r/#/c/14762/ so that it is somewhat more difficult to create weird structures. The item itself should nevertheless validate its internal structure before saving, even though all API modules now try to be stricter about what goes into the item.
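The idea that the item validates itself before a save, independently of how strict the API modules are, might look something like the sketch below. It is Python for illustration only and is not the actual Content class: the Item class, its ALLOWED_KEYS set, InvalidItemError, and the list standing in for storage are all hypothetical, and is_valid here only checks top-level keys.

```python
class InvalidItemError(ValueError):
    """Raised when an item refuses to save an unvalidated structure."""


class Item:
    # Assumed set of legal top-level keys, for illustration only.
    ALLOWED_KEYS = {"label", "description", "aliases"}

    def __init__(self, data):
        self.data = data

    def is_valid(self) -> bool:
        # Pass only structures that are positively recognised.
        return isinstance(self.data, dict) and set(self.data) <= self.ALLOWED_KEYS

    def save(self, store: list) -> None:
        # The check lives in the item, so every caller goes through it.
        if not self.is_valid():
            raise InvalidItemError("item structure failed validation; nothing was saved")
        store.append(self.data)
```

Because the check sits in save itself, a bulk upload path or a new API module would not be able to bypass it by accident.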

Add a test that tries inputs that are not well defined.

Fixed in the new tests for the API.