Page MenuHomePhabricator

review backend part of entity suggester (php code)
Closed, ResolvedPublic

Description

The entity suggester is getting ready for a first deployment. The code at https://github.com/Wikidata-lib/PropertySuggester/pull/44 needs review.


Version: master
Severity: major
Whiteboard: u=dev c=backend p=8 s=2014-05-20

Details

Reference
bz63224

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 2:54 AM
bzimport set Reference to bz63224.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to Thiemo Mättig from comment #1)

Note: There is also Python code to review at
https://github.com/Wikidata-lib/PropertySuggester-Python

Review of the python code is bug 63368.

Review is done from Wikidata-Team side.
Pending WMF review now.

In general, please follow the style guide (unless Wikibase has it's own)

The build directory shouldn't be web accessible-- is this meant to be removed in production? Otherwise, add an htaccess rule for it.

The code uses PropertyId::newFromNumber, which looks like isn't supposed to be used.

./PropertySuggesterHooks.php

  • It seems like the JS doesn't need to be added on every page view

./maintenance/UpdateTable.php

  • Not a security issue, but using do { } while would make the code more readable.

./src/PropertySuggester/Suggesters/SimpleSuggester.php
Line 75 - please escape $minProbability
Please add profiling

./src/PropertySuggester/ResultBuilder.php
Nitpick, but the "createJSON" function returns an array. Naming consistency makes reviewing easier.

./src/PropertySuggester/GetSuggestions.php
The performance of using continue + limit as your db limit seems like it may cause issues. It's efficient, but running profiling will tell if it's a serious issue.

./src/PropertySuggester/UpdateTable/Importer/MySQLImporter.php

  • For production, imports need to be broken up into chunks. Talk with Sean, but Asher used to recommend importing in blocks of about 10k to avoid overloading the slaves.

./src/PropertySuggester/SuggesterParamsParser.php
Line 48 - test is_null first

./PropertySuggester.php
Please don't require_once /vendor/autoload.php

moritz.finke wrote:

Thanks for the review - by now we should have adressed all of the important issues. For more details you might want to have a look at this issue
https://github.com/Wikidata-lib/PropertySuggester/issues/70

Moritz, please keep discussion here so we have the conversation all in one place.

(In reply to Chris Steipp from comment #5)

In general, please follow the style guide (unless Wikibase has it's own)

This wasn't copied to https://github.com/Wikidata-lib/PropertySuggester/issues/70, and I'm guessing wasn't addressed.

The build directory shouldn't be web accessible-- is this meant to be
removed in production? Otherwise, add an htaccess rule for it.

Fixed.

The code uses PropertyId::newFromNumber, which looks like isn't supposed to
be used.

Dacry said, "PropertyId::newFromNumber is not supposed to be used, because usually one shouldn't care about (numeric) ids but we naturally do - so i think we agreed that it should be ok in our special case ..."

Are wikidata folks ok with that? I want to make sure the Wikibase interface doesn't change in a way that will break this, if no one is supposed to be using that function.

./PropertySuggesterHooks.php

  • It seems like the JS doesn't need to be added on every page view

Fixed (I'll assume that's the right way in wikibase to do that)

./maintenance/UpdateTable.php

  • Not a security issue, but using do { } while would make the code more

readable.

Fixed.

./src/PropertySuggester/Suggesters/SimpleSuggester.php
Line 75 - please escape $minProbability

xchrdw said "minProbability is checked to be float in line 56", which isn't the point. Please follow [[Security_for_developers#Demonstrable_security]] and escape as close to the output as possible. Additionally, is_float may not be sufficient (I honestly don't have time to try and exploit your implementation).

Please add profiling

fixed

./src/PropertySuggester/ResultBuilder.php
Nitpick, but the "createJSON" function returns an array. Naming consistency
makes reviewing easier.

fixed

./src/PropertySuggester/GetSuggestions.php
The performance of using continue + limit as your db limit seems like it may
cause issues. It's efficient, but running profiling will tell if it's a
serious issue.

Dacry said, "I do not really see issues caused by using continue + limit as your db limit. We did already do some profiling and performance seems to be fine"

Can you point add or point to your profiling data?

./src/PropertySuggester/UpdateTable/Importer/MySQLImporter.php

  • For production, imports need to be broken up into chunks. Talk with Sean,

but Asher used to recommend importing in blocks of about 10k to avoid
overloading the slaves.

xchrdw said, "the basic importer uses chunks and is the default importer now".

That's fine. Are the deployment issues documented anywhere, so that whoever at the wmf/wmde who is deploying this will know not to override the default?

./src/PropertySuggester/SuggesterParamsParser.php
Line 48 - test is_null first

Fixed

./PropertySuggester.php
Please don't require_once /vendor/autoload.php

Dacry said, "requiring autoload.php ist needed for testing purposes and it is only included if an autoloader exists. Is the way of doing it problematic / does a better way exist? ;)"

If it's only needed for testing, can you also check that PHP_SAPI == 'cli'? We should minimize places where an exploit that allows creating a new file can be turned into remote code execution.

moritz.finke wrote:

Wikibase has it's own style guide and we followed it

The usages of PropertyId::newFromNumber are fine, expect for the one I just pointed out here: https://github.com/Wikidata-lib/PropertySuggester/pull/44

./PropertySuggester.php
Please don't require_once /vendor/autoload.php

If it's only needed for testing, can you also check that PHP_SAPI == 'cli'?

Good point here. You indeed do not need to have this in the enty point file at all. Including it in the test bootstarp is better. Example:

https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/bootstrap.php

christian.dullweber wrote:

I'm sorry we forgot to mention the autoloader include is also required when the extension is not installed via composer as we need to include the generated psr-4 loader and the classmap.

But e.g. WikibaseLib.php or ValueView.php do this the same way.

I also think that someone who is able to create files in a directory could just delete existing files and replace them?

when the extension is not installed via composer

Is that something we actually need to support? We do not support it for Wikibase itself...

(In reply to Jeroen De Dauw from comment #12)

when the extension is not installed via composer

Is that something we actually need to support? We do not support it for
Wikibase itself...

IMHO not. The extension isn't useful without Wikibase at this point.

christian.dullweber wrote:

So the solution is to just remove the file_exists condition?

All other issues should be fixed now. Changes are here: https://github.com/Wikidata-lib/PropertySuggester/compare/2c409e676...269a1734a

(In reply to Christian Dullweber from comment #14)

So the solution is to just remove the file_exists condition?

All other issues should be fixed now. Changes are here:
https://github.com/Wikidata-lib/PropertySuggester/compare/2c409e676...
269a1734a

Regarding the escaping in SimpleSuggester.php, can you use addQuotes()? The threat is an attacker finds a string that satisfies is_float criteria, but allows adding extra commands when it's cast back to a string. Scientific notation is handled correctly, but I'm not sure if php accepts other formats that might include a space, and strencode won't help in that situation.

christian.dullweber wrote:

I tried to use addQuotes() but it didn't work with sqlite. shouldn't is_float check the variable to be a float and not a string that looks like a float? I could add an additional cast to float as safety?

(In reply to Christian Dullweber from comment #16)

I tried to use addQuotes() but it didn't work with sqlite.

addQuotes to what? Field names? This can't work in SQLite. addQuotes is for values, not for identifiers. There are other methods like addIdentifierQuotes that may be more suitable.

shouldn't is_float check the variable to be a float and not a string that
looks like a float?

Yes, it does. Chris seems to confuse this with is_numeric. To be sure you can always add an extra floatval( $var ) or (float)$var to the places where the variable is used inside of a string, especially if it's a possible SQL injection.

(In reply to Chris Steipp from comment #15)

I'm not sure if php accepts other formats that might include a space

Simple answer: No. http://php.net/language.types.float.php

christian.dullweber wrote:

(In reply to Thiemo Mättig from comment #17)

addQuotes to what? Field names? This can't work in SQLite. addQuotes is for
values, not for identifiers. There are other methods like
addIdentifierQuotes that may be more suitable.

I added addQuotes to $minProbability in our HAVING clause:
'HAVING' => "prob > " . $dbr->addQuotes($minProbability)

I will change it floatval to put the escaping as near to the statement as possible:
'HAVING' => "prob > " . floatval($minProbability)

christian.dullweber wrote:

I fixed the autoloader include and changed the way $minProbability is escaped:
https://github.com/Wikidata-lib/PropertySuggester/pull/72

christian.dullweber wrote:

(In reply to Christian Dullweber from comment #19)

I fixed the autoloader include and changed the way $minProbability is
escaped:
https://github.com/Wikidata-lib/PropertySuggester/pull/72

oops, wrong url:
https://github.com/Wikidata-lib/PropertySuggester/pull/76

(is it impossible to delete/edit a comment?)

(In reply to Thiemo Mättig from comment #17)

Yes, it does. Chris seems to confuse this with is_numeric. To be sure you
can always add an extra floatval( $var ) or (float)$var to the places where
the variable is used inside of a string, especially if it's a possible SQL
injection.

Not confusion, but concern that it might suffer from the same issues.

(In reply to Chris Steipp from comment #15)

I'm not sure if php accepts other formats that might include a space

Simple answer: No. http://php.net/language.types.float.php

Thanks for the link, that does clarify my concern was invalid. So yes, floatval looks like it should be fine.