Page MenuHomePhabricator

security review of WikibaseQuery
Closed, InvalidPublic

Description

We want to deploy simple query functionality for Wikidata. For this a security review of WikibaseQuery is needed. The code is at https://github.com/wmde/WikibaseQuery.


Version: master
Severity: normal
Whiteboard: u=dev c=backend p=0

Details

Reference
bz67533

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 3:35 AM
bzimport set Reference to bz67533.

There is a very serious problem with offset parameter. "offset" parameter means not key/value offset, but rather row number offset. Because of that it is possible to create queries that creates full-index-scans using large offset values.

Current state-of-art is usage of values anchor, not pages (even in mediawiki API). It may be entity id anchor or claim id anchor.

See also:

  • details of doModifyLimitQuery implementation for MySQL:

http://www.doctrine-project.org/api/dbal/2.1/source-class-Doctrine.DBAL.Platforms.MsSqlPlatform.html#_doModifyLimitQuery

As far as I can tell, the offset parameter is limited to 50, and can thus not cause full index scans. Is that wrong?

If we would want to allow further pagination, using a continuation parameter would indeed be a much better approach.

Sorry, i didn't notice the limit in API declaration. In this case... well, it's just unusable from my point of view. But it is not a security concern, of course :-)

I hope changed limit value can't be passed to SimpleQuery::getResult, because there is no max value check.

Hi guys, can you explain the reasoning for using doctrine's DBAL and Symphony's console, instead of the standard MediaWiki classes? Reviewing those (~80 kloc) is going to take some time, and so far, I haven't seen anything you're doing with them that you can't do with the MediaWiki classes.

The MediaWiki code is not reusable - it's bound to the rest of the MediaWiki framework. Both the code itself and the things it's bound to have serious design issues, little test coverage and low quality overall.

Those issues are not present in Doctrine DBAL or Symfony Console. Furthermore, both these components are relied on by literally thousands of others, including some of the most popular tools and frameworks in the PHP world. This means that their code is looked at by more developers, and used by more users, then the equivalent MediaWiki code.

Given that, I'm not sure it makes sense to do a real security review of these components. Is WMF doing security reviews of other tools it uses, such as Lucene? Of course it's always better to do a review then not, yet there are limited resources. So does it really make sense to spend them on this?

(In reply to Jeroen De Dauw from comment #5)

Given that, I'm not sure it makes sense to do a real security review of
these components. Is WMF doing security reviews of other tools it uses, such
as Lucene? Of course it's always better to do a review then not, yet there
are limited resources. So does it really make sense to spend them on this?

For code included in MediaWiki, we do. Lucene we can segment on a different server / network, so the attack surface and risk from exploitation is lower. That being said, I did review several pieces of our Hadoop infrastructure, and we generally want to make sure the organizations backing the components we use have security programs.

(In reply to Chris Steipp from comment #6)

(In reply to Jeroen De Dauw from comment #5)

Given that, I'm not sure it makes sense to do a real security review of
these components. Is WMF doing security reviews of other tools it uses, such
as Lucene? Of course it's always better to do a review then not, yet there
are limited resources. So does it really make sense to spend them on this?

For code included in MediaWiki, we do. Lucene we can segment on a different
server / network, so the attack surface and risk from exploitation is lower.
That being said, I did review several pieces of our Hadoop infrastructure,
and we generally want to make sure the organizations backing the components
we use have security programs.

I just had this conversation with Daniel and Katie. My take is that we're extra super paranoid about PHP code that runs in MediaWiki. We're less paranoid about other code more out of habit and bandwidth issues then any other reason. I'm of the opinion that the foundation has a right to be as paranoid about code running on the cluster as it pleases. Personally, I think that if we sat down and thought really hard about how paranoid we should be we'd either continue doing exactly what we do now or hire another Chris or two and be more paranoid.

Regardless, I'm pretty sure we're not going to change our minds about security review by debating in a bug. So the issue stands - this pulls in a few external libraries that would also require review. Our options are to review them, make them optional and not use them on the cluster, or rip them out entirely. If there are more options please reply with them.

I think we've talked on other bugs (ping bd808) about symfony console so it might be worth reviewing it and using it is more places. Doctrine DBAL I'm personally less excited about because I'm prejudiced against DB abstraction layers. I hate the one in MediaWiki. I hate Hibernate. I loath JPA. JPQL is evil. I could go on but I don't want to relive the horrors.

Regardless, I'm pretty sure we're not going to change our minds about security review by debating in a bug.

Not sure debate is happening. I never even asked to change the relevant policies, only that one would consider if doing so in this case makes sense or not. That has been done now, giving us more clarity on what the next steps can be.

Our options are to review them, make them optional and not use them on the cluster, or rip them out entirely.

Creating replacement code will result in more maintenance cost, less nice architecture and also has an initial cost. And it will leave us with some self writien db abstraction code, which I really rather avoid for security reasons. Hence the review option is the only really logical one from my point of view.

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher removed a subscriber: Unknown Object (MLST).Dec 1 2014, 2:31 PM
Lydia_Pintscher lowered the priority of this task from Unbreak Now! to Low.Feb 19 2015, 11:49 AM
Lydia_Pintscher set Security to None.

We decided to not work on this given that the SPARQL endpoint seems to be working well and covering most usecases.