YAML: strings that are the same as boolean literals
Closed, Resolved (Public)

Description

Author: wgh

Description:
For example, http://en.wikipedia.org/w/api.php?action=query&titles=False&format=yamlfm

Parse it using pyyaml, and you'll get:
{'query': {'pages': [{'ns': 0, 'pageid': 228749, 'title': False}]}}

That's because False is unquoted and, according to the spec,
http://www.yaml.org/spec/1.2/spec.html#id2803629
an unquoted False is a boolean literal, when it should be a string instead.

It also doesn't escape the other boolean literals defined in the older 1.1 spec:
http://yaml.org/type/bool.html
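
As a rough illustration of the escaping that is missing, here is a minimal Python sketch that single-quotes any string matching the boolean scalars listed in the YAML 1.1 bool type. The helper name `quote_if_boolean_like` is hypothetical and is not part of spyc or MediaWiki; a real emitter would also need to handle other ambiguous scalars (null, numbers, and so on).

```python
import re

# Boolean scalars from the YAML 1.1 bool type (http://yaml.org/type/bool.html).
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def quote_if_boolean_like(value: str) -> str:
    """Hypothetical helper: single-quote strings that a YAML parser
    would otherwise resolve as booleans; leave other strings untouched."""
    if YAML11_BOOL.fullmatch(value):
        return "'" + value + "'"
    return value


print(quote_if_boolean_like("False"))      # 'False'
print(quote_if_boolean_like("Main Page"))  # Main Page
```

With this kind of check, the page title False would be emitted as 'False' and read back as a string.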


Version: unspecified
Severity: normal

Details

Reference
bz28586

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:30 PM
bzimport set Reference to bz28586.

This is really a spyc bug, and it should probably be dealt with upstream (i.e. http://code.google.com/p/spyc/).

It does state that it supports the 1.0 spec, with no mention of 1.1 or 1.2. The code also says "It currently supports a very limited subsection of the YAML spec."

Although in core we are using a base of 0.23, I just borrowed the 0.4.5-svn from Translate, and it makes some minor changes to the output.

I know we run a condensed version in core due to security constraints etc. (see r42547); we have also made a few minor bugfixes to it.

It might be worth getting Tim to look over the newer version (or looking at what he did), and looking at forward-porting it in some capacity...

Other options include using a different library (seemingly only Symfony's YAML library is written in PHP); the rest are all PHP extensions...

As far as I know the security concerns related to the behavior of the parse function have long since been fixed. My experience with different yaml implementations is that they are all broken in different, annoying ways.

Symfony seems to handle this correctly:

require_once( "./sfYamlDumper.php" );

$yaml = new sfYamlDumper();

$php = 'a:1:{s:5:"query";a:1:{s:5:"pages";a:1:{i:228749;a:3:{s:6:"pageid";i:228$

var_dump( $yaml->dump( unserialize( $php ) ) );

reedy@ubuntu64-esxi:~$ php test.php
string(75) "{ query: { pages: { 228749: { pageid: 228749, ns: 0, title: 'False' } } } }"

The simple fix is to remove YAML support. It's a really terrible format for computer-to-computer communications, because the spec is difficult to implement and there are lots of silly little cases like this one that need special handling. I noticed this bug in spyc last time I reviewed it, but I didn't bother reporting it because there were so many other similar bugs.

If we really have to have YAML support, I recommend writing our own formatter which uses a JSON-like subset of YAML and skips all the cute features which trip up parsers. It should stick to quoted strings and "flow collections", i.e. arrays delimited with brackets or braces. In fact if we move to YAML 1.2, we can just use FormatJson::encode(), since JSON is a subset of YAML 1.2.
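
A minimal sketch of that last idea, in Python rather than PHP: Python's standard `json.dumps` stands in for `FormatJson::encode()`, and the function name `format_yaml_as_json` is hypothetical. It assumes, as stated above, that consumers parse the output as YAML 1.2.

```python
import json


def format_yaml_as_json(data) -> str:
    """Hypothetical formatter: emit plain JSON, i.e. only quoted strings
    and flow collections, and serve it as the YAML output."""
    return json.dumps(data, ensure_ascii=False)


result = {"query": {"pages": [{"ns": 0, "pageid": 228749, "title": "False"}]}}
print(format_yaml_as_json(result))
# {"query": {"pages": [{"ns": 0, "pageid": 228749, "title": "False"}]}}
```

Because every string is double-quoted, the title "False" can no longer be mistaken for a boolean literal.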

Yup. No need to reopen for that

wgh wrote:

Reedy, I'm deeply sorry. Somehow the selector below had the "REOPENED" choice selected by default.