
Unescaped quote in YAML output
Closed, ResolvedPublic

Description

Author: patrick.sinclair

Description:
Quotes are not escaped properly in the YAML output, e.g.
http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=%27N_Sync&format=yamlfm

This breaks the YAML parser in Ruby:

require 'yaml'
require 'open-uri'
p YAML::load( open('http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=%27N_Sync&format=yamlfm') )
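The failure can also be reproduced without a network request. This is only a minimal sketch: the exact line the API emitted is not quoted in this report, so the `doc` string below is an assumption (a plain scalar that starts with an apostrophe), and it runs against Psych, the parser bundled with current Ruby, rather than the parser that shipped in 2008.

```ruby
require 'yaml'

# Assumed shape of the offending line: a plain scalar that starts with '
doc = "title: 'N Sync\n"

begin
  YAML.load(doc)
  result = :parsed
rescue Psych::SyntaxError => e
  # The leading ' opens a single-quoted scalar that is never closed
  result = :syntax_error
  warn e.message
end

puts result
```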


Version: unspecified
Severity: normal

Details

Reference
bz12120

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:56 PM
bzimport set Reference to bz12120.
bzimport added a subscriber: Unknown Object (MLST).

patrick.sinclair wrote:

I have also encountered the following pages, which cause errors in the Ruby YAML parser:

http://en.wikipedia.org/w/api.php?action=query&format=yamlfm&prop=info%7Crevisions%7Ccategories&titles=Lalo%20Schifrin&rvprop=timestamp%7Cids
http://en.wikipedia.org/w/api.php?action=query&format=yamlfm&prop=info%7Crevisions%7Ccategories&titles=Lisa%20Gerrard&rvprop=timestamp%7Cids

In both cases the problem seems to be the formatting of the category entries in the YAML (one contains a line break, the other has a ':' in the title).

willemo wrote:

I took a look at this one, since I also found that titles containing ": " fail to parse in Ruby.

According to the YAML 1.0 specification (http://yaml.org/spec/history/2004-01-29/2004-01-29.html#id2569840), " #" and ": " are forbidden in so-called 'plain style' scalar syntax, as are strings starting with "!!", "[" and some other indicators.
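To make the plain-style restrictions concrete, here is a small sketch of how a current Ruby parser (Psych) treats these sequences. The sample titles are made up for illustration and are not taken from the API output.

```ruby
require 'yaml'

# ": " inside a plain scalar is read as the start of a nested mapping
colon_error =
  begin
    YAML.load("title: Category: Films\n")
    false
  rescue Psych::SyntaxError
    true
  end

# " #" inside a plain scalar starts a comment, so the value is silently cut
hash_result = YAML.load("title: C #1 hits\n")

# A colon that is not followed by a space is still accepted in plain style
plain_ok = YAML.load("title: Talk:Main Page\n")

puts colon_error
p hash_result
p plain_ok
```

The " #" case is the more dangerous of the two, because it loses data without raising an error.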

Looking at ApiFormatYaml_spyc.php, the _dumpNode function only supports plain style:

// It's mapped
$string = $spaces.$key.': '.$value."\n";

This approach is too simplistic to render YAML correctly in some situations.

To solve this, the _dumpNode function needs to be extended with a YAML escaping step for values that cannot be written in plain style.

I've committed a fix in r31927 which (hopefully; I don't have a YAML parser handy) fixes this issue. Requesting api.php?action=query&prop=info&titles=Main_Page|Talk:Main_Page now results in what I hope is correct YAML (those with YAML parsers, please test!). Note the difference between:

title: Main Page

and:

title: |
  Talk:Main Page

The entire YAML output of the sample request is at the end of this message for completeness's sake. The criteria I used are:

  • If the string contains newlines, use literal syntax (introduced by the | character); this case was already handled
  • If the string starts with : or # use literal syntax
  • If the string starts with any of - ? , [ ] { } ! * & | > ' " % @ ` also use literal syntax
  • In all other situations, use plain syntax (folded if the string is longer than 40 characters)

YAML CODE STARTS HERE


query:
  normalized:
    -
      from: Main_Page
      to: Main Page
    -
      from: |
        Talk:Main_Page
      to: |
        Talk:Main Page
  pages:
    -
      pageid: 54
      ns: 0
      title: Main Page
      touched: |
        2008-03-06T17:36:33Z
      lastrevid: 440
      counter: 86
      length: 76
    -
      pageid: 12
      ns: 1
      title: |
        Talk:Main Page
      touched: |
        2008-03-11T15:09:07Z
      lastrevid: 448
      counter: 64
      length: 173

YAML CODE ENDS HERE

(In reply to comment #3)

  • If the string starts with : or # use literal syntax

That should be: "If the string *contains* : or #" (good catch, Loek)
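With this correction applied, the criteria can be sketched roughly as follows. This is an illustration in Ruby, not the PHP committed in r31927: the names `LEADING_INDICATORS`, `needs_literal?` and `dump_pair` are hypothetical, and the folding of plain strings longer than 40 characters is left out for brevity.

```ruby
require 'yaml'

# Characters that force literal style when they start a string
# (the list from the criteria above; the constant name is hypothetical)
LEADING_INDICATORS = '-?,[]{}!*&|>\'"%@`'.freeze

# Hypothetical helper: true when plain style is not safe for this string
def needs_literal?(value)
  value.include?("\n") ||
    value.include?(':') || value.include?('#') ||
    value.start_with?(*LEADING_INDICATORS.chars)
end

# Hypothetical helper: render one "key: value" pair at the given indent
def dump_pair(key, value, indent = 0)
  spaces = ' ' * indent
  text = value.to_s
  if needs_literal?(text)
    # Literal style: "|" followed by the value, indented one level deeper
    body = text.lines.map { |line| "#{spaces}  #{line.chomp}\n" }.join
    "#{spaces}#{key}: |\n#{body}"
  else
    "#{spaces}#{key}: #{text}\n"
  end
end

doc = dump_pair('title', 'Talk:Main Page') +
      dump_pair('pageid', 54) +
      dump_pair('name', "'N Sync")

puts doc
parsed = YAML.load(doc)
p parsed
```

One consequence of using the plain | indicator, as the sample output does, is that the parsed value keeps a trailing newline ("Talk:Main Page\n"). The |- indicator strips that newline, if an exact round trip matters.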