Page MenuHomePhabricator

We should support XML-valid delimiters for query string parameters
Closed, InvalidPublic

Description

I some situations, especially related to forms constructed in wikitext, or in strict XHTML markup, it is necessary to use escaped ampersands or semicolons to delimit query string parameters. (See http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2 for more info on this.) Currently the Wikimedia servers only accept non-escaped ampersands as delimiters for query string parameters.

Two possible solutions to this are:

  1. In our php.ini files, set arg_separator.input = ";&". This will allow people to use either ampersands or semicolons as delimiters for query string parameters (as recommended by the w3c).
  1. Replace line 50 of WebRequest.php with:

$requestPairs = $_POST + $_GET;
// Accomodate escaped ampersands as query string delimiters
foreach ( $requestPairs as $key => $val) {
if ( strncmp( $key, 'amp;', 4 ) === 0 ) {

		$goodKey = substr( $key, 4 );

} else {

		$goodKey = $key;

}
$this->data[$goodKey] = $val;
}

This will allow use of query string parameters separated by escaped ampersands.

The impetus for this bug is a problem in the DonationInterface extension related to forms that live in wikitext (thus forcing ampersands to be escaped). There are several workarounds available, but I thought I would propose a deeper solution to see what people thought.


Version: unspecified
Severity: enhancement

Details

Reference
bz25578

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:22 PM
bzimport set Reference to bz25578.
bzimport added a subscriber: Unknown Object (MLST).

As a workaround, using the hex values seem to work (at least in Javascript inside of wikitext). eg:
var url = 'http://foo.com/index.php?bar=baz&banana=sandwich';

can be written as:
var url = 'http://foo.com/index.php?bar=baz\x26banana=sandwich';

But still, the user-friendly solution would be to find a way to allow users to explicitly use '&' in query strings written in wikitext.

I don't understand what the problem is here. Can you give examples?

(In reply to comment #2)

I don't understand what the problem is here. Can you give examples?

For instance, on one of the WMF's landing pages for donations, we send users who wish to make a donation to a credit card form using a URL that has a query string with the ampersand character in it, something like:
https://payments.wikimedia.org/index.php/Special:PayflowProGateway?uselang=en&test=1

Because the landing page is written in a Wiki page, the '&' gets escaped to its htmlentity when the page is rendered, which makes the URL look like:
https://payments.wikimedia.org/index.php/Special:PayflowProGateway?uselang=en&test=1

This throws off the processing/interpreting of items in the query string because suddenly the parameter 'test' gets interpreted as 'amp;test'.

Sounds like you just gotta fix the link here to make sure you're doing it correctly. Find and fix the wiki page generating the link?

Changing query string processing to use ; as a separator could open up whole new exciting cans of worms such as breaking things that try to use, well, semicolons. :)

I recommend either resolving this bug INVALID or renaming it to "A particular wiki page has a double-escaped link to PayflowProGateway".

The URL isn't actually a link in this case. It's a javascript string that gets conditionally used as the form action. Obviously we could move this out of wikitext, but since these URLs get changed frequently, it's convenient to be able to edit it without a code deploy.

I agree the semicolon solution is potentially dangerous. Any thoughts on the 2nd proposed solution? Handling escaped ampersands in query strings is pretty common these days. This site (Wikimedia Bugzilla), for example, handles them fine, as do most other large websites I've worked for.

Since we have workarounds, I'm fine with WONTFIX or INVALID, but I'm wondering if there's any downside to the adding the proposed fix.

(In reply to comment #5)

The URL isn't actually a link in this case. It's a javascript string that gets
conditionally used as the form action. Obviously we could move this out of
wikitext, but since these URLs get changed frequently, it's convenient to be
able to edit it without a code deploy.

I'm still not entirely clear on why the URL needs to contain & , could you clarify that?

The URL should NOT contain & - rather it needs to have just a '&' and NOT have it escaped (which turns it into '&').

We have solved it by using '\x26' in the wikitext, which gets output as & rather than & in the source. Anyway, the point of this bug was not fixing the URL, but "Do we want to support escaped ampersands in query strings?" A lot of web applications do, a lot don't. If we don't, I'll close the bug. If we do, I'll check in the code above.

I'm gonna take the initiative and resolve INVALID (I really want a 'WONTIMPLEMENT' to supplement 'WONTFIX', which implies it's a known problem we won't fix, rather than a valid, but not desired in this case, option).

Messing with the options to accept semicolons as separators is possible, but has a bunch of problems:

  • would likely break things that submit freeform text in forms, since text can contain semicolons, so mixed-mode parsing gets confusing
  • the settings in php.ini aren't always editable
  • even when they are adjustable, PHP code can't change it as it's too late
  • it would be possible to reimplement the GET parameter extraction from scratch in WebRequest, but that doesn't seem worth the effort
  • more code paths and more URL variants just sounds icky

The sort of case where this originally came up -- hacky wikitext/HTML/JS mixture pages that use several layers of processing meant for other layers of processing to try to whip stuff together -- are good cases for more specialized systems of template generation that would reduce the need for manual intervention on the markup level.