
Template misses its parameter when containing a link with "=" (equal sign) character in the URL, or HTML attributes
Open, Low, Public

Description

Templates using unnamed parameters break when those parameters contain:

  • An external link that has an equal sign (example: http://example.com/?p=1)
  • An HTML attribute (example: <span title="Tooltip">Text</span>)

This is because the parser interprets the equal sign as the separator between the parameter name and the value, taking precedence over those elements.

The current workaround is to use index-named parameters, changing {{template|param1|param2}} to {{template|1=param1|2=param2}}.
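To make the failure concrete, here is a minimal Python sketch of the splitting rule described above. It is a simplified model, not MediaWiki's actual preprocessor: it assumes the call has already been split at `|`, splits each argument at its first `=`, and numbers only the arguments that have no `=`. The function name `split_template_args` is hypothetical.

```python
def split_template_args(args: list[str]) -> dict[str, str]:
    """Assign names to template arguments (simplified model).

    Each argument is split at its first '='; the text before it becomes the
    parameter name. Arguments without '=' are numbered 1, 2, ..., and named
    arguments do not use up a number.
    """
    params: dict[str, str] = {}
    position = 0
    for arg in args:
        name, sep, value = arg.partition("=")
        if sep:
            # Named (or explicitly numbered) argument: name and value are trimmed.
            params[name.strip()] = value.strip()
        else:
            # Unnamed argument: takes the next positional index, whitespace kept.
            position += 1
            params[str(position)] = arg
    return params


# {{template|http://example.com/?p=1}}: the URL is mistaken for a name.
broken = split_template_args(["http://example.com/?p=1"])
# {{template|1=http://example.com/?p=1}}: the documented workaround.
fixed = split_template_args(["1=http://example.com/?p=1"])

print(broken)  # {'http://example.com/?p': '1'}
print(fixed)   # {'1': 'http://example.com/?p=1'}
```

The same rule explains the HTML case: `<span title="Tooltip">Text</span>` becomes a parameter named `<span title`.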


Original report:

See http://test.wikipedia.org/wiki/Template:Warn/tests and its source. It appears that when the parameter contains a link with an "=" character in its URL, the parameter is missed entirely! However, if "1=" is added to the beginning of the parameter so that the template understands it, things work better.


Missing comments

All comments on this task are missing due to T284397. Comments before 2015 can be read on the old bugzilla instance

Details

Reference
bz14235

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 10:10 PM)
bzimport set Reference to bz14235.
bzimport added a subscriber: Unknown Object (MLST).

This is an elementary lesson about templates: if one of your numbered parameters contains an =, you need to explicitly use 1= (or 2= or whatever). This is not a bug, because there's no real reason there couldn't be a variable called {{{your edit [http://localhost/index.php?diff}}} in that template; in fact, add that code to your template and you'll see that it'll actually expand to " here] is vandalism" just like it should.

Thanks Roan, but I think the whole point of this bug is that we should have a way to prevent this. For example, we can disallow parameter names to contain "http://" (etc).

With all due respect, I think you went too fast towards closing this bug; I'm reopening it, seeking more comments from other experienced devs.

Why should we? There's a workaround and it's easy. Recommending WONTFIX to your suggestion.

*** Bug 15069 has been marked as a duplicate of this bug. ***

admin wrote:

I agree with Roan. Saying that a parameter can never begin with "http://" may make using URL addresses in implied numbered parameters more intuitive, but it would reduce the maximum functionality one could achieve with a template. The existing solution is not that difficult, as Roan explains, just not well known.

Honestly I can't think of much reason for a named parameter to have a name starting with "http://" :)

It is, however, potentially a bit open-ended with other protocols and with arbitrary text that might contain a URL or HTML fragment somewhere in it.

A sensible thing might be to just pick an actual format for what parameter names can be...

(In reply to comment #6)

Honestly I can't think of much reason for a named parameter to have a name
starting with "http://" :)

Not only names starting with 'http://': a URL with '=' anywhere in an anonymous parameter will break it.

A sensible thing might be to just pick an actual format for what parameter
names can be...

As I proposed on IRC, certain chars or their combinations could be banned. '?' or '://' come to mind. Also, parameter name length could be limited to a sane number, say, 64.
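A sketch of how the proposed restriction might be checked, under the assumption that an argument is treated as named only when the text before its first `=` passes the blacklist. The constants and function names are hypothetical; nothing like this exists in MediaWiki.

```python
# Hypothetical rule from the proposal above: names may not contain '?' or '://'
# and may be at most 64 characters long.
BANNED_SUBSTRINGS = ("?", "://")
MAX_NAME_LENGTH = 64


def is_allowed_param_name(name: str) -> bool:
    """Return True if `name` passes the proposed blacklist."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    return not any(bad in name for bad in BANNED_SUBSTRINGS)


def classify(arg: str) -> str:
    """Classify one template argument as 'named' or 'unnamed'.

    The argument counts as named only if it contains '=' and the text before
    the first '=' is an allowed name; otherwise the whole argument would be
    kept as an unnamed (numbered) value.
    """
    name, sep, _ = arg.partition("=")
    if sep and is_allowed_param_name(name.strip()):
        return "named"
    return "unnamed"


print(classify("http://example.com/?foo=bar"))  # unnamed
print(classify("foo=bar"))                      # named
```

The same check would also turn survey-style names containing a question mark into unnamed parameters, which is one of the objections raised later in this thread.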

Making a blacklist of things that a key cannot be is IMHO an extremely poor method of solving this issue.

Blacklisting things just slowly adds more cruft into the parser, testing for random things, which really shouldn't be tested for.

The reason this issue came up is use like so:

{{foo|http://www.example.com/?foo=bar}}
----Template:Foo----

{{bar|{{{1}}}}}

The issue here is NOT that "url syntax should not be considered a parameter name", trying to /solve/ the issue as if that were the problem is just bypassing the real issue with a shitty hack.

The real issue here is in this:
{{bar|{{{1}}}}}

Take this other use:
{{foo|1=foo=bar}}

This will expand to:
{{bar|foo=bar}}
Which will expand Template:Bar with bar in {{{foo}}}.

This is bad. You should not be able to specify the name of a parameter to pass inside of the template in this way. This is what leads to unexpected things happening, and leads to people using poor parser hacks that later cause regressions when people fix the issue.

The correct thing to fix here, is not blacklisting of possible keys. It is that the preprocessor should not expand variables like {{{1}}} before it searches for the =. That should be done AFTER. Just as it is with any strict language.

(In reply to comment #8)

This is bad. You should not be able to specify the name of a parameter to pass
inside of the template in this way. This is what leads to unexpected things
happening, and leads to people using poor parser hacks that later cause
regressions when people fix the issue.

The correct thing to fix here, is not blacklisting of possible keys. It is that
the preprocessor should not expand variables like {{{1}}} before it searches
for the =. That should be done AFTER. Just as it is with any strict language.

So you're basically suggesting to break all named parameters? Remember there are quite a few cases where {{template|foo=bar}} is quite legitimate. Or maybe we should stop skipping named parameters in the numbered parameters count? That would make {{template|foo=bar|baz}} behave like {{{foo}}}=bar, {{{1}}}=foo=bar and {{{2}}}=baz rather than {{{foo}}}=bar and {{{1}}}=baz (which is the current behavior).
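A minimal Python sketch comparing the two numbering schemes, assuming arguments have already been split at `|`. `number_args` and its `count_named` flag are hypothetical; under the alternative scheme a named argument is stored both under its name and, whole, under its position, as the comment above describes.

```python
def number_args(args: list[str], count_named: bool) -> dict[str, str]:
    """Assign parameter names under one of two numbering schemes.

    count_named=False: current behavior, named arguments do not use up a number.
    count_named=True:  the alternative, every argument takes the next number and
                       a named argument is also stored whole under that number.
    """
    params: dict[str, str] = {}
    position = 0
    for arg in args:
        name, sep, value = arg.partition("=")
        if sep:
            params[name.strip()] = value.strip()
            if count_named:
                position += 1
                params[str(position)] = arg
        else:
            position += 1
            params[str(position)] = arg
    return params


# {{template|foo=bar|baz}}
print(number_args(["foo=bar", "baz"], count_named=False))  # {'foo': 'bar', '1': 'baz'}
print(number_args(["foo=bar", "baz"], count_named=True))
# {'foo': 'bar', '1': 'foo=bar', '2': 'baz'}
```

Under the alternative, swapping the order to {{template|baz|foo=bar}} changes what {{{1}}} and {{{2}}} hold, which is the ordering problem raised in the reply below.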

ayg wrote:

(In reply to comment #8)

Making a blacklist of things that a key cannot be is IMHO an extremely poor
method of solving this issue.

However, requiring a specific format isn't *necessarily* a bad idea. For instance, we could prohibit all ASCII punctuation characters except whitespace/underscore/hyphen, like:

~`!@#$%^&*()+=[]\;',./{}|:"<>?

As well as newlines, if those aren't already verboten. This would be more consistent and comprehensible. We would then have no problems with URLs, because the bit before the = sign wouldn't be valid, and the same would probably be true for some other cases. Banning *just* URLs is a poor way to go about this, certainly.
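A short Python sketch of that format rule, assuming "ASCII punctuation" means Python's `string.punctuation` (which, minus `_` and `-`, is exactly the character list above) and that newlines are banned as well. `is_valid_param_name` is a hypothetical helper.

```python
import string

# Hypothetical rule from the proposal above: every ASCII punctuation character
# except '_' and '-' is forbidden in a parameter name, and so are newlines.
FORBIDDEN = (set(string.punctuation) - {"_", "-"}) | {"\n"}


def is_valid_param_name(name: str) -> bool:
    """Return True if `name` contains none of the forbidden characters."""
    return not any(ch in FORBIDDEN for ch in name)


# The text before '=' in a URL is rejected, so the URL would stay unnamed ...
print(is_valid_param_name("http://www.example.com/?foo"))  # False
# ... but so would human-readable names such as this one.
print(is_valid_param_name("Species & Gender"))             # False
print(is_valid_param_name("birth_date"))                   # True
```

The second case comes from the infobox example further down, which shows why such a rule would break existing templates.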

Unfortunately, we're stuck with lousy MediaWiki error reporting here. In a real language this would be sensible because you would raise a syntax error if an invalid identifier were used. Here the template would just mysteriously not work, if for some reason someone uses a parameter with a weird name. It's kind of a lose-lose scenario. (So how about sandboxed Python again? :) )

(In reply to comment #9)

Or maybe
we should stop skipping named parameters in the numbered parameters count? That
would make {{template|foo=bar|baz}} behave like {{{foo}}}=bar, {{{1}}}=foo=bar
and {{{2}}}=baz rather than {{{foo}}}=bar and {{{1}}}=baz (which is the current
behavior).

In addition to breaking like all templates ever, this makes the ordering of named and unnamed parameters significant: {{template|foo=bar|baz}} is different from {{template|baz|foo=bar}}. I don't think that's expected. It does have a certain appeal, but it replaces an obscure gotcha with a very non-obscure one. Basically you couldn't mix numbered and named parameters in a template, which is a pretty major restriction compared to present.

(In reply to comment #8)

The correct thing to fix here, is not blacklisting of possible keys. It is that
the preprocessor should not expand variables like {{{1}}} before it searches
for the =. That should be done AFTER. Just as it is with any strict language.

What are you talking about? It *is* done after. You can't uncover a name/value separator in MW 1.12+.
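A simplified Python model of the two evaluation orders, using the template body {{bar|{{{1}}}}} and the call {{foo|1=foo=bar}} from the discussion above. It is not MediaWiki's preprocessor; the helper names are hypothetical, and it only shows why splitting the arguments before substituting {{{1}}}, as this comment states MW 1.12+ does, keeps the `=` inside the value.

```python
import re

PARAM = re.compile(r"\{\{\{([^{}|]*)\}\}\}")  # a {{{name}}} reference, no default


def split_args(args: list[str]) -> dict[str, str]:
    """Split each argument at its first '='; unnamed ones get 1, 2, ..."""
    params: dict[str, str] = {}
    position = 0
    for arg in args:
        name, sep, value = arg.partition("=")
        if sep:
            params[name.strip()] = value.strip()
        else:
            position += 1
            params[str(position)] = arg
    return params


def substitute(text: str, params: dict[str, str]) -> str:
    """Replace each {{{name}}} with its value, leaving unknown names as-is."""
    return PARAM.sub(lambda m: params.get(m.group(1).strip(), m.group(0)), text)


caller_params = {"1": "foo=bar"}  # what {{foo|1=foo=bar}} passes to Template:Foo
inner_args = ["{{{1}}}"]          # arguments of {{bar|{{{1}}}}} inside Template:Foo

# Split first, then substitute: the '=' in the value cannot become a separator.
split_then_substitute = {
    name: substitute(value, caller_params)
    for name, value in split_args(inner_args).items()
}
# Substitute first, then split: the order the earlier comment was worried about.
substitute_then_split = split_args([substitute(a, caller_params) for a in inner_args])

print(split_then_substitute)  # {'1': 'foo=bar'}
print(substitute_then_split)  # {'foo': 'bar'}
```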

(In reply to comment #9)

(In reply to comment #8)

This is bad. You should not be able to specify the name of a parameter to pass
inside of the template in this way. This is what leads to unexpected things
happening, and leads to people using poor parser hacks that later cause
regressions when people fix the issue.

The correct thing to fix here, is not blacklisting of possible keys. It is that
the preprocessor should not expand variables like {{{1}}} before it searches
for the =. That should be done AFTER. Just as it is with any strict language.

So you're basically suggesting to break all named parameters? Remember there
are quite a few cases where {{template|foo=bar}} is quite legitimate. Or maybe
we should stop skipping named parameters in the numbered parameters count? That
would make {{template|foo=bar|baz}} behave like {{{foo}}}=bar, {{{1}}}=foo=bar
and {{{2}}}=baz rather than {{{foo}}}=bar and {{{1}}}=baz (which is the current
behavior).

I think you're misreading what I meant.

My notes weren't for the use of:
{{Foo|http://example.com/?foo=bar}}
There you do explicitly know that there is a = inside of there and you are going to need to use a 1= to escape it.

Mine was about the use like:
{{Foo|1=http://example.com/?foo=bar}}
Where Foo contains:
{{Bar|{{{1}}}}}

However, after doing some tests, it appears that the parser does actually act as I'd expect.

I read the bug and it appeared as if the reporter was saying that something inside of a template was being expanded wrong.

So since things aren't being expanded wrong as I expected, and the reporter actually appears merely discontent with the age old requirement of "if your parameter uses a =, prepend it with the param number" then I recommend WONTFIX as well.

(In reply to comment #10)

(In reply to comment #8)

Making a blacklist of things that a key cannot be is IMHO an extremely poor
method of solving this issue.

However, requiring a specific format isn't *necessarily* a bad idea. For
instance, we could prohibit all ASCII punctuation characters except
whitespace/underscore/hyphen, like:

~`!@#$%^&*()+=[]\;',./{}|:"<>?

As well as newlines, if those aren't already verboten. This would be more
consistent and comprehensible. We would then have no problems with URLs,
because the bit before the = sign wouldn't be valid, and the same would
probably be true for some other cases. Banning *just* URLs is a poor way to go
about this, certainly.

Unfortunately, we're stuck with lousy MediaWiki error reporting here. In a
real language this would be sensible because you would raise a syntax error if
an invalid identifier were used. Here the template would just mysteriously not
work, if for some reason someone uses a parameter with a weird name. It's kind
of a lose-lose scenario. (So how about sandboxed Python again? :) )

Blacklisting those characters is something you really need to be careful about. Believe it or not, people DO name parameters using many of those characters. Some wikis prefer human-readable parameters over key-like parameters. Characters like &/!@+(), are sometimes used as they would be in ordinary writing, especially in infoboxes where it makes things a bit more readable.
{{Infobox/Character

| Name (English)   = 
| Name (Kanji)     = 
| Name (Romanji)   = 
| Birthdate        = 
| Species & Gender = 
| Birth/Death Date =

}}
That's just a very rough and fictional example. But it is highly possible that people are already relying on the fact that they can include characters like / and & in parameter names in the same way they would normal text. Same goes for punctuation characters.
As for ':', think about a wiki on C or another programming language, where people may use ':' as a visual separator to create a bit of a hierarchy.

With those likely use cases I fail to see what characters we can blacklist to prevent urls from being treated as keys without causing a regression on some wiki that causes them to need to alter a number of their templates, and change the use of those templates on a large number of pages.

Actually, rather than restricting the key, I'd like to see some way to escape the current numeric parameter without needing to know exactly which numeric parameter is next.
{{Foo|foo|bar|=Foo=Bar}}
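A Python sketch of the proposed escape, assuming the arguments are already split at `|` and that the escape is written exactly as a leading `=` with no space before it. The `|=` syntax is only a proposal here, and `split_with_escape` with its `escape_enabled` flag is hypothetical.

```python
def split_with_escape(args: list[str], escape_enabled: bool = True) -> dict[str, str]:
    """Assign template parameters, optionally honoring a leading '=' escape.

    With escape_enabled, an argument written as '|=value' takes the next
    positional number and keeps everything after the leading '=' verbatim,
    even if it contains more '=' signs (hypothetical syntax).
    """
    params: dict[str, str] = {}
    position = 0
    for arg in args:
        if escape_enabled and arg.startswith("="):
            position += 1
            params[str(position)] = arg[1:]
            continue
        name, sep, value = arg.partition("=")
        if sep:
            params[name.strip()] = value.strip()
        else:
            position += 1
            params[str(position)] = arg
    return params


# {{Foo|foo|bar|=Foo=Bar}}
print(split_with_escape(["foo", "bar", "=Foo=Bar"]))
# {'1': 'foo', '2': 'bar', '3': 'Foo=Bar'}
print(split_with_escape(["foo", "bar", "=Foo=Bar"], escape_enabled=False))
# {'1': 'foo', '2': 'bar', '': 'Foo=Bar'}
```

With the escape disabled, the model stores the value under the empty name, which corresponds to the {{{}}} parameter mentioned later in the thread.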

ayg wrote:

(In reply to comment #12)

Blacklisting those characters is something you really need to be careful about.
Believe it or not, people DO name parameters using many of those
characters. Some wikis prefer human-readable parameters over key-like
writing, especially in infoboxes where it makes things a bit more readable.

Yeah, good point. *Probably* no one's using question marks, but that's a little narrow . . .

Actually, rather than restricting the key, I'd like to see some way to
escape the current numeric parameter without needing to know exactly what
numeric parameter is next.
{{Foo|foo|bar|=Foo=Bar}}

That's a sensible idea. Then people could adopt the convention of using |= to delimit numbered parameters, instead of just |, and the problem wouldn't arise in the first place. If we weren't too concerned with BC, we might even enforce it at some point In the Distant Future.

For now, as a matter of template design, it would be an excellent idea to use only named parameters for things that might plausibly be URLs. That would entirely avoid the specific issue brought up in this bug.

(In reply to comment #13)

(In reply to comment #12)

Blacklisting those characters is something you really need to be careful about.
Believe it or not, people DO name parameters using many of those
characters. Some wikis prefer human-readable parameters over key-like
parameters. Characters like &/!@+(), are used sometimes as they would be when
writing, especially in infoboxes where it makes things a bit more readable.

Yeah, good point. *Probably* no one's using question marks, but that's a
little narrow . . .

Perhaps, ;) unless you consider sentences. FWIW templates are sometimes used in say, surveys.
{{Survey/August 2008

| What type of bugs do you prefer claiming on bugzilla? = 

| Do you prefer using http://bugzilla.wikimedia.org or http://bugs.wikimedia.org to access the site? = 

| Have you ever reported a bug with yourself already assigned to it? = 

| If so, for what reason did you decide to do this? =

}}

^_^ Oh the evil things we do with WikiText.

Actually, rather than restricting the key, I'd like to see some way to
escape the current numeric parameter without needing to know exactly what
numeric parameter is next.
{{Foo|foo|bar|=Foo=Bar}}

That's a sensible idea. Then people could adopt the convention of using |= to
delimit numbered parameters, instead of just |, and the problem wouldn't arise
in the first place. If we weren't too concerned with BC, we might even enforce
it at some point In the Distant Future.

For now, as a matter of template design, it would be an excellent idea to use
only named parameters for things that might plausibly be URLs. That would
entirely avoid the specific issue brought up in this bug.

Yes, named parameters are the best way to avoid any issues with the =.

The tricky thing is that {{{}}} actually does contain the value of |=; I have no idea whether anyone is actually making use of that.

Still, there are probably fewer people using {{{}}} than there are using {{{#}}}, so it is probably the best option. And the idea of using = at the start as an escape does fit in line with some other languages.

darklama wrote:

What about first checking for uses of {{{name}}} in templates and if a named parameter is passed which doesn't exist in the template it is treated as if it were a numbered argument?

Template:Foo:

{{{first}}} {{{1}}}

So that given {{foo|first=John|bar=Doe}} it would make {{{first}}}=John and {{{1}}}=bar=Doe, since bar isn't a named parameter in the template.

ayg wrote:

That's more or less what Roan suggested in comment #9. See my response to that. If it's not clear, consider if Template:Foo were "{{{1}}} {{{first}}}" instead, and you use {{foo|first=John|bar=Doe}}. The parser reaches {{{1}}} and . . . what does it do? It doesn't necessarily know yet if there's going to be any {{{bar}}} or {{{first}}} later on. Even if it did, and I don't know if that's feasible, this would mean that parameter behavior would be very non-localized. Adding a new named parameter anywhere could change the interpretation of numeric parameters in an entirely different part of the template.

It might be doable, but it seems pretty scary.

Well...
Firstly, that requires complex parsing of the template beforehand.
Secondly, it causes an inconsistency in what happens with templates and makes things harder to debug.
Thirdly, there is nothing saying that an extension can't insert a variable call on its own in a way that prevents us from checking for its existence.

{{#default:foo|''default''}}
Could expand itself to something like:
{{#if:{{{foo|}}}|{{{foo}}}|''default''}}
Or it could even expand the first parameter on its own and never let the {{{foo|}}} it tests show up in the page.
Just as a shorthand to make syntax easier.

Those kinds of things can never be tested for. In fact, what about this:
{{{ {{{1}}} }}}
It's perfectly valid syntax:
{{Foo|foo|foo=bar}}
From what I remember that should expand into {{{ foo }}} and then expand into 'bar'.

There is no feasible way to test if a variable appears inside of a template or not.
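A small Python sketch of the {{{ {{{1}}} }}} example, using a hypothetical regex-based expander that resolves innermost references repeatedly (it ignores defaults and everything else the real preprocessor does). It shows that the name actually read, `foo`, only appears after expansion, so a literal scan of the template body cannot find it.

```python
import re

PARAM = re.compile(r"\{\{\{([^{}|]*)\}\}\}")  # innermost {{{name}}}, no default

TEMPLATE = "{{{ {{{1}}} }}}"  # body of a hypothetical Template:Foo


def expand(text: str, params: dict[str, str], max_rounds: int = 10) -> str:
    """Expand innermost {{{name}}} references until nothing changes."""
    for _ in range(max_rounds):  # guard against values that reference each other
        new_text = PARAM.sub(lambda m: params.get(m.group(1).strip(), m.group(0)), text)
        if new_text == text:
            break
        text = new_text
    return text


# {{Foo|foo|foo=bar}} passes {{{1}}} = "foo" and {{{foo}}} = "bar".
params = {"1": "foo", "foo": "bar"}
print(expand(TEMPLATE, params))  # bar

# A static scan of the body only ever sees the name "1", never "foo".
print([name.strip() for name in PARAM.findall(TEMPLATE)])  # ['1']
```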

On the contrary, one of the things I'd like is for parserfunctions to be able to access the variable list of a parent template. That way a complex extension can make a number of useful things possible, and more efficient. That'll never happen if things get insanely limited in the way you note.

Disallowing parameter names starting with http:// would break compatibility with template hacks that check for the http://wiki.domain/w/index.php?title parameter to see whether they are dealing with a page name or a diff link. A better solution would be to allow templates to give an exhaustive list of the parameters they accept (this will be done for a lot of templates anyway to support the new usability features); unexpected template parameter names could then be handled as unnamed parameters.

*** Bug 5138 has been marked as a duplicate of this bug. ***

darklama wrote:

(In reply to comment #18)

... unexpected template [parameter] names could be handled as unnamed parameters.

This seems like a sane and simple solution to me.

Then, given template A:

Hello {{{name}}} {{{1}}}!

Given template call:

{{A|name=MediaWiki|foobar=World}}

Render as:

Hello MediaWiki World!

M8R-cyc3n3 wrote:

remember that the offending use case involves a string meant to appear in "=",
something like:

{{A|http://domain.lol/some.cgi?time=money}}

so for this to be helpful where it matters the new template behavior must take
"foobar=World" in its entirety as the first unnamed parameter {{{1}}} if/when
the template contains no references to parameter {{{foobar}}}. so the output
would have to be:

Hello MediaWiki foobar=World!
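A Python sketch contrasting the two variants for template A, assuming the set of known names can be collected by scanning the body for literal {{{name}}} references (which, as pointed out earlier in the thread, misses computed names such as {{{ {{{1}}} }}}). The function names and the `keep_whole` flag are hypothetical.

```python
import re

PARAM = re.compile(r"\{\{\{([^{}|]*)\}\}\}")


def referenced_names(body: str) -> set[str]:
    """Parameter names the template body mentions literally."""
    return {name.strip() for name in PARAM.findall(body)}


def split_with_fallback(args: list[str], known: set[str], keep_whole: bool) -> dict[str, str]:
    """Treat 'name=value' as named only if the template references `name`.

    Otherwise the argument becomes the next numbered parameter: just the value
    if keep_whole is False (the first suggestion), or the whole 'name=value'
    text if keep_whole is True (the correction above).
    """
    params: dict[str, str] = {}
    position = 0
    for arg in args:
        name, sep, value = arg.partition("=")
        if sep and name.strip() in known:
            params[name.strip()] = value.strip()
            continue
        position += 1
        params[str(position)] = value.strip() if (sep and not keep_whole) else arg
    return params


def render(body: str, params: dict[str, str]) -> str:
    """Substitute {{{name}}} references, leaving unknown ones untouched."""
    return PARAM.sub(lambda m: params.get(m.group(1).strip(), m.group(0)), body)


BODY = "Hello {{{name}}} {{{1}}}!"         # template A
ARGS = ["name=MediaWiki", "foobar=World"]  # {{A|name=MediaWiki|foobar=World}}
known = referenced_names(BODY)             # the names 'name' and '1'

print(render(BODY, split_with_fallback(ARGS, known, keep_whole=False)))
# Hello MediaWiki World!
print(render(BODY, split_with_fallback(ARGS, known, keep_whole=True)))
# Hello MediaWiki foobar=World!
```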


one thing i noticed, wiki-links are special in this regard:

{{A|[[Foobar=World|Example link 1]]}}

here the entire link is interpreted as parameter {{{1}}}.

but external links are not:

{{A|[http://domain.lol/?Foobar=World Example link 2]}}

this input is interpreted as one parameter named

[http://domain.lol/?Foobar=

with value

World Example link 2]

seems like the easiest thing would be to make both cases "special" and (beyond
consideration as part of a parameter name).
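A Python sketch of the asymmetry described above, under a simplified model that assumes the call has already been split at top-level `|`. An `=` inside `[[...]]` is skipped; the hypothetical `protect_external` flag models treating single `[...]` brackets the same way, which is the extra rule being suggested. `split_at_separator` is not a MediaWiki function.

```python
from typing import Optional, Tuple


def split_at_separator(arg: str, protect_external: bool = False) -> Tuple[Optional[str], str]:
    """Return (name, value) for one template argument, or (None, arg) if unnamed.

    Simplified model: an '=' inside [[...]] is never a separator. With
    protect_external=True, an '=' inside a single-bracket [...] external link
    is skipped too.
    """
    wiki_depth = 0
    ext_depth = 0
    i = 0
    while i < len(arg):
        if arg.startswith("[[", i):
            wiki_depth += 1
            i += 2
            continue
        if arg.startswith("]]", i) and wiki_depth > 0:
            wiki_depth -= 1
            i += 2
            continue
        ch = arg[i]
        if protect_external and ch == "[":
            ext_depth += 1
        elif protect_external and ch == "]" and ext_depth > 0:
            ext_depth -= 1
        elif ch == "=" and wiki_depth == 0 and ext_depth == 0:
            return arg[:i].strip(), arg[i + 1:].strip()
        i += 1
    return None, arg


print(split_at_separator("[[Foobar=World|Example link 1]]"))
print(split_at_separator("[http://domain.lol/?Foobar=World Example link 2]"))
print(split_at_separator("[http://domain.lol/?Foobar=World Example link 2]", protect_external=True))
```

Even with the flag, a bare URL such as http://domain.lol/?Foobar=World is still split, which is the limitation pointed out in the reply below.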

M8R-cyc3n3 wrote:

(In reply to comment #21)

meant to appear in "=",

should have read:

meant to appear in spite of containing "="

darklama wrote:

(In reply to comment #21)

would have to be:

Hello MediaWiki foobar=World!

I didn't think that through enough. You are right.
Either behavior could be useful though. If
implemented the way I had suggested, using named
parameters that don't exist could be used as
aliases for unnamed parameters without having to
include them directly. While your correction would
give the expected behavior.

seems like the easiest thing would be to make both
cases [internal and external links] "special".
(beyond consideration as part of a parameter name).

That could work too. I think = inside tags should be
a special case too then. Use of custom signatures
seems to be the common case where I have seen a
problem with this.

{{A|<span style="font-family:monospace">[[User:Foobar]]</span>}}

Implementing the "no parameter by that name, so include
the whole thing as an unnamed parameter" approach would
mean special cases would no longer be needed, though.

(In reply to comment #21)

one thing i noticed, wiki-links are special in this regard:

{{A|[[Foobar=World|Example link 1]]}}

here the entire link is interpreted as parameter {{{1}}}.

but external links are not:

{{A|[http://domain.lol/?Foobar=World Example link 2]}}

this input is interpreted as one parameter named

[http://domain.lol/?Foobar=

with value

World Example link 2]

seems like the easiest thing would be to make both cases "special" and (beyond
consideration as part of a parameter name).

That's easily doable by adding a new preprocessor rule for one [.
However, it would change the AST of hundreds of templates, and still not fix the basic problem: {{A|http://domain.lol/?Foobar=World}}.

{{A|http://domain.lol/some.cgi?time=money}}

so for this to be helpful where it matters the new template behavior must take
"foobar=World" in its entirety as the first unnamed parameter {{{1}}} if/when
the template contains no references to parameter {{{foobar}}}.

That's the same as proposed in comment #15.

I'm sympathetic to the idea of a breaking change for the potential {{{ {{{1}}} }}} case, since wikitext is not a programming language and that kind of dereferencing goes a bit too far.

Efficiently listing all of those parameters would mean bumping the preprocessor version so that it additionally generates a list of used parameters, not only to avoid an O(n) search of parameters, but also to make expanding just a subnode easier.

It would be much easier to make numeric parameter indexes absolute (cf. Roan's comment #9), but the huge template breakage prevents it.

M8R-cyc3n3 wrote:

(In reply to comment #24)

That's easily doable by adding a new preprocessor rule for one [.
However, it would change the AST of hundreds of templates, and still not fix
the basic problem: {{A|http://domain.lol/?Foobar=World}}.

yeah but splitting a bare url like this one:

{{A|http://domain.lol/?Foo=Bar}} <!-- needs "1=" -->

or a page title like this one

{{A|Foo=Bar}} <!-- needs "1=" -->

...is still more expectable (i.e. less astonishing) than splitting a long-form
external link syntax like this one:

{{A|[http://domain.lol/?Foo=Bar http://domain.lol/?Foo=Bar]}}

but not splitting a long-form internal link syntax like this one:

{{A|[[Foo=Bar|Foo=Bar]]}}

the first examples must be interpreted as plain text so that links may be
formed of them inside the template (even though the first renders identically
to the third in isolation). however i see no reason not to consider both the
third and fourth as fully-formed links.

disallowing certain punctuation in parameter names is a great shoulda-thought-
about-that-earlier idea, but is independent of the disparity between internal-
and external-style links in the effective order of syntax precedence.

perhaps the latter issue ought to be raised on a separate ticket.

You may find it more logical taking into account images:
{{A| [[File:example.jpg|link=http://www.example.org|E=mc²]] }}

*** Bug 28252 has been marked as a duplicate of this bug. ***

I understand from comment 2 that this is a feature request, not a bug report. Altering severity.

AIUI, it's a request for the template feature not to be buggy :-) If we cannot put a certain character in a template, we should admit that it's a shortcoming and (if it cannot be fixed) issue some kind of warning when editing templates. Dunno if "altering severities" does any good 3 years after the bug was reported.

happy.melon.wiki wrote:

People understand perfectly well what the equals character does in templates when they are consciously thinking about it, in exactly the same way as they recognise that | and } are characters with special meaning in the same context. The fact that some people do not consciously think of the equals sign in {{t|foo=bar}} and {{t|http://eg.com/foo=bar}} as the same character with the same special meaning is a bug in their thought processes, not in our parser. It's certainly a shortcoming we can add a feature to work around, but it's an enhancement, not a bugfix.

(In reply to comment #30)

The fact that some people do not consciously think of the equals sign in
{{t|foo=bar}} and {{t|http://eg.com/foo=bar}} as the same character with the
same special meaning is a bug in their thought processes, not in our parser.

It depends on how you view the problem. A different approach is to say that the parser should accommodate the common thought processes of users, for example, in the case where two = signs are found within one parameter.

However, as the person reporting this bug originally, I think there's no definite solution to this problem. One might say we should modify the parser to not treat = signs as parameter assignment operators, if they are preceded by http:// (or other URI schemes), which is what I thought at the time of submitting this bug; another person may say this will disable us from having template parameters whose names start with http:// (although I can't think how this can be useful).

So, again, as the person who originally opened this bug report, I'm totally fine with Tim's alteration of severity, and I even think this can be marked as a WONTFIX again.

It's something to take into account if/when we do a graphical template inserter. We shouldn't make grammar for this case. Instead, I encourage template authors to use named parameters for URLs, which avoids the unexpected outcome when using the shorthand.

{{foo|url=http://example.com/foo?bar=baz}}

(In reply to comment #31)

another person may say this will disable us from having
template parameters whose names start with http:// (although I can't think how
this can be useful).

See comment 18 for a use case (which is common on hu.wikipedia, for example).

Ricordisamoa, can you please elaborate why you wanted to close this one as WONTFIX?

pashev.igor wrote:

I think I hit this bug with such text:

{{foo| <span class="whatever">hello, world</span>}}

This is not a synthetic case. If one has the MathJax extension installed,

{{foo|<code>bar</code>}}
will be replaced with
{{foo|<span class="tex2jax_ignore"><code>bar</code></span>}}

Ciencia_Al_Poder renamed this task from Template misses its parameter when containing a link with "=" character in the URL to Template misses its parameter when containing a link with "=" (equal sign) character in the URL, or HTML attributes. (Jan 24 2015, 12:08 PM)
Ciencia_Al_Poder updated the task description.
Ciencia_Al_Poder set Security to None.
Ciencia_Al_Poder removed a subscriber: Unknown Object (MLST).

I think this is a matter of parsing precedence.

External links could be parsed before the parser collects the template parameters. Then, if the equal sign is inside an external link, it simply wouldn't be treated as a parameter separator. The same could apply to HTML attributes: if something is going to be interpreted as an HTML tag first, its equal signs shouldn't be treated as parameter separators.
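A rough Python sketch of this precedence idea, assuming that any `<...>` or `[...]` span should hide its `=` signs from the separator search. It is deliberately naive (a stray `<` in ordinary text, or a `>` inside an attribute value, would confuse it), and `find_separator` is a hypothetical helper, not part of MediaWiki.

```python
# Openers whose contents should not be scanned for a separator, mapped to closers.
PAIRS = {"<": ">", "[": "]"}


def find_separator(arg: str) -> int:
    """Index of the first '=' outside <...> and [...], or -1 if there is none."""
    closers: list[str] = []
    for i, ch in enumerate(arg):
        if ch in PAIRS:
            closers.append(PAIRS[ch])
        elif closers and ch == closers[-1]:
            closers.pop()
        elif ch == "=" and not closers:
            return i
    return -1


for arg in [
    '<span class="whatever">hello, world</span>',
    "[http://example.com/?p=1 example]",
    "foo=bar",
    "http://example.com/?p=1",
]:
    print(find_separator(arg))
```

The last example still yields a separator position, which matches the earlier observation that this kind of rule does not fix bare URLs.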

Note that the parser already parses some things before template parameters, preventing them from being interpreted as such:

{{Warn|This text contains 
== A header ==
That is not being interpreted as a parameter separator
}}

The equal signs in the header aren't being interpreted as a parameter separator.

On the other hand, this one still breaks the template: header syntax isn't recognized inline, so the parser doesn't interpret it as a header, and the first equal sign is then treated as a parameter separator:

{{Warn|This text contains == A fake header == That breaks the template }}
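A Python sketch of the behavior described in this comment, under the simplified assumption that only a whole line that starts after a newline inside the argument and is shaped like `== ... ==` is exempt from the separator search. `HEADING_LINE` and `find_separator` are hypothetical names, and real heading detection in MediaWiki has more rules than this regex.

```python
import re

# Simplified heading shape: '=' signs, some text, '=' signs, optional trailing blanks.
HEADING_LINE = re.compile(r"=+.*=+[ \t]*")


def find_separator(arg: str) -> int:
    """Index of the first '=' acting as a name/value separator, or -1.

    A line that begins after a newline and looks like a heading is skipped;
    any other '=' counts. The first line is never treated as a heading,
    because it directly follows the '|' rather than starting a line.
    """
    offset = 0
    for line_no, line in enumerate(arg.split("\n")):
        is_heading = line_no > 0 and HEADING_LINE.fullmatch(line) is not None
        if not is_heading:
            pos = line.find("=")
            if pos != -1:
                return offset + pos
        offset += len(line) + 1  # +1 for the newline removed by split()
    return -1


multi_line = "This text contains \n== A header ==\nThat is not being interpreted as a parameter separator\n"
inline = "This text contains == A fake header == That breaks the template "

print(find_separator(multi_line))  # -1
print(find_separator(inline))      # 19
```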

For a complex document, like the reports we are generating, such a workaround seems complicated. I believe that even if there is a workaround, this is a bug: editing should be as simple as possible for users, and things like this do not make it easier. Thanks for the answer, and I'll see if I can try it out!