Cite: 'Cite error ref too many keys' not generated if name chunk contains other than A–Z, a–z, 0–9
Open, Medium, Public

Assigned To

None

Authored By

	Gadget850
	Nov 12 2012, 10:54 AM

Description

When a reference name is not enclosed in quotes and contains an invalid character, then it should trigger Cite error ref too many keys. Example:

But if the name fragment after the space contains (not just begins with) any character other than A–Z, a–z, 0–9, then the error is not triggered. Example:

This should generate the error message, but instead it truncates the name at the space.

Version: unspecified
Severity: normal

Details

Reference: bz42040

Event Timeline

• bzimport raised the priority of this task to Medium. Nov 22 2014, 1:00 AM

• bzimport added a project: Cite.

• bzimport set Reference to bz42040.

• bzimport added a subscriber: Unknown Object (MLST).

Gadget850 created this task. Nov 12 2012, 10:54 AM

Example at:

http://en.wikipedia.org/wiki/Help:Cite_errors/Cite_error_ref_too_many_keys#Bug

Izno moved this task from Unsorted backlog to Defect backlog on the Cite board. Aug 22 2016, 5:40 PM

I can confirm this is still an issue. I understand why it looks like a bug in the Cite extension, but it is actually an issue with MediaWiki's Parser, specifically with the way it processes attributes for so-called "tag hooks". The <ref> tag is one of these.

This is the most minimal example I could come up with:

This shows the expected error message

<ref x>a</ref>

This does not

<ref !>a</ref>

I was able to track the issue down to the line https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/parser/Parser.php$4039, which calls Sanitizer::decodeTagAttributes(). This function is called with the expected string !, but returns an empty array. From this point onward, later code does not have any idea there was an invalid attribute in the tag.

There is a line that intentionally skips attributes with invalid characters, without reporting anything. This line was introduced just recently via https://gerrit.wikimedia.org/r/471363, in 2018. However, a closer look shows that the same issue existed before. The old code relied on a regular expression that skipped invalid characters, creating the exact same result.
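The effect of that silent skip can be reproduced with a small Python sketch. This is not MediaWiki's code: `decode_tag_attributes` and `check_ref` are hypothetical stand-ins, the name rule is simplified to the A–Z, a–z, 0–9 set from the task title (the real Sanitizer accepts more characters), and splitting on whitespace ignores quoted values that contain spaces.

```python
import re

# Assumption: simplified attribute-name rule taken from the task title.
# MediaWiki's real Sanitizer uses a different, wider pattern.
VALID_NAME = re.compile(r"[A-Za-z0-9]+")


def decode_tag_attributes(text):
    """Toy stand-in for Sanitizer::decodeTagAttributes()."""
    attributes = {}
    for chunk in text.split():
        name, _, value = chunk.partition("=")
        if not VALID_NAME.fullmatch(name):
            # Silently skipped: the caller never learns this chunk existed.
            continue
        attributes[name.lower()] = value.strip("\"'")
    return attributes


def check_ref(text):
    """Toy stand-in for the Cite check: any unknown key raises the error."""
    unknown = set(decode_tag_attributes(text)) - {"name", "group"}
    return "Cite error ref too many keys" if unknown else None


print(check_ref("x"))             # unknown key "x" survives decoding
print(check_ref("!"))             # "!" is dropped, so nothing is left to report
print(check_ref("name=foo bar"))  # "bar" survives decoding
print(check_ref("name=foo b!r"))  # "b!r" is dropped, name is cut at the space
```

In this sketch the check only ever sees what the decoder returns, which is why no change limited to the check itself could detect the dropped chunk.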

I'm really not sure if there is anything we can do about this. I believe it would be wrong to add the ! to the list of attributes. This will most certainly create a lot of new issues in code consuming such a set of attributes. Another idea is to let the Parser report such invalid character sequences. However, this would be a breaking change in the Parser. Historically, there was no such thing as "invalid" wikitext.

Aklapper removed subscribers: Anomie, • wikibugs-l-list. Oct 16 2020, 5:02 PM
