16:43 <DarTar> ori-l: SELECT event_targetTitle FROM GettingStarted_5243394 WHERE uuid = "51c6149554d85e77b665f303f28adf25";
16:43 <DarTar> Héctor Elizondo
Version: unspecified
Severity: major
It turns out this is not a bug in json2sql but in the instrumentation of GettingStarted; updating the ticket accordingly.
I tested again with a page called 'Some contrivéd page name!'()*~' (no quotes).
The JSON is:
{"event":{"action":"gettingstarted-click","funnel":"gettingstarted","targetTitle":"Some contrivéd page name!'()*~","experimentId":"ob3","userId":1,"isNew":false},"isValid":true,"revision":5219269,"schema":"GettingStarted","webHost":"127.0.0.1","wiki":"testwiki"}
Note that the logging for GettingStarted is in E3Experiments. So if it were a client-side bug, it would probably be there.
For the record, the page in question above is https://en.wikipedia.org/wiki/H%C3%A9ctor_Elizondo
à is http://www.fileformat.info/info/unicode/char/00c3/index.htm
© is http://www.fileformat.info/info/unicode/char/00a9/index.htm
é (the correct one) is http://www.fileformat.info/info/unicode/char/00e9/index.htm
If you follow the last link, you will see the UTF-8 is:
UTF-8 (hex) 0xC3 0xA9 (c3a9)
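So it looks like the two UTF-8 bytes of é are being split apart and each widened into its own code point: 0xC3 becomes U+00C3 (Ã) and 0xA9 becomes U+00A9 (©). (The fileformat.info URLs above are keyed by that four-digit hex value.)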
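For reference, the symptom can be reproduced outside EventLogging. The sketch below is illustrative only: the helper names are made up for this comment, and it says nothing about where in the pipeline the mis-decoding actually happened. It encodes a string as UTF-8, then reads each byte back as a separate code point (which is what a Latin-1 decode does).

```javascript
// Hypothetical helpers, not part of EventLogging or json2sql.

// Encode as UTF-8, then misread every byte as its own code point.
function misdecodeUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

// Reverse the damage: treat each code unit as one byte and decode as UTF-8.
// Only valid when every character in the garbled string is below U+0100.
function repairMojibake(garbled) {
  const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

const title = 'H\u00e9ctor Elizondo';           // correct title
const garbled = misdecodeUtf8AsLatin1(title);   // what the SELECT returned
console.log(garbled, '->', repairMojibake(garbled));
```

Here `garbled` contains U+00C3 followed by U+00A9 where the original had a single U+00E9, matching the query result at the top of this ticket.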
But for now, back to EventLogging.
Nope, it wasn't GettingStarted. Fixed in change I0f4ea76b911e572405bcfbde23be74d29f7fd783.
Adding a bit of documentation for future reference. If we run into Unicode / URL issues in the future, we can try replacing all characters above the printable ASCII range with Unicode escape sequences:
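function escapeChar( ch ) {
    // Zero-pad the UTF-16 code unit to four hex digits, e.g. U+00E9 -> \u00e9.
    var codePoint = '0000' + ch.charCodeAt( 0 ).toString( 16 );
    return '\\u' + codePoint.slice( -4 );
}
function toSafeJSON( obj ) {
    // Escape everything from U+007F up so the serialized JSON is pure ASCII.
    var json = $.toJSON( obj );
    return json.replace( /[\u007f-\uffff]/g, escapeChar );
}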
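A quick way to sanity-check the snippet above without jQuery, as a rough sketch: `JSON.stringify` stands in for `$.toJSON` here (an assumption; the two should agree for plain event objects like ours), and the sample event is made up.

```javascript
// Same idea as toSafeJSON above, with JSON.stringify swapped in for $.toJSON.
function escapeChar(ch) {
  const hex = '0000' + ch.charCodeAt(0).toString(16);
  return '\\u' + hex.slice(-4);
}

function toSafeJSON(obj) {
  return JSON.stringify(obj).replace(/[\u007f-\uffff]/g, escapeChar);
}

// Made-up event for illustration.
const event = { targetTitle: "Some contriv\u00e9d page name!'()*~" };
const safe = toSafeJSON(event);

console.log(safe);
console.log(JSON.parse(safe).targetTitle === event.targetTitle);
```

The escaped string contains only ASCII characters, and `JSON.parse` restores the original title, so the escaping is lossless for consumers that parse the JSON properly.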
If this problem does crop up again, let's try to figure out the underlying cause before trying something like toSafeJSON.