
compressOld.php does not work on Postgres (gzip binary causes syntax error)
Open, Low, Public

Description

Author: overlordq

Description:
Escaping the Gzip'd text is broken when running CompressOld. It's not properly escaping apostrophes for some reason so it generates a syntax error.


Version: 1.17.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=30987

Details

Reference
bz24607

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:10 PM
bzimport set Reference to bz24607.
bzimport added a subscriber: Unknown Object (MLST).

overlordq wrote:

php maintenance/storage/compressOld.php

Compressing database wikidb

Starting from 0 of 8301
1 Main_Page .xxx/
2 Help:Assigning_permissions
[...]
150 MediaWiki:Common.js/watchlist.js ..PHP Warning: pg_query(): Query failed: ERROR: syntax error at or near "NPwF2"
LINE 1: ...N!Ć{J©ZRMUtZvkyп6}=b̛}vt&89꯺\t]\q.t;''z'NPwF23?=...

^ in /var/www/com/w/includes/db/DatabasePostgres.php on line 607

Warning: pg_query(): Query failed: ERROR: syntax error at or near "NPwF2"
LINE 1: ...N!Ć{J©ZRMUtZvkyп6}=b̛}vt&89꯺\t]\q.t;''z'NPwF23?=...

^ in /var/www/com/w/includes/db/DatabasePostgres.php on line 607

A database error has occurred
Query: UPDATE pagecontent SET old_text = 'O:27:"ConcatenatedGzipHistoryBlob":4:{s:8:"mVersion";i:0;s:11:"mCompressed";b:1;s:6:"mItems";s:1001:"▒V▒o▒H▒▒_1▒N▒!Ć▒▒▒▒{J©▒ZR▒MU▒tZv▒▒▒ky▒▒п▒▒6▒▒▒}=▒▒b▒̛}▒vt▒&89꯺\t▒]▒\▒▒▒▒q▒▒▒▒.▒t▒;▒▒''▒z▒▒'▒NP▒▒wF▒2▒3▒▒?▒=▒▒3▒▒▒▒Z▒o6▒▒$▒Ϭ▒ʀ▒0g▒Oci▒Q▒▒▒_^5▒▒',old_flags = 'object,utf-8' WHERE old_id = '151'
Function: Database::update
Error: 1 ERROR: syntax error at or near "NPwF2"
LINE 1: ...N!Ć{J©ZRMUtZvkyп6}=b̛}vt&89꯺\t]\q.t;''z'NPwF23?=...

^

Occurs with r107680 and PostgreSQL 8.4.4 as well:

[tim@passepartout /var/www/html/w/maintenance]$ php storage/compressOld.php
Compressing database tim
Starting from 0 of 68
1 Main_Page
[...]
10 Template:Mapsources .....PHP Warning: pg_query(): Query failed: ERROR: invalid byte sequence for encoding "UTF8": 0xecbd5b
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". in /var/www/html/w/includes/db/DatabasePostgres.php on line 254
A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: UPDATE "timipedia"."pagecontent" SET old_text = 'O:27:"ConcatenatedGzipHistoryBlob":4:{s:8:"mVersion";i:0;s:11:"mCompressed";b:1;s:6:"mItems";s:73995:"��[s� �޿c����x���j5u���n#���
X�2�օ*Q������/��p6N��y�s6bw6���?���?�L��X7�%Y�ձ;�����D"��L$2i}����Wj�GkՕ�^mu��N�V�Uk�:�N��Q�=���������������*����1�׸��%g�Gzdlyd��q-B�]B�Am[3�޹�TFtF/▒�3�i��$��U�s�2�.ՇĆ�����shGC���h&wOu�5�_��Գ����7g��HT�r��I]�K�}W,��݋�vAT�r�<r-U�t�s�#�ݱΔG=�t�\�b�����G?}���c��.m����Λ�����n����İ�dC���.@Ƃ:T��3�u]��1�uk��`GN��C�s���S�ڗ�[��%���ן�h�',old_flags = 'object,utf-8' WHERE old_id = '11'
Function: DatabaseBase::update
Error: 1 ERROR: invalid byte sequence for encoding "UTF8": 0xecbd5b
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
[tim@passepartout /var/www/html/w/maintenance]$

The "culprit" in the background is "pagecontent.old_text" with the type "text". I don't see how binary data can be stored there without escaping:

tim=# CREATE TEMPORARY TABLE tmpTest (t TEXT);
CREATE TABLE
tim=# INSERT INTO tmpTest (t) VALUES (E'\0');
ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
tim=# INSERT INTO tmpTest (t) VALUES (encode(E'\\000'::BYTEA, 'escape'));
INSERT 0 1
tim=#
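
To illustrate the point outside of psql, here is a minimal Python sketch. It assumes a UTF-8 database and uses two hypothetical helpers that are not part of MediaWiki: is_text_safe() checks whether a byte string could be stored in a "text" column at all, and escape_encode() mimics what encode(..., 'escape') does in the session above (zero bytes and bytes with the high bit set become octal escapes, backslashes are doubled).

```python
def is_text_safe(blob: bytes) -> bool:
    """Return True if blob could go into a UTF-8 'text' column unchanged.

    Hypothetical helper: rejects NUL bytes and invalid UTF-8 sequences,
    the two conditions reported in the errors above.
    """
    if b"\x00" in blob:
        return False
    try:
        blob.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def escape_encode(blob: bytes) -> str:
    """Mimic PostgreSQL's encode(bytea, 'escape') for illustration.

    Zero bytes and bytes >= 0x80 become \\nnn octal escapes,
    backslashes are doubled, everything else is kept literally.
    """
    parts = []
    for byte in blob:
        if byte == 0x5C:  # backslash
            parts.append("\\\\")
        elif byte == 0 or byte >= 0x80:
            parts.append("\\%03o" % byte)
        else:
            parts.append(chr(byte))
    return "".join(parts)


if __name__ == "__main__":
    # 0xecbd5b is the byte sequence PostgreSQL complained about above.
    bad = bytes.fromhex("ecbd5b")
    print(is_text_safe(bad))
    print(escape_encode(bad))
    print(is_text_safe(escape_encode(bad).encode("utf-8")))
```

The escaped form is plain ASCII, so it passes the same check that the raw bytes fail. The cost is that the stored value grows and has to be decoded again on every read.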

If this were only about compressing old pages, I'd suggest leaving compression to PostgreSQL, which is much better at it and does not bother the user. But the same problem also occurs with the serialized objects created in compressWithConcat().

Instead of trying to mimic bad habits here, I would refer PostgreSQL users to External Storage and disable compressOld.php for PostgreSQL databases without External Storage, so that it doesn't try to store binary data in text attributes.
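
The guard I have in mind could look roughly like the following Python sketch. The function name, the "postgres" type string, and the boolean flag are assumptions for illustration only; the real script would have to take these values from MediaWiki's own configuration.

```python
def may_compress_in_place(db_type: str, uses_external_storage: bool) -> bool:
    """Decide whether a compressOld-style run should be allowed.

    Hypothetical guard: on PostgreSQL, compressed blobs must not be
    written into the text column, so the run is only allowed when the
    blobs go to External Storage instead.
    """
    if db_type.lower() == "postgres" and not uses_external_storage:
        return False
    return True


if __name__ == "__main__":
    if not may_compress_in_place("postgres", uses_external_storage=False):
        print("Refusing to run: PostgreSQL without External Storage.")
```

Other database types are deliberately left untouched by this check, since the report only concerns PostgreSQL.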
Jdforrester-WMF subscribed.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.

Krinkle renamed this task from compressOld does not run on Postgres to compressOld.php does not run on Postgres. Jul 29 2018, 12:16 AM
Krinkle renamed this task from compressOld.php does not run on Postgres to compressOld.php does not work on Postgres (gzip binary causes syntax error).
Krinkle added subscribers: bzimport, Bawolff, HappyDog.