MySQL syntax error in function tableNamesWithUseIndexOrJOIN when further tables are added. MySQL requires parentheses in FROM (table1,table2) if a JOIN follows
Closed, DeclinedPublic

Description

While working with the hook "SpecialRecentChangesQuery" (for code and detailed analysis see [1]) I found a reproducible problem which arises only under the following conditions:

  • if MySQL >= 5.0.12 AND
  • if the hook function for SpecialRecentChangesQuery adds table(s) to $table[].

Analysis:

The code in [1] modifies the Recent Changes main SQL statement to this

SELECT * FROM recentchanges FORCE INDEX (rc_timestamp),page LEFT JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >= '20100211000000') AND rc_bot = '0' ORDER BY rc_timestamp DESC LIMIT 50

This throws the error "Unknown column 'rc_id' in 'on clause' (localhost)" on MySQL >= 5.0.12, due to the new JOIN processing.

This ad-hoc modification works (parentheses added around the table names outside the JOIN)

SELECT * FROM (recentchanges FORCE INDEX (rc_timestamp),page) LEFT JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >= '20100211000000') AND rc_bot = '0' ORDER BY rc_timestamp DESC LIMIT 50

I am attaching an ad-hoc and very hacky, purely experimental patch, which corrects the problem in one specific case for [1]. The patch is not meant for SVN submission.

Citing [2]: Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.

Citing [3]:
SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);

Previously, the SELECT was legal due to the implicit grouping of t1,t2 as (t1,t2). Now the JOIN takes precedence, so the operands for the ON clause are t2 and t3. Because t1.i1 is not a column in either of the operands, the result is an Unknown column 't1.i1' in 'on clause' error. To allow the join to be processed, group the first two tables explicitly with parentheses so that the operands for the ON clause are (t1,t2) and t3:

SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);

Alternatively, avoid the use of the comma operator and use JOIN instead:
SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);

[1] http://www.mediawiki.org/wiki/Extension:OnlyRecentRecentChanges
[2] MySQL Manual Join Processing Changes in MySQL 5.0.12
http://dev.mysql.com/doc/refman/5.0/en/join.html
[3] Bug #19053 MySQL Unknown column in 'on clause'
http://bugs.mysql.com/bug.php?id=19053


Version: 1.22.0
Severity: normal
URL: http://dev.mysql.com/doc/refman/5.0/en/join.html

Details

Reference
bz22613

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:04 PM
bzimport set Reference to bz22613.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 7157
ad-hoc and hackish patch - experimental and for showing a solution for a specific query - not to be committed to SVN

*** CORRECTION ***

SELECT * FROM recentchanges FORCE INDEX (rc_timestamp),page LEFT JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >= '20100211000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY rc_timestamp DESC LIMIT 50

This throws the error "Unknown column 'rc_id' in 'on clause' (localhost)" on MySQL >= 5.0.12, due to the new JOIN processing.

This ad-hoc modification works (parentheses added around the table names outside the JOIN)

SELECT * FROM (recentchanges FORCE INDEX (rc_timestamp),page) LEFT JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >= '20100211000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY rc_timestamp DESC LIMIT 50

T. Gries: In case this is still an issue, willing to put that patch into Gerrit?

Adding "patch-reviewed" as it says "not to commit".

(In reply to comment #3)

> T. Gries: In case this is still an issue, willing to put that patch into Gerrit?
>
> Adding "patch-reviewed" as it says "not to commit".

uh, this patch is old, from 2010.

Leave open; perhaps one of the database experts can check my observations, which are explained in detail here.

Hi. I investigated and found that the problem still exists.

The reason is that the MySQL JOIN syntax changed in MySQL 5.0.12 (!) see

The corresponding changes have never been made in $IP/includes/database/db.php or, more precisely, in the MySQL driver.

Suggested solution:

add parentheses around tables FROM (recentchanges .., page) in all database statements for MySQL.

this does NOT work:

SELECT rc_id,rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,ts_tags FROM recentchanges FORCE INDEX (rc_timestamp),page LEFT JOIN watchlist ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >= '20130802000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY rc_timestamp DESC LIMIT 50

With the correct parentheses around -- see the FROM () -- it DOES work

SELECT rc_id,rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,ts_tags FROM (recentchanges FORCE INDEX (rc_timestamp),page) LEFT JOIN watchlist ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >= '20130802000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY rc_timestamp DESC LIMIT 50

the above is generated by my extension as published in
http://www.mediawiki.org/wiki/Extension:OnlyRecentRecentChanges#Code

basically:

$dir = dirname( __FILE__ );
$wgExtensionMessagesFiles['onlyrecentrecentchanges'] = $dir . '/OnlyRecentRecentChanges.i18n.php';
$wgHooks['SpecialRecentChangesQuery'][] = 'onSpecialRecentChangesQuery';

// see http://www.mediawiki.org/wiki/Manual:Hooks/SpecialRecentChangesQuery
function onSpecialRecentChangesQuery( &$conds, &$tables, &$join_conds, $opts, &$query_options = array(), &$select = array() ) {
	// Add the page table once, then keep only changes that are the latest revision of their page.
	if ( !in_array( 'page', $tables ) ) {
		$tables[] = 'page';
	}
	$conds[] = 'page_latest=rc_this_oldid';
	return true;
}

tl;dr:

Suggested solution:

Add parentheses around tables FROM (recentchanges .., page) in all database
statements for MySQL at the last stage before committing the query.
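
To make the suggestion concrete, here is a minimal sketch of the grouping rule. buildFromClause() and its two parameters are hypothetical names made up for this illustration; this is not the code of tableNamesWithUseIndexOrJOIN, and the real function would need to handle aliases, index hints and nested joins as well.

```php
<?php
// Hypothetical helper, not MediaWiki core code: builds a FROM clause in which
// the comma-separated tables are grouped, so that the ON clause of a
// following JOIN can refer to columns of any of them.
function buildFromClause( array $plainTables, array $joins ) {
	$from = implode( ',', $plainTables );
	// Parentheses are only needed when a comma list is mixed with a JOIN.
	if ( count( $plainTables ) > 1 && count( $joins ) > 0 ) {
		$from = '(' . $from . ')';
	}
	// Each join is given as: table name => array( join type, ON condition ).
	foreach ( $joins as $table => $join ) {
		list( $type, $cond ) = $join;
		$from .= " $type $table ON (($cond))";
	}
	return $from;
}

echo buildFromClause(
	array( 'recentchanges FORCE INDEX (rc_timestamp)', 'page' ),
	array( 'tag_summary' => array( 'LEFT JOIN', 'ts_rc_id=rc_id' ) )
), "\n";
// prints: (recentchanges FORCE INDEX (rc_timestamp),page) LEFT JOIN tag_summary ON ((ts_rc_id=rc_id))
```

With a single plain table or without any JOIN the helper leaves the clause untouched, so existing queries would not change.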

Source: http://dev.mysql.com/doc/refman/5.0/en/join.html

http://i.imgur.com/wVjBBqY.png

Previously, the comma operator (,) and JOIN both had the same precedence, so the join expression t1, t2 JOIN t3 was interpreted as ((t1, t2) JOIN t3). Now JOIN has higher precedence, so the expression is interpreted as (t1, (t2 JOIN t3)). This change affects statements that use an ON clause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.

Example:

CREATE TABLE t1 (i1 INT, j1 INT);
CREATE TABLE t2 (i2 INT, j2 INT);
CREATE TABLE t3 (i3 INT, j3 INT);
INSERT INTO t1 VALUES(1,1);
INSERT INTO t2 VALUES(1,1);
INSERT INTO t3 VALUES(1,1);
SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);

Previously, the SELECT was legal due to the implicit grouping of t1,t2 as (t1,t2). Now the JOIN takes precedence, so the operands for the ON clause are t2 and t3. Because t1.i1 is not a column in either of the operands, the result is an Unknown column 't1.i1' in 'on clause' error.

IMPORTANT:

To allow the join to be processed, group the first two tables explicitly with parentheses so that the operands for the ON clause are (t1,t2) and t3:

SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);


Alternatively, avoid the use of the comma operator and use JOIN instead:

SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);

This change also applies to statements that mix the comma operator with INNER JOIN, CROSS JOIN, LEFT JOIN, and RIGHT JOIN, all of which now have higher precedence than the comma operator.

Source: http://dev.mysql.com/doc/refman/5.0/en/join.html

update to comment #6 https://bugzilla.wikimedia.org/show_bug.cgi?id=22613#c6

this does NOT work:

SELECT
rc_id,rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,ts_tags
FROM recentchanges FORCE INDEX (rc_timestamp),page LEFT JOIN watchlist
ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT
JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >=
'20130802000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY
rc_timestamp DESC LIMIT 50

With the correct parentheses around -- see the FROM () -- it DOES work

SELECT
rc_id,rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,ts_tags
FROM (recentchanges FORCE INDEX (rc_timestamp),page) LEFT JOIN watchlist
ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT
JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >=
'20130802000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY
rc_timestamp DESC LIMIT 50

The following also works. It is an alternative which does not require a core code change, i.e. this version does NOT require additional parentheses (I swapped the order: the additional table page is now listed as the first FROM table name):

SELECT
rc_id,rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,ts_tags
FROM page,recentchanges FORCE INDEX (rc_timestamp) LEFT JOIN watchlist
ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT
JOIN tag_summary ON ((ts_rc_id=rc_id)) WHERE (rc_timestamp >=
'20130802000000') AND rc_bot = '0' AND (page_latest=rc_this_oldid) ORDER BY
rc_timestamp DESC LIMIT 50

This is done by changing

function onSpecialRecentChangesQuery( &$conds, &$tables, &$join_conds, $opts, &$query_options = array(), &$select = array() ) {
	// Old version: page is appended, so it ends up directly in front of the first JOIN.
	if ( !in_array( 'page', $tables ) ) {
		$tables[] = 'page';
	}
	$conds[] = 'page_latest=rc_this_oldid';
	return true;
}

to

function onSpecialRecentChangesQuery( &$conds, &$tables, &$join_conds, $opts, &$query_options = array(), &$select = array() ) {
	// New version: page is prepended, so recentchanges stays the left operand of the JOINs.
	if ( !in_array( 'page', $tables ) ) {
		array_unshift( $tables, 'page' );
	}
	$conds[] = 'page_latest=rc_this_oldid';
	return true;
}
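
A further variant, following the manual's advice quoted earlier to avoid the comma operator, would be to attach page with an explicit join via $join_conds, so that it no longer matters where page sits in $tables. This is only a sketch and I have not tested it against a live wiki; it assumes the usual $join_conds format of table name => array( join type, condition ).

```php
<?php
// Sketch only: same hook signature as before, but page is attached with an
// explicit INNER JOIN instead of being part of the comma-separated table list.
function onSpecialRecentChangesQuery( &$conds, &$tables, &$join_conds, $opts, &$query_options = array(), &$select = array() ) {
	if ( !in_array( 'page', $tables ) ) {
		$tables[] = 'page';
		// Assumed format: table name => array( join type, join condition ).
		$join_conds['page'] = array( 'INNER JOIN', 'page_latest=rc_this_oldid' );
	} else {
		// page was already added by someone else: fall back to a plain condition.
		$conds[] = 'page_latest=rc_this_oldid';
	}
	return true;
}

// Stand-alone demonstration with dummy query parts.
$conds = array();
$tables = array( 'recentchanges' );
$join_conds = array();
onSpecialRecentChangesQuery( $conds, $tables, $join_conds, null );
print_r( $join_conds );
```

For rows in recentchanges an INNER JOIN on page_latest=rc_this_oldid should select the same rows as the comma join with the WHERE condition, since both keep only changes that are the current revision of their page.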

The problem for the extension http://www.mediawiki.org/wiki/Extension:OnlyRecentRecentChanges#Code is solved, so I am closing this bug report, even though the general observation still applies.