Page MenuHomePhabricator

Database layer should automagically add GROUP BY columns on backends that need them (postgres)
Closed, DeclinedPublic

Description

Postgres is pedantic when it comes to GROUP BY queries and demands that all selected columns that aren't used with an aggregate function be in the GROUP BY clause, even if the GROUP BY already covers a unique index (meaning grouping by more columns would have no effect).

MySQL is more flexible, causing developers to write queries that fail on Postgres for this reason. Instead of developers having to go through a lot of trouble to construct GROUP BY clauses with the right columns with the ORDER BY columns at the beginning (r68100), the database layer should do this for them.

Because the semantics of GROUP BY described here are actually in the SQL standard IIRC, there's likely to be more DB backends, so I think this should be implemented in DatabaseBase::makeSelectOptions() with DB backends needing this functionality overriding needsAllColumnsInGroupBy() or whatever to return true.


Version: 1.17.x
Severity: enhancement

Details

Reference
bz26273

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:18 PM
bzimport set Reference to bz26273.
bzimport added a subscriber: Unknown Object (MLST).

You know what would be really nice for this bug? Some examples!

If you could point to a query that needs to be rewritten for Pg or another, more-strict-than-MySQL, database or a query that was re-written that would help a lot.

Mark: The r68100 shows an example. I agree this would be a nice feature, but I will point out that MySQL is *too* flexible, and allows creation of queries without unique coverage. :)

There is also a implicitGroupby() attribute used by the database classes: see specials/SpecialListfiles.php for an example.

(In reply to comment #2)

Mark: The r68100 shows an example. I agree this would be a nice feature, but I
will point out that MySQL is *too* flexible, and allows creation of queries
without unique coverage. :)

In that case there isn't a unique index on oldimage, which sucks terribly. I'd agree that GROUP BY shouldn't automatically be *that* loose.

sumanah wrote:

We discussed this bug today in IRC and did not come to any particular conclusions on how to design this. This is a large project, one that developers who want a substantial MediaWiki improvement task should take on.

<G_SabinoMullane> There is a related bug to 26273 but I don't know where it is offhand. Tim and I made some hand-waving solutions for this.
<G_SabinoMullane> Basically, we need to have MW gather the list of columns from the tables and create the GROUP BY on the fly.
<blobaugh> G_SabinoMullane: why not have a var that lists, per table, the correct group by?
<G_SabinoMullane> blobaugh: Because schema changes (new columns) would break it
<djbauch> This bug (26273) used to rear its ugly head a lot. Not so much any more. It was just one special page the last time I checked. Staying away from GROUP BY 1,2,3, etc. and sticking to the more stringent rules enforced by SQL SERVER (e.g., making sure to name the fields by name rather than by alias) has helped a lot.
<blobaugh> G_SabinoMullane: could this be a global that gets updated on schema changes manually?
<^demon> blobaugh: We can barely keep people updating all the proper places when they do a schema change anyway...a global they'd have to keep in check too would be impossible.
<blobaugh> ^demon: fair enough
<G_SabinoMullane> blobaugh: That seems like a lot of work compared to simply reading the db cols directly
<blobaugh> G_SabinoMullane: maybe, but it would make the system much faster than needing another db call
<sumanah> djbauch: do you think it would be fairly easy to generate a list of the places in MediaWiki that name fields by alias?
<djbauch> Sumanah: Yes, there were only two places that caused a problem that I found. One special page and one obscure place.
<sumanah> is this a longterm project for some interested developer, then?like, GSoC level?  or intractable? or what?
<Nikerabbit> not sure if it is worth full GSoC, but looks like non-trivial amount of work indeed
Qgil subscribed.

This task was mentioned in https://www.mediawiki.org/w/index.php?title=Outreach_programs/Possible_projects&oldid=1404823#Very_raw_projects as a possible candidate for Google Summer of Code or similar programs. Do you think it is a good candidate?

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Is this really a good candidate for GSoC/Outreachy?

This is a message posted to all tasks under "Re-check in September 2015" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

If ONLY_FULL_GROUP_BY mode is enabled in mysql (T112637), then we need to do this for all backends, and I think that makes this bug something we no longer need to do, as people would specify all the columns in the first place.

Shouldn't this bug be closed because of https://phabricator.wikimedia.org/T112637 as @Bawolff mentioned ?

The other bug is a proposal. This bug can be closed when the other one happens (Or if it happens, but it seems pretty likely it will happen)

I'm moving this to "Needs Discussion" on Possible-Tech-Projects as it is not sure whether we want it for a outreachy or gsoc project. Whether it's needed or not depends on T112637

Qgil removed a project: Possible-Tech-Projects.
Qgil set Security to None.

This task doesn't look like a good candidate for Possible-Tech-Projects. Depending on a RfC, and it looks too niche.

Umherirrender subscribed.

Now the unit tests covers this part - see T281329