Page MenuHomePhabricator

Unable to explain queries on replicated databases
Closed, ResolvedPublic

Description

Although not essential, it would be very useful to be able to explain queries run against the replicated Wiki databases and examine table indexes without having to refer to external docs.

Transcript follows:

tb@tools-login:~$ mysql -h enwiki.labsdb;
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 618336
Server version: 5.5.30-MariaDB-mariadb1~precise-log mariadb.org binary distribution

Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use enwiki_p;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [enwiki_p]> show index in page;
Empty set (0.03 sec)

MariaDB [enwiki_p]> explain select count(*) from page where page_namespace = 0;
ERROR 1345 (HY000): EXPLAIN/SHOW can not be issued; lacking privileges for underlying table
MariaDB [enwiki_p]>

Workaround

From http://thread.gmane.org/gmane.org.wikimedia.labs/3991/focus=4002:

Open 2 SQL sessions

In session 1:

  • SELECT CONNECTION_ID() AS conid;
  • Note the number returned.
  • Run the query to be explained.

In session 2:

  • Use the number noted above for <conid>
  • SHOW EXPLAIN FOR <conid>;

Details

Reference
bz48875

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 1:35 AM
bzimport added projects: Toolforge, Upstream.
bzimport set Reference to bz48875.

We had (have :-() the same problem on Toolserver (cf. https://jira.toolserver.org/browse/TS-1585), and there fale found that users need to be given the "SHOW VIEW" privilege on the views.

Sadly, the SHOW VIEW privilege does not suffice; using explain requires having SELECT privilege on the /underlying/ table (i.e., the one with the data that cannot be shared publically)

viz.: http://bugs.mysql.com/bug.php?id=64198

In other words, the reason the toolserver could not grant privileges to explain is also the reason we cannot.

(As a note, while the right cannot be given, operations staff can run explain on queries that are taking overly long in order to give you the information you need. It sucks, but it's a change in mysql we cannot circumvent).

I talked to Marc about this on IRC, and asked why we can't get upstream to allow us to do that.
Apparently people have asked MySQL for it and failed (WONTFIX - 'by design', related to a non-public worklog task required by their agreement to "SAP"(?)), and he thinks that MariaDB got the same change from a backport.

I'm reopening this with the suggestion that we file a request upstream with MariaDB.

Thinking outside the box, can sufficient spare disk capacity be allocated to the enwiki.labsdb host to create a static copy of enwiki_p containing just the set data available in the dumps (http://dumps.wikimedia.org/), loaded into plain-old tables and indexed per the replicated database? This would suffice to check the operation of some queries, particularly those touching the larger tables.

I did a little digging into this issue today.

http://bugs.mysql.com/bug.php?id=7014 (the one mentioning "SAP") makes it sound like the behavior we *want*: SHOW VIEW on the view *or* SELECT on the underlying tables will allow EXPLAIN.

http://bugs.mysql.com/bug.php?id=64198 isn't really related. The complaint there is that having SELECT on everything is no longer enough, SHOW VIEW is now required too. This is the opposite of what we want, although the same underlying change probably caused it.

So I tracked down the commit that it looks like actually made it be required to have SELECT on the underlying tables: https://bazaar.launchpad.net/~maria-captains/maria/10.0/revision/1810.4002.32. You can see the change in behavior in the unit test file mysql-test/t/view_grant.test: look at added line 206, before having SELECT and SHOW VIEW on the view and nothing on the underlying table was wanting a successful query and there it's being changed to expect the error.

It references a private (ugh) bug, but from the commit summary and comments added in the patch it seems that the "problem" being fixed there is that allowing EXPLAIN with only SHOW VIEW on the view was allowing people to get an estimate of how many rows were in the underlying table. Oh noes!

That patch also points us right at the bit of code that would need to be changed if someone wants to try getting the MariaDB people to change this.

(In reply to comment #7)

Thank you for doing this research.

It references a private (ugh) bug, but from the commit summary and comments
added in the patch it seems that the "problem" being fixed there is that
allowing EXPLAIN with only SHOW VIEW on the view was allowing people to get
an estimate of how many rows were in the underlying table. Oh noes!

That patch also points us right at the bit of code that would need to be
changed if someone wants to try getting the MariaDB people to change this.

Copying Sean P. on this bug as he may be interested in pursuing this.

Getting MariaDB to fix this behavior would be nice if it's just a matter of "leaking" row count info. We could also run our own MariaDB fork (if we're not already), but given the Labs data leak... there's probably some understandable wariness to mucking around with this particular code. :-) I believe Wikimedia now has additional protections in place to avoid a repeat, even if the views break and users can perform unfiltered SELECTs.

metatron wrote:

As tb mentioned before I would suggest to have a database in place with:

  • up to date structure and indeces
  • fake / redacted data

as a workaround until there is a better solution.

metatron wrote:

To work around this issue I created a script that copies the current schema and fills it with sample data from xxwiki_p.

The script is available under:
https://tools.wmflabs.org/wikiviewstats/misc/optimizer.sh.txt

A sample database is on enwiki.labsdb as u3710__enwiki_optimizer_p.

Currently this is _p accessible, but created per-user basis. Maybe Coren or anyone else has a better solution in the future.

daniel wrote:

Are the explain results independent of row counts? Or will MariaDB change optimization strategies (or which indexes to pick) based on that?

(In reply to comment #11)

Are the explain results independent of row counts? Or will MariaDB change
optimization strategies (or which indexes to pick) based on that?

explain sometimes changes based on row counts (Its noticeable sometimes when developing mediawiki where locally queries do different things than they do in production, which I've usually blamed on having 5 rows in my db vs 5 million)

Besides row counts, the database also maintains information on the distribution of keys in the database, which may change the query plans if the distribution on one instance of the table is different from another. See https://dev.mysql.com/doc/refman/5.5/en/analyze-table.html for some details.

Yeah, at best this partially populated schema offers an approximation; but that's arguably better than /no/ information.

@metatron: I'm okay with making this available with a simpler name; but do you have a process in place to /keep/ the schema synchronized with prod?

IIRC, with the move to the MariaDB 10 setup we can now EXPLAIN currently running queries (i. e. long-running queries). Could someone please document how to do that either here or at [[wikitech:Nova Resource:Tools/Help]]?

metatron wrote:

Yes. With MariaDB 10 it is now possible to run > SHOW EXPLAIN with same permission as running SHOW PROCESSLIST (ergo everyone can)

As posted on labs-l, I've created a tool to make this feature easy-to-use, even for short running queries.
https://tools.wmflabs.org/tools-info/optimizer.py

If it stands the test, this bug may be closed.

https://mariadb.com/kb/en/mariadb/mariadb-documentation/sql-commands/administration-commands/show-explain/

Given the availability of the new tool, and the lack of desire upstream to change what they perceive as WAD, closing this.

I'll also note that MariaDB 10.1 has an 'ANALYZE' feature coming up that is open to everyone and that lets you evaluate queries without having to run them, so whenever it is that we upgrade we'll get this as well.

My reading is that the essence of this bug, that is explaining queries on replicated databases, is now available.

yuvipanda added a subscriber: coren.
scfc assigned this task to coren.

Closing this task again (as @chasemp suggested in T141095#2511565). The scope of this task was being unable to explain queries on replicated databases. This is possible: Run your query in one session, run SHOW PROCESSLIST; and SHOW EXPLAIN FOR ${Id}; in another, and you get the result (if the query does not run long enough, it does not need to be explained either :-)).