Page MenuHomePhabricator

Cloned tables for unittests do not have references and constraints
Closed, DeclinedPublic

Description

Looks like tables used for unit tests do not have all relationships of the original ones:

(below was obtained from running unit tests with temporary tables disabled)

minitest=# \d pagelinks
Table "mediawiki.pagelinks"

ColumnTypeModifiers
pl_fromintegernot null
pl_namespacesmallintnot null
pl_titletextnot null
Indexes:
    "pagelink_unique" UNIQUE, btree (pl_from, pl_namespace, pl_title)
    "pagelinks_title" btree (pl_title)
Foreign-key constraints:
    "pagelinks_pl_from_fkey" FOREIGN KEY (pl_from) REFERENCES page(page_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

minitest=# \d page
minitest=# \d unittest_pagelinks
Table "mediawiki.unittest_pagelinks"

ColumnTypeModifiers
pl_fromintegernot null
pl_namespacesmallintnot null
pl_titletextnot null

Version: 1.20.x
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57724

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 12:21 AM
bzimport set Reference to bz37702.
bzimport added a subscriber: Unknown Object (MLST).

looks like includes/db/CloneDatabase.php needs some love. Congratulations on finding this issue!

I think there is more work needed, whole $wgDBPrefix handling should be abstracted out of MediaWikiTestCase.php...

I don't see a feasible way to clone a database (i. e., schema) in PostgreSQL that reproduces all necessary foreign keys and triggers and ends up with them pointing at the right tables.

Wouldn't it be much easier to just create new tables from scratch? What is the reasoning behind the current approach?

I think we have two solutions:

  1. Upload ready-to-use schema for unit testing. I think Selenium has some MediaWiki dump stored for that purpose:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/tests/selenium/data/mediawiki118_fresh_installation.sql?view=log

We should certainly not be running updater on this (maybe only once for all tests) since it will take too much time.

  1. Another solution is to snap some code from pg_dump or some ORM or some GUI suolution and actually re-create the schema. Oracle has "duplicate_table" function

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/oracle/tables.sql?revision=109909&view=markup#l739

Maybe our Oracle friends can tell us how it good it works between Oracle releases.

In case of PostgreSQL we have already a pretty good functionality to read definitions of indexes in the PostgresField class which I recently updated to work with older PostgreSQL versions. I think this is a good starting point.

Another advantage of solution #2 is that we could actually use the same code
in the updater and even rewrite schema definition in the PHP objects (PostgresTable, PostgresField, etc..) instead of pure SQL - this way
we could easy compare "desired" schema with the "current" one.

When working on the updater I am doing diffs between SQL dumps and this is not a very good method.

I'm concerned with two issues here:

a) Differences between "original" and "cloned" schema: How to keep the ready-to-use schema in sync with the schema of an actual installation? How to ensure that all bits and bytes in the "original" schema really get cloned?

b) Developers' workload: Regularly updating the ready-to-use schema or even just maintaining a working clone schema function will consume a *lot* of manpower that seems to be scarce in PostgreSQL support.

So I would certainly prefer to use a "fresh install" method: The code is already in the installer (maybe refactor?), it gets regularly updated and the tests are run in a "controlled environment".

Another thing I was thinking about as you mentioned the updater (and not related to this bug): Why use a tables.sql at all? Why not just treat a new installation as an update of an empty database? We could do away with the mixture of SQL and PHP logic and ... Hmmm.

(In reply to comment #5)

Another thing I was thinking about as you mentioned the updater (and not
related to this bug): Why use a tables.sql at all? Why not just treat a new
installation as an update of an empty database? We could do away with the
mixture of SQL and PHP logic and ... Hmmm.

Merging the Doctor and God. The one that upgrades the sick shall have the power to create new life.

I like it!

(btw. that was totally not a Doctor Who reference)

(In reply to comment #6)

Another thing I was thinking about as you mentioned the updater (and not
related to this bug): Why use a tables.sql at all? Why not just treat a new
installation as an update of an empty database? We could do away with the
mixture of SQL and PHP logic and ... Hmmm.

Merging the Doctor and God. The one that upgrades the sick shall have the power
to create new life.

I like it!

Actually, this approach seems to be followed already by Extension:AbuseFilter and Extension:Math, probably inter alia. So that would explain what inspired me :-).

Another thing that came to my mind: We seem to have hardcoded the prefix "unittest_". This makes it impossible to run database tests in parallel. This should probably be something like '"unittest" . getmypid() . "_"'.

No idea if it is enabled or works with MySQL.

I would say we should use PostgreSQL schema instead of the prefix.
And we could use something like <unittestname> (should be enough).

freak wrote:

i'd go for <unittestname> option as it enables me to manualy generate a short enough prefix that won't break the oracle 30 char object name limit.

And please don't make me go into the "quoted object names have no such limit" debate again until we have the abstract schema.

(In reply to comment #10)

No idea if it is enabled or works with MySQL.

I would say we should use PostgreSQL schema instead of the prefix.
And we could use something like <unittestname> (should be enough).

But this doesn't allow parallelization. Just imagine a couple of jobs creating, modifying and destroying the same objects in one schema. Would "<unittestname><getmypid()>" do any harm?

BTW, this bug bit me again today when I wanted to look why the search tests are failing. I initially thought adding some $this->db->tableName()s to SearchPostgres.php would fix that, but as the search "index" (titlevector/textvector) is maintained by triggers that aren't copied on tests, they still fail.

Another caveat to consider:

MySQL doesn't allow temporary tables to be generated from views. This means that if any extension defines a view in the database, the entire phpunit suite dies with an error "A database query syntax error has occurred....". Sigh.

fallos wrote:

Perhaps the docs on create table LIKE parent_table [ like_option ... ] provide some light.

Default expressions for the copied column definitions will only be copied if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, resulting in the copied columns in the new table having null defaults.

Not-null constraints are always copied to the new table. CHECK constraints will only be copied if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied. Also, no distinction is made between column constraints and table constraints — when constraints are requested, all check constraints are copied.

INCLUDING ALL is an abbreviated form of INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS.

http://www.postgresql.org/docs/9.1/static/sql-createtable.html

Unfortunately, INCLUDING ALL does not replicate foreign key checks (this cannot be reliably done on a per-table basis).

An early fix for this is in gerrit change I326bb4a189bf881299b9fb678033a927b916efac (this just replays some SQL
with ALTER TABLE ... ADD CONSTRAINT plus makes a copy of the "mwuser"
table). We might actually want to split tables.sql into two files
to make it work this way :)

Before this change is applied I had 3 errors and 3 failures in the unit tests:

  1. TestORMRowTest::testSaveAndRemove fails because of test_id (bug 37601)
  2. SiteSQLStoreTest::testGetSites fails because of bug 37601
  3. SiteSQLStoreTest::testSaveSites fails because of bug 37601

3 failures are:

  1. WikiPageTest::testDoDeleteArticle most probably this bug
  2. WikiPageTest::testDoDeleteUpdates as well
  3. SiteSQLStoreTest::testReset I didn't check yet

After change I326bb4a189bf881299b9fb678033a927b916efac is applied
I get 16 errors and 2 failures:

  1. BlockTest::testBlockedUserCanNotCreateAccount
  2. BlockTest::testCrappyCrossWikiBlocks
    • those fail because the user id they use is not yet in the DB (easy)
  1. LinksUpdateTest::testUpdate_pagelinks
  2. LinksUpdateTest::testUpdate_externallinks
  3. LinksUpdateTest::testUpdate_categorylinks
  4. LinksUpdateTest::testUpdate_templatelinks
  5. LinksUpdateTest::testUpdate_imagelinks
  6. LinksUpdateTest::testUpdate_langlinks
  7. LinksUpdateTest::testUpdate_page_props
    • those fail becaue page_id they refer to in those tables is not (yet) in the database
  1. 11) 12) and 13) are RevisionStorageTest::testUserWasLastToEdit
    • fail because page_d they refer to is not yet in the database (fixed with gerrit change I653a8bccdaa748a9bea453cd1dbf609a30e1ff6f)
  1. TestORMRowTest::testSaveAndRemove
  2. SiteSQLStoreTest::testGetSites
  3. SiteSQLStoreTest::testSaveSites
    • those fail as they failed before the patch (bug 36701)

What is interesting is that WikiPageTest::testDoDeleteArticle no longer
fails there, but WikiPageTest::testDoDeleteUpdates and SiteSQLStoreTest::testReset continue to do so.

Actually I believe testDoDeleteUpdates should be fixed with constraints and triggers since the database can delete dependent records for us.

saper set Security to None.
Jdforrester-WMF subscribed.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.

Change 100141 had a related patch set uploaded (by Krinkle; owner: saper):
[mediawiki/core@master] [WIP] Crude way to recreate PgSQL references in tables

https://gerrit.wikimedia.org/r/100141

Umherirrender subscribed.

Constraints are dropped - RFC T164898

Change 100141 abandoned by Aklapper:

[mediawiki/core@master] [WIP] Crude way to recreate PgSQL references in tables

Reason:

Abandoning as https://phabricator.wikimedia.org/T39702 is closed as declined.

https://gerrit.wikimedia.org/r/100141