
updateSearchIndex.php assumes it is writing for the only wiki using the disk
Closed, Resolved, Public

Description

Gentlemen, can you please not have updateSearchIndex.php write into
the file searchUpdate.pos?

Why?

  1. Of the thousands of operations on a running MediaWiki wiki, this is
     the only one that writes into the local filesystem instead of the
     database.

What's wrong with that?

  1. It precludes using a read-only filesystem. If it were not for this
     one little 14-byte file, one could use a read-only filesystem.

  2. One might have several wikis using the same files:

radioscanningtw.jidanni.org -> mediawiki-1.11.0
taizhongbus.jidanni.org -> mediawiki-1.11.0
(yes, even the same LocalSettings.php, with appropriate switch()s
inside it.) There is no way two wikis can use the same
searchUpdate.pos.

Also many users might not just untar the updates on top of the
previous file tree, but instead do the much cleaner
tar xzvf mediawiki-1.11.0.tar.gz;
hardlink LocalSettings.php from the old tree to the new;
then move the above symlinks to the new tree;
cd down each symlink to their maintenance dir and php update.php;

Thus searchUpdate.pos will get left behind anyway... (and I wonder what
will happen now that it is left behind?)

Anyway, you have 99.99% reached the goal of a filesystem I/O one-way
clean infrastructure. Please move that tiny "searchUpdate.pos" piece
of information into the database like all the rest of the read/write
information that needs to be more than readonly. Thanks!

P.S., yes, upon initial wiki installation you write into the
filesystem. But writing to the filesystem is totally unnecessary
for running wikis or updating a running wiki.


Version: 1.15.x
Severity: normal

Details

Reference
bz11654

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:59 PM)
bzimport set Reference to bz11654.
bzimport added a subscriber: Unknown Object (MLST).

Changed summary to clarify.

Note that updateSearchIndex.php may be obsolete; we haven't used it ourselves in a long time. It might not work at all, or might be scrapped after review.

I have found a similar item to the above.

$wgReadOnlyFile = false; /// defaults to "{$wgUploadDirectory}/lock_yBgMBwiR";

Here again the administrator with multiple wikis reading off the same
directory tree would end up locking them all, not just one.
(Of course no big deal as one could make it into $wgDB... to lock just one.)

Note also that the documentation is not clear: to the reader's eye,
the line above appears to default to false...

I don't see an issue there. $wgUploadDirectory should be different for each individual wiki; otherwise they'll end up writing over the top of each other's images. So the only case where the lock files will overlap is a misconfiguration of MediaWiki.

As for wikis that actually do want to share their images, that's what we have the commons-type system for. MediaWiki isn't designed to handle setups where multiple wikis' image directories overlap, quite simply because we also store database information on images in addition to the files.

OK, but some of us have e.g., 3 wikis working off the same filesystem,
only differing by a switch() in LocalSettings.php, and have never
implemented or allowed images on our wikis, aside from the
_obligatory_ logo (different bug).

So for us, we see that you are 99.9% filesystem clean, but not 100%.

Then use that switch you already have to specify different locations for the lock files. That's what the configuration variable is there for: to tell MediaWiki where to look for a lock file. If yours isn't in a good place, then put it in one.
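A minimal sketch of what that could look like in a shared LocalSettings.php. The host names are the ones from this report; switching on `$_SERVER['SERVER_NAME']` and the `/var/lock/mediawiki` directory are assumptions made for illustration, not something the thread specifies.

```php
<?php
// LocalSettings.php fragment (sketch, not the reporter's actual file).
// Assumption: the wiki is selected by the requested host name.
$host = isset( $_SERVER['SERVER_NAME'] ) ? $_SERVER['SERVER_NAME'] : '';

switch ( $host ) {
	case 'radioscanningtw.jidanni.org':
		// Hypothetical lock path outside the shared code tree.
		$wgReadOnlyFile = '/var/lock/mediawiki/radioscanningtw.lock';
		break;
	case 'taizhongbus.jidanni.org':
		// Hypothetical lock path outside the shared code tree.
		$wgReadOnlyFile = '/var/lock/mediawiki/taizhongbus.lock';
		break;
}
```

With separate paths, creating one of these files should put only the corresponding wiki into read-only mode. Command-line maintenance scripts have no host name, so they would need some other way to pick the wiki.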

As for your idea of putting the lock into the database...

Firstly, the lock file is not meant for common use. It's meant to lock the database when someone is changing stuff inside the software, so the lock is actually meant to be placed there by a sysadmin who is going to be editing the PHP files or the database and doesn't want edits or anything else to corrupt the data inside the database. The lock and unlock special pages are disabled by default; there is no need for the wiki to write to the filesystem to do a lock unless you configure it that way.

Secondly, the whole point of a database lock is to prevent things from being changed in the database. This is because some software/database changes are going to be made, and it's likely that the data coming from the database is going to be unstable because of changes being made behind the scenes.
So locking via the database is unacceptable, because the database is the unstable thing we are trying to protect; there is no guarantee that the lock is going to stay in the same place or not be altered.
I honestly wouldn't trust MediaWiki to keep itself safe if it were using the very unstable thing I'm editing as a means of knowing whether it should write there or not.

And thirdly, just like many other things inside MediaWiki, the lock is an extra feature. If you want to be filesystem clean, then don't use it. You're already not using the images that are clearly there for use, so what's the difference between using and not using the lock file? It's not something you need unless you are doing sysadmin work, in which case you hardly need to use the wiki interface.

Marking as INVALID.

Why?

  1. Of the thousands of operations on a running MediaWiki wiki, this is
     the only one that writes into the local filesystem instead of the
     database.

Because it's faster than using the database, and it's no big deal.

What's wrong with that?

  1. It precludes using a read-only filesystem. If it were not for this
     one little 14-byte file, one could use a read-only filesystem.

If you're really trying to set this up rather than just spouting hypothetical situations, then perhaps I can see the issue. Otherwise we'll end up debating all day over an edge case no one actually has.

  2. One might have several wikis using the same files:

radioscanningtw.jidanni.org -> mediawiki-1.11.0
taizhongbus.jidanni.org -> mediawiki-1.11.0
(yes, even the same LocalSettings.php, with appropriate switch()s
inside it.) There is no way two wikis can use the same
searchUpdate.pos.

You can specify a different position file if you absolutely must; see the docs in the header of updateSearchIndex.php.

Created attachment 6071
use wfWikiID()

Any program that changes the filesystem must consider that it is not
alone.

In the following patch, we apply the lessons learned from
maintenance/generateSitemap.php .

Any less would spell disaster for
http://www.mediawiki.org/wiki/Manual:Wiki_family users.

As for shared hosting situations that offer a MediaWiki package
where there is actually only one copy for everybody, there is still
the worry that two users' wfWikiID()s might collide, but that is the
host's fault, not ours.


What does everybody think of my patch?

The only cons would be:

  1. A one-time start from scratch, as we are using a new .pos file.
  2. We don't remove the old searchUpdate.pos left behind.

Are these big concerns?

Fixed in r51444. Put in some b/c so that the next time a wiki runs the script, it will use the old .pos file, then unlink it and use the new wiki-ID-based one from then on.
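The backward-compatibility behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code committed in r51444: the helper names `resolveStartPosition()` and `savePosition()` are hypothetical, the `searchUpdate.<wikiId>.pos` naming is assumed, and `$wikiId` stands in for whatever wfWikiID() would return.

```php
<?php
/**
 * Sketch only (hypothetical helper): pick the start position for an
 * incremental search index update and the file to record progress in.
 *
 * @param string $dir    Directory holding the position files
 * @param string $wikiId Stand-in for the value wfWikiID() would return
 * @return array [ start position or '', path of the per-wiki .pos file ]
 */
function resolveStartPosition( $dir, $wikiId ) {
	$oldFile = "$dir/searchUpdate.pos";
	$newFile = "$dir/searchUpdate.$wikiId.pos"; // assumed naming scheme

	if ( is_readable( $oldFile ) ) {
		// Backward compatibility: consume the legacy shared file once,
		// then remove it so later runs use the per-wiki file.
		$start = trim( file_get_contents( $oldFile ) );
		unlink( $oldFile );
	} elseif ( is_readable( $newFile ) ) {
		$start = trim( file_get_contents( $newFile ) );
	} else {
		$start = ''; // no recorded position yet
	}

	return array( $start, $newFile );
}

/** Sketch only (hypothetical helper): record the position reached. */
function savePosition( $posFile, $timestamp ) {
	file_put_contents( $posFile, $timestamp );
}
```

In this sketch, whichever wiki runs first picks up the legacy position; any other wiki sharing the tree starts without a recorded position, which matches the "one-time start from scratch" trade-off raised earlier in the thread.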