
Inconsistent state within the internal storage backends
Closed, Resolved, Public

Details

Reference
bz39221

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 1:08 AM
bzimport set Reference to bz39221.

Some of these files don't have previous versions. What were you reverting to for those?

Begbroke_Church_-_geograph.org.uk_-_1386361.jpg and St_Tudno%27s_Church_from_the_Lych_Gate_-_geograph.org.uk_-_1419145.jpg had NFS files with bad permissions. They were fixed.

I was reverting those which have a previous version.

*** Bug 39260 has been marked as a duplicate of this bug. ***

(In reply to comment #3)

Begbroke_Church_-_geograph.org.uk_-_1386361.jpg and
St_Tudno%27s_Church_from_the_Lych_Gate_-_geograph.org.uk_-_1419145.jpg had NFS
files with bad permissions. They were fixed.

The error message is gone.

All of these *_geograph.org.uk_* files have wrong UNIX permissions and are owned by "catrope" instead of "apache".

So, sudo chown apache *_geograph.org.uk_* ??

(In reply to comment #10)

So, sudo chown apache *_geograph.org.uk_* ??

I'm currently running a find to list all files in the public upload directories (i.e. excluding private wikis and excluding thumbnails, but including archived versions) that aren't owned by the right user. This'll take a few more hours to run, so I'll do a chown based on that list tomorrow.

(In reply to comment #11)

(In reply to comment #10)

So, sudo chown apache *_geograph.org.uk_* ??

I'm currently running a find to list all files in the public upload directories
(i.e. excluding private wikis and excluding thumbnails, but including archived
versions) that aren't owned by the right user. This'll take a few more hours to
run, so I'll do a chown based on that list tomorrow.

This chown has now been running for an hour:
18:52 RoanKattouw: Running /root/fixownership < /root/badownershipfiles in a screen on ms7

It'll take a while to finish: it's done 152,000 files in the first hour and there are 1.8 million files to fix, so it'll probably take another 11-12 hours.
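The estimate above is a simple constant-rate extrapolation. A minimal sketch of the arithmetic follows; it assumes the rate stays at the first hour's figure, and since the comment does not say whether the 1.8 million files include the ones already done, both readings are shown. The function name is illustrative only.

```python
def remaining_hours(files_left: int, files_per_hour: float) -> float:
    """Hours left, assuming the processing rate stays constant."""
    if files_per_hour <= 0:
        raise ValueError("rate must be positive")
    return files_left / files_per_hour


rate = 152_000 / 1.0   # files fixed during the first hour
reported = 1_800_000   # "1.8 million files to fix"

# Reading 1: 1.8 million is the whole list, so the first hour's work is already done.
eta_if_total = remaining_hours(reported - 152_000, rate)
# Reading 2: 1.8 million files were still left after the first hour.
eta_if_left = remaining_hours(reported, rate)

print(f"{eta_if_total:.1f} h or {eta_if_left:.1f} h")  # 10.8 h or 11.8 h
```

Either way the result lands close to the "11-12 hours" quoted above.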

Thanks for doing that Roan! I may have found a similar bug affecting undeletions. When I tried undeleting http://commons.wikimedia.org/w/index.php?title=File:Afrikaner_Commandos2.JPG , I received the following error: http://commons.wikimedia.org/wiki/File:FileInconsistentStateCommonsError20120815.png

Will this also be fixed by the mass chown?

(In reply to comment #12)

This chown has now been running for an hour:
18:52 RoanKattouw: Running /root/fixownership < /root/badownershipfiles in a
screen on ms7

It'll take a while to finish: it's done 152,000 files in the first hour and
there are 1.8 million files to fix, so it'll probably take another 11-12 hours.

This is now done, but there are still some bad files left. Running another find to fix those:

21:19 RoanKattouw: find /export/upload/wik*/*/{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,archive,math,temp,timeline} ! -user apache -exec chown apache \{\} \;
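For anyone who wants to review the affected files before changing anything, here is a rough dry-run equivalent of the `! -user apache` test. It is only a sketch: it lists paths instead of running chown, it assumes a POSIX system, and the function name and the temporary demo directory are made up for illustration (they are not the /root/fixownership script mentioned above).

```python
import os
import tempfile
from typing import Iterator


def files_not_owned_by(root: str, expected_uid: int) -> Iterator[str]:
    """Yield root and every entry below it whose owner is not expected_uid."""
    if os.lstat(root).st_uid != expected_uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: look at the entry itself, do not follow symlinks
                if os.lstat(path).st_uid != expected_uid:
                    yield path
            except FileNotFoundError:
                continue  # entry vanished while we were walking


# Demo on a throwaway directory owned by the current user.
with tempfile.TemporaryDirectory() as demo_root:
    open(os.path.join(demo_root, "example.jpg"), "w").close()
    my_uid = os.getuid()
    owned_ok = list(files_not_owned_by(demo_root, my_uid))
    owned_bad = list(files_not_owned_by(demo_root, my_uid + 1))

print(owned_ok)        # [] because everything belongs to us
print(len(owned_bad))  # 2: the directory itself and example.jpg
```

On the real upload tree the expected uid would be that of the "apache" user, which `pwd.getpwnam("apache").pw_uid` returns on hosts where that account exists.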

(In reply to comment #13)

Thanks for doing that Roan! I may have found a similar bug affecting
undeletions. When I tried undeleting
http://commons.wikimedia.org/w/index.php?title=File:Afrikaner_Commandos2.JPG ,
I received the following error:
http://commons.wikimedia.org/wiki/File:FileInconsistentStateCommonsError20120815.png

Will this also be fixed by the mass chown?

I *think* so, but I'm not sure. Aaron, could you look into that specific file and see what's wrong with it?

$name = "local-public/archive/6/69/20110119133921!Afrikaner_Commandos2.JPG";
...
var_dump( $nfs->getFileStat( array( 'src' => "mwstore://local-NFS/$name" ) ) );
bool(false)

var_dump( $swift->getFileStat( array( 'src' => "mwstore://local-swift/$name" ) ) );
array(4) {
  ["mtime"]=>
  string(14) "20120721184844"
  ["size"]=>
  int(123148)
  ["sha1"]=>
  string(31) "lox3gcuif55humxbiq3exjhcu6dg1wt"
  ["latest"]=>
  bool(true)
}

So the file is in Swift but not NFS.
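To make the "inconsistent state" idea concrete, here is a small sketch of the comparison done by hand above: take the stat result from each backend and classify the pair. This is not MediaWiki's actual check; `compare_backends` is a hypothetical helper, `None` stands in for the `bool(false)` that getFileStat returned, and only size and sha1 are compared.

```python
from typing import Dict, Optional


def compare_backends(stats: Dict[str, Optional[dict]]) -> str:
    """Classify one file given a stat dict (or None if absent) per backend."""
    present = {name: st for name, st in stats.items() if st}
    if not present:
        return "missing everywhere"
    missing = sorted(set(stats) - set(present))
    if missing:
        return "inconsistent: missing in " + ", ".join(missing)
    fingerprints = {(st["size"], st["sha1"]) for st in present.values()}
    if len(fingerprints) > 1:
        return "inconsistent: content differs"
    return "consistent"


# Values copied from the var_dump output above.
swift_stat = {
    "mtime": "20120721184844",
    "size": 123148,
    "sha1": "lox3gcuif55humxbiq3exjhcu6dg1wt",
    "latest": True,
}
verdict = compare_backends({"NFS": None, "Swift": swift_stat})
print(verdict)  # inconsistent: missing in NFS
```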

I've manually resynced Afrikaner_Commandos2.JPG.

(In reply to comment #14)

This is now done, but there are still some bad files left. Running another find
to fix those:

21:19 RoanKattouw: find
/export/upload/wik*/*/{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,archive,math,temp,timeline}
! -user apache -exec chown apache \{\} \;

This is now done

material.scientist wrote:

Please have a look at http://commons.wikimedia.org/wiki/File:D%C4%9B%C4%8D%C3%ADn,_Dlouh%C3%A1_j%C3%ADzda,_pivovarsk%C3%BD_kom%C3%ADn.jpg - this file can't be moved because of: API request failed (unknownerror): Unknown error: "backend-fail-synced". There were more files with the same error a few days ago, but I suppose they became unlocked.

Another example in bug 39483.

(In reply to comment #23)

Again here: http://commons.wikimedia.org/wiki/File:Chordeiles_Gundlachii-.jpg
File can't be deleted.

This is different, it gives a 404 error (completely missing): different bug?
Another example (by esby): http://commons.wikimedia.org/wiki/File:FMLF_2012-4.JPG

(In reply to comment #25)

This is different

I also think this is different. Is there already a bug report open?

(In reply to comment #26)

(In reply to comment #25)

This is different

I also think this is different. Is there already a bug report open?

I searched various components (and moved a bunch of bugs under file management/media storage) but I couldn't find any. I'm not really able to distinguish issues though...

(In reply to comment #23)

Again here: http://commons.wikimedia.org/wiki/File:Chordeiles_Gundlachii-.jpg
File can't be deleted.

OK now. File deleted.

(In reply to comment #27)

(In reply to comment #26)

(In reply to comment #25)

This is different

I also think this is different. Is there already a bug report open?

I searched various components (and moved a bunch of bugs under file
management/media storage) but I couldn't find any. I'm not really able to
distinguish issues though...

opened bug #39615

Error deleting file: The file "mwstore://local-multiwrite/local-public/6/60/Miley_Cyrus_interpreta_Miley_Stewart.jpg" is in an inconsistent state within the internal storage backends

(In reply to comment #32)

http://commons.wikimedia.org/w/index.php?title=Special:Undelete&action=submit

This bug is now a real big problem :(

That link doesn't go to any specific file.

Turelio001 wrote:

http://commons.wikimedia.org/wiki/File:Miley_Cyrus_interpreta_Miley_Stewart.jpg

Guys, this is really a highly critical bug that needs urgent fixing. The workload of the deleting admins on Commons is already unbearable; it doesn't need to be artificially increased by software bugs.

Bumping down priority. Aaron has another issue which is higher priority than this one.

svenmanguard wrote:

Roan told me to report this here.

I tried undeleting File:Rajeev kumar varshney.jpg over on Commons (per OTRS ticket) and got the message

Error undeleting page

Undelete failed; someone else may have undeleted the page first.

Undeletion will not be performed if it will result in the top page or file revision being partially deleted. In such cases, you must uncheck or unhide the newest deleted revision.

Error undeleting file: The file "mwstore://local-multiwrite/local-public/1/10/Rajeev_kumar_varshney.jpg" is in an inconsistent state within the internal storage backends

First line is larger than the rest, fourth line is in red text.

Sven

(In reply to comment #42)
The file is also in an "inconsistent state" if you try to reupload the file. So this is related?

(In reply to comment #42)

(In reply to comment #40)

https://commons.wikimedia.org/wiki/File:Akong_Rinpoche_and_a_monk.jpg
Error while creating thumbnails:
https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Akong_Rinpoche_and_a_monk.jpg/1280px-Akong_Rinpoche_and_a_monk.jpg

That's very unlikely to be related.

It is exactly the same error message, and it also concerns a media file.
The evidence shows that it is probably related to this bug.

Just to point out that today I had a problem renaming a file which had a previous version. The problem is described [http://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard#File:.D0.A6.D0.B5.D1.80.D0.BA.D0.BE.D0.B2.D1.8C_5.JPG here]; the file is Церковь5.JPG. I ran into a similar problem earlier today, but I no longer remember the file name (it started with Tha and I wanted to rename it to The).

(In reply to comment #44)

(In reply to comment #42)

(In reply to comment #40)

https://commons.wikimedia.org/wiki/File:Akong_Rinpoche_and_a_monk.jpg
Error while creating thumbnails:
https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Akong_Rinpoche_and_a_monk.jpg/1280px-Akong_Rinpoche_and_a_monk.jpg

That's very unlikely to be related.

It is exactly the same error message, and it also concerns a media file.
The evidence shows that it is probably related to this bug.

Probably, since https://upload.wikimedia.org/wikipedia/commons/4/42/Akong_Rinpoche_and_a_monk.jpg loads but http://upload.wikimedia.org/wikipedia/commons/4/42/Akong_Rinpoche_and_a_monk.jpg is a 404.

When I last looked at them I specifically tried purging first to clear the squids, and then viewed the (https) url that was given here. The file loaded fine, which strongly suggested that it existed and the thumbnail errors were unrelated. However, since the regular http url is a 404 (and the https one with ?x=1 appended doesn't load either), the file really doesn't exist and the https squid cache purging is also broken (I still can't get that https link to purge).

*** Bug 39952 has been marked as a duplicate of this bug. ***

I completed another commons run of syncFileBackend.php, which should have eliminated all such inconsistencies (except for new ones of course).
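As a rough illustration of what a resync pass of this kind does, the sketch below copies every file that is missing or different from one store into another. It is not the logic of syncFileBackend.php: the two dicts stand in for storage backends, treating one side as authoritative is an assumption, the paths are made up, and extras on the replica are only reported, never deleted.

```python
from typing import Dict, List, Tuple


def resync(source: Dict[str, bytes], replica: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Copy missing or differing files from source to replica.

    Returns (paths copied, paths that exist only on the replica).
    """
    copied = []
    for path, data in source.items():
        if replica.get(path) != data:
            replica[path] = data  # overwrite stale content or create the file
            copied.append(path)
    extras = sorted(p for p in replica if p not in source)
    return sorted(copied), extras


# Hypothetical contents: one file missing on the replica, one orphan on it.
authoritative = {"6/69/a.jpg": b"v2", "1/10/b.jpg": b"same"}
replica = {"1/10/b.jpg": b"same", "c/c8/c.jpg": b"orphan"}

copied, extras = resync(authoritative, replica)
print(copied)  # ['6/69/a.jpg']
print(extras)  # ['c/c8/c.jpg']
```

A pass like this only repairs what was out of sync when it ran, which fits the "except for new ones of course" caveat above.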

My bot recently uploaded http://commons.wikimedia.org/w/index.php?title=File:Snowcreekwater3web_small_-_West_Virginia_-_ForestWander.jpg

I was unable to delete this file: https://commons.wikimedia.org/w/index.php?title=File:Snowcreekwater3web_small_-_West_Virginia_-_ForestWander.jpg&action=delete
-->

Error deleting file: The file "mwstore://local-multiwrite/local-public/3/36/Snowcreekwater3web_small_-_West_Virginia_-_ForestWander.jpg" is in an inconsistent state within the internal storage backends.

(In reply to comment #48)

I completed another commons run of syncFileBackend.php, which should have
eliminated all such inconsistencies (except for new ones of course).

It seems https://commons.wikimedia.org/wiki/File:EnwikipediagrowthGom.PNG was also not cleared by this run. (It is an old file)

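(12:02:00 PM) liangent: Error undeleting file: The file "mwstore://local-multiwrite/local-public/9/9c/Tencent_QQ.png" is in an inconsistent state within the internal storage backends
(12:02:00 PM) liangent: Error undeleting file: The file "mwstore://local-multiwrite/local-deleted/p/h/o/phoiac9h4m842xq45sp7s6u21eteeq1.jpg" is in an inconsistent state within the internal storage backends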

Two more. The second is on [[zh:File:Monferno.jpg]].

Another file from today: https://commons.wikimedia.org/wiki/File:Bierzgłowo_Windmill.jpg

Margoz said he has gotten this error. He was able to upload a new (working) version of the file. Fun fact: Rotatebot claims to have rotated the image, but he's not in the file history. Fun fact 2: the missing image was (according to the uploader) the same, but rotated by 90 degrees; when visiting the page earlier, I got the thumbnail from that one first, then one refresh later the current one (landscape orientation).

(In reply to comment #48)

I completed another commons run of syncFileBackend.php, which should have
eliminated all such inconsistencies (except for new ones of course).

Still another old instance: https://commons.wikimedia.org/wiki/File:Geslau_St._Kilian_018.jpg

I don't know whether this is related:
https://commons.wikimedia.org/wiki/File:OAB_Neuenbuerg_Ansicht1.png when requesting the full size ( https://upload.wikimedia.org/wikipedia/commons/5/5a/OAB_Neuenbuerg_Ansicht1.png ), I get "404 Not Found" ( File not found: /v1/AUTH_xxxx1b15-ed7a-40b6-b745-47666abf8dxx/wikipedia-commons-local-public.5a/5/5a/OAB_Neuenbuerg_Ansicht1.png )

(In reply to comment #57)
I have added the appropriate cat. (see: bug #39615 )

I'm also encountering this when trying to rename:

(This was uploaded from the cluster, probably hume, by Roan.)

again: 404-bug and inconsistent-bug
https://commons.wikimedia.org/wiki/File:Mahabharata02ramauoft_0022_19.jpg

Maybe it would be helpful to create a category where to put the files. Otherwise the comment section will grow unnaturally big.

(In reply to comment #63)

Maybe it would be helpful to create a category where to put the files.
Otherwise the comment section will grow unnaturally big.

I guess you could create a category called "Category:Bug 39221" or something, but this is kind of absurd. Uploads need to be completely disabled if this is causing data corruption until the problem can be properly addressed.

I'm getting a new error now when trying to rename this file:
https://commons.wikimedia.org/wiki/File:Metrics_9.4.12.theora.ogv

You do not have permission to move this page, for the following reason:
Could not write file "mwstore://local-multiwrite/local-public/e/ec/Metrics_9.4.12.theora.ogv" due to insufficient permissions or missing directories/containers.

(In reply to comment #66)

I'm getting a new error now when trying to rename this file:
https://commons.wikimedia.org/wiki/File:Metrics_9.4.12.theora.ogv

You do not have permission to move this page, for the following reason:
Could not write file
"mwstore://local-multiwrite/local-public/e/ec/Metrics_9.4.12.theora.ogv" due to
insufficient permissions or missing directories/containers.

That's because I screwed up when importing that file. Fixed the ownership just now.

These should basically be gone now, aside from permission errors that only root users can fix; scripts were run to fix those.

That ogv file was recently manually added by an admin with the wrong permissions (which is rare). dcec58e could also be activated even for that case.

This bug cannot be at the same time resolved and blocker of bug 39615: reopening for now, please close if it's not the cause.

(In reply to comment #70)

This bug cannot be at the same time resolved and blocker of bug 39615:
reopening for now, please close if it's not the cause.

I don't see how that makes sense. This bug is about people not being able to do
things due to sync errors. Any other problems, like 404s can be separate (which
39615 is already there for).

Thank you for fixing dependencies.

Still some weird happenings

File:St Clement Eastcheap - sword rest - front, close-up - 1394055.duplicate.jpg

View or restore 11 deleted edits?

https://commons.wikimedia.org/wiki/Special:Undelete/File:St_Clement_Eastcheap_-_sword_rest_-_front,_close-up_-_1394055.duplicate.jpg

File says that it is deleted, yet it still shows as an active link, and shows an image. <shrug>

Please open a separate bug for that (but also try using the "delete all" link first).

*** Bug 37213 has been marked as a duplicate of this bug. ***

Same error when trying to delete [[hu:Fájl:A Qulto felépítése - 003.jpg]] (originally uploaded 3 days ago):

Error deleting file: The file "mwstore://local-multiwrite/local-public/c/c8/A_Qulto_felépítése_-_003.jpg" is in an inconsistent state within the internal storage backends

(In reply to comment #76)

Same error when trying to delete [[hu:Fájl:A Qulto felépítése - 003.jpg]]
(originally uploaded 3 days ago):

Error deleting file: The file
"mwstore://local-multiwrite/local-public/c/c8/A_Qulto_felépítése_-_003.jpg"
is in an inconsistent state within the internal storage backends

Given that it's been a few months, I've split this recent reported issue out into a separate bug: bug 47905. This bug should remain resolved/fixed, as it was fixed in September 2012 (cf. comment 69).