GWToolset processes just three files and then nothing
Closed, Resolved, Public

Description

Tounoki ran GWToolset with an XML file holding a few dozen records. GWToolset reported that it had created a background task to process the batch, but only the first three files were processed and uploaded.

This occurred twice.

See Special:ListFiles/Tounoki [1] for these two attempts.

[1] https://commons.wikimedia.org/wiki/Special:ListFiles/Tounoki


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63818

Details

Reference
bz63864

Event Timeline

bzimport raised the priority of this task from to High (Nov 22 2014, 3:19 AM).
bzimport set Reference to bz63864.

Could you attach the XML file to the bug (or link to it)?

The next step for this bug would probably be to get someone with access to the job queue logs to see what happened to those jobs.

First try, April 11: 3 files loaded (https://commons.wikimedia.org/wiki/Special:ListFiles/Tounoki) between 10:50 and 10:58.

Second try, April 12: between 7:47 and 8:07.

https://commons.wikimedia.org/wiki/File:Sabot-M0354002647.tif is the first file of this second try; uploading it ended on a timeout page.

I restarted the transfer by refreshing the page. The two following records, with the same comment "support DV5_M0354_2006_7 complet", belong to the second try, which used the same XML file as its source.

In both tries, only 3 records were loaded each time, and even though GWToolset said there was a background task, nothing happened.

For the second try, I used the same XML file minus 3 records (the three that were already loaded).

Created attachment 15092
7 records for Musée Départemental Albert Demard

Created attachment 15093
json mapping used during test upload

localhost test

I ran a test upload with the attached XML, based on the 7 items that did make it up to Commons, and the attached JSON mapping.

• the mediafile sizes are larger than earlier uploads (> 60 MB vs. < 3 MB), based on the information on this page: https://commons.wikimedia.org/wiki/Special:ListFiles/Tounoki.

• based on what I currently understand about upload limits, Commons should accept up to 100 MB for form uploads and up to 1000 MB for background job downloads.

• it took approximately 9 minutes to upload the first 3 items as a preview.
• the background job took approximately 12 minutes to complete the remaining 4 items.
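A minimal Python sketch of the size check implied by the bullets above, assuming the limits dan quotes (100 MB for form uploads, 1000 MB for background-job downloads). Those figures are his current understanding rather than confirmed Commons configuration, and the names `accepted_paths`, `FORM_UPLOAD_LIMIT_MB` and `BACKGROUND_JOB_LIMIT_MB` are hypothetical.

```python
# Limits as understood in the comment above (assumed, not verified
# against the actual Commons configuration).
FORM_UPLOAD_LIMIT_MB = 100
BACKGROUND_JOB_LIMIT_MB = 1000


def accepted_paths(size_mb: float) -> list[str]:
    """Return which upload paths would accept a file of the given size."""
    paths = []
    if size_mb <= FORM_UPLOAD_LIMIT_MB:
        paths.append("form upload")
    if size_mb <= BACKGROUND_JOB_LIMIT_MB:
        paths.append("background job")
    return paths
```

For example, `accepted_paths(60)` returns both paths, so a 60 MB file fits within both assumed limits; the size alone would not explain the failure, which points toward time limits instead.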

wikilabs

http://gwtoolset.wmflabs.org/wiki/Category:Mus%C3%A9es_d%C3%A9partementaux_de_la_Haute-Sa%C3%B4ne

• the preview upload took approximately 1 minute
• the remaining items took approximately 3 minutes

beta cluster test

http://commons.wikimedia.beta.wmflabs.org/wiki/Category:Mus%C3%A9es_d%C3%A9partementaux_de_la_Haute-Sa%C3%B4ne

• the preview failed with the following message, so I couldn't process the batch, but the first 3 items did upload, as can be seen in the link above.

Our servers are currently experiencing a technical problem. This is probably
temporary and should be fixed soon. Please try again in a few minutes.

If you report this error to the Wikimedia System Administrators, please include
the details below.
Request: POST http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset,
from 127.0.0.1 via deployment-cache-text02 deployment-cache-text02
([127.0.0.1]:3128), Varnish XID 2001837297
Forwarded for: 84.85.134.252, 127.0.0.1
Error: 503, Service Unavailable at Sun, 13 Apr 2014 03:37:12 GMT

moving forward

• is there a way to find out if there's a timeout limit on:
  • form uploads
  • each job queue job
• how can we alter those timeouts for the toolset?

> • it took approximately 9 minutes to upload the first 3 items as a preview.

That's a lot of time. Maybe the preview should change so that, if the first upload took more than, say, 45 seconds, only 1 file is done for the preview.
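A minimal sketch of that suggestion, assuming a preview loop that uploads items one at a time. `preview_items`, the injected `upload` callable and the `clock` parameter are hypothetical and not part of GWToolset; only the preview size of 3 and the 45-second threshold come from this discussion.

```python
import time
from typing import Callable, Sequence

DEFAULT_PREVIEW_COUNT = 3   # current preview size
SLOW_UPLOAD_SECONDS = 45.0  # threshold suggested in the comment


def preview_items(items: Sequence[str],
                  upload: Callable[[str], None],
                  clock: Callable[[], float] = time.monotonic) -> list[str]:
    """Upload up to DEFAULT_PREVIEW_COUNT items for the preview, stopping
    after the first one if it took longer than SLOW_UPLOAD_SECONDS."""
    done = []
    for index, item in enumerate(items[:DEFAULT_PREVIEW_COUNT]):
        start = clock()
        upload(item)
        elapsed = clock() - start
        done.append(item)
        if index == 0 and elapsed > SLOW_UPLOAD_SECONDS:
            break  # first upload was slow: a 1-file preview is enough
    return done
```

Injecting `clock` keeps the timing logic testable without real uploads; in use, `upload` would be whatever performs a single preview upload.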

> • is there a way to find out if there's a timeout limit on:
>   • form uploads

All web requests have a timeout. PHP itself may have an execution time limit (although, to be honest, I don't usually hear about that limit; I think it's higher than the other limits). Varnish has a time limit (that's the error you got on beta, although it should be noted that beta is configured a bit differently from Commons). I believe the SSL proxy servers have a timeout too (which sounds like the error described in bug 63818 comment 4).

I'm not sure what the timeout is, but I think it's on the order of 120 seconds.

Timeouts are complicated by the fact that the file code tries to extend PHP timeouts while uploading a file (I think).

> • each job queue job

I believe each job has a timeout, but it's much more liberal: more on the order of an hour. (Really not sure about this part.)

> • how can we alter those timeouts for the toolset?

I'm not sure whether altering the Varnish timeouts is an option. (There are, however, things that can be done to get around it if it turns out to be a big issue, e.g. splitting the operation among multiple requests and using JS to make it look like one, or pushing work into jobs like UploadWizard does.)
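A minimal sketch of the first workaround, splitting the work among multiple requests, assuming a per-item time estimate is available and using the roughly 120-second request limit guessed above as the budget. `plan_requests` and `needs_job_queue` are hypothetical helpers, not GWToolset code. At the localhost rate of about 9 minutes for 3 items (around 180 s per item), a single item already exceeds that budget, which is consistent with moving such downloads into background jobs.

```python
from typing import Sequence

REQUEST_BUDGET_SECONDS = 120.0  # rough web-request timeout guessed above


def plan_requests(estimates: Sequence[float],
                  budget: float = REQUEST_BUDGET_SECONDS) -> list[list[int]]:
    """Group item indices into consecutive batches whose estimated total
    time stays within the budget. An item that alone exceeds the budget
    still gets a batch of its own."""
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0.0
    for index, seconds in enumerate(estimates):
        if current and used + seconds > budget:
            batches.append(current)  # close the batch before it overflows
            current, used = [], 0.0
        current.append(index)
        used += seconds
    if current:
        batches.append(current)
    return batches


def needs_job_queue(estimates: Sequence[float],
                    budget: float = REQUEST_BUDGET_SECONDS) -> list[int]:
    """Indices of items too slow for any single web request."""
    return [i for i, seconds in enumerate(estimates) if seconds > budget]
```

With seven items estimated at 20 s each, `plan_requests` yields two requests (six items, then one); with items at 180 s each, every item is flagged by `needs_job_queue`.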

(In reply to Bawolff (Brian Wolff) from comment #1)

> Could you attach the xml file to the bug (or link to it).

The XML file contained 36 records at the start.

Created attachment 15095
json mapping based on the one i found on the beta cluster for this dataset

I'm getting the impression that we need to alter the preview step so that it can deal with large mediafiles, e.g. > 3 MB.

At the moment, I think it might be best to eliminate the upload of the mediafile, only upload the metadata, and display a preview of that. This step would also test the URL to make sure it's valid and reachable by the toolset, and give error feedback to the user in case the domain name needs to be added to the whitelist or something else is wrong. Then, once the process batch job button is clicked, the background job would actually download the mediafile to the wiki.
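A minimal sketch of the proposed "check, don't download" preview step, assuming the mediafile URL and a domain whitelist are available. The whitelist contents and the names `check_mediafile_url` and `is_reachable` are hypothetical; GWToolset's real whitelist and validation code are not reproduced here. The reachability probe uses an HTTP HEAD request, which some servers reject even when a normal GET would succeed.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Hypothetical whitelist for illustration only.
DOMAIN_WHITELIST = {"example.org", "images.example.net"}


def check_mediafile_url(url: str, whitelist=DOMAIN_WHITELIST) -> list[str]:
    """Return the problems found with a mediafile URL (empty if none),
    without downloading the file."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unsupported scheme: {parsed.scheme or '(none)'}")
    host = (parsed.hostname or "").lower()
    if not host:
        problems.append("missing host name")
    elif not any(host == d or host.endswith("." + d) for d in whitelist):
        problems.append(f"domain not whitelisted: {host}")
    return problems


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Probe an http(s) URL with a HEAD request; call this only after
    check_mediafile_url() reports no problems."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):  # URLError/HTTPError subclass OSError
        return False
```

Returning a list of messages, rather than raising on the first problem, matches the goal of giving the user all error feedback at the preview step.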

(In reply to dan from comment #9)

> Created attachment 15095
> json mapping based on the one i found on the beta cluster for this dataset

I used this: https://commons.wikimedia.org/wiki/GWToolset:Metadata_Mappings/Tounoki/JOCONDE_M0354-CHAMPLITTE.json

Focusing the preview on the data mapping and only testing the accessibility of the files (instead of uploading them) seems like a good approach to me.
In fact, seeing the pictures isn't the most important thing in this kind of work.

If you can fix it, I'm ready to test it on Monday afternoon.

Or maybe the preview could be optional? (with the possibility of skipping it for large files)

Change 127839 had a related patch set uploaded by Dan-nl:
preview without upload

https://gerrit.wikimedia.org/r/127839

tounoki,

If you have access to http://gwtoolset.wmflabs.org/wiki/GWToolset, you can test the above patch on that server. If you don't, create an account and I will grant you access.

I tested the patch on that server with the files attached to this bug, and the upload of the mediafiles to the wiki succeeded; it took about 20 minutes. You can see the results here: http://gwtoolset.wmflabs.org/wiki/Category:Mus%C3%A9e_D%C3%A9partemental_Albert_Demard.

After a quick glance, the only "issue" I noticed is that the wiki doesn't create a thumbnail for the TIFF. It looks like I would need to adjust http://www.mediawiki.org/wiki/Manual:$wgTiffThumbnailType. Would that help with testing? If so, do you know whether that's the correct and only value I need to change? Which values would be best to use in that array?

In any case, let me know if you think this patch will take care of this bug. Also, feel free to +2 it, or let me know if you think it needs adjustment.

(In reply to dan from comment #15)

We use the PagedTiffHandler extension on WMF servers instead of MediaWiki's built-in TIFF handling.

(In reply to Bawolff (Brian Wolff) from comment #16)

Thanks. I installed it and it created the thumbnails.

steps to reproduce

login

  1. http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset
  2. once logged in and at Step 1: Metadata detection

step 1

  1. nothing to add
  2. select Art_Photo
  3. GWToolset:Metadata_Mappings/Tounoki/temp_JOCONDE_CHAMPLITTE.json
  4. nothing to add
  5. choose the attached sample dataset, 7 records for Musée Départemental Albert Demard
  6. click Submit

step 2

  1. click the "Preview batch" button

Change 127839 merged by jenkins-bot:
preview without upload

https://gerrit.wikimedia.org/r/127839

tounoki and jean-fred,

You should be able to test this patch on the beta cluster,
http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset.

Please let me know if it resolves this bug and bug 63818.

Yes, it works :D Thanks!
http://commons.wikimedia.beta.wmflabs.org/wiki/Special:ListFiles/Tounoki

When do you think you can deploy it to the main WC server?

BR

It should be part of today's deploy, so it should be available on the production server tomorrow.

concern

One concern, though, is that the image scalers are having issues creating thumbnails for large TIFFs when several are submitted in sequence, which is what GWToolset does. The multimedia team has a potential solution, but I haven't heard of any definite fix yet (see bug 65217).

When you do upload, please coordinate the upload with the WMF operations team on IRC, in #wikimedia-operations.

thanks!

This has been deployed to production. tounoki, jean-fred, are you okay with resolving the ticket as fixed now?

(In reply to dan from comment #23)

OK to mark it as fixed.

(In reply to tounoki from comment #24)

Doing so.

Gilles raised the priority of this task from High to Unbreak Now! (Dec 4 2014, 10:11 AM).
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to High (Dec 4 2014, 11:21 AM).