
Please upload large file to Commons
Closed, Declined (Public)

Description

Author: rybec

Description:
Would any administrator be willing to transfer a 1.5 GB file for me? I split it into eight parts with the "split" utility and put them on the Ubuntu One file-sharing service, which asks that the recipient make an account there.

MD5 sums of the pieces and the original file:
521f9ce8b6a581dc30a76f49089e9304 xaa
79d796f6c5147021c7b5c747d72698f8 xab
c86ff22d0c7469f1c8af18cec84ec000 xac
e08c66299b2be6b824a062a2e7eb7efe xad
8e8bcfa10198b8d116748817c13801b4 xae
fd6d971e86b481508a6724cf2dc11505 xaf
05475ee60ddce1fae6ede0fb1a31e23c xag
bfe6e8c8238f75ae79645fe28a17e3f6 xah
61f30874b0db53661c10d7078571e35d 011714_USIntelligencePrograms_HD.webm
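
A minimal sketch of how checksums like these can confirm that split pieces reassemble into the original, assuming GNU coreutils (`split`, `md5sum`, `cat`). The function name `verify_roundtrip`, its arguments, and the temporary directory layout are hypothetical and not part of this task.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (split, md5sum, cat).
# verify_roundtrip and its arguments are hypothetical, not from this task.

# verify_roundtrip FILE CHUNK_BYTES
# Splits FILE, checks the pieces against recorded MD5s, rejoins them,
# and succeeds only if the rejoined copy has the same MD5 as FILE.
verify_roundtrip() {
    src=$1
    chunk=$2
    workdir=$(mktemp -d) || return 1
    want=$(md5sum < "$src" | cut -d' ' -f1)

    # Pieces are written as xaa, xab, ... inside the work directory
    split -b "$chunk" "$src" "$workdir/x" || return 1

    # Record the piece checksums, then verify them the way a recipient would
    (cd "$workdir" && md5sum x?? > pieces.md5 && md5sum -c pieces.md5 > /dev/null) || return 1

    # The glob expands in sorted order, so the pieces are joined in sequence
    cat "$workdir"/x?? > "$workdir/rejoined"
    got=$(md5sum < "$workdir/rejoined" | cut -d' ' -f1)

    rm -rf "$workdir"
    [ "$want" = "$got" ]
}
```

For example, `verify_roundtrip video.webm 197286272` would exercise the same piece size that was used for this file; `video.webm` is a placeholder name.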

The pieces can be rejoined by doing
"cat xa? > 011714_USIntelligencePrograms_HD.webm".


Version: wmf-deployment
Severity: enhancement

Details

Reference
bz61795

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:03 AM
bzimport set Reference to bz61795.

These things are generally handled by Sam Reed (Reedy), but I expect you'll need to provide a file description and a URL first, so that he can actually download the file.

MD5 hashes are of no use if we don't know where the file is located.

rybec wrote:

Sorry, I hadn't noticed that the service I chose had an option for public sharing. I mistakenly thought it required that the recipient make an account.
Before noticing my mistake, I went ahead and had Ubuntu One e-mail an invitation to Sam Reed.

xaa is at http://ubuntuone.com/6FoMNIhaBhjuZg9B0MbVpt

xab is at http://ubuntuone.com/3yaQBFSSZaS827Ymdo2mJ5

xac is at http://ubuntuone.com/4iSDngYyxnps7R6sF4BC96

xad is at http://ubuntuone.com/1STWGehiyaCg2GBTmZVwPq

xae is at http://ubuntuone.com/3pZxstwY19PAbkRVzCxEDC

xaf is at http://ubuntuone.com/3fp5sE5HklARddruTxvbnV

xag is at http://ubuntuone.com/4n0DSjEUhIVNIy1PfaFD35

xah is at http://ubuntuone.com/4Rj6E9cDcYBrQ7MGSvZSgn

description.txt is at http://ubuntuone.com/3c49wbns1xX9kCkP8Hrq6Z

Please provide a username the file should be uploaded under, and a more informative file name that is in compliance with the general Commons file naming guidelines.

rybec wrote:

Please use Rybec as the user name and 011714_USIntelligencePrograms_HD.webm as the file name.

I suggest you think of a better file name than that; see comment 3 for more info.

rybec wrote:

I assumed that you took xaa, xab etc. as the suggested file name. Can you explain why you find 011714_USIntelligencePrograms_HD.webm inadequate? Feel free to suggest a different name.

rybec wrote:

I've reviewed Commons' naming guidelines, which are at https://commons.wikimedia.org/wiki/Commons:File_naming .

They say:

  • Media files can be uploaded with names in any language in any script (coded as UTF-8) - see Commons:Language policy.
  • Titles of media files should be meaningful and helpful in the language chosen.
  • Avoid "funny" symbols (control characters, unneeded punctuation, etc.) that might be significant in future wiki markup. It is a good idea to stick to graphemic characters, numbers, underscore (space), ASCII hyphen/minus/dash, plus, and period (dot).
  • The filename extension (eg .jpg) should match the file format (eg JPEG).
  • When the year, date or version may be of importance, it is good to include it in the file name.

The name I provided conforms to those.

"Titles of media files should be meaningful and helpful in the language chosen." I don't immediately see how 011714_USIntelligencePrograms_HD.webm fits that description with a bunch of numbers *at the beginning* (is that a date?), a combination of CamelCase and underscores, and "_HD" at the end (whatever that stands for), but maybe you could elaborate? Just my personal opinion though.

I think "President Obama speaks on U.S. intelligence programs, 2014-01-17.webm" could be quite a good file name.

rybec wrote:

I don't like the comma, which is discouraged by the proposed naming guideline. Otherwise that would be fine.

To help Reedy, I re-created the file on the Wikimedia Polska toolserver and put it inside a .tar.gz archive.

@sam: I know you enjoy downloading files from tools.wikimedia.pl due to the fast connecting to tin :-))

(In reply to Tomasz W. Kozlowski from comment #11)

@sam: I know you enjoy downloading files from tools.wikimedia.pl due to the
fast connecting to tin :-))

tin has no idea of the external interwebs :P

But yes, a recompiled file is much nicer than me having to find ops to install packages, like that time I needed unzip on a server...

reedy@bast1001:/tmp/uploads$ wget http://tools.wikimedia.pl/~odder/whitehouse/Video.tar.gz
--2014-02-23 02:24:32-- http://tools.wikimedia.pl/~odder/whitehouse/Video.tar.gz
Resolving tools.wikimedia.pl (tools.wikimedia.pl)... 2001:41d0:2:7530:30:48ff:febe:31d6, 94.23.242.48
Connecting to tools.wikimedia.pl (tools.wikimedia.pl)|2001:41d0:2:7530:30:48ff:febe:31d6|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1622452559 (1.5G) [application/x-gzip]
Saving to: `Video.tar.gz'

100%[============================================================================================================================>] 1,622,452,559 11.1M/s in 2m 22s

2014-02-23 02:26:54 (10.9 MB/s) - `Video.tar.gz' saved [1622452559/1622452559]

reedy@bast1001:/tmp/uploads$ sftp tin
Connected to tin.
sftp> cd /tmp/uploads
sftp> put Video.tar.gz
Uploading Video.tar.gz to /tmp/uploads/Video.tar.gz
Video.tar.gz 100% 1547MB 49.9MB/s 00:31
sftp> ^D
reedy@bast1001:/tmp/uploads$ ssh -A tin
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-45-generic x86_64)
tin is a Wikimedia Deployment host (misc::deployment).
The last Puppet run was at Sun Feb 23 02:07:22 UTC 2014 (20 minutes ago).
Ubuntu 12.04.1 LTS auto-installed on Thu Nov 29 22:21:10 UTC 2012.
tin is a Wikimedia application server (wikimedia-task-appserver).
Last login: Sun Feb 23 02:18:31 2014 from bast1001.wikimedia.org
reedy@tin:~$ cd /tmp/uploads/
reedy@tin:/tmp/uploads$ tar -xvf Video.tar.gz
President Obama speaks on U.S. intelligence programs 2014-01-17.webm
President Obama speaks on U.S. intelligence programs 2014-01-17.txt
reedy@tin:/tmp/uploads$ sudo -u apache mwscript importImages.php --wiki=foundationwiki --user="SRoss (WMF)" /tmp/uploads^C
reedy@tin:/tmp/uploads$ sudo -u apache mwscript importImages.php --wiki=commonswiki --user="Rybec" --comment-ext=txt /tmp/uploads
Import Images

Importing President Obama speaks on U.S. intelligence programs 2014-01-17.webm...done.

Found: 1
Added: 1
reedy@tin:/tmp/uploads$

rybec wrote:

Thank you for your efforts, but when I downloaded the video from Commons, it was 1622202800 bytes with an MD5 of 61532cb43f20f04fc6a2fbe3d822ee75. It should be 1578290173 bytes with an MD5 of 61f30874b0db53661c10d7078571e35d.

The pieces were each 197286272 bytes, except for the last which was 197286269 bytes.
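
As a quick sanity check on those figures, the arithmetic can be reproduced in the shell. This is only a sketch: the variable names are illustrative, and the numbers are the ones quoted in this task.

```shell
#!/bin/sh
# Sizes as reported in this task (bytes); variable names are illustrative.
piece=197286272        # each of the first seven pieces
last=197286269         # the final piece
commons=1622202800     # size of the file downloaded from Commons

# Seven full pieces plus the shorter last piece should give the original size
expected=$((7 * piece + last))
echo "$expected"                 # 1578290173

# How many extra bytes the Commons copy has
echo $((commons - expected))     # 43912627
```

The sum of the pieces matches the expected 1578290173 bytes, so the split itself accounts for every byte of the original.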

So what's the problem, again?

rybec wrote:

As I explained in comment #14, a file has been uploaded to Commons, but it differs from the one I provided. Had you intentionally changed it, you would have said so. Hence it is corrupt. I've downloaded http://tools.wikimedia.pl/~odder/whitehouse/Video.tar.gz and it has the wrong file contents, with the byte size and MD5 matching what's on Commons. I'm attempting to upload the file again to the file-sharing service, without splitting it this time.

Uploaded file works for me, hence closing. If there is a user-visible problem with the file available on Commons, please provide steps to reproduce.

How was it split? How was it rejoined?

It doesn't take much of a difference for the hash to be different.

I'd agree with the others: as long as the file actually still plays and seems sane, there's no reason to reupload it.
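
To illustrate the point about hashes, here is a small sketch showing that a one-character change yields an unrelated MD5. It assumes GNU coreutils `md5sum`; the helper `md5_of` and the sample strings are hypothetical.

```shell
#!/bin/sh
# Sketch: a one-byte change produces a completely different MD5.
# Assumes GNU coreutils md5sum; md5_of is a hypothetical helper.

md5_of() {
    # Print only the hex digest of the data read from stdin
    md5sum | cut -d' ' -f1
}

a=$(printf 'hello' | md5_of)
b=$(printf 'hellp' | md5_of)

echo "$a"    # 5d41402abc4b2a76b9719d911017c592
echo "$b"    # a different 32-character digest
```

A mismatched hash therefore says only that the files differ, not by how much. Comparing byte counts, or running `cmp` on the two copies to find the first differing byte, gives a better idea of the extent.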

rybec wrote:

The file provided by Tomasz W. Kozlowski is nearly 3% larger, a difference in size of 43,912,627 bytes. He hasn't explained how he recreated it.

I've uploaded the correct file in one piece to http://ubuntuone.com/2LEFxyS5v0lQ2ERQiiEq6U and when I download it from there, it has the proper checksum.

rybec wrote:

Comment 20 was my response to comment 19.