
Infrastructure for image/video file uploads by FTP
Closed, Declined (Public)

Description

For very large files (especially videos), uploading over HTTP is awkward and error-prone, and pushing the size limit beyond the ~100 MB mark will only get more difficult. Most video-sharing sites offer an upload-by-FTP option, which lets you push your large file with a dedicated FTP client and then import the video from there.

We've talked on and off about implementing this but haven't done it yet. A few notes...

Simplest UI workflow:

  • Click the 'upload by FTP' option
  • It assigns you a token to log in with
  • You FTP in with that username/password (?)
  • You upload your file over FTP
  • Back in your browser, click 'next' or 'refresh' or whatever
  • It sees your file and asks you to confirm
    • check size limits, extensions, etc.
  • It fetches the file and runs the usual checks and upload.

To avoid having to restructure the entire workflow for now, this would handle one file at a time initially; a new interface would be needed for multiple files. (A rough sketch of the server side of this flow follows below.)
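
A minimal server-side sketch of that flow, purely to illustrate the steps above -- the drop directory, limits, and helper names are hypothetical, not an existing MediaWiki interface:

```python
import os
import secrets

# Hypothetical values -- the real limits would come from site configuration.
MAX_SIZE_BYTES = 500 * 1024 * 1024           # e.g. a 500 MB ceiling for FTP drops
ALLOWED_EXTENSIONS = {".ogv", ".ogg", ".tiff", ".png", ".jpg"}
DROP_ROOT = "/srv/ftp-drop"                  # one subdirectory per token

def issue_ftp_token(user_id: int) -> str:
    """Hand the user a one-off token that doubles as the FTP login."""
    token = secrets.token_urlsafe(16)
    os.makedirs(os.path.join(DROP_ROOT, token), mode=0o770, exist_ok=True)
    # A real implementation would persist (user_id, token, expiry) somewhere
    # the FTP daemon can authenticate against.
    return token

def find_dropped_file(token: str):
    """When the user clicks 'next', look for exactly one uploaded file."""
    drop_dir = os.path.join(DROP_ROOT, token)
    files = [f for f in os.listdir(drop_dir) if not f.startswith(".")]
    return os.path.join(drop_dir, files[0]) if len(files) == 1 else None

def validate_dropped_file(path: str) -> bool:
    """The 'size limits, extensions' check before the real import runs."""
    _, ext = os.path.splitext(path)
    return (ext.lower() in ALLOWED_EXTENSIONS
            and os.path.getsize(path) <= MAX_SIZE_BYTES)
```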

Questions:

  • What infrastructure is needed to allow logins on the FTP server?
  • How do we set upload limits on the FTP server?
  • Do we want secure FTP? (SSL/FTPS or SFTP? See the client sketch after this list.)
  • How hard would it be to pass your token on to someone else?
  • How do we actually fetch the file? (NFS, FTP, WebDAV)
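
On the secure-FTP question: FTPS (FTP over TLS) is the variant Python's standard library supports directly, so a client-side upload could look roughly like this -- the host and token credentials are placeholders for whatever the 'upload by FTP' page would hand out:

```python
import os
from ftplib import FTP_TLS

# Placeholder values -- in the workflow above these would come from the token step.
HOST = "ftp-upload.example.org"
TOKEN_USER = "upload-1a2b3c"
TOKEN_PASS = "s3cret-token"

def upload_over_ftps(local_path: str) -> None:
    """Push one large file over FTP with TLS on the control and data channels."""
    ftps = FTP_TLS(HOST)
    ftps.login(TOKEN_USER, TOKEN_PASS)
    ftps.prot_p()                      # encrypt the data connection as well
    with open(local_path, "rb") as f:
        ftps.storbinary(f"STOR {os.path.basename(local_path)}", f)
    ftps.quit()
```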

Version: unspecified
Severity: enhancement

Details

Reference
bz17957

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:31 PM
bzimport set Reference to bz17957.
bzimport added a subscriber: Unknown Object (MLST).

mdale wrote:

Archive.org had this type of system in place... it was not fun, and far too complicated for the average user. But for technical users, archive.org is great for uploading multi-gigabyte files. We will have archive interoperability, since we have worked with archive.org to deploy some Ogg tools that make inter-archive interactions possible:
http://metavid.org/blog/2008/12/08/archiveorg-ogg-support/

So we should have Commons and the wikis working better with archive.org soon :)

Moving forward with uploads for Commons, we are working on integrating a Firefogg extension that is a one-click install and hooks into the upload system. See bug 16927.

mike.lifeguard+bugs wrote:

(In reply to comment #1)

That doesn't address non-Ogg file uploads. TIFF files (which are currently not permitted, but hopefully will be soon - see bug 17714) used for restoration work can run to 200 MB, and bug 16927 does nothing to permit uploading those, since they are not video files. Equally, PNGs (a currently permitted filetype) can run much larger than 100 MB and will still not be uploadable after bug 16927 is fixed. So that is not a satisfactory conclusion. We need to be able to upload very large files of /any/ type, not just Ogg.

mdale wrote:

We can easily add pass-through support to Firefogg; then it can handle any large file type with the same chunked upload API system, with progress indicators and resume on dropped POST requests (making it much more practical to upload 100+ MB files for both the server and the client). I will try to get the pass-through patch into Firefogg.
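
For reference, driving a chunked upload from a client script against the MediaWiki web API could look roughly like this; the endpoint, chunk size, and CSRF-token handling are simplified assumptions, and real code would also inspect each response's result and handle retries:

```python
import os
import requests

API = "https://commons.wikimedia.org/w/api.php"   # example endpoint
CHUNK = 5 * 1024 * 1024                           # 5 MB pieces

def upload_chunked(session: requests.Session, path: str,
                   filename: str, csrf_token: str) -> str:
    """Send the file to the stash in pieces; returns the filekey used to commit."""
    size = os.path.getsize(path)
    filekey, offset = None, 0
    with open(path, "rb") as f:
        while offset < size:
            piece = f.read(CHUNK)
            data = {
                "action": "upload", "format": "json", "stash": 1,
                "filename": filename, "filesize": size, "offset": offset,
                "token": csrf_token,
            }
            if filekey:
                data["filekey"] = filekey
            r = session.post(API, data=data,
                             files={"chunk": (filename, piece)}).json()
            filekey = r["upload"]["filekey"]
            offset += len(piece)
            # If the connection drops, the loop can resume from the last
            # acknowledged offset instead of re-sending the whole file.
    return filekey
```

A final action=upload request with that filekey (plus filename, comment, and token) then publishes the stashed file.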

Not to say FTP would not be useful in some contexts... and if it's not difficult to add, then by all means ;) ... I just think efforts would be better concentrated on a web-browser-based solution. A web / MediaWiki API solution uses the existing authentication system, the existing Apache/PHP HTTP services, and the existing asset description system. The FTP use case of importing lots of assets could be better covered by copy-by-URL support integrated into the API (it's already done on the new_upload branch... just finishing up the chunk system, then I will push the whole branch for review).
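
Seen from a client, the copy-by-URL path mentioned here would be roughly the sketch below -- assuming the wiki has copy uploads enabled and the account has the required right; the endpoint and values are placeholders:

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"   # example endpoint

def upload_by_url(session: requests.Session, source_url: str,
                  filename: str, csrf_token: str) -> dict:
    """Ask the wiki to fetch the file itself instead of pushing it over HTTP."""
    r = session.post(API, data={
        "action": "upload", "format": "json",
        "filename": filename,
        "url": source_url,                  # server-side fetch of the source file
        "comment": "Imported via copy-by-URL",
        "token": csrf_token,
    })
    return r.json()
```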

FTP is complicated... I can't think of any sites that use it for users to contribute content. For FTP support I imagine we would have to invent a key + user_id hash dynamic FTP account generation system. We would be generating temporary accounts (tied to the user's IP address?); having the user use FTP software with the proper account + server IP + port; forcing a given naming convention, or limiting it to one file; probably not allowing them to create folders... Then the MediaWiki asset import system can run against the normal API; then we have to have the user issue a request to close the FTP account and import the asset back in the web interface. Then we have to hope users are not confused when they try to connect to the old account on a new upload.
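
That "key + user_id hash" account generation could be sketched like this; the secret, lifetime, and naming scheme are invented for illustration only:

```python
import hashlib
import hmac
import time

SERVER_KEY = b"replace-with-a-site-secret"    # hypothetical site-wide secret
ACCOUNT_TTL = 6 * 3600                        # accounts expire after six hours

def make_temp_ftp_account(user_id: int, client_ip: str) -> dict:
    """Derive a throwaway FTP username/password tied to the user (and their IP)."""
    issued = int(time.time())
    digest = hmac.new(SERVER_KEY,
                      f"{user_id}:{client_ip}:{issued}".encode(),
                      hashlib.sha256).hexdigest()
    return {
        "username": f"temp{user_id}-{digest[:8]}",
        "password": digest[8:28],
        # Closed automatically at expiry, or earlier when the user finishes
        # the web-side import step.
        "expires": issued + ACCOUNT_TTL,
    }
```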

Perhaps we just provide a sandboxed FTP server + web serving, give away accounts, and let users administer it (rather than trying to tie it directly into MediaWiki)? Then scripts could import via the upload API's copy-by-URL system. This could address the immediate need of the restorations project for some shared network space. They could then just link to the original files in the asset descriptions and limit Commons to the 12-megapixel-range resolution versions (while providing links to the uncompressed TIFFs on the sandboxed FTP), so they can coordinate restorations and upload new 12-megapixel versions once an image has been restored.

What's the latest on this? The size limit on uploads should be made visible from the upload form.

Bryan.TongMinh wrote:

(In reply to comment #4)

The size limit for regular uploads is visible on the upload form. I don't know if this was added in 1.17 or 1.18, though. As for upload by FTP, no progress has been made.
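
For scripts that need the same number, the configured limit can also be read from the API's siteinfo output -- a small sketch, assuming the wiki exposes maxuploadsize there (older versions may not):

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"   # example endpoint

def get_max_upload_size() -> int:
    """Read the configured upload ceiling (in bytes) from siteinfo."""
    r = requests.get(API, params={
        "action": "query", "format": "json",
        "meta": "siteinfo", "siprop": "general",
    })
    general = r.json()["query"]["general"]
    # Fall back to 0 ("unknown") if the wiki does not report the field.
    return int(general.get("maxuploadsize", 0))
```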

What's the status of this bug? Any news on whether it's *ever* going to be implemented?

It would really, really be useful for some of the Commons people who upload those big files and have to file Bugzilla bugs and take up developers' time... :-)

mdale wrote:

I think three higher-priority features would help address this need:

  • Deploy the chunked upload code for more robust HTTP uploads. (Should happen soon with 1.19.)
  • Deploy support for copyByURL to upload from HTTP URLs.
  • Support files larger than 100 MB. (Swift back-end efforts.)

Once we have all three of those things in place, we could then think about adding support for other mechanisms (like FTP).

Given chunked uploading and multi-selection in the upload dialog, there's much less need for FTP specifically.

The fundamental blocker is enabling larger files; this shouldn't really depend specifically on Swift, as far as I know -- when we upload larger files today they don't touch Swift in any way; we just have someone do it manually from the command line to bypass the size limits, which exist primarily because of memory limits on single POST uploads.

mdale wrote:

True. Once chunked uploading is deployed, you could remove the upload limit or start increasing it; as Brion mentions, we already have larger files inserted manually today.

I mention Swift because once we do remove the upload size limit, we may hit the size limitations of the current single-node storage system sooner, so ops may want to ease into enabling larger file uploads as the back-end storage becomes more scalable.

Four years later, chunked uploads are stable and allow uploading files up to 4 GB in size. I don't think we'll need this.

I'm happy with that. :)