Include uncompressed file sizes in data dump download pages
Open, Low, Public, Feature

Description

We frequently get asked about the uncompressed sizes of dump files.
It should be possible for the dump generator to count the number of
bytes it's pumping out and then have the backup runner take this
information and include it with the download links.
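As a rough illustration of the byte-counting idea above, here is a minimal Python sketch. The names `CountingWriter` and `write_dump` are hypothetical, and the use of bz2 compression is an assumption; the actual dump generator may be structured quite differently.

```python
import bz2
import io


class CountingWriter:
    """Wrap a binary file-like object and count the bytes written through it.

    Hypothetical helper: the count reflects the uncompressed data, because
    it is taken before the data reaches the compressor.
    """

    def __init__(self, sink):
        self.sink = sink
        self.bytes_written = 0

    def write(self, data):
        self.bytes_written += len(data)
        return self.sink.write(data)

    def close(self):
        self.sink.close()


def write_dump(chunks, raw_out):
    """Compress chunks into raw_out and return the uncompressed byte count."""
    # bz2.open accepts an existing binary file object as its target.
    compressor = bz2.open(raw_out, "wb")
    writer = CountingWriter(compressor)
    for chunk in chunks:
        writer.write(chunk)
    # Closing flushes the compressor; raw_out itself stays open.
    writer.close()
    return writer.bytes_written
```

The backup runner could then take the returned count and show it next to the download link for the compressed file.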


Version: unspecified
Severity: enhancement
URL: http://download.wikimedia.org/

Details

Reference
bz6064

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 9:19 PM
bzimport set Reference to bz6064.

I presume you want this only for the files with page content? If we had to, we could even make this a separate step. The byte counts would go into a separate file, rather than being shovelled into the XML output someplace.
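A separate step of this kind might look roughly like the Python sketch below. It assumes bz2-compressed dump files and a JSON file for the byte counts; the function names and the output format are hypothetical and not part of the existing dump scripts.

```python
import bz2
import json
import os


def uncompressed_size(path, block_size=1024 * 1024):
    """Stream-decompress a bz2 file and return its uncompressed size in bytes."""
    total = 0
    with bz2.open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    return total


def write_size_sidecar(dump_paths, sidecar_path):
    """Write a JSON file mapping each dump file name to its uncompressed size."""
    sizes = {os.path.basename(p): uncompressed_size(p) for p in dump_paths}
    with open(sidecar_path, "w", encoding="utf-8") as out:
        json.dump(sizes, out, indent=2, sort_keys=True)
    return sizes
```

Reading in fixed-size blocks keeps memory use flat, but the step still has to decompress every file once, so it adds run time on the largest dumps.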

I think it would be most useful for the largest dumps, which I think corresponds pretty well to the ones with actual page content.

It would be nice to have it for the others as well, but definitely not as important.

Moving this to the Dumps Rewrite project; this feature would take too much time to add to the current dumps and still maintain reasonable run speed.

Aklapper subscribed.

@ArielGlenn: Unassigning task, to avoid cookie-licking. (Feel free to reclaim if you plan to work on this. Thanks.)

Fine by me. (Note that this task is likely infeasible with the current dumps architecture, as mentioned above.)

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM
Aklapper removed subscribers: bzimport, Tfinc.