
Google Books > Internet Archive > Commons upload cycle
Closed, ResolvedPublic

Description

Author: vladjohn2013

Description:
Google Books > Internet Archive > Commons upload cycle

Wikisources all around the world rely heavily on Google Books (GB) digitizations for transcription and proofreading. As GB provides only the PDF, the usual cycle is:

  1. go to Google Books and look for a book
  2. check if the book is already in IA
  3. if it's not, upload it there
  4. get the djvu from IA and upload it on Commons
  5. use it on Wikisource

For point 4, we have this awesome tool: http://tools.wmflabs.org/ia-upload/
What we are missing right now is a tool for points 1–3, one that would serve many other users outside the Wikimedia movement too. Eventually, we could think of a bot/script that would do all the work, notifying the user when their help is needed (e.g. metadata polishing, Commons categories, etc.). Mentors: Aubrey is available for "design" mentorship, paired with a technical expert. We may also be able to ask an IA expert for help.
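Point 2 (checking whether a book is already in IA) could start from the Internet Archive's advanced-search endpoint. The following is a minimal sketch, not code from the proposal or from BUB. It assumes that matching on title and (optionally) author is a good-enough first check, and the helper names `ia_search_url` and `existing_identifiers` are hypothetical.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode

# Internet Archive advanced-search endpoint (returns JSON when output=json).
IA_SEARCH = "https://archive.org/advancedsearch.php"


def ia_search_url(title: str, creator: Optional[str] = None, rows: int = 50) -> str:
    """Build a search URL for text items whose title (and optionally creator) match.

    Assumption: a title/creator match is a good-enough first check; a human
    still has to confirm that a hit is the same edition as the GB scan.
    """
    # Drop double quotes so they cannot break the quoted phrase in the query.
    clean_title = title.replace('"', "")
    query = f'title:("{clean_title}") AND mediatype:texts'
    if creator:
        clean_creator = creator.replace('"', "")
        query += f' AND creator:("{clean_creator}")'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # only ask for item identifiers
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return f"{IA_SEARCH}?{urlencode(params)}"


def existing_identifiers(response_body: str) -> List[str]:
    """Return the item identifiers listed in an advanced-search JSON response."""
    data = json.loads(response_body)
    return [doc["identifier"] for doc in data["response"]["docs"]]
```

In use, the body would come from something like `urllib.request.urlopen(ia_search_url(title, author)).read().decode()`. An empty list from `existing_identifiers` suggests the book still needs to be uploaded (point 3); a non-empty list only gives candidates to check by hand.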

URL:https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Google_Books_.3E_Internet_Archive_.3E_Commons_upload_cycle


Version: unspecified
Severity: enhancement

Details

Reference
bz57813

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:35 AM
bzimport set Reference to bz57813.

vladjohn2013 wrote:

This proposal has been listed at https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects and we are filing a report to gather community feedback and share updates.

8ohit.dua wrote:

(In reply to vladjohn2013 from comment #0)

Google Books > Internet Archive > Commons upload cycle

Wikisources all around the world use heavily GB digitizations for
transcription and proofreading. As GB provides just the PDF, the usual cycle
is:

go to Google Books and look for a book
check if the book is already in IA
if it's not, upload it there
get the djvu from IA
upload it on Commons
use it on Wikisource

For point 4, we have this awesome tool:
https://toolserver.org/~tpt/iaUploadBot/step1.php What we miss right now is
a tool for point 2.1, that would serve many other users outside the
Wikimedia movement too. Eventually, we could think of a bot/script which
would do all the work altogether, notifying the user when their help is
needed (eg metadata polishing, Commons categories, etc.) Mentors: Aubrey is
available for "design" mentorship, paired with a technical expert. We can
maybe ask help from a IA expert.

URL:https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Google_Books_.3E_Internet_Archive_.3E_Commons_upload_cycle

Hi

I'm writing to let you know that I am working on Bug 57813 - Google Books > Internet Archive > Commons upload cycle, as a GSoC 2014 project.
I have the outline of the Google Books download script ready.

Rohit Dua
8ohit.dua
New Delhi, India

8ohit.dua wrote:

I have selected a mentorship project, https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Google_Books_.3E_Internet_Archive_.3E_Commons_upload_cycle, which corresponds to this Bug 57813.

I have drafted a rough proposal (after discussing the steps with the mentors and the community).

Link to proposal: https://www.mediawiki.org/wiki/User:8ohit.dua/GSoC_proposal_2014

It would be great if we could get the community's suggestions and feedback on the proposal/project.

Thanks a lot.
Rohit Dua
(8ohit.dua)
Delhi, India

Rohit, Paolo (is he following this report?), your proposals are still missing from Google Melange. Please submit them there as drafts linking to your wiki pages. In any case, we will evaluate your proposals on mediawiki.org. Thank you!

FYI, I agree to mentor Rohit Dua.

(In reply to Yann Forget from comment #5)

FYI, I agree to mentor Rohit Dua.

Thank you! Instructions: https://www.mediawiki.org/wiki/Mentorship_programs/Possible_mentors

Rohit's GSoC project was marked as PASSED by his mentors, but the required final blog post is still missing and, in general, it is unclear what we have and what is missing to resolve this report as FIXED.

Please wrap up your project properly.

The tool has worked for years, but people tell me it's now broken: https://tools.wmflabs.org/bub/

There are lots of unresolved issues at https://github.com/rohit-dua/BUB/issues and no commits. Does the project have an active maintainer? If not, I am happy to help get it working again.
https://github.com/rohit-dua/BUB/issues/37 suggests that someone needs to clear the queue: https://tools.wmflabs.org/bub/queue/
Is user:8ohit.dua using Phabricator?

Samwilson subscribed.

Is this issue resolved? I mean, I know there are outstanding issues with Bub, but they're tracked on Github. It seems possible that this task here is complete. :-)

(I've just created T154413: Upload/import wizard for Wikisource works which is related to this subject of more easily getting works into Commons and IA.)

I updated the task description above, and it seems that this task just comes down to "a tool to upload from Google Books to the Internet Archive". Does that sound right? Is this something we want to pursue?


What tool are you talking about? For BUB, that description is right. It's not for IA upload ;-)

What tool are you talking about?

The unknown tool that would solve this ticket. From the description above: "What we miss right now is a tool for points 1–3, that would serve many other users outside the Wikimedia movement too."

I think you're right, though, and BUB is the answer. I don't think we're going to aim for a single tool that does all of this.

Closing this, because I don't think there's anything more to do (for this specific Phabricator task; there are still bugs to fix, of course).