
[DO NOT USE] Pronunciation recording tool (tracking) [superseded by #PronunciationRecording]
Closed, InvalidPublic

Description

Several people (e.g. T33221, though the original report asks for computer text-to-speech, and http://comments.gmane.org/gmane.org.wikimedia.wiktionary/1265) have requested a tool to simplify the workflow of recording the pronunciation of a word.

The basic idea is to provide a wizard flow for picking a word (which may be the page you're on), recording it, choosing a free license, then uploading it to Wikimedia Commons with the appropriate metadata.


Version: master
Severity: enhancement
URL: http://thread.gmane.org/gmane.org.wikimedia.wiktionary/1265
See Also:
T33221: Audio pronunciation: Automatic text-to-speech to convert IPA to sound
T22252: Support for WAV and AIFF by converting files to FLAC automatically.
T55074: Add component for PronunciationRecording MediaWiki extension

Details

Reference
bz46610

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:22 AM
bzimport set Reference to bz46610.

wmf.amgine3691 wrote:

Note: This will also need to take into account the L2 sections, which are used to indicate the language. For example, https://en.wiktionary.org/wiki/chance#English https://en.wiktionary.org/wiki/chance#French etc.

Well, the tool could simply be added via a parser function called with both word and language (and possibly something else for homographs); this seems the least of the problems. :)

Just like the "Edit" link appears next to each section, the [Record button] could be placed next to any word missing a voice-recorded pronunciation, right?

Question: which page describes the current workflow? It is not obvious how a user can contribute a pronunciation now.

Also: what happens if an audio file already exists but I think I can contribute a better one e.g. because of audio quality or some other defect?

Needless to say, this is a feature that will call for a mobile UI sooner or later... Think of all those languages spoken in countries with a high penetration of mobile devices.

The current procedure on English Wiktionary is https://en.wiktionary.org/wiki/Help:Audio_pronunciations . Other projects probably have somewhat different procedures.

I suggest the tool initially only show on pages without existing recordings. It would be good to solve that problem eventually, but it is more likely to require discussion (should we keep both because they have slightly different accents? etc.).

Also, I skipped the final part of the flow, adding the template (e.g. Template:audio on English Wiktionary) to the Wiktionary page.

wmf.amgine3691 wrote:

I suggest the tool initially only show on pages without existing recordings.

Not sure I would agree. The many dialects of English, for example, can be dramatically different. 'Schedule' springs to mind[1].

Although I'd love to get into a discussion about collecting metadata with recordings (geo-IP location of the author, self-identified dialect origins, etc.), I think at this point we should focus on the basic mechanics: a user button to record a brief audio snippet which is auto-uploaded to Commons with authoring/license templates, and the local Wiktionary page updated.

[1] https://en.wiktionary.org/wiki/schedule#Pronunciation

I agree. I wasn't proposing complicated metadata, just the basics (license template of course, Category:$LANGUAGE pronunciation, maybe a hidden category to mark recordings from the tool).

The reason I suggested keeping it simple by showing on pages without recordings is to avoid collisions. E.g. what happens if I live in the U.S. but have a different pronunciation of https://commons.wikimedia.org/wiki/File:En-us-associate.ogg ? But it looks like they resolve collisions by just adding a number, https://commons.wikimedia.org/wiki/File:En-us-associate-2.ogg, which is easy enough for a tool to do.
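For illustration only, here is a minimal sketch of that numbered-suffix scheme (the helper names and structure are my own, not from this thread); the only real API used is the public Commons action API, queried to see whether a File: title already exists:

```typescript
// Minimal sketch of the numbered-suffix scheme described above; the helper
// names are invented and the only real API used is the Commons action API.
async function pickFreeFilename(base: string, lang: string): Promise<string> {
  // e.g. base = "associate", lang = "en-us" -> "En-us-associate.ogg",
  // then "En-us-associate-2.ogg", "En-us-associate-3.ogg", ...
  for (let n = 1; ; n++) {
    const name = n === 1
      ? `${capitalize(lang)}-${base}.ogg`
      : `${capitalize(lang)}-${base}-${n}.ogg`;
    const url = 'https://commons.wikimedia.org/w/api.php' +
      '?action=query&format=json&origin=*&titles=' +
      encodeURIComponent('File:' + name);
    const data = await (await fetch(url)).json();
    const page = Object.values<any>(data.query.pages)[0];
    if ('missing' in page) {
      return name; // the title does not exist yet, so it is safe to use
    }
  }
}

function capitalize(s: string): string {
  return s.charAt(0).toUpperCase() + s.slice(1);
}
```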

rahul14m93 wrote:

I have prepared a rough project proposal. Please give me your feedback and suggestions so that I can improve on it:
https://www.mediawiki.org/wiki/User:Rahul21/Gsoc

Hi Rahul,

Through the different discussions so far we have seen that this project might be trickier than it initially looked. And the main problem is still that no mentor is stepping in.

I recommend you wait a couple more days and then make a decision: bet blindly on this proposal with the hope that things will be resolved in the coming weeks, or put it aside and bet on some other idea for GSoC.

You can still work on a voice recording tool as a pet project, but from my point of view it still lacks some essential factors to be considered for this GSoC: no mentor and no enthusiastic response from the Wiktionary community.

On the Wiktionary front, has anyone reached out to them on a prominent place on-wiki?

As far as the technology, there seems to be at least one workable approach using the HTML5 Media Capture API (http://www.w3.org/2009/dap/wiki/ImplementationStatus#HTML_Media_Capture and https://news.ycombinator.com/item?id=4001140). I haven't tested this myself yet; the browsers are mostly mobile. See http://mobilehtml5.org/ts/?id=23 for the syntax and a simple page for testing.

As getUserMedia develops that could become an alternate approach.

So it might be workable if a mentor becomes available.
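To make the two approaches above concrete, here is a rough, untested sketch; all names are assumptions, and the exact capture attribute syntax varied between spec drafts. It prefers getUserMedia where a browser exposes it and falls back to an HTML Media Capture input otherwise:

```typescript
// Rough, untested sketch of the two capture routes mentioned above. All names
// are mine, and the exact capture attribute syntax varied between spec drafts.
function insertRecorder(container: HTMLElement): void {
  const nav = navigator as any;
  const getUserMedia = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;

  if (getUserMedia) {
    // Real-time capture: the resulting stream can be fed into the Web Audio API.
    getUserMedia.call(navigator, { audio: true },
      (stream: MediaStream) => console.log('got audio stream', stream),
      (err: Error) => console.error('microphone access denied', err));
  } else {
    // HTML Media Capture: the (mostly mobile) browser records a file and hands
    // it back like an ordinary file upload.
    const input = document.createElement('input');
    input.type = 'file';
    input.accept = 'audio/*';
    input.setAttribute('capture', 'microphone');
    input.addEventListener('change', () => {
      const file = input.files && input.files[0];
      if (file) console.log('recorded file', file.name, file.type);
    });
    container.appendChild(input);
  }
}
```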

rahul14m93 wrote:

I will surely have a look at them. Michael Dale is ready to mentor :)

mdale wrote:

Confirmed. As mentioned on IRC, it would be nice to also support recording an article, or a paragraph, out loud, for the spoken articles project.

(In reply to comment #9)

On the Wiktionary front, has anyone reached out to them on a prominent place on-wiki?

The central discussion place for Wiktionary is Wiktionary-l. It's not Wikipedia or Wikisource. Anyway I'll send more notifications to all languages.

Sometimes you help by doing something and sometimes you help by NOT doing something. :)

I'm happy to have helped indirectly in finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael, and thank you to the rest of the people helping to move this feature forward.

Adding a dependency on WAV support to document the discussion of the past hours on wikitech-l. Feel free to change this if the plan changes.

rsurratt wrote:

(In reply to comment #13)

I'm happy to have helped indirectly in finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael, and thank you to the rest of the people helping to move this feature forward.

Hello all, I have developed a MediaWiki extension (not released yet) that plays an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page. It also inserts a play button on hover to keep playing and highlighting all words it knows. In addition, a very short definition is displayed, also with words hoverable and playable. Words are from the Simple English Wiktionary.

I also have made a javascript sound recorder that uses wami (https://code.google.com/p/wami-recorder/) to record and soundmanager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available, with a fallback to Flash if it is not. I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language. I could make this available to anyone who would want it, or I could maybe take a shot at implementing Rahul's suggestions. I am totally new to MediaWiki so any guidance would be helpful.
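As an aside, a minimal Node-style sketch of that kind of server-side step might look like the following; the sox/ffmpeg options here are assumptions, not the commands actually used in the project above:

```typescript
// Node-style sketch of this kind of post-processing step; the sox/ffmpeg
// options below are assumptions, not the commands actually used.
import { execFile } from 'child_process';
import { promisify } from 'util';

const run = promisify(execFile);

async function postProcess(wavPath: string, oggPath: string): Promise<void> {
  const trimmed = wavPath.replace(/\.wav$/, '.trimmed.wav');

  // Trim leading silence, reverse, trim again, reverse back: this removes
  // silence at both ends of the recording.
  await run('sox', [wavPath, trimmed,
    'silence', '1', '0.1', '1%', 'reverse',
    'silence', '1', '0.1', '1%', 'reverse']);

  // Convert the trimmed WAV to Ogg Vorbis for upload.
  await run('ffmpeg', ['-y', '-i', trimmed, '-c:a', 'libvorbis', oggPath]);
}
```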

(In reply to comment #15)

I also have made a javascript sound recorder that uses wami (https://code.google.com/p/wami-recorder/) to record and soundmanager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available, with a fallback to Flash if it is not.

As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.

rsurratt wrote:

As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.

I had no idea; is Flash a no-no for MediaWiki?

(In reply to comment #17)

As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.

I had no idea; is Flash a no-no for MediaWiki?

Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest HTML features; however, it's a no-no to require Flash (it might be considered acceptable if the particular Flash thing works with Gnash). Java is more widely considered OK, but still not exactly loved either.

mdale wrote:

(In reply to comment #15)

an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page.

Sounds like fun; it would work best as a gadget, and should query the real Wiktionary.

I also have made a javascript sound recorder that uses wami (https://code.google.com/p/wami-recorder/) to record and soundmanager (http://www.schillmania.com/projects/soundmanager2/) for playback.

I like Flash fallbacks. I understand we should not require Flash, but as a fallback it's great. It's much less of a patent, fragmentation, and security failure than in-browser Java.

I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language.

We should try to trim client-side if possible.

I could make this available to anyone who would want it, or I could maybe take a shot at implementing Rahul's suggestions. I am totally new to MediaWiki so any guidance would be helpful.

Cool, I am sure Rahul will touch base with you.

rahul14m93 wrote:

(In reply to comment #15)

I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language.

I am interested. Could you come on IRC where we can have a good discussion regarding this?

rsurratt wrote:

(In reply to comment #19)

(In reply to comment #15)

an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page.

Sounds like fun; it would work best as a gadget,

How does one go about getting a user script approved as a gadget?

and should query the real Wiktionary.

The extension opens a new tab on the real Wiktionary for the definition of any word on mouse click, but on hover it just uses the definition (if it exists) from the small dictionary I have made (a JSON file, about 60k compressed).

The sound files it uses are mostly from what is used on the English Wiktionary, but I am in the process of recording new ones that "flow" better when spoken one after another in a sentence.

I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language.

We should try to trim client-side if possible.

Yes, that would be a great solution to the problem of getting ffmpeg and sox executables running on a variety of servers, but I have no idea how to do that. Perhaps Java? If anyone knows how, I am all ears.
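One client-side possibility (purely illustrative, not anyone's actual code, and the threshold is arbitrary) is to trim the decoded samples directly with the Web Audio API, which would avoid server-side ffmpeg/sox entirely:

```typescript
// Purely illustrative client-side trim over a decoded Web Audio buffer; the
// function name and threshold are made up for this sketch.
function trimSilence(ctx: AudioContext, buffer: AudioBuffer, threshold = 0.01): AudioBuffer {
  const data = buffer.getChannelData(0);
  let start = 0;
  let end = data.length - 1;

  // Find the first and last samples whose amplitude exceeds the threshold.
  while (start < end && Math.abs(data[start]) < threshold) start++;
  while (end > start && Math.abs(data[end]) < threshold) end--;

  // Copy that slice (all channels) into a new, shorter buffer.
  const length = end - start + 1;
  const out = ctx.createBuffer(buffer.numberOfChannels, length, buffer.sampleRate);
  for (let ch = 0; ch < buffer.numberOfChannels; ch++) {
    out.copyToChannel(buffer.getChannelData(ch).subarray(start, end + 1), ch);
  }
  return out;
}
```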

rsurratt wrote:

I am interested. Could you come on IRC where we can have a good discussion regarding this?

Hi Rahul, sure, how do I go about doing that?

(In reply to comment #21)

(In reply to comment #19)

(In reply to comment #15)

an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page.

Sounds like fun; it would work best as a gadget,

How does one go about getting a user script approved as a gadget?

Each wiki approves them separately. See https://en.wiktionary.org/wiki/Wiktionary:Gadgets, though I'm not sure where you ask for it to be approved. You can ask at https://en.wiktionary.org/wiki/Wiktionary:Grease_pit .

Please use a separate bug report for the mini-dictionary (ogg, short definition) on-hover idea. It's interesting, but separate from this.

rsurratt wrote:

(In reply to comment #18)

(In reply to comment #17)

Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest HTML features; however, it's a no-no to require Flash (it might be considered acceptable if the particular Flash thing works with Gnash). Java is more widely considered OK, but still not exactly loved either.

I was wrong; WAMI knows nothing of HTML5 and only uses Flash... just client side, nothing on the server. From their page (at https://code.google.com/p/wami-recorder/):

"The WAMI recorder uses a light-weight Flash app to ship audio from client to server via a standard HTTP POST. Apart from the security settings to allow microphone access, the entire interface can be constructed in HTML and Javascript."

So, is this a deal breaker? Sounds like it. If so, sorry for wasting your time, and perhaps it would be better to wait for a GSoC solution that uses the latest and greatest technology. Unless someone has a suggestion.

rahul14m93 wrote:

(In reply to comment #25)

I was wrong; WAMI knows nothing of HTML5 and only uses Flash... just client side, nothing on the server. From their page (at https://code.google.com/p/wami-recorder/):

It's okay Ron, you took interest and wanted to help; that itself is a positive sign!

mdale wrote:

I don't think it's a deal breaker; Flash makes a great fallback. If we can use the WebRTC solution for browsers that support it, then using WAMI as a fallback is fine.

The restriction against Flash for Wikimedia projects is based on the idea that you don't exclusively deliver an experience for proprietary platforms. Using Flash or Java as a fallback is fine, as long as an open-standard / free-browser solution is also equally well supported.

wmf.amgine3691 wrote:

Adobe stopped producing Flash for Linux last year or the year before.

rahul14m93 wrote:

v11.2 is the last version supported for Linux.

mdale wrote:

Flash support for Linux is not relevant. The point is that you can get the same experience (with WebRTC) with free software. The idea is to give an equal experience on Flash vs. free-software platforms.

rsurratt wrote:

(In reply to comment #31)

Flash support for Linux is not relevant. The point is that you can get the same experience (with WebRTC) with free software. The idea is to give an equal experience on Flash vs. free-software platforms.

I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version). It is much nicer than the Flash version using WAMI that I already had, in that it allows things such as user-controlled silence removal in the browser. I will next try to cram them both together so as to have the Flash fallback work as closely as possible to the HTML5 version.

I also want to be able to do the editing of pre-existing sounds as well as sounds input with a microphone.
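For readers following along, the general pattern looks roughly like the sketch below. This is not the recorder described above; it is simplified, uses the modern mediaDevices form of getUserMedia for brevity (Chrome of that era used the prefixed webkitGetUserMedia), and the names are my own:

```typescript
// Generic getUserMedia + ScriptProcessorNode pattern (not the recorder
// described above), written against the modern mediaDevices API for brevity.
function startRecording(onDone: (chunks: Float32Array[], sampleRate: number) => void): void {
  const ctx = new AudioContext();
  const chunks: Float32Array[] = [];

  navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
    const source = ctx.createMediaStreamSource(stream);
    const processor = ctx.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = e => {
      // Copy each block of raw samples as it arrives.
      chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
    };

    source.connect(processor);
    processor.connect(ctx.destination);

    // Stop after five seconds; pronunciation recordings are short.
    setTimeout(() => {
      processor.disconnect();
      source.disconnect();
      stream.getTracks().forEach(t => t.stop());
      onDone(chunks, ctx.sampleRate);
    }, 5000);
  });
}
```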

rahul14m93 wrote:

(In reply to comment #32)

I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).

Can you please specify the version, and did you enable the "Web Audio Input" flag via chrome://flags?

rsurratt wrote:

(In reply to comment #33)

(In reply to comment #32)

I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).

Can you please specify the version, and did you enable the "Web Audio Input" flag via chrome://flags?

The Chrome is version 28.0.1499.0 canary (https://www.google.com/intl/en/chrome/browser/canary.html), and there is no "Web Audio Input" flag in chrome://flags for that version of Chrome.

I'm removing bug 20252 as a dependency and moving it to see-also. It's a nice-to-have, but it's not a blocker in my opinion. These are going to be short files (< 5 seconds, most likely).

Bug 20252 could also be done later, and the files transcoded internally.

rahul14m93 wrote:

I have undertaken this as my GSoC project; Michael Dale and Matthew Flaschen will be my mentors during the course. The primary benefit is laying the groundwork for contributor-created audio on MediaWiki sites in any current browser. I have done a little bit of research on the method to upload the pronunciations so far, and based on that the use of the Upload API is essential; other APIs like the Edit API will also come in handy. The first step that I plan on doing is to add .wav support to the TMH (TimedMediaHandler) extension. Link to my proposal: http://www.mediawiki.org/wiki/User:Rahul21/Gsoc
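As a rough illustration of how those two APIs could be combined (an assumption-laden sketch, not the proposal's code; a real client would use mw.Api / UploadWizard and handle tokens, warnings, and per-Wiktionary template conventions):

```typescript
// Assumption-laden sketch of the two API calls, done with raw fetch() for
// brevity; a real client would use mw.Api / UploadWizard and proper token,
// warning and error handling. The template arguments differ per Wiktionary.
async function uploadAndLink(api: string, file: Blob, fileName: string,
                             entryTitle: string, csrfToken: string): Promise<void> {
  // 1. action=upload: push the recording.
  const upload = new FormData();
  upload.append('action', 'upload');
  upload.append('format', 'json');
  upload.append('filename', fileName);
  upload.append('file', file, fileName);
  upload.append('token', csrfToken);
  await fetch(api, { method: 'POST', body: upload });

  // 2. action=edit: append an {{audio}} line to the Wiktionary entry.
  const edit = new FormData();
  edit.append('action', 'edit');
  edit.append('format', 'json');
  edit.append('title', entryTitle);
  edit.append('appendtext', `\n* {{audio|${fileName}|Audio}}`);
  edit.append('token', csrfToken);
  await fetch(api, { method: 'POST', body: edit });
}
```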

Change 75770 had a related patch set uploaded by Rahul21:
Pronunciation Recording Tool( Not working )

https://gerrit.wikimedia.org/r/75770

Change 75770 abandoned by Rahul21:
Pronunciation Recording Tool( Not working )

https://gerrit.wikimedia.org/r/75770

6f9c18509b858d89e50d145a685eb5308dcdff7e implemented a special page, Special:PronunciationRecording. That includes support for recording a pronunciation and playing it back. The next main step is allowing uploading to the same wiki where the special page is (bug 53127).

There is also now a Bugzilla component for this extension.

GSoC "soft pencils down" date was yesterday and all coding must stop on 23 September. Has this project been completed?

The overall project has not been completed, so Rahul will have to keep working until the final pencils down (September 23, as you noted).

The following parts are complete (parts that still need final review are noted). Rahul can add anything I'm missing:

  • Uploading to the stash is complete. Fitting this into the overall upload flow (initially publishing from the stash to the main File page) is in progress and under review; a rough sketch of that two-step flow follows this list.
  • Extension and special page setup
  • WAV support for TimedMediaHandler
  • Some refactoring to UploadWizard (which PronunciationRecorder is using as a library). Mostly merged, a little more in progress
  • Upload permissions check (not merged)
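For reference, the stash-then-publish flow mentioned in the first bullet can be sketched as two action=upload calls (illustrative only; the extension's actual parameter handling is more involved):

```typescript
// Illustrative two-step stash flow: upload with stash=1 to get a filekey,
// then publish from the stash using that filekey. Error handling omitted.
async function stashThenPublish(api: string, file: Blob, fileName: string,
                                csrfToken: string): Promise<void> {
  // Step 1: upload into the stash only; the response carries a filekey.
  const stash = new FormData();
  stash.append('action', 'upload');
  stash.append('format', 'json');
  stash.append('stash', '1');
  stash.append('filename', fileName);
  stash.append('file', file, fileName);
  stash.append('token', csrfToken);
  const res = await (await fetch(api, { method: 'POST', body: stash })).json();
  const fileKey: string = res.upload.filekey;

  // Step 2: publish from the stash to the final File: page.
  const publish = new FormData();
  publish.append('action', 'upload');
  publish.append('format', 'json');
  publish.append('filename', fileName);
  publish.append('filekey', fileKey);
  publish.append('token', csrfToken);
  await fetch(api, { method: 'POST', body: publish });
}
```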

This is not fully complete. However, it's complete enough that it could be useful. You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .

The main aspect that is not ready is integrating into Wiktionary pages. It also cannot currently upload to Commons from another wiki (it uploads to the wiki it runs on).

However, it does generate the Information template and categories needed for Commons.
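Roughly, the generated description page would look like the wikitext built below; this is a hand-written approximation of the usual Commons conventions, not the extension's actual output:

```typescript
// Hand-written approximation of the usual Commons conventions (Information
// template, license, pronunciation category); not the extension's actual output.
function buildDescriptionPage(word: string, language: string,
                              userName: string, dateIso: string): string {
  return [
    '== {{int:filedesc}} ==',
    '{{Information',
    `|description={{en|1=Pronunciation of "${word}" in ${language}.}}`,
    `|date=${dateIso}`,
    '|source={{own}}',
    `|author=[[User:${userName}|${userName}]]`,
    '}}',
    '',
    '== {{int:license-header}} ==',
    '{{self|cc-by-sa-3.0}}',
    '',
    `[[Category:${language} pronunciation]]`,
  ].join('\n');
}
```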

Actually, use http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording?debug=true due to bug 54351.

Also, note that you need to use a modern browser with sufficient Web Audio support. Currently, that probably means Chrome, but Firefox is working on the same standards, so it will eventually work in Firefox and other browsers.

If you have open tasks or bugs left, one possibility is to list them at https://www.mediawiki.org/wiki/Google_Code-In and volunteer yourself as mentor.

We have heard from Google and from free software projects participating in Code-in that students participating in this program have done great work finishing and polishing GSoC projects, many times mentored by the former GSoC student. The key is to be able to split the pending work into little tasks.

More information is on the wiki page. If you have questions you can ask there, or you can contact me directly.

Rahul: Are you (still) working on this? If not, please reset the assignee to default and the status to NEW. Thanks!

I don't know if we should keep this open now that it's an in-progress extension with its own Bugzilla component.

However, if we want to, we can use it to mark when the initial Wiktionary functionality (see https://www.mediawiki.org/wiki/User:Rahul21/Gsoc2013/Proposal#Simple_workflow) is done. Basically, a Minimum Viable Product.

I didn't see any progress here, therefore I re-launched
https://meta.wikimedia.org/wiki/Grants:IEG/Finish_Pronunciation_Recording

You may see this as a competing product or a chance to get some useful feedback. Cheers!

(In reply to Matthew Flaschen from comment #42)

This is not fully complete. However, it's complete enough that it could be useful. You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .

It's moved to http://pronunciationrecording.wmflabs.org/wiki/Special:PronunciationRecording?debug=true . The new server is open for normal account creations.

If anyone would like special access (e.g. an admin account to test gadgets), let me know.

I don't know if we should keep this open now that it's an in-progress extension with its own Bugzilla component.

What exactly is tracked in this task, in contrast to the PronunciationRecording project itself? All "Blocked by" tasks are also part of the project...

What exactly is tracked in this task, in contrast to the PronunciationRecording project itself? All "Blocked by" tasks are also part of the project.

Mattflaschen-WMF claimed this task.

I suggested last February that we could either close it now, or use it to mark the MVP. I'll go ahead and do the former.

Danny_B renamed this task from Pronunciation recording tool (tracking) to [DO NOT USE] Pronunciation recording tool (tracking) [superseded by #PronunciationRecording]. Aug 5 2016, 11:32 AM
Danny_B removed Mattflaschen-WMF as the assignee of this task.
Danny_B lowered the priority of this task from Medium to Lowest.