Multiline not well detected
Closed, Duplicate (Public)

Description

It seems that multiline SRT elements are concatenated onto one line instead of being presented on two lines.

Example in http://commons.wikimedia.org/wiki/File:Elephants%20Dream.ogg at 00:05:44

Presented is:
Why? - Now!

Instead of:
Why?
- Now!

Version: unspecified
Severity: minor

Details

Reference
bz29126

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 11:29 PM
bzimport set Reference to bz29126.

This is caused by the usage of http://commons.wikimedia.org/w/api.php?action=parse&page=TimedText%3AElephants_Dream.ogg.en.srt&smaxage=300&maxage=300&format=json

which of course squashes single line breaks into oblivion. Alternative output parsing is required for this. Perhaps just get the raw wikitext and interpret it with the normal parseSrt() function instead?
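To illustrate the raw-text route, here is a minimal sketch of splitting raw SRT text into cues while keeping each text line separate. The function name parseSrtCues is hypothetical and is not the parseSrt() mentioned above; the cue timings in the usage note are also made up for illustration.

```javascript
// Hypothetical helper (assumption): splits raw SRT text into cues and keeps
// every text line of a cue as a separate array entry.
function parseSrtCues(rawText) {
  // Normalize line endings, then split cues on blank lines.
  const blocks = rawText.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  const timing =
    /^(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})/;
  const cues = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    // Drop the optional numeric cue index on the first line.
    if (/^\d+$/.test(lines[0].trim())) {
      lines.shift();
    }
    if (lines.length === 0) {
      continue;
    }
    const match = timing.exec(lines[0]);
    if (!match) {
      continue; // Not a valid cue; skip it.
    }
    cues.push({ start: match[1], end: match[2], lines: lines.slice(1) });
  }
  return cues;
}
```

Called on a cue whose text is "Why?" followed by "- Now!" on the next line, this yields a lines array with two entries, so the single line break is still available to the renderer.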

mdale wrote:

We use the HTML output so that we can support wikitext conversion of things like [[links]] in the subtitle text. I would recommend putting a <br> on the line where you want the line break. Normal SRT parsers should strip unknown HTML tags, and we can retain the flexibility of having our timed text read from HTML.

"I would recommend putting a <br> on the line where you want the line break."

Then it's not SRT. SRT requires no such things.

To further clarify that last point....

There are already 10 thousand subtitle formats out there, and I'm very wary of defining our 'own'. I'd rather see a special 'parser' that converts into a new 'internal' format than see us make changes to the storage format.

Perhaps we can create a new API module that outputs this, instead of using 'parse'. This new API could expose all the timing information, language, direction, etc. in a format that is easy for the JavaScript to read, doing away with the JS-side regex parsing, and could include an array of 'lines' that have each been parsed by MediaWiki individually. The JS client can then concatenate these lines with <br>.

This seems like a much more future-proof system.
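As a rough sketch of that idea: the response shape below is purely hypothetical (no such API module is described in this thread), and the field names, the use of seconds for timing, and the end time are all assumptions. It only shows how a client could join individually parsed lines with <br>.

```javascript
// Hypothetical response shape (assumption): one entry per cue, with each
// text line already converted to HTML on the server side.
const exampleResponse = {
  language: "en",
  direction: "ltr",
  cues: [
    // 344 seconds corresponds to 00:05:44; the end time is illustrative.
    { start: 344.0, end: 346.0, lines: ["Why?", "- Now!"] },
  ],
};

// Joins the pre-parsed HTML lines of one cue with <br> for display.
function cueToHtml(cue) {
  return cue.lines.join("<br>");
}
```

With this shape the client needs no regex parsing of its own: cueToHtml(exampleResponse.cues[0]) gives the two lines separated by a single <br>.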

mdale wrote:

Yes, an API module would be best. Maybe a new feature request bug?

With an API module we could output either "clean SRT" for SRT clients, or "HTML-like" SRT for our HTML-based player, with any MediaWiki markup transformations applied and each subtitle segment delivered as JSON containing its packaged HTML.

I have some TODO notes scattered in the code base to do exactly this ;)

Created attachment 11481
Screenshot: Still valid after TMH deployment

Attached:

29126.png (530×678 px, 112 KB)