ansaurus

Question

Parsing srt subtitles

Answer 1

A:

Here's some code I had lying around to parse SRT files:

from __future__ import division

import datetime

class Srt_entry(object):
    def __init__(self, lines):
        def parsetime(string):
            hours, minutes, seconds = string.split(u':')
            hours = int(hours)
            minutes = int(minutes)
            seconds = float(u'.'.join(seconds.split(u',')))
            return datetime.timedelta(hours=hours, minutes=minutes,
                                      seconds=seconds)
        self.index = int(lines[0])
        start, arrow, end = lines[1].split()
        self.start = parsetime(start)
        if arrow != u"-->":
            raise ValueError
        self.end = parsetime(end)
        self.lines = lines[2:]
        if not self.lines[-1]:
            del self.lines[-1]
    def __unicode__(self):
        def delta_to_string(d):
            hours = (d.days * 24) \
                    + (d.seconds // (60 * 60))
            minutes = (d.seconds // 60) % 60
            seconds = d.seconds % 60 + d.microseconds / 1000000
            return u','.join((u"%02d:%02d:%06.3f"
                              % (hours, minutes, seconds)).split(u'.'))
        return (unicode(self.index) + u'\n'
                + delta_to_string(self.start)
                + u' --> '
                + delta_to_string(self.end) + u'\n'
                + u''.join(self.lines))


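encoding = "utf-8"  # assumed; set this to the file's actual encoding
entries = []
entry = []
srt_file = open("foo.srt")
for line in srt_file:
    line = line.decode(encoding)
    if not line.strip():
        # A blank line (including "\r\n") ends the current entry.
        if entry:
            entries.append(Srt_entry(entry))
        entry = []
    else:
        entry.append(line)
srt_file.close()
# The last entry may not be followed by a blank line.
if entry:
    entries.append(Srt_entry(entry))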
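
The code above is Python 2. For Python 3, the blank-line grouping step could be sketched roughly as follows. This is only an illustration, not part of the original code: `group_entries` and the sample text are hypothetical names and data.

```python
def group_entries(lines):
    """Yield one list of lines per blank-line-separated block."""
    block = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the block collected so far.
            yield block
            block = []
    if block:
        # Final block that is not followed by a blank line.
        yield block


# Hypothetical sample data; a real file would be opened with
# open("foo.srt", encoding=...) and iterated line by line.
sample = (
    "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nBye\n"
)
blocks = list(group_entries(sample.splitlines(True)))
print(blocks)
```

Each block can then be handed to whatever parses a single entry, such as a Python 3 port of the class above.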

Teddy 2010-04-11 11:15:28

Answer 2

+2 A:

Why not use pysrt?

gnibbler 2010-04-11 11:15:52

It doesn't seem to be well documented.

Vojtech R. 2010-04-11 12:26:34

Answer 3

+1 A:

The text is followed by an empty line, or the end of file. So you can use:

r' .... (?P<text>.*?)(\n\n|$)'
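
A minimal sketch of how such a pattern might be used, in Python 3. The part written as `....` above is not known here, so the index and timing prefix below is a hypothetical stand-in, as is the sample text. Note that `re.DOTALL` is needed so that `.` also matches newlines inside multi-line subtitle text.

```python
import re

# Hypothetical prefix standing in for the elided part of the pattern:
# an index line followed by a "start --> end" timing line.
ENTRY_RE = re.compile(
    r'(?P<index>\d+)\n'
    r'(?P<start>\S+) --> (?P<end>\S+)\n'
    r'(?P<text>.*?)(\n\n|$)',
    re.DOTALL,  # let "." span newlines so multi-line text is captured
)

sample = (
    "1\n"
    "00:00:01,000 --> 00:00:02,500\n"
    "Hello\n"
    "world\n"
    "\n"
    "2\n"
    "00:00:03,000 --> 00:00:04,000\n"
    "Bye\n"
)

entries = [m.groupdict() for m in ENTRY_RE.finditer(sample)]
for e in entries:
    print(e["index"], repr(e["text"]))
```

The lazy `.*?` stops at the first blank line, or at the end of the input for the last entry.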

interjay 2010-04-11 11:15:59

+1 clean. And to account for whitespace, you could add... `r' .... (?P<text>.*?)\n\s*\n'`

Brendan Abel 2010-04-11 11:46:21

Answer 4

A:

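import re

# text holds the whole .srt file as a single string
text = open("foo.srt").read()
splits = [s.strip() for s in re.split(r'\n\s*\n', text) if s.strip()]
regex = re.compile(r'''(?P<index>\d+).*?(?P<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?P<end>\d{2}:\d{2}:\d{2},\d{3})\s*.*?\s*(?P<text>.*)''', re.DOTALL)
for s in splits:
    r = regex.search(s)
    if r:  # skip blocks that are not well-formed entries
        print(r.groups())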
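
The `start` and `end` groups come back as strings. If durations are needed, they could be converted along these lines. This is a Python 3 sketch, and `to_timedelta` is a hypothetical helper, not something from the code above.

```python
import datetime
import re

# Same HH:MM:SS,mmm shape that the start/end groups match.
TIME_RE = re.compile(r'(\d{2}):(\d{2}):(\d{2}),(\d{3})')


def to_timedelta(stamp):
    """Convert an 'HH:MM:SS,mmm' string to a datetime.timedelta."""
    match = TIME_RE.fullmatch(stamp)
    if match is None:
        raise ValueError("not an SRT timestamp: %r" % stamp)
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return datetime.timedelta(hours=hours, minutes=minutes,
                              seconds=seconds, milliseconds=millis)


start = to_timedelta("00:01:02,500")
end = to_timedelta("00:01:04,000")
print(end - start)  # duration of the subtitle as a timedelta
```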

Brendan Abel 2010-04-11 11:39:37
