I have a huge list of video files from a webcam that look like this:
video_123 video_456 video_789 ...
Where each number (123, 456, and 789) represents the start time of the file in seconds since epoch. The files are split based on file size, so they are not all the same duration. There may also be gaps between files (e.g., the camera goes down for an hour). It is a custom file format that I cannot change.
I have a tool that can extract portions of the video given a time range and a set of files. However, it runs MUCH faster if I only give it the files that actually have frames within that range. It's very costly to determine the duration of each file, so I'd like to use the start timestamps alone to rule out most files. For example, if I wanted video for 500-600, I know video_123 will not be needed because video_456 starts before 500, which means video_123 must end by 456 at the latest. Likewise, video_789 starts after 600, so it will not be needed either.
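To state the rule precisely: a file overlaps the range only if it starts before the end of the range and the next file starts after the beginning of the range (the last file's end is unknown, so it has to be kept whenever it starts before the end of the range). A linear pass captures this; here's a rough sketch, where s, e, and the video_* glob are placeholders:

```bash
s=500 e=600
printf '%s\n' video_* | sort -t_ -k2,2n |
awk -F_ -v s="$s" -v e="$e" '
    # A file is needed if it starts before e and the NEXT file starts after s.
    NR > 1 && prev_ts < e && $2 > s { print prev }
    { prev = $0; prev_ts = $2 }
    # Last file: its end is unknown, so keep it if it starts before e.
    END { if (NR > 0 && prev_ts < e) print prev }
'
```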
That is essentially what I do today: an ls, iterating through each file, converting the timestamp to an int and comparing until I hit a file that starts past the desired range. But I have a LOT of files and a linear scan is slow. Is there a faster method? I was thinking of some sort of binary tree that could give O(log n) search time and would already have the timestamps parsed out. I am doing most of this work in bash and would prefer to use simple, common tools like grep, awk, etc. However, I will consider Perl or another scripting language if there is a compelling reason.