I have a huge list of video files from a webcam that look like this:
video_123 video_456 video_789 ...
Where each number (123, 456, and 789) represents the start time of the file in seconds since epoch. The files are split based on file size, so they are not all the same duration. There may also be gaps between files (e.g., the camera goes down for an hour). It is a custom file format that I cannot change.
I have a tool that can extract portions of the video given a time range and a set of files. However, it runs MUCH faster if I only give it the files that actually have frames within that range. It's very costly to determine the duration of each file, so I'd like to use the start timestamps alone to rule out most files. For example, if I wanted video for 500-600, I know video_123 will not be needed because video_456 starts before 500, which means video_123 must end by 456 at the latest. Likewise, video_789 starts after 600, so it will not be needed either.
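To state the rule precisely: a file overlaps the range only if it starts before the end of the range and the next file starts after the beginning of the range (the last file's end is unknown, so it has to be kept whenever it starts before the end of the range). A linear pass captures this; here's a rough sketch, where s, e, and the video_* glob are placeholders:

```bash
s=500 e=600
printf '%s\n' video_* | sort -t_ -k2,2n |
awk -F_ -v s="$s" -v e="$e" '
    # A file is needed if it starts before e and the NEXT file starts after s.
    NR > 1 && prev_ts < e && $2 > s { print prev }
    { prev = $0; prev_ts = $2 }
    # Last file: its end is unknown, so keep it if it starts before e.
    END { if (NR > 0 && prev_ts < e) print prev }
'
```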
That is essentially what I do today: an ls, iterating through each file, converting the timestamp to an int and comparing until I hit a file that starts past the desired range. But I have a LOT of files and a linear scan is slow. Is there a faster method? I was thinking of some sort of binary tree that could give O(log n) search time and would already have the timestamps parsed out. I am doing most of this work in bash and would prefer to use simple, common tools like grep, awk, etc. However, I will consider Perl or another scripting language if there is a compelling reason.