I'm currently working on a project in Ruby, and I've hit a wall on how to proceed. In the project I'm using Dir.glob to search a directory and all of its subdirectories for certain file types and placing them into arrays. The files I'm working with share the same file names and are differentiated only by their extensions. For example,

txt_files = Dir.glob("**/*.txt")
doc_files = Dir.glob("**/*.doc")
rtf_files = Dir.glob("**/*.rtf")

Would return something similar to,

FILECON.txt ASSORTED.txt FIRST.txt

FILECON.doc ASSORTED.doc FIRST.doc

FILECON.rtf ASSORTED.rtf FIRST.rtf

So my question is: how can I break down these arrays efficiently (I'm dealing with thousands of files) and place all files with the same filename into one array? The new arrays would look like,

FILECON.txt FILECON.doc FILECON.rtf

ASSORTED.txt ASSORTED.doc ASSORTED.rtf

etc. etc.

I'm not even sure if glob would be the correct way to do this (all the files with the same file name are in the same folders). Any help would be greatly appreciated!

A: 

Not sure if this is exactly what you need, but you can try to

# first get all files
all_files = Dir.glob('**/*')
# then you can group them by name
by_name = all_files.group_by{|f| m = f.match(/([^\/]+)\.[^.\/]+$/); m[1] if m}
# and by extension
by_ext = all_files.group_by{|f| m = f.match(/[^\/]+\.([^.\/]+)$/); m[1] if m}

BTW, I don't see any relation of the question with sorting.
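As a possible alternative to the regexes above, here is a minimal sketch that groups with File.basename and File.extname instead. The sample paths are hypothetical stand-ins for what Dir.glob would return. Note one difference from the regex version: File.extname keeps the leading dot, and a file without an extension gets "" as its key rather than nil.

```ruby
# Hypothetical sample paths standing in for the result of Dir.glob('**/*').
all_files = %w[
  reports/FILECON.txt
  reports/FILECON.doc
  reports/ASSORTED.txt
  notes/FIRST.rtf
]

# File.basename(path, ".*") drops the directory and the last extension.
by_name = all_files.group_by { |f| File.basename(f, ".*") }

# File.extname returns the extension including the leading dot, e.g. ".txt".
by_ext = all_files.group_by { |f| File.extname(f) }

p by_name
p by_ext
```

Both results are hashes mapping a key (name or extension) to the array of full paths that share it, so by_name.values gives the per-name arrays directly.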

Mladen Jablanović
Sorry, perhaps I didn't use the correct terminology for what I would like to do (I'm still new to Ruby). What I wanted to do is take the three glob arrays that I have and make another three arrays. Each new array would contain the files that share the same name but have different extensions. The reason I didn't use the global search (Dir.glob('**/*')) is that there are other files I don't want to categorize intermixed with the others.
Ruby Beginner
So you could either 1) perform the same logic as for `by_name` above, on each of the three arrays you already have, or 2) instead of getting all the files by `Dir.glob('**/*')`, take just the ones with extensions you need: `Dir.glob("**/*.{txt,doc,rtf}")`, as Glenn suggests.
Mladen Jablanović
A: 

Get all your files into a single array with Dir.glob("**/*.{txt,doc,rtf}")

Don't forget that each path returned by glob includes its directory too, so if you want to sort by the basename, then

files = Dir.glob("**/*.{txt,doc,rtf}").sort_by {|f| File.basename f}
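The sort gives one flat list. If separate arrays per name are wanted, one way to get there from a sorted list is to split it wherever the name changes. This is only a sketch: the paths are made-up samples, `stem` is a helper name introduced here, and it sorts by the name without its extension (rather than the full basename) so that files sharing a name are guaranteed to end up adjacent. Enumerable#chunk_while needs Ruby 2.3 or later.

```ruby
# Hypothetical sample paths standing in for Dir.glob("**/*.{txt,doc,rtf}").
files = %w[
  dir/FILECON.txt dir/ASSORTED.doc dir/FILECON.rtf
  dir/ASSORTED.txt dir/FILECON.doc dir/ASSORTED.rtf
]

# Helper (name is an assumption): file name without directory or extension.
stem = ->(path) { File.basename(path, ".*") }

# Sort by stem first, then by extension, so same-named files sit together.
sorted = files.sort_by { |f| [stem.call(f), File.extname(f)] }

# Start a new sub-array every time the stem changes.
grouped = sorted.chunk_while { |a, b| stem.call(a) == stem.call(b) }.to_a

p grouped
```

If files with the same name can live in different folders, the sort key would also need File.dirname to keep those apart.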
glenn jackman
Thank you, this is exactly what I was trying to accomplish!
Ruby Beginner
A: 

If you are sure each basename has a .txt, a .doc and a .rtf, and the three arrays list them in the same order, you could zip them:

txt_files = %w(FILECON.txt ASSORTED.txt FIRST.txt)
doc_files = %w(FILECON.doc ASSORTED.doc FIRST.doc)
rtf_files = %w(FILECON.rtf ASSORTED.rtf FIRST.rtf)

files_by_basename = txt_files.zip(doc_files, rtf_files)
p files_by_basename

#=> [["FILECON.txt", "FILECON.doc", "FILECON.rtf"], ["ASSORTED.txt", "ASSORTED.doc", "ASSORTED.rtf"], ["FIRST.txt", "FIRST.doc", "FIRST.rtf"]]
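Since zip pairs elements purely by position, arrays coming from separate Dir.glob calls may need lining up first. A rough sketch of one way to guard against that, using made-up paths and a helper named `stem` that is not part of the original answer: sort each array by name, then use Array#transpose, which raises IndexError if the arrays differ in length (zip would pad with nil instead).

```ruby
# Hypothetical glob results, deliberately in different orders.
txt_files = %w[docs/FIRST.txt docs/FILECON.txt docs/ASSORTED.txt]
doc_files = %w[docs/ASSORTED.doc docs/FIRST.doc docs/FILECON.doc]
rtf_files = %w[docs/FILECON.rtf docs/ASSORTED.rtf docs/FIRST.rtf]

# Helper (name is an assumption): file name without directory or extension.
stem = ->(path) { File.basename(path, ".*") }

# Put every array into the same order before pairing by position.
sorted = [txt_files, doc_files, rtf_files].map { |list| list.sort_by(&stem) }

# transpose pairs by index like zip, but fails loudly on unequal lengths.
files_by_basename = sorted.transpose

# Sanity check: every group should contain exactly one distinct stem.
mismatched = files_by_basename.reject { |group| group.map(&stem).uniq.size == 1 }
raise "groups with mixed names: #{mismatched.inspect}" unless mismatched.empty?

p files_by_basename
```

If some names are missing one of the extensions, grouping by name is probably the safer route than zipping.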
steenslag