
Hi everyone,

I've been stuck on a little Unix command-line problem.

I have a website folder (4 GB) that I need to grab a copy of, but just the .php, .html, .js, and .css files (which amount to only a couple hundred KB).

Ideally, there would be a way to zip or tar the whole folder while grabbing only certain file extensions and retaining the subfolder structure. Is this possible, and if so, how?

I did try zipping the whole thing and then going through to exclude certain files, but that seemed a bit excessive.

I'm kinda new to Unix.

Any ideas would be greatly appreciated.

+1  A: 

You could write a shell script to copy files matching a pattern/expression into a new folder, zip the contents, and then delete the folder. Now, as for the actual syntax of it, I'll leave that to you :D
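
A rough sketch of that approach (assuming GNU cp, whose --parents flag recreates the subfolder structure; the paths here are placeholders):

# copy matching files into a scratch folder, keeping subfolders (GNU cp)
mkdir /tmp/site-copy
cd website_folder
find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' -o -name '*.css' \) \
    -exec cp --parents {} /tmp/site-copy \;

# zip the scratch folder's contents, then throw the folder away
(cd /tmp/site-copy && zip -r ~/site.zip .)
rm -rf /tmp/site-copy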

barfoon
Copy 4 GB of data? I think this should be used as a last resort.
MitMaro
You wouldn't be copying the entire directory, only the files that match the pattern. That way, when you create the zip, you can just zip everything and then delete the folder.
barfoon
@MitMaro the OP says it's just a few hundred KiB... anyway, an improved version could use links instead of copying (if symbolic links, then use -h with tar)... and that way you could reuse that view of the folder for further backups :-)
fortran
@fortran: Entirely missed that. My apologies, barfoon; this is not that bad of an idea.
MitMaro
+6  A: 

Use find and grep to generate the file list, then pipe that into zip

e.g.

find . | egrep "\.(html|css|js|php)$" | zip -@ test.zip

(-@ tells zip to read a file list from stdin)

Nick
If you have a large number of non-matching files, it would be slightly more efficient to do something like `find . -iname \*.html -o -iname \*.css -o -iname \*.js -o -iname \*.php` instead of `find . | grep ...`.
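Putting that together (a sketch; the escaped parentheses group the -o tests so -type f applies to all of them): `find . -type f \( -iname '*.html' -o -iname '*.css' -o -iname '*.js' -o -iname '*.php' \) | zip -@ test.zip`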
Adam Rosenfield
+2  A: 

You may want to use GNU find to locate all your .php, .html, etc. files, then tar them up:

find /path -type f \( -iname "*.php" -o -iname "*.css" -o -iname "*.js" -o -iname "*.ext" \) -exec tar -r --file=test.tar "{}" \;

After that you can compress it.
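
For example (gzip turns test.tar into test.tar.gz):

gzip test.tar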

ghostdog74
The problem with this approach is that it will exec tar once for each file.
kdgregory
Yes, it does. A better approach is to use zip -R, as one of you has posted.
ghostdog74
+5  A: 

Switch into the website folder, then run:

zip -R foo '*.php' '*.html' '*.js' '*.css'

You can also run this from outside the website folder:

zip -r foo website_folder -i '*.php' '*.html' '*.js' '*.css'

Curtis Tasker
This won't find matching files in subdirectories.
Ted Percival
Per the zip manpage, you have to quote the arguments with single quotes... otherwise, I suspect the shell will glob them before passing them to zip.
kdgregory
Actually, it will; that's what the -R flag does.
Curtis Tasker
@Ted: In the test I performed, it did recurse into subdirectories.
MitMaro
I've run man zip on a few different flavors of Unix and they're all subtly different. You may or may not need single quotes around each of the patterns, as kdgregory states.
Curtis Tasker
Definitely put single quotes around the patterns to match, otherwise the shell will expand them before executing `zip` and you'll just end up with matching files in the current directory.
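For example, if the current directory happens to contain index.php, an unquoted `zip -R foo *.php` is expanded by the shell to `zip -R foo index.php` before zip ever sees a pattern.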
Ted Percival
This worked a treat with the quotation marks. I didn't try it without. Thanks heaps! Seems like the easiest of all the solutions.
cosmicbdog
+3  A: 

This is how I managed to do it, but I also like ghostdog74's version.

tar -czvf archive.tgz `find test/ | egrep "\.(html|php)$"`

You can add extra extensions by adding them to the regex.

MitMaro
You may run into issues with the size of the argument list; see xargs.
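Or sidestep that entirely; GNU tar can read a null-delimited file list from stdin (a sketch): `find test/ -type f \( -iname '*.html' -o -iname '*.php' \) -print0 | tar -czvf archive.tgz --null -T -`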
kdgregory
+1  A: 

I liked Nick's answer, but since this is a programming site, why not use Ant to do this? :)

Then you can put in a parameter so that different types of files can be zipped up.

http://ant.apache.org/manual/CoreTasks/zip.html

James Black
Thanks for this. I will investigate it for the future.
cosmicbdog
I found ant useful, and the documentation is pretty easy to follow.
James Black