Here's a simple problem that's been bugging me for some time. I often find I have a number of input files in some directory, and I want to construct output file names by replacing beginning and ending portions. For example, given this:

source/foo.c
source/bar.c
source/foo_bar.c

I often end up writing BASH expressions like:

for f in source/*.c; do
  a="obj/${f##*/}"
  b="${a%.*}.obj"
  process "$f" "$b"
done

to generate the commands

process "source/foo.c"     "obj/foo.obj"
process "source/bar.c"     "obj/bar.obj"
process "source/foo_bar.c" "obj/foo_bar.obj"

The above works, but it's a lot wordier than I like, and I would prefer to avoid the temporary variables. Ideally there would be some command that could replace the beginning and end of a string in one shot, so that I could just write something like:

for f in source/*.c; do process "$f" "obj/${f##*/%.*}.obj"; done

Of course, the above doesn't work. Does anyone know something that will? I'm just trying to save myself some typing here.

A: 

Not directly in bash. You can use sed, of course:

b="$(sed 's|^source/(.*).c$|obj/$1.obj|' <<< "$f")"
Ignacio Vazquez-Abrams
`$1` is Perl syntax. You need to use `\1` with `sed`. Also, you'll need to use the `-r` option or escape the parentheses.
Dennis Williamson
Urgh. Can you tell that I use too many regex tools?
Ignacio Vazquez-Abrams
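Following the comment above, here is a sketch of the sed approach with `\1` as the back-reference and the parentheses escaped, so it does not need `-r`. The dot before `c` is escaped too, so that it only matches a literal dot. The sample value of `f` is just one of the file names from the question.

```shell
#!/usr/bin/env bash
f="source/foo_bar.c"

# Basic regex syntax: \( \) captures the stem, \1 refers back to it,
# and \. matches a literal dot rather than any character.
b="$(sed 's|^source/\(.*\)\.c$|obj/\1.obj|' <<< "$f")"

printf '%s\n' "$b"   # prints obj/foo_bar.obj
```

Note that this still spawns one sed process per file, and a name that does not match the pattern passes through unchanged.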
+1  A: 

Not the prettiest thing in the world, but you can use a regular expression to group the content you want to pick out, and then refer to the BASH_REMATCH array:

if [[ $f =~ ^source/(.*).c$ ]] ; then f="obj/${BASH_REMATCH[1]}.o"; fi
Charles Duffy
You can, if you want, do it as a side effect without the conditional (since the OP isn't using a conditional either). `[[ $f =~ ^source/(.*).c$ ]]; process "$f" "obj/${BASH_REMATCH[1]}.o"`. You could assign a variable instead of using the result directly, of course, but either way at least one intermediate variable is eliminated. (Also, in your answer you're overwriting `f` which is needed as an argument to `process`.)
Dennis Williamson
Okay, I hadn't known about BASH_REMATCH, so this is the best answer I've seen so far, even if it is pretty ugly.
swestrup
@Dennis - dropping the conditional means that you have `obj/.o` if anything doesn't match the regex. If they all go to `obj/.o`, you may not know which input element failed to match when looking at `set -x` output... but if you have a conditional, you can tell what it is that didn't match from the filename passed around later, or put explicit error handling and/or logging in the else clause.
Charles Duffy
@swestrup - if it teaches you something new, how about an upvote?
Charles Duffy
That's true. The OP's version can fail under certain circumstances and should use sanity-check conditionals as well.
Dennis Williamson
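Pulling the comments above together, a minimal sketch that keeps the conditional, leaves `f` untouched, and reports names that do not match. The function name `obj_name` is made up for illustration, the sample names stand in for a real glob, and `echo process` only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# obj_name is a hypothetical helper: prints the output name for a
# source/*.c path, or complains on stderr and returns 1 otherwise.
obj_name() {
  local re='^source/(.*)\.c$'   # \. so that only a literal dot matches
  if [[ $1 =~ $re ]]; then
    printf 'obj/%s.o\n' "${BASH_REMATCH[1]}"
  else
    printf 'no match: %s\n' "$1" >&2
    return 1
  fi
}

for f in source/foo.c source/foo_bar.c README.txt; do
  if out=$(obj_name "$f"); then
    echo process "$f" "$out"
  fi
done
```

Keeping the pattern in a variable avoids quoting surprises inside `[[ ... =~ ... ]]`. The non-matching name is skipped instead of silently becoming `obj/.o`.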
+1  A: 

You shouldn't have to worry about whether your code is "wordier" or not. In fact, being a bit verbose does no harm; consider how much it will improve your (or someone else's) understanding of the script. Besides, for performance, using bash's internal string manipulation is much faster than calling external commands. Lastly, you are not going to retype your commands every time you use them, right? So why worry that it's "wordier", since these commands are already in your script?

ghostdog74
Actually, the whole point is that I HAVE been typing all this from scratch each time. There are just enough differences in what I need to do that it's never been worth it to try to write something general enough to be a useful utility.
swestrup
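If what varies is mostly the target directory and the extension, one possible middle ground is a small function kept in a shell startup file. This is only a sketch. The name `retarget` and its argument order are invented here, and it still uses a temporary variable, just hidden inside the function.

```shell
#!/usr/bin/env bash
# retarget FILE DIR EXT (hypothetical helper)
# Strips the leading directories and the last extension from FILE,
# then prefixes DIR/ and appends EXT.
retarget() {
  local name=${1##*/}                         # drop everything up to the last /
  printf '%s/%s%s\n' "$2" "${name%.*}" "$3"   # drop the last .suffix
}

# Sample names from the question; in real use this would be source/*.c
for f in source/foo.c source/bar.c source/foo_bar.c; do
  echo process "$f" "$(retarget "$f" obj .obj)"
done
```

It only uses bash's own parameter expansion, though the `$(...)` call does cost a subshell per file.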
A: 

Why not simply use cd to remove the "source/" part?

This way we can avoid the temporary variables a and b:

for f in $(cd source; printf "%s\n" *.c); do
   echo process "source/${f}" "obj/${f%.*}.obj"
done
bashfu
Interesting idea, and one I hadn't thought of. I'm not sure how much typing it will ultimately save me though.
swestrup
You'd want to make sure you have `IFS=$'\n'` before using this one, or else you'd be word-splitting on spaces within your filenames. (Even as it is, you'd be word-splitting on newlines; they *are* valid in filenames, so that's a bug.)
Charles Duffy
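One way to sidestep the word-splitting problem raised above is to glob into an array and let bash strip the directory from every element at once with `"${files[@]##*/}"`. This is a sketch rather than a drop-in replacement. It builds a throwaway directory so that it runs anywhere, and it prints the commands instead of executing `process`.

```shell
#!/usr/bin/env bash
# Throwaway sandbox, including a file name that contains a space.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir source
touch source/foo.c "source/foo bar.c"

shopt -s nullglob        # an empty match yields an empty array
files=(source/*.c)       # glob results are never word-split

# ##*/ is applied to each array element, so no cd or $(...) is needed.
for name in "${files[@]##*/}"; do
  printf 'process "%s" "%s"\n' "source/$name" "obj/${name%.*}.obj"
done
```

Because the names never pass through command substitution, spaces and even newlines in file names survive intact.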