I am trying to use sed to extract some information that is encoded in the path of a file, which is passed as a parameter to my script (Bourne sh, if it matters). From this example path, I'd like the result to be 8:

PATH=/foo/bar/baz/1-1.8/sing/song

I first got the regex close by using sed as grep:

echo $PATH | sed -n -e  "/^.*\/1-1\.\([0-9][0-9]*\).*/p"

This properly recognized the string, so I edited it to make a substitution out of it:

echo $PATH | sed -n -e "s/^.*\/1-1\.\([0-9][0-9]*\).*/\1/"

But this doesn't produce any output. I know I'm just not seeing something simple, but would really appreciate any ideas about what I'm doing wrong or about other ways to debug sed regular expressions.

(edit)

In the example path, the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1.<some-number> and see what some-number is.

It is also possible to have an input string that the regular expression should not match, in which case it should produce no output.

+1  A: 

It looks like you're trying to get the 8 from the 1-1.8 (where 8 is any sequence of numerics), yes? If so, I would just use:

echo /foo/bar/baz/1-1.8/sing/song | sed -e  "s/.*\/1-1\.//" -e "s/[^0-9].*//"

No doubt you could get it working with one sed "instruction" (-e) but sometimes it's easier just to break it down.

The first strips out everything from the start up to and including 1-1., the second strips from the first non-numeric after that to the end.

$ echo /foo/bar/baz/1-1.8/sing/song | sed -e  "s/.*\/1-1\.//" -e "s/[^0-9].*//"
8
$ echo /foo/bar/baz/1-1.752/sing/song | sed -e  "s/.*\/1-1\.//" -e "s/[^0-9].*//"
752

And, as an aside, this is actually how I debug sed regular expressions. I put simple ones in independent instructions (or independent parts of a pipeline for other filtering commands) so I can see what each one does.
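As a rough sketch of that debugging approach, each substitution can be run on its own and its intermediate result printed. The variable names here (file_path, step1, step2) are illustrative assumptions, not anything from the original script:

```shell
# Hypothetical sample input; the variable is deliberately not named PATH.
file_path=/foo/bar/baz/1-1.8/sing/song

# Step 1 alone: strip everything up to and including "/1-1."
step1=$(echo "$file_path" | sed -e 's/.*\/1-1\.//')
echo "after step 1: $step1"

# Step 2 on that result: strip from the first non-digit to the end
step2=$(echo "$step1" | sed -e 's/[^0-9].*//')
echo "after step 2: $step2"
```

If the first step already prints something unexpected, the problem is in that expression and the second one can be ignored for the moment.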


Following your edit, this also works:

$ echo /foo/bar/baz/1-1.962/sing/song | sed -e  "s/.*\/1-1\.\([0-9][0-9]*\).*/\1/"
962

As to your comment:

In the example path, the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1.<some-number> and see what some-number is.

The two-part sed command I gave you should work with numerics anywhere in the string (as long as there's no 1-1. after the one you're interested in). That's because it deletes everything up to the specific 1-1. string, and then everything from the first non-numeric onward. If you have some examples that don't work as expected, toss them into the question as an update and I'll adjust the answer.
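To illustrate the caveat about a later 1-1., here is a small sketch with a made-up path that contains two such components. Because the leading .* is greedy, the first substitution removes everything up to the last /1-1., so the number that survives is the one from the final matching component:

```shell
# Hypothetical path with two "1-1.<number>" components
sample=/foo/1-1.8/bar/1-1.9/song

# Greedy .* consumes up to the LAST "/1-1.", leaving "9/song";
# the second expression then trims "/song".
last_match=$(echo "$sample" | sed -e 's/.*\/1-1\.//' -e 's/[^0-9].*//')
echo "$last_match"
```

This prints 9 rather than 8, which is why a different pattern is needed if the earlier component is the one of interest.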

paxdiablo
Thanks for your suggestion. I hadn't thought about splitting this up like that. Your answer prompted me to edit the question and clarify some things. On some of my test data this doesn't exactly match what I wanted.
nategoose
"(as long as there's no 1-1. after the one you're interested in)" There can be.
nategoose
Then you need to find another pattern. It would be worthwhile asking a different question, giving more examples of what you want to match.
paxdiablo
+2  A: 

The -n option to sed suppresses normal output, and since your second command doesn't have a p flag, nothing is output. Get rid of the -n or stick a p back on the end.
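A minimal sketch of the second option, keeping -n and adding the p flag to the substitution so that only lines where the substitution succeeded are printed. The function name extract_version is a hypothetical helper introduced for illustration:

```shell
# Hypothetical helper: prints the number after "/1-1." or nothing at all
extract_version() {
    echo "$1" | sed -n -e 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/p'
}

# A path that matches: the captured digits are printed
matched=$(extract_version /foo/bar/baz/1-1.8/sing/song)
echo "matched: [$matched]"

# A path that does not match: -n suppresses the unchanged line
unmatched=$(extract_version /foo/bar/baz/2-2.8/sing/song)
echo "unmatched: [$unmatched]"
```

The second call produces an empty result, which covers the case in the question where a non-matching input should give no output.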

Chris Dodd
Thanks, though I could have sworn that I tried it with the /p earlier and it gave me an error. I needed the -n to keep it from printing lines which didn't match at all.
nategoose
+1  A: 

You can shorten your command by using + (one or more) instead of * (zero or more):

sed -n -e "s/^.*\/1-1\.\([0-9]\+\).*/\1/"
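One hedge on portability: \+ in a basic regular expression is a GNU sed extension, so it may not be accepted by every sed. A sketch of an equivalent using the POSIX interval form \{1,\}, with the p flag added so that -n still prints matches (the variable names are illustrative):

```shell
# Hypothetical sample input
sample=/foo/bar/baz/1-1.962/sing/song

# \{1,\} means "one or more" in POSIX basic regular expressions
digits=$(echo "$sample" | sed -n -e 's/^.*\/1-1\.\([0-9]\{1,\}\).*/\1/p')
echo "$digits"
```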
Dennis Williamson
Thanks. I wrote it that way originally, but switched to the way it is in the question when I had trouble getting it to work. If sed were to give me an error message on the version of the pattern in the question I think it would have been easier to make out what the error was referring to. You did remind me to switch it back to + though, now that it's working. Thanks!
nategoose
+1  A: 

Don't use PATH as your variable name. It clashes with the PATH environment variable, which the shell uses to locate commands such as sed.

echo "$path" | sed -e 's/.*1-1\.//;s/\/.*//'
That was for demonstration purposes only, but good catch. I hadn't noticed that I'd done that in the example.
nategoose
A: 

You needn't delimit the parts of the s command with / (s/a/b/g); you may choose almost any other character, so if you're dealing with paths, # is more convenient than /:

echo /foo/1-1.962/sing | sed -e  "s#.*/1-1\.\([0-9]\+\).*#\1#"
user unknown