Hi,

I'm getting totally lost in shell programming, mainly because every site I use offers a different tool for pattern matching. So my question is: what tool should I use to do simple pattern matching on a piped stream?

Context: I have a named.conf file, and I need all the zone names in a simple file for further processing. So I run ~$ cat named.local | grep zone and get totally lost from there. My output is a hundred or so lines of the form 'zone "domain.tld" {', and I need the text in double quotes.

Thanks for showing a way to do this.

J

+4  A: 

I think what you're looking for is sed... it's a stream editor which will let you do replacements on a line-by-line basis.

As you're explaining it, the command `cat named.local | grep zone' gives you an output a little like this:

zone "domain1.tld" {
zone "domain2.tld" {
zone "domain3.tld" {
zone "domain4.tld" {

I'm guessing you want the output to be something like this, since you said you need the text in double quotes:

"domain1.tld"
"domain2.tld"
"domain3.tld"
"domain4.tld"

So, in reality, from each line we just want the text between the double-quotes (including the double-quotes themselves.)

I'm not sure whether you're familiar with regular expressions, but they are an invaluable tool for anyone writing shell scripts. For example, the regular expression /.o.e/ matches any line containing a four-character sequence whose 2nd character is a lower-case o and whose 4th is e. This would match strings containing words like "zone", "tone", or even "I am tone-deaf."

The trick there was to use the . (dot) character to mean "any character". There are a couple of other special characters, such as *, which means "repeat the previous character 0 or more times". Thus a regular expression like a* would match "a", "aaaaaaa", or an empty string: ""

So you can match the string inside the quotes using: /".*"/
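To try these patterns out, here is a minimal sketch using grep. The sample lines are made up for illustration, and -o (print only the matched part) is an assumption about your grep: GNU and BSD grep both have it, but it is not in older POSIX specifications.

```shell
# Sample input, invented for this illustration.
sample='zone "domain1.tld" {
file "db.domain1";
I am tone-deaf.
nothing here'

# Lines containing: any character, "o", any character, "e".
dot_matches=$(printf '%s\n' "$sample" | grep '.o.e')

# Only the part of each line matched by ".*" between double quotes.
quoted=$(printf '%s\n' "$sample" | grep -o '".*"')

printf '%s\n' "$dot_matches"
printf '%s\n' "$quoted"
```

The first grep keeps the "zone" and "tone-deaf" lines; the second prints just the quoted pieces, quotes included.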

There's another thing you should know about sed (and by the comments, you already do!): it supports back-references. Once you've told it how to recognize a piece of text, you can reuse that text as part of the replacement. For example, let's say that you wanted to turn this list:

Billy "The Kid" Smith
Jimmy "The Fish" Stuart
Chuck "The Man" Norris

Into this list:

The Kid
The Fish
The Man

First, you'd look for the string inside the quotes. We already saw that, it was /".*"/.

Next, we want to use what's inside the quotes. We can group it using parens: /"(.*)"/

If we wanted to replace the quoted text (quotes included) with an underscore, we'd do a substitution: s/"(.*)"/_/, and that would leave us with:

Billy _ Smith
Jimmy _ Stuart
Chuck _ Norris

But we have back-references! They let us recall what was inside the parens, using the symbol \1. So if we do now: s/"(.*)"/\1/ we'll get:

Billy The Kid Smith
Jimmy The Fish Stuart
Chuck The Man Norris

Because the quotes weren't in the parens, they weren't part of the contents of \1!
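A quick way to check this yourself, as a sketch rather than the final command: the unescaped parens used so far need extended regular expressions, so this assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# The nickname list from the example above.
names='Billy "The Kid" Smith
Jimmy "The Fish" Stuart
Chuck "The Man" Norris'

# -E selects extended regular expressions, so ( ) group without backslashes.
# \1 in the replacement recalls whatever the group matched.
unquoted=$(printf '%s\n' "$names" | sed -E 's/"(.*)"/\1/')

printf '%s\n' "$unquoted"
```

Only the quotes disappear, because they sit outside the group and so are matched but not recalled.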

To only leave the stuff inside the double-quotes, we need to match the entire line. To do that we have ^ (which means "beginning of line"), and $ (which means "end of line".)

So now if we use s/^.*"(.*)".*$/\1/, we'll get:

The Kid
The Fish
The Man

Why? Let's read the regular expression s/^.*"(.*)".*$/\1/ from left-to-right:

  • s/ - Start a substitution command.
  • ^ - Look for the beginning of the line. Start from there.
  • .* - Keep going, reading every character, until...
  • " - ... you reach a double-quote.
  • ( - Start a group of characters we might want to recall later (as \1).
  • .* - Keep going, reading every character, until...
  • ) - (pssst! close the group!)
  • " - ... you reach a double-quote.
  • .* - Keep going, reading every character, until...
  • $ - The end of the line!
  • / - Use what's after this to replace what you matched.
  • \1 - Paste the contents of the first group (what was in the parens).
  • / - End of the substitution.

In plain English: "Read the entire line, copying aside the text between the double-quotes. Then replace the entire line with the content between the double quotes."
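The whole-line version can be tried the same way. This is a sketch that again assumes sed accepts -E for unescaped parens; the zone line is a made-up sample in the format from the question.

```shell
# The nickname list from the example above.
names='Billy "The Kid" Smith
Jimmy "The Fish" Stuart
Chuck "The Man" Norris'

# Match the whole line, keep only what the group captured.
inner=$(printf '%s\n' "$names" | sed -E 's/^.*"(.*)".*$/\1/')

# The same expression applied to a line shaped like the grep output.
zone_name=$(printf '%s\n' 'zone "domain1.tld" {' | sed -E 's/^.*"(.*)".*$/\1/')

printf '%s\n' "$inner"
printf '%s\n' "$zone_name"
```

Because the pattern now covers everything from ^ to $, the replacement \1 is all that is left of each line.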

You can even add double quotes around the replacement text, s/^.*"(.*)".*$/"\1"/, so we'll get:

"The Kid"
"The Fish"
"The Man"

And that can be used by sed to replace the line with the content from within the quotes:

sed -e "s/^.*\"\(.*\)\".*$/\"\1\"/"

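(The backslashes before the double-quotes are just shell escaping, needed because the whole expression sits inside double quotes. The parens are written \( and \) because sed uses basic regular expressions by default, where unescaped parens are ordinary characters.)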

So the whole command would be something like:

cat named.local | grep zone | sed -e "s/^.*\"\(.*\)\".*$/\"\1\"/"
scraimer
Yup, I'm using it right now, but I think there should be an easier way to do it, because now I use sed -e 's/zone "//g' | sed -e 's/" {//g' to remove the beginning and end of each line instead of just matching the middle.
jpou
Shaving off the beginning and the end is perfectly acceptable. This is no contest – if it works, it’s fine. If you want to do it by matching the text in quotes, take a look at ‘capturing groups’.
zoul
ugh. I spent too long typing that up, and it's still not done... it seems everyone beat me to it. But I'm glad you already figured it out :-)
scraimer
Mwaha! Finished, finally!
scraimer
A: 

You should have a look at awk.

marcog
A: 

1.

zoul@naima:etc$ cat named.conf | grep zone
zone "." IN {
zone "localhost" IN {
    file "localhost.zone";
zone "0.0.127.in-addr.arpa" IN {

2.

zoul@naima:etc$ cat named.conf | grep ^zone
zone "." IN {
zone "localhost" IN {
zone "0.0.127.in-addr.arpa" IN {

3.

zoul@naima:etc$ cat named.conf | grep ^zone | sed 's/.*"\([^"]*\)".*/\1/'
.
localhost
0.0.127.in-addr.arpa

The regexp is .*"\([^"]*\)".*, which matches:

  1. any number of any characters: .*
  2. a quote: "
  3. starts to remember for later: \(
  4. any characters except quote: [^"]*
  5. ends group to remember: \)
  6. closing quote: "
  7. and any number of characters: .*

When calling sed, the syntax is 's/what_to_match/what_to_replace_it_with/'. The single quotes are there to keep your regexp from being expanded by bash. When you “remember” something in the regexp using parens, you can recall it as \1, \2 etc. Fiddle with it for a while.
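To see \1 and \2 side by side, here is a small sketch. The input line is a made-up sample in the style of the output above, and the second group is added purely for illustration.

```shell
# Hypothetical sample line in the named.conf style shown above.
line='zone "localhost" IN {'

# Group 1 remembers the name between the quotes,
# group 2 remembers the upper-case word that follows it.
swapped=$(printf '%s\n' "$line" | sed 's/.*"\([^"]*\)" \([A-Z]*\).*/\2 \1/')

printf '%s\n' "$swapped"
```

The groups are numbered by the order of their opening \(, so the replacement \2 \1 prints the class first and the zone name second.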

zoul
A: 

Well, nobody mentioned cut yet, so, to prove that there are many ways to do something with the shell:

% grep '^zone' /etc/bind/named.conf  | cut -d' ' -f2
"gennic.net"
"generic-nic.net"
"dyn.generic-nic.net"
"langtag.net"
bortzmeyer
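A hedged variation on the cut approach above, for when the quotes should go too. The sample lines are invented stand-ins for /etc/bind/named.conf. Splitting on spaces assumes exactly one space before the name; splitting on the double quote itself avoids that assumption.

```shell
# Hypothetical sample lines standing in for /etc/bind/named.conf.
conf='zone "gennic.net" {
    file "gennic.zone";
zone "langtag.net" {'

# Split on spaces, take field 2, then delete the double quotes with tr.
by_space=$(printf '%s\n' "$conf" | grep '^zone' | cut -d' ' -f2 | tr -d '"')

# Split on the double quote instead: field 2 is the text between the first pair.
by_quote=$(printf '%s\n' "$conf" | grep '^zone' | cut -d'"' -f2)

printf '%s\n' "$by_space"
printf '%s\n' "$by_quote"
```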
A: 

As long as someone is pointing out sed/awk, I'm going to point out that grep is redundant.

sed -ne '/^zone/{s/.*"\([^"]*\)".*/\1/;p;}' /etc/bind/named.conf

This gives you what you're looking for without the quotes (move the quotes inside the parentheses to keep them). In awk, it's even simpler, with the quotes kept:

awk '/^zone/{print $2}' /etc/bind/named.conf
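If the quotes are not wanted, one possible sketch is to make the double quote the field separator with -F, so the name becomes field 2. The sample input is invented; with a real file you would pass its name as an argument instead of reading from printf.

```shell
# Hypothetical sample lines standing in for /etc/bind/named.conf.
conf='zone "." IN {
zone "localhost" IN {
    file "localhost.zone";
zone "0.0.127.in-addr.arpa" IN {'

# -F'"' splits each line on double quotes; $2 is the text between the first pair.
zones=$(printf '%s\n' "$conf" | awk -F'"' '/^zone/ {print $2}')

printf '%s\n' "$zones"
```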

I try to avoid pipelines as much as possible (but not more). Remember: don't pipe cat. It's not needed. And, inasmuch as awk and sed duplicate grep's work, don't pipe grep, either. At least, not into sed or awk.

Personally, I'd probably have used perl. But that's because I probably would have done the rest of whatever you're doing in perl, making it a minor detail (and being able to slurp the whole file in and regex against everything simultaneously, ignoring \n's would be a bonus for cases where I don't control /etc/bind, such as on a shared webhost). But, if I were to do it in shell, one of the above two would be the way I'd approach it.

Tanktalus